On the way to the Internet with the Geneve and TI
=================================================

by Michael Zapf


Part I. The Internet
--------------------


1. Introduction
---------------

Computer networks have existed for several decades now. Computer scientists soon realized the advantages of connecting computers: calculations can be distributed, data can be shared, expensive hardware can be used by many participants, and so on. At first there were not many services; at most you could send some messages, but gradually networks became very convenient.

Today networks often consist of a server that is equipped with many resources, e.g. large hard drives and fast processors, and many clients that are computers in their own right but use the services offered by the server, loading files from it or dispatching demanding jobs to it.

Computers are connected to each other in a network, and networks may in turn be connected to each other. This forms what is called an internet.

As is often the case, military requirements were the starting point of many inventions. The U.S. Department of Defense commissioned scientists to develop an internet on American territory that would be fail-safe in case participating nodes were destroyed. That means that there must not be a center through which all messages have to pass; instead, the net should be able to re-route the data when connections become unavailable. This led to the development of the ARPAnet.

The growing demand for scientific know-how and an increasingly relaxed military situation in the world allowed more and more scientific sites to join the net, which was now called the "Internet". The development gained speed at a dramatic rate: protocols were defined for different services in the Internet, and computers that were miles away could be used as if you were working with them directly. Then the net crossed national borders, and the Internet started spreading around the world.

For many years the Internet was mainly used for message exchange.
Because of its analogy to the real world, this exchange was called e-mail (electronic mail). Even today, e-mail is one of the most popular services of the Internet: what could be easier than typing a few words, executing a send command, and receiving a reply only a few minutes later, although your peer sits in an office on the other side of the world? And unlike with a telephone call, you need not make sure that you can actually reach him; your message will be presented to him as soon as he takes a look into his mailbox.

Another interesting institution is the USENET, which was soon integrated into the Internet. The USENET consists of a collection of so-called newsgroups; today you will find many thousands of them. Messages sent to a newsgroup are routed to a special computer (called a news server) that stores them and distributes them to other news servers. Users can subscribe to these newsgroups at a news server of their choice and download all messages of the newsgroups they are interested in. This allows lively discussions on every possible subject, for instance about the TI-99/4A in comp.sys.ti.

To simplify file transfer on the Internet, the File Transfer Protocol was specified. Now it was possible to store files at well-known locations in the net and retrieve them when necessary.

While distributed file access and remote execution were mainly of interest for subsections of the Internet (subnets), global data exchange started to grow exponentially with the invention of the World Wide Web (WWW). The WWW is still based on the Internet and uses a new protocol "on top", the HyperText Transfer Protocol (which you often see as "http" at the beginning of WWW addresses). It allows files to be defined that appear as hypertext (text with links to other files) when they are downloaded to the user's computer. A special language, the HyperText Markup Language (HTML), is used to compose these documents.
Furthermore, multimedia elements are supported, so that images, sounds, video clips and so on can be sent along with the text.

With these features the Internet became interesting for virtually everyone. More and more companies advertise their products; newspapers and magazines offer excerpts of their printed editions; there is entertainment, knowledge, connectivity, home banking, electronic commerce and much more.

By now it is clear that this growth is becoming a real threat to the usefulness of the Internet for everyone, including its fathers, the scientists. The more participants send their data around, the slower the whole net becomes, because the bandwidth (the amount of data that can pass a network connection in a given time) is limited. Even worse, the supply of Internet addresses will be exhausted within the next few years, so a new addressing scheme has already been defined to replace the current one.


2. The infrastructure
---------------------

If someone plants a bomb in the building where your favorite BBS is located, what happens? It will take quite some time before you can get online again, if the BBS is ever restored at all. Not so with the Internet: if a server crashes, it takes only a short time for the adjacent net nodes to notice and to revise their routing decisions. (Practice shows that this does not always work very reliably, but at least it is possible.)

This implies that participating hosts (network nodes) have many more things to do than simply receive or send texts. And since there are so many applications that want to use the network functionality, the software must be designed very carefully so that it can be used in very different situations without constraining future developments.

A good real-world example is the situation where the bosses of two companies want to arrange a meeting.
Each one has a secretary who takes messages from him or passes received messages to him; in this example there is even a clerk who delivers the messages inside the company. The secretary is free to choose a transmission medium to reach her peer at the other company; her boss does not care. She, in turn, does not care about the work the telecommunication service has to perform to transmit the fax she decided to send. The telecom service, on the other hand, is not interested in the message itself, only in delivering it as requested.

Her peer notices the fax machine printing a sheet; she takes it and checks briefly whether it was transmitted correctly, but she is not interested in the content. She just looks at the recipient and drops the sheet into the appropriate box. Another clerk comes by, fetches all the papers from this box, and brings them to the boss.

The advantage is that everyone does just a small job and is soon ready to continue with other work. If somebody seems to work unsatisfactorily, he can easily be replaced.

This served as a model for the realization of the global Internet. The most successful strategy proved to be a paradigm stating that the network software should be organized in layers, where only layers of the same depth understand each other. Each layer receives outgoing data from the next upper layer, modifies them and hands them over to the next lower layer. Incoming data is first processed by a lower layer before it reaches the next higher one. Because data can only be passed from one layer to the adjacent one, the software resembles a stack that is first worked down and then rebuilt. Therefore, we also use the term "protocol stack".

A protocol is a set of rules for the communication between peers; besides the actual data, a message includes information for the recipients that have to process the data.
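The layering principle can be illustrated with a few lines of Python. This is a toy sketch, not real protocol code: going down, every layer puts its header in front of the data it receives; going up, every layer removes exactly the header its peer added. The headers are hypothetical text tags such as `[L4]` standing in for the binary headers of real protocols, and the numbering assumes layer 4 is the top of the stack while layer 1 adds no header.

```python
# Layers that add a header, from the top of the stack (4) down to layer 2.
# Layer 1 only transmits bits and adds no header.
LAYERS = [4, 3, 2]


def header_for(layer: int) -> bytes:
    """Hypothetical placeholder header; real headers are binary structures."""
    return f"[L{layer}]".encode("ascii")


def send_down(app_data: bytes) -> bytes:
    """Work the stack down: every layer prepends its own header."""
    data = app_data
    for layer in LAYERS:
        data = header_for(layer) + data
    return data


def receive_up(frame: bytes) -> bytes:
    """Rebuild the stack: every layer strips the header of its peer."""
    data = frame
    for layer in reversed(LAYERS):
        header = header_for(layer)
        if not data.startswith(header):
            raise ValueError(f"layer {layer}: unexpected header")
        data = data[len(header):]
    return data


frame = send_down(b"hello")
print(frame)              # b'[L2][L3][L4]hello'
print(receive_up(frame))  # b'hello'
```

Note that the outermost header belongs to the lowest layer, which is why the receiving side must process the layers in the opposite order.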
Human conversation follows protocols as well: people normally say "hello" to each other before they start talking for the first time, or, if they cannot see each other, they first call each other by name.

The layers are named by their functionality and do not prescribe a particular protocol:

    Layer 4: Transport layer
    Layer 3: Network layer
    Layer 2: Data link layer
    Layer 1: Physical layer

The lower the number, the closer to the phone line or network cable the layer is found. Applications sit on top of this stack and communicate only with layer 4. Each layer adds its own header to the outgoing application data. In detail:

Physical layer: This layer is concerned with the transmission of the bits: the electrical specification, the hardware (plugs), and the transmission rate. It is specific to the kind of connection you are using; for a serial connection, it is the RS-232 specification. Outgoing byte strings from layer 2 are converted into bit strings; incoming bits are assembled into byte strings before they are passed to layer 2.

Data link layer: The bytes from layer 1 are grouped into so-called frames of limited length, and a checksum is calculated to verify that the transmission was correct. In an Ethernet, where several hosts are connected to one wire, the header contains the network card addresses of the sender and the receiver. This is of course not necessary with point-to-point protocols such as PPP or SLIP, which connect exactly two hosts over a serial line (e.g. via a modem). With PPP, a field in the frame header indicates whether the incoming data is to be passed up or is control data for this layer itself.

Network layer: The data we got from layer 2 has been checked; now we need to find out whether we are the true recipient. This information is again found in the header, and it need not refer to the same host as the layer 2 header, because layer 2 only knows about our local network and nothing about the world outside. If we are the final recipient, the data is passed up.
Otherwise the data continues its journey, and it is the task of this layer to decide where to forward it. An example of a network layer protocol is IP, the Internet Protocol.

Transport layer: The network layer can only handle data strings of limited length, called packets. Longer data strings are therefore broken into suitable pieces before they are sent to the network layer. In the other direction, the situation is more complicated: nothing guarantees that the incoming packets are complete and in the right order. The transport layer takes care of the completeness of the transmission; if we request 10 kilobytes, it will try to deliver them, regardless of the packet size prescribed by the lower layers. An example of a layer 4 protocol is TCP, the Transmission Control Protocol.

You can see that this structure implies a lot of overhead on the transmitted data: each layer (except layer 1, which will not be of further interest) adds its own header that enables the corresponding layer on the recipient's host to process the data correctly. On the way from the upper layers to the lower layers, the amount of transmitted data increases; in the opposite direction, it decreases with every header that is stripped away. To give you an example: in an Ethernet network that uses TCP/IP over IEEE 802.3, we have 18 bytes for the data link layer, 20 bytes for IP, 20 bytes for TCP, and then the payload bytes. The data part of a frame is normally at most 1500 bytes long (data link header and checksum not counted), which leaves at most 1460 bytes of application data per frame.

The inestimable advantage of this layered strategy is that each layer

    1. can be replaced without influencing the others, and
    2. can rely on a guaranteed service from the adjacent layers.

Point 1 means that if we decide to use Ethernet instead of a serial connection, only the data link layer is affected; the functionality of the network and transport layers remains the same.

Point 2 means that the higher layers have no idea what happens to their data in the lower layers, and the lower layers do not know what the data means to the layers above them.
Our application (located above layer four) does not have to bother with the fragmentation of the data, checksums, or the correct ordering. It simply expects the transport layer to deliver exactly the data it asked for and to handle the data it passes down correctly. On the other hand, the transport layer does not know what the data means.

LANs (Local Area Networks) are often organized in a simpler way, so you might ask why the Internet needs all this effort. The reason is clear: the inventors of this protocol stack were wise enough not to require any particular computer system for taking part. Since a world-wide system is difficult to change as a whole, it is built from components, and the smaller they are, the more quickly they can be replaced. Even if completely new transmission media or processor types appear, the Internet will continue to work.


3. Overview of the TCP/IP Protocol Stack
----------------------------------------

After the last section you will be able to figure out what is meant by this term. The TCP/IP stack is the main protocol stack in the Internet: the data link layer can be freely chosen, e.g. Ethernet or a serial line (PPP, SLIP); the network layer is controlled by the Internet Protocol; the transport layer uses the Transmission Control Protocol. There are of course other protocols that are only used for special services; they are nonetheless important and will be described later.

I will only describe those protocols that are of major interest for us, so I will not go into any further Ethernet issues. While writing this chapter I noticed that it was becoming longer and longer, so I will first give an overview of how the different protocols work without going too far into the details.


3.0 Request for Comment: RFC
----------------------------

A strange name for a set of specifications, isn't it?
Since the beginning of the Internet (in those days still the ARPAnet), the documentation for the various protocols and utilities, as well as proposals (and even jokes on April Fool's Day), has been collected under this label at special Internet sites. Anyone looking for precise information about a particular subject must take a look at this list, which comprised more than 2000 entries at the beginning of 1997. Many of these RFCs are updates of earlier ones, which are thereby made obsolete.

There are several ways to get a copy of an RFC:

Using FTP: The RFCs can be found on the "InterNIC Directory and Database Services" server at 'ds.internic.net'. Change to the directory 'rfc' and you will find the RFCs as text or PostScript files. (Note to German users: you can use the server at 'nic2.nic.de'.)

Using e-mail: It is possible to 'order' an RFC by e-mail. Just send the following message (nothing more; no subject) to 'mailserv@ds.internic.net':

    document-by-name rfcXXXX

and replace XXXX by the corresponding RFC number. You can request more than one RFC by using 'document-by-name rfcXXXX, rfcYYYY' or by putting the requests on separate lines. You should also request the index with

    document-by-name rfc-index

I will give the number of the latest relevant RFC in the heading of each of the following subsections. I strongly encourage you to get the corresponding RFC, because the information below cannot cover all necessary aspects.


3.1 Point-to-Point Protocol (PPP) --- RFC 1661
----------------------------------------------

I'll start with the Point-to-Point Protocol because it is more widely used for modem connections than SLIP (Serial Line Internet Protocol, RFC 1055).

Receiving (data flow from lower to higher numbered layers): Suppose our interface card has received a stream of bytes (in fact, our layer 1 program has received them) and passes it to the data link layer, for which we want to use PPP. The tasks for PPP are to

- group the bytes into frames of limited length (approx. 1500 bytes)
- check the consistency of the transmitted data by verifying the CRC value
- demultiplex the data for the different upper-layer services
- negotiate transmission options with the other end

Of every frame, PPP keeps the first five bytes and the last three bytes (by default: the start flag, address, control and the two protocol bytes at the front; the two CRC bytes and the end flag at the back). The remaining bytes are passed on to the service that is identified by the fourth and fifth bytes.

Sending (data flow from higher to lower numbered layers): In the opposite direction, some upper layer has passed data to this layer. What must be done is to

- calculate the CRC value
- write the frame header before the data, and the CRC and the end byte after them
- hand the whole frame to the interface driver as soon as possible.

As the data link layer driver is bound to one interface, it is often considered to be the interface driver itself. If there are several interfaces, each one has its own data link layer driver.


3.2 Internet Protocol (IP) version 4 --- RFC 791
------------------------------------------------

Receiving: The data we received comes from layer 2 and is considered to be IP data. The first thing to do is to check whether this host is in fact the recipient. Our host may have to forward the data because it has connections to two different networks (such a host is called a gateway). In this case, the next receiver is determined by using a special directory, called the routing table. This table also tells which interface to use in order to reach a destination; in the case of PPP, the data is simply handed to the appropriate data link layer driver.

If our host is indeed the recipient, the IP layer takes a closer look at the data. Again there is a header (20 bytes) that describes, among other things, the

- length of the packet
- packet ID and fragment offset
- type of transmitted data (TCP, ICMP, UDP, ...)
- source and destination IP address

The IP layer strips off the header and passes the remainder to the selected service (according to the type).
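The receiving decision just described can be sketched in Python. This is a minimal sketch under several assumptions: it reads the 20-byte header layout of RFC 791 but does not verify the header checksum or reassemble fragments, and it models the routing table as a hypothetical list of (destination, gateway, interface) entries where the first match wins and 'default' matches any host. Real IP routing matches network address prefixes rather than single hosts. The addresses and the interface name ppp160 are borrowed from the session examined in section 4.2; the function names are illustrative only.

```python
import ipaddress
import struct

# Assigned protocol numbers found in the IP header's type field.
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}

# The fixed 20-byte IPv4 header (RFC 791), in network byte order.
HEADER = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4_header(packet: bytes) -> dict:
    """Extract the fields the receiving IP layer looks at."""
    if len(packet) < HEADER.size:
        raise ValueError("packet shorter than an IPv4 header")
    (ver_ihl, _tos, total_length, ident, flags_frag,
     _ttl, protocol, _checksum, src, dst) = HEADER.unpack_from(packet)
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,         # bytes; 20 without options
        "total_length": total_length,                  # header plus data
        "id": ident,
        "fragment_offset": (flags_frag & 0x1FFF) * 8,  # bytes
        "protocol": protocol,
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }


def ip_receive(packet: bytes, my_address: str, routing_table: list) -> tuple:
    """Deliver a packet locally or decide where to forward it."""
    hdr = parse_ipv4_header(packet)
    if hdr["dst"] != my_address:
        # Not for us: act as a gateway. The first matching entry wins,
        # so a "default" entry must come last.
        for destination, gateway, interface in routing_table:
            if destination in (hdr["dst"], "default"):
                return ("forward", gateway, interface)
        return ("drop", None, None)
    # For us: strip the header and hand the rest to the selected service.
    payload = packet[hdr["header_length"]:hdr["total_length"]]
    return ("deliver", PROTOCOLS.get(hdr["protocol"], "unknown"), payload)


# Packet from the FTP server to the client (checksum left at 0;
# the payload is a placeholder, not a real TCP segment).
packet = HEADER.pack(0x45, 0, 25, 1, 0, 64, 6, 0,
                     ipaddress.IPv4Address("141.2.150.16").packed,
                     ipaddress.IPv4Address("141.2.28.160").packed) + b"hello"
gateway_table = [("141.2.28.160", "141.2.28.160", "ppp160")]  # hypothetical entry
print(ip_receive(packet, "141.2.29.2", gateway_table))  # ('forward', '141.2.28.160', 'ppp160')
print(ip_receive(packet, "141.2.28.160", []))           # ('deliver', 'TCP', b'hello')
```

The same packet is forwarded by the gateway and delivered by the final recipient; only the comparison with the host's own address decides which path is taken.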
Sending: The data that came from an upper layer is cut into suitable pieces (fragments) for the lower layer. As I already said, IP uses a so-called routing table to find out where to send the packets. A typical entry in this table contains the IP address of a destination host A, the IP address of a gateway B, and an interface name I. This tells IP: to send packets to A, the driver of interface I must send them to B, which will forward the data. If our host is a dead end with a single PPP connection, there should be only one entry, namely the other side of the PPP connection as the gateway for 'default' delivery, which means delivery to any host.

IP addresses consist of four bytes that define a location in the Internet. The value is written as four decimal numbers separated by dots: for example, one of the addresses of the InterNIC server mentioned above is '198.49.45.10'. The name 'ds.internic.net' is another way of addressing, but it must first be translated into these numbers by the Domain Name Service (DNS). Only the numbers can be used in IP datagrams. The numbers have a specific internal structure, but I won't explain it in this overview.


3.3.1 Transmission Control Protocol (TCP) --- RFC 793
-----------------------------------------------------

The Internet Protocol is said to be connectionless and unreliable. This does not mean that it works badly; it simply means that IP cares neither about the order of the transmitted packets nor about whether all packets have arrived. As this is not acceptable for real applications like FTP, where we would rather get files without ugly holes and in the right order, another protocol is used to guarantee this. TCP adds another 20 bytes of header, and when these are stripped away, we finally get our application data.

In addition, several programs may require network access simultaneously. Even FTP needs two connections: one for the data and another for the control commands.
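Adding up the headers gives a feel for what a transfer costs. The following sketch estimates the bytes sent on the wire for one file, assuming that every segment except the last one is full-sized (1500 bytes minus 40 bytes of IP and TCP header) and ignoring acknowledgements, connection setup and termination, and PPP byte stuffing. The link overheads are the 18 bytes of Ethernet framing from section 2 and the five header plus three trailer bytes of PPP from section 3.1; using the same 1500-byte limit for PPP is an assumption. The file size is that of fract20.xmo from the sample session in section 4.

```python
import math

IP_HEADER = 20   # bytes, without options
TCP_HEADER = 20  # bytes, without options


def transfer_cost(file_size: int, mtu: int, link_overhead: int):
    """Return (segment size, number of segments, bytes on the wire)."""
    segment_size = mtu - IP_HEADER - TCP_HEADER   # 1460 for a limit of 1500
    segments = math.ceil(file_size / segment_size)
    per_segment = IP_HEADER + TCP_HEADER + link_overhead
    return segment_size, segments, file_size + segments * per_segment


for name, overhead in (("Ethernet", 18), ("PPP", 8)):
    size, count, wire = transfer_cost(61739, 1500, overhead)
    print(f"{name}: {count} segments of up to {size} bytes, "
          f"{wire} bytes on the wire, {61739 / wire:.1%} payload")
```

For the 61739-byte file this yields 43 segments in both cases, i.e. 2494 extra bytes on Ethernet and 2064 on PPP: roughly 4 and 3 percent of the bytes on the wire are headers.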
To deliver the data to the correct application, the concept of 'ports' was introduced. Each application allocates as many ports as it needs, and once they are granted, it can start its communication. Since TCP is a bidirectional service, both sender and recipient need to define ports for their communication. Ports are just an operating system construct, not physical devices.

Another important aspect of TCP is flow control. This is achieved with the 'sliding window' strategy: the receiver continuously informs the sender about the size of its 'window', i.e. how much more data it can accept. If the receiver does not manage to process the data fast enough, the window 'closes', and when it is shut, the sender cannot continue the transmission. (Only data classified as 'urgent' can still be sent.) As the receiver processes the data, it lets the window slide open again.

After the connection has been established, during which the participants exchange their respective sequence numbers, each side is free to send data to the other. An explicit termination procedure is required to close the connection.

Receiving: The sequence number of a segment coming from the IP layer tells TCP where to put the segment into the buffer. If no earlier segment is missing, TCP sends an acknowledgement to the other side. As long as the application does not fetch the data from the buffer, the window size decreases. If there is a hole in the buffer, TCP keeps acknowledging only the end of the contiguous block at the start of the buffer. The sender, receiving no acknowledgements for its latest segments, retransmits the segment that seems to be missing. When the hole is closed, the whole block can be acknowledged at once (its last segments may have been waiting for quite a while until the missing segments were filled in). This situation is encountered more often than you might imagine.
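The receiving behaviour just described (placing segments by sequence number, acknowledging only the contiguous block, and shrinking the window until the application reads) can be sketched as a small Python class. It is a simplification under stated assumptions: sequence numbers are plain byte offsets starting at 0 instead of a negotiated initial value that wraps around, segments that would overflow the window are not rejected, and the class name `ReceiveBuffer` is hypothetical.

```python
class ReceiveBuffer:
    """Hypothetical sketch of TCP-style reassembly with cumulative acknowledgements."""

    def __init__(self, capacity: int):
        self.capacity = capacity   # buffer size that determines the window
        self.ready = bytearray()   # contiguous data not yet read by the application
        self.pending = {}          # offset -> data of segments that arrived after a hole
        self.ack = 0               # next byte expected; everything before it is acknowledged

    def window(self) -> int:
        """Free buffer space that is announced to the sender."""
        held = len(self.ready) + sum(len(d) for d in self.pending.values())
        return max(0, self.capacity - held)

    def receive(self, seq: int, data: bytes):
        """Store one segment and return the (ack, window) pair to send back."""
        if seq > self.ack:
            self.pending[seq] = data     # hole before this segment: keep it, ack unchanged
        elif seq + len(data) > self.ack:
            self._append(seq, data)      # extends the contiguous block
            while True:                  # the hole may be closed now
                nxt = next((s for s in self.pending if s <= self.ack), None)
                if nxt is None:
                    break
                self._append(nxt, self.pending.pop(nxt))
        # otherwise it is a duplicate: just repeat the acknowledgement
        return self.ack, self.window()

    def _append(self, seq: int, data: bytes):
        if seq + len(data) > self.ack:
            self.ready += data[self.ack - seq:]  # skip bytes we already hold
            self.ack = seq + len(data)

    def read(self, n: int) -> bytes:
        """The application fetches data, which opens the window again."""
        data = bytes(self.ready[:n])
        del self.ready[:n]
        return data


buf = ReceiveBuffer(capacity=100)
print(buf.receive(0, b"abcd"))     # (4, 96)
print(buf.receive(8, b"ijkl"))     # (4, 92): hole at bytes 4..7, ack stays at 4
print(buf.receive(4, b"efgh"))     # (12, 88): hole closed, whole block acknowledged
print(buf.read(12), buf.window())  # b'abcdefghijkl' 100
```

The third call shows the case from the text: the segment at offset 8 waited in the buffer until the missing bytes arrived, and then everything up to byte 12 was acknowledged at once.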
Holes in the buffer are especially common when the connection is poor: the lower layers may have discarded some data, so that some segments could not be reconstructed. Because IP silently discards packets with checksum errors, the corresponding segment is simply not acknowledged. In this way, the connectionless and unreliable character of IP is effectively worked around by TCP.

Sending: During connection establishment, a maximum segment size is negotiated between the two participants. The data from the application is split into segments of this size, and the TCP header is put in front of each segment. If the last window size announced by the receiver is larger than the segment size, the segment is sent to the IP layer.


3.3.2 User Datagram Protocol (UDP) --- RFC 768
----------------------------------------------

There are situations when the full-blown TCP machinery is not necessary: for example, when only small packets are sent, when we do not care whether every packet reaches the destination, or when flow control is performed by the application itself. UDP is a very simple protocol (its RFC is only three pages long; the TCP RFC has 85 pages). The IP packets (datagrams) are equipped with source and destination ports as already described; there is no connection establishment and no automatic acknowledgement. If these are desired, it is up to the application to implement them. Although UDP may seem rarely needed, it is used by the Domain Name Service (DNS), which translates textual Internet names like 'ds.internic.net' into four-byte IP addresses.


3.3.3 Internet Control Message Protocol (ICMP) --- RFC 792 (v4)
---------------------------------------------------------------

This is yet another important protocol that must be present in every TCP/IP implementation. Hosts use ICMP to transmit messages that are of major importance for current or future connections. ICMP messages start with a 4-byte header (type, code and checksum), followed by message-specific content.
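As an illustration of the 4-byte header, here is a Python sketch that builds an ICMP echo request and answers it. It uses the type values from RFC 792 (8 for echo request, 0 for echo reply), the identifier and sequence number fields that follow the header in echo messages, and the Internet checksum (the 16-bit ones' complement of the ones' complement sum of the message, as described in RFC 1071). The function names are hypothetical, and a real implementation would also have to wrap each message in an IP datagram with protocol number 1.

```python
import struct

ICMP_ECHO_REPLY = 0
ICMP_ECHO_REQUEST = 8


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                          # pad to whole 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]    # big-endian 16-bit words
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def echo_message(icmp_type: int, identifier: int, sequence: int, payload: bytes) -> bytes:
    """4-byte header (type, code, checksum), then identifier, sequence and data."""
    body = struct.pack("!HH", identifier, sequence) + payload
    checksum = internet_checksum(struct.pack("!BBH", icmp_type, 0, 0) + body)
    return struct.pack("!BBH", icmp_type, 0, checksum) + body


def answer_echo(message: bytes) -> bytes:
    """Turn a valid echo request into the matching echo reply."""
    if internet_checksum(message) != 0:          # a correct message sums to zero
        raise ValueError("checksum error")
    if message[0] != ICMP_ECHO_REQUEST:
        raise ValueError("not an echo request")
    identifier, sequence = struct.unpack_from("!HH", message, 4)
    return echo_message(ICMP_ECHO_REPLY, identifier, sequence, message[8:])


request = echo_message(ICMP_ECHO_REQUEST, identifier=1, sequence=1, payload=b"ping")
reply = answer_echo(request)
print(reply[0], internet_checksum(reply))  # 0 0
```

The reply carries the same identifier, sequence number and data as the request, so the sender can match it to the request it sent.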
ICMP defines the following message types:

- echo request/reply
- destination unreachable
- time exceeded (measured by hop count)
- redirect
- source quench
- router solicitation/advertisement
- parameter problem
- timestamp request/reply
- information request/reply
- address mask request/reply

The first three are used very often. An echo request from any host on the net must be answered with an echo reply; some networks consider hosts that do not reply to have crashed and cut their dial-up connections.


4. The FTP application --- RFC 959
----------------------------------

Now I want to show you a very important application that works on top of the TCP/IP stack: the File Transfer Protocol application. Although the use of HTTP is growing, FTP is still the major protocol for uploading and downloading files on the Internet.

Besides file transfer inside a subnet, where users log on to an FTP server with a password, there is another way for everyone to download files, called 'anonymous FTP'. If the system maintainer allows this kind of access, any user can log on by identifying himself as user 'anonymous' or 'ftp' and typing in his e-mail address as the password. After that, a special part of the directory tree of the FTP host is available for browsing and downloading. Sometimes there is also a special subdirectory (usually called 'incoming') that is writable, so that files can be uploaded; the maintainer should then sort these files into appropriate directories.


4.1 A sample session with our FTP host
--------------------------------------

First let us watch an example FTP session by logging on to the FTP server in my subnet. In the transcript below, the user ends every typed line by hitting the Return key.

    (some_prompt) ftp www.vsb.cs.uni-frankfurt.de
    Connected to diamant-atm.vsb.cs.uni-frankfurt.de.
    220 www.vsb.cs.uni-frankfurt.de FTP server (Version 1.2.3 Fri Jan 10 12:02:30 MET 1997) ready.
    Name (www.vsb.cs.uni-frankfurt.de:anyone): anonymous
    Guest login ok, send your complete e-mail address as password.
    Password: anyone@some.where.out.there
    230 Guest login ok, access restrictions apply.
    ftp> cd pub/people/mz
    250 CWD command successful.
    ftp> binary
    200 Type set to I.
    ftp> get fract20.xmo
    200 PORT command successful.
    150 Opening BINARY mode data connection for fract20.xmo (61739 bytes).
    226 Transfer complete.
    local: fract20.xmo remote: fract20.xmo
    61739 bytes received in 61.7 seconds (1 Kbytes/s)
    ftp> bye
    221 Goodbye.
    (some_prompt) _

What we did was download the file fract20.xmo from the FTP server at 'vsb.cs.uni-frankfurt.de' (it is the same machine as the WWW server, hence the name). As you can see, there is no sign of segmentation, fragmentation, dropped frames or the like. It looks as if we were doing an ordinary file copy or a familiar BBS file download, without all that fuss about the TCP/IP stack. But rest assured, the stack was involved from the very first time you pressed Return.


4.2 A closer look at the FTP session
------------------------------------

After entering the 'cd' command, which changes to the directory containing the file, and 'binary', which tells the computer that the files to be transferred must not undergo any conversion, we are ready to download the file 'fract20.xmo' by using 'get'. Let us suppose that everything is set up correctly and that the FTP server has just received the FTP command "RETR fract20.xmo" (this is the actual command, transmitted as exactly this string), and take a look at the actions of the different layers.

FTP application (141.2.150.16): We have just received the command to send the contents of the file "fract20.xmo". The client should have told us on which port it expects the data to arrive, so we send the data to the client's "socket" (IP address and port). As we do not (have to) care about transfer details such as the packet size, we just write the file to our end of the data connection.
As this would be much faster than the network can transport the data, the write operation blocks, which serves as a brake. When there is no more data, the connection is closed.

TCP (141.2.150.16): The application (whatever it is) keeps feeding data into this layer. If the client told us that it cannot receive more data (closed window), we don't accept more bytes from the application (and let it block). If the client announced that it is able to receive data (open window), the data is first partitioned into segments of a maximum length (as negotiated at the start of the connection). Since we know the receiver's IP address (e.g. 141.2.28.160) and port (e.g. 1048), we construct a header with this information, the segment sequence number and the checksum for each segment, and just hand these segments to the network layer.

IP (141.2.150.16): The segments that arrive from the upper layer each get another header that contains our IP address (141.2.150.16) and that of the recipient (141.2.28.160). But where is 141.2.28.160? No idea, but our routing table says: send it to 141.2.29.2, which probably knows more about it. And that host is reachable via interface en2.

Ethernet (141.2.150.16): The protocol layer above us sent us some data for 141.2.29.2. We first find out the Ethernet address of this host's interface in our network (this is called "address resolution"). Now that we know it, we encapsulate the data once more and send it to this Ethernet address.

Ethernet (141.2.29.2): There is some data for us. We check the frames and pass them up.

IP (141.2.29.2): Ah, some packets. But they are not for us; the recipient is 141.2.28.160. Looking at our routing table, we find that this host is connected via the PPP connection ppp160. So down again with the packets.

PPP (141.2.29.2): The packets we receive must be destined for the host on the other side of our connection. So we encapsulate them again and send them on their way.
PPP (141.2.28.160): There is some data for us. Up to the IP layer with it.

IP (141.2.28.160): Check the recipient; OK, it's us, no more forwarding. Which protocol must be used? The field in the header says TCP, so we strip off the header and pass the rest on to the TCP layer.

TCP (141.2.28.160): Segments are coming up from the network layer that are obviously destined for an application above us. We take the sequence number and put the segment into our buffer at the appropriate place. If it was the segment we expected, we send an ACK (acknowledgement) for it to the server. If not, we do not acknowledge this incoming segment and rely on the server eventually retransmitting the missing segments. The application keeps reading the data from our buffer (connected to the indicated port), so our window opens again. The other side must be informed of this.

FTP application (141.2.28.160): After we sent the command, the data arriving at the port (whose number was transmitted to the FTP server beforehand) is simply stored in a file that normally has the same name as the remote file. The data arrives asynchronously, so we use a blocking read that waits until the next data is available. When the server closes the connection, there is no more data, and the transfer is complete.

As you can see, each layer "speaks the same language" as its peer on the other side. And, as is often the case, there is another computer in between (141.2.29.2) that forwards the IP packets.

-----------------------------------------------------------------------------

This concludes the first part of my Internet tutorial. The second part will examine implementation issues, especially what is needed to implement a minimal FTP client.