The TCP Connection (Revisited)

In the last three lectures, every discussion of how an application protocol operates has started with the phrase:

...the client process opens a TCP connection to a server, at port number xx...

In this lecture, we examine what this really means from the perspective of a programmer writing a TCP client or (more complex) a server.

Warning: In this lecture, some things will be assumed:

Students are vaguely competent "C" programmers. For those who have not yet really started programming seriously, this lecture probably won't make much sense. Come back and look at it again later in the semester.
Students have some idea of the low-level Unix file I/O operations (open, read, write, close). The standard (socket) interface to TCP was intentionally designed to resemble Unix file I/O. Note that most programming subjects do not actually cover these operations, but instead use higher-level abstractions (printf in C, cin and cout in C++).

The Socket Abstraction

The socket was introduced in BSD Unix as a way of extending the Unix file I/O model to handle network communication. Sockets are implemented in all current versions of Unix as system calls; that is, they are part of the operating system kernel. In other systems, they are often implemented as libraries (e.g., Winsock for PCs, GUSI for the Macintosh).

Socket operations resemble file operations in many respects:

A socket is created when needed, and the operating system provides a low-valued integer (a socket descriptor, exactly analogous to a file descriptor) which is used to refer to it later.
Data transfer operations on sockets work just like read and write operations on files.
A socket is closed, just like a file, when communication is finished.

Socket Creation

The system call to create a socket looks like:

result = socket(pf, type, protocol);

The pf argument specifies the protocol family of the socket. For TCP (Internet) sockets this is PF_INET.

The type argument specifies the type of communication desired. For a TCP reliable stream connection, it has value SOCK_STREAM.

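The third (protocol) argument can, on some systems, further define the type of socket. For TCP it is normally zero, which selects the default protocol for the given family and type; the constant IPPROTO_TCP may also be given explicitly.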
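As a minimal sketch of the call above, the following wraps socket creation in a small function. The name make_tcp_socket is hypothetical and is not part of the socket interface; it simply passes the three arguments just described and reports a failure.

```c
#include <stdio.h>
#include <unistd.h>      /* close(), used by callers to release the socket */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper (not from the lecture): create a TCP socket.
 * Returns the socket descriptor (a small non-negative integer),
 * or -1 if the system call fails. */
int make_tcp_socket(void)
{
    /* PF_INET: Internet family; SOCK_STREAM: reliable stream;
     * 0: default protocol for that combination, which is TCP. */
    int sock = socket(PF_INET, SOCK_STREAM, 0);

    if (sock < 0)
        perror("socket");
    return sock;
}
```

A caller would keep the returned integer and hand it to the later calls (bind, connect, read, write), releasing it with close(sock) when finished.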

Once a socket has been created, the programmer can optionally call the bind system call to define a local address for it. This is normally only used in servers, not clients (see Server Socket Operations below).

bind(socket, localaddr, addrlen);

The socket argument is the small integer returned by the socket system call.

The localaddr argument is a pointer to a sockaddr structure containing the desired port number and the IP address (see later lecture) of the local host. The addrlen argument is the length of the structure in bytes.
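The bind call might be used roughly as follows. This is a sketch under assumptions: the helper name bind_any is hypothetical, it uses struct sockaddr_in (the Internet form of the sockaddr structure), and it uses the wildcard INADDR_ANY rather than one specific local IP address, which is a common choice for servers but not something the text above requires.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: give `sock` the local port `port`.
 * Returns 0 on success and -1 on failure, exactly as bind does. */
int bind_any(int sock, unsigned short port)
{
    struct sockaddr_in local;

    memset(&local, 0, sizeof local);
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);  /* assumption: any local address */
    local.sin_port = htons(port);               /* port in network byte order */

    /* The cast is needed because bind takes the generic sockaddr type. */
    return bind(sock, (struct sockaddr *)&local, sizeof local);
}
```

Passing a port of 0 asks the operating system to pick an unused port, which is handy for experiments.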

Socket Addresses and Connections

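For a TCP connection, the sockaddr structure (in its Internet form, struct sockaddr_in) contains an address family field (AF_INET), a 2-byte protocol port field and a 4-byte IP address field.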
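To make the layout concrete, here is a sketch that fills in the Internet form of the structure, struct sockaddr_in, from a dotted-decimal address and a port number. The helper name fill_inet_addr is hypothetical; the field names sin_family, sin_port and sin_addr are the standard ones.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill *addr from text such as "192.0.2.1" and a port.
 * Returns 0 on success, -1 if `dotted` is not a valid IPv4 address. */
int fill_inet_addr(struct sockaddr_in *addr, const char *dotted,
                   unsigned short port)
{
    memset(addr, 0, sizeof *addr);       /* clear unused padding bytes */
    addr->sin_family = AF_INET;          /* address family */
    addr->sin_port = htons(port);        /* 2-byte port, network byte order */

    /* inet_pton stores the 4-byte IP address, also in network byte order. */
    if (inet_pton(AF_INET, dotted, &addr->sin_addr) != 1)
        return -1;
    return 0;
}
```

Both the port and the address are stored in network byte order, which is why htons is applied to the port before it is stored.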

The same structure is used by a client to establish a connection to a server, thus:

connect(socket, destaddr, addrlen);

In this case, the fields have the following meaning:

The IP address field is the 4-byte network address of the remote host where the desired server is running. This can be obtained using the gethostbyname() library function, which maps a domain name to an IP address.
The protocol port field is the 2-byte TCP port number on the remote host where the server process is known to be waiting for connections. See also getservbyname(), which maps a service name to a port number.
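Putting these pieces together, a client's "open a TCP connection" step could be sketched as below. The function name tcp_connect is hypothetical. The sketch uses gethostbyname because that is the function named above; on current systems getaddrinfo is the usual replacement. It assumes an IPv4 address and simply tries the first address returned.

```c
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: connect to `host` (a domain name or a dotted-decimal
 * IPv4 address) at TCP port `port`.
 * Returns a connected socket descriptor, or -1 on any failure. */
int tcp_connect(const char *host, unsigned short port)
{
    struct hostent *hp;
    struct sockaddr_in dest;
    int sock;

    hp = gethostbyname(host);                 /* name -> IP address */
    if (hp == NULL || hp->h_addrtype != AF_INET)
        return -1;

    memset(&dest, 0, sizeof dest);
    dest.sin_family = AF_INET;
    dest.sin_port = htons(port);              /* remote port, network order */
    memcpy(&dest.sin_addr, hp->h_addr_list[0], sizeof dest.sin_addr);

    sock = socket(PF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;

    if (connect(sock, (struct sockaddr *)&dest, sizeof dest) < 0) {
        close(sock);                          /* do not leak the descriptor */
        return -1;
    }
    return sock;
}
```

Note that the client never calls bind here: the operating system chooses a local port for it when connect is called.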

Socket Communications

Once a connection is established, a process uses standard file I/O system calls to perform communication, thus:

read(socket, buffer, length);

and

write(socket, buffer, length);

The first argument is the integer socket descriptor. The second is a pointer to an area of memory: for a write, it holds the data to be sent; for a read, it is where the received data is placed. The length argument specifies, for a write, the number of bytes to be written to the socket. For a read, it specifies the maximum number of bytes which may be returned; a single call may return fewer.
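Because the length given to read is only a maximum, and write may also transfer less than was asked, programs usually wrap both calls in loops. The following is a sketch of that idea; the names write_all and read_full are hypothetical helpers, not system calls.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: keep calling write until all `len` bytes are sent.
 * Returns 0 on success, -1 on error. */
int write_all(int sock, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(sock, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                      /* skip over what was accepted */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: read until `len` bytes have arrived or the other
 * end closes. Returns the number of bytes read (less than `len` only if
 * the stream ended first), or -1 on error. */
ssize_t read_full(int sock, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(sock, p + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                   /* end of stream: peer has closed */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

The same two functions work unchanged on any file descriptor, which is one practical consequence of sockets being modelled on file I/O.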

Note that there are many other operations (outside the scope of this subject) which can be performed on a TCP connected socket. Some of these include:

readv, writev, send, recv
getpeername, getsockname
getsockopt, setsockopt

Server Socket Operations

Until now, the socket operations we have seen relate to client processes (except bind). Several further operations are used by server processes to manage the fact that a server waits for a connection to be established.

listen(socket, qlength);

This call marks the socket as one which waits for incoming connections, and defines the number of pending connections which the operating system will queue until the server process is able to accept them.
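A server's start-up sequence (socket, bind, listen) could be sketched as follows. The helper name make_listener is hypothetical, and binding to INADDR_ANY (any local address) is an assumption made for this illustration.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: create a TCP socket, bind it to `port` on any local
 * address, and start listening with a queue of `qlength` connections.
 * Returns the listening socket descriptor, or -1 on failure. */
int make_listener(unsigned short port, int qlength)
{
    struct sockaddr_in local;
    int sock = socket(PF_INET, SOCK_STREAM, 0);

    if (sock < 0)
        return -1;

    memset(&local, 0, sizeof local);
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);  /* assumption: any local address */
    local.sin_port = htons(port);

    if (bind(sock, (struct sockaddr *)&local, sizeof local) < 0 ||
        listen(sock, qlength) < 0) {
        close(sock);                            /* clean up on either failure */
        return -1;
    }
    return sock;
}
```

No data is ever read from or written to the descriptor returned here; its only job is to receive connection requests.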

newsock = accept(sock, sockaddr, addrlen);

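This is the call which defines "waiting for connection". It does not return until a connection is established, at which time it "fills in" the sockaddr structure with the details of the client from whom the connection was accepted. Here addrlen is a pointer to the length: the caller sets it to the size of the structure, and accept updates it to the size of the address actually stored. The call also creates a new socket which the server uses to communicate with that client. A server may handle its clients consecutively (one after another) or concurrently.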

The new socket is bound to the same port as the original one used in the argument list to accept, allowing the server to continue to accept new connections at the same port number.
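As a sketch of the accept step, the function below handles one client on a socket that has already been through bind and listen. The name echo_one_client and the echo behaviour are hypothetical, chosen only to give the new socket something to do; a real server would run its own application protocol at that point.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: wait for one client on `listener`, echo back
 * whatever it sends until it closes its side, then close the new socket.
 * Returns the number of bytes echoed, or -1 on error. */
long echo_one_client(int listener)
{
    struct sockaddr_in client;
    socklen_t addrlen = sizeof client;   /* in: buffer size, out: address size */
    char buf[512];
    long total = 0;
    ssize_t n;
    int newsock;

    /* Blocks until a connection is established. */
    newsock = accept(listener, (struct sockaddr *)&client, &addrlen);
    if (newsock < 0)
        return -1;

    /* accept has filled in the client's address and port. */
    printf("connection from %s port %u\n",
           inet_ntoa(client.sin_addr), (unsigned)ntohs(client.sin_port));

    /* All data transfer uses newsock; listener stays free for new clients. */
    while ((n = read(newsock, buf, sizeof buf)) > 0) {
        if (write(newsock, buf, (size_t)n) != n) {  /* short write treated as error */
            close(newsock);
            return -1;
        }
        total += n;
    }
    close(newsock);
    return n < 0 ? -1 : total;
}
```

Calling this function in a loop gives a consecutive (one client at a time) server. A concurrent server would instead hand each newsock to a separate process or thread and return to accept immediately.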

How this works is a very subtle point...

This lecture is also available in PostScript format. The tutorial for this lecture is Tutorial #06.

Phil Scott