File Transfer

The basic unit of storage in virtually all modern computer systems is the file. A fundamental requirement in networked systems is access to files which are stored on remote computers. There are at least two models which can be used for this:

Users' files can be stored on a remote file server, which allows shared access. The files always remain on the server's disk, but are accessed as though they were stored locally. Their remote storage is transparent to the user, and the fact that the network is used is not (necessarily) obvious. This is the way students' home directories are stored on the Bendigo Unix systems.
Files can be shared by copying them from one system to another, performing any desired operations, and (maybe) copying them back. This is called "file transfer", although what gets transferred is a copy of the file, not the file itself, which still exists on the remote system. The networked nature of file transfer is apparent in the commands which are used.

File Transfer - FTP

The FTP protocol has the following features:

Interactive access

The user's view of FTP is a kind of "shell" with a limited command set. Some user commands include:

get <filename>
put <filename>
cd
ls
help
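Behind the prompt, the client translates each user command into one or more protocol commands defined in RFC 959. The mapping below is a minimal sketch of that translation. The names `USER_TO_PROTOCOL` and `to_protocol` are hypothetical, the table is only a subset, and details vary between clients (for example, some send NLST rather than LIST for `ls`, and `help` is often answered by the client itself without contacting the server).

```python
# User-level command -> RFC 959 protocol verb (illustrative subset only).
USER_TO_PROTOCOL = {
    "get": "RETR",  # retrieve a copy of a remote file
    "put": "STOR",  # store a copy of a local file on the server
    "cd": "CWD",    # change working directory on the server
    "ls": "LIST",   # list the remote directory (some clients send NLST)
}


def to_protocol(user_line: str) -> str:
    """Translate one line typed at the ftp> prompt into a protocol command."""
    parts = user_line.split()
    if not parts:
        raise ValueError("empty command")
    verb, args = parts[0].lower(), parts[1:]
    if verb not in USER_TO_PROTOCOL:
        raise ValueError(f"unknown command: {verb}")
    return " ".join([USER_TO_PROTOCOL[verb], *args])


if __name__ == "__main__":
    for line in ("cd rfc", "get rfc959.Z"):
        print(f"{line!r} -> {to_protocol(line)!r}")
```

A real client also issues supporting commands around a transfer (such as setting up the data connection), which this sketch leaves out.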

File format specification

FTP allows the user to specify conversions for (e.g.) text or image (binary) files, and the character set of text files (ASCII or EBCDIC).
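In protocol terms the representation is selected with the TYPE command: `TYPE A` for ASCII, `TYPE E` for EBCDIC and `TYPE I` for image. The sketch below shows the kind of conversion ASCII type implies, namely translating local line endings to and from the CR LF form used on the wire. The function names are hypothetical, and the code assumes a Unix-style local convention of a bare LF.

```python
TYPE_CODES = {"ascii": "A", "ebcdic": "E", "image": "I"}


def type_command(kind: str) -> str:
    """Build the protocol command that selects a representation type."""
    return f"TYPE {TYPE_CODES[kind]}"


def to_network_ascii(local_text: bytes, local_eol: bytes = b"\n") -> bytes:
    """Sender side: convert local line endings to CR LF (assumes bare LF locally)."""
    return local_text.replace(local_eol, b"\r\n")


def from_network_ascii(wire: bytes, local_eol: bytes = b"\n") -> bytes:
    """Receiver side: convert CR LF back to the local line ending."""
    return wire.replace(b"\r\n", local_eol)


if __name__ == "__main__":
    text = b"line one\nline two\n"
    wire = to_network_ascii(text)
    print(type_command("ascii"), wire)
    # Applying the text conversion to non-text data changes it:
    blob = b"\x00\r\n\xff"
    print(from_network_ascii(blob) == blob)
```

The last two lines hint at why image type matters: a conversion that is harmless for text silently alters a compressed or executable file.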

Authentication control

The FTP server requires clients to authenticate themselves by providing a valid username and password before allowing them to perform file transfers.
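The exchange uses the USER and PASS commands, and the server answers with reply codes from RFC 959 (331 after a user name, 230 on success, 530 on failure, 503 if PASS arrives without a preceding USER). The class below is a simulated, in-memory sketch of the server's side of that exchange. `LoginSession` and its plaintext account table are hypothetical and exist only for illustration; no network is involved.

```python
class LoginSession:
    """Server-side view of the USER/PASS exchange (simulated, no network)."""

    def __init__(self, accounts: dict):
        self.accounts = accounts      # hypothetical username -> password table
        self.pending_user = None      # set after USER, cleared after PASS
        self.authenticated = False

    def handle(self, line: str) -> str:
        verb, _, arg = line.partition(" ")
        verb = verb.upper()
        if verb == "USER":
            self.pending_user = arg
            self.authenticated = False
            return "331 User name okay, need password."
        if verb == "PASS":
            if self.pending_user is None:
                return "503 Bad sequence of commands."
            ok = self.accounts.get(self.pending_user) == arg
            self.pending_user = None
            self.authenticated = ok
            return "230 User logged in, proceed." if ok else "530 Not logged in."
        if not self.authenticated:
            return "530 Not logged in."
        return "502 Command not implemented."   # everything else is out of scope


if __name__ == "__main__":
    session = LoginSession({"pscott": "secret"})
    for command in ("RETR rfc959.Z", "USER pscott", "PASS secret"):
        print(command, "->", session.handle(command))
```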

FTP is documented in RFC 959.

FTP Process Model

An FTP client process establishes a control connection to an FTP server, which waits (listens) at the standard FTP port, 21. This connection carries all FTP commands and responses.

When a file is to be transferred, a second connection is opened to carry the actual data in the file. In the protocol's default (active) mode, the client listens on a port of its own and the server connects to it from port 20, the FTP data transfer port. In Unix-like systems, this is usually achieved by "forking" a child process at each end of the connection.

One interesting aspect of FTP is that the control connection uses a subset of the TELNET protocol, based on the TELNET Network Virtual Terminal (NVT).
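In practice, following the NVT conventions means each command travels over the control connection as a line of ASCII text terminated by CR LF. The sketch below frames a command that way, and also builds the argument of RFC 959's PORT command, with which a client tells the server which address and port to use for a data connection: four address bytes and two port bytes, written as six comma-separated decimal numbers. The function names are hypothetical, and the address is taken from a documentation range, not a real host.

```python
import ipaddress


def frame_command(verb: str, arg: str = "") -> bytes:
    """Encode one control-connection command as an ASCII line ending in CR LF."""
    line = f"{verb} {arg}" if arg else verb
    return line.encode("ascii") + b"\r\n"


def port_argument(host: str, port: int) -> str:
    """Encode an IPv4 address and TCP port as h1,h2,h3,h4,p1,p2."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port out of range")
    octets = ipaddress.IPv4Address(host).packed   # the four address bytes
    high, low = divmod(port, 256)                 # port split into two bytes
    return ",".join(str(n) for n in (*octets, high, low))


if __name__ == "__main__":
    arg = port_argument("192.0.2.10", 5001)
    print(frame_command("PORT", arg))
```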

Anonymous FTP

Many systems provide a special FTP service whereby the username "anonymous" allows access to a public file area:

redgum 27> ftp munnari.oz.au
Connected to munnari.oz.au.
220 munnari.OZ.AU FTP server (Version 5.91 95/11/08 02:44:16) ready.
Name (munnari.oz.au:pscott): anonymous
331 Guest login ok, send ident as password.
Password: p.scott@latrobe.edu.au
230 Guest login ok, access restrictions apply.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd rfc
250 CWD command successful.
ftp> get rfc959.Z
200 PORT command successful.
150 Opening BINARY mode data connection for rfc959.Z (48587 bytes).
226 Transfer complete.
48587 bytes received in 4.5 seconds (10 Kbytes/s)
ftp> close
221 Goodbye.
ftp> quit
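The same session can be scripted. The sketch below uses Python's standard `ftplib` module; the function name `fetch_anonymous` is hypothetical. The host, directory and file names in the example call are simply those from the transcript above and may no longer exist, so treat this as an illustration, not something guaranteed to succeed against a live server.

```python
from ftplib import FTP


def fetch_anonymous(host: str, directory: str, filename: str, dest: str) -> None:
    """Log in as "anonymous", change directory, and fetch one file in binary mode."""
    with FTP(host) as ftp:            # opens the control connection (port 21)
        ftp.login()                   # no arguments: logs in as user "anonymous"
        ftp.cwd(directory)            # sends CWD
        with open(dest, "wb") as out:
            # retrbinary switches to image type, then sends RETR and
            # passes each block of received data to out.write.
            ftp.retrbinary(f"RETR {filename}", out.write)
    # Leaving the "with FTP" block closes the session politely.


# Example call (not executed here, since it needs a reachable server):
# fetch_anonymous("munnari.oz.au", "rfc", "rfc959.Z", "rfc959.Z")
```

Note that `ftplib` uses passive mode by default, so it would not send the PORT command seen in the transcript.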

FTP Control Messages

The anonymous FTP session above shows some interesting aspects of FTP.

The messages presented to the user look like:
```
250 CWD command successful.
```
The three-digit number is interpreted by the software. The remainder of the line is for the benefit of the human reader. This format is used for all messages between the client and server processes. The three digits each have individual and contextual significance - can you see any patterns?
The PORT command tells the server the address and the new (random) TCP port number at which the client is waiting for the data connection; the "200" line is the server's acknowledgement. This connection is then opened to transfer the file.
Other information (with no three-digit identifier) is reported by the client process.
Note that binary (image) mode is set by default. Why?
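Returning to the reply codes: one way to read the pattern, following the groupings given in RFC 959, is that the first digit says whether the reply is good, bad or incomplete, and the second says which area of the protocol it concerns. The sketch below splits a single-line reply accordingly. The function and table names are hypothetical, and multi-line replies (where a hyphen follows the code) are not handled.

```python
FIRST_DIGIT = {
    "1": "positive preliminary",
    "2": "positive completion",
    "3": "positive intermediate",
    "4": "transient negative completion",
    "5": "permanent negative completion",
}

SECOND_DIGIT = {
    "0": "syntax",
    "1": "information",
    "2": "connections",
    "3": "authentication and accounting",
    "4": "unspecified",
    "5": "file system",
}


def parse_reply(line: str):
    """Split a single-line reply into (code, outcome, area, human-readable text)."""
    code, text = line[:3], line[4:]
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not a reply line: {line!r}")
    outcome = FIRST_DIGIT.get(code[0], "unknown")
    area = SECOND_DIGIT.get(code[1], "unknown")
    return int(code), outcome, area, text


if __name__ == "__main__":
    for reply in ("250 CWD command successful.",
                  "331 Guest login ok, send ident as password."):
        print(parse_reply(reply))
```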

File Sharing - NFS

The Network File System (NFS) was developed by Sun Microsystems, and is the de facto standard for transparent filesystem sharing in Unix and some other environments. NFS is embedded in the operating system of a client.

Remote Procedure Call

NFS is built on two tools: a Remote Procedure Call (RPC) mechanism and a general purpose eXternal Data Representation (XDR) definition and library. These tools are also the basis of the OSF Distributed Computing Environment (DCE) specification.

RPC allows a programmer to write distributed systems without being aware of the actual code and protocols which are used to implement them over the network. A remote procedure call (or function call for those sad souls who have only programmed in C) is designed to have the same semantics as a local call, making such programming relatively straightforward.
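The idea can be sketched in a few lines. In the example below the caller uses `remote_add` like any local function, while a client "stub" packs the arguments into a message and unpacks the result. Everything here is hypothetical: the procedure number, the message layout and the function names are invented for illustration, and the "network" is just a direct function call within one process. It is not the Sun RPC wire format.

```python
import struct

# ---- "server" side -------------------------------------------------

def add(a: int, b: int) -> int:
    """The actual procedure, which lives on the remote machine."""
    return a + b


PROCEDURES = {1: add}   # hypothetical procedure-number table


def server_dispatch(request: bytes) -> bytes:
    """Unpack a request, run the named procedure, pack the result."""
    proc, a, b = struct.unpack(">Iii", request)
    return struct.pack(">i", PROCEDURES[proc](a, b))


# ---- "client" side -------------------------------------------------

def transport(request: bytes) -> bytes:
    """Stand-in for the network; a real system would send and receive here."""
    return server_dispatch(request)


def remote_add(a: int, b: int) -> int:
    """Client stub: looks like a local function to its caller."""
    request = struct.pack(">Iii", 1, a, b)   # procedure number, then arguments
    (result,) = struct.unpack(">i", transport(request))
    return result


if __name__ == "__main__":
    print(remote_add(2, 3))
```

In a real deployment `transport` can fail in ways a local call cannot (lost messages, timeouts, a crashed server), so callers usually still need some awareness that the call is remote.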

The XDR library code transparently performs a mapping between local data representations and a standard format which actually crosses the network in the RPC call and return messages. Possibly the best example of the need for this is the byte ordering problem which exists between the Intel/DEC world (little endian) and everyone else in the universe (big endian).
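The byte ordering problem is easy to demonstrate. XDR represents an unsigned integer as four bytes, most significant byte first. The sketch below uses Python's `struct` module to show the same value in both byte orders, with a pair of hypothetical helper functions that always use the big-endian form regardless of the local machine.

```python
import struct
import sys


def xdr_encode_uint(n: int) -> bytes:
    """Encode a 32-bit unsigned integer in big-endian order, as XDR does."""
    return struct.pack(">I", n)


def xdr_decode_uint(data: bytes) -> int:
    """Decode four big-endian bytes back into a local integer."""
    return struct.unpack(">I", data)[0]


if __name__ == "__main__":
    value = 0x01020304
    print("this machine is", sys.byteorder, "endian")
    print("little-endian bytes:", struct.pack("<I", value).hex())
    print("big-endian bytes:   ", xdr_encode_uint(value).hex())
    # Reading big-endian bytes as if they were little-endian gives a different number:
    print(hex(struct.unpack("<I", xdr_encode_uint(value))[0]))
```

Because both ends agree on the wire format, neither needs to know the other's native byte order.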

The utilities which build an RPC application automatically call XDR before transferring any data.

This lecture is also available in PostScript format. The tutorial for this lecture is Tutorial #04.

Phil Scott