Note that we cannot avoid all server traffic even when reading unchanged files. The stateless nature of the NFS protocol requires us to confirm with the server that the file has not changed since we last read it. However, when reading large files, the majority of the disk traffic is due to many read RPCs, all of which are eliminated.9
All cached file data exists only in disk-based structures. Each remote file from which a page is cached has a corresponding local cache file. The name of the local file is the concatenation of the server name, the server's superblock number, and the inode number of the file being cached. For example, a local cache file called holden,2-23 stores page data from the remote file with inode number 23 on the server named ``holden'' in the partition whose superblock is on device 2. These files are all stored in a single-level directory structure (thus precluding disconnected operation, see section , on page ). We exploit ext2fs's ability to efficiently store a sparse file (i.e., one where only a small portion of the total blocks have had any data written to them), and store only those data blocks that the client has actually read (not whole files). Also note that after a read is satisfied out of the local disk cache, we send a setattr to the server to update the file's last-accessed time (atime) if the prior access was more than 30 seconds ago. Since the disk cache can persist for months or longer, it is important that the access times are accurate.10
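The naming scheme, sparse storage, and atime rule above can be modelled in user space. The following is a minimal Python sketch, not the kernel implementation: the function names, the 4096-byte page size, and the file permissions are assumptions made for illustration; only the name format and the 30-second threshold come from the text.

```python
import os

PAGE_SIZE = 4096      # assumed page size; the text does not state one
ATIME_INTERVAL = 30   # seconds between atime updates, from the text


def cache_file_name(server: str, superblock_dev: int, inode: int) -> str:
    """Build the local cache file name, e.g. 'holden,2-23'."""
    return f"{server},{superblock_dev}-{inode}"


def store_page(cache_dir: str, name: str, page_index: int, data: bytes) -> None:
    """Write one page at its natural offset in the cache file.

    Pages that were never read are never written, so on a file system
    with sparse-file support they remain holes and use no disk blocks.
    """
    path = os.path.join(cache_dir, name)
    # O_CREAT without O_TRUNC, so previously cached pages are preserved.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.pwrite(fd, data, page_index * PAGE_SIZE)
    finally:
        os.close(fd)


def needs_atime_update(last_access: float, now: float) -> bool:
    """True if the prior access was more than 30 seconds ago."""
    return now - last_access > ATIME_INTERVAL
```

Writing page 5 of an otherwise empty cache file yields a file whose logical size is six pages, while the five unwritten pages need not occupy any disk blocks.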
Files are not always read in their entirety (e.g., executables, which are paged in on demand). Thus, we must maintain in-kernel data structures to track which pages of each inode have been cached to local disk. We use a simple packed binary array representation, and also include the total number of pages and the number of valid pages, along with some information needed by nfsfillind to finish reading a file after the NFS inode may have left memory. See Figure for details. When we recognize that a file has had its last page written to the local cache (done in constant time with the count of valid pages, not with the bitmap), we mark the cache file as complete using the u+x mode attribute bit, and can then deallocate the bitmap of valid pages.
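A minimal Python sketch of such a page map follows. The class and method names are hypothetical, and the real structure lives in the kernel and carries extra state for nfsfillind that is omitted here. The sketch shows the packed bitmap, the valid-page counter that makes the completeness check constant time, and the release of the bitmap once the file is complete.

```python
class PageMap:
    """Tracks which pages of one remote file are cached locally."""

    def __init__(self, total_pages: int) -> None:
        self.total_pages = total_pages
        self.valid_pages = 0
        # Packed binary array: one bit per page, eight pages per byte.
        self.bits = bytearray((total_pages + 7) // 8)

    def is_cached(self, page: int) -> bool:
        if self.bits is None:
            # Bitmap already freed: every page of the file is cached.
            return True
        return bool(self.bits[page >> 3] & (1 << (page & 7)))

    def mark_cached(self, page: int) -> bool:
        """Record a newly cached page.

        Returns True exactly once, when the file becomes complete. The
        caller would then set the u+x bit on the cache file.
        """
        if self.bits is None:
            return False
        if not self.is_cached(page):
            self.bits[page >> 3] |= 1 << (page & 7)
            self.valid_pages += 1
        # Constant-time check: compare counters, never scan the bitmap.
        if self.valid_pages == self.total_pages:
            self.bits = None  # bitmap no longer needed
            return True
        return False
```

Marking the same page twice does not inflate the counter, so the comparison against the total remains a reliable completeness test.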
To support the relationship between the inode for a file on the server and the inode of the local file that caches it, we made two significant changes to the kernel's data structures: 1) all inodes now have a pointer to a structure describing the inode they are caching; and 2) all inodes can specify a clear_inode_hook to be called when that inode is chosen to be reused (``putting'' an inode to a 0 count does not remove it from memory). See Figure for details.
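The two changes can be illustrated with a small Python model. This is a sketch under assumptions: the field name cache_info and the helper functions below are hypothetical stand-ins, and only clear_inode_hook is a name taken from the text. The point it shows is that dropping the last reference leaves the inode in memory, and the hook runs only when the inode is actually reclaimed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Inode:
    number: int
    count: int = 1
    # Change 1: pointer to a structure describing the inode being cached.
    cache_info: Optional[object] = None
    # Change 2: optional hook, called when this inode is chosen for reuse.
    clear_inode_hook: Optional[Callable[["Inode"], None]] = None


def put_inode(inode: Inode) -> None:
    """Drop one reference. Reaching a count of 0 does not evict the inode."""
    inode.count -= 1


def reuse_inode(inode: Inode) -> None:
    """Reclaim the inode slot, giving the hook a chance to clean up first."""
    if inode.clear_inode_hook is not None:
        inode.clear_inode_hook(inode)
    inode.cache_info = None
```

Deferring cleanup to the hook lets cache state survive while the inode sits unreferenced in memory, which matters if the file is opened again soon.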
Because files may change on the server (due to either our machine or another client on the system), we must also invalidate cache entries occasionally. When we notice that the NFS inode that we are caching has a new modification time or file size, we mark our local cache as invalid by turning off its u+x bit (if it was complete) and updating the in-kernel data structures (e.g., removing the bitmap, and resetting the count of valid pages to zero). If the cache file's inode is later cleared from memory without having read sufficiently many pages to justify filling in, the cache file is unlinked, and the space is reused.
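The invalidation step can be sketched in Python as follows. The CacheState record and the revalidate function are hypothetical names, and recording the new mtime and size after invalidation is an assumption; the text specifies only that the u+x bit is cleared, the bitmap removed, and the valid-page count reset.

```python
import os
import stat
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheState:
    mtime: float                       # server mtime when pages were cached
    size: int                          # server file size when pages were cached
    valid_pages: int = 0
    bitmap: Optional[bytearray] = None


def revalidate(path: str, state: CacheState,
               server_mtime: float, server_size: int) -> bool:
    """Return True if the cache is still valid; otherwise invalidate it."""
    if server_mtime == state.mtime and server_size == state.size:
        return True
    mode = os.stat(path).st_mode
    if mode & stat.S_IXUSR:
        # The file was marked complete: turn off its u+x bit.
        os.chmod(path, stat.S_IMODE(mode) & ~stat.S_IXUSR)
    state.bitmap = None
    state.valid_pages = 0
    # Assumption: remember the new attributes for the next comparison.
    state.mtime, state.size = server_mtime, server_size
    return False
```

Because only the modification time and size are compared, a one-byte change on the server discards every cached page of the file.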
Ideally, we would be able to update the disk cache for local writes. However, the NFS protocol has no way of letting a host know that it is the only writer to the file.11 After changing only a single byte of a 4 MB file that exists in our local disk cache, all the NFS client can subsequently tell when it reads that file again is that the modification time (mtime) has changed. Since we cannot conclusively confirm that it was only our client that effected the change, we must invalidate all of the pages we had cached locally.12