Because our primary interest is performance, the bulk of our
implementation is a module that resides in the Linux kernel. The
asynchronous writing implementation, the caching of lookup results
and attributes (section , on page
), and the caching of
read data pages (section
, on page
) to local disk are all
performed inside that NFS module.
The read caching is implemented at the virtual filesystem's readpage level, so the granularity of the local disk cache is the page size (4 KB). New data pages from an NFS server are written to the local disk cache as they are read. The cached files on local disk may be sparse--pages of a cached file are filled in as they are requested from the server. A simple data structure in kernel memory tracks the valid pages of each locally cached file. When a requested page has already been cached locally, the read is served from the local disk after verifying that the file has not changed on the server (by comparing the modification time and file size).
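A minimal user-level sketch of this bookkeeping is shown below. All structure, field, and function names are hypothetical (the real bookkeeping lives in kernel memory inside our NFS module); the sketch only illustrates how a readpage-style request might decide between the local disk copy and the server.
\begin{verbatim}
/*
 * Sketch of the per-file cache bookkeeping (names are hypothetical).
 * One bit per 4 KB page records whether that page is present in the
 * local disk copy of the remote file.
 */
#include <stdint.h>
#include <string.h>
#include <time.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

struct ncache_file {
    uint64_t       remote_size;   /* server file size when cached  */
    time_t         remote_mtime;  /* server mtime when cached      */
    unsigned long  npages;        /* pages in the remote file      */
    unsigned long *valid;         /* bitmap: one bit per page      */
};

int page_is_cached(const struct ncache_file *cf, unsigned long pgno)
{
    return (cf->valid[pgno / BITS_PER_LONG] >> (pgno % BITS_PER_LONG)) & 1;
}

void mark_page_cached(struct ncache_file *cf, unsigned long pgno)
{
    cf->valid[pgno / BITS_PER_LONG] |= 1UL << (pgno % BITS_PER_LONG);
}

/*
 * Decide where a readpage-style request should be served from.  The
 * server's current attributes come from a fresh GETATTR; if the file
 * changed on the server, the entire local copy is stale.
 */
int serve_from_local_cache(struct ncache_file *cf,
                           uint64_t srv_size, time_t srv_mtime,
                           unsigned long pgno)
{
    if (srv_size != cf->remote_size || srv_mtime != cf->remote_mtime) {
        size_t words = (cf->npages + BITS_PER_LONG - 1) / BITS_PER_LONG;

        memset(cf->valid, 0, words * sizeof(unsigned long));
        cf->remote_size  = srv_size;
        cf->remote_mtime = srv_mtime;
        return 0;                 /* go to the NFS server */
    }
    return page_is_cached(cf, pgno);  /* 1 = read from local disk */
}
\end{verbatim}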
Note that limitations in the NFS protocol make it wasteful to update the
local disk cache on writes: the next read will notice that the
modification time on the server's inode has changed, and we have no way
of knowing whether we were the only client to have altered that inode in
the interim, so the cache file would have to be treated as stale anyway.
Instead, we simply mark the cache file as invalid when we write to the
remote file it is caching. See section , on page
, for more discussion
of this issue.
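A sketch of that policy might look like the following fragment. The names are made up, and only the fields needed here are shown; the real hook sits in the kernel write path.
\begin{verbatim}
/*
 * Hypothetical write-path hook: rather than trying to keep the local
 * copy up to date, a write to the remote file just invalidates its
 * cache file; the cleaner can reclaim the space later.
 */
struct ncache_file {
    unsigned long *valid;         /* bitmap of locally cached pages */
    unsigned long  bitmap_words;  /* length of that bitmap          */
};

void ncache_note_write(struct ncache_file *cf)
{
    unsigned long i;

    /* Our own write changes the server's mtime, and we cannot tell it
     * apart from another client's write, so every cached page of this
     * file must be distrusted. */
    for (i = 0; i < cf->bitmap_words; i++)
        cf->valid[i] = 0;
}
\end{verbatim}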
Ideally, locally cached data pages should remain available even across reboots as long as the server's file has not changed. To support this, an implementation would need to persist the information about which pages are cached. However, that would complicate the implementation and hinder performance, since more ``maintenance'' data would have to be written each time a remote file's page is copied to the local cache. Instead, we mark cache files as complete once all of their pages have been cached locally, and we permit only those marked files to be used after a reboot. This reduces the maintenance required on reads of individual pages, but gives up the ability to retain partially read files across reboots. However, executable binaries are generally paged in dynamically--they are often not read in their entirety. If an NFS partition held mostly executables, our design as described might retain only a small fraction of its cached data pages between reboots.
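The completeness test itself is simple. A sketch follows, again with hypothetical names and keeping only the fields that matter here; the complete flag would also have to be recorded with the cache file on disk to survive a reboot.
\begin{verbatim}
#include <stdint.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

struct ncache_file {
    unsigned long  npages;    /* pages in the remote file         */
    unsigned long *valid;     /* in-memory bitmap of cached pages */
    int            complete;  /* persisted with the cache file    */
};

/* Returns nonzero once every page of the file is cached locally. */
int ncache_all_pages_present(const struct ncache_file *cf)
{
    unsigned long pg;

    for (pg = 0; pg < cf->npages; pg++)
        if (!((cf->valid[pg / BITS_PER_LONG] >> (pg % BITS_PER_LONG)) & 1))
            return 0;
    return 1;
}

/* Called after a page is added to the local copy. */
void ncache_maybe_mark_complete(struct ncache_file *cf)
{
    if (!cf->complete && ncache_all_pages_present(cf))
        cf->complete = 1;     /* would also be recorded on disk */
}
\end{verbatim}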
To combat the negative effects of partially cached files that only
rarely get read in their entirety, our implementation provides a kernel
thread to ``fill in'' the missing pages of cache files. When network
traffic is low, our nfsfillind daemon reads previously unread
(and therefore uncached) pages from the server, eliminating the
holes in a cached file so that we can mark the cache file as complete
(see section , on page
).
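The daemon's policy loop can be sketched in user-level C as below. The helper functions are stubs standing in for kernel facilities (traffic measurement, reading a page over NFS, marking completeness), and their names and the example cache path are invented for illustration.
\begin{verbatim}
#include <unistd.h>

/* Stubs standing in for kernel-side facilities. */
int  network_is_idle(void)                { return 1; }
long next_missing_page(const char *cache) { (void)cache; return -1; }
int  fetch_page_from_server(const char *cache, long pgno)
                                          { (void)cache; (void)pgno; return 0; }
void mark_cache_file_complete(const char *cache) { (void)cache; }

void fill_in_cache_file(const char *cache_path)
{
    long pgno;

    while ((pgno = next_missing_page(cache_path)) >= 0) {
        if (!network_is_idle()) {
            sleep(1);          /* back off while real traffic is flowing */
            continue;
        }
        if (fetch_page_from_server(cache_path, pgno) != 0)
            return;            /* give up on errors; try again later */
    }
    mark_cache_file_complete(cache_path);   /* no holes remain */
}

int main(void)
{
    fill_in_cache_file("/var/nfscache/0001");   /* path is made up */
    return 0;
}
\end{verbatim}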
For ease of development, testing, and debugging, we have used privileged
user-level programs where possible to simplify the code that must live
in the kernel. Specifically, we have extended the user-level
mount utility to understand our cache parameters (see
section , on page
) and introduced a user-level daemon to free
space in the local disk cache when it fills.
Our user-level NFS cache cleaner daemon is called nccd (see
section , on page
). When cache space is exhausted, our basic
nccd removes the least recently used cache files, thus freeing
space for creating new cache files. The kernel and nccd
communicate via two mechanisms: 1) the kernel informs the cleaner when
it needs to remove some old files by writing to a distinguished file in
the cache directory; and 2) the nccd informs the kernel of
changes in the amount of disk space used for each remote filesystem via a
pseudo-device /dev/nfs-cache-space (e.g., at startup to compute
the usage of files already in the cache, and after cleaning to report
the amount of space freed). Because nccd is a user-level
program, its policy decisions about how to manage the cache space are
easy to experiment with and customize.
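The following user-level sketch captures the flavor of a basic cleaner. The pseudo-device name is the one described above, but the cache directory layout, the record format written to the device, and all function names are assumptions made for illustration.
\begin{verbatim}
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

struct victim {
    char   path[4096];
    time_t atime;
    off_t  size;
};

static int by_atime(const void *a, const void *b)
{
    const struct victim *va = a, *vb = b;
    return (va->atime > vb->atime) - (va->atime < vb->atime);
}

/* Remove least-recently-used cache files until `need' bytes are freed,
 * then report the freed space to the kernel. */
long long clean_cache(const char *cache_dir, long long need)
{
    struct victim *v = calloc(1024, sizeof *v);
    DIR *d = opendir(cache_dir);
    struct dirent *de;
    size_t n = 0, i;
    long long freed = 0;

    if (!v || !d) {
        if (d)
            closedir(d);
        free(v);
        return 0;
    }
    while (n < 1024 && (de = readdir(d)) != NULL) {
        struct stat st;

        if (de->d_name[0] == '.')
            continue;
        snprintf(v[n].path, sizeof v[n].path, "%s/%s",
                 cache_dir, de->d_name);
        if (stat(v[n].path, &st) == 0 && S_ISREG(st.st_mode)) {
            v[n].atime = st.st_atime;
            v[n].size  = st.st_size;
            n++;
        }
    }
    closedir(d);

    qsort(v, n, sizeof v[0], by_atime);   /* oldest access time first */
    for (i = 0; i < n && freed < need; i++)
        if (unlink(v[i].path) == 0)
            freed += v[i].size;

    /* Tell the kernel how much space was freed (the record format
     * written to the pseudo-device is an assumption). */
    FILE *dev = fopen("/dev/nfs-cache-space", "w");
    if (dev) {
        fprintf(dev, "%lld\n", freed);
        fclose(dev);
    }
    free(v);
    return freed;
}

int main(int argc, char **argv)
{
    /* Example: try to free 64 MB from the named cache directory. */
    long long freed = clean_cache(argc > 1 ? argv[1] : "/var/nfscache",
                                  64LL << 20);
    printf("freed %lld bytes\n", freed);
    return 0;
}
\end{verbatim}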