Because our primary interest is performance, the bulk of our
implementation is a module that resides in the Linux kernel. The
asynchronous writing implementation, the caching of lookup results
and attributes (section , on page
), and the caching of
read data pages (section
, on page
) to local disk are all
performed inside that NFS module.
The read caching is implemented at the virtual filesystem's readpage level, so the granularity of the local disk cache is the page size (4 KB). New data pages from an NFS server are written to the local disk cache as they are read. The cached files on local disk may be sparse--pages of a cached file are filled in as they are requested from the server. A simple data structure in kernel memory tracks the valid pages of each locally cached file. When a requested page has already been cached locally, the read is served from the local disk after verifying that the file has not changed on the server (by comparing the modification time and file size).
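A minimal user-level sketch of this bookkeeping is shown below. All structure, field, and function names are hypothetical (the real bookkeeping lives in kernel memory inside our NFS module); the sketch only illustrates how a readpage-style request might decide between the local disk copy and the server.
\begin{verbatim}
/*
 * Sketch of the per-file cache bookkeeping (names are hypothetical).
 * One bit per 4 KB page records whether that page is present in the
 * local disk copy of the remote file.
 */
#include <stdint.h>
#include <string.h>
#include <time.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

struct ncache_file {
    uint64_t       remote_size;   /* server file size when cached  */
    time_t         remote_mtime;  /* server mtime when cached      */
    unsigned long  npages;        /* pages in the remote file      */
    unsigned long *valid;         /* bitmap: one bit per page      */
};

int page_is_cached(const struct ncache_file *cf, unsigned long pgno)
{
    return (cf->valid[pgno / BITS_PER_LONG] >> (pgno % BITS_PER_LONG)) & 1;
}

void mark_page_cached(struct ncache_file *cf, unsigned long pgno)
{
    cf->valid[pgno / BITS_PER_LONG] |= 1UL << (pgno % BITS_PER_LONG);
}

/*
 * Decide where a readpage-style request should be served from.  The
 * server's current attributes come from a fresh GETATTR; if the file
 * changed on the server, the entire local copy is stale.
 */
int serve_from_local_cache(struct ncache_file *cf,
                           uint64_t srv_size, time_t srv_mtime,
                           unsigned long pgno)
{
    if (srv_size != cf->remote_size || srv_mtime != cf->remote_mtime) {
        size_t words = (cf->npages + BITS_PER_LONG - 1) / BITS_PER_LONG;

        memset(cf->valid, 0, words * sizeof(unsigned long));
        cf->remote_size  = srv_size;
        cf->remote_mtime = srv_mtime;
        return 0;                 /* go to the NFS server */
    }
    return page_is_cached(cf, pgno);  /* 1 = read from local disk */
}
\end{verbatim}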
Note that limitations in the NFS protocol make it wasteful to update the
local disk cache on writes: the next read will notice that the
modification time on the server's inode has changed, and we have no way
of knowing whether we were the only client to have altered that inode in
the interim, so the cache file would have to be treated as stale anyway.
Instead, we simply mark the cache file as invalid when we write to the
remote file it is caching. See section , on page
, for more discussion
of this issue.
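A sketch of that policy might look like the following fragment. The names are made up, and only the fields needed here are shown; the real hook sits in the kernel write path.
\begin{verbatim}
/*
 * Hypothetical write-path hook: rather than trying to keep the local
 * copy up to date, a write to the remote file just invalidates its
 * cache file; the cleaner can reclaim the space later.
 */
struct ncache_file {
    unsigned long *valid;         /* bitmap of locally cached pages */
    unsigned long  bitmap_words;  /* length of that bitmap          */
};

void ncache_note_write(struct ncache_file *cf)
{
    unsigned long i;

    /* Our own write changes the server's mtime, and we cannot tell it
     * apart from another client's write, so every cached page of this
     * file must be distrusted. */
    for (i = 0; i < cf->bitmap_words; i++)
        cf->valid[i] = 0;
}
\end{verbatim}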
Ideally, locally cached data pages should remain available even across reboots as long as the server's file has not changed. To support this, an implementation would need to persist the information about which pages are cached. However, that would complicate the implementation and hinder performance, since more ``maintenance'' data would have to be written each time a remote file's page is copied to the local cache. Instead, we mark cache files as complete once all of their pages have been cached locally, and we permit only those marked files to be used after a reboot. This reduces the maintenance required on reads of individual pages, but gives up the ability to retain partially read files across reboots. However, executable binaries are generally paged in dynamically--they are often not read in their entirety. If an NFS partition held mostly executables, our design as described might retain only a small fraction of its cached data pages between reboots.
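The completeness test itself is simple. A sketch follows, again with hypothetical names and keeping only the fields that matter here; the complete flag would also have to be recorded with the cache file on disk to survive a reboot.
\begin{verbatim}
#include <stdint.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

struct ncache_file {
    unsigned long  npages;    /* pages in the remote file         */
    unsigned long *valid;     /* in-memory bitmap of cached pages */
    int            complete;  /* persisted with the cache file    */
};

/* Returns nonzero once every page of the file is cached locally. */
int ncache_all_pages_present(const struct ncache_file *cf)
{
    unsigned long pg;

    for (pg = 0; pg < cf->npages; pg++)
        if (!((cf->valid[pg / BITS_PER_LONG] >> (pg % BITS_PER_LONG)) & 1))
            return 0;
    return 1;
}

/* Called after a page is added to the local copy. */
void ncache_maybe_mark_complete(struct ncache_file *cf)
{
    if (!cf->complete && ncache_all_pages_present(cf))
        cf->complete = 1;     /* would also be recorded on disk */
}
\end{verbatim}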
To combat the negative effects of partially cached files that only
rarely get read in their entirety, our implementation provides a kernel
thread to ``fill in'' the missing pages of cache files. When network
traffic is low, our nfsfillind daemon reads previously unread
(and therefore uncached) pages from the server, eliminating the
holes in a cached file so that we can mark the cache file as complete
(see section , on page
).
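The daemon's policy loop can be sketched in user-level C as below. The helper functions are stubs standing in for kernel facilities (traffic measurement, reading a page over NFS, marking completeness), and their names and the example cache path are invented for illustration.
\begin{verbatim}
#include <unistd.h>

/* Stubs standing in for kernel-side facilities. */
int  network_is_idle(void)                { return 1; }
long next_missing_page(const char *cache) { (void)cache; return -1; }
int  fetch_page_from_server(const char *cache, long pgno)
                                          { (void)cache; (void)pgno; return 0; }
void mark_cache_file_complete(const char *cache) { (void)cache; }

void fill_in_cache_file(const char *cache_path)
{
    long pgno;

    while ((pgno = next_missing_page(cache_path)) >= 0) {
        if (!network_is_idle()) {
            sleep(1);          /* back off while real traffic is flowing */
            continue;
        }
        if (fetch_page_from_server(cache_path, pgno) != 0)
            return;            /* give up on errors; try again later */
    }
    mark_cache_file_complete(cache_path);   /* no holes remain */
}

int main(void)
{
    fill_in_cache_file("/var/nfscache/0001");   /* path is made up */
    return 0;
}
\end{verbatim}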
For ease of development, testing, and debugging, we have used privileged
user-level programs where possible to simplify the code that must live
in the kernel. Specifically, we have extended the user-level
mount utility to understand our cache parameters (see
section , on page
) and introduced a user-level daemon to free
space in the local disk cache when it fills.
Our user-level NFS cache cleaner daemon is called nccd (see
section , on page
). When cache space is exhausted, our basic
nccd removes the least recently used cache files, thus freeing
space for creating new cache files. The kernel and nccd
communicate via two mechanisms: 1) the kernel informs the cleaner when
it needs to remove some old files by writing to a distinguished file in
the cache directory; and 2) the nccd informs the kernel of
changes in the amount of disk space used for each remote filesystem via a
pseudo-device /dev/nfs-cache-space (e.g., at startup to compute
the usage of files already in the cache, and after cleaning to report
the amount of space freed). Because nccd is a user-level
program, its policy decisions about how to manage the cache space are
easy to experiment with and customize.
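The following user-level sketch captures the flavor of a basic cleaner. The pseudo-device name is the one described above, but the cache directory layout, the record format written to the device, and all function names are assumptions made for illustration.
\begin{verbatim}
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

struct victim {
    char   path[4096];
    time_t atime;
    off_t  size;
};

static int by_atime(const void *a, const void *b)
{
    const struct victim *va = a, *vb = b;
    return (va->atime > vb->atime) - (va->atime < vb->atime);
}

/* Remove least-recently-used cache files until `need' bytes are freed,
 * then report the freed space to the kernel. */
long long clean_cache(const char *cache_dir, long long need)
{
    struct victim *v = calloc(1024, sizeof *v);
    DIR *d = opendir(cache_dir);
    struct dirent *de;
    size_t n = 0, i;
    long long freed = 0;

    if (!v || !d) {
        if (d)
            closedir(d);
        free(v);
        return 0;
    }
    while (n < 1024 && (de = readdir(d)) != NULL) {
        struct stat st;

        if (de->d_name[0] == '.')
            continue;
        snprintf(v[n].path, sizeof v[n].path, "%s/%s",
                 cache_dir, de->d_name);
        if (stat(v[n].path, &st) == 0 && S_ISREG(st.st_mode)) {
            v[n].atime = st.st_atime;
            v[n].size  = st.st_size;
            n++;
        }
    }
    closedir(d);

    qsort(v, n, sizeof v[0], by_atime);   /* oldest access time first */
    for (i = 0; i < n && freed < need; i++)
        if (unlink(v[i].path) == 0)
            freed += v[i].size;

    /* Tell the kernel how much space was freed (the record format
     * written to the pseudo-device is an assumption). */
    FILE *dev = fopen("/dev/nfs-cache-space", "w");
    if (dev) {
        fprintf(dev, "%lld\n", freed);
        fclose(dev);
    }
    free(v);
    return freed;
}

int main(int argc, char **argv)
{
    /* Example: try to free 64 MB from the named cache directory. */
    long long freed = clean_cache(argc > 1 ? argv[1] : "/var/nfscache",
                                  64LL << 20);
    printf("freed %lld bytes\n", freed);
    return 0;
}
\end{verbatim}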