With the rapid increase of processor speeds, file system latency is a critical issue in computer system performance . Standard Least Recently Used (LRU) based caching techniques offer some assistance, but by ignoring any relationships that exist between file system events, they fail to make full use of the available information.
We will show that many of the events in a file system are closely related. For example, when a user executes the program make, this will very often result in accesses to the files cc, as, and ld. Additionally, if we note an access to the files make and makefile then another sequence of accesses: program.c, program.h, stdio.h... is likely to occur. As a result, the file system behaves predictably. So a predictive caching algorithm that tracks file system events and notes predictive sequences could exploit such sequences by preloading data before it is required. This would increase the cache hit ratio and reduce the file system latency.
Just as in data compression where a model drives a coder, our predictive cache can be divided into two portions: the model that tracks the sequences of previous events and the selector that uses this information to determine likely future events and prefetch their data. Our model tracks previous file system events through a finite multi-order context modeling technique adapted from the data compression technique Prediction by Partial Match (PPM) . This model uses a trie  to store sequences of previous file system events and the number of times they have occurred. Our selector examines the most recently seen sequences and the counts of the events that have followed them to determine likely next events. Using these predictions, we augment an LRU cache by prefetching data that are likely to be accessed. The result is a predictive caching algorithm that in our simulations averaged hit ratios better than and LRU cache that is 20 times its size.
The rest of this article is organized as follows: §2 details the method used to model events and select events to prefetch, §3 presents our simulations and results, §4 describes related work, §5 discusses future work, and §6 concludes the paper.
Tom M. Kroeger Tue Nov 14 00:27:35 PST 1995