Prediction and Adaptation

Gaining a better understanding of the nature of data access behavior can help us build smarter, more manageable, more efficient, and higher-performance storage systems.

OVERVIEW

Data is requested by users from the programs they run. Neither the users nor the programs are random in their behavior, as they are driven by structured goals and code respectively. While there is a degree of randomness, it is often true that requests for data can be predicted. This is true at different levels of the storage hierarchy, from entire storage centers, down to individual systems and files, and further down to the level of individual data blocks on a storage device. Efficiently identifying this predictability and exploiting it effectively is our goal. We have developed and described some of the simplest and most efficient data access predictors, and are continuing our work exploiting these predictors in a range of ways: from improving data access response, to reducing energy consumption and enabling self-optimizing adaptive systems algorithms.



PREDICTION

Practical prediction techniques can improve performance beyond obvious limits.


We cannot move data faster than the speed of light, and are therefore at the mercy of a minimum 60msec delay when transferring data from London to Sydney (or halfway around the globe). With the added distance of relaying communication through satellites, we experience the sometimes comical delays in the reaction of television news correspondents reporting from overseas on live television. But it’s possible for the postal service to overcome this apparently unavoidable delay, if the reporter already knows the questions he will be asked (because they were mailed a day or two previously). In other words, predicting future data access requests, and guaranteeing that the data is nearby before it is requested, overcomes access latency. This is the goal of any data management or caching scheme that attempts to move data “nearer” to the consumer, and while most such schemes are reactive in nature, it’s possible to build models of future behavior and proactively act to assure data is more conveniently placed before it is requested.


Effective predictors are more than simply accurate predictors of future behavior, they must be practical to implement and offer useful and timely results. To this end we have designed simple access predictors that incur low overheads (both in terms of computation and storage overheads), and which offer predictions that can be combined to allow the identification of likely future accesses tailored to the application in terms of: prediction confidence, range of foresight, and with minimum intrusion on the demand path.


  1. David Essary, Ahmed Amer, “Avoiding State-Space Explosion of Predictive Metadata with SESH,” Proceedings of the IEEE International Performance, Computing and Communications Conference (IPCCC 2009), December 2009.

  2. David Essary, Ahmed Amer. “Predictive Data Grouping: Defining the bounds of energy and latency reduction through predictive data grouping and replication,” ACM Transactions on Storage,  4(1): pp.1-23, May 2008. (DOI)

  3. Ahmed Amer, Darrell D. E. Long, Jehan-Francois Paris, and Randal C. Burns. “File Access Prediction with Adjustable Accuracy,” Proceedings of the International Performance Computing and Communications Conference (IPCCC 2002), Phoenix, AZ, USA: IEEE, April 2002, pp.131-140.

  4. Ahmed Amer, Darrell D. E. Long, Randal C. Burns. “Group-Based Management of Distributed File Caches,” Proceedings of the 22nd IEEE International Conference on Distributed Computing Systems (ICDCS 2002), Vienna, Austria: IEEE, July 2002, pp.525-534. (DOI, DBLP BibTeX)

  5. Ahmed Amer, Darrell D. E. Long. “Aggregating Caches: A Mechanism for Implicit File Prefetching,” Proceedings of the Ninth International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2001), Cincinnati, OH, USA: IEEE, August 2001, pp.293-301. (DOI, DBLP BibTeX)


In addition to offering improved cache performance, simple predictors were also found to be resilient to the effects of cache filtering. While an intervening cache may make a workload problematic for recency-based caching algorithms, it does little to mask the inherent predictability of the workload.


  1. Ahmed Amer, Alison Luo, Newton Der, Darrell D. E. Long, Alex Pang. "Visualizing Cache Effects on I/O Workload Predictability," Proceedings of the International Performance Computing and Communications Conference (IPCCC 2003), Phoenix, AZ, USA: IEEE, April 2003, pp.417-424. (PDF)

  2. Ahmed Amer, Darrell D. E. Long. “Adverse Filtering Effects and the Resilience of Aggregating Caches,” Proceedings of the Workshop on Caching, Coherence and Consistency (WC3 2001), Sorrento, Italy: ACM, June 2001. (PDF)


ADAPTATION - Dynamic Adaptation and Caching

“A fixed optimal setting can be improved upon when it is no longer fixed.


When an algorithm has optimal parameter settings that differ with changes in the workload, it is possible to exceed an optimal a priori setting of such parameters by using an algorithm that dynamically adjusts the parameter(s) in response to changes (or expected changes) in the workload. Predicting future requests for data allows us to build more effective caches, and exceed the performance of an optimal cache replacement algorithm (since prefetching goes beyond selecting what items to evict from a cache, and considers what items to fetch). The prediction of future access requests can also be used to construct self-optimizing data management algorithms. From our earliest work with ACME, to our latest work on self-tuning caching, the main idea is to consider eviction policies as experts that predict the least useful item in a cache (i.e., predicting what will not be accessed in the near future).


  1. Ganesh Santhanakrishnan, Ahmed Amer, Panos K. Chrysanthis. “Self-Tuning Caching: The Universal Caching Algorithm,” Software Practice and Experience (SPE), 36(11-12): pp.1179-1188, September-October 2006. (DOI)

  2. Ismail Ari, Ahmed Amer, Robert Gramacy, Ethan L. Miller, Scott A. Brandt, Darrell D. E. Long. “ACME: adaptive caching using multiple experts,” Distributed Data & Structures 4 (Proceedings in Informatics 14), pp. 143–158. W. Litwin and G. Lèvy, Eds. Paris, France: Carleton Scientific, pp. 143-158, March 2002. (PDF)

  3. Ismail Ari, Ahmed Amer, Ethan L. Miller, Scott Brandt, Darrell D. E. Long. “Who is more Adaptive? ACME:Adaptive Caching Using Multiple Experts,” Proceedings of the Workshop on Distributed Data & Structures (WDAS 2002), Paris, France: March 2002, pp.143-158. (PDF, DBLP BibTeX)


We have successfully applied similar techniques to web caching, predictor optimization, and power management. In addition to allowing caching algorithms and predictors to adapt to workload changes, we also attempt to simplify systems management by leaving more meaningful parameters to system administrators than are typically offered by specific algorithms.


  1. Jeffrey P. Rybczynski, Darrell D. E. Long, Ahmed Amer. “Adapting Predictions and Workloads for Power Management,Proceedings of the 14th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS 2006), Monterey, CA, USA: IEEE, September 2006, pp.3-12. (DOI)

  2. Ganesh Santhanakrishnan, Ahmed Amer and Panos K. Chrysanthis. “Towards Universal Mobile Caching,” Proceedings of the 4th ACM International Workshop on Data Engineering for Wireless and Mobile Access (MobiDE 2005) (held in conjunction with the ACM SIGMOD 2005), Baltimore, MD, USA: ACM, June 2005, pp.73-80. (DOI, DBLP BibTeX)

  3. Ganesh Santhanakrishnan, Qinglan Li, Jonathan Beaver, Panos Chrysanthis, Ahmed Amer, Alexandros Labrinidis. “Multi-criteria Routing in pervasive environment with sensors.” Proceedings of the IEEE International Conference on Pervasive Services (ICPS 2005), Santorini, Greece: IEEE, July 2005, pp.7-16.

  4. Qinglan Li, Jonathan Beaver, Ahmed Amer, Panos K. Chrysanthis, Alexandros Labrinidis, Ganesh Santhanakrishnan. “Multi-Criteria Routing in Wireless Sensor-Based Pervasive Environments,” Journal of Pervasive Computing and Communications (JPCC), 1(4): pp.313-326, December 2005.

  5. Ganesh Santhanakrishnan, Ahmed Amer, Panos K. Chrysanthis, Dan Li. “GD-GhOST: a goal-oriented self-tuning caching algorithm,” Proceedings of the 2004 ACM Symposium on Applied Computing (SAC 2004), Nicosia, Cyprus: ACM,  March 2004, pp.1141-1145. (DOI)

Adaptation is useful and necessary