Using Comprehensive Analysis for Performance Debugging in Distributed Storage Systems

Abstract

Achieving performance, reliability, and scalability presents a unique set of challenges for large distributed storage systems. To identify problem areas, developers need a comprehensive view of the entire storage system. That is, users must be able to understand both node-specific behavior and the complex relationships between nodes. We present a distributed file system profiling method that supports such analysis. Our approach combines node-specific metrics into a single cohesive system image. This affords users two views of the storage system: a micro, per-node view and a macro, multi-node view, allowing both node-specific and complex inter-nodal problems to be debugged. We visualize the storage system by displaying nodes and intuitively animating their metrics and behavior, allowing easy analysis of complex problems.

Publication
Proceedings of the 24th IEEE Conference on Mass Storage Systems and Technologies (MSST 2007)