Research Statement


Petabyte-scale Storage Systems:
Generative Allocation and Intelligent Devices (2005-present)

Storage systems have to keep track of where data is stored. Traditional file systems use allocation tables to look up the location of a file. In very large-scale storage clusters, managing this bookkeeping information becomes the bottleneck for system performance and scalability. Generative allocation calculates the location of data instead of storing and managing large allocation tables. Because only a compact description of the cluster, rather than a per-file table, has to be shared, system state can be communicated much more efficiently. This enables novel distributed architectures in which, for example, load balancing and failure recovery result from collaborating, intelligent storage devices instead of a centralized system component.
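To make the idea of calculating rather than looking up a location concrete, here is a minimal sketch. It assumes a cluster described only by a list of device names (`osd0`, `osd1`, ... and the functions `score` and `locate` are hypothetical), and it uses rendezvous (highest-random-weight) hashing as a simple stand-in for a generative allocation function. It is not Ceph's CRUSH algorithm.

```python
import hashlib

# Hypothetical helpers: rendezvous hashing stands in for a real
# generative allocation function such as CRUSH.

def score(object_id: str, device: str) -> int:
    """Deterministic pseudo-random weight for an (object, device) pair."""
    digest = hashlib.sha256(f"{object_id}:{device}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def locate(object_id: str, devices: list[str], replicas: int = 1) -> list[str]:
    """Compute where an object lives from a compact device list.

    No per-object allocation table is consulted: any client holding the
    same device list computes the same answer.
    """
    ranked = sorted(devices, key=lambda d: score(object_id, d), reverse=True)
    return ranked[:replicas]

devices = [f"osd{i}" for i in range(8)]  # hypothetical device names
print(locate("inode42.block7", devices, replicas=3))
```

In this sketch, removing a device that does not hold a given object leaves that object's placement unchanged, which is one reason hash-based placement needs little coordination when the cluster changes.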

In spring 2005 I started a project to create a working file system prototype based on existing research in object-based file systems at UCSC [17, 27, 28, 26, 19, 18, 7, 6], with the goal of identifying and researching the design of novel system components emerging from the integration effort. I named the resulting prototype Ceph [20]; it was developed by Sage Weil, a graduate student whom I mentored. The key enabling technology is the allocation function CRUSH [21], which supports the specification of placement restrictions (for example, to account for failure domains when placing replicas) and provides alternative placements when a device becomes overloaded or unavailable. Integrating this method into Ceph showed that the compact representation of allocation enables unprecedented scalability, especially in metadata management.
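The two properties attributed to CRUSH above, placement restrictions across failure domains and alternative placement when a device is unavailable, can be illustrated with a small sketch. It assumes a hypothetical cluster map in which each rack is one failure domain, and it uses rendezvous hashing over racks and then devices. This is only loosely inspired by CRUSH's hierarchical descent and is not the CRUSH algorithm; `place`, `racks`, and the device names are illustrative.

```python
import hashlib

# Hypothetical sketch: rendezvous hashing over racks, then devices.
# Not CRUSH itself; it only mimics failure-domain-aware placement.

def score(key: str, item: str) -> int:
    """Deterministic pseudo-random weight for a (key, item) pair."""
    digest = hashlib.sha256(f"{key}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def place(object_id: str, racks: dict[str, list[str]], replicas: int,
          unavailable: frozenset[str] = frozenset()) -> list[str]:
    """Pick `replicas` devices, each in a different rack (failure domain).

    Racks are ranked by hash score; within a rack, the highest-scoring
    device not marked unavailable (failed or overloaded) is chosen. A rack
    with no usable device is skipped, so placement falls through to the
    next rack in this object's ranking.
    """
    chosen = []
    for rack in sorted(racks, key=lambda r: score(object_id, r), reverse=True):
        candidates = [d for d in racks[rack] if d not in unavailable]
        if not candidates:
            continue
        chosen.append(max(candidates, key=lambda d: score(object_id, d)))
        if len(chosen) == replicas:
            break
    if len(chosen) < replicas:
        raise ValueError("not enough failure domains with usable devices")
    return chosen

# Hypothetical cluster map: four racks of three devices each.
racks = {
    "rack0": ["osd0", "osd1", "osd2"],
    "rack1": ["osd3", "osd4", "osd5"],
    "rack2": ["osd6", "osd7", "osd8"],
    "rack3": ["osd9", "osd10", "osd11"],
}
primary = place("inode42.block7", racks, replicas=3)
# Marking one chosen device unavailable moves only that replica,
# and it stays within the same rack.
degraded = place("inode42.block7", racks, replicas=3,
                 unavailable=frozenset({primary[0]}))
print(primary, degraded)
```

The design choice worth noting is that the fallback is itself computed: no central component records where the displaced replica went, because every party that knows the cluster map and the set of unavailable devices derives the same alternative placement.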

The publication on Ceph [20] in particular has gained high visibility through its presentation at OSDI, the top venue in computer science in terms of impact. Ceph continues to draw great interest from government research labs (e.g., LLNL, LANL, and SNL), international research organizations (e.g., CERN and the Max Planck Institute for Gravitational Physics), and companies such as EMC, NetApp, Yahoo, Symantec, and Apple. The principal student, Sage Weil, has graduated and continues to work on Ceph as an open-source project.

REFERENCES