Research Statement

 

Scalable File System Interfaces: Overcoming a 20-year-old Legacy (2007-present)

Storage systems design is in a state of crisis: exponentially growing storage needs are rapidly outpacing system interfaces that were standardized over 20 years ago. Some of today’s large file systems store tens to hundreds of petabytes of enterprise or scientific data. These file systems contain many thousands of files in a single directory or have deep directory trees with large numbers of excessively long paths. Almost all petabyte-scale file systems use the POSIX file system interface, a standard from 1988 that emerged from a project begun in 1985. This standard was designed for file systems that were orders of magnitude smaller in total size, number of files per directory, and size of directory trees. In particular, POSIX offers no query mechanism other than listing the contents of a given directory.

I am involved in several projects that aim to overcome the limitations of POSIX by introducing various aspects of a query mechanism. Graffiti is a distributed, user-level file tagging system that works with any file system that provides a file system event service; it allows controlled sharing of tags across machines and users and supports tag recommendations [12]. QUASAR is a path-based query language that subsumes the POSIX path name space and supports the specification of complex relationships among files [1]. ViewFS explores a search interface based on a recently developed information retrieval technique called “faceted search” [9]. Finally, I am working on scalable and robust indexing by extending the metadata server of Ceph [22] to index not only inodes (the traditional function of file system metadata management) but also other, arbitrary properties of files.

REFERENCES