Fishbowl Panel

Programmable File Systems: What, Why, How?

Moderator: Brent Welch (Google)

 

The format of this panel is an open fishbowl conversation to encourage participation by the audience. Four to five chairs are arranged in front of the audience, and all but one are filled with participants initially selected by the moderator. Any member of the audience can, at any time, occupy the empty chair and join the discussion. When this happens, an existing member of the fishbowl must voluntarily leave and free a chair. The idea is that the discussion continues with participants frequently entering and leaving the fishbowl to take part in it. [Adapted from Wikipedia’s “Fishbowl (Conversation)”.]


Topics discussed at this panel concern the definition, motivation, and prospects of Programmable File Systems: What distinguishes a Programmable File System from a regular file system? Why are Programmable File Systems interesting now? How are we making Programmable File Systems a reality?


The background: a major milestone in the evolution of digital computers was the development of the stored-program concept and the design of Turing-complete machines as opposed to fixed-program computers. Yet we still treat an increasingly important subsystem of computers, file and storage systems, largely as a fixed-program computer. Among the key reasons for this history is the justified fear that (1) any interface change in file and storage systems will make legacy data inaccessible and lock the data to a particular system and (2) programmability will increase the probability of data loss.


Yet with the advent of open source file systems, a new usage pattern is emerging: users isolate subsystems of these file systems and put them in contexts not foreseen by the original designers. Examples are: (1) an object-based storage back end gets a new RESTful front end to become an Amazon Web Services S3-compliant key-value store, (2) a data placement function is used as a placement function for customer accounts, and (3) the HDF5 scientific data access library is embedded into parallel storage systems. This trend shows a desire to use existing file system services and compose them to implement new services. That desire, however, is frequently stymied by the difficulty of bringing new services with advanced functionality up to production quality and a sufficiently low probability of data loss. At the same time, government and industry are investing heavily in the development of new, extremely scalable, and highly efficient distributed I/O stacks that largely abandon traditional file and storage system interfaces.


Designing programmability into file and storage systems has the following benefits: (1) we are achieving greater separation of storage performance engineering from storage reliability engineering, making it possible to optimize storage systems in a wide variety of ways without risking years of investment in code hardening; (2) we are creating an environment that encourages people to create a new stack of storage system abstractions, both domain-specific and across domains, including sophisticated optimizers that rely on machine learning techniques; (3) we are informing commercial parallel file system vendors on the design of low-level APIs for their products so that they can match the versatility of open source storage systems without having to release their entire code base as open source; and (4) we are using this historical opportunity to leverage the tension between the versatility of open source storage systems and the reliability of proprietary systems to lead the community of storage system designers.