Programmable Storage Systems
Last updated on Jan 19, 2020
Website: programmability.us
Press Coverage: The Next Platform (8/1/17)
Funding: CROSS, DOE SSIO SIRIUS, NSF CNS-1705021
Workshop: 1st Programmable File Systems Workshop (PFSW) held in conjunction with ACM HPDC’14
A programmable storage system exposes internal subsystem abstractions as “interfaces” so that higher-level services can be built by composing them. Malacology is a programmable storage system that makes the internal abstractions of Ceph programmable. Using Malacology, we built the Mantle and ZLog services.
Carlos Maltzahn
Adjunct Professor, Sage Weil Presidential Chair for Open Source Software, Founder & Director of CROSS, OSPO
My research interests include programmable storage systems, big data storage & processing, scalable data management, distributed systems performance management, and practical reproducible research.
Publications
Jayjeet Chakraborty,
Ivo Jimenez,
Sebastiaan Alvarez Rodriguez,
Alexandru Uta,
Jeff LeFevre,
Carlos Maltzahn.
Skyhook: Towards an Arrow-Native Storage System.
CCGrid22,
2022.
Jayjeet Chakraborty,
Carlos Maltzahn,
David Li,
Tom Drabas.
Skyhook: Bringing Computation to Storage with Apache Arrow.
Apache Arrow Blog, January 31, 2022,
2022.
Sebastiaan Alvarez Rodriguez,
Jayjeet Chakraborty,
Aaron Chu,
Ivo Jimenez,
Jeff LeFevre,
Carlos Maltzahn,
Alexandru Uta.
Zero-Cost, Arrow-Enabled Data Interface for Apache Spark.
arXiv:2106.13020 [cs.DC],
2021.
Jianshen Liu,
Carlos Maltzahn,
Craig Ulmer,
Matthew Leon Curry.
Performance Characteristics of the BlueField-2 SmartNIC.
arXiv:2105.06619 [cs.NI],
2021.
Jayjeet Chakraborty,
Ivo Jimenez,
Sebastiaan Alvarez Rodriguez,
Alexandru Uta,
Jeff LeFevre,
Carlos Maltzahn.
Towards an Arrow-native Storage System.
arXiv:2105.09894 [cs.DC],
2021.
Aaron Chu,
Jeff LeFevre,
Carlos Maltzahn,
Aldrin Montana,
Peter Alvaro,
Dana Robinson,
Quincey Koziol.
Mapping Scientific Datasets to Programmable Storage.
EPJ Web Conf.,
2020.
Jeff LeFevre,
Carlos Maltzahn.
SkyhookDM: Data Processing in Ceph with Programmable Storage.
USENIX ;login:,
2020.
Jeff LeFevre,
Carlos Maltzahn.
Scaling databases and file APIs with programmable Ceph object storage.
2020 Linux Storage and Filesystems Conference (Vault'20, co-located with FAST'20 and NSDI'20),
2020.
Aaron Chu,
Ivo Jimenez,
Jeff LeFevre,
Carlos Maltzahn.
SkyhookDM: Programmable Storage for Datasets.
Poster at IRIS-HEP Poster Session,
2020.
Aaron Chu,
Jeff LeFevre,
Carlos Maltzahn,
Aldrin Montana,
Peter Alvaro,
Dana Robinson,
Quincey Koziol.
SkyhookDM: Mapping Scientific Datasets to Programmable Storage.
24th International Conference on Computing in High Energy & Nuclear Physics,
2019.
Kathryn Dahlgren,
Jeff LeFevre,
Ashay Shirwadkar,
Ken Iizawa,
Aldrin Montana,
Peter Alvaro,
Carlos Maltzahn.
Towards Physical Design Management in Storage Systems.
4th International Parallel Data Systems Workshop (PDSW 2019, co-located with SC'19),
2019.
Jeff LeFevre,
Noah Watkins,
Michael Sevilla,
Carlos Maltzahn.
Skyhook: Programmable storage for databases.
2019 Linux Storage and Filesystems Conference (Vault'19, co-located with FAST'19),
2019.
Carlos Maltzahn.
Should Storage Devices Stay Dumb or Become Smart?
Breakouts Session abstract at 10th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage'18, co-located with USENIX ATC'18),
2018.
Michael A. Sevilla,
Reza Nasirigerdeh,
Carlos Maltzahn,
Jeff LeFevre,
Noah Watkins,
Peter Alvaro,
Margaret Lawson,
Jay Lofstead,
Jim Pivarski.
Tintenfisch: File System Namespace Schemas and Generators.
HotStorage '18,
2018.
Michael A. Sevilla,
Ivo Jimenez,
Noah Watkins,
Jeff LeFevre,
Peter Alvaro,
Shel Finkelstein,
Patrick Donnelly,
Carlos Maltzahn.
Cudele: An API and Framework for Programmable Consistency and Durability in a Global Namespace.
IPDPS 2018,
2018.
Michael A. Sevilla,
Carlos Maltzahn,
Peter Alvaro,
Reza Nasirigerdeh,
Bradley W. Settlemyer,
Danny Perez,
David Rich,
Galen M. Shipman.
Programmable Caches with a Data Management Language & Policy Engine.
CCGRID '18,
2018.
Zhihao Jia,
Sean Treichler,
Galen Shipman,
Michael Bauer,
Noah Watkins,
Carlos Maltzahn,
Pat McCormick,
Alex Aiken.
Integrating External Resources with a Task-Based Programming Model.
HiPC 2017,
2017.
Latchesar Ionkov,
Carlos Maltzahn,
Michael Lang.
Optimized Scatter/Gather Data Operations for Parallel Storage.
PDSW-DISCS 2017 at SC17,
2017.
Noah Watkins,
Michael A. Sevilla,
Ivo Jimenez,
Kathryn Dahlgren,
Peter Alvaro,
Shel Finkelstein,
Carlos Maltzahn.
DeclStore: Layering is for the Faint of Heart.
HotStorage '17,
2017.
Michael A. Sevilla,
Noah Watkins,
Ivo Jimenez,
Peter Alvaro,
Shel Finkelstein,
Jeff LeFevre,
Carlos Maltzahn.
Malacology: A Programmable Storage System.
EuroSys '17,
2017.
Noah Watkins,
Zhihao Jia,
Galen Shipman,
Carlos Maltzahn,
Alex Aiken,
Pat McCormick.
Automatic and transparent I/O optimization with storage integrated application runtime support.
PDSW'15,
2015.
Michael Sevilla,
Noah Watkins,
Carlos Maltzahn,
Ike Nassi,
Scott Brandt,
Sage Weil,
Greg Farnum,
Sam Fineberg.
Mantle: A Programmable Metadata Load Balancer for the Ceph File System.
SC '15,
2015.
Joe Buck,
Noah Watkins,
Greg Levin,
Adam Crume,
Kleoni Ioannidou,
Scott Brandt,
Carlos Maltzahn,
Neoklis Polyzotis,
Aaron Torres.
SIDR: Structure-Aware Intelligent Data Routing in Hadoop.
SC '13,
2013.
Noah Watkins,
Carlos Maltzahn,
Scott Brandt,
Ian Pye,
Adam Manzanares.
In-Vivo Storage System Development.
BigDataCloud '13 (in conjunction with EuroPar 2013),
2013.
Latchesar Ionkov,
Mike Lang,
Carlos Maltzahn.
DRepl: Optimizing Access to Application Data for Analysis and Visualization.
MSST '13,
2013.
Noah Watkins,
Carlos Maltzahn,
Scott A. Brandt,
Adam Manzanares.
DataMods: Programmable File System Services.
PDSW'12,
2012.
Joe Buck,
Noah Watkins,
Jeff LeFevre,
Kleoni Ioannidou,
Carlos Maltzahn,
Neoklis Polyzotis,
Scott A. Brandt.
SciHadoop: Array-based Query Processing in Hadoop.
SC '11,
2011.
Scott A. Brandt,
Carlos Maltzahn,
Neoklis Polyzotis,
Wang-Chiew Tan.
Fusing Data Management Services with File Systems.
Proceedings of the 2009 ACM Petascale Data Storage Workshop (PDSW 09),
2009.
Joe Buck,
Noah Watkins,
Carlos Maltzahn,
Scott A. Brandt.
Abstract Storage: Moving file format-specific abstractions into petabyte-scale storage systems.
2nd International Workshop on Data-Aware Distributed Computing (in conjunction with HPDC-18),
2009.