storage

Towards an Arrow-native Storage System

With ever-increasing dataset sizes, several file formats like Parquet, ORC, and Avro have been developed to store data efficiently and to save network and interconnect bandwidth at the price of additional CPU utilization. However, with the advent …

SkyhookDM: Data Processing in Ceph with Programmable Storage

My schedule at Vault/FAST/NSDI 2020

Looking forward to meeting friends and colleagues this week. Here is my schedule.

Scaling databases and file APIs with programmable Ceph object storage

SkyhookDM: Programmable Storage for Datasets

Towards Physical Design Management in Storage Systems

In the post-Moore era, systems and devices with new architectures will arrive at a rapid rate with significant impacts on the software stack. Applications will not be able to fully benefit from new architectures unless they can delegate adapting to …

MBWU: Benefit Quantification for Data Access Function Offloading

The storage industry is considering new kinds of storage devices that support data access function offloading, i.e., the ability to perform data access functions on the storage device itself as opposed to performing them on a separate compute system …

MBWU (MibeeWu): Quantifying benefits of offloading data management to storage devices

Skyhook: Programmable storage for databases

Ceph is an open source distributed storage system that is object-based and massively scalable. Ceph provides developers with the capability to create data interfaces that can take advantage of local CPU and memory on the storage nodes (Ceph Object …

Should Storage Devices Stay Dumb or Become Smart?