Jeff LeFevre, PhD
Adjunct Professor, Computer Science and Engineering
Jack Baskin School of Engineering
University of California, Santa Cruz
Office: Engineering 2, room 541A
jlefevre@ucsc.edu

I am currently on leave (2023-present).
My research and experience focus broadly on cloud databases and storage systems. More specifically, I take a database physical design approach toward improving database workload performance. Database physical design refers to the task of manipulating the physical configuration of the data storage in order to improve performance by statically or dynamically adapting the configuration to a given workload. Adaptations can include modifying the data layout as well as adding or removing auxiliary data structures such as indexes and materialized views under a set of resource constraints (e.g., time, storage space, or network bandwidth).

I also work toward offloading database processing tasks, e.g., SELECT, PROJECT, and AGGREGATE, to storage servers. A generic approach requires communicating and interpreting both data semantics and processing tasks directly within all levels of the storage hierarchy. We consider technologies such as (1) WebAssembly (Wasm), RocksDB, DuckDB, and Ceph extension classes for processing, (2) Apache Arrow, Parquet, and FlatBuffers for data format and semantics, and (3) Apache Arrow Flight and Substrait for task communication.

I led the Skyhook Data Management project (2016-2021) as part of the Center for Research on Open Source Software at UC Santa Cruz, where I was awarded a Fellowship. SkyhookDM takes a 'programmable storage' approach that extends open source software projects (Apache Arrow and Ceph distributed object storage) toward in-storage data processing and management through Ceph's built-in extensions framework ('cls'). Our extensions embed Arrow libraries within Ceph objects, enabling data processing functions as well as physical design manipulations of local object data such as data layouts or indexing.

I am pleased to announce that Skyhook was merged into the Apache Arrow mainline in October 2021! Many thanks to all who contributed. Please see our announcements.

Through the CROSS, CERN/HSF, and Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP) organizations, the Skyhook project earned several spots in Google Summer of Code (GSoC) for which I was a faculty mentor (see below).

I received my PhD in 2014 and subsequently joined the Vertica database R&D group at Hewlett Packard in Palo Alto, where I worked for Dr. Meichun Hsu and Dr. Malu Castellanos on integrating Vertica with external database and machine learning (ML) engines such as Distributed-R and Apache Spark.
Education

Experience
Throughout graduate school, I spent time working at the following orgs.
Selected Publications
Full list of publications available on Google Scholar.

Courses Taught

Other Teaching

Graduate students supervised

Professional Service: External Student Mentoring

Professional Service: Invited Peer Reviewer

Patents Granted