Jeff LeFevre, PhD
Adjunct Professor, Computer Science and Engineering
Jack Baskin School of Engineering
University of California, Santa Cruz
Office: Engineering 2, room 541A
jlefevre@ucsc.edu
I am currently on leave (2023-)
My research and experience focus broadly on cloud databases and storage systems.
More specifically, I take a database physical design approach toward improving database workload performance.
Database physical design is the task of adapting the physical configuration of data storage, either statically or dynamically, to improve performance for a given workload.
Adaptations can include modifying the data layout as well as adding or removing auxiliary data structures such as indexes and materialized views under a set of resource constraints (e.g., time, storage space, network bandwidth, etc.).
I also work toward offloading database processing tasks to storage servers, e.g., SELECT, PROJECT, AGGREGATE tasks.
A generic approach requires communicating and interpreting both data semantics and processing tasks directly within all levels of the storage hierarchy.
We consider technologies such as
(1) WebAssembly (Wasm), RocksDB, DuckDB, and Ceph extension classes for processing,
(2) Apache Arrow, Parquet, and FlatBuffers for data format and semantics, and
(3) Apache Arrow Flight and Substrait for task communication.
I led the
Skyhook Data Management project (2016-2021) as part of the
Center for Research in Open Source Software
at UC Santa Cruz, where I was awarded a Fellowship.
SkyhookDM takes a 'programmable storage' approach that extends open source software projects (Apache Arrow and Ceph distributed object storage)
toward in-storage data processing and management through Ceph's built-in extensions framework ('cls').
Our extensions embed Arrow libraries within Ceph objects, enabling data processing functions as well
as physical design manipulations of local object data such as data layouts or indexing.
I am pleased to announce that
Skyhook was merged into Apache Arrow mainline in October 2021! Many thanks to all who contributed.
Please see our announcements.
Through the CROSS,
CERN/HSF, and
Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP)
organizations, the Skyhook project earned several spots in
Google Summer of Code (GSoC) for which I was a faculty mentor (see below).
I received my PhD in 2014 and subsequently joined the
Vertica database R&D group
at Hewlett Packard in Palo Alto, where I worked for
Dr. Meichun Hsu and
Dr. Malu Castellanos on integrating Vertica with external
database and machine learning (ML) engines such as Distributed-R and Apache Spark.
Education
Experience
- Adjunct Professor, Computer Science & Engineering (2018-present), University of California, Santa Cruz
- Research Scientist and Fellow (2017), Center for Research in Open Source Software, University of California, Santa Cruz
- R&D Engineer, Big Data (2014-2016), Hewlett Packard (Vertica database)
Throughout graduate school, I spent time working at the following organizations.
Selected Publications
- A. Montana, Y. Xue, J. LeFevre, C. Maltzahn, J. Stuart, P. Kufeldt, P. Alvaro,
"A Moveable Beast: Partitioning Data and Compute for Computational Storage",
(pre-print) 2023.
-
J. Chakraborty, I. Jimenez, S. A. Rodriguez, A. Uta, J. LeFevre, C. Maltzahn,
"Skyhook: Towards an Arrow-Native Storage System",
CCGrid 2022
(pdf).
-
S.A. Rodriguez, J. Chakraborty, A. Chu, I. Jimenez, J. LeFevre, C. Maltzahn, A. Uta,
"Zero-Cost, Arrow-Enabled Data Interface for Apache Spark",
IEEE Big Data 2021.
-
J. LeFevre, C. Maltzahn,
"Scaling Databases and File APIs with Programmable Ceph Object Storage",
Vault 2020.
-
K. Dahlgren, J. LeFevre, A. Shirwadkar, K. Iizawa, A. Montana, P. Alvaro, C. Maltzahn,
"Towards Physical Design Management in Storage Systems",
IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW) 2019.
-
J. LeFevre, N. Watkins, M. Sevilla, C. Maltzahn,
"Skyhook: programmable storage for databases",
Vault 2019.
-
M. Sevilla, R. Nasirigerdeh, C. Maltzahn, J. LeFevre, N. Watkins, P. Alvaro, M. Lawson, J. Lofstead, J. Pivarski,
"Tintenfisch: File System Namespace Schemas and Generators",
HotStorage 2018.
-
M. Sevilla, N. Watkins, I. Jimenez, P. Alvaro, S. Finkelstein, J. LeFevre, C. Maltzahn,
"Malacology: A Programmable Storage System",
EuroSys 2017.
-
J. LeFevre, R. Liu, C. Inigo, M. Castellanos, L. Paz, E. Ma, M. Hsu,
"Building the Enterprise Fabric for Big Data with Vertica and Spark",
SIGMOD 2016.
-
S. Prasad, A. Fard, V. Gupta, J. Martinez, J. LeFevre, V. Xu, M. Hsu, I. Roy,
"Large-scale Predictive Analytics in Vertica: Fast Data Transfer, Distributed Model Creation, and In-database Prediction",
SIGMOD 2015.
-
J. LeFevre, J. Sankaranarayanan, H. Hacigumus, J. Tatemura, N. Polyzotis, M.J. Carey,
"MISO: Souping Up Big Data Query Processing with a Multistore System",
SIGMOD 2014.
pdf.
-
J. LeFevre, J. Sankaranarayanan, H. Hacigumus, J. Tatemura, N. Polyzotis, M.J. Carey,
"Opportunistic Physical Design for Big Data Analytics",
SIGMOD 2014.
pdf.
-
J. LeFevre, J. Sankaranarayanan, H. Hacigumus, J. Tatemura, N. Polyzotis,
"Towards a Workload for Evolutionary Analytics",
2nd Workshop on Data Analytics in the Cloud (at SIGMOD 2013).
(extended version).
-
H. Hacigumus, J. Sankaranarayanan, J. Tatemura, J. LeFevre, N. Polyzotis,
"Odyssey: a multistore system for evolutionary analytics",
VLDB 2013.
-
M.P. Consens, K. Ioannidou, J. LeFevre, N. Polyzotis,
"Divergent Physical Design Tuning for Replicated Databases",
SIGMOD 2012. pdf.
-
I. Jimenez, J. LeFevre, N. Polyzotis, H. Sanchez, K. Schnaitter,
"Benchmarking Online Index-Tuning Algorithms",
IEEE Data Engineering Bulletin 34(4) 2011.
-
J. Buck, N. Watkins, J. LeFevre, K. Ioannidou, C. Maltzahn, N. Polyzotis, S. Brandt,
"SciHadoop: Array-based Query Processing in Hadoop",
SC 2011.
-
D. Kephart, J. LeFevre,
"CodeGen: The Generation and Testing of DNA Code Words",
IEEE Evolutionary Computation 2004.
pdf.
Full list of publications available on Google Scholar.
Courses Taught
- CSE 280S Graduate Seminar on Computer Systems
Fall 2022
- CSE 280S Graduate Seminar on Computer Systems
Winter 2020
- CMPS 280S Graduate Systems Research Seminar
Fall 2019
- CMPS 181 Database Systems II
Winter 2014
Other Teaching
- Guest Lecturer:
- Teaching Assistant:
- CMPS 111 Operating Systems (ugrad),
Fall 2008
Graduate students supervised
Professional Service: External Student Mentoring
- Faculty Mentor: Google Summer of Code
- 2021
(CROSS)
Yash Jipkate, Indian Institute of Technology, Varanasi, India
- 2020
(HSF/CERN)
Aditi Gupta, National Institute of Technology, Surathkal, India
- 2019
(CROSS)
Ashay Shirwadkar, University of California, Irvine, California, USA
- Faculty Mentor: IRIS-HEP Summer Fellowship
- 2021
Jayjeet Chakraborty, National Institute of Technology, Durgapur, India
- 2020
Xiongfeng Song, Rice University, Texas, USA
Professional Service: Invited Peer Reviewer
Patents Granted
- 08-31-2021 US11106672B2 "Queries based on ranges of hash values."
- 02-02-2021 US10909119B2 "Accessing electronic databases."
- 12-10-2019 US10503718B2 "Parallel transfers of electronic data."
- 02-14-2017 US9569491B2 "Multistore online tuning system."
- 10-25-2016 US9477708B2 "System for multistore execution environments with storage constraints."
- 11-10-2015 US9183253B2 "System for evolutionary analytics."