Jeff LeFevre, PhD
Adjunct Professor
Department of Computer Science and Engineering
University of California, Santa Cruz
Office: Engineering 2, room 541A
jlefevre@ucsc.edu
I am currently on leave (Spring 2023). My research interests are in cloud databases, database physical design, and storage systems.
I lead the
Skyhook Data Management
project as part of the
Center for Research on Open Source Software
at UC Santa Cruz, where I was awarded a CROSS Incubator Fellowship.
SkyhookDM takes a 'programmable storage' approach that extends the open source software Apache Arrow and Ceph distributed object storage
toward in-storage data processing and management through Ceph's built-in extensions framework ('cls').
Our extensions embed Arrow libraries within Ceph objects, enabling data processing functions as well
as physical design manipulations of local object data such as data layouts or indexing.
I am pleased to announce that
Skyhook was merged into Apache Arrow mainline in October 2021!
Please see our announcements page for the latest news.
Through the CROSS and
CERN/HSF
organizations, the Skyhook project has earned several spots in
Google Summer of Code (GSoC) for which I was a mentor:
2019,
2020,
2021.
Through the
Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP),
Skyhook has also been awarded three fellowships for which I was a mentor:
2020,
2021,
2021.
I received my PhD in 2014 from
the UC Santa Cruz Database group.
My PhD advisor was
Neoklis Polyzotis.
I subsequently joined
HP Vertica database R&D group
at Hewlett Packard in Palo Alto, where I worked on integrating Vertica with external analytics engines such as Distributed-R and Apache Spark.
During graduate school I spent several summers at
Education
- B.S. Computer Science and Engineering, University of South Florida, 2004. Thesis: "The generation and testing of DNA codewords."
- M.S. Computer Science and Engineering, University of California San Diego, 2009. Thesis: "Improving disk array performance and reliability".
- Ph.D. Computer Science and Engineering, University of California Santa Cruz, 2014. Thesis:
"Physical design tuning methods for emerging system architectures".
Selected Publications
- A. Montana, Y. Xue, J. LeFevre, C. Maltzahn, J. Stuart, P. Kufeldt, P. Alvaro,
"A Moveable Beast: Partitioning Data and Compute for Computational Storage",
(pre-print) 2023.
- J. Chakraborty, I. Jimenez, S. A. Rodriguez, A. Uta, J. LeFevre, C. Maltzahn,
"Skyhook: Towards an Arrow-Native Storage System",
CCGrid 2022
(pdf).
- S.A. Rodriguez, J. Chakraborty, A. Chu, I. Jimenez, J. LeFevre, C. Maltzahn, A. Uta,
"Zero-Cost, Arrow-Enabled Data Interface for Apache Spark",
IEEE Big Data 2021.
- J. LeFevre, C. Maltzahn,
"Scaling Databases and File APIs with Programmable Ceph Object Storage",
Vault 2020.
- K. Dahlgren, J. LeFevre, A. Shirwadkar, K. Iizawa, A. Montana, P. Alvaro, C. Maltzahn,
"Towards Physical Design Management in Storage Systems",
IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW) 2019.
- J. LeFevre, N. Watkins, M. Sevilla, C. Maltzahn,
"Skyhook: programmable storage for databases",
Vault 2019.
- M. Sevilla, R. Nasirigerdeh, C. Maltzahn, J. LeFevre, N. Watkins, P. Alvaro, M. Lawson, J. Lofstead, J. Pivarski,
"Tintenfisch: File System Namespace Schemas and Generators",
HotStorage 2018.
- M. Sevilla, N. Watkins, I. Jimenez, P. Alvaro, S. Finkelstein, J. LeFevre, C. Maltzahn,
"Malacology: A Programmable Storage System",
EuroSys 2017.
- J. LeFevre, R. Liu, C. Inigo, M. Castellanos, L. Paz, E. Ma, M. Hsu,
"Building the Enterprise Fabric for Big Data with Vertica and Spark",
SIGMOD 2016.
- S. Prasad, A. Fard, V. Gupta, J. Martinez, J. LeFevre, V. Xu, M. Hsu, I. Roy,
"Large-scale Predictive Analytics in Vertica: Fast Data Transfer, Distributed Model Creation, and In-database Prediction",
SIGMOD 2015.
- J. LeFevre, J. Sankaranarayanan, H. Hacigumus, J. Tatemura, N. Polyzotis, M.J. Carey,
"MISO: Souping Up Big Data Query Processing with a Multistore System",
SIGMOD 2014.
pdf.
- J. LeFevre, J. Sankaranarayanan, H. Hacigumus, J. Tatemura, N. Polyzotis, M.J. Carey,
"Opportunistic Physical Design for Big Data Analytics",
SIGMOD 2014.
pdf.
- J. LeFevre, J. Sankaranarayanan, H. Hacigumus, J. Tatemura, N. Polyzotis,
"Towards a Workload for Evolutionary Analytics",
2nd Workshop on Data Analytics in the Cloud, (at SIGMOD 2013).
(extended version).
- H. Hacigumus, J. Sankaranarayanan, J. Tatemura, J. LeFevre, N. Polyzotis,
"Odyssey: a multistore system for evolutionary analytics",
VLDB 2013.
- M.P. Consens, K. Ioannidou, J. LeFevre, N. Polyzotis,
"Divergent Physical Design Tuning for Replicated Databases",
SIGMOD 2012. pdf.
- I. Jimenez, J. LeFevre, N. Polyzotis, H. Sanchez, K. Schnaitter,
"Benchmarking Online Index-Tuning Algorithms",
IEEE Data Engineering Bulletin 34(4) 2011.
- J. Buck, N. Watkins, J. LeFevre, K. Ioannidou, C. Maltzahn, N. Polyzotis, S. Brandt,
"SciHadoop: Array-based Query Processing in Hadoop",
SC 2011.
- D. Kephart, J. LeFevre,
"CodeGen: The Generation and Testing of DNA Code Words",
IEEE Evolutionary Computation 2004.
pdf.
Google Scholar profile
Courses Taught
- CSE 280S Seminar on Computer Systems (grad),
Fall 2022
- CSE 280S Seminar on Computer Systems (grad),
Winter 2020
- CMPS 280S Systems Research Seminar (grad),
Fall 2019
- CMPS 181 Database Systems II (ugrad),
Winter 2014
Other Teaching
- Guest Lecturer:
- Teaching Assistant:
- CMPS 111 Operating Systems (ugrad),
Fall 2008.
Professional Service, Mentoring
- Faculty Mentor: Google Summer of Code
Professional Service, Peer Review
Patents Granted
- 11-10-2015 US9183253B2 "System for evolutionary analytics."
- 10-25-2016 US9477708B2 "System for multistore execution environments with storage constraints."
- 02-14-2017 US9569491B2 "Multistore online tuning system."
- 12-10-2019 US10503718B2 "Parallel transfers of electronic data."
- 02-02-2021 US10909119B2 "Accessing electronic databases."
- 08-31-2021 US11106672B2 "Queries based on ranges of hash values."