Michał Dereziński

Email: mderezin at berkeley edu
Website: http://users.soe.ucsc.edu/~mderezin

I am a research fellow at the Simons Institute for the Theory of Computing (Fall 2018, Foundations of Data Science program). I am also a postdoc at the Foundations of Data Analysis (FODA) Institute, UC Berkeley.

I obtained my Ph.D. in Computer Science at the University of California, Santa Cruz, advised by Professor Manfred Warmuth. In my research, I develop efficient data sampling techniques with applications to learning theory and optimization. Prior to UCSC, I completed Master's degrees in mathematics and computer science at the University of Warsaw. I also interned at a variety of Silicon Valley research labs (e.g., Microsoft, Yahoo, eBay), working on projects ranging from online learning to large-scale distributed optimization.

Education and Research


Ph.D. in Computer Science, University of California, Santa Cruz.
Thesis: Volume Sampling for Linear Regression  PDF

M.S. in Computer Science, University of Warsaw.
Thesis: On Generating Concept Hierarchies with Fuzzy Data  PDF

M.S. in Mathematics, University of Warsaw.
Thesis: Isomorphic Properties of Function Space BV on Simply Connected Planar Sets  PDF

Leveraged volume sampling for linear regression
M. Dereziński, M. Warmuth, D. Hsu
2018  arXiv
Proposed a new variant of volume sampling, which produces the first unbiased least squares estimator with strong loss bounds for linear regression. Also proposed a faster joint sampling strategy called determinantal rejection sampling, and developed new techniques for proving tail bounds on sums of dependent random matrices that arise from volume sampling.

Reverse iterative volume sampling for linear regression
M. Dereziński, M. Warmuth
JMLR 2018  arXiv
Introduced the reverse iterative volume sampling framework, used to develop new expectation formulas (e.g. unbiasedness of volume-sampled least squares estimator), a regularized extension of volume sampling with statistical guarantees, and state-of-the-art sampling algorithms.

Subsampling for Ridge Regression via Regularized Volume Sampling
M. Dereziński, M. Warmuth
AISTATS 2018  arXiv  Poster  Talk
Proposed a new efficient procedure for solving a ridge regression task over a large unlabeled dataset, when computing labels is expensive. This method, called regularized volume sampling, is fast, easy to implement, and offers strong statistical guarantees.

Batch-Expansion Training: An Efficient Optimization Framework
M. Dereziński, D. Mahajan, S. S. Keerthi, S.V.N. Vishwanathan, M. Weimer
AISTATS 2018  arXiv  Poster  Talk
Proposed a parameter-free batch optimization method running on a gradually expanding dataset, which outperforms standard batch and stochastic approaches in large-scale settings.

Discovering Surprising Documents with Context-Aware Word Representations
M. Dereziński, K. Rohanimanesh, A. Hydrie
IUI 2018  PDF  Talk
Developed an information-theoretic model for discovering surprise in text, which outperforms deep learning methods on a practical task.

Unbiased estimates for linear regression via volume sampling
M. Dereziński, M. Warmuth
NIPS 2017  arXiv  Poster  Spotlight
Showed that volume sampling yields an unbiased estimate of the least squares solution computed from a subset of labels, and that this estimate is close to the optimal solution up to a multiplicative factor. Also proposed an efficient algorithm for volume sampling.
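The unbiasedness claim can be checked numerically. The sketch below (an illustrative toy, not code from the paper; all variable names are my own) enumerates every size-d subset of a small design matrix, weights each subset S proportionally to det(X_S)^2 as in volume sampling, and verifies that the probability-weighted average of the subset least-squares solutions matches the full least-squares solution:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 2
X = rng.standard_normal((n, d))  # toy design matrix
y = rng.standard_normal(n)       # toy labels

# Volume sampling over size-d subsets: P(S) proportional to det(X_S)^2
subsets = [list(S) for S in itertools.combinations(range(n), d)]
weights = np.array([np.linalg.det(X[S]) ** 2 for S in subsets])
probs = weights / weights.sum()

# Expectation of the subset estimator: sum_S P(S) * X_S^{-1} y_S
expected = sum(p * np.linalg.solve(X[S], y[S])
               for S, p in zip(subsets, probs))

# Full least-squares solution on all n labels
w_star = np.linalg.lstsq(X, y, rcond=None)[0]

# Unbiasedness: the expectation equals the full solution
print(np.allclose(expected, w_star))
```

Enumerating all subsets is exponential in n and only feasible for tiny examples; the point of the paper's algorithmic contribution is to sample from this distribution efficiently without such enumeration.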

Anticipating Concept Drift in Online Learning
M. Dereziński, B. N. Bhaskar
LFED Workshop at NIPS 2015  PDF
Developed and analyzed online optimization algorithms that dynamically adapt to concept drift.

The limits of squared Euclidean distance regularization
M. Dereziński, M. Warmuth
NIPS 2014  PDF  Spotlight
Proved a lower bound for the loss of learning algorithms that regularize with squared Euclidean distance, for a certain family of learning problems. Proposed and experimentally tested a conjecture extending the result to deep learning.

Active Semi-Supervised Concept Hierarchy Refinement
M. Dereziński
LAWS Workshop 2012  PDF
Developed and tested a new algorithm for refining concept hierarchies generated from unsupervised data, by actively querying a human expert.


Research Intern, Microsoft Research, Sunnyvale, CA
Developed a new optimization framework combining the benefits of stochastic and batch methods.

Intern Research Scientist, Yahoo Inc., Sunnyvale, CA
Developed optimization algorithms for recommendation systems that dynamically adapt to a changing environment.

Intern Research Scientist, eBay Inc., San Jose, CA
Developed and implemented an unsupervised recommendation system based on topic modeling for finding interesting products.

Software Engineer Intern, Nvidia, Santa Clara, CA

Intern, Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw, Poland.
Researched algorithms related to concurrent programming.