Guest Speakers

 

11/4/13    Sasha Ames, Postdoctoral Researcher, LLNL


Design and Optimization of a Metagenomics Analysis Workflow for NVRAM


Abstract: Metagenomic analysis, the study of microbial communities found in environmental samples, presents considerable challenges in both data volume and computational cost. We present a novel metagenomic analysis pipeline that leverages emerging large-address-space compute nodes with NVRAM to hold a searchable, memory-mapped “k-mer” database of all known genomes and their taxonomic lineage. We describe the challenges of creating databases that are many hundreds of gigabytes in size, along with the database organization optimizations that enable our Livermore Metagenomic Analysis Toolkit (LMAT) software to query the k-mer key-value store, which resides in high-performance flash storage, as if it were fully in memory.
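As a rough illustration of the k-mer key-value lookup described in the abstract, the sketch below splits a read into k-mers and tallies the taxa they map to. It is not LMAT's actual database layout or scoring method: the function names, the toy database, and the use of a plain Python dict in place of a memory-mapped store are all assumptions made for illustration.

```python
from collections import Counter


def kmers(sequence, k):
    """Return every overlapping substring of length k in the sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def classify_read(read, kmer_db, k):
    """Count how many k-mers of the read map to each taxon.

    kmer_db is any mapping from k-mer string to taxon label. A dict is
    used here as a stand-in (assumption) for a memory-mapped
    key-value store held in NVRAM or flash.
    """
    votes = Counter()
    for kmer in kmers(read, k):
        taxon = kmer_db.get(kmer)
        if taxon is not None:
            votes[taxon] += 1
    return votes


# Hypothetical toy database with k = 4.
toy_db = {"ACGT": "taxon_A", "CGTA": "taxon_A", "TTTT": "taxon_B"}
votes = classify_read("ACGTA", toy_db, k=4)
```

In this toy case both k-mers of the read ("ACGT" and "CGTA") map to the same hypothetical taxon, so that taxon collects all the votes.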



11/22/10    K. John Wu, Staff Computer Scientist, LBL


FastBit Indexing and Scientific Discoveries with Exabytes


Abstract: As computers become more integrated into research tools, many scientific studies are becoming inundated with data.  For example, high-energy physics experiments, astronomical observations, and climate simulations are all producing many petabytes of data, and they are expected to produce exabytes of data in the next decade.  Effective data management is essential for making scientific discoveries in these fields.  In this presentation, we will discuss a set of efficient indexing techniques in the FastBit software, and describe how they are used in a number of different scientific and commercial applications.  We will also provide a broad overview of the activities in the SciDAC Scientific Data Management Center.


Bio: Dr. Wu has worked on a broad range of topics in scientific data management and scientific computing.  His current research primarily focuses on distributed data analysis and data management.  He has developed a number of bitmap indexing techniques for accelerating searches on large datasets.  He has also developed a number of open-source software packages, including FastBit for indexing large structured datasets and TRLan for computing eigenvalues.  The FastBit software has received an R&D 100 Award and is used by a number of organizations, including Yahoo!, to efficiently search terabytes of data.  He holds a Ph.D. in computer science from the University of Minnesota and is a member of ACM and IEEE.



9/27/10    Manfred Warmuth, Professor of Computer Science, UCSC


Designing Adaptive Online Algorithms by Maintaining a Mixture over a Set of Experts


Abstract: I will discuss how to measure the online nature of a data stream and how to exploit this onlineness with an adaptive learning algorithm. This algorithm maintains a mixture over many specialized algorithms (often called experts), each of which is good for certain types of data streams. The "master algorithm" adaptively gives high weight to the experts that perform well on the current data stream. We discuss these concepts in two domains: online caching and adaptive algorithms for spinning down the disk on a laptop.
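The idea of a master algorithm that maintains a mixture over experts can be sketched with a generic exponential-weights update, shown below. This is not necessarily the algorithm presented in the talk: the function names, the learning rate `eta`, and the loss stream are hypothetical choices made for illustration.

```python
import math


def update_mixture(weights, losses, eta=0.5):
    """Apply one multiplicative update: shrink each expert's weight
    exponentially in its loss, then renormalize to sum to 1."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def run_master(loss_rounds, eta=0.5):
    """Start from a uniform mixture and update it once per round."""
    n = len(loss_rounds[0])
    weights = [1.0 / n] * n
    for losses in loss_rounds:
        weights = update_mixture(weights, losses, eta)
    return weights


# Hypothetical stream: expert 0 is good for 3 rounds, then expert 1
# is good for 6 rounds (loss 0.0 means good, 1.0 means bad).
rounds = [[0.0, 1.0]] * 3 + [[1.0, 0.0]] * 6
final_weights = run_master(rounds)
```

After the stream shifts, the mixture moves most of its weight to the expert that suits the later data, which is the adaptive behavior the abstract describes.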


Bio: Manfred Warmuth is a researcher and professor at the University of California, Santa Cruz. His main research interest is computational learning theory with a special focus on online learning algorithms. Manfred is known for the Weighted Majority Algorithm (and for his other interests).