Guest Speakers

11/30 James P. Ahrens, Visualization Team Leader, LANL

Data-Intensive Analysis and Visualization on Numerically-Intensive Supercomputers

Abstract: With the advent of the era of petascale supercomputing, via the delivery of the Roadrunner supercomputing platform at Los Alamos National Laboratory, there is a pressing need to address the problem of visualizing massive petascale-sized results. In this presentation, I discuss progress on a number of approaches including in-situ analysis, multi-resolution out-of-core streaming and interactive rendering on the supercomputing platform. These approaches are placed in context by the emerging area of data-intensive supercomputing.

Bio: James Ahrens received his Ph.D. in Computer Science from the University of Washington. His dissertation was on a high-performance scientific visualization and experiment management system. After graduation he joined Los Alamos National Laboratory as a staff member working for the Advanced Computing Laboratory (ACL). He is currently the visualization team leader in the ACL. His research interests include methods for visualizing extremely large scientific datasets, distance visualization and quantitative/comparative visualization.

11/22 K. John Wu, Staff Computer Scientist, Berkeley Lab

FastBit Indexing and Scientific Discoveries with Exabytes

Abstract: As computers become more integrated into research tools, many scientific studies are becoming inundated with data. For example, high-energy physics experiments, astronomical observations, and climate simulations are all producing many petabytes of data, and they are expected to produce exabytes of data in the next decade. Effective data management is essential for making scientific discoveries in these fields. In this presentation, we will discuss a set of efficient indexing techniques in the FastBit software, and describe how they are used in a number of different scientific and commercial applications. We will also provide a broad overview of the activities in the SciDAC Scientific Data Management Center. (Slides)

Bio: Dr. Wu has worked on a broad range of topics in scientific data management and scientific computing. His current research primarily focuses on distributed data analysis and data management. He has developed a number of bitmap indexing techniques for accelerating searches on large datasets. He has also developed a number of open-source software packages including FastBit for indexing large structured datasets and TRLan for computing eigenvalues. The FastBit software has received an R&D 100 Award and is used by a number of organizations including Yahoo! to efficiently search terabytes of data. He holds a Ph.D. in computer science from the University of Minnesota and is a member of ACM and IEEE.

11/2 Jacek Becla, Information Systems Specialist, SLAC

Real life data intensive applications - challenges and solutions

Abstract: Very few have experienced the petascale reality, but soon everybody will. Because there are no clear solutions or standards, it is crucial to understand the current best practices. This talk will cover emerging trends that are practically essential for petascale computing, such as pushing computation to data, distributing data horizontally, decentralization, uninterrupted operation under faults and full automation. It will discuss the challenges, today's practices and solutions applicable to data-intensive scientific analytics, with a focus on real-life examples from astronomy, high energy physics and other fields. (Slides)

Bio: Jacek Becla has spent over ten years working with different scientific communities ranging from high energy physics, through astronomy to photon sciences, helping them use database technology for managing and analyzing their massive data sets. He was one of the key people who designed and built the world's largest database for BaBar, and he now leads the design of the 100 petabyte database for the next generation astronomical survey, LSST. Prior to joining SLAC National Accelerator Laboratory / Stanford University in 1997, he worked at CERN in Geneva, Switzerland on researching database technologies for the LHC experiment.

Jacek is very active in trying to bridge the gap between science and industry. He initiated a series of Extremely Large Databases (XLDB) workshops to stimulate collaboration between scientific and industrial users, database vendors and academia. He has authored many papers, mainly on managing large scientific data sets, and has served on several review committees for large database and IT projects. Jacek received an M.Sc. in Electronic Engineering from the University of Science and Technology in Krakow, Poland.

9/30 Gary Grider, Deputy Division Leader, Los Alamos National Lab

Exa‐Scale FSIO - Can we get there? Can we afford to?

Abstract: This talk will describe the anticipated DOE Exascale initiative, a prospective extreme-scale supercomputing program being formulated by the DOE Office of Science and DOE NNSA. Motivations for the program, as well as how the program may proceed, will be presented. Anticipated Exascale machine dimensions will be provided as well. The costs of providing scalable file systems and I/O for these future very large supercomputers will be examined in detail. (Slides)

Bio: Gary is currently the Deputy Division Leader of the High Performance Computing (HPC) Division at Los Alamos National Laboratory, where he is responsible for managing the personnel and processes required to stand up and operate major supercomputing systems, networks, and storage systems for the Laboratory for both the DOE/NNSA Advanced Simulation and Computing (ASC) program and LANL institutional HPC environments. One of his main tasks is conducting and sponsoring R&D to keep the new technology pipeline full and provide solutions to problems in the Lab’s HPC environment. Gary is also the LANL lead in coordinating DOE/NNSA alliances with universities in the HPC I/O and file systems area. He is one of the principal leaders of a small group of multi-agency HPC I/O experts that guides the government in its I/O-related computer science R&D investments through the High End Computing Interagency Working Group (HECIWG), and is the Director of the Los Alamos/UCSC Institute for Scientific Scalable Data Management and the Los Alamos/CMU Institute for Reliable High Performance Information Technology. He is also the LANL PI for the Petascale Data Storage Institute, a SciDAC2 Institute award-winning project. Before working for Los Alamos, Gary spent 10 years with IBM at Los Alamos working on advanced product development and test and 5 years with Sandia National Laboratories working on HPC storage systems.

Gary holds a B.S. in Electrical Engineering along with a registration for certified engineer from Oklahoma State University and the State of Oklahoma. He also received an M.B.A. with emphasis in Management Information Systems, Statistics, Physics, and Mathematics from Oklahoma State University. By far the bulk of Gary's knowledge comes from daily hands-on research, design, prototyping, development and testing of new systems and hardware, and from mentoring people and projects within the high performance storage and network areas.