Guest Speakers

6 Feb Valerie Henson, Linux Kernel Developer, Intel

Repair-driven File System Design

Abstract: Traditionally, file systems have been designed with only on-line performance in mind. The performance and reliability of full file system check and repair has been an afterthought. Unfortunately, disk hardware trends over the next 5-10 years will increase fsck time by a factor of about 10, leading to unacceptable file system downtime. We will discuss repair-driven file system design and how it interacts with distributed file system design.

Bio: Val Henson is a Linux kernel developer for the Open Source Technology Center at Intel. She was one of the key architects of Sun's ZFS file system and is currently involved in a variety of Linux file system projects, including the design and development of chunkfs, a repair-driven file system for Linux. In her spare time, she maintains the TCP/IP Drinking Game and enjoys hiking.

13 Feb Paul Massiglia, Chief Technology Strategist, agami Systems

Five Enterprise Storage Shibboleths You Can Make Obsolete

Abstract: As digital information technology specialties go, you've chosen a pretty mature one. Data storage has been around practically since there have been computers, and it is host to some of the most thoroughly-developed technologies, standards, product concepts, and ways of doing things that exist in computing. As researchers, you basically come up with new ideas and techniques. Is there any hope of new and better ideas being incorporated into mainstream data storage technology, or are you who invent new data storage techniques inevitably destined for careers in demonstrating hypotheses? This talk presents one person's view of areas of enterprise data storage technology that may be ripe for innovation; areas in which you, as a well-grounded technologist, may have the opportunity to have a real impact on the way things are done in the data storage world. The talk doesn't give any answers; it poses questions. Your mission, should you choose to accept it (remember the pre-Tom Cruise Mission Impossible, anyone?), is to come up with the answers.

Bio: Paul Massiglia has been associated with the data storage industry for over 25 years. He has held engineering and marketing positions with major storage hardware and software suppliers, including the former Digital Equipment Corporation, Adaptec Inc., Quantum Corporation, and VERITAS (Symantec), where he created and directed VERITAS Publishing. He has been vice-chairman of the RAID Advisory Board, a Board of Directors member of the SNIA, and a member of other industry organizations including the Fibre Channel Association, the Fibre Channel Loop Community, and the SCSI Trade Association. As Chief Technology Strategist for agámi Systems, he advises the company on technology issues, writes technical white papers, and consults with agámi customers on technology issues. The author or editor of thirteen books on data storage-related topics, Mr. Massiglia is also a frequent participant in industry conferences.

Mr. Massiglia is the author of The RAIDbook, The Digital Large System Mass Storage Handbook, Managing Online Volumes in Windows Operating Systems, Highly Available Storage for Windows Servers, Virtual Storage Redefined, Standardizing Storage Management, Using Local Copy Services, and Using Dynamic Storage Tiering, as well as co-author (with Richard Barker) of Storage Area Network Essentials. He is also co-editor (with Evan Marcus) of The Resilient Enterprise, a book on protecting against and recovering from information technology disasters, and of From Cost Center to Value Center: Making the Move to Utility Computing.

16 Feb Mike Eisler, Technical Director, NetApp

Data ONTAP GX: A Scalable Storage Cluster

Abstract: Data ONTAP GX is a clustered Network Attached File server composed of a number of cooperating filers. Each filer manages its own local file system, which consists of a number of disconnected flexible volumes. A separate namespace infrastructure runs within the cluster, which connects the volumes into one or more namespaces by means of internal junctions. The cluster collectively exposes a potentially large number of separate virtual servers, each with its own independent namespace, security and administrative domain. The cluster implements a protocol routing and translation layer which translates requests in all incoming file protocols into a single unified internal file access protocol called SpinNP. The translated requests are then forwarded to the correct filer within the cluster for servicing by the local file system instance. This provides data location transparency, which is used to support transparent data migration, load balancing, mirroring for load sharing and data protection, and fault tolerance. The cluster itself greatly simplifies the administration of a large number of filers by consolidating them into a single system image. Results from benchmarks (over one million file operations per second on a 24 node cluster) and customer experience demonstrate linear scaling.

Bio: Mike Eisler graduated from the University of Central Florida with a master's degree in computer science in 1985. His first exposure to NFS and NIS came while working for Lachman Associates, Inc., where he was responsible for porting NFS and NIS to System V platforms. He later joined Sun Microsystems, Inc., responsible for projects such as NFS server performance, NFS/TCP, WebNFS, NFS secured with Kerberos V5, NFS Version 4, and JavaCard security. Mike has authored or coauthored several Request For Comments documents for the Internet Engineering Task Force, relating to NFS and security. He is currently a Technical Director at Network Appliance's NAS business unit.

27 Feb Gary Grider, John Bent, James Nunez (LANL)

The Road to High Performance I/O and Storage Systems

Abstract: This talk provides an introduction to high performance computing (HPC) at the national labs. Emphasis is placed on the I/O and storage challenges of supporting widely scaled and highly synchronized parallel I/O. An example workload is characterized and several hardware and software approaches to these challenges are considered. Discussion and interruptions are encouraged!

Bios:

Gary Grider is the group leader of HPC-5, the High-Performance Systems Integration group in the HPC division at the Los Alamos National Laboratory (LANL). Gary is also the US-DOE/National Nuclear Security Administration I/O and Storage Coordinator and the High End Computing Inter-agency Working Group (HECIWG) File Systems and I/O Leader, which coordinates the US government investments in File Systems and I/O research and development. Additionally, Gary is the LANL director of the LANL/University of California Santa Cruz Institute for Scalable Scientific Data Management, an educational institute training advanced degree students on high performance computing storage topics. Prior to working for LANL, Gary spent 10 years working for IBM in the storage area and 5 years working at Sandia National Laboratories, also in the data storage field.

John Bent currently works on I/O and storage systems for the large parallel clusters at Los Alamos National Laboratory. John is also involved in various academic outreach programs at LANL such as the ISSDM and HECIWG. Prior to joining LANL in 2005, John researched data-driven batch scheduling systems for his dissertation work at Wisconsin.

James Nunez is the co-team leader for the Networking and Scalable I/O Team in the High Performance Computing (HPC) Systems Integration Group at Los Alamos National Lab. He is involved in and supports all aspects of File Systems and Storage for use by the Laboratory, including the ASC File System Path forward contract to create the highly scalable global parallel file system based on secure object device technology, Lustre, and the ASC Alliances with Universities to develop the first reference implementation of NFS version 4, develop middleware solutions for small unaligned writes, and investigate improved scalable metadata and security. James was an integral component in deploying the first truly global parallel file system shared in a scalable manner between multiple heterogeneous terascale clusters at the Lab. At Los Alamos, James is concerned with evaluating and benchmarking HPC file systems and I/O middleware solutions, and with understanding the I/O interface and application of new storage innovations to real science applications. He has also worked in the High End Computing Interagency Working Group on coordination of interagency I/O funding.

6 Mar Alan Rowe, Intransa

Lessons from Software Development and Deployment

Abstract: I will talk about case studies for various instances of interesting software failures, and what we can learn from them.

Bio: Alan Rowe started playing with computers in Scotland almost before they were invented. He worked at Plessey in England, Bell-Northern Research in Canada, Tandem in Cupertino on high-reliability systems, and Network Appliance in Sunnyvale on storage systems, before joining Intransa, a storage startup. He believes that computers should do what their users expect them to do, which makes him unique.

15 Mar Richard Hedges, Lawrence Livermore National Labs

I/O and Storage at Livermore Computing: a Guided Tour

Abstract: The Livermore Laboratory computer center contains the most powerful assemblage of scientific computing resources in the world. I/O and Storage requirements are likewise extreme. This lecture is designed to convey many of the practical implications of "The Road to High Performance I/O and Storage Systems" using our computer center as an example.

Bio: Richard Hedges is a computer scientist at the Lawrence Livermore National Laboratory. He began his career as a computational chemist (actually at this fine institution, Crown College '76) and earned a Ph.D. in Chemical Physics (Colorado, '83). From these early experiences in computation, his interest in high end computing evolved, leading him to positions with Cray Research, Fujitsu, and Silicon Graphics. He joined the Livermore Laboratory in 1998 to work on parallel I/O.