hpc

Structure-Aware Intelligent Data Routing in SciHadoop

On the Role of Burst Buffers in Leadership-class Storage Systems

The largest-scale high-performance (HPC) systems are stretching parallel file systems to their limits in terms of aggregate bandwidth and numbers of clients. To further sustain the scalability of these file systems, researchers and HPC storage …

SciHadoop: Array-based Query Processing in Hadoop

Hadoop has become the de facto platform for large-scale data analysis in commercial applications, and increasingly so in scientific applications. However, Hadoop's byte stream data model causes inefficiencies when used to process scientific data that …

Modeling a Leadership-scale Storage System

Exascale supercomputers will have the potential for billion-way parallelism. While physical implementations of these systems are currently not available, HPC system designers can develop models of exascale systems to evaluate system design points. …

Mixing Hadoop and HPC Workloads on Parallel Filesystems

MapReduce-tailored distributed filesystems---such as HDFS for Hadoop MapReduce---and parallel high-performance computing filesystems are tailored for considerably different workloads. The purpose of our work is to examine the performance of each …