Mixing Hadoop and HPC Workloads on Parallel Filesystems

Abstract

Distributed filesystems built for MapReduce, such as HDFS for Hadoop MapReduce, and parallel filesystems built for high-performance computing (HPC) are designed for considerably different workloads. The purpose of our work is to examine the performance of each kind of filesystem when both sorts of workload run on it concurrently. We examine two workloads on two filesystems. For the HPC workload, we use the IOR checkpointing benchmark and the Parallel Virtual File System, Version 2 (PVFS); for Hadoop, we use an HTTP attack classifier and the CloudStore filesystem. We analyze the performance of each filesystem when it concurrently runs its "native" workload as well as the non-native workload.

Publication
Proceedings of the 2009 ACM Petascale Data Storage Workshop (PDSW 09)