mapreduce

SupMR: Circumventing Disk and Memory Bandwidth Bottlenecks for Scale-up MapReduce

Reading input from primary storage (i.e. the ingest phase) and aggregating results (i.e. the merge phase) are important pre- and post-processing steps in large batch computations. Unfortunately, today's data sets are so large that the ingest and …

SIDR: Structure-Aware Intelligent Data Routing in Hadoop

The MapReduce framework is being extended for domains quite different from the web applications for which it was designed, including the processing of big structured data, e.g., scientific and financial data. Previous work using MapReduce to process …

Compressing Intermediate Keys between Mappers and Reducers in SciHadoop

In Hadoop, mappers send data to reducers in the form of key/value pairs. The default design of Hadoop's process for transmitting this intermediate data can cause a very high overhead, especially for scientific data containing multiple variables in a …

SciHadoop Semantic Compression

Structure-Aware Intelligent Data Routing in SciHadoop

SciHadoop: Array-based Query Processing in Hadoop

Hadoop has become the de facto platform for large-scale data analysis in commercial applications, and increasingly so in scientific applications. However, Hadoop's byte stream data model causes inefficiencies when used to process scientific data that …

Haceph: Scalable Metadata Management for Hadoop using Ceph

Ceph as a Scalable Alternative to the Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) has a single metadata server that sets a hard limit on its maximum size. Ceph, a high-performance distributed file system under development since 2005 and now supported in Linux, bypasses the scaling limits …

Mixing Hadoop and HPC Workloads on Parallel Filesystems

MapReduce-tailored distributed filesystems, such as HDFS for Hadoop MapReduce, and parallel high-performance computing filesystems are tailored for considerably different workloads. The purpose of our work is to examine the performance of each …