datamanagement

Towards Physical Design Management in Storage Systems

In the post-Moore era, systems and devices with new architectures will arrive at a rapid rate with significant impacts on the software stack. Applications will not be able to fully benefit from new architectures unless they can delegate adapting to …

Cudele: An API and Framework for Programmable Consistency and Durability in a Global Namespace

HPC and data-center-scale application developers are abandoning POSIX IO because the file system metadata synchronization and serialization overheads of providing strong consistency and durability are too costly -- and often unnecessary -- for their …

Efficient, Failure Resilient Transactions for Parallel and Distributed Computing

Scientific simulations are moving away from using centralized persistent storage for intermediate data between workflow steps towards an all online model. This shift is motivated by the relatively slow IO bandwidth growth compared with compute speed …

Consistency and Fault Tolerance Considerations for the Next Iteration of the DOE Fast Forward Storage and IO Project

The DOE Extreme-Scale Technology Acceleration Fast Forward Storage and IO Stack project is going to have significant impact on storage systems design within and beyond the HPC community. With phase 1 of the project complete, it is an excellent …

Efficient Transactions for Parallel Data Movement

The rise of Integrated Application Workflows (IAWs) for processing data prior to storage on persistent media prompts the need to incorporate features that reproduce many of the semantics of persistent storage devices. One such feature is the ability …

Exploring Trade-offs in Transactional Parallel Data Movement

SIDR: Structure-Aware Intelligent Data Routing in Hadoop

The MapReduce framework is being extended for domains quite different from the web applications for which it was designed, including the processing of big structured data, e.g., scientific and financial data. Previous work using MapReduce to process …

DataMods: Programmable File System Services

As applications become more complex and the level of concurrency in systems continues to rise, developers are struggling to scale complex data models on top of a traditional byte stream interface. Middleware tailored for specific data models is a …

SciHadoop Semantic Compression

DataMods: Programmable File System Services

Cloud-based services have become an attractive alternative to in-house data centers because of their flexible, on-demand availability of compute and storage resources. This is also true for scientific high-performance computing (HPC) applications …