Research Statement September, 2009

My research involves solving hard problems in computer systems. I believe in, and have successfully applied, three principles. First, I look for solutions in multiple disciplines. Each discipline offers its own language and concepts. The exercise of translating and applying “foreign” concepts to a problem allows me to view the problem from different perspectives and has led me to adapt solutions from other disciplines in novel ways. In particular, in the publications [8, 11, 15] I combined approaches from linguistics, cognitive science, software engineering, machine learning, networking, and storage systems. Second, whenever possible I use real workloads in addition to synthetic workloads for systems performance evaluations. This principle has turned out to be particularly valuable in large, networked systems, where real workloads frequently uncover dynamics that have important implications for systems design (see [13] and [14]). Third, I build functioning prototypes of systems. Especially if the prototype consists of subsystems that have only been evaluated by simulation, a functioning prototype frequently uncovers gaps and problems that are often of fundamental interest (see [11, 20]). The following gives a more detailed overview of my research, organized by main topic in chronological order.

Formalized Sharing Processes: Taming Large Software Projects (1992-1997)

Managing large software projects and maintaining predictable schedules are hard [5]. Project management techniques used for, say, building bridges do not work for building software. A particularly challenging aspect of software projects is communication and coordination among team members. I am interested in whether and how computer systems can support communication and coordination in this context. (More ...)

Performance Management in Distributed Systems: Workload Transformation and System Insulation (1997-present)

Web proxies are intermediaries between Web clients and servers. A proxy forwards requests from Web clients to Web servers, returns the replies to the clients, and also caches the replies in the hope that future requests can be satisfied from the cache without having to contact the Web servers. The two main uses of Web proxies are (1) to reduce bandwidth consumption and latency by reducing the number of requests over expensive or slow links, and (2) to reduce the load on Web servers. Caching Web proxies are surprisingly hard to get right: the main challenges are the interaction of the caching subsystem with the underlying storage, the management of a large number of open network connections, and isolation from a variety of unreliable network services. (More ...)
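
The basic forward-and-cache behavior described above can be illustrated with a minimal sketch. It is a toy, not the design of any system from this research: the class name `CachingProxy`, the `fetch_from_origin` callback, and the hit and miss counters are hypothetical, the cache lives in memory, and expiry, validation, storage, and connection management (the hard parts named above) are deliberately left out.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy in-memory caching intermediary (hypothetical names, no expiry)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # fetch_from_origin stands in for contacting the Web server.
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}  # URL -> cached reply body
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        """Return the reply for url, contacting the origin only on a miss."""
        if url in self._cache:
            self.hits += 1
            return self._cache[url]
        self.misses += 1
        body = self._fetch(url)   # forward the request to the server
        self._cache[url] = body   # keep a copy for future requests
        return body
```

A repeated request for the same URL is answered from the cache, so the origin is contacted once per distinct URL in this simplified setting.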

Petabyte-scale Storage Systems: Generative Allocation and Intelligent Devices (2005-present)

Storage systems have to keep track of where data is stored. Traditional file systems use allocation tables to look up the location of a file. In very large-scale storage clusters, managing this bookkeeping information becomes the bottleneck for system performance and scalability. Generative allocation calculates the location of data instead of storing and managing large allocation tables. This allows much more efficient communication of system state and enables novel distributed architectures where, for example, load balancing and failure recovery are the result of collaborating, intelligent storage devices instead of a centralized system component. (More ...)
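
The idea of calculating a location rather than looking it up can be sketched as follows. This is only an illustration under stated assumptions: it uses rendezvous (highest-random-weight) hashing as a stand-in and is not the allocation algorithm developed in this research. The function names `locate` and `_score`, the device names, and the default replica count are hypothetical.

```python
import hashlib
from typing import List, Sequence


def _score(object_id: str, device: str) -> int:
    """Pseudo-random but repeatable weight for an (object, device) pair."""
    digest = hashlib.sha256(f"{object_id}|{device}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def locate(object_id: str, devices: Sequence[str], replicas: int = 2) -> List[str]:
    """Compute which devices hold object_id; no allocation table is consulted.

    Assumption: every participant knows the same list of device names.
    """
    if replicas > len(devices):
        raise ValueError("not enough devices for the requested replica count")
    ranked = sorted(devices, key=lambda d: _score(object_id, d), reverse=True)
    return ranked[:replicas]
```

Any client or device that knows the device list computes the same placement on its own, so the only shared state in this sketch is the list itself. Removing a device that does not hold a given object leaves that object's placement unchanged.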

Scalable File System Interfaces: Overcoming a 20-year-old Legacy (2007-present)

Storage systems design is currently in a state of crisis, as the exponential increase in storage needs rapidly outgrows the system interfaces that were standardized over 20 years ago. Some of today’s large file systems store tens to hundreds of petabytes of enterprise-related or scientific data. These file systems contain many thousands of files per directory or have deep directory trees with large numbers of excessively long paths. Almost all petabyte-scale file systems use the POSIX file system interface, a standard from 1988 that emerged from a project that began in 1985. This standard was designed in the context of file systems that were orders of magnitude smaller in terms of total size, number of files per directory, and size of directory trees. In particular, POSIX does not offer any query mechanism other than listing the contents of a given directory. (More ...)
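
To make the consequence of this limitation concrete, here is a minimal sketch of what a simple query looks like when directory listing is the only primitive. The function name `find_files_at_least` and the size-based query are hypothetical examples. The point is that the code has to visit every directory under the root, so its cost grows with the size of the tree rather than with the number of matches.

```python
import os
from typing import Iterator, List


def find_files_at_least(root: str, min_bytes: int) -> Iterator[str]:
    """Yield paths of regular files under root whose size is >= min_bytes.

    Only directory listing and per-file stat are used, so every
    directory in the tree is read once.
    """
    pending: List[str] = [root]
    while pending:
        directory = pending.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)  # descend later
                elif entry.is_file(follow_symlinks=False):
                    if entry.stat(follow_symlinks=False).st_size >= min_bytes:
                        yield entry.path
```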

REFERENCES