Active Storage

Scott Brandt, Carlos Maltzahn

 

Overview

Computing systems and storage systems have traditionally been separated by the I/O subsystem interface.  With the development of network-attached storage, this serial I/O interface has become a significant bottleneck for high-performance parallel and distributed computers accessing information on high-performance parallel storage.


Modern object-based storage promises to break this paradigm.  In object-based storage, each node in a distributed storage system has a CPU, RAM, network interface, and disk(s). These nodes are capable of executing code, but no completely general model currently exists for safely and effectively executing application code directly on the nodes, although solutions such as Google's MapReduce, Hadoop, and Microsoft's Dryad get us partway there.


This class will focus on the research of the UCSC Systems Research Lab (and others) to break through the serial I/O bottleneck by executing general-purpose application code directly on the storage nodes.  We will begin with background reading in distributed computing and storage. We will then get current on active storage and existing active storage systems. Next, we will examine the technologies required to enable these capabilities, including programming models and storage system infrastructure such as virtual machines, performance virtualization, and related technologies.  Finally, we will focus on specific projects enabling active object-based storage in our Ceph distributed object-based storage system.


The class will consist of three parts:

  1. Weekly readings and class discussions on papers related to the class topic.

  2. An individual or group project. We will develop a number of specific project ideas as part of the class and everyone will be expected to implement one of these ideas, either individually or as part of a group.

  3. A final report.  Everyone will be expected to turn in a project writeup similar to the conference papers we will be reading in class.


Note: The instructors have some limited research funding to support the research project that forms the basis for this class and are looking to hire 1-2 students to work on it. If you are interested, please let us know.


Prerequisites: You are expected to have basic operating systems knowledge, as presented in a standard undergraduate course such as CMPS 111. Furthermore, you are expected to have taken CMPS 221, Advanced Operating Systems. Others will be admitted with the instructors' permission, based upon a demonstrated systems background and the sophistication necessary for successful completion of the course.


Course Requirements

One or more articles will be assigned as reading prior to each class meeting - usually two per class. These articles should be read carefully, and a short summary of each article, along with at least three questions or insightful comments about the material per paper, should be prepared for the following class meeting. The summary of each article consists of brief answers to the following seven questions, followed by your comments and questions (item 8):


   1. What is the problem the authors are trying to solve?

   2. What other approaches or solutions existed at the time that this work was done?

   3. What was wrong with the other approaches or solutions?

   4. What is the authors' approach or solution?

   5. Why is it better than the other approaches or solutions?

   6. How does it perform?

   7. Why is this work important?

   8. At least three comments or questions of your own


You will be required to write a report on a topic in the area of storage systems. This report should present the results of a project, original research (preferred), or a strong survey of prior art. Reporting work done for another course is not acceptable. You must choose a topic by the second week of the quarter. Each student will give a final presentation on their project at the end of the quarter.


Your grade in the course is based 25% on preparedness and class participation, 25% on presentations, and 50% on your term project and report.


Attendance

Class attendance is required. This is a discussion-based seminar course and you will not pass if you routinely miss class.


Academic Honesty

All the work you turn in must be your own. If you get ideas or material from any source other than your own mind (even from conversations with others), you must cite that source. Failure to do so constitutes plagiarism and will not be tolerated - you will not pass the course.