Ongoing Research Projects

My research spans computer systems and architecture, electronic design automation, and VLSI design. It is driven by both emerging device/circuit technologies and modern applications. On the technology side, current computer hardware and the hardware/software interface have been shaped by traditional technologies, e.g., CMOS, DRAM, and disks. Emerging technologies have unique properties that are likely to create an inflection point, opening the door to novel energy-efficient architectures that redesign the current hardware/software interface to realize the full potential of these technologies. On the application side, the design of computer hardware and the hardware/software interface must be grounded in an understanding of applications' needs. Here is a list of current research projects in my group.

Persistent memory system design

Modern persistent memory (enabled by NVRAM) is envisioned as one of the next data storage components to be adopted in data centers. It unifies memory and storage systems, offering the fast access of memory and the data recoverability of storage in a single device. Seeing its value, both system software and hardware suppliers have recently begun to adopt persistent memory in their next-generation designs. Though promising, this technical transition fundamentally changes the assumptions behind current memory and storage system design and introduces critical design challenges. Our goal in this research is to enhance the performance, energy efficiency, and reliability of memory and storage systems, ranging from data centers to end-user devices, by fully exploiting persistent memory. Our approach includes persistent-memory-aware fault tolerance support, logging acceleration, persistent caching in system libraries and userspace, NV-DIMM system profiling and characterization, and load balancing techniques among critical system resources.
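To make the role of logging concrete, here is a minimal sketch of undo logging over a simulated persistent memory. It is an illustration under stated assumptions, not the design from the publications listed for this project: the class SimulatedPM, the functions update and recover, and the key "undo_log" are all hypothetical, and persist() merely stands in for cache-line flushes followed by a fence.

```python
LOG = "undo_log"  # hypothetical key that holds the undo log


class SimulatedPM:
    """Toy persistent memory: writes sit in a volatile cache and
    survive a crash only after an explicit persist()."""

    def __init__(self):
        self.durable = {}  # contents that survive a crash
        self.cache = {}    # pending writes, lost on a crash

    def write(self, key, value):
        self.cache[key] = value

    def read(self, key):
        return self.cache.get(key, self.durable.get(key))

    def persist(self):
        # Stands in for flushing dirty cache lines plus a fence.
        self.durable.update(self.cache)
        self.cache.clear()

    def crash(self):
        # Power loss: anything not yet persisted disappears.
        self.cache.clear()


def update(pm, updates, crash_after=None):
    """Apply updates atomically. crash_after=i simulates a power
    failure just before the i-th in-place write (0-based)."""
    undo = {k: pm.read(k) for k in updates}
    pm.write(LOG, undo)
    pm.persist()  # the log must be durable before any in-place write
    for i, (key, value) in enumerate(updates.items()):
        if crash_after is not None and i == crash_after:
            pm.crash()
            return False
        pm.write(key, value)
        pm.persist()
    pm.write(LOG, None)  # commit: discard the log
    pm.persist()
    return True


def recover(pm):
    """Roll back a partially applied update after a crash."""
    undo = pm.read(LOG)
    if undo:
        for key, old_value in undo.items():
            pm.write(key, old_value)
        pm.persist()
        pm.write(LOG, None)
        pm.persist()


pm = SimulatedPM()
pm.write("a", 100)
pm.write("b", 0)
pm.persist()

# Move 50 from a to b, but lose power after only the first write.
update(pm, {"a": 50, "b": 50}, crash_after=1)
state_after_crash = (pm.read("a"), pm.read("b"))
recover(pm)
state_after_recovery = (pm.read("a"), pm.read("b"))
```

In this toy model every in-place write is followed by its own persist, in addition to persisting the log itself. That extra ordering and write traffic is the kind of overhead that logging acceleration aims to reduce.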

HPCA 2018, DAC 2017, USENIX CoolDC 2016, MICRO 2015, MICRO 2014, MICRO 2013, WEED 2013

Smart memory system design for data-intensive applications

The goal of this research is to enable high-performance, energy-efficient, and flexible smart memory system design that adapts to the irregular, diverse, and changing behaviors of data-intensive applications. We propose a heterogeneous system design that incorporates fixed-function accelerators and programmable processing units. Our design combines novel system architecture, programming models, and runtime systems. We explore a hierarchical programming environment, consisting of low-level APIs and libraries that enable direct control of the smart memory, a tailored programming model that enables portability, and a high-level directive-based programming model and runtime that improve programmer productivity and application performance.

MICRO 2016, MEMSYS 2016, HUCAA 2016

Architecture support for general-purpose applications on GPUs and CPU/GPU heterogeneous systems

Accelerating general-purpose applications with GPUs has recently drawn wide interest. In this project, we explore hardware designs and hardware/software interfaces that continue to exploit the energy efficiency of specialization while broadening the range of applications that can benefit from it. We have designed GPU architectures and data management techniques that optimize system energy efficiency and memory bandwidth by exploiting the memory access patterns of general-purpose applications. Our ongoing research investigates architectural support for (1) reducing the performance constraints posed by specialization (e.g., thread synchronization overheads of GPUs), and (2) increasing the programmability of special-purpose accelerators by developing software interfaces that enable efficient mapping of general-purpose applications to special-purpose hardware, without the need to reprogram applications for each type of accelerator.

TACO 2013, ICCAD 2012, ISLPED 2012

Past Research Projects

Cost Model for new technology based design and fabrication

Cost is always an important factor that influences the adoption of new technologies. We analyze and model the cost of designing and fabricating circuits and systems built with new technologies. With these comprehensive cost models, we intend to help hardware designers choose the most cost-effective design strategy at the early stages of the design flow. Our cost analysis across various design options shows that, with a properly configured processor organization, new technologies can reduce fabrication cost compared to their traditional technology counterparts.
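As a rough illustration of how an early-stage cost comparison can work, the sketch below uses the textbook dies-per-wafer approximation and a Poisson yield model. It is not the cost model from the publications listed for this project. All numbers (wafer cost, defect density, bonding cost, bonding yield) are made-up assumptions, and the split into two bonded dies is a hypothetical design option chosen only to show the trade-off.

```python
import math


def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Textbook approximation: wafer area over die area, minus an
    edge-loss term for partial dies along the circumference."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


def poisson_yield(die_area_mm2, defects_per_mm2):
    """Poisson yield model: probability that a die has zero defects."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def cost_per_good_die(wafer_cost, wafer_diameter_mm, die_area_mm2,
                      defects_per_mm2):
    good_dies = (dies_per_wafer(wafer_diameter_mm, die_area_mm2)
                 * poisson_yield(die_area_mm2, defects_per_mm2))
    return wafer_cost / good_dies


# Illustrative assumptions only; none of these values come from the text.
WAFER_COST = 5000.0      # cost per processed wafer
WAFER_DIAMETER = 300.0   # mm
DEFECT_DENSITY = 0.002   # defects per mm^2
BOND_COST = 5.0          # hypothetical cost to bond two dies
BOND_YIELD = 0.95        # hypothetical fraction of successful bonds

# Option A: one monolithic 400 mm^2 die.
monolithic = cost_per_good_die(WAFER_COST, WAFER_DIAMETER, 400.0,
                               DEFECT_DENSITY)

# Option B: the same logic split across two pre-tested 200 mm^2 dies.
half = cost_per_good_die(WAFER_COST, WAFER_DIAMETER, 200.0, DEFECT_DENSITY)
split = (2 * half + BOND_COST) / BOND_YIELD

print(f"monolithic: {monolithic:.2f}  split: {split:.2f}")
```

Under these assumptions the split option is cheaper because yield falls exponentially with die area. With a lower defect density or a costlier bonding step the comparison can flip, which is why such a model is useful before a design strategy is fixed.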

TCAD 2010, DAC 2010

Memory Hierarchy Design with Hybrid Memory Technology

In the area of hybrid memory designs, our studies help architects determine the best system organization when multiple choices of memory technologies exist. In particular, we proposed a bandwidth-aware reconfigurable cache architecture design, consisting of a hybrid cache hierarchy, a reconfiguration mechanism, and a statistical prediction engine. Our design dynamically adapts the cache capacity of each level based on the predicted bandwidth demands of different applications. We also developed an analytical performance model that estimates the performance of a throughput computing workload running on various memory hierarchy configurations.

ICCAD 2011, ISCA 2011

Energy efficient 3D CMP Design

3D integration technology allows us to integrate two or more dies of a chip multiprocessor (CMP) vertically. Compared to traditional 2D CMPs, 3D CMPs promise reduced circuit delay, high memory bandwidth, and a condensed form factor. But the continuous increase of power and energy budgets for CMP designs potentially brings in critical system design problems, such as power supply rail design, system reliability, and thermal issues. We address the high energy consumption and thermal issues that arise when 3D stacking technology is used in developing CMPs. We propose to reduce the energy consumption of 3D-stacked CMPs through fine-grained temporal and spatial tuning of the supply voltage and frequency of processor cores and caches. Our tuning technique is implemented by integrating an array of on-chip voltage regulators into the original processor.
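The benefit of spatially fine-grained tuning can be sketched with the standard dynamic energy relation, where energy is proportional to C * V^2 per cycle. The example below is a simplified illustration, not the tuning technique itself. The operating points in LEVELS, the per-core cycle demands, and the function names are hypothetical, and static power and voltage regulator losses are ignored.

```python
# Hypothetical operating points, sorted from lowest to highest:
# (supply voltage in volts, clock frequency in GHz).
LEVELS = [(0.8, 1.0), (0.9, 1.5), (1.0, 2.0), (1.1, 2.5)]


def dynamic_energy(voltage, cycles, switched_cap=1.0):
    """Dynamic energy for a fixed amount of work: E = C * V^2 * cycles.
    The activity factor is folded into switched_cap (arbitrary units)."""
    return switched_cap * voltage ** 2 * cycles


def pick_level(cycles_needed, interval_ns):
    """Lowest operating point that finishes the work within the interval.
    Frequency in GHz times time in ns gives the available cycles."""
    for voltage, freq_ghz in LEVELS:
        if freq_ghz * interval_ns >= cycles_needed:
            return voltage, freq_ghz
    return LEVELS[-1]  # demand exceeds capacity: run at the top level


def per_core_energy(demands, interval_ns):
    """Each core gets its own voltage/frequency level."""
    return sum(dynamic_energy(pick_level(c, interval_ns)[0], c)
               for c in demands)


def chip_wide_energy(demands, interval_ns):
    """All cores share one level, set by the most demanding core."""
    voltage, _ = pick_level(max(demands), interval_ns)
    return sum(dynamic_energy(voltage, c) for c in demands)


# Cycles each of four cores must execute in one 1000 ns interval
# (illustrative values only).
demands = [900, 2400, 1400, 1900]
fine = per_core_energy(demands, 1000)
coarse = chip_wide_energy(demands, 1000)
print(f"per-core: {fine:.0f}  chip-wide: {coarse:.0f}")
```

Repeating the selection every interval gives the temporal dimension: a core whose demand drops in the next interval moves to a lower level instead of staying at the level required by its peak.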

DATE 2011