The last decade has witnessed a proliferation of heterogeneous systems across diverse application domains, from high-end datacenters to low-cost embedded systems, because they can deliver better performance and energy efficiency than homogeneous multicore architectures. These systems typically combine some subset of CPUs, GPUs, FPGAs, and ASICs as compute engines and hence present unique programming and resource-management challenges. However, the lack of adequate compiler and runtime support presents a barrier to the widespread adoption of heterogeneous systems. Furthermore, the design space of the underlying heterogeneous architecture has not been fully explored. This includes the number and placement of compute engines, memory subsystems, and interconnects that best satisfy diverse application demands within a given area/power budget. It is therefore imperative to investigate the entire system stack, spanning applications, system software, and the underlying hardware, in a cohesive manner to provide the support required for efficient application execution. The main goal of this research project is to enable dynamic mapping of applications to different compute engines to improve performance, power efficiency, and system utilization. The outcomes of this project are poised to change the way programmers and users perceive and interact with heterogeneity. The research on heterogeneous computing will be integrated with educational activities and student training at Penn State to nurture the future workforce in science and engineering, with active participation of female graduate students and undergraduate Honors students.
The project consists of four tasks. Task-I conducts profile-based workload characterization of applications from various domains, including deep learning, cloud computing, and high-performance computing, on diverse hardware platforms to understand their performance and power characteristics. The resulting characterization will be used to develop a machine learning (ML)-based model for the initial assignment of tasks to different compute engines. Task-II explores compiler and programming support to transform application code into device-agnostic "codelets," which serve as the unit of granularity for seamless scheduling and execution across different hardware units. Task-III investigates runtime support to optimally schedule codelets and seamlessly migrate them across hardware units to improve system performance. Finally, Task-IV explores the design of heterogeneous platforms by analyzing issues such as the degree of heterogeneity, the placement and integration of compute engines on a chip and across chips, and the underlying communication support.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Effective start/end date: 6/1/18 → 5/31/22
- National Science Foundation: $1,000,000.00