SHF: Small: NPU-based Architecture for Accelerating Deep Learning on Mobile Devices

Project: Research project

Project Details


The rapid progress of deep-learning techniques has enabled many emerging artificial intelligence applications (e.g., augmented reality), and there is a tremendous demand for running these applications on mobile devices. However, deep-learning models are by nature computationally intensive, making them challenging to deploy on battery-powered mobile devices. This project systematically investigates the fundamental challenges of running deep-learning applications on mobile devices by designing a mobile architecture based on Neural Processing Units (NPUs). An NPU is a microprocessor specialized for accelerating deep-learning algorithms; however, it incurs accuracy loss, which is challenging to address. This research identifies some special characteristics of running deep-learning models on NPUs and leverages these findings to design novel techniques that maximize accuracy or minimize processing time, depending on the application requirements. As deep learning has been successfully applied to various problems in people's daily lives, this project has great potential to benefit society by improving the performance, the energy efficiency, and the quality of running deep-learning applications on mobile devices. This project also contributes to society by developing new curricula, disseminating research for education and training, engaging under-represented students in research, and reaching out to high-school students.

The primary goal of this project is to design an NPU-based architecture for accelerating deep learning that can address the accuracy-loss problem of NPUs as well as the energy and performance limitations of current mobile architectures. The project consists of three tasks: (1) investigating model-partitioning techniques to decompose the deep-learning model into different layers running on heterogeneous processors to minimize processing time or maximize accuracy based on the application requirements; (2) designing energy- and thermal-aware architectures to address the performance limitations of the current mobile architecture, by exploring techniques to decompose the computation between heterogeneous processors to avoid overheating; and (3) exploring the collaborative intelligence among edge/servers, hardware accelerators, and NPU-based architectures to optimize performance, by investigating how and where to run the computation based on the confidence level of executing deep learning models on an NPU.
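As a rough illustration of the confidence-based idea in task (3), the sketch below runs a classifier on the NPU first and falls back to a more accurate processor only when the NPU's confidence is low. It is not the project's actual method: the function names, the use of the top softmax probability as the confidence measure, and the 0.8 threshold are all assumptions made for this example, and the two "models" are stand-in functions that return fixed logits.

```python
import math
from typing import Callable, List, Sequence, Tuple

Model = Callable[[object], Sequence[float]]  # hypothetical: input -> logits


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_fallback(
    x: object,
    npu_model: Model,
    fallback_model: Model,
    threshold: float = 0.8,  # assumed value, not taken from the project
) -> Tuple[int, float, str]:
    """Return (label, confidence, processor that produced the answer)."""
    probs = softmax(npu_model(x))
    confidence = max(probs)
    if confidence >= threshold:
        # The fast, lower-precision NPU result is trusted as-is.
        return probs.index(confidence), confidence, "npu"
    # Low confidence: rerun on the slower, more accurate processor
    # (for example a CPU or an edge server).
    probs = softmax(fallback_model(x))
    confidence = max(probs)
    return probs.index(confidence), confidence, "fallback"


# Stand-in models with fixed logits, for demonstration only.
def unsure_npu(x: object) -> Sequence[float]:
    return [2.0, 1.9, 0.1]


def sure_npu(x: object) -> Sequence[float]:
    return [10.0, 0.0, 0.0]


def accurate_cpu(x: object) -> Sequence[float]:
    return [5.0, 0.0, 0.0]


if __name__ == "__main__":
    print(classify_with_fallback("frame", sure_npu, accurate_cpu))
    print(classify_with_fallback("frame", unsure_npu, accurate_cpu))
```

The threshold sets the trade-off: a higher value sends more inputs to the accurate processor, which costs time and energy, while a lower value keeps more work on the NPU at the risk of accepting wrong answers.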

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Effective start/end date: 10/1/21 to 9/30/24


  • National Science Foundation: $500,000.00

