A scalable architecture for multi-class visual object detection

Siddharth Advani, Yasuki Tanabe, Kevin Irick, John Morgan Sampson, Vijaykrishnan Narayanan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

As high-fidelity, small form-factor cameras become increasingly available and affordable, vision-based applications that exploit the resulting increase in visual information will grow in number and variety. The key challenge is for the embedded systems on which the bulk of these applications will be deployed to maintain real-time performance in the face of the exponential increase in spatial and temporal visual data. For example, a useful vision-based driver assistance system needs to locate and identify critical objects such as pedestrians, other vehicles, potholes, animals, and street signs with latency small enough to allow a human driver to react accordingly. In this work, we propose a digital accelerator architecture for a high-throughput, robust, scalable, and tunable visual object detection pipeline based on Histogram of Oriented Gradients (HOG) features. From a systems perspective, efficacy can be measured in terms of speed, accuracy, energy efficiency, and scalability in performing such visual tasks. Since each application dictates the relative importance of each of these dimensions, our proposed architecture exposes design-time parameters that can exploit domain-specific knowledge while supporting tunability through run-time configuration. To evaluate the effectiveness of our vision accelerator, we map the architecture to a modern FPGA and demonstrate full HD video processing at 30 fps (frames per second) while operating at a conservative 100 MHz clock. Evaluations on a single object class show throughput improvements of 2× and 5× over GPU and multi-threaded CPU implementations, respectively. Furthermore, we provide a pathway to enhanced scalability for the many-class problem and achieve more than a 20× improvement over an equivalent CPU implementation for 5 object classes.
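
Two technical anchors in the abstract, the HOG feature and the full HD / 30 fps / 100 MHz operating point, can be made concrete with a short Python sketch. It is an illustration only, not the authors' hardware pipeline. `cell_histogram` is a hypothetical, simplified textbook HOG cell histogram: the 8×8 cell, 9 unsigned orientation bins over 0 to 180 degrees, centered [-1, 0, 1] derivatives, and hard (nearest-bin) voting are all assumptions, and the classic Dalal-Triggs formulation additionally uses interpolated voting and block normalization. `cycles_per_pixel` (also a hypothetical name) is plain arithmetic: 1920 × 1080 × 30 ≈ 62.2 million pixels per second against a 100 MHz clock leaves roughly 1.6 clock cycles per input pixel, which suggests why a streaming design that accepts about one pixel per clock is needed at that frequency.

```python
import math

# Hypothetical illustration: a simplified HOG cell histogram with hard binning.
# Not the accelerator's actual datapath; cell size and bin count are assumptions.


def cell_histogram(img, x0, y0, cell=8, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0 to 180 degrees) over the cell x0 <= x < x0+cell, y0 <= y < y0+cell.

    img is a list of rows of grayscale values; image borders are clamped.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            # Centered [-1, 0, 1] derivatives, clamping at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            # Fold the signed angle into [0, 180) for unsigned orientation.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = int(angle // bin_width) % bins
            hist[b] += mag  # hard vote into the nearest lower bin
    return hist


def cycles_per_pixel(width=1920, height=1080, fps=30, clock_hz=100e6):
    """Clock cycles available per input pixel at a given frame rate and clock."""
    return clock_hz / (width * height * fps)


if __name__ == "__main__":
    # A horizontal intensity ramp has purely horizontal gradients (0 degrees).
    ramp = [[2 * x for x in range(16)] for _ in range(16)]
    print(cell_histogram(ramp, 0, 0))
    print(round(cycles_per_pixel(), 2))
```

For the horizontal ramp, all gradient energy lands in bin 0, and `cycles_per_pixel()` evaluates to about 1.61 for the default full HD, 30 fps, 100 MHz parameters.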

Original language: English (US)
Title of host publication: 25th International Conference on Field Programmable Logic and Applications, FPL 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9780993428005
DOI: 10.1109/FPL.2015.7293961
State: Published - Oct 7 2015
Event: 25th International Conference on Field Programmable Logic and Applications, FPL 2015 - London, United Kingdom
Duration: Sep 2 2015 → Sep 4 2015

Publication series

Name: 25th International Conference on Field Programmable Logic and Applications, FPL 2015

Other

Other: 25th International Conference on Field Programmable Logic and Applications, FPL 2015
Country: United Kingdom
City: London
Period: 9/2/15 → 9/4/15

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Signal Processing
  • Software
  • Computer Science Applications

Cite this

Advani, S., Tanabe, Y., Irick, K., Sampson, J. M., & Narayanan, V. (2015). A scalable architecture for multi-class visual object detection. In 25th International Conference on Field Programmable Logic and Applications, FPL 2015 [7293961] (25th International Conference on Field Programmable Logic and Applications, FPL 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/FPL.2015.7293961
@inproceedings{f5236eba92ba4a948d2bbd957409c3d4,
title = "A scalable architecture for multi-class visual object detection",
author = "Siddharth Advani and Yasuki Tanabe and Kevin Irick and Sampson, {John Morgan} and Vijaykrishnan Narayanan",
year = "2015",
month = "10",
day = "7",
doi = "10.1109/FPL.2015.7293961",
language = "English (US)",
series = "25th International Conference on Field Programmable Logic and Applications, FPL 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "25th International Conference on Field Programmable Logic and Applications, FPL 2015",
address = "United States",

}

TY - GEN

T1 - A scalable architecture for multi-class visual object detection

AU - Advani, Siddharth

AU - Tanabe, Yasuki

AU - Irick, Kevin

AU - Sampson, John Morgan

AU - Narayanan, Vijaykrishnan

PY - 2015/10/7

Y1 - 2015/10/7

UR - http://www.scopus.com/inward/record.url?scp=84962393676&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84962393676&partnerID=8YFLogxK

U2 - 10.1109/FPL.2015.7293961

DO - 10.1109/FPL.2015.7293961

M3 - Conference contribution

AN - SCOPUS:84962393676

T3 - 25th International Conference on Field Programmable Logic and Applications, FPL 2015

BT - 25th International Conference on Field Programmable Logic and Applications, FPL 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -
