Visual co-occurrence network

Using context for large-scale object recognition in retail

Siddharth Advani, Brigid Smith, Yasuki Tanabe, Kevin Irick, Matthew Cotter, John Morgan Sampson, Vijaykrishnan Narayanan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

In any visual object recognition system, classification accuracy will likely determine the usefulness of the system as a whole. In many real-world applications, the system must also recognize a large number of diverse objects to be robust enough to handle the sort of tasks that the human visual system handles on an average day. These objectives are often at odds with performance, as running too many detectors on any one scene will be prohibitively slow for use in any real-time scenario. However, visual information has temporal and spatial context that can be exploited to reduce the number of detectors that need to be triggered at any given instant. In this paper, we propose a dynamic approach to encode such context, called Visual Co-occurrence Network (ViCoNet), that establishes relationships between objects observed in a visual scene. We investigate the utility of ViCoNet when integrated into a vision pipeline targeted for retail shopping. When evaluated on a large and deep dataset, we achieve a 50% improvement in performance and a 7% improvement in accuracy in the best case, and a 45% improvement in performance and a 3% improvement in accuracy in the average case over an established baseline. The memory overhead of ViCoNet is around 10 KB, highlighting its effectiveness on temporal big data.
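The abstract does not describe how ViCoNet is built, so the following is only a minimal sketch of the general idea it names: record which objects are observed together, then use those relationships to run a small subset of detectors. The class `CooccurrenceNetwork`, its methods `observe` and `select_detectors`, the `budget` parameter, the count-based edge weights, and the product labels are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


class CooccurrenceNetwork:
    """Hypothetical co-occurrence graph; NOT the ViCoNet implementation."""

    def __init__(self):
        # weights[a][b] counts how often labels a and b appeared in the same scene.
        self.weights = defaultdict(lambda: defaultdict(int))

    def observe(self, labels):
        """Record one scene by incrementing the edge between every pair of labels."""
        for a, b in combinations(sorted(set(labels)), 2):
            self.weights[a][b] += 1
            self.weights[b][a] += 1

    def select_detectors(self, recent_labels, budget):
        """Return up to `budget` labels most strongly linked to recently seen ones."""
        scores = defaultdict(int)
        for seen in recent_labels:
            # .get avoids creating empty entries for labels never observed.
            for neighbor, weight in self.weights.get(seen, {}).items():
                if neighbor not in recent_labels:
                    scores[neighbor] += weight
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda label: (-scores[label], label))
        return ranked[:budget]


# Illustrative scenes with made-up retail labels (assumed data).
net = CooccurrenceNetwork()
net.observe(["cereal", "milk", "oatmeal"])
net.observe(["cereal", "milk"])
net.observe(["shampoo", "soap"])

# After seeing "cereal", only the detectors for its neighbors are triggered.
selected = net.select_detectors(["cereal"], budget=2)
print(selected)
```

In this sketch the detectors for "shampoo" and "soap" are never triggered after a "cereal" observation, which is the kind of reduction in active detectors the abstract attributes to context. A real system would also need a fallback for scenes with no recognized context.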

Original language: English (US)
Title of host publication: ESTIMedia 2015 - 13th IEEE Symposium on Embedded Systems for Real-Time Multimedia
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781467381642
DOI: https://doi.org/10.1109/ESTIMedia.2015.7351774
State: Published - Dec 9 2015
Event: 13th IEEE Symposium on Embedded Systems for Real-Time Multimedia, ESTIMedia 2015 - Amsterdam, Netherlands
Duration: Oct 8 2015 to Oct 9 2015

Publication series

Name: ESTIMedia 2015 - 13th IEEE Symposium on Embedded Systems for Real-Time Multimedia

Other

Other: 13th IEEE Symposium on Embedded Systems for Real-Time Multimedia, ESTIMedia 2015
Country: Netherlands
City: Amsterdam
Period: 10/8/15 to 10/9/15

Fingerprint

Object recognition
Detectors
Pipelines
Data storage equipment
Big data

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Media Technology

Cite this

Advani, S., Smith, B., Tanabe, Y., Irick, K., Cotter, M., Sampson, J. M., & Narayanan, V. (2015). Visual co-occurrence network: Using context for large-scale object recognition in retail. In ESTIMedia 2015 - 13th IEEE Symposium on Embedded Systems for Real-Time Multimedia [7351774] (ESTIMedia 2015 - 13th IEEE Symposium on Embedded Systems for Real-Time Multimedia). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ESTIMedia.2015.7351774
@inproceedings{5b3877d788c848a7963043bf2a3b3ba8,
title = "Visual co-occurrence network: Using context for large-scale object recognition in retail",
abstract = "In any visual object recognition system, the classification accuracy will likely determine the usefulness of the system as a whole. In many real-world applications, it is also important to be able to recognize a large number of diverse objects for the system to be robust enough to handle the sort of tasks that the human visual system handles on an average day. These objectives are often at odds with performance, as running too large of a number of detectors on any one scene will be prohibitively slow for use in any real-time scenario. However, visual information has temporal and spatial context that can be exploited to reduce the number of detectors that need to be triggered at any given instance. In this paper, we propose a dynamic approach to encode such context, called Visual Co-occurrence Network (ViCoNet) that establishes relationships between objects observed in a visual scene. We investigate the utility of ViCoNet when integrated into a vision pipeline targeted for retail shopping. When evaluated on a large and deep dataset, we achieve a 50{\%} improvement in performance and a 7{\%} improvement in accuracy in the best case, and a 45{\%} improvement in performance and a 3{\%} improvement in accuracy in the average case over an established baseline. The memory overhead of ViCoNet is around 10KB, highlighting its effectiveness on temporal big data.",
author = "Siddharth Advani and Brigid Smith and Yasuki Tanabe and Kevin Irick and Matthew Cotter and Sampson, {John Morgan} and Vijaykrishnan Narayanan",
year = "2015",
month = "12",
day = "9",
doi = "10.1109/ESTIMedia.2015.7351774",
language = "English (US)",
series = "ESTIMedia 2015 - 13th IEEE Symposium on Embedded Systems for Real-Time Multimedia",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "ESTIMedia 2015 - 13th IEEE Symposium on Embedded Systems for Real-Time Multimedia",
address = "United States",

}
