Noise-robust speech recognition through auditory feature detection and spike sequence decoding

Phillip B. Schafer, Dezhe Jin

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences: one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common subsequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
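The template-based decoding described in the abstract can be illustrated with a minimal sketch. It assumes each spike sequence is represented as a list of integer neuron IDs in firing order, and it normalizes the longest-common-subsequence length by the length of the longer sequence. Both choices, along with the function names `lcs_length`, `similarity`, and `recognize`, are assumptions made for illustration; the abstract does not specify the exact similarity measure or decision rule used in the paper.

```python
from typing import Dict, List, Sequence


def lcs_length(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """LCS length scaled to [0, 1].

    Assumption: dividing by the longer sequence's length is an illustrative
    normalization, not necessarily the one used in the paper.
    """
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def recognize(spikes: Sequence[int],
              templates: Dict[str, List[Sequence[int]]]) -> str:
    """Return the word whose best-matching template is most similar to spikes.

    Assumption: a nearest-template decision rule, chosen for illustration.
    """
    return max(
        templates,
        key=lambda word: max(similarity(spikes, t) for t in templates[word]),
    )


if __name__ == "__main__":
    # Hypothetical neuron-ID sequences, not data from the paper.
    templates = {"one": [[3, 1, 4, 1, 5]], "two": [[2, 7, 1, 8, 2]]}
    print(recognize([3, 1, 4, 5], templates))
```

Because a subsequence match tolerates missing and inserted spikes while still requiring the shared spikes to occur in the same order, a measure of this kind is a natural fit for sequences corrupted by noise.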

Original language: English (US)
Pages (from-to): 523-556
Number of pages: 34
Journal: Neural Computation
Volume: 26
Issue number: 3
DOI: 10.1162/NECO_a_00557
State: Published - Feb 11 2014

Fingerprint

Noise
Neurons
Cochlear Nerve
Computer Systems
Signal-To-Noise Ratio
Neurosciences
Acoustics
Population
Action Potentials
Decoding
Hearing
Speech Recognition
Databases
Brain
Template
Automatic Speech Recognition
Neuron

All Science Journal Classification (ASJC) codes

  • Arts and Humanities (miscellaneous)
  • Cognitive Neuroscience

Cite this

@article{020b1794451f4e29b6e6a036b85bfa04,
title = "Noise-robust speech recognition through auditory feature detection and spike sequence decoding",
abstract = "Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences: one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common subsequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.",
author = "Schafer, Phillip B. and Jin, Dezhe",
year = "2014",
month = "2",
day = "11",
doi = "10.1162/NECO_a_00557",
language = "English (US)",
volume = "26",
pages = "523--556",
journal = "Neural Computation",
issn = "0899-7667",
publisher = "MIT Press Journals",
number = "3",

}

Noise-robust speech recognition through auditory feature detection and spike sequence decoding. / Schafer, Phillip B.; Jin, Dezhe.

In: Neural computation, Vol. 26, No. 3, 11.02.2014, p. 523-556.

TY - JOUR

T1 - Noise-robust speech recognition through auditory feature detection and spike sequence decoding

AU - Schafer, Phillip B.

AU - Jin, Dezhe

PY - 2014/2/11

Y1 - 2014/2/11

AB - Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences: one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common subsequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.

UR - http://www.scopus.com/inward/record.url?scp=84893461274&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84893461274&partnerID=8YFLogxK

U2 - 10.1162/NECO_a_00557

DO - 10.1162/NECO_a_00557

M3 - Article

C2 - 24320849

AN - SCOPUS:84893461274

VL - 26

SP - 523

EP - 556

JO - Neural Computation

JF - Neural Computation

SN - 0899-7667

IS - 3

ER -