Prediction of a hotspot pattern in keyword search results

Jie Gao, Axinia Radeva, Chuyao Shen, Shiqi Wang, Qianbo Wang, Rebecca J. Passonneau

Research output: Contribution to journal (Article)

Abstract

This paper identifies and models a phenomenon observed across low-resource languages in the keyword search results of speech retrieval systems whose speech recognition has a high error rate because of very limited training data. High-confidence correct detections (HCCDs) of keywords are rare, yet they often follow one another closely in time. We refer to these close sequences of HCCDs as keyword hotspots. The ability to predict keyword hotspots could support speech retrieval and provide new insights into the behavior of speech recognition systems. We treat hotspot prediction as a binary classification task over all word-sized time intervals in an audio file of a telephone conversation, using prosodic features as predictors. Rare events that follow this pattern are often modeled as a self-exciting point process (SEPP), in which the occurrence of one event temporarily raises the rate at which further events occur. To label successive points in time as inside or outside a hotspot, we fit a SEPP function to the distribution of HCCDs in the keyword search output. Two major learning challenges are that the positive class is very small and that the training and test data have dissimilar distributions. To address these challenges, we develop a novel data selection framework that chooses training data with good generalization properties. The results show superior generalization performance.
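The abstract states that HCCD times are modeled as a SEPP and that the fitted function labels word-sized intervals, but it does not give the kernel, the fitting procedure, or the labeling rule. The Python sketch below fills those gaps with explicit assumptions: a univariate Hawkes process (a standard SEPP) with an exponential kernel, fit by grid-search maximum likelihood, and a rule that marks an interval as in-hotspot when the fitted intensity at its midpoint exceeds a threshold. The function names, grid values, and `threshold` rule are hypothetical illustrations, not the authors' method.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

# ASSUMPTION: univariate Hawkes process with an exponential kernel,
#   lambda(t) = mu + sum_{t_i < t} alpha * beta * exp(-beta * (t - t_i)),
# where mu > 0 is the background rate, alpha the branching ratio
# (alpha < 1 for stationarity) and beta the decay rate.
# The SEPP form used in the paper may differ.


def hawkes_intensity(t: float, events: Sequence[float],
                     mu: float, alpha: float, beta: float) -> float:
    """Conditional intensity at time t, excited only by strictly earlier events."""
    return mu + sum(alpha * beta * math.exp(-beta * (t - ti))
                    for ti in events if ti < t)


def log_likelihood(events: Sequence[float], T: float,
                   mu: float, alpha: float, beta: float) -> float:
    """Log-likelihood of event times observed on [0, T] (requires mu > 0)."""
    log_term = sum(math.log(hawkes_intensity(ti, events, mu, alpha, beta))
                   for ti in events)
    # Compensator: closed-form integral of lambda(t) over [0, T].
    compensator = mu * T + alpha * sum(1.0 - math.exp(-beta * (T - ti))
                                       for ti in events)
    return log_term - compensator


def fit_hawkes_grid(events: Sequence[float], T: float,
                    mus: Sequence[float], alphas: Sequence[float],
                    betas: Sequence[float]) -> Tuple[float, float, float]:
    """Hypothetical helper: maximum-likelihood (mu, alpha, beta) by exhaustive grid search."""
    return max(product(mus, alphas, betas),
               key=lambda p: log_likelihood(events, T, *p))


def label_intervals(intervals: Sequence[Tuple[float, float]],
                    hccd_times: Sequence[float],
                    mu: float, alpha: float, beta: float,
                    threshold: float) -> List[int]:
    """Hypothetical rule: label a (start, end) interval 1 if the intensity
    at its midpoint exceeds `threshold`, otherwise 0."""
    return [int(hawkes_intensity((s + e) / 2.0, hccd_times, mu, alpha, beta) > threshold)
            for s, e in intervals]


if __name__ == "__main__":
    # Toy HCCD times (seconds) in a 120 s conversation: three tight bursts.
    hccds = [1.0, 1.2, 1.4, 50.0, 50.2, 50.4, 100.0, 100.1, 100.3]
    params = fit_hawkes_grid(hccds, 120.0,
                             mus=[0.01, 0.05, 0.1],
                             alphas=[0.0, 0.3, 0.6, 0.9],
                             betas=[0.5, 2.0, 5.0])
    words = [(0.0, 0.5), (1.5, 2.0), (30.0, 30.5)]
    print(params, label_intervals(words, hccds, *params, threshold=0.2))
```

The grid search keeps the sketch free of dependencies; a gradient-based optimizer over the same log-likelihood would scale better. The threshold sets how much of each burst counts as hotspot, which directly affects how small the positive class is.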

Original language: English (US)
Pages (from-to): 80-102
Number of pages: 23
Journal: Computer Speech and Language
Volume: 48
DOI: 10.1016/j.csl.2017.10.005
State: Published - Mar 2018

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction

Cite this

Gao, Jie ; Radeva, Axinia ; Shen, Chuyao ; Wang, Shiqi ; Wang, Qianbo ; Passonneau, Rebecca J. / Prediction of a hotspot pattern in keyword search results. In: Computer Speech and Language. 2018 ; Vol. 48. pp. 80-102.
@article{ae10e1ef589241cc8f5e3d878eee5d66,
title = "Prediction of a hotspot pattern in keyword search results",
author = "Jie Gao and Axinia Radeva and Chuyao Shen and Shiqi Wang and Qianbo Wang and Passonneau, {Rebecca J.}",
year = "2018",
month = "3",
doi = "10.1016/j.csl.2017.10.005",
language = "English (US)",
volume = "48",
pages = "80--102",
journal = "Computer Speech and Language",
issn = "0885-2308",
publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - Prediction of a hotspot pattern in keyword search results

AU - Gao, Jie

AU - Radeva, Axinia

AU - Shen, Chuyao

AU - Wang, Shiqi

AU - Wang, Qianbo

AU - Passonneau, Rebecca J.

PY - 2018/3

Y1 - 2018/3

UR - http://www.scopus.com/inward/record.url?scp=85032953962&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85032953962&partnerID=8YFLogxK

U2 - 10.1016/j.csl.2017.10.005

DO - 10.1016/j.csl.2017.10.005

M3 - Article

AN - SCOPUS:85032953962

VL - 48

SP - 80

EP - 102

JO - Computer Speech and Language

JF - Computer Speech and Language

SN - 0885-2308

ER -