Visual speech segmentation: Using facial cues to locate word boundaries in continuous speech

Aaron D. Mitchel, Daniel J. Weiss

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

Speech is typically a multimodal phenomenon, yet few studies have focused on the exclusive contributions of visual cues to language acquisition. To address this gap, we investigated whether visual prosodic information can facilitate speech segmentation. Previous research has demonstrated that language learners can use lexical stress and pitch cues to segment speech and that learners can extract this information from talking faces. Thus, we created an artificial speech stream that contained minimal segmentation cues and paired it with two synchronous facial displays in which visual prosody was either informative or uninformative for identifying word boundaries. Across three familiarisation conditions (audio stream alone, facial streams alone, and paired audiovisual), learning occurred only when the facial displays were informative to word boundaries, suggesting that facial cues can help learners solve the early challenges of language acquisition.

Original language: English (US)
Pages (from-to): 771-780
Number of pages: 10
Journal: Language, Cognition and Neuroscience
Volume: 29
Issue number: 7
DOI: 10.1080/01690965.2013.791703
State: Published - May 3 2013

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Experimental and Cognitive Psychology
  • Linguistics and Language
  • Cognitive Neuroscience

Cite this

@article{256e963745154182aaace85e909b92f5,
title = "Visual speech segmentation: Using facial cues to locate word boundaries in continuous speech",
abstract = "Speech is typically a multimodal phenomenon, yet few studies have focused on the exclusive contributions of visual cues to language acquisition. To address this gap, we investigated whether visual prosodic information can facilitate speech segmentation. Previous research has demonstrated that language learners can use lexical stress and pitch cues to segment speech and that learners can extract this information from talking faces. Thus, we created an artificial speech stream that contained minimal segmentation cues and paired it with two synchronous facial displays in which visual prosody was either informative or uninformative for identifying word boundaries. Across three familiarisation conditions (audio stream alone, facial streams alone, and paired audiovisual), learning occurred only when the facial displays were informative to word boundaries, suggesting that facial cues can help learners solve the early challenges of language acquisition.",
author = "Mitchel, {Aaron D.} and Weiss, {Daniel J.}",
year = "2013",
month = "5",
day = "3",
doi = "10.1080/01690965.2013.791703",
language = "English (US)",
volume = "29",
pages = "771--780",
journal = "Language, Cognition and Neuroscience",
issn = "2327-3798",
publisher = "Taylor and Francis",
number = "7",
}

Visual speech segmentation: Using facial cues to locate word boundaries in continuous speech. / Mitchel, Aaron D.; Weiss, Daniel J.

In: Language, Cognition and Neuroscience, Vol. 29, No. 7, 03.05.2013, p. 771-780.


TY  - JOUR
T1  - Visual speech segmentation
T2  - Using facial cues to locate word boundaries in continuous speech
AU  - Mitchel, Aaron D.
AU  - Weiss, Daniel J.
PY  - 2013/5/3
Y1  - 2013/5/3
N2  - Speech is typically a multimodal phenomenon, yet few studies have focused on the exclusive contributions of visual cues to language acquisition. To address this gap, we investigated whether visual prosodic information can facilitate speech segmentation. Previous research has demonstrated that language learners can use lexical stress and pitch cues to segment speech and that learners can extract this information from talking faces. Thus, we created an artificial speech stream that contained minimal segmentation cues and paired it with two synchronous facial displays in which visual prosody was either informative or uninformative for identifying word boundaries. Across three familiarisation conditions (audio stream alone, facial streams alone, and paired audiovisual), learning occurred only when the facial displays were informative to word boundaries, suggesting that facial cues can help learners solve the early challenges of language acquisition.
AB  - Speech is typically a multimodal phenomenon, yet few studies have focused on the exclusive contributions of visual cues to language acquisition. To address this gap, we investigated whether visual prosodic information can facilitate speech segmentation. Previous research has demonstrated that language learners can use lexical stress and pitch cues to segment speech and that learners can extract this information from talking faces. Thus, we created an artificial speech stream that contained minimal segmentation cues and paired it with two synchronous facial displays in which visual prosody was either informative or uninformative for identifying word boundaries. Across three familiarisation conditions (audio stream alone, facial streams alone, and paired audiovisual), learning occurred only when the facial displays were informative to word boundaries, suggesting that facial cues can help learners solve the early challenges of language acquisition.
UR  - http://www.scopus.com/inward/record.url?scp=84942945042&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84942945042&partnerID=8YFLogxK
U2  - 10.1080/01690965.2013.791703
DO  - 10.1080/01690965.2013.791703
M3  - Article
AN  - SCOPUS:84942945042
VL  - 29
SP  - 771
EP  - 780
JO  - Language, Cognition and Neuroscience
JF  - Language, Cognition and Neuroscience
SN  - 2327-3798
IS  - 7
ER  -