What's in a face? Visual contributions to speech segmentation

Aaron D. Mitchel, Daniel J. Weiss

Research output: Contribution to journal › Article

18 Citations (Scopus)

Abstract

Recent research has demonstrated that adults successfully segment two interleaved artificial speech streams with incongruent statistics (i.e., streams whose combined statistics are noisier than the encapsulated statistics) only when provided with an indexical cue of speaker voice. In a series of five experiments, our study explores whether learners can utilise visual information to encapsulate statistics for each speech stream. We initially presented learners with incongruent artificial speech streams produced by the same female voice along with an accompanying visual display. Learners successfully segmented both streams when the audio stream was presented with an indexical cue of talking faces (Experiment 1). This learning cannot be attributed to the presence of the talking face display alone, as a single face paired with a single input stream did not improve segmentation (Experiment 2). Additionally, participants failed to successfully segment two streams when they were paired with a synchronised single talking face display (Experiment 3). Likewise, learners failed to successfully segment both streams when the visual indexical cue lacked audio-visual synchrony, such as changes in background screen colour (Experiment 4) or a static face display (Experiment 5). We end by discussing the possible relevance of the speaker's face in speech segmentation and bilingual language acquisition.
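The logic behind "incongruent statistics" can be made concrete with transitional probabilities, the segmentation cue standardly assumed in this statistical-learning literature. The sketch below (in Python, using hypothetical syllable inventories rather than the authors' actual stimuli) shows how a within-word transition that is perfectly predictive inside one stream becomes noisy once two incongruent streams are pooled, which is why learners must keep the statistics for each stream encapsulated:

import random
from collections import Counter
from itertools import pairwise  # Python 3.10+

def transitional_probabilities(syllables):
    """Estimate P(next syllable | current syllable) from a syllable stream."""
    pair_counts = Counter(pairwise(syllables))
    first_counts = Counter(syllables[:-1])  # how often each syllable opens a pair
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Hypothetical trisyllabic words for two miniature languages; language B reuses
# language A's syllables in conflicting orders, making the combined statistics
# "incongruent" in the sense used in the abstract.
words_a = [("pa", "bi", "ku"), ("ti", "bu", "do")]
words_b = [("pa", "do", "ti"), ("bi", "ku", "bu")]

random.seed(0)
stream_a = [s for w in random.choices(words_a, k=200) for s in w]
stream_b = [s for w in random.choices(words_b, k=200) for s in w]

tp_within_a = transitional_probabilities(stream_a)
tp_pooled = transitional_probabilities(stream_a + stream_b)

# Encapsulated statistics: within stream A, "pa" is always followed by "bi".
print(tp_within_a[("pa", "bi")])  # 1.0
# Pooled statistics: "pa" also precedes "do" in stream B, so the cue is diluted.
print(tp_pooled[("pa", "bi")])    # roughly 0.5

Concatenating the streams here stands in for interleaving them; the pooled pair counts are essentially the same either way. The point is only that the denominator for "pa" now mixes both languages, so a word-internal transition no longer stands out against word boundaries unless an indexical cue lets the learner track the two streams separately.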

Original language: English (US)
Pages (from-to): 456-482
Number of pages: 27
Journal: Language and Cognitive Processes
Volume: 25
Issue number: 4
DOIs: 10.1080/01690960903209888
State: Published - May 1 2010


All Science Journal Classification (ASJC) codes

  • Experimental and Cognitive Psychology
  • Language and Linguistics
  • Education
  • Linguistics and Language

Cite this

@article{c5057ce82f174f11a7c2777ab35257e6,
title = "What's in a face? Visual contributions to speech segmentation",
abstract = "Recent research has demonstrated that adults successfully segment two interleaved artificial speech streams with incongruent statistics (i.e., streams whose combined statistics are noisier than the encapsulated statistics) only when provided with an indexical cue of speaker voice. In a series of five experiments, our study explores whether learners can utilise visual information to encapsulate statistics for each speech stream. We initially presented learners with incongruent artificial speech streams produced by the same female voice along with an accompanying visual display. Learners successfully segmented both streams when the audio stream was presented with an indexical cue of talking faces (Experiment 1). This learning cannot be attributed to the presence of the talking face display alone, as a single face paired with a single input stream did not improve segmentation (Experiment 2). Additionally, participants failed to successfully segment two streams when they were paired with a synchronised single talking face display (Experiment 3). Likewise, learners failed to successfully segment both streams when the visual indexical cue lacked audio-visual synchrony, such as changes in background screen colour (Experiment 4) or a static face display (Experiment 5). We end by discussing the possible relevance of the speaker's face in speech segmentation and bilingual language acquisition.",
author = "Mitchel, {Aaron D.} and Weiss, {Daniel J.}",
year = "2010",
month = "5",
day = "1",
doi = "10.1080/01690960903209888",
language = "English (US)",
volume = "25",
pages = "456--482",
journal = "Language, Cognition and Neuroscience",
issn = "2327-3798",
publisher = "Taylor and Francis",
number = "4",

}

What's in a face? Visual contributions to speech segmentation. / Mitchel, Aaron D.; Weiss, Daniel J.

In: Language and Cognitive Processes, Vol. 25, No. 4, 01.05.2010, p. 456-482.

