Nonuniform speaker normalization using affine transformation

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

In this paper, a well-motivated nonuniform speaker normalization model that affinely relates the formant frequencies of speakers enunciating the same sound is proposed. Using the proposed affine model, the corresponding universal-warping function that is required for normalization is shown to have the same parametric form as the mel scale formula. The parameters of this universal-warping function are estimated from the vowel formant data and are shown to be close to those of the commonly used mel scale formula. This shows an interesting connection between nonuniform speaker normalization and the psychoacoustics-based mel scale. In addition, the affine model fits the vowel formant data better than commonly used ad hoc normalization models. This work is motivated by a desire to improve the performance of speaker-independent speech recognition systems, where speaker normalization is conventionally done by assuming a linear-scaling relationship between the spectra of speakers. The proposed affine relation is extended to describe the relationship between the spectra of speakers enunciating the same sound. On a telephone-based connected digit recognition task, the proposed model provides improved recognition performance over the linear-scaling model.
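The link between an affine formant relation and a mel-type warping can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the authors' estimation procedure. It assumes that speaker B's formants relate to speaker A's as F_B = alpha * F_A + beta, with the offset tied to a speaker-independent constant f0 by beta = f0 * (alpha - 1). Under that assumption, 1 + F_B/f0 = alpha * (1 + F_A/f0), so a warp of the form scale * log10(1 + f/f0) turns the affine relation into a constant shift of scale * log10(alpha). The helper names (fit_linear_scaling, fit_affine, mel_like_warp), the formant values, and alpha are hypothetical; 2595 and 700 Hz are the constants of the commonly used mel formula.

```python
import numpy as np


def fit_linear_scaling(f_ref, f_spk):
    """Least-squares fit of f_spk ~ alpha * f_ref (pure scaling, no offset).

    Returns (alpha, sum of squared residuals)."""
    f_ref = np.asarray(f_ref, dtype=float)
    f_spk = np.asarray(f_spk, dtype=float)
    alpha = np.dot(f_ref, f_spk) / np.dot(f_ref, f_ref)
    resid = f_spk - alpha * f_ref
    return float(alpha), float(np.sum(resid ** 2))


def fit_affine(f_ref, f_spk):
    """Least-squares fit of f_spk ~ alpha * f_ref + beta.

    Returns (alpha, beta, sum of squared residuals)."""
    f_ref = np.asarray(f_ref, dtype=float)
    f_spk = np.asarray(f_spk, dtype=float)
    design = np.column_stack([f_ref, np.ones_like(f_ref)])
    (alpha, beta), *_ = np.linalg.lstsq(design, f_spk, rcond=None)
    resid = f_spk - (alpha * f_ref + beta)
    return float(alpha), float(beta), float(np.sum(resid ** 2))


def mel_like_warp(f, f0=700.0, scale=2595.0):
    """Warp with the parametric form of the mel formula: scale * log10(1 + f / f0)."""
    return scale * np.log10(1.0 + np.asarray(f, dtype=float) / f0)


if __name__ == "__main__":
    # Hypothetical formant frequencies (Hz) for a reference speaker; illustrative only.
    f_ref = np.array([300.0, 700.0, 1200.0, 2300.0, 2800.0])
    alpha, f0 = 1.15, 700.0
    beta = f0 * (alpha - 1.0)  # assumption: offset tied to a shared f0
    f_spk = alpha * f_ref + beta

    print("linear scaling (alpha, SSE):", fit_linear_scaling(f_ref, f_spk))
    print("affine (alpha, beta, SSE):", fit_affine(f_ref, f_spk))

    # In the warped domain the affine relation becomes a constant shift.
    shift = mel_like_warp(f_spk, f0) - mel_like_warp(f_ref, f0)
    print("warped shift per formant:", shift)
    print("2595 * log10(alpha):", 2595.0 * np.log10(alpha))
```

In this toy setup the pure-scaling fit leaves a clearly nonzero residual because it cannot absorb the offset, while the affine fit is exact, and the warped difference is identical for every formant. This is the sense in which one universal warping followed by a speaker-dependent shift can normalize the two speakers.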

Original language: English (US)
Pages (from-to): 1727-1738
Number of pages: 12
Journal: Journal of the Acoustical Society of America
Volume: 124
Issue number: 3
State: Published - 2008

All Science Journal Classification (ASJC) codes

  • Arts and Humanities (miscellaneous)
  • Acoustics and Ultrasonics

