Study of non-linear frequency warping functions for speaker normalization

Bharath Kumar Sriperumbudur, S. Umesh, R. Sinha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

In this paper, we study non-linear frequency-warping functions that are commonly used in speaker normalization. This study is motivated by our recently proposed affine-transformation model for speaker normalization [1], which has provided improved recognition performance compared to the uniform-scaling model [1, 2]. Using formant data from the Peterson & Barney and Hillenbrand vowel databases, we analyze the behavior of the scale factor as a function of frequency. Empirical observations [3, 4] show that while the uniform-scaling assumption may be valid at higher frequencies, there are significant deviations at low frequencies. We show that while our recently proposed model behaves similarly to the empirical result, the behavior of many commonly used non-linear models (including the Eide-Gish, power-law, and bilinear-transformation models) differs significantly from it. This difference from the empirical observation may explain the limited improvement in recognition performance that these non-linear models provide over the conventional uniform-scaling model. We also show that our proposed model fits the formant data better than these non-linear models. We therefore conclude that the affine-transformation model may be a more appropriate non-linear model for speaker normalization.
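To make the idea of a frequency-dependent scale factor concrete, the following is a minimal sketch, not the authors' code. It evaluates the ratio of warped to original frequency for three textbook warping functions. The parametrizations, the parameter values (1.1, 0.95, 0.05), and the 8000 Hz Nyquist frequency are assumptions chosen for illustration. The abstract does not give the functional forms used in the paper, and the affine and Eide-Gish models are left out because their exact forms are not stated here.

```python
import math

F_NYQ = 8000.0  # assumed Nyquist frequency in Hz (16 kHz sampling); not from the paper


def uniform_warp(f, alpha):
    """Uniform scaling: every frequency is multiplied by the same factor."""
    return alpha * f


def power_law_warp(f, beta, f_nyq=F_NYQ):
    """One common power-law form, normalized so that f_nyq maps to itself."""
    return f_nyq * (f / f_nyq) ** beta


def bilinear_warp(f, a, f_nyq=F_NYQ):
    """Frequency mapping of a first-order all-pass (bilinear) transform."""
    w = math.pi * f / f_nyq  # normalized angular frequency in [0, pi]
    w_warped = w + 2.0 * math.atan2(a * math.sin(w), 1.0 - a * math.cos(w))
    return f_nyq * w_warped / math.pi


def scale_factor(warp, f, param):
    """Ratio of warped to original frequency at f."""
    return warp(f, param) / f


# Illustrative probe frequencies in Hz (hypothetical, not formant data).
freqs = [250.0, 500.0, 1000.0, 2000.0, 4000.0]

profiles = {
    "uniform": [scale_factor(uniform_warp, f, 1.1) for f in freqs],
    "power_law": [scale_factor(power_law_warp, f, 0.95) for f in freqs],
    "bilinear": [scale_factor(bilinear_warp, f, 0.05) for f in freqs],
}

for name, values in profiles.items():
    print(name, [round(v, 4) for v in values])
```

Under these assumed forms, the uniform model gives a constant scale factor, while the power-law and bilinear models give a scale factor that varies with frequency. A profile of this kind is what the paper compares against the empirical behavior derived from formant data.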

Original language: English (US)
Title of host publication: 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings
State: Published - Dec 1 2006
Event: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006 - Toulouse, France
Duration: May 14 2006 - May 19 2006

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 1
ISSN (Print): 1520-6149

Other

Other: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006
Country: France
City: Toulouse
Period: 5/14/06 - 5/19/06

Fingerprint

Probability density function
scaling
vowels
low frequencies
deviation

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Sriperumbudur, B. K., Umesh, S., & Sinha, R. (2006). Study of non-linear frequency warping functions for speaker normalization. In 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings [1660253] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 1).
Sriperumbudur, Bharath Kumar ; Umesh, S. ; Sinha, R. / Study of non-linear frequency warping functions for speaker normalization. 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings. 2006. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).
@inproceedings{872f51952b2943fa94e82d1c96876554,
title = "Study of non-linear frequency warping functions for speaker normalization",
abstract = "In this paper, we study non-linear frequency-warping functions that are commonly used in speaker normalization. This study is motivated by our recently proposed affine-transformation model for speaker normalization [1], which has provided improved recognition performance compared to the uniform-scaling model [1, 2]. Using formant data from the Peterson \& Barney and Hillenbrand vowel databases, we analyze the behavior of the scale factor as a function of frequency. Empirical observations [3, 4] show that while the uniform-scaling assumption may be valid at higher frequencies, there are significant deviations at low frequencies. We show that while our recently proposed model behaves similarly to the empirical result, the behavior of many commonly used non-linear models (including the Eide-Gish, power-law, and bilinear-transformation models) differs significantly from it. This difference from the empirical observation may explain the limited improvement in recognition performance that these non-linear models provide over the conventional uniform-scaling model. We also show that our proposed model fits the formant data better than these non-linear models. We therefore conclude that the affine-transformation model may be a more appropriate non-linear model for speaker normalization.",
author = "Sriperumbudur, {Bharath Kumar} and S. Umesh and R. Sinha",
year = "2006",
month = "12",
day = "1",
language = "English (US)",
isbn = "142440469X",
series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
booktitle = "2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings",

}

Sriperumbudur, BK, Umesh, S & Sinha, R 2006, Study of non-linear frequency warping functions for speaker normalization. in 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings., 1660253, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 1, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006, Toulouse, France, 5/14/06.

Study of non-linear frequency warping functions for speaker normalization. / Sriperumbudur, Bharath Kumar; Umesh, S.; Sinha, R.

2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings. 2006. 1660253 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 1).

TY - GEN

T1 - Study of non-linear frequency warping functions for speaker normalization

AU - Sriperumbudur, Bharath Kumar

AU - Umesh, S.

AU - Sinha, R.

PY - 2006/12/1

Y1 - 2006/12/1

N2 - In this paper, we study non-linear frequency-warping functions that are commonly used in speaker normalization. This study is motivated by our recently proposed affine-transformation model for speaker normalization [1], which has provided improved recognition performance compared to the uniform-scaling model [1, 2]. Using formant data from the Peterson & Barney and Hillenbrand vowel databases, we analyze the behavior of the scale factor as a function of frequency. Empirical observations [3, 4] show that while the uniform-scaling assumption may be valid at higher frequencies, there are significant deviations at low frequencies. We show that while our recently proposed model behaves similarly to the empirical result, the behavior of many commonly used non-linear models (including the Eide-Gish, power-law, and bilinear-transformation models) differs significantly from it. This difference from the empirical observation may explain the limited improvement in recognition performance that these non-linear models provide over the conventional uniform-scaling model. We also show that our proposed model fits the formant data better than these non-linear models. We therefore conclude that the affine-transformation model may be a more appropriate non-linear model for speaker normalization.

UR - http://www.scopus.com/inward/record.url?scp=33947690236&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33947690236&partnerID=8YFLogxK

M3 - Conference contribution

SN - 142440469X

SN - 9781424404698

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

BT - 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings

ER -

Sriperumbudur BK, Umesh S, Sinha R. Study of non-linear frequency warping functions for speaker normalization. In 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings. 2006. 1660253. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).