Non-uniform speaker normalization using affine-transformation

Bharath Kumar Sriperumbudur, S. Umesh, Rohit Sinha

Research output: Contribution to journalConference article

3 Citations (Scopus)

Abstract

In this paper, we propose a mathematical model to describe the relation between the formant frequencies of speakers and show that with the proposed affine model, speaker differences separate out as translation factors when a "mel-like" warping is performed. Using speech data we estimate the parameters of this warping function and show that it is close to the usual mel-formula. This model is motivated by Rohit et al.'s shift-based non-uniform speaker-normalization method, which provides improvement over the conventional maximum-likelihood based speaker normalization methods. We therefore provide a unified framework that relates the relationship between formants of speakers and method of removing speakers difference (which involves mel-warping) in a neat mathematical framework which is substantiated by our recognition experiments.

Original languageEnglish (US)
JournalICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume1
StatePublished - Sep 28 2004
EventProceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing - Montreal, Que, Canada
Duration: May 17 2004May 21 2004

Fingerprint

Maximum likelihood
mathematical models
Mathematical models
shift
estimates
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

@article{e56cb07f7efe4013a381bd0a804fbb18,
title = "Non-uniform speaker normalization using affine-transformation",
abstract = "In this paper, we propose a mathematical model to describe the relation between the formant frequencies of speakers and show that with the proposed affine model, speaker differences separate out as translation factors when a {"}mel-like{"} warping is performed. Using speech data we estimate the parameters of this warping function and show that it is close to the usual mel-formula. This model is motivated by Rohit et al.'s shift-based non-uniform speaker-normalization method, which provides improvement over the conventional maximum-likelihood based speaker normalization methods. We therefore provide a unified framework that relates the relationship between formants of speakers and method of removing speakers difference (which involves mel-warping) in a neat mathematical framework which is substantiated by our recognition experiments.",
author = "Sriperumbudur, {Bharath Kumar} and S. Umesh and Rohit Sinha",
year = "2004",
month = "9",
day = "28",
language = "English (US)",
volume = "1",
journal = "Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing",
issn = "0736-7791",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Non-uniform speaker normalization using affine-transformation

AU - Sriperumbudur, Bharath Kumar

AU - Umesh, S.

AU - Sinha, Rohit

PY - 2004/9/28

Y1 - 2004/9/28

N2 - In this paper, we propose a mathematical model to describe the relation between the formant frequencies of speakers and show that with the proposed affine model, speaker differences separate out as translation factors when a "mel-like" warping is performed. Using speech data we estimate the parameters of this warping function and show that it is close to the usual mel-formula. This model is motivated by Rohit et al.'s shift-based non-uniform speaker-normalization method, which provides improvement over the conventional maximum-likelihood based speaker normalization methods. We therefore provide a unified framework that relates the relationship between formants of speakers and method of removing speakers difference (which involves mel-warping) in a neat mathematical framework which is substantiated by our recognition experiments.

AB - In this paper, we propose a mathematical model to describe the relation between the formant frequencies of speakers and show that with the proposed affine model, speaker differences separate out as translation factors when a "mel-like" warping is performed. Using speech data we estimate the parameters of this warping function and show that it is close to the usual mel-formula. This model is motivated by Rohit et al.'s shift-based non-uniform speaker-normalization method, which provides improvement over the conventional maximum-likelihood based speaker normalization methods. We therefore provide a unified framework that relates the relationship between formants of speakers and method of removing speakers difference (which involves mel-warping) in a neat mathematical framework which is substantiated by our recognition experiments.

UR - http://www.scopus.com/inward/record.url?scp=4544359024&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4544359024&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:4544359024

VL - 1

JO - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

JF - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

SN - 0736-7791

ER -