Estimation of the invariant and variant characteristics in speech articulation and its application to speaker identification

Abhay Prasad, Vijitha Periyasamy, Prasanta Kumar Ghosh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

Speech articulation varies across speakers for producing a speech sound due to the differences in their vocal tract morphologies, though the speech motor actions are executed in terms of relatively invariant gestures [1]. While the invariant articulatory gestures are driven by the linguistic content of the spoken utterance, the component of speech articulation that varies across speakers reflects speaker-specific and other paralinguistic information. In this work, we present a formulation to decompose the speech articulation from multiple speakers into the variant and invariant aspects when they speak the same sentence. The variant component is found to be a better representation for discriminating speakers compared to the speech articulation which includes the invariant part. Experiments with real-time magnetic resonance imaging (rtMRI) videos of speech production from multiple speakers reveal that the variant component of speech articulation yields a better frame-level speaker identification accuracy compared to the speech articulation as well as acoustic features by 29.9% and 9.4% (absolute) respectively.
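The abstract does not spell out the decomposition itself, so the following is only a minimal sketch of the general idea, not the formulation from the paper. It assumes the utterances of the same sentence are already time-aligned across speakers, models the invariant part as the frame-wise mean across speakers, and treats each speaker's deviation from that mean as the variant part. The function name `decompose` and the toy data are hypothetical.

```python
from typing import List, Tuple

Frames = List[List[float]]  # one utterance: T frames x D articulatory features


def decompose(utterances: List[Frames]) -> Tuple[Frames, List[Frames]]:
    """Split time-aligned utterances into a shared part and per-speaker residuals.

    ASSUMPTION: invariant = frame-wise mean across speakers,
    variant = each speaker's deviation from that mean.
    This is an illustrative stand-in, not the paper's formulation.
    """
    n_speakers = len(utterances)
    n_frames = len(utterances[0])
    n_feats = len(utterances[0][0])
    # All speakers must share the same frame count and feature size (pre-aligned).
    for u in utterances:
        if len(u) != n_frames or any(len(f) != n_feats for f in u):
            raise ValueError("utterances must be time-aligned with equal feature size")

    # Shared component: average over speakers at every frame and feature.
    invariant = [
        [sum(u[t][d] for u in utterances) / n_speakers for d in range(n_feats)]
        for t in range(n_frames)
    ]
    # Speaker-specific component: what is left after removing the shared part.
    variant = [
        [[u[t][d] - invariant[t][d] for d in range(n_feats)] for t in range(n_frames)]
        for u in utterances
    ]
    return invariant, variant


# Toy data (hypothetical): one common gesture trajectory plus a fixed offset per speaker.
gesture = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
offsets = [[-0.5, 0.25], [0.5, -0.25]]
utterances = [
    [[g + o for g, o in zip(frame, off)] for frame in gesture] for off in offsets
]
invariant, variant = decompose(utterances)
```

In this toy case the offsets average to zero, so `invariant` recovers the common trajectory and each entry of `variant` holds that speaker's offset at every frame. A frame-level speaker classifier would then be trained on the `variant` frames rather than on the raw articulation.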

Original language: English
Title of host publication: 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4265-4269
Number of pages: 5
Volume: 2015-August
ISBN (Electronic): 9781467369978
DOI: https://doi.org/10.1109/ICASSP.2015.7178775
Publication status: Published - 01-01-2015
Externally published: Yes
Event: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Brisbane, Australia
Duration: 19-04-2015 to 24-04-2015

Conference

Conference: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015
Country: Australia
City: Brisbane
Period: 19-04-15 to 24-04-15

Fingerprint

Magnetic resonance
Linguistics
Acoustics
Acoustic waves
Imaging techniques
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Prasad, A., Periyasamy, V., & Ghosh, P. K. (2015). Estimation of the invariant and variant characteristics in speech articulation and its application to speaker identification. In 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings (Vol. 2015-August, pp. 4265-4269). [7178775] Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICASSP.2015.7178775
@inproceedings{d335e06c55e343f784335141f885656b,
title = "Estimation of the invariant and variant characteristics in speech articulation and its application to speaker identification",
abstract = "Speech articulation varies across speakers for producing a speech sound due to the differences in their vocal tract morphologies, though the speech motor actions are executed in terms of relatively invariant gestures [1]. While the invariant articulatory gestures are driven by the linguistic content of the spoken utterance, the component of speech articulation that varies across speakers reflects speaker-specific and other paralinguistic information. In this work, we present a formulation to decompose the speech articulation from multiple speakers into the variant and invariant aspects when they speak the same sentence. The variant component is found to be a better representation for discriminating speakers compared to the speech articulation which includes the invariant part. Experiments with real-time magnetic resonance imaging (rtMRI) videos of speech production from multiple speakers reveal that the variant component of speech articulation yields a better frame-level speaker identification accuracy compared to the speech articulation as well as acoustic features by 29.9{\%} and 9.4{\%} (absolute) respectively.",
author = "Abhay Prasad and Vijitha Periyasamy and Ghosh, {Prasanta Kumar}",
year = "2015",
month = "1",
day = "1",
doi = "10.1109/ICASSP.2015.7178775",
language = "English",
volume = "2015-August",
pages = "4265--4269",
booktitle = "2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN
T1 - Estimation of the invariant and variant characteristics in speech articulation and its application to speaker identification
AU - Prasad, Abhay
AU - Periyasamy, Vijitha
AU - Ghosh, Prasanta Kumar
PY - 2015/1/1
Y1 - 2015/1/1
AB - Speech articulation varies across speakers for producing a speech sound due to the differences in their vocal tract morphologies, though the speech motor actions are executed in terms of relatively invariant gestures [1]. While the invariant articulatory gestures are driven by the linguistic content of the spoken utterance, the component of speech articulation that varies across speakers reflects speaker-specific and other paralinguistic information. In this work, we present a formulation to decompose the speech articulation from multiple speakers into the variant and invariant aspects when they speak the same sentence. The variant component is found to be a better representation for discriminating speakers compared to the speech articulation which includes the invariant part. Experiments with real-time magnetic resonance imaging (rtMRI) videos of speech production from multiple speakers reveal that the variant component of speech articulation yields a better frame-level speaker identification accuracy compared to the speech articulation as well as acoustic features by 29.9% and 9.4% (absolute) respectively.
UR - http://www.scopus.com/inward/record.url?scp=84946010858&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84946010858&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2015.7178775
DO - 10.1109/ICASSP.2015.7178775
M3 - Conference contribution
VL - 2015-August
SP - 4265
EP - 4269
BT - 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
ER -
