Selection of optimal vocal tract regions using real-time magnetic resonance imaging for robust voice activity detection

Abhay Prasad, Prasanta Kumar Ghosh, Shrikanth S. Narayanan

Research output: Contribution to journal (conference article)

Abstract

Real-time magnetic resonance imaging (rtMRI) enables direct video capture of the moving vocal tract concurrently with the audio signal, providing valuable data for speech research. We consider a multimodal approach to voice activity detection (VAD) in rtMRI recordings that uses the audio signal as well as the MRI image sequence. The degraded quality of the audio recorded in the scanner motivates this multimodal scheme for robust VAD. Optimal regions in the MRI image for performing VAD are selected with a novel algorithm. VAD experiments using rtMRI data from two male and two female subjects show that VAD performance using optimally selected regions from the MRI images is comparable to that using only the audio signal. The optimal regions turn out to be parts of the jaw, velum, glottis and lips. When the audio is contaminated with additive noise at low SNR, VAD performance using the audio signal and the MRI image sequence together is significantly better (∼14% absolute improvement in VAD accuracy) than that using the audio alone.

Original language: English
Pages (from-to): 1539-1543
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 01-01-2014
Externally published: Yes
Event: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore
Duration: 14-09-2014 to 18-09-2014

Fingerprint

Voice Activity Detection
Magnetic Resonance Imaging
Magnetic resonance
Real-time Imaging
Real-time
Imaging techniques
Image Sequence
Additive Noise
Scanner
Vocal Tract
Concurrent

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

@article{10d7723c116e417194adf1e22751e555,
title = "Selection of optimal vocal tract regions using real-time magnetic resonance imaging for robust voice activity detection",
abstract = "Real time magnetic resonance imaging (rtMRI) enables direct video capture of the moving vocal tract concurrent with audio signal providing valuable data for speech research. We consider a multimodal approach to voice activity detection (VAD) in the rtMRI recording that uses audio signal as well as MRI image sequence. The degraded quality of the audio recorded in the scanner motivates this multimodal scheme for robust VAD. Optimal regions in the MRI image are selected for performing VAD with a novel algorithm. VAD experiments using rtMRI data of two male and two female subjects show that VAD performance using optimally selected regions from MRI images is comparable to that using only audio signal. The optimal regions turn out to be parts of jaw, velum, glottis and lips. VAD performance using audio signal and MRI image sequence together is found to be significantly better (∼14{\%} absolute improvement in VAD accuracy) than that using the audio only when the audio is contaminated with additive noise at low SNR.",
author = "Abhay Prasad and Ghosh, {Prasanta Kumar} and Narayanan, {Shrikanth S.}",
year = "2014",
month = "1",
day = "1",
language = "English",
pages = "1539--1543",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

Selection of optimal vocal tract regions using real-time magnetic resonance imaging for robust voice activity detection. / Prasad, Abhay; Ghosh, Prasanta Kumar; Narayanan, Shrikanth S.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 01.01.2014, p. 1539-1543.

TY - JOUR

T1 - Selection of optimal vocal tract regions using real-time magnetic resonance imaging for robust voice activity detection

AU - Prasad, Abhay

AU - Ghosh, Prasanta Kumar

AU - Narayanan, Shrikanth S.

PY - 2014/1/1

Y1 - 2014/1/1

AB - Real time magnetic resonance imaging (rtMRI) enables direct video capture of the moving vocal tract concurrent with audio signal providing valuable data for speech research. We consider a multimodal approach to voice activity detection (VAD) in the rtMRI recording that uses audio signal as well as MRI image sequence. The degraded quality of the audio recorded in the scanner motivates this multimodal scheme for robust VAD. Optimal regions in the MRI image are selected for performing VAD with a novel algorithm. VAD experiments using rtMRI data of two male and two female subjects show that VAD performance using optimally selected regions from MRI images is comparable to that using only audio signal. The optimal regions turn out to be parts of jaw, velum, glottis and lips. VAD performance using audio signal and MRI image sequence together is found to be significantly better (∼14% absolute improvement in VAD accuracy) than that using the audio only when the audio is contaminated with additive noise at low SNR.

UR - http://www.scopus.com/inward/record.url?scp=84910028717&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84910028717&partnerID=8YFLogxK

M3 - Conference article

SP - 1539

EP - 1543

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -