Emotion recognition from varying length patterns of speech using CNN-based segment-level pyramid match kernel based SVMs

Shikha Gupta, Kishalaya De, Dileep Aroor Dinesh, Veena Thenkanidiyoor

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Convolutional neural networks (CNNs) and their variants have achieved impressive performance on speech processing tasks such as spoken language identification, speaker verification, and speech emotion recognition. Conventionally, CNNs for speech applications take features from fixed-duration speech segments as input. In this work, we attempt to use features from the complete speech signal as input to the CNN. We propose using a spatial pyramid pooling (SPP) layer in the CNN architecture to remove the fixed-length constraint, so that features from varying-length speech signals can be given as input to the CNN for end-to-end training. The proposed architecture also yields a varying-size set of feature maps from the convolution layer. Further, we propose a novel CNN-based segment-level pyramid match kernel (CNN-SLPMK) as a dynamic kernel between a pair of varying-size sets of feature maps, for classification with support vector machines (SVMs). We demonstrate that the proposed approach achieves results comparable to state-of-the-art techniques on the speech emotion recognition task.
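To illustrate how a pyramid pooling layer can remove the fixed-length constraint, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `temporal_pyramid_pool`, the pyramid levels (1, 2, 4), the use of max pooling, and the choice to pool along the time axis only are all assumptions made for illustration. The point it shows is that feature maps with different numbers of frames are reduced to vectors of the same length.

```python
import numpy as np


def temporal_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (channels, frames) feature map into a fixed-length vector.

    Hypothetical helper for illustration. For each pyramid level with
    `n_bins` bins, the time axis is split into `n_bins` roughly equal
    segments and each segment is max-pooled per channel.
    """
    channels, frames = feature_map.shape
    pooled = []
    for n_bins in levels:
        # Fractional bin boundaries over the time axis.
        edges = np.linspace(0, frames, n_bins + 1)
        for b in range(n_bins):
            start = int(np.floor(edges[b]))
            # max(..., start + 1) keeps every slice non-empty, even when
            # there are fewer frames than bins (bins then overlap).
            end = max(int(np.ceil(edges[b + 1])), start + 1)
            pooled.append(feature_map[:, start:end].max(axis=1))
    # Output length is channels * sum(levels), independent of `frames`.
    return np.concatenate(pooled)


# Two utterances of different duration, 8 feature-map channels each
# (random values stand in for real convolution outputs).
rng = np.random.default_rng(0)
short_utt = rng.standard_normal((8, 37))
long_utt = rng.standard_normal((8, 211))
print(temporal_pyramid_pool(short_utt).shape)
print(temporal_pyramid_pool(long_utt).shape)
```

Both calls return a vector of length 56 (8 channels times 1 + 2 + 4 bins), so the layers after the pooling step see a fixed-size input regardless of utterance duration.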

Original language: English
Title of host publication: 25th National Conference on Communications, NCC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538692868
DOIs: https://doi.org/10.1109/NCC.2019.8732191
Publication status: Published - 01-02-2019
Externally published: Yes
Event: 25th National Conference on Communications, NCC 2019 - Bangalore, India
Duration: 20-02-2019 to 23-02-2019

Publication series

Name: 25th National Conference on Communications, NCC 2019

Conference

Conference: 25th National Conference on Communications, NCC 2019
Country: India
City: Bangalore
Period: 20-02-19 to 23-02-19

Fingerprint

Support vector machines
Neural networks
Speech recognition
Speech processing
Network architecture
Convolution
Classifiers

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Signal Processing
  • Safety, Risk, Reliability and Quality

Cite this

Gupta, S., De, K., Dinesh, D. A., & Thenkanidiyoor, V. (2019). Emotion recognition from varying length patterns of speech using CNN-based segment-level pyramid match kernel based SVMs. In 25th National Conference on Communications, NCC 2019 [8732191] (25th National Conference on Communications, NCC 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/NCC.2019.8732191



