Emotion recognition from varying length patterns of speech using CNN-based segment-level pyramid match kernel based SVMs

Shikha Gupta, Kishalaya De, Dileep Aroor Dinesh, Veena Thenkanidiyoor

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

4 Citations (Scopus)

Abstract

Convolutional neural networks (CNNs) and their variants have achieved impressive performance on speech processing tasks such as spoken language identification, speaker verification, and speech emotion recognition. Conventionally, CNNs for speech applications take features from fixed-duration speech segments as input. In this work, we attempt to use features from the complete speech signal as input to the CNN. We propose using a spatial pyramid pooling (SPP) layer in the CNN architecture to remove the fixed-length constraint, so that features from varying-length speech signals can be given as input to the CNN for end-to-end training. The proposed architecture also yields a varying-size set of feature maps from the convolution layer. Further, we propose a novel CNN-based segment-level pyramid match kernel (CNN-SLPMK) as a dynamic kernel between a pair of varying-size sets of feature maps, for use in a classification framework based on support vector machines (SVMs). We demonstrate that the proposed approach achieves results comparable to state-of-the-art techniques on the speech emotion recognition task.
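
The fixed-length output of a pyramid pooling layer, as mentioned in the abstract, can be illustrated with a minimal sketch. The abstract does not give the pyramid levels, the pooling operator, or the bin layout used in the paper, so everything here is an assumption made for illustration: the function name `spp_1d`, max pooling, pooling along the time axis only, and the levels (1, 2, 4). It is not the authors' implementation.

```python
from typing import List, Sequence


def spp_1d(feature_map: Sequence[Sequence[float]],
           levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Pool a (channels x time) feature map into a fixed-length vector.

    Hypothetical helper: each channel is split into `n_bins` temporal bins
    for every pyramid level, and the maximum of each bin is kept. The output
    length is channels * sum(levels), regardless of the number of frames.
    """
    pooled: List[float] = []
    for channel in feature_map:
        t = len(channel)
        if t == 0:
            raise ValueError("each channel needs at least one frame")
        for n_bins in levels:
            for b in range(n_bins):
                # Integer floor/ceil bin edges; bins may overlap when t is
                # not divisible by n_bins, but they are never empty.
                start = (b * t) // n_bins
                end = -(-((b + 1) * t) // n_bins)
                pooled.append(max(channel[start:end]))
    return pooled


# Two "utterances" with the same number of channels but different lengths.
short_utt = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]]       # 2 channels, 5 frames
long_utt = [list(range(12)), list(range(12, 0, -1))]  # 2 channels, 12 frames

print(len(spp_1d(short_utt)), len(spp_1d(long_utt)))
```

Both calls return vectors of the same length (2 channels times 7 bins), which is what lets layers that expect a fixed-size input follow a convolution layer whose output size varies with utterance duration.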

Original language: English
Title of host publication: 25th National Conference on Communications, NCC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538692868
Publication status: Published - 01-02-2019
Externally published: Yes
Event: 25th National Conference on Communications, NCC 2019 - Bangalore, India
Duration: 20-02-2019 to 23-02-2019

Publication series

Name: 25th National Conference on Communications, NCC 2019

Conference

Conference: 25th National Conference on Communications, NCC 2019
Country/Territory: India
City: Bangalore
Period: 20-02-19 to 23-02-19

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Signal Processing
  • Safety, Risk, Reliability and Quality
