Comparison of t-test ranking with PCA and SEPCOR feature selection for wake and stage 1 sleep pattern recognition in multichannel electroencephalograms

T. K. Padma Shri, N. Sriraam

Research output: Contribution to journal > Article

11 Citations (Scopus)

Abstract

Feature selection is critical for effective data analysis and resource savings. In multi-dimensional datasets, feature selection methods mainly use filter-based approaches to obtain an optimal feature subspace and wrapper methods to search for an optimal feature subset within this space. In this study, two filter-based statistical feature selection methods, namely statistical t-test ranking with principal component analysis (PCA) and Separability & Correlation (SEPCOR) analysis, are applied to identify patterns with high discrimination between wake and stage 1 sleep in an 8-channel (6 active + 2 reference electrodes) electroencephalogram (EEG) sleep dataset. The feature set consists of 6-dimensional spectral entropy (SE) vectors computed over EEG epochs of one second duration. In the first method, SE features are ranked using a t-test statistic that maximizes the class separation between wakefulness and stage 1 sleep. Prior to classification, PCA is performed on the ranked and non-ranked feature subsets to study the contribution of ranked channels to classifier performance. The second method uses SEPCOR analysis to automatically select an optimal feature subset with low correlation among the chosen features and maximum separation between their class means. The correlation threshold is varied heuristically from 0.6 to 0.75 in steps of 0.05 to select different feature subsets. The optimal feature subsets are evaluated using multilayer perceptron (MLP) network and k-nearest neighbor (k-NN) classifiers with 50% hold-out cross-validation. For ranked feature subsets (N = 3, 4, 5), the k-NN classifier outperforms the MLP network as the number of principal components (pcs) increases. Results indicate that the pcs of ranked channels enhance the performance of the k-NN classifier, whereas the MLP network shows only a marginal improvement with ranking for number of channels N ≤ 4.
As the number of pcs is varied from 2 to 4 in steps of one, the classification accuracy of the k-NN classifier improves by approximately 2% with ranking compared with the non-ranked counterparts. The MLP exhibits only a 1% improvement with ranking for the same case with 50 hidden neurons. The k-NN classifier achieves maximum accuracies of 96.43%, 95.7% and 94.10% (pc = 4, 3, 2 with N = 4 ranked channels) compared with 94.71%, 93.13% and 92% (pc = 4, 3, 2 with N = 4 non-ranked channels), respectively. The SEPCOR results show that as the correlation threshold increases from 0.6 to 0.75 in steps of 0.05, the method automatically selects feature subsets of size 2, 3, 4 and 5, which yield detection accuracies of 72.4%, 80%, 91.6% and 92% with the k-NN classifier and improved accuracies of 73%, 85%, 95.6% and 95.8% with the MLP network (50 hidden neurons), respectively. SE feature ranking provides better classification results with the k-NN classifier than the non-ranked cases, whereas features obtained using SEPCOR analysis prove to be better discriminators with the MLP network for the classification of wake/stage 1 sleep data. The k-NN classifier is computationally faster and its speed is independent of the value of k, whereas the MLP requires considerably more training time, depending on the number of hidden neurons.
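
The first method described in the abstract (t-test ranking of channels followed by PCA) can be illustrated with a minimal sketch. The abstract does not specify which t-test variant was used or how PCA was implemented, so the Welch statistic, the SVD-based projection, the function names and the synthetic data below are all assumptions made for illustration and are not the authors' code or data.

```python
import numpy as np

def t_scores(X, y):
    """Absolute Welch t-statistic per feature column for classes 0 and 1."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

def rank_channels(X, y):
    """Channel indices sorted from most to least discriminative."""
    return np.argsort(t_scores(X, y))[::-1]

def pca_project(X, n_components):
    """Project mean-centered data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

# Synthetic stand-in for 6-channel spectral entropy features (hypothetical data).
rng = np.random.default_rng(0)
n = 200                                # epochs per class
y = np.repeat([0, 1], n)               # 0 = wake, 1 = stage 1 sleep (labels assumed)
X = rng.normal(size=(2 * n, 6))
X[y == 1, 2] += 2.0                    # channel 2: strongly separated classes
X[y == 1, 4] += 1.0                    # channel 4: moderately separated classes

order = rank_channels(X, y)            # ranking of all 6 channels
top4 = order[:4]                       # keep the N = 4 best-ranked channels
Z = pca_project(X[:, top4], n_components=2)   # 2 pcs of the ranked subset
```

The projected matrix Z would then be passed to a classifier such as k-NN or an MLP; the non-ranked comparison in the abstract corresponds to applying the same projection to channels chosen without ranking.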

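The second method selects features that separate the class means well while being weakly correlated with one another. The abstract does not give the exact SEPCOR formulas, so the sketch below assumes a common greedy formulation: a separability score (variance of the class means divided by the mean within-class variance) and a correlation filter computed on the pooled data. The function names, the score definition and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def separability(X, y):
    """Variance of the class means over the mean within-class variance, per feature."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0, ddof=1) for c in classes])
    return means.var(axis=0, ddof=1) / variances.mean(axis=0)

def sepcor_select(X, y, max_corr):
    """Greedy selection in order of decreasing separability.

    A feature is skipped if its absolute correlation with any already
    selected feature is >= max_corr.
    """
    order = np.argsort(separability(X, y))[::-1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = []
    for j in order:
        if all(corr[j, k] < max_corr for k in selected):
            selected.append(int(j))
    return selected

# Synthetic 6-feature data (hypothetical): feature 1 is a near-copy of feature 0.
rng = np.random.default_rng(1)
n = 300
y = np.repeat([0, 1], n)
X = rng.normal(size=(2 * n, 6))
X[y == 1, 0] += 2.0
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=2 * n)

strict = sepcor_select(X, y, max_corr=0.6)    # drops one of the redundant pair
loose = sepcor_select(X, y, max_corr=1.01)    # threshold above 1 disables the filter
```

In this formulation a higher correlation threshold admits more features, which is consistent with the subset size growing from 2 to 5 as the threshold rises from 0.6 to 0.75 in the abstract.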
Original language: English
Pages (from-to): 499-512
Number of pages: 14
Journal: Biomedical Signal Processing and Control
Volume: 31
DOI: 10.1016/j.bspc.2016.09.016
Publication status: Published - 01-01-2017

Fingerprint

Neural Networks (Computer)
Sleep Stages
Electroencephalography
Principal Component Analysis
Pattern recognition
Feature extraction
Classifiers
Neural networks
Sleep
Entropy
Neurons
Wakefulness
Discriminators
Statistical tests
Electrodes
Statistics

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Health Informatics

Cite this

@article{3eaf455df20d43b19610b0778cb75053,
title = "Comparison of t-test ranking with PCA and SEPCOR feature selection for wake and stage 1 sleep pattern recognition in multichannel electroencephalograms",
author = "{Padma Shri}, {T. K.} and N. Sriraam",
year = "2017",
month = "1",
day = "1",
doi = "10.1016/j.bspc.2016.09.016",
language = "English",
volume = "31",
pages = "499--512",
journal = "Biomedical Signal Processing and Control",
issn = "1746-8094",
publisher = "Elsevier BV",

}

TY - JOUR

T1 - Comparison of t-test ranking with PCA and SEPCOR feature selection for wake and stage 1 sleep pattern recognition in multichannel electroencephalograms

AU - Padma Shri, T. K.

AU - Sriraam, N.

PY - 2017/1/1

Y1 - 2017/1/1

UR - http://www.scopus.com/inward/record.url?scp=84989359982&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84989359982&partnerID=8YFLogxK

U2 - 10.1016/j.bspc.2016.09.016

DO - 10.1016/j.bspc.2016.09.016

M3 - Article

AN - SCOPUS:84989359982

VL - 31

SP - 499

EP - 512

JO - Biomedical Signal Processing and Control

JF - Biomedical Signal Processing and Control

SN - 1746-8094

ER -