Exploiting heterogeneous features to improve in silico prediction of peptide status - amyloidogenic or non-amyloidogenic

Research output: Contribution to journal, Article

11 Citations (Scopus)

Abstract

Background: Predicting short stretches in protein sequences that can form amyloid-like fibrils is important for understanding the underlying cause of amyloid illnesses, and thereby aids the discovery of sequence-targeted anti-aggregation pharmaceuticals. Because experimental molecular techniques are constrained in identifying such motif segments, computational methods that provide better and more affordable in silico predictions are highly desirable.

Results: Accurate in silico prediction of amyloidogenic peptide regions relies on the cooperation between informative features and classifier design. In this article, we propose an efficient fibril prediction implementation that exploits heterogeneous features based on bio-physio-chemical (BPC) properties, the auto-correlation function of carefully selected amino acid indices, and atomic composition within a window of amino acids in a protein fragment. To obtain an optimal number of BPC features, we use an evolutionary Support Vector Machine (SVM) that integrates an SVM with a novel implementation of a hybrid Genetic Algorithm termed a Memetic Algorithm. Five prediction modules designed using Artificial Neural Network (ANN) models are trained with independent and integrated features in order to validate the fibril-forming motifs. The results provide evidence that incorporating a new feature, the auto-correlation function, alongside BPC properties strengthens the sequence interaction effect in the feature vector, thereby yielding better prediction quality in terms of sensitivity, specificity, Matthews Correlation Coefficient and Area under the Receiver Operating Characteristic curve.

Conclusion: A significant improvement in performance is observed when features such as the auto-correlation function, which maintains the sequence order effect, are introduced in addition to the conventional BPC properties selected through a novel optimization strategy to predict the peptide status, amyloidogenic or non-amyloidogenic. The proposed approach achieves acceptable results, comparable to most online predictors. It also fills a gap in existing amyloid fibril prediction tools by maintaining a balance between sensitivity and specificity.
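
The abstract does not spell out how the auto-correlation feature or the evaluation metrics are computed, so the following is only a minimal sketch under stated assumptions. It uses a mean-centred auto-correlation over a peptide window, with the Kyte-Doolittle hydropathy scale standing in for the "carefully selected amino acid indices" (the indices the authors actually used are not named here). The function names `autocorrelation_features` and `confusion_metrics` and the `max_lag` parameter are hypothetical. The second function shows the standard definitions of sensitivity, specificity and the Matthews Correlation Coefficient from confusion-matrix counts.

```python
import math

# Kyte-Doolittle hydropathy scale. ASSUMPTION: chosen only for illustration;
# the abstract does not say which amino acid indices the authors selected.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation_features(peptide, index=KYTE_DOOLITTLE, max_lag=3):
    """Return one mean-centred auto-correlation value per lag 1..max_lag.

    ASSUMPTION: AC(lag) = sum_i (p_i - mean) * (p_{i+lag} - mean) / (n - lag),
    a common formulation that may differ from the one used in the paper.
    """
    values = [index[residue] for residue in peptide.upper()]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the window length")
    mean = sum(values) / n
    centred = [v - mean for v in values]
    features = []
    for lag in range(1, max_lag + 1):
        total = sum(centred[i] * centred[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


def confusion_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


if __name__ == "__main__":
    # The hexapeptide below is an arbitrary example window, not data from the paper.
    print(autocorrelation_features("STVIIE", max_lag=2))
    print(confusion_metrics(tp=8, tn=6, fp=4, fn=2))
```

Because each value pairs residues a fixed distance apart, features of this kind depend on residue order, which a plain composition count would discard. Reporting MCC alongside sensitivity and specificity makes an imbalance between the two visible in a single number.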

Original language: English
Article number: S21
Journal: BMC Bioinformatics
Volume: 12
Issue number: SUPPL. 13
DOI: 10.1186/1471-2105-12-S13-S21
Publication status: Published - 30-11-2011

Fingerprint

Amyloid
Peptides
Computer Simulation
Prediction
Autocorrelation Function
Amino Acids
Sensitivity and Specificity
Autocorrelation
Neural Networks (Computer)
ROC Curve
Proteins
Chemical properties
Specificity
Support vector machines
Amino acids
Support Vector Machine
Hybrid Genetic Algorithm
Memetic Algorithm
Interaction Effects
Research

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

@article{be2f48f05e034b8fba350bf0c255bec0,
title = "Exploiting heterogeneous features to improve in silico prediction of peptide status - amyloidogenic or non-amyloidogenic",
abstract = "Background: Predicting short stretches in protein sequences that can form amyloid-like fibrils is important for understanding the underlying cause of amyloid illnesses, and thereby aids the discovery of sequence-targeted anti-aggregation pharmaceuticals. Because experimental molecular techniques are constrained in identifying such motif segments, computational methods that provide better and more affordable in silico predictions are highly desirable. Results: Accurate in silico prediction of amyloidogenic peptide regions relies on the cooperation between informative features and classifier design. In this article, we propose an efficient fibril prediction implementation that exploits heterogeneous features based on bio-physio-chemical (BPC) properties, the auto-correlation function of carefully selected amino acid indices, and atomic composition within a window of amino acids in a protein fragment. To obtain an optimal number of BPC features, we use an evolutionary Support Vector Machine (SVM) that integrates an SVM with a novel implementation of a hybrid Genetic Algorithm termed a Memetic Algorithm. Five prediction modules designed using Artificial Neural Network (ANN) models are trained with independent and integrated features in order to validate the fibril-forming motifs. The results provide evidence that incorporating a new feature, the auto-correlation function, alongside BPC properties strengthens the sequence interaction effect in the feature vector, thereby yielding better prediction quality in terms of sensitivity, specificity, Matthews Correlation Coefficient and Area under the Receiver Operating Characteristic curve. Conclusion: A significant improvement in performance is observed when features such as the auto-correlation function, which maintains the sequence order effect, are introduced in addition to the conventional BPC properties selected through a novel optimization strategy to predict the peptide status, amyloidogenic or non-amyloidogenic. The proposed approach achieves acceptable results, comparable to most online predictors. It also fills a gap in existing amyloid fibril prediction tools by maintaining a balance between sensitivity and specificity.",
author = "Nair, {Smitha S.K.} and {Subba Reddy}, {N. V.} and Hareesha, {K. S.}",
year = "2011",
month = "11",
day = "30",
doi = "10.1186/1471-2105-12-S13-S21",
language = "English",
volume = "12",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "SUPPL. 13",

}

Exploiting heterogeneous features to improve in silico prediction of peptide status - amyloidogenic or non-amyloidogenic. / Nair, Smitha S.K.; Subba Reddy, N. V.; Hareesha, K. S.

In: BMC Bioinformatics, Vol. 12, No. SUPPL. 13, S21, 30.11.2011.

TY - JOUR

T1 - Exploiting heterogeneous features to improve in silico prediction of peptide status - amyloidogenic or non-amyloidogenic

AU - Nair, Smitha S.K.

AU - Subba Reddy, N. V.

AU - Hareesha, K. S.

PY - 2011/11/30

Y1 - 2011/11/30

AB - Background: Predicting short stretches in protein sequences that can form amyloid-like fibrils is important for understanding the underlying cause of amyloid illnesses, and thereby aids the discovery of sequence-targeted anti-aggregation pharmaceuticals. Because experimental molecular techniques are constrained in identifying such motif segments, computational methods that provide better and more affordable in silico predictions are highly desirable. Results: Accurate in silico prediction of amyloidogenic peptide regions relies on the cooperation between informative features and classifier design. In this article, we propose an efficient fibril prediction implementation that exploits heterogeneous features based on bio-physio-chemical (BPC) properties, the auto-correlation function of carefully selected amino acid indices, and atomic composition within a window of amino acids in a protein fragment. To obtain an optimal number of BPC features, we use an evolutionary Support Vector Machine (SVM) that integrates an SVM with a novel implementation of a hybrid Genetic Algorithm termed a Memetic Algorithm. Five prediction modules designed using Artificial Neural Network (ANN) models are trained with independent and integrated features in order to validate the fibril-forming motifs. The results provide evidence that incorporating a new feature, the auto-correlation function, alongside BPC properties strengthens the sequence interaction effect in the feature vector, thereby yielding better prediction quality in terms of sensitivity, specificity, Matthews Correlation Coefficient and Area under the Receiver Operating Characteristic curve. Conclusion: A significant improvement in performance is observed when features such as the auto-correlation function, which maintains the sequence order effect, are introduced in addition to the conventional BPC properties selected through a novel optimization strategy to predict the peptide status, amyloidogenic or non-amyloidogenic. The proposed approach achieves acceptable results, comparable to most online predictors. It also fills a gap in existing amyloid fibril prediction tools by maintaining a balance between sensitivity and specificity.

UR - http://www.scopus.com/inward/record.url?scp=84864052876&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84864052876&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-12-S13-S21

DO - 10.1186/1471-2105-12-S13-S21

M3 - Article

VL - 12

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - SUPPL. 13

M1 - S21

ER -