Machine learning study of classifiers trained with biophysiochemical properties of amino acids to predict fibril forming peptide motifs

Smitha Sunil Kumaran Nair, N. V. Subba Reddy, K. S. Hareesha

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

It is important to understand the cause of amyloid illnesses by predicting the short protein fragments capable of forming amyloid-like fibril motifs, which aids in the discovery of sequence-targeted anti-aggregation drugs. Because molecular techniques for identifying these fragments are limited, computational tools that provide affordable in silico predictions are highly desirable. In this article, we study, from a machine learning perspective, the performance of several classifiers that use heterogeneous features based on biochemical and biophysical properties of amino acids to discriminate between amyloidogenic and non-amyloidogenic regions in peptides. Four conventional machine learning classifiers, namely Support Vector Machine (SVM), neural network, decision tree, and random forest, were trained and tested to find the classifier that best fits the problem domain. Prior to classification, novel implementations of two biologically inspired feature optimization techniques, based on evolutionary algorithms and on methodologies that mimic social life, and a multivariate method based on projection were used to remove unimportant and uninformative features. Among the dimensionality reduction algorithms considered in the study, prediction results show that those based on evolutionary computation are the most effective. Among the classifiers considered, SVM best suits the problem domain. The best classifier is also compared with an online predictor to demonstrate the balance it maintains between true positive and false positive rates. This exploratory study suggests that these methods are promising for amyloidogenicity prediction and may be extended to large-scale proteomic studies.

Original language: English
Pages (from-to): 917-923
Number of pages: 7
Journal: Protein and Peptide Letters
Volume: 19
Issue number: 9
DOIs: 10.2174/092986612802084429
Publication status: Published - 09-2012

Fingerprint

Learning systems
Classifiers
Amyloid
Amino Acids
Peptides
Decision Trees
Evolutionary algorithms
Computer Simulation
Proteomics
Decision trees
Research
Pharmaceutical Preparations
Support vector machines
Machine Learning
Proteins
Agglomeration
Neural networks
Forests
Support Vector Machine

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry

Cite this

@article{a6296c0743ea4137a49410ad423b8b0f,
title = "Machine learning study of classifiers trained with biophysiochemical properties of amino acids to predict fibril forming peptide motifs",
author = "{Kumaran Nair}, {Smitha Sunil} and {Subba Reddy}, {N. V.} and Hareesha, {K. S.}",
year = "2012",
month = "9",
doi = "10.2174/092986612802084429",
language = "English",
volume = "19",
pages = "917--923",
journal = "Protein and Peptide Letters",
issn = "0929-8665",
publisher = "Bentham Science Publishers B.V.",
number = "9",
}

TY - JOUR

T1 - Machine learning study of classifiers trained with biophysiochemical properties of amino acids to predict fibril forming peptide motifs

AU - Kumaran Nair, Smitha Sunil

AU - Subba Reddy, N. V.

AU - Hareesha, K. S.

PY - 2012/9

Y1 - 2012/9

UR - http://www.scopus.com/inward/record.url?scp=84865800237&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84865800237&partnerID=8YFLogxK

U2 - 10.2174/092986612802084429

DO - 10.2174/092986612802084429

M3 - Article

C2 - 22486618

AN - SCOPUS:84865800237

VL - 19

SP - 917

EP - 923

JO - Protein and Peptide Letters

JF - Protein and Peptide Letters

SN - 0929-8665

IS - 9

ER -