Bench marking of classification algorithms

Decision Trees and Random Forests-a case study using R

Manish Varma Datla

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Decision Trees and Random Forests are leading machine learning algorithms used for classification. This paper compares the classification results of the two algorithms on data sets from Kaggle's Bike Sharing System and Titanic problems. The solution methodology has two parts. The first is feature engineering: the given instance variables are cleaned of noise, and two or more variables are combined to derive a valuable third. The second is computing the classification parameters: correctly classified instances, incorrectly classified instances, precision, and accuracy. This process ensured that the instance variables and classification parameters were properly treated before they were used with the two algorithms. The developed model has been validated using the systems data and the classification results. From the model, the paper concludes that, for all classification problems, Decision Trees are handy with small data sets (fewer instances), while Random Forests give better results for the same number of attributes on large data sets (more instances). The R language was used to solve the problem and to present the results.
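The classification parameters named in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the paper's R, and it is not the author's code: the function names, the binary-label assumption, and the toy labels are all hypothetical, chosen only to show how correctly classified instances, incorrectly classified instances, precision, and accuracy relate to one another.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for a binary classifier (hypothetical helper type)."""
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    tn: int  # predicted negative, actually negative
    fn: int  # predicted negative, actually positive


def count_outcomes(actual, predicted, positive=1):
    """Tally the four outcome types from paired actual/predicted labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def correctly_classified(c):
    """Instances whose predicted label matches the actual label."""
    return c.tp + c.tn


def incorrectly_classified(c):
    """Instances whose predicted label differs from the actual label."""
    return c.fp + c.fn


def accuracy(c):
    """Share of all instances that were classified correctly."""
    total = correctly_classified(c) + incorrectly_classified(c)
    return correctly_classified(c) / total if total else 0.0


def precision(c):
    """Share of positive predictions that were actually positive."""
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


# Toy labels, invented for illustration only (not data from the paper).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
counts = count_outcomes(actual, predicted)
```

Under these assumptions, the same four functions could be applied to the predictions of a decision tree and of a random forest on the same held-out instances, so that the two algorithms are compared on identical terms.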

Original language: English
Title of host publication: International Conference on Trends in Automation, Communication and Computing Technologies, I-TACT 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781467366670
DOI: https://doi.org/10.1109/ITACT.2015.7492647
Publication status: Published - 15-06-2016
Externally published: Yes
Event: 2015 International Conference on Trends in Automation, Communication and Computing Technologies, I-TACT 2015 - Bangalore, India
Duration: 21-12-2015 to 22-12-2015

Conference

Conference: 2015 International Conference on Trends in Automation, Communication and Computing Technologies, I-TACT 2015
Country: India
City: Bangalore
Period: 21-12-15 to 22-12-15

Fingerprint

Decision trees
seats
marking
machine learning
data systems
classifying
Learning algorithms
Learning systems
engineering
methodology

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Control and Systems Engineering
  • Instrumentation

Cite this

Datla, M. V. (2016). Bench marking of classification algorithms: Decision Trees and Random Forests-a case study using R. In International Conference on Trends in Automation, Communication and Computing Technologies, I-TACT 2015 [7492647] Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ITACT.2015.7492647
@inproceedings{b22775d405c7435193977673f136a05f,
title = "Bench marking of classification algorithms: Decision Trees and Random Forests-a case study using R",
author = "Datla, {Manish Varma}",
year = "2016",
month = "6",
day = "15",
doi = "10.1109/ITACT.2015.7492647",
language = "English",
booktitle = "International Conference on Trends in Automation, Communication and Computing Technologies, I-TACT 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Bench marking of classification algorithms

T2 - Decision Trees and Random Forests-a case study using R

AU - Datla, Manish Varma

PY - 2016/6/15

Y1 - 2016/6/15

UR - http://www.scopus.com/inward/record.url?scp=84979282682&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84979282682&partnerID=8YFLogxK

U2 - 10.1109/ITACT.2015.7492647

DO - 10.1109/ITACT.2015.7492647

M3 - Conference contribution

BT - International Conference on Trends in Automation, Communication and Computing Technologies, I-TACT 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -
