Text based machine learning using discriminative classifiers

Rongon Chatterjee, Vasundhara Acharya, Krishna Prakasha, R. Vijaya Arjunan

Research output: Contribution to journal (Article)

Abstract

Ever since the invention of the computer, there has been curiosity about whether it can be made to learn. If humans could understand how to program computers to learn and to improve automatically with experience, the impact would be dramatic. A successful understanding of how to make computers learn would open up many new uses of computers and new levels of competence and customization. This paper explores two applications of machine learning. The first uses linear regression to understand the correlation of the feature columns with the output and to make predictions based on the “line of best fit”. The second proposes discriminative classifiers for analyzing and segregating text-based data. Applying regression analysis to advertising data shows that TV advertising has the strongest linear correlation with sales. In the later section, text-based machine learning is carried out using the scikit-learn library for Python. Multiple contemporary classifiers are applied to a set of SMS messages to perform spam detection. The performance of the classifiers is evaluated using suitable accuracy metrics. The results show that the Naïve Bayes algorithm is much faster than other algorithms such as Logistic Regression. Using a Bayesian probabilistic approach, a spam ratio is attached to every token in the input set. The proposed work proves to be helpful in the fields of advertising and spam detection.
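The abstract's last technical step, attaching a spam ratio to each token, can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenisation, the add-alpha smoothing, the function name token_spam_ratios, and the toy messages are all assumptions made for illustration, and only the Python standard library is used rather than scikit-learn.

```python
from collections import Counter


def token_spam_ratios(messages, alpha=1.0):
    """Return {token: estimated P(spam | token)} from (text, is_spam) pairs.

    Assumptions (not taken from the paper): lower-cased whitespace tokens,
    each token counted once per message, add-alpha (Laplace) smoothing.
    """
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for text, is_spam in messages:
        tokens = set(text.lower().split())  # presence per message, not frequency
        if is_spam:
            n_spam += 1
            spam_counts.update(tokens)
        else:
            n_ham += 1
            ham_counts.update(tokens)

    prior_spam = n_spam / (n_spam + n_ham)
    ratios = {}
    for token in set(spam_counts) | set(ham_counts):
        # Smoothed likelihoods P(token | class)
        p_tok_spam = (spam_counts[token] + alpha) / (n_spam + 2 * alpha)
        p_tok_ham = (ham_counts[token] + alpha) / (n_ham + 2 * alpha)
        # Bayes' rule gives P(spam | token)
        numerator = p_tok_spam * prior_spam
        ratios[token] = numerator / (numerator + p_tok_ham * (1 - prior_spam))
    return ratios


# Toy, made-up messages (hypothetical; not the paper's SMS data set)
toy = [
    ("win a free prize now", True),
    ("free entry claim your prize", True),
    ("are we meeting for lunch", False),
    ("call me when you are free", False),
]
ratios = token_spam_ratios(toy)
for token in ("prize", "free", "lunch"):
    print(token, round(ratios[token], 3))
```

On this toy input, a token seen only in spam ("prize") receives a higher ratio than one seen in both classes ("free"), which in turn scores higher than one seen only in legitimate messages ("lunch"). Smoothing keeps every ratio strictly between 0 and 1, so a single unseen token cannot force a verdict by itself.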

Original language: English
Pages (from-to): 32-41
Number of pages: 10
Journal: Journal of Advanced Research in Dynamical and Control Systems
Volume: 11
Issue number: 7
Publication status: Published - 01-01-2019

Fingerprint

Learning systems
Marketing
Classifiers
Patents and inventions
Linear regression
Regression analysis
Logistics
Sales

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
  • Engineering(all)

Cite this

Chatterjee, R., Acharya, V., Prakasha, K., & Arjunan, R. V. (2019). Text based machine learning using discriminative classifiers. Journal of Advanced Research in Dynamical and Control Systems, 11(7), 32-41.
@article{22c84281222e4daab067e87607f4e352,
title = "Text based machine learning using discriminative classifiers",
author = "Rongon Chatterjee and Vasundhara Acharya and Krishna Prakasha and Arjunan, {R. Vijaya}",
year = "2019",
month = "1",
day = "1",
language = "English",
volume = "11",
pages = "32--41",
journal = "Journal of Advanced Research in Dynamical and Control Systems",
issn = "1943-023X",
publisher = "Institute of Advanced Scientific Research",
number = "7",

}

TY - JOUR
T1 - Text based machine learning using discriminative classifiers
AU - Chatterjee, Rongon
AU - Acharya, Vasundhara
AU - Prakasha, Krishna
AU - Arjunan, R. Vijaya
PY - 2019/1/1
Y1 - 2019/1/1
UR - http://www.scopus.com/inward/record.url?scp=85073333734&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073333734&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85073333734
VL - 11
SP - 32
EP - 41
JO - Journal of Advanced Research in Dynamical and Control Systems
JF - Journal of Advanced Research in Dynamical and Control Systems
SN - 1943-023X
IS - 7
ER -