Spam mail detection through data mining techniques

Shubhi Shrivastava, R. Anju

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

In today's electronic world, a large share of communication, both professional and private, takes place by email. However, because of advertising agencies and social networking websites, most of the emails in circulation contain unwanted information that is not relevant to the user. Spam emails are unsolicited messages delivered to the user by email. They cause inconvenience and financial loss to recipients, so they need to be filtered out and separated from legitimate emails. Many algorithms and filters have been developed to detect spam, but spammers continually evolve and refine their techniques, which makes existing filters less effective. The method proposed in this paper builds a spam filter using binary and continuous probability distributions. The classifier models are built with Naive Bayes and Decision Trees. The effect of overfitting on the performance and accuracy of decision trees is analyzed. Finally, the better classifier model is identified by how accurately it classifies spam and non-spam emails.
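As a rough illustration of the binary-distribution case mentioned in the abstract, the following is a minimal sketch of a Bernoulli Naive Bayes spam filter. It is not the authors' implementation: the toy messages, the function names (train_bernoulli_nb, predict), the Laplace smoothing constant alpha, and the word-presence features are all assumptions made for this example.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(docs, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on tokenised messages.

    Each vocabulary word is a binary feature (present / absent).
    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    vocab = sorted({w for d in docs for w in d})
    classes = sorted(set(labels))
    n_docs = {c: 0 for c in classes}
    doc_freq = {c: defaultdict(int) for c in classes}
    for doc, c in zip(docs, labels):
        n_docs[c] += 1
        for w in set(doc):  # count each word once per message
            doc_freq[c][w] += 1
    log_prior = {c: math.log(n_docs[c] / len(docs)) for c in classes}
    # Smoothed estimate of P(word present | class)
    p_word = {
        c: {w: (doc_freq[c][w] + alpha) / (n_docs[c] + 2 * alpha) for w in vocab}
        for c in classes
    }
    return vocab, classes, log_prior, p_word


def predict(model, doc):
    """Return the class with the highest log-posterior for one message."""
    vocab, classes, log_prior, p_word = model
    present = set(doc)
    scores = {}
    for c in classes:
        score = log_prior[c]
        for w in vocab:
            p = p_word[c][w]
            # Bernoulli likelihood: absent words contribute (1 - p)
            score += math.log(p if w in present else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set, invented for this example only
train_docs = [
    ["win", "free", "prize", "now"],
    ["free", "offer", "click", "now"],
    ["meeting", "agenda", "monday"],
    ["project", "report", "attached"],
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_bernoulli_nb(train_docs, train_labels)
print(predict(model, ["free", "prize", "click"]))
print(predict(model, ["project", "meeting", "monday"]))
```

Words that never appeared in training are ignored at prediction time. A continuous-feature variant (for example, a Gaussian likelihood over word frequencies) would replace only the per-feature likelihood term in predict.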

Original language: English
Title of host publication: ICCT 2017 - International Conference on Intelligent Communication and Computational Techniques
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 61-64
Number of pages: 4
Volume: 2018-January
ISBN (Electronic): 9781538630303
DOI: https://doi.org/10.1109/INTELCCT.2017.8324021
Publication status: Published - 23-03-2018
Externally published: Yes
Event: 2017 International Conference on Intelligent Communication and Computational Techniques, ICCT 2017 - Jaipur, India
Duration: 22-12-2017 to 23-12-2017

Conference

Conference: 2017 International Conference on Intelligent Communication and Computational Techniques, ICCT 2017
Country: India
City: Jaipur
Period: 22-12-17 to 23-12-17

Fingerprint

Spam
Electronic mail
Data mining
Filter
Decision trees
Classifiers
Spamming
Social Networking
Naive Bayes
Overfitting
Continuous Distributions
Probability distributions
Websites
Marketing
Classify

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Computational Mathematics
  • Control and Optimization
  • Artificial Intelligence

Cite this

Shrivastava, S., & Anju, R. (2018). Spam mail detection through data mining techniques. In ICCT 2017 - International Conference on Intelligent Communication and Computational Techniques (Vol. 2018-January, pp. 61-64). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/INTELCCT.2017.8324021