N-Gram Assisted Youtube Spam Comment Detection

Shreyas Aiyar, Nisha P. Shetty

Research output: Contribution to journal › Conference article

3 Citations (Scopus)

Abstract

This paper proposes a novel methodology for detecting intrusive comments, or spam, on the video-sharing website YouTube. We define spam comments as those that have promotional intent or that are deemed contextually irrelevant to a given video. The prospect of monetisation through advertising on popular social media channels has, over the years, attracted an increasingly large number of users. This has in turn led to a growth in malicious users who have begun to develop automated bots capable of large-scale, orchestrated deployment of spam messages across multiple channels simultaneously. The presence of these comments significantly hurts the reputation of a channel and the experience of ordinary users. YouTube itself has tackled this issue with very limited methods that revolve around blocking comments containing links. Such methods have proven extremely ineffective, as spammers have found ways to bypass these heuristics. Standard machine learning classification algorithms have proven somewhat effective, but there is still room for better accuracy with new approaches. In this work, we attempt to detect such comments by applying conventional machine learning algorithms such as Random Forest, Support Vector Machine, and Naive Bayes, along with certain custom heuristics such as N-Grams, which have proven to be very effective in detecting and subsequently combating spam comments.
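As a rough illustration of how n-gram features can feed one of the classifiers named in the abstract, the following is a minimal sketch and not the authors' implementation. It extracts word unigrams and bigrams and trains a multinomial Naive Bayes model with add-one smoothing in plain Python. The names `word_ngrams` and `NGramNaiveBayes`, the tokenisation rule, the smoothing choice, and the toy comments are all assumptions made for this example; the paper's actual features, data set, and parameters are not given in this record.

```python
import math
import re
from collections import Counter, defaultdict


def word_ngrams(text, n_max=2):
    """Return lowercase word n-grams for n = 1..n_max (hypothetical helper)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NGramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def __init__(self, n_max=2):
        self.n_max = n_max
        self.class_docs = Counter()               # comments seen per label
        self.gram_counts = defaultdict(Counter)   # n-gram counts per label
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, comments, labels):
        for text, label in zip(comments, labels):
            self.class_docs[label] += 1
            for gram in word_ngrams(text, self.n_max):
                self.gram_counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        grams = word_ngrams(text, self.n_max)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus smoothed log likelihood of each n-gram
            score = math.log(n_docs / total_docs)
            for gram in grams:
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data invented for this sketch, not taken from the paper.
train = [
    ("check out my channel and subscribe", "spam"),
    ("free gift cards click the link in my profile", "spam"),
    ("subscribe to my channel for free giveaways", "spam"),
    ("this song brings back so many memories", "ham"),
    ("the guitar solo at the end is amazing", "ham"),
    ("great video thanks for the tutorial", "ham"),
]
model = NGramNaiveBayes(n_max=2).fit(
    [text for text, _ in train], [label for _, label in train]
)
print(model.predict("please subscribe to my channel"))
print(model.predict("amazing guitar solo"))
```

Bigrams such as "my channel" let the model pick up short promotional phrases that a link-blocking rule would miss. The same n-gram counts could in principle be passed to a Random Forest or Support Vector Machine instead of Naive Bayes.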

Original language: English
Pages (from-to): 174-182
Number of pages: 9
Journal: Procedia Computer Science
Volume: 132
DOIs: 10.1016/j.procs.2018.05.181
Publication status: Published - 01-01-2018
Externally published: Yes
Event: 2018 International Conference on Computational Intelligence and Data Science, ICCIDS 2018 - Gurugram, India
Duration: 07-04-2018 to 08-04-2018

Fingerprint

Learning systems
Learning algorithms
Support vector machines
Websites
Marketing

All Science Journal Classification (ASJC) codes

  • Computer Science (all)

Cite this

Aiyar, Shreyas ; Shetty, Nisha P. / N-Gram Assisted Youtube Spam Comment Detection. In: Procedia Computer Science. 2018 ; Vol. 132. pp. 174-182.
@article{b0b43100eff843588eada558b5f22e39,
title = "N-Gram Assisted Youtube Spam Comment Detection",
abstract = "This paper proposes a novel methodology for detecting intrusive comments, or spam, on the video-sharing website YouTube. We define spam comments as those that have promotional intent or that are deemed contextually irrelevant to a given video. The prospect of monetisation through advertising on popular social media channels has, over the years, attracted an increasingly large number of users. This has in turn led to a growth in malicious users who have begun to develop automated bots capable of large-scale, orchestrated deployment of spam messages across multiple channels simultaneously. The presence of these comments significantly hurts the reputation of a channel and the experience of ordinary users. YouTube itself has tackled this issue with very limited methods that revolve around blocking comments containing links. Such methods have proven extremely ineffective, as spammers have found ways to bypass these heuristics. Standard machine learning classification algorithms have proven somewhat effective, but there is still room for better accuracy with new approaches. In this work, we attempt to detect such comments by applying conventional machine learning algorithms such as Random Forest, Support Vector Machine, and Naive Bayes, along with certain custom heuristics such as N-Grams, which have proven to be very effective in detecting and subsequently combating spam comments.",
author = "Shreyas Aiyar and Shetty, {Nisha P.}",
year = "2018",
month = "1",
day = "1",
doi = "10.1016/j.procs.2018.05.181",
language = "English",
volume = "132",
pages = "174--182",
journal = "Procedia Computer Science",
issn = "1877-0509",
publisher = "Elsevier BV",

}

TY - JOUR

T1 - N-Gram Assisted Youtube Spam Comment Detection

AU - Aiyar, Shreyas

AU - Shetty, Nisha P.

PY - 2018/1/1

Y1 - 2018/1/1

N2 - This paper proposes a novel methodology for detecting intrusive comments, or spam, on the video-sharing website YouTube. We define spam comments as those that have promotional intent or that are deemed contextually irrelevant to a given video. The prospect of monetisation through advertising on popular social media channels has, over the years, attracted an increasingly large number of users. This has in turn led to a growth in malicious users who have begun to develop automated bots capable of large-scale, orchestrated deployment of spam messages across multiple channels simultaneously. The presence of these comments significantly hurts the reputation of a channel and the experience of ordinary users. YouTube itself has tackled this issue with very limited methods that revolve around blocking comments containing links. Such methods have proven extremely ineffective, as spammers have found ways to bypass these heuristics. Standard machine learning classification algorithms have proven somewhat effective, but there is still room for better accuracy with new approaches. In this work, we attempt to detect such comments by applying conventional machine learning algorithms such as Random Forest, Support Vector Machine, and Naive Bayes, along with certain custom heuristics such as N-Grams, which have proven to be very effective in detecting and subsequently combating spam comments.

UR - http://www.scopus.com/inward/record.url?scp=85049098753&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049098753&partnerID=8YFLogxK

U2 - 10.1016/j.procs.2018.05.181

DO - 10.1016/j.procs.2018.05.181

M3 - Conference article

AN - SCOPUS:85049098753

VL - 132

SP - 174

EP - 182

JO - Procedia Computer Science

JF - Procedia Computer Science

SN - 1877-0509

ER -