Development of real time analytics of movies review data using PySpark

Prakash K. Aithal, U. Dinesh Acharya, M. Geetha

Research output: Contribution to journal › Review article

Abstract

Data play a vital role in every organization and can be divided into structured, semi-structured, and unstructured data. Unstructured data cannot be processed in real time using an RDBMS or Hadoop. Spark extends the Hadoop architecture and combines the strengths of Hadoop and Storm; it supports Scala, Java, Python, and R. The proposed method uses PySpark to analyze, in real time, a movie review dataset of 50,000 reviews written by 36,409 people for 1,539 movies. Because many users write movie reviews continuously, real-time analysis of the data is necessary. The method identifies the users who are most active in writing reviews; this information may be used to give incentives to active reviewers. The analytics also reveal which movies are most popular, based on the reviews. These tasks are carried out with basic map, reduce, and filter operations. The analysis finds that the movie with code B002VL2PTU was reviewed by the largest number of people and that the maximum number of reviews written by a single user was 112, by the user with code A3LZGLA88K0LA0. The frequency of words in the reviews is also counted, and user sentiment can be analyzed using unigrams.
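
The map, reduce, and filter steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it uses plain Python built-ins so that it runs without a Spark cluster, and the comments note the PySpark RDD operation each step roughly corresponds to. The sample records, the MIN_REVIEWS threshold, and the helper count_by_key are hypothetical; only the two codes quoted in the abstract come from the source.

```python
from functools import reduce

# Hypothetical sample records: (user_code, movie_code, review_text).
# Only the two codes quoted in the abstract are real; everything else is made up.
reviews = [
    ("A3LZGLA88K0LA0", "B002VL2PTU", "Great movie, great cast"),
    ("A3LZGLA88K0LA0", "B000000001", "Boring plot"),
    ("U2", "B002VL2PTU", "Great fun"),
    ("U3", "B002VL2PTU", ""),
]


def count_by_key(pairs):
    """Sum the values of (key, n) pairs per key, similar to RDD.reduceByKey(add)."""
    def merge(acc, pair):
        key, n = pair
        acc[key] = acc.get(key, 0) + n
        return acc
    return reduce(merge, pairs, {})


# map: emit (key, 1) per review; reduce: total the ones per key.
# Roughly rdd.map(lambda r: (r[0], 1)).reduceByKey(add) in PySpark.
user_counts = count_by_key(map(lambda r: (r[0], 1), reviews))
movie_counts = count_by_key(map(lambda r: (r[1], 1), reviews))

# filter: keep only users at or above a (hypothetical) activity threshold.
MIN_REVIEWS = 2
active_users = dict(filter(lambda kv: kv[1] >= MIN_REVIEWS, user_counts.items()))

# Most active reviewer and most reviewed movie.
top_user = max(user_counts.items(), key=lambda kv: kv[1])
top_movie = max(movie_counts.items(), key=lambda kv: kv[1])

# Unigram frequency: split each review into lowercase words (a flatMap in
# PySpark), drop tokens that are empty after stripping punctuation, then count.
words = (w.strip(".,!?").lower() for r in reviews for w in r[2].split())
word_counts = count_by_key(map(lambda w: (w, 1), filter(None, words)))

print(top_user, top_movie, active_users, word_counts["great"])
```

Note that movie_counts counts reviews per movie; this equals the number of distinct reviewers only if each user reviews a movie at most once, which is assumed here.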

Original language: English
Pages (from-to): 542-545
Number of pages: 4
Journal: International Journal of Recent Technology and Engineering
Volume: 7
Issue number: 5
Publication status: Published - 01-01-2019

Fingerprint

Electric sparks
Movies
Hadoop
Incentives
Language
MapReduce
Sentiment
Java
Filter
Functionality
Clubs

All Science Journal Classification (ASJC) codes

  • Engineering (all)
  • Management of Technology and Innovation

Cite this

@article{c84091f26cd54aa38a8f2c0991dc016a,
title = "Development of real time analytics of movies review data using PySpark",
abstract = "Data play a vital role in every organization and can be divided into structured, semi-structured, and unstructured data. Unstructured data cannot be processed in real time using an RDBMS or Hadoop. Spark extends the Hadoop architecture and combines the strengths of Hadoop and Storm; it supports Scala, Java, Python, and R. The proposed method uses PySpark to analyze, in real time, a movie review dataset of 50,000 reviews written by 36,409 people for 1,539 movies. Because many users write movie reviews continuously, real-time analysis of the data is necessary. The method identifies the users who are most active in writing reviews; this information may be used to give incentives to active reviewers. The analytics also reveal which movies are most popular, based on the reviews. These tasks are carried out with basic map, reduce, and filter operations. The analysis finds that the movie with code B002VL2PTU was reviewed by the largest number of people and that the maximum number of reviews written by a single user was 112, by the user with code A3LZGLA88K0LA0. The frequency of words in the reviews is also counted, and user sentiment can be analyzed using unigrams.",
author = "Aithal, {Prakash K.} and {Dinesh Acharya}, U. and Geetha, M.",
year = "2019",
month = "1",
day = "1",
language = "English",
volume = "7",
pages = "542--545",
journal = "International Journal of Recent Technology and Engineering",
issn = "2277-3878",
publisher = "Blue Eyes Intelligence Engineering and Sciences Publication",
number = "5",

}

Development of real time analytics of movies review data using PySpark. / Aithal, Prakash K.; Dinesh Acharya, U.; Geetha, M.

In: International Journal of Recent Technology and Engineering, Vol. 7, No. 5, 01.01.2019, p. 542-545.

TY - JOUR

T1 - Development of real time analytics of movies review data using PySpark

AU - Aithal, Prakash K.

AU - Dinesh Acharya, U.

AU - Geetha, M.

PY - 2019/1/1

Y1 - 2019/1/1

AB - Data play a vital role in every organization and can be divided into structured, semi-structured, and unstructured data. Unstructured data cannot be processed in real time using an RDBMS or Hadoop. Spark extends the Hadoop architecture and combines the strengths of Hadoop and Storm; it supports Scala, Java, Python, and R. The proposed method uses PySpark to analyze, in real time, a movie review dataset of 50,000 reviews written by 36,409 people for 1,539 movies. Because many users write movie reviews continuously, real-time analysis of the data is necessary. The method identifies the users who are most active in writing reviews; this information may be used to give incentives to active reviewers. The analytics also reveal which movies are most popular, based on the reviews. These tasks are carried out with basic map, reduce, and filter operations. The analysis finds that the movie with code B002VL2PTU was reviewed by the largest number of people and that the maximum number of reviews written by a single user was 112, by the user with code A3LZGLA88K0LA0. The frequency of words in the reviews is also counted, and user sentiment can be analyzed using unigrams.

UR - http://www.scopus.com/inward/record.url?scp=85063682953&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063682953&partnerID=8YFLogxK

M3 - Review article

AN - SCOPUS:85063682953

VL - 7

SP - 542

EP - 545

JO - International Journal of Recent Technology and Engineering

JF - International Journal of Recent Technology and Engineering

SN - 2277-3878

IS - 5

ER -