Distributed Component-Based Crawler for AJAX Applications

Suryansh Raj, Rajashree Krishna, Ashalatha Nayak

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Crawling web applications is important both for indexing websites and for testing them for vulnerabilities. Research on crawling traditional websites has made significant progress, and many software suites can carry out deep crawls of large traditional websites in limited time. Modern AJAX (Asynchronous JavaScript and XML) based websites, however, cannot be crawled by traditional crawlers. This area remains open to research, and many open-source software suites are under development. However, the suites developed so far still suffer from state space explosion, poor time efficiency and incomplete content coverage. This research work aims to develop a distributed component-based crawler for deterministic AJAX applications that reduces state space explosion, improves time efficiency and provides complete content coverage. The solution combines three approaches. First, it adopts a Component-Based approach to reduce state space explosion. Second, it adopts a Distributed-Crawling approach that processes events concurrently to improve time efficiency. Third, it employs a Breadth First Search (BFS) strategy to provide complete content coverage.
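To make the three ideas above concrete, here is a minimal Python sketch of how they could fit together on a toy model. It is not the authors' implementation: the page model `Page`, the two-widget app in `TRANSITIONS`, and the functions `fire_event`, `crawl_components` and `crawl_full_dom` are all hypothetical. It assumes the application is deterministic and that its components are independent, meaning an event's effect depends only on the state of the component it belongs to. A local `ThreadPoolExecutor` stands in for distributed crawler nodes, and the crawl proceeds level by level as a BFS.

```python
from __future__ import annotations

from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Hypothetical page model: a DOM snapshot as a tuple of (component_id, component_state) pairs.
Page = tuple[tuple[str, str], ...]

# Hypothetical deterministic AJAX app with two independent widgets; each click advances one widget.
TRANSITIONS: dict[str, dict[str, str]] = {
    "tabs": {"home": "news", "news": "about", "about": "home"},
    "slider": {"s1": "s2", "s2": "s3", "s3": "s1"},
}


def fire_event(page: Page, component: str) -> Page:
    """Fire the single event attached to `component`; only that component's state changes."""
    return tuple(
        (cid, TRANSITIONS[cid][state] if cid == component else state)
        for cid, state in page
    )


def crawl_components(start: Page, workers: int = 4) -> tuple[set[tuple[str, str]], int]:
    """Level-by-level BFS that fires each event once per *component* state, in parallel per level."""
    explored: set[tuple[str, str]] = set()  # component states whose event has already been fired
    frontier: deque[Page] = deque([start])
    events_fired = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            pages: list[Page] = []
            cids: list[str] = []
            while frontier:  # drain the current BFS level
                page = frontier.popleft()
                for cid, state in page:
                    # Component-based pruning (assumes independent components):
                    # skip component states whose event was already fired on another page.
                    if (cid, state) not in explored:
                        explored.add((cid, state))
                        pages.append(page)
                        cids.append(cid)
            events_fired += len(pages)
            # Workers only execute events; all bookkeeping stays on this thread.
            for new_page in pool.map(fire_event, pages, cids):
                if any(cs not in explored for cs in new_page):
                    frontier.append(new_page)
    return explored, events_fired


def crawl_full_dom(start: Page) -> tuple[set[Page], int]:
    """Baseline BFS that treats every distinct whole-page DOM as a new state."""
    visited: set[Page] = {start}
    frontier: deque[Page] = deque([start])
    events_fired = 0
    while frontier:
        page = frontier.popleft()
        for cid, _ in page:
            events_fired += 1
            new_page = fire_event(page, cid)
            if new_page not in visited:
                visited.add(new_page)
                frontier.append(new_page)
    return visited, events_fired


if __name__ == "__main__":
    start: Page = (("tabs", "home"), ("slider", "s1"))
    states, events = crawl_components(start)
    print(len(states), events)  # 6 component states, 6 events
    dom_states, dom_events = crawl_full_dom(start)
    print(len(dom_states), dom_events)  # 9 DOM states, 18 events
```

On this toy app the component-level crawl records 6 component states and fires 6 events, while the whole-DOM baseline visits all 3 × 3 = 9 page combinations and fires 18 events. In this model, with k independent widgets of n states each, the whole-DOM count grows as n^k while the component count grows as n·k, which illustrates why a component-based view can curb state space explosion.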

Original language: English
Title of host publication: Proceedings of 2018 2nd International Conference on Advances in Electronics, Computers and Communications, ICAECC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538637852
DOI: 10.1109/ICAECC.2018.8479454
Publication status: Published - 02-10-2018
Event: 2nd International Conference on Advances in Electronics, Computers and Communications, ICAECC 2018 - Bangalore, India
Duration: 09-02-2018 to 10-02-2018

Conference

Conference: 2nd International Conference on Advances in Electronics, Computers and Communications, ICAECC 2018
Country: India
City: Bangalore
Period: 09-02-18 to 10-02-18

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Signal Processing
  • Electrical and Electronic Engineering
  • Control and Optimization
  • Computer Networks and Communications

Cite this

Raj, S., Krishna, R., & Nayak, A. (2018). Distributed Component-Based Crawler for AJAX Applications. In Proceedings of 2018 2nd International Conference on Advances in Electronics, Computers and Communications, ICAECC 2018 [8479454]. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICAECC.2018.8479454