Distributed Component-Based Crawler for AJAX Applications

Suryansh Raj, Rajashree Krishna, Ashalatha Nayak

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Crawling web applications is important both for indexing websites and for testing them for vulnerabilities. Research on crawling traditional websites has made significant progress, and many software suites can carry out deep crawls of large traditional websites in limited time. Modern AJAX (Asynchronous JavaScript and XML) based websites, however, cannot be crawled by traditional crawlers. The area remains open to research, and many open-source software suites are being developed. However, the suites developed so far still suffer from state space explosion, poor time efficiency and incomplete content coverage. This research work aims to develop a distributed component-based crawler for deterministic AJAX applications that reduces state space explosion, improves time efficiency and provides complete content coverage. It combines three approaches: a component-based approach to reduce state space explosion, a distributed-crawling approach that processes events concurrently to improve time efficiency, and a Breadth First Search (BFS) strategy to provide complete content coverage.
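The combination of BFS exploration and concurrent event processing described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy application model `TOY_APP`, the helper `fire_event` and the function `bfs_crawl` are hypothetical names introduced here, and a thread pool stands in for distributed crawler nodes. The component-based state reduction is not shown.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy model of a deterministic AJAX application (an assumption,
# not taken from the paper): each state maps event names to the state that
# firing the event leads to.
TOY_APP = {
    "home": {"open_menu": "menu", "load_news": "news"},
    "menu": {"close_menu": "home", "open_about": "about"},
    "news": {"next_page": "news_2", "open_menu": "menu"},
    "news_2": {"prev_page": "news"},
    "about": {},
}


def fire_event(app, state, event):
    """Stand-in for executing one event in a browser; returns the resulting state."""
    return app[state][event]


def bfs_crawl(app, start, workers=4):
    """Explore states level by level, firing each level's events concurrently."""
    visited = {start}
    order = [start]
    frontier = [start]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # All (state, event) pairs at the current depth are independent tasks.
            tasks = [(s, e) for s in frontier for e in sorted(app[s])]
            # pool.map returns results in task order, so discovery is deterministic.
            results = list(pool.map(lambda t: fire_event(app, *t), tasks))
            frontier = []
            for target in results:
                if target not in visited:
                    visited.add(target)
                    order.append(target)
                    frontier.append(target)
    return order


crawl_order = bfs_crawl(TOY_APP, "home")
print(crawl_order)
```

Because the toy application is deterministic, firing the same event in the same state always yields the same target, which is what lets the events of one BFS level be handed to independent workers.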

Original language: English
Title of host publication: Proceedings of 2018 2nd International Conference on Advances in Electronics, Computers and Communications, ICAECC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538637852
DOI: https://doi.org/10.1109/ICAECC.2018.8479454
Publication status: Published - 02-10-2018
Event: 2nd International Conference on Advances in Electronics, Computers and Communications, ICAECC 2018 - Bangalore, India
Duration: 09-02-2018 to 10-02-2018

Conference

Conference: 2nd International Conference on Advances in Electronics, Computers and Communications, ICAECC 2018
Country: India
City: Bangalore
Period: 09-02-18 to 10-02-18

Fingerprint

JavaScript
XML
Explosion
Websites
State Space
Coverage
Explosions
Breadth-first Search
Software
Open Source Software
Search Strategy
Web Application
Vulnerability
Indexing
Testing
World Wide Web

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Signal Processing
  • Electrical and Electronic Engineering
  • Control and Optimization
  • Computer Networks and Communications

Cite this

Raj, S., Krishna, R., & Nayak, A. (2018). Distributed Component-Based Crawler for AJAX Applications. In Proceedings of 2018 2nd International Conference on Advances in Electronics, Computers and Communications, ICAECC 2018 [8479454] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICAECC.2018.8479454
@inproceedings{0c23fb4f27724f62b2a17086df1a6fc1,
title = "Distributed Component-Based Crawler for AJAX Applications",
author = "Suryansh Raj and Rajashree Krishna and Ashalatha Nayak",
year = "2018",
month = "10",
day = "2",
doi = "10.1109/ICAECC.2018.8479454",
language = "English",
booktitle = "Proceedings of 2018 2nd International Conference on Advances in Electronics, Computers and Communications, ICAECC 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Distributed Component-Based Crawler for AJAX Applications

AU - Raj, Suryansh

AU - Krishna, Rajashree

AU - Nayak, Ashalatha

PY - 2018/10/2

Y1 - 2018/10/2

UR - http://www.scopus.com/inward/record.url?scp=85056171209&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056171209&partnerID=8YFLogxK

U2 - 10.1109/ICAECC.2018.8479454

DO - 10.1109/ICAECC.2018.8479454

M3 - Conference contribution

BT - Proceedings of 2018 2nd International Conference on Advances in Electronics, Computers and Communications, ICAECC 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -
