Query Quality Prediction on Source Code Base Dataset: A Comparative Study

B. P. Swathi, Balachandra Muniyal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Source code retrieval is a text retrieval task that software developers perform regularly. Existing source code retrieval approaches are based on regular expressions and assume that the developer querying the code base is thoroughly familiar with the source code. Keyword and regular-expression queries are hard to remember, so developers should instead be able to query the code base in sentential (natural-language) form. However, search performance on text depends heavily on query quality: a search succeeds when the textual query is of high quality. Predicting query quality before a query is executed on a source code retrieval system can save developers time and effort by warning them when a query is unlikely to perform well. This paper assesses how well four prominent classification algorithms, namely Support Vector Machine (SVM), Logistic Regression (LR), Gradient Boosted Tree (GBT) and Decision Tree (DT), predict query quality on a data set created from the documentation of the source code files. Experimental results on a data set built from benchmark open-source projects show that GBT outperforms the other classifiers.
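The kind of comparison described in the abstract (training SVM, LR, GBT and DT to label queries as likely to perform well or poorly) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' method: the data is synthetic (`make_classification`), and the 5-fold cross-validation, accuracy metric, default hyperparameters and the helper name `compare_classifiers` are all hypothetical choices. The record does not give the paper's features, labels or evaluation protocol, and the ranking this sketch prints says nothing about the paper's results.

```python
# Hedged sketch: compare the four classifier families named in the abstract
# (SVM, LR, GBT, DT) on a binary "good query" vs "poor query" task.
# ASSUMPTION: synthetic data stands in for the paper's query-quality data set;
# the metric, fold count and hyperparameters are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, folds=5, seed=0):
    """Return the mean cross-validated accuracy of each classifier family (hypothetical helper)."""
    models = {
        # SVM and LR are sensitive to feature scale, so standardize first.
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        # Tree-based models do not need feature scaling.
        "GBT": GradientBoostingClassifier(random_state=seed),
        "DT": DecisionTreeClassifier(random_state=seed),
    }
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Hypothetical stand-in for per-query feature vectors and quality labels.
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    scores = compare_classifiers(X, y)
    # Print classifiers from best to worst mean accuracy on this synthetic data.
    for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {acc:.3f}")
```

Scaling is applied only to the margin- and gradient-based linear models because tree ensembles split on thresholds and are insensitive to feature scale; a real replication would substitute the paper's features and metric.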

Original language: English
Title of host publication: 2018 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1115-1119
Number of pages: 5
ISBN (Electronic): 9781538653142
DOI: 10.1109/ICACCI.2018.8554602
Publication status: Published - 30-11-2018
Event: 7th International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018 - Bangalore, India
Duration: 19-09-2018 to 22-09-2018

Conference

Conference: 7th International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018
Country: India
City: Bangalore
Period: 19-09-18 to 22-09-18

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems

Cite this

Swathi, B. P., & Muniyal, B. (2018). Query Quality Prediction on Source Code Base Dataset: A Comparative Study. In 2018 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018 (pp. 1115-1119). [8554602] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICACCI.2018.8554602
Swathi, B. P.; Muniyal, Balachandra. / Query Quality Prediction on Source Code Base Dataset: A Comparative Study. 2018 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 1115-1119
@inproceedings{ff37337dcf7b454d82a2455d97ab2474,
title = "Query Quality Prediction on Source Code Base Dataset: A Comparative Study",
author = "Swathi, {B. P.} and Balachandra Muniyal",
year = "2018",
month = "11",
day = "30",
doi = "10.1109/ICACCI.2018.8554602",
language = "English",
pages = "1115--1119",
booktitle = "2018 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

Swathi, BP & Muniyal, B 2018, Query Quality Prediction on Source Code Base Dataset: A Comparative Study. in 2018 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018, 8554602, Institute of Electrical and Electronics Engineers Inc., pp. 1115-1119, 7th International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018, Bangalore, India, 19-09-18. https://doi.org/10.1109/ICACCI.2018.8554602

Query Quality Prediction on Source Code Base Dataset: A Comparative Study. / Swathi, B. P.; Muniyal, Balachandra.

2018 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018. Institute of Electrical and Electronics Engineers Inc., 2018. p. 1115-1119, 8554602.

TY - GEN

T1 - Query Quality Prediction on Source Code Base Dataset

T2 - A Comparative Study

AU - Swathi, B. P.

AU - Muniyal, Balachandra

PY - 2018/11/30

Y1 - 2018/11/30

UR - http://www.scopus.com/inward/record.url?scp=85060062917&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85060062917&partnerID=8YFLogxK

U2 - 10.1109/ICACCI.2018.8554602

DO - 10.1109/ICACCI.2018.8554602

M3 - Conference contribution

AN - SCOPUS:85060062917

SP - 1115

EP - 1119

BT - 2018 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Swathi BP, Muniyal B. Query Quality Prediction on Source Code Base Dataset: A Comparative Study. In 2018 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2018. Institute of Electrical and Electronics Engineers Inc. 2018. p. 1115-1119. 8554602 https://doi.org/10.1109/ICACCI.2018.8554602