TY - JOUR
T1 - CatchPhish
T2 - detection of phishing websites by inspecting URLs
AU - Rao, Routhu Srinivasa
AU - Vaishnavi, Tatti
AU - Pais, Alwyn Roshan
PY - 2020/2/1
Y1 - 2020/2/1
N2 - There exists many anti-phishing techniques which use source code-based features and third party services to detect the phishing sites. These techniques have some limitations and one of them is that they fail to handle drive-by-downloads. They also use third-party services for the detection of phishing URLs which delay the classification process. Hence, in this paper, we propose a light-weight application, CatchPhish which predicts the URL legitimacy without visiting the website. The proposed technique uses hostname, full URL, Term Frequency-Inverse Document Frequency (TF-IDF) features and phish-hinted words from the suspicious URL for the classification using the Random forest classifier. The proposed model with only TF-IDF features on our dataset achieved an accuracy of 93.25%. Experiment with TF-IDF and hand-crafted features achieved a significant accuracy of 94.26% on our dataset and an accuracy of 98.25%, 97.49% on benchmark datasets which is much better than the existing baseline models.
AB - There exists many anti-phishing techniques which use source code-based features and third party services to detect the phishing sites. These techniques have some limitations and one of them is that they fail to handle drive-by-downloads. They also use third-party services for the detection of phishing URLs which delay the classification process. Hence, in this paper, we propose a light-weight application, CatchPhish which predicts the URL legitimacy without visiting the website. The proposed technique uses hostname, full URL, Term Frequency-Inverse Document Frequency (TF-IDF) features and phish-hinted words from the suspicious URL for the classification using the Random forest classifier. The proposed model with only TF-IDF features on our dataset achieved an accuracy of 93.25%. Experiment with TF-IDF and hand-crafted features achieved a significant accuracy of 94.26% on our dataset and an accuracy of 98.25%, 97.49% on benchmark datasets which is much better than the existing baseline models.
UR - http://www.scopus.com/inward/record.url?scp=85065708499&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85065708499&partnerID=8YFLogxK
U2 - 10.1007/s12652-019-01311-4
DO - 10.1007/s12652-019-01311-4
M3 - Article
AN - SCOPUS:85065708499
SN - 1868-5137
JO - Journal of Ambient Intelligence and Humanized Computing
JF - Journal of Ambient Intelligence and Humanized Computing
ER -