Malicious URL Detection Using Deep Learning
Maguluri V Lavanya
Data Science
Geethanjali College of Engineering and
Technology
Cheeryal, Hyderabad
mlavanya.cse@gcet.edu.in
Manisha Cherlapally
Data Science
Geethanjali College of Engineering and
Technology
Cheeryal, Hyderabad
22r11a6709@gcet.edu.in
Revanth Palla
Data Science
Geethanjali College of Engineering and
Technology
Cheeryal, Hyderabad
22r11a6730@gcet.edu.in
Mythri Pulicheru
Data Science
Geethanjali College of Engineering and
Technology
Cheeryal, Hyderabad
22r11a6731@gcet.edu.in
Abstract
The rapid growth of internet usage and online services has significantly increased the risk of cyber threats, particularly those involving malicious URLs used for phishing, malware distribution, and other fraudulent activities. These malicious links are often designed to appear legitimate, making them difficult to identify using traditional security techniques. Moreover, attackers continuously modify URL structures to evade detection systems, creating a need for more advanced and adaptive solutions.
In this project, a deep learning-based approach is proposed to automatically detect and classify URLs as benign or malicious. The system utilizes multiple models, including Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM), to capture both structural and sequential patterns present in URL data. Initially, raw URLs undergo preprocessing steps such as cleaning, normalization, tokenization, and padding to ensure uniform input representation. These processed inputs are then transformed into numerical vectors using embedding techniques for efficient model training.
Each deep learning model is trained to learn unique characteristics of URL patterns. CNN focuses on identifying local structural features, while LSTM and BiLSTM models capture sequential dependencies and contextual relationships within the URL strings. The performance of these models is evaluated using standard metrics such as accuracy, precision, recall, and F1-score to ensure a comprehensive assessment.
Experimental results show that all models perform effectively, with the CNN model achieving the highest accuracy of 93.20%, indicating its strong capability in detecting malicious patterns. Furthermore, the proposed system is integrated into a web-based application using the Flask framework, enabling real-time URL classification with confidence scores.
Overall, the proposed approach offers a scalable, efficient, and reliable solution for malicious URL detection, contributing to improved cybersecurity by enabling early identification and prevention of potential online threats.
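The preprocessing pipeline named in the abstract (cleaning, normalization, tokenization, and padding) can be illustrated with a minimal sketch. The abstract does not specify the tokenization granularity, vocabulary, or sequence length, so the character-level tokenization, the `ALPHABET` vocabulary, the `MAX_LEN` value, and the helper names `clean_url` and `encode_url` below are all assumptions made for illustration rather than the configuration used in this work.

```python
import string

# Assumed character vocabulary (not taken from the paper).
# Index 0 is reserved for padding and index 1 for unknown characters.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
PAD_ID, UNK_ID = 0, 1
MAX_LEN = 200  # hypothetical fixed input length


def clean_url(url: str) -> str:
    """Clean and normalize: strip surrounding whitespace and lowercase."""
    return url.strip().lower()


def encode_url(url: str, max_len: int = MAX_LEN) -> list[int]:
    """Tokenize a URL into character ids, then truncate or pad to max_len."""
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in clean_url(url)]
    ids = ids[:max_len]                            # truncate long URLs
    return ids + [PAD_ID] * (max_len - len(ids))   # right-pad short URLs


if __name__ == "__main__":
    vector = encode_url("  HTTP://Example.com/login ")
    print(len(vector), vector[:10])
```

The resulting fixed-length integer sequences are the kind of input an embedding layer would map to dense vectors before a CNN, LSTM, or BiLSTM. Lowercasing is a simplification chosen here; a system that treats case as a signal would skip that step.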
Keywords: URL security, deep learning models, phishing detection, web safety, neural networks.