Real-Time Phishing URL Detection Pipeline in the PHISHRADOR System
Authors:
Gayam Ahalya
Computer Science and Engineering, Rajiv Gandhi University of Knowledge Technologies
Basar, India
b200478@rgukt.ac.in
Alkapalli Harisha
Computer Science and Engineering, Rajiv Gandhi University of Knowledge Technologies
Basar, India
b200769@rgukt.ac.in
Uppu Sreeja
Computer Science and Engineering, Rajiv Gandhi University of Knowledge Technologies
Basar, India
b201508@rgukt.ac.in
Abstract—Phishing attacks remain a major cybersecurity risk: users are deceived into visiting malicious websites and disclosing confidential information such as login credentials, banking details, and personal data. This project proposes a hybrid approach to phishing URL detection that combines machine learning classification, rule-based detection, trusted-domain verification, and real-time threat intelligence. A structured dataset with 11,554 input attributes and 47 engineered features is used to train and test the machine learning models. Several classifiers are implemented and evaluated, including Support Vector Machine, Random Forest, Decision Tree, Bagging, Gradient Boosting, and AdaBoost, along with the hybrid Voting Classifier and Stacking Classifier. In the proposed system, a feature vector is extracted from a user-supplied URL, preprocessed, and scaled to match the feature set expected by the model, and then passed to the best-performing model for prediction. Rule-based detection and VirusTotal verification are additionally incorporated to support the final decision. A web-based application built with Flask provides real-time phishing prediction and visualization. The proposed hybrid approach is efficient, simple, and deployable in real-world applications.
Index Terms—Phishing URL Detection, Machine Learning, XGBoost, URL Feature Extraction, Cybersecurity, URL Classification, Hybrid URL Features, Flask Deployment
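As an illustration of how the four signals named in the abstract (trusted-domain verification, threat intelligence, rule-based detection, and a machine learning score) might be combined, the following is a minimal sketch and not the PHISHRADOR implementation. The allowlist contents, the individual rules, the 0.5 threshold, the order of the checks, and the names `TRUSTED_DOMAINS`, `rule_flags`, `classify`, `model_score`, and `threat_lookup` are all assumptions made for this example; in a real deployment `model_score` would wrap the trained classifier and `threat_lookup` would wrap a VirusTotal query.

```python
from typing import Callable, List, Optional
from urllib.parse import urlparse
import ipaddress

# Hypothetical allowlist; the system's actual trusted-domain list is not given here.
TRUSTED_DOMAINS = {"google.com", "github.com"}


def hostname_of(url: str) -> str:
    """Return the lowercased hostname, tolerating URLs without a scheme."""
    if "://" not in url:
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()


def is_trusted(host: str) -> bool:
    """True if host is a trusted domain or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


def rule_flags(url: str) -> List[str]:
    """Illustrative lexical rules (assumed, not the paper's rule set)."""
    host = hostname_of(url)
    flags = []
    try:
        ipaddress.ip_address(host)  # raises ValueError for non-IP hosts
        flags.append("ip_address_host")
    except ValueError:
        pass
    if "@" in url:
        flags.append("at_symbol")
    if len(url) > 75:
        flags.append("long_url")
    if host.count(".") > 3:
        flags.append("many_subdomains")
    return flags


def classify(
    url: str,
    model_score: Callable[[str], float],
    threat_lookup: Optional[Callable[[str], bool]] = None,
) -> dict:
    """Combine the signals in an assumed order: allowlist, threat feed, rules + model."""
    host = hostname_of(url)
    if is_trusted(host):
        return {"verdict": "legitimate", "reason": "trusted_domain"}
    if threat_lookup is not None and threat_lookup(url):
        return {"verdict": "phishing", "reason": "threat_intelligence"}
    flags = rule_flags(url)
    score = model_score(url)  # assumed to be a phishing probability in [0, 1]
    if score >= 0.5 or len(flags) >= 2:
        return {"verdict": "phishing", "reason": "model_or_rules",
                "score": score, "flags": flags}
    return {"verdict": "legitimate", "reason": "model_and_rules",
            "score": score, "flags": flags}
```

Passing the model and the threat-intelligence lookup as callables keeps the decision logic independent of any particular classifier or external service, so each component can be replaced or stubbed out during testing.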