Data Drive: A OCR Model to Prevent Phishing
- Create Date 28 April 2025
- Last Updated 28 April 2025
Authors:
Shreyas Rajak, Dr. Muneeswaran V, Dr. Subhash Chandra Patel
Abstract:
Phishing is a deceptive tactic that cyber-criminals use to extract sensitive information, such as passwords or credit card details, from targeted users by impersonating legitimate websites through fake emails, WhatsApp or text messages, and sometimes images or other files. Most phishing detection methods scan website URLs or the text content of emails to identify threats. However, when the phishing content is hidden within images, these methods are ineffective because they cannot parse the visual data. This paper therefore proposes an OCR-based, data-driven scheme that automatically extracts and analyzes text from images, so that phishing can be detected when malicious content is embedded in visual content. By covering image-based text, the approach improves overall detection accuracy and offers better protection against phishing threats.
The proposed model uses OCR to automatically read text from an image and then processes the extracted text with machine learning algorithms to detect common indicators of phishing, such as suspiciously formed URLs or unauthorized use of brand names. The model targets the gap left by conventional phishing detection systems and improves their accuracy by extending protection to cases where text is embedded in images.
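The two stages described above (OCR, then indicator detection on the extracted text) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name an OCR engine or classifier, so the sketch assumes `pytesseract` with Pillow for the OCR step and substitutes simple hand-written rules for the machine learning stage. The names `KNOWN_BRANDS`, `ocr_image`, and `extract_indicators` are hypothetical and chosen only for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical brand -> legitimate domain map; illustrative only.
KNOWN_BRANDS = {"paypal": "paypal.com", "microsoft": "microsoft.com"}

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
IP_HOST_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def ocr_image(path):
    """Extract text from an image.

    Assumption: pytesseract and Pillow are installed; the paper does not
    state which OCR engine it uses.
    """
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(path))


def extract_indicators(text):
    """Return simple phishing indicators found in OCR-extracted text.

    These rules stand in for the paper's machine learning stage.
    """
    lowered = text.lower()
    hosts = [(urlparse(u).hostname or "") for u in URL_RE.findall(text)]
    # URLs that point at a raw IP address instead of a domain name.
    ip_urls = sum(1 for h in hosts if IP_HOST_RE.match(h))
    # A brand is "mismatched" if it is named in the text while a URL is
    # present and no URL host belongs to the brand's legitimate domain.
    mismatched = [
        brand for brand, domain in KNOWN_BRANDS.items()
        if brand in lowered and hosts
        and not any(h == domain or h.endswith("." + domain) for h in hosts)
    ]
    return {
        "url_count": len(hosts),
        "ip_urls": ip_urls,
        "mismatched_brands": mismatched,
    }
```

In use, the text returned by `ocr_image("screenshot.png")` would be passed to `extract_indicators`, and the resulting dictionary could serve as a feature vector for whatever classifier is trained on labeled phishing images.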
Testing on a range of phishing images showed that the model classified phishing attempts with good accuracy. The study concludes that such an OCR-based system can strengthen security by improving detection, and that it can be used effectively on other digital platforms to help protect users from advanced phishing attempts.
Keywords: Phishing Detection, Optical Character Recognition (OCR), Data-Driven Approach, Cybersecurity, Image-based Phishing, Machine Learning, Text Extraction, Visual Content Analysis, Cyber Threat Detection, Anti-Phishing Techniques.