KNOWLEDGE BASED APPROACH TO DETECT POTENTIALLY RISKY WEBSITES
- Create Date 7 May 2025
- Last Updated 7 May 2025
Authors:
JEGAJITH S, IMAYAVARMAN M, KOUSHIK M, MAJIDHA FATHIMA KM
ABSTRACT: Unsupervised web page classification is the problem of grouping the web pages of a website so that each cluster contains pages that belong to a single, distinct class. Existing proposals for web page classification fall short of several requirements that would make them suitable for enterprise web information integration: they should be unsupervised, which eliminates the need for a training set of pre-classified pages; they should rely on lightweight crawling, so as not to interfere with the website's normal operation; and they should use features from outside the page to be classified, so that the page itself need not be downloaded. In this paper, we introduce CALA, a novel automated technique for building URL-based web page classifiers. Our proposal builds a set of URL patterns that represent the different classes of pages in a website, so that further pages can be classified by matching their URLs against the patterns. Its key features are that it meets all of the requirements above and that it has been validated through numerous experiments on popular, real-world websites. Our validation shows that CALA is effective and efficient in practice.
Keywords: Machine Learning, Malicious URL Detection, Adversarial Attacks, Malicious Web Services
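The classification step described in the abstract, matching a page's URL against per-class patterns without downloading the page, can be illustrated with a minimal Python sketch. This is not the CALA algorithm itself: the pattern format (a sequence of path tokens with `*` as a single-token wildcard), the class names, and the example URLs are all assumptions made for illustration, and the patterns are written by hand here rather than learned from a crawl.

```python
from urllib.parse import urlsplit

# Hypothetical patterns, one token sequence per page class.
# "*" stands for any single path token. In the paper these would be
# built automatically; here they are hand-written for illustration.
PATTERNS = {
    "product": ["shop", "item", "*"],
    "category": ["shop", "category", "*"],
    "article": ["news", "*", "*"],
}


def tokenize(url):
    """Split the URL path into its non-empty segments."""
    return [segment for segment in urlsplit(url).path.split("/") if segment]


def matches(tokens, pattern):
    """True if the tokens have the pattern's length and agree at every non-wildcard position."""
    return len(tokens) == len(pattern) and all(
        expected == "*" or expected == actual
        for actual, expected in zip(tokens, pattern)
    )


def classify(url, patterns=PATTERNS):
    """Return the first class whose pattern matches the URL, or None."""
    tokens = tokenize(url)
    for label, pattern in patterns.items():
        if matches(tokens, pattern):
            return label
    return None
```

Because only the URL string is inspected, a page can be assigned a class before it is fetched, which is the property the abstract highlights. For example, `classify("https://example.com/shop/item/42")` would return `"product"` under the assumed patterns, while a URL that matches no pattern returns `None`.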