Dynamic Topic-Guided Deep Learning for Scalable and Interpretable Dark Web Text Analysis
Dr. S. Surya Kumari1, Sama Pavithra2, Ryali Revathi3, Vendipalli Niranjan Kumar4, Vathaluru Seshannagari Himabindhu5
1Assistant Professor, Dept of Information Technology, SV College of Engineering, Tirupati, India.
2B.Tech, Dept of Information Technology, SV College of Engineering, Tirupati, India.
3B.Tech, Dept of Information Technology, SV College of Engineering, Tirupati, India.
4B.Tech, Dept of Information Technology, SV College of Engineering, Tirupati, India.
5B.Tech, Dept of Information Technology, SV College of Engineering, Tirupati, India.
Abstract - The Dark Web is a hidden portion of the internet accessible only through specialized software such as Tor. Its anonymity serves legitimate privacy needs but also shelters cyber threats, including malware trading, hacking forums, drug sales, and other illicit marketplaces, and the resulting text is noisy and voluminous, which complicates classification. Existing methods combine Latent Dirichlet Allocation (LDA) topic-modeling weights with TextCNN: Dark Web texts are preprocessed to derive class-specific keywords, which reduces the vector dimension approximately 300-fold and yields higher accuracy on DUTA-10k (25 classes) and CoDA (10 classes) than SVM, Naive Bayes, and prior benchmarks. Despite outperforming these baselines, such methods have four limitations: they depend on static datasets and so neglect shifts in content over time; class overlaps make keyword tuning inconsistent; real-time processing is absent; and keeping the topic model and the neural network as separate components obscures the interpretability of the neural model. This paper proposes a unified deep learning architecture that embeds topic modeling directly into TextCNN for real-time classification, dynamically pruning irrelevant terms while exposing what influences the network's decisions through integrated keyword analysis. Key benefits include rapid threat detection for operational cybersecurity, improved explainability that links probabilistic topic weights to deep features, reduced hyperparameter sensitivity for more robust generalization, and scalable deployment across an evolving Dark Web, advancing automated intelligence gathering.
Key Words: Dark Web, Latent Dirichlet Allocation (LDA), real-time classification, generalization, TextCNN, operational cybersecurity.
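The keyword-based dimension reduction summarized in the abstract can be illustrated with a minimal sketch. It assumes that per-class term weights have already been produced by a topic model such as LDA; the weights, the class names, the cutoff `k`, and the helper names `top_keywords` and `prune_tokens` are all hypothetical and chosen only for illustration, not taken from the proposed system.

```python
def top_keywords(topic_weights, k):
    """Return the k highest-weighted terms for each class."""
    return {
        label: {
            term
            for term, _ in sorted(
                weights.items(), key=lambda kv: kv[1], reverse=True
            )[:k]
        }
        for label, weights in topic_weights.items()
    }


def prune_tokens(tokens, keywords):
    """Keep only tokens that are a keyword of at least one class."""
    vocab = set().union(*keywords.values())
    return [t for t in tokens if t in vocab]


# Hypothetical per-class term weights (assumed output of a topic model).
TOPIC_WEIGHTS = {
    "drugs": {"cannabis": 0.31, "gram": 0.22, "shipping": 0.12, "the": 0.05},
    "hacking": {"exploit": 0.35, "malware": 0.28, "the": 0.06, "shipping": 0.02},
}

keywords = top_keywords(TOPIC_WEIGHTS, k=2)
doc = "the vendor ships cannabis by the gram with stealth shipping".split()
pruned = prune_tokens(doc, keywords)
print(keywords)
print(pruned)
```

Only the retained keywords would then be embedded and passed to the classifier, which is how the input vocabulary, and with it the vector dimension, shrinks. The cutoff `k` is the kind of tuning parameter whose sensitivity to class overlap the abstract identifies as a limitation.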