NoLie AI: An AI-Driven System for Plagiarism Detection, Forgery Detection and Privacy Leak Identification using NLP, OCR and Computer Vision
Authors:
Mr. Reddy Santosh Kumar
Assistant Professor
Department of Artificial Intelligence and Machine Learning, Ballari Institute of Technology & Management
Ballari, Karnataka, India
Syeda Tanzeemunissa
Department of Artificial Intelligence and Machine Learning, Ballari Institute of Technology & Management
Ballari, Karnataka, India
T L Keerthana
Department of Artificial Intelligence and Machine Learning, Ballari Institute of Technology & Management
Ballari, Karnataka, India
Sriya K
Department of Artificial Intelligence and Machine Learning, Ballari Institute of Technology & Management
Ballari, Karnataka, India
Syeda Zoha Shaik
Department of Artificial Intelligence and Machine Learning, Ballari Institute of Technology & Management
Ballari, Karnataka, India
Abstract—Plagiarism, digital forgery, and privacy leakage have become major challenges in the modern digital environment due to the rapid increase in online document sharing and AI-generated content. Traditional verification systems generally focus on a single task, such as plagiarism checking or forgery detection, making them insufficient for complete document integrity analysis. This paper presents NoLie AI, an AI-powered unified framework designed to detect plagiarism, identify document forgery, and analyze privacy risks in uploaded documents and images.
The proposed system integrates Natural Language Processing (NLP), Optical Character Recognition (OCR), and Computer Vision techniques to analyze multiple forms of digital content. Plagiarism detection is performed using TF-IDF, cosine similarity, and Sentence-BERT embeddings to identify copied and semantically similar text. Forgery detection uses Error Level Analysis (ELA), metadata extraction, and OpenCV-based pixel anomaly analysis to detect manipulated regions in images and scanned documents. Privacy leak analysis is performed using Named Entity Recognition (NER) and regex-based techniques to identify sensitive information such as Aadhaar numbers, phone numbers, email IDs, and addresses.
The system generates a comprehensive visual report that displays plagiarism percentage, forgery status, and privacy risk indicators. Experimental results demonstrate that the proposed system effectively improves verification accuracy while reducing manual effort and processing time. NoLie AI provides a scalable and intelligent solution suitable for academic institutions, recruitment systems, digital forensic applications, and legal document verification.
Index Terms—NoLie AI, Plagiarism Detection, Forgery Detection, Privacy Leak Detection, OCR, NLP, Computer Vision, Sentence-BERT, Error Level Analysis.
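As an illustration of two of the techniques named in the abstract, the following minimal Python sketch shows lexical similarity scoring with TF-IDF and cosine similarity, and a regex-based scan for sensitive fields. It is not the NoLie AI implementation: the function names (`tfidf_vectors`, `cosine_similarity`, `find_pii`), the smoothed IDF formula, and the simplified patterns for Aadhaar numbers, Indian mobile numbers, and email addresses are assumptions made for this example. Sentence-BERT, ELA, and NER are not covered here.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, chosen for
    this sketch; the paper does not state which variant it uses.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty docs
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical, simplified patterns; real validation would be stricter.
PII_PATTERNS = {
    # 12 digits, optionally grouped 4-4-4, first digit 2-9
    "aadhaar": re.compile(r"(?<!\d)[2-9]\d{3}[ -]?\d{4}[ -]?\d{4}(?!\d)"),
    # 10-digit Indian mobile number with optional +91 prefix
    "phone": re.compile(r"(?<!\d)(?:\+91[ -]?)?[6-9]\d{9}(?!\d)"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def find_pii(text):
    """Return every match of each pattern, keyed by pattern label."""
    return {label: pat.findall(text) for label, pat in PII_PATTERNS.items()}
```

In such a sketch, a submitted document would be vectorized together with the reference documents and flagged when its cosine similarity to any of them exceeds a chosen threshold, while `find_pii` would be run on the OCR output to list candidate sensitive fields for the privacy report.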