Multilingual Plagiarism Detection Across Diverse Media Formats Using OCR and Neural Techniques
- Create Date 30 January 2025
- Last Updated 30 January 2025
C. Yogesh, G. Balaji, Ganesh Aditya R S, Hari Suriya K, Dr. S. Shargunam
Student B.Tech-CSE(AIML), Kalasalingam Academy of Research and Education
Student B.Tech-CSE(AIML), Kalasalingam Academy of Research and Education
Student B.Tech-CSE(AIML), Kalasalingam Academy of Research and Education
Student B.Tech-CSE(AIML), Kalasalingam Academy of Research and Education
Asst. Professor (B.Tech-CSE), Kalasalingam Academy of Research and Education
Abstract - This study presents a comprehensive approach to multilingual plagiarism detection across media formats, covering image-to-image, text-to-text, PDF-to-PDF, and file-to-file comparisons. The system uses Optical Character Recognition (OCR), through the PyTesseract module, to extract text from images and scanned documents for analysis. Neural network models and cosine similarity are then used to compute semantic similarity across languages and formats and to flag potential plagiarism. The method was tested on a variety of language pairs and media formats and identified cross-lingual plagiarism with high precision and recall.
Key Words: Multilingual, Plagiarism Detection, OCR, Neural Networks, Cosine Similarity, Media Formats
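The pipeline summarized in the abstract (OCR text extraction, then cosine similarity between neural representations) can be illustrated with a minimal sketch. It is not the authors' implementation. `pytesseract.image_to_string` is the real PyTesseract call; the helper names `extract_text` and `is_suspected_plagiarism` and the constant `SIMILARITY_THRESHOLD` are hypothetical. The sketch also assumes that each text has already been mapped to a fixed-length vector by some multilingual neural encoder, which the abstract does not specify.

```python
import math
from typing import Sequence


def extract_text(image_path: str, lang: str = "eng") -> str:
    """OCR step: extract text from an image or scanned page.

    Requires the third-party pytesseract and Pillow packages and a local
    Tesseract install. They are imported lazily so the similarity code
    below runs without them.
    """
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector (e.g. no extracted text) as dissimilar
    return dot / (norm_a * norm_b)


# Hypothetical decision threshold, chosen for illustration only;
# it is not a value reported in the paper.
SIMILARITY_THRESHOLD = 0.8


def is_suspected_plagiarism(
    emb_a: Sequence[float],
    emb_b: Sequence[float],
    threshold: float = SIMILARITY_THRESHOLD,
) -> bool:
    """Flag a pair of documents whose embeddings are at least `threshold` similar."""
    return cosine_similarity(emb_a, emb_b) >= threshold


if __name__ == "__main__":
    # Toy vectors standing in for encoder outputs of two documents.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(is_suspected_plagiarism([1.0, 0.0], [0.0, 1.0]))
```

For multilingual OCR, the `lang` argument accepts Tesseract language codes such as `"eng"`, `"hin"`, or `"tam"`, and several can be combined with `+` (for example `"eng+tam"`), provided the matching traineddata files are installed. Cosine similarity suits this comparison because it depends only on vector direction, not magnitude, so documents of different lengths can still score as near-identical.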