FAKE NEWS DETECTION IN TAMIL SOCIAL MEDIA USING MULTILINGUAL TRANSFORMERS

Published 25 April 2026

Updated 25 April 2026

Authors:

Arunprasad C

M.Tech Student, Department of Information Technology, Puducherry Technological University, Puducherry, India

Abstract: The rapid growth of Tamil social media has been accompanied by increased circulation of misleading, fabricated, and politically motivated fake news. This paper proposes an automated fake news detection system designed specifically for Tamil social media content. The system uses IndicBERT, a lightweight multilingual transformer pretrained on 12 major Indian languages, to generate 768-dimensional contextual embeddings from a new Tamil dataset of 1,555 fact-checked articles collected from YouTurn with an automated Selenium-based scraping pipeline. Two baseline classifiers, a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory network (BiLSTM), are evaluated, followed by a hybrid CNN-BiLSTM model. The hybrid model, trained with multi-seed averaging across three seeds, achieves a 3-class accuracy of 62.38%, a binary (Fake vs. Non-Fake) accuracy of 83.0%, and a macro F1-score of 0.6155. These results demonstrate the feasibility of using multilingual transformers for Tamil misinformation detection and establish a foundation for future hybrid deep-learning research in low-resource Dravidian languages.

Keywords: Tamil fake news detection, IndicBERT, CNN, BiLSTM, hybrid deep learning, misinformation, low-resource NLP, transformer embeddings.
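
The abstract reports three metrics (3-class accuracy, binary Fake vs. Non-Fake accuracy, and macro F1) for a model trained with multi-seed averaging. The following is a minimal sketch of how such figures could be computed, not the paper's actual evaluation code. It assumes that multi-seed averaging means averaging the per-class probabilities of the three seeds before taking the argmax, and that the binary score is obtained by collapsing the two non-fake classes into one; the abstract confirms neither detail. The class names, the `evaluate` helper, and the toy probabilities are hypothetical.

```python
import numpy as np

# Hypothetical label order; the abstract does not name the three classes.
LABELS = ["fake", "partly_true", "true"]
FAKE = 0  # index of the assumed "fake" class


def average_seed_probs(probs_per_seed):
    """Average class probabilities over seeds: (seeds, n, classes) -> (n, classes)."""
    return np.mean(np.asarray(probs_per_seed, dtype=float), axis=0)


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1; a class with no predictions or labels scores 0."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


def evaluate(probs_per_seed, y_true):
    """Return 3-class accuracy, binary (fake vs. non-fake) accuracy, and macro F1."""
    y_true = np.asarray(y_true)
    y_pred = average_seed_probs(probs_per_seed).argmax(axis=1)
    acc3 = float(np.mean(y_pred == y_true))
    # Collapse to binary: a prediction is correct if it agrees on fake vs. non-fake.
    acc2 = float(np.mean((y_pred == FAKE) == (y_true == FAKE)))
    return {"acc3": acc3, "acc2": acc2, "macro_f1": macro_f1(y_true, y_pred, len(LABELS))}


# Toy softmax outputs for 4 articles from 3 seeds (made-up numbers, not paper data).
toy_probs = [
    [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.1, 0.5, 0.4]],
    [[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.2, 0.2, 0.6], [0.1, 0.2, 0.7]],
    [[0.8, 0.1, 0.1], [0.1, 0.6, 0.3], [0.1, 0.2, 0.7], [0.2, 0.3, 0.5]],
]
toy_true = [0, 1, 2, 1]

result = evaluate(toy_probs, toy_true)
print(result)
```

On the toy data the last article is misclassified among the non-fake classes, so the 3-class accuracy is 0.75 while the binary accuracy is 1.0. This illustrates why a binary score can sit well above the 3-class score, as it does in the reported 83.0% versus 62.38%.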