FAKE NEWS DETECTION IN TAMIL SOCIAL MEDIA USING MULTILINGUAL TRANSFORMERS
Authors:
Arunprasad C
M.Tech Student, Department of Information Technology, Puducherry Technological University, Puducherry, India
Abstract: The rapid growth of Tamil social media has been accompanied by a sharp increase in the circulation of misleading, fabricated, and politically motivated fake news. This paper proposes an automated fake news detection system designed specifically for Tamil social media content. The system uses IndicBERT, a lightweight multilingual transformer pretrained on 12 major Indian languages, to generate 768-dimensional contextual embeddings for a new Tamil dataset of 1,555 fact-checked articles, collected from YouTurn with an automated Selenium-based scraping pipeline. Two baseline classifiers, a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory network (BiLSTM), are evaluated, followed by a hybrid CNN-BiLSTM model. The hybrid model, trained with multi-seed averaging across three seeds, achieves a 3-class accuracy of 62.38%, a binary (Fake vs. Non-Fake) accuracy of 83.0%, and a macro F1-score of 0.6155. These results indicate that multilingual transformer embeddings are a feasible basis for Tamil misinformation detection and provide a baseline for future hybrid deep-learning research on low-resource Dravidian languages.
Keywords: Tamil fake news detection, IndicBERT, CNN, BiLSTM, hybrid deep learning, misinformation, low-resource NLP, transformer embeddings.
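The three reported figures (3-class accuracy, binary Fake vs. Non-Fake accuracy, and macro F1) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: "multi-seed averaging" is read here as averaging the per-class probabilities of three independently seeded runs, class index 0 is taken to be "Fake" with the other two indices collapsed into "Non-Fake", and all probabilities are made-up toy values, not results from the paper. The function names are hypothetical.

```python
from statistics import mean

FAKE = 0  # assumed index of the "Fake" class; indices 1 and 2 count as Non-Fake


def average_over_seeds(per_seed_probs):
    """Average class probabilities element-wise across seeds.

    per_seed_probs[s][i][c] is the probability seed s assigns class c to sample i.
    """
    n_samples = len(per_seed_probs[0])
    n_classes = len(per_seed_probs[0][0])
    return [
        [mean(run[i][c] for run in per_seed_probs) for c in range(n_classes)]
        for i in range(n_samples)
    ]


def argmax_labels(probs):
    """Pick the most probable class index for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def to_binary(labels):
    """Collapse 3-class labels to 0 (Fake) vs. 1 (Non-Fake)."""
    return [0 if y == FAKE else 1 for y in labels]


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1; a class with no support or predictions scores 0."""
    scores = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


# Toy data: 4 samples, 3 seeds, 3 classes (illustrative values only).
y_true = [0, 1, 2, 1]
seed_a = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.5, 0.3, 0.2]]
seed_b = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6], [0.2, 0.2, 0.6], [0.3, 0.4, 0.3]]
seed_c = [[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.2, 0.7], [0.4, 0.4, 0.2]]

avg_probs = average_over_seeds([seed_a, seed_b, seed_c])
y_pred = argmax_labels(avg_probs)

acc_3class = accuracy(y_true, y_pred)
acc_binary = accuracy(to_binary(y_true), to_binary(y_pred))
f1_macro = macro_f1(y_true, y_pred, n_classes=3)

print(f"3-class accuracy: {acc_3class:.2f}")
print(f"binary accuracy:  {acc_binary:.2f}")
print(f"macro F1:         {f1_macro:.4f}")
```

On this toy data the binary accuracy (0.75) is higher than the 3-class accuracy (0.50), because confusing the two Non-Fake classes with each other no longer counts as an error once they are merged. The same effect may contribute to the gap between the 83.0% and 62.38% figures reported above.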