Fake News Detection using Machine Learning and NLP Models: A Comparative Study

Published 14 April 2026

Updated 14 April 2026

E SOUMYA, T. POOJITH
Professor, Department of Computer Science and Engineering, St. Martin's Engineering College, Hyderabad, India. esoumyait@smec.ac.in
Student, Department of Computer Science and Engineering, St. Martin's Engineering College, Hyderabad, India. poojithtadiboyina@gmail.com

Abstract: The widespread dissemination of fabricated and misleading content across online platforms has become one of the most pressing concerns in today's information landscape. The ease with which inaccurate material can be generated and distributed via social networks demands scalable, automated, and reliable detection mechanisms. This study presents a systematic comparative evaluation of machine learning (ML) and natural language processing (NLP) methodologies for identifying fake news. The techniques assessed range from conventional classifiers, including Naive Bayes (NB), Support Vector Machines (SVM), and Logistic Regression (LR), to deep learning models such as Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and Transformer-based architectures including BERT and RoBERTa. The analysis covers their theoretical underpinnings, feature extraction strategies, and measured performance across publicly accessible benchmark corpora (LIAR, FakeNewsNet, ISOT, and WELFake). Metrics considered include accuracy, precision, recall, F1-score, and computational cost. A review of 20 empirical studies published between 2021 and 2025 reveals notable variation in model effectiveness depending on dataset composition, linguistic characteristics, and preprocessing choices. Results consistently show that Transformer-based architectures lead in performance, while ensemble and hybrid strategies integrating linguistic and contextual cues offer the greatest potential for dependable real-world application.
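For readers less familiar with the evaluation metrics named above, the following minimal sketch shows how accuracy, precision, recall, and F1-score can be computed from binary predictions. It is an illustration only and is not taken from any of the reviewed studies: the function name `binary_metrics`, the `BinaryMetrics` container, and the label convention (1 = fake, 0 = real) are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class.

    Assumption for illustration: label 1 marks a fake article, 0 a real one.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Ratios with an empty denominator are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels, invented purely to exercise the function.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 0, 1, 0, 1, 0, 0, 1]
    print(binary_metrics(truth, preds))
```

Precision and recall are reported separately because fake news corpora are often imbalanced, so accuracy alone can hide a classifier that rarely flags the minority class.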