Comparative Analysis of Machine Learning Algorithms for Email Spam Detection using TF-IDF
Patinavalasa Durga Prasad*1, Pathri Deepthi Sri2, Doddi Praveen Kumar3, Sayyed Akbar Alisha4, Vasireddi Saran Manikanta5, Suneel Kimar Duvvuri6
1Student, M.Sc (Computer Science), Government College (Autonomous), Rajahmundry, Andhra Pradesh, India.
2Student, B.Sc (Artificial Intelligence), Government College (Autonomous), Rajahmundry, Andhra Pradesh, India.
3Student, B.Sc (Artificial Intelligence), Government College (Autonomous), Rajahmundry, Andhra Pradesh, India.
4Student, B.Sc (Artificial Intelligence), Government College (Autonomous), Rajahmundry, Andhra Pradesh, India.
5Student, B.Sc (Artificial Intelligence), Government College (Autonomous), Rajahmundry, Andhra Pradesh, India.
6Assistant Professor, Department of Computer Science, Government College (Autonomous), Rajahmundry, Andhra Pradesh, India.
Abstract – Spam email detection has become a critical challenge in modern communication systems due to the increasing volume of unwanted and malicious emails. This research presents a comparative analysis of machine learning algorithms for spam classification. The study uses a labeled dataset of spam and ham (legitimate) messages, which is preprocessed and transformed using Term Frequency–Inverse Document Frequency (TF-IDF) vectorization. Five machine learning algorithms, namely Gaussian Naive Bayes, K-Nearest Neighbors (KNN), Decision Tree, Random Forest, and Support Vector Machine (SVM), are implemented and evaluated. The dataset is split into training and testing sets, and performance is measured using accuracy and the confusion matrix.
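To make the two central ideas of the abstract concrete, the following minimal sketch computes TF-IDF weights and derives accuracy from a confusion matrix using only the Python standard library. It is an illustration under stated assumptions, not the implementation used in this study: the whitespace tokenizer, the smoothed IDF variant, the helper names (tfidf, confusion_counts, accuracy), and the toy messages are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, vectors) for a list of text documents.

    Assumptions for this sketch: lowercase whitespace tokenization,
    term frequency = count / document length, and a smoothed IDF
    idf(t) = ln((1 + N) / (1 + df(t))) + 1. The paper does not state
    which TF-IDF variant it uses.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocab = sorted(df)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives, treating `positive` as spam."""
    cm = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            cm["tp" if truth == positive else "fp"] += 1
        else:
            cm["fn" if truth == positive else "tn"] += 1
    return cm


def accuracy(cm):
    """Accuracy = correctly classified messages / all messages."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


if __name__ == "__main__":
    # Toy messages and predictions, invented purely for illustration.
    messages = ["win free prize now", "meeting at noon", "free prize inside"]
    vocab, vectors = tfidf(messages)
    print(len(vocab), "terms;", len(vectors), "vectors")

    cm = confusion_counts(["spam", "ham", "spam"], ["spam", "ham", "ham"])
    print(cm, accuracy(cm))
```

In a full pipeline, the vectors produced for the training split would be passed to each of the five classifiers, and the confusion matrix would be computed from the predictions on the testing split. Terms that occur in fewer messages receive a larger IDF weight, which is why rare, spam-specific words tend to dominate the representation.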