Machine Learning-Based Sentiment Analysis: A Comparative Study of Classification Algorithms for Unstructured Digital Text

Published 31 March 2026

Updated 31 March 2026

Authors: Akkireddi Vara Prasad¹, K. Praveen², K. Gandhi Durga Rao³, P. Dhanush⁴, Y. Avanthi⁵, and S. Srilatha⁶
Affiliation: Department of Computer Science and Engineering, Visakha Institute of Engineering and Technology (A), Narava, Visakhapatnam, AP, India.

ABSTRACT: The modern digital era is characterized by an exponential surge in user-generated content from social media, review platforms, and online forums. While these data streams contain critical insights into public opinion and customer feedback, the sheer volume of unstructured text makes manual analysis time-consuming and prone to human error. This research presents an automated sentiment analysis framework that classifies textual data into positive, negative, and neutral categories. The proposed system applies a preprocessing pipeline (tokenization, stop-word removal, and text normalization) to refine raw data. Feature extraction methods, including Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words, then convert the text into high-dimensional numerical representations suitable for machine learning. We evaluate three supervised learning algorithms: Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression. Experimental evaluations conducted on datasets ranging from 5,000 to 10,000 samples demonstrate that the system achieves a classification accuracy between 85% and 95%. Notably, the SVM model outperformed the other classifiers, reaching a peak accuracy of approximately 90%. These results underscore the system's capacity to reduce manual effort and provide scalable, real-time insights for decision-making in domains such as business intelligence, marketing, and social media monitoring.
KEYWORDS: Sentiment Analysis, Machine Learning, Natural Language Processing (NLP), Text Classification, TF-IDF, Support Vector Machine (SVM), Naïve Bayes, Logistic Regression, Opinion Mining, Feature Extraction.
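
The pipeline summarized in the abstract (preprocessing, feature extraction, supervised classification) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, which the abstract does not specify. The stop-word list, the toy training sentences, the function and class names, and the particular TF-IDF variant (tf = count / document length, idf = ln(N / df)) are all assumptions chosen for illustration. Only Naïve Bayes is shown; SVM and Logistic Regression are omitted for brevity.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper does not give its list).
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "was", "and", "of", "to", "i", "on"}

def preprocess(text):
    """Normalize (lowercase), tokenize on letter runs, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def tfidf(docs_tokens):
    """One TF-IDF dict per document: tf = count / doc length, idf = ln(N / df)."""
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs_tokens:
        counts = Counter(tokens)  # bag-of-words counts for this document
        total = len(tokens) or 1
        vectors.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors

class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with Laplace smoothing."""

    def fit(self, docs_tokens, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for tokens in docs_tokens for t in tokens}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs_tokens, labels):
            self.counts[label].update(tokens)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.priors = {c: math.log(labels.count(c) / len(labels))
                       for c in self.classes}
        return self

    def predict(self, tokens):
        v = len(self.vocab)

        def score(c):
            # Log prior plus smoothed log likelihood of each known token.
            s = self.priors[c]
            for t in tokens:
                if t in self.vocab:
                    s += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            return s

        return max(self.classes, key=score)

# Hypothetical toy data, far smaller than the 5,000 to 10,000 samples in the study.
train = [
    ("I love this phone, it is great", "positive"),
    ("Great service and friendly staff", "positive"),
    ("This was a terrible experience", "negative"),
    ("I hate the slow, terrible delivery", "negative"),
    ("The package arrived on Monday", "neutral"),
    ("The store opens on Monday morning", "neutral"),
]
docs = [preprocess(text) for text, _ in train]
labels = [label for _, label in train]

vectors = tfidf(docs)                   # TF-IDF features, shown for comparison
model = NaiveBayes().fit(docs, labels)  # classifier trained on raw term counts

print(model.predict(preprocess("great phone")))
```

In this sketch the classifier works on raw term counts, which is the usual pairing for multinomial Naïve Bayes, while the TF-IDF vectors show the kind of weighted representation that would typically be passed to an SVM or Logistic Regression model.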