Machine Learning-Based Sentiment Analysis: A Comparative Study of Classification Algorithms for Unstructured Digital Text
Authors: Akkireddi Vara Prasad¹, K. Praveen², K. Gandhi Durga Rao³, P. Dhanush⁴, Y. Avanthi⁵, and S. Srilatha⁶
Affiliation: Department of Computer Science and Engineering, Visakha Institute of Engineering and Technology (A), Narava, Visakhapatnam, AP, India.
ABSTRACT: The modern digital era is characterized by an exponential surge in user-generated content from social media, review platforms, and online forums. While these data streams contain critical insights into public opinion and customer feedback, the sheer volume of unstructured text makes manual analysis time-consuming, costly, and prone to human error. This research presents an automated sentiment analysis framework that classifies textual data into positive, negative, and neutral categories. The proposed system applies a preprocessing pipeline (tokenization, stop-word removal, and text normalization) to clean the raw data. Feature extraction methods, including Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words, then convert the text into high-dimensional numerical representations suitable for machine learning. We evaluate three supervised learning algorithms: Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression. Experimental evaluations conducted on datasets ranging from 5,000 to 10,000 samples demonstrate that the system achieves a classification accuracy between 85% and 95%. Notably, the SVM model outperformed the other classifiers, reaching a peak accuracy of approximately 90%. These results underscore the system's capacity to reduce manual effort and provide scalable, real-time insights for decision-making in domains such as business intelligence, marketing, and social media monitoring.
KEYWORDS: Sentiment Analysis, Machine Learning, Natural Language Processing (NLP), Text Classification, TF-IDF, Support Vector Machine (SVM), Naïve Bayes, Logistic Regression, Opinion Mining, Feature Extraction.
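
The pipeline summarized in the abstract (preprocessing, Bag of Words and TF-IDF feature extraction, then supervised classification) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The stop-word list, the toy sentences, the function and class names, and the unsmoothed TF-IDF variant (tf = count / document length, idf = ln(N / df)) are all assumptions made for illustration. Only a small multinomial Naïve Bayes classifier is shown; SVM and Logistic Regression are omitted, and in practice they would typically come from a machine learning library.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "was", "it", "this", "and", "to", "of", "i", "on"}

def preprocess(text):
    """Normalize to lowercase, tokenize on letter runs, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(tokens):
    """Bag of Words: raw term counts for one document."""
    return Counter(tokens)

def tfidf(docs_tokens):
    """TF-IDF per document, with tf = count / length and idf = ln(N / df)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(t for tokens in docs_tokens for t in set(tokens))
    vectors = []
    for tokens in docs_tokens:
        length = len(tokens) or 1  # guard against empty documents
        vectors.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in bag_of_words(tokens).items()
        })
    return vectors

class NaiveBayes:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs_tokens, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for tokens in docs_tokens for t in tokens}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs_tokens, labels):
            self.counts[label].update(tokens)
        self.log_prior = {
            c: math.log(labels.count(c) / len(labels)) for c in self.classes
        }
        return self

    def predict(self, tokens):
        def log_score(c):
            denom = sum(self.counts[c].values()) + len(self.vocab)
            # Tokens never seen in training are ignored.
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / denom)
                for t in tokens if t in self.vocab
            )
        return max(self.classes, key=log_score)

# Made-up toy data, not taken from the study's datasets.
TRAIN = [
    ("I love this phone, it is great", "positive"),
    ("Great battery and excellent screen", "positive"),
    ("Terrible service, I hate it", "negative"),
    ("The screen is awful and the battery is terrible", "negative"),
    ("The package arrived on Monday", "neutral"),
    ("Delivery was on Monday morning", "neutral"),
]

train_tokens = [preprocess(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
tfidf_vectors = tfidf(train_tokens)
model = NaiveBayes().fit(train_tokens, train_labels)

for sentence in ["Excellent phone, great screen", "Terrible, awful service"]:
    print(sentence, "->", model.predict(preprocess(sentence)))
```

The classifier here is trained on raw token counts because multinomial Naïve Bayes is defined over counts; the TF-IDF vectors are computed separately to show the weighting, and would be the natural input for a linear model such as SVM or Logistic Regression.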