Predicting Depression via Transformer-Based Text Embeddings and XGBoost Classification
Authors:
Isha B, Drashti I, Mahek J, Shradha B
Abstract - Depression is a pervasive mental health concern whose early signs often go undetected because clinical access is limited and traditional diagnostic methods are subjective. On social media, individuals frequently express emotions and thoughts that reflect their psychological state. This study proposes a machine learning–based framework that detects depressive tendencies in user-generated social media content through automated, data-driven analysis. The system combines three stages: data preprocessing, feature extraction using transformer-based models such as BERT, RoBERTa, and DistilBERT, and classification using XGBoost. Publicly available annotated datasets are used for training and validation. The preprocessing stage removes noise and standardizes the text, and the extracted features capture emotional tone and behavioral patterns. Experimental results demonstrate improved accuracy and reliability in identifying depressive indicators. The proposed solution is designed to preserve privacy, reduce operational constraints, and provide actionable insights, contributing to advances in mental health informatics.
Keywords: Depression detection, social media analysis, machine learning, feature extraction, sentiment analysis, behavioral cues
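The abstract states that preprocessing removes noise and standardizes text but does not list the individual steps. The following is a minimal sketch of what such a stage might look like, using only the Python standard library. The function name `clean_post` and the specific rules (dropping URLs and user mentions, keeping hashtag words, lowercasing, collapsing whitespace) are assumptions made for illustration, not the authors' documented procedure.

```python
import re
import unicodedata

# Hypothetical cleaning rules; the paper does not specify its exact steps.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Return a normalized version of one social media post."""
    # Unify visually equivalent Unicode forms (e.g. full-width characters).
    text = unicodedata.normalize("NFKC", text)
    # Remove links and user mentions, which carry little emotional signal.
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Keep the hashtag word but drop the '#' symbol.
    text = text.replace("#", "")
    # Standardize case, then collapse runs of whitespace.
    text = text.lower()
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_post("Feeling SO tired... @friend https://t.co/abc #NoSleep"))
```

In a pipeline of the kind described, the cleaned strings would then be passed to a transformer encoder to obtain embeddings, which in turn serve as input features for the XGBoost classifier. Whether lowercasing is appropriate depends on the encoder, since cased and uncased model variants tokenize text differently.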