Customer Churn Prediction for Subscription Businesses Using Machine Learning
Rimi Jitendra Chanawala¹, Bhumi Shah²
¹Department of Computer Science and Engineering, Parul University, Vadodara, India
²Assistant Professor, Department of Computer Science and Engineering, Parul University, Vadodara, India
Abstract: Customer churn, the voluntary cancellation of a subscription service, directly threatens the revenue sustainability of telecommunications and SaaS businesses. This paper presents an end-to-end machine learning pipeline for churn prediction, built and validated on the IBM Telco Customer Churn dataset (7,043 records, 21 features). The dataset is imbalanced, with 73.5% non-churners and 26.5% churners. After exploratory data analysis, one-hot encoding, and SMOTE-based class balancing, a Logistic Regression classifier was trained and evaluated using classification-focused metrics. The model achieved an overall accuracy of 77%, with a churn-class recall of 0.69 and an F1-score of 0.61. The results indicate that SMOTE meaningfully improves the model's ability to detect at-risk subscribers. The pipeline provides a practical foundation for proactive customer retention, with future extensions planned to compare Random Forest and XGBoost ensembles.
Keywords: customer churn, Telco dataset, logistic regression, SMOTE, class imbalance, binary classification, subscription analytics, machine learning
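As a minimal illustration of the SMOTE-based class balancing mentioned in the abstract, the sketch below shows the core interpolation idea in plain Python: each synthetic minority sample is placed on the line segment between a real minority sample and one of its k nearest minority neighbours. This is not the implementation used in this paper. The function name `smote_sketch`, the toy feature values, and the class counts are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper illustrating the SMOTE idea; not a library API.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (excluding the base itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position between base and neighbour.
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: four "churners" with two made-up numeric features.
churners = [[1.0, 70.0], [2.0, 85.0], [3.0, 90.0], [5.0, 75.0]]
n_non_churners = 11  # assumed size of the majority class in this toy example

# Generate enough synthetic churners to match the majority class size.
synthetic = smote_sketch(churners, n_new=n_non_churners - len(churners), k=3)
balanced_minority = churners + synthetic
print(len(balanced_minority))  # 11
```

In a setup like this, oversampling is normally applied to the training split only, so that the test set keeps the original class distribution and the reported recall and F1-score reflect real rather than synthetic churners.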