EMS-DS: Design and Implementation of a Browser-Native Data Science Workflow Engine for Employee Attrition Prediction Using Ensemble Classification and K-Means Segmentation
Authors:
Dhruv Ashwinbhai Sathvara
Internal Guide: Prof. Sinal Patel
Abstract — The rapid evolution of corporate environments demands scalable, data-driven solutions for human resource management. This paper presents the design, development, and evaluation of the Employee Management System (EMS) — Full Data Science Edition, a full-stack web application built on the Flask (Python) framework that integrates the complete data science lifecycle within a single, browser-accessible interface. The system processes employee datasets through a six-step automated preprocessing pipeline, including missing-value imputation, the construction of five domain-engineered features, Mutual Information-based feature selection, and StandardScaler normalisation, before training and comparing eight supervised machine learning classifiers under 5-fold stratified cross-validation. Random Forest achieved the highest AUC-ROC of the eight evaluated models. Unsupervised K-Means clustering with PCA visualisation identifies risk-stratified employee cohorts, and a time series module tracks cohort-level attrition trends with rolling averages and city-wise heatmaps. All visualisations are generated server-side as base64-encoded PNG charts, and results can be exported as CSV files and multi-sheet Excel workbooks. The platform enables HR decision-makers to perform end-to-end attrition analytics without programming expertise or external data science tooling.
Keywords — Employee Attrition, Flask, Random Forest, K-Means, PCA, Mutual Information, ROC-AUC, Feature Engineering, scikit-learn, Seaborn, HR Analytics, CRUD, REST API.