Interpretable Loan Default Probability Prediction using Machine Learning

Published 20 April 2026

Updated 20 April 2026

Author(s) Priti Vankar

Department of Computer Engineering, Parul University, Vadodara, India

Abstract: Accurate and interpretable credit risk assessment is a fundamental requirement for lending institutions operating under increasingly stringent regulatory frameworks. This paper presents a comparative study of Random Forest and Logistic Regression for loan default prediction, evaluated on a dataset of 15,200 anonymized loan records sourced from the GoMask financial platform. The preprocessing pipeline applies median/mode imputation, one-hot encoding, StandardScaler normalization, and SMOTE-based class-imbalance correction. Both classifiers are assessed using stratified 5-fold cross-validation and a held-out test set (n = 3,040) on accuracy, precision, recall, F1-score, ROC-AUC, and the confusion matrix. Logistic Regression achieves 91% accuracy, an F1-score of 0.90, and a ROC-AUC of 0.94, compared with a ROC-AUC of 0.85 for Random Forest. It is also fully transparent: its coefficients act on the log-odds of default and can be audited directly, which supports the explainability requirements associated with GDPR Article 22 and the FCRA. The preferred model is deployed within a Flask-based interactive dashboard supporting real-time single-applicant scoring, bulk CSV inference, and a 25-chart exploratory analytics suite. This work demonstrates that an inherently interpretable classifier can match ensemble accuracy in credit scoring while meeting compliance requirements, and it provides a reproducible, open-source blueprint for regulation-ready financial AI.
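To illustrate what auditable Logistic Regression coefficients mean in practice, the following is a minimal sketch using only the Python standard library. The feature names, coefficient values, intercept, and applicant values are hypothetical placeholders chosen for illustration; they are not taken from the paper or its dataset. The sketch assumes features have already been standardized, so each coefficient multiplies a z-score.

```python
import math

# Hypothetical intercept and coefficients on standardized features.
# These are illustrative values, NOT results reported in the paper.
INTERCEPT = -1.2
COEFFICIENTS = {
    "debt_to_income": 0.8,
    "credit_score": -1.1,
    "loan_amount": 0.3,
}


def sigmoid(z: float) -> float:
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def explain(applicant: dict) -> tuple:
    """Return (default probability, per-feature log-odds contributions)."""
    contributions = {
        name: COEFFICIENTS[name] * applicant[name] for name in COEFFICIENTS
    }
    log_odds = INTERCEPT + sum(contributions.values())
    return sigmoid(log_odds), contributions


# Hypothetical applicant, expressed as z-scores of each feature.
applicant = {"debt_to_income": 1.5, "credit_score": -0.5, "loan_amount": 0.2}
probability, contributions = explain(applicant)

print(f"P(default) = {probability:.3f}")
# List features by the size of their effect on the log-odds of default.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>16}: {value:+.2f} log-odds (odds x{math.exp(value):.2f})")
```

Because the score is a sum of per-feature terms passed through a sigmoid, each term can be reported to an applicant or auditor as a multiplicative change in the odds of default. A Random Forest prediction has no comparable additive decomposition without a post-hoc explanation method.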