Predictive Modelling for Early Diagnosis of Oral Cancer Using Supervised Learning Algorithms: A Comparative Analysis
- Create Date 23 July 2025
- Last Updated 23 July 2025
Reshath R, Sanjith P, Asma J, Kamali Manickavasagam Lekshmi
Department of Biotechnology, KIT-Kalaignarkarunanidhi Institute of Technology, Coimbatore, Tamil Nadu, India.
reshathrs2004@gmail.com
Abstract
Background: Early identification of oral cancer (OC) is crucial because it significantly improves the likelihood of survival. In modern healthcare, artificial intelligence (AI) has emerged as a promising diagnostic tool. This research reviewed the current literature to examine how effectively AI can be used to detect OC, with emphasis on its accuracy and its potential to identify the disease at its earliest and most treatable stages.
Objective: To develop an early diagnostic model for oral cancer from non-imaging clinical data, with the aim of reducing the poor prognosis associated with late detection.
Methods: This study used a publicly available patient-level clinical dataset obtained from Kaggle. After preprocessing and feature engineering, four machine learning models (Logistic Regression, Random Forest, XGBoost, and CatBoost) were built and evaluated. The predictive performance of each classifier was assessed using accuracy, F1-score, area under the ROC curve (ROC-AUC), and confusion matrices.
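The evaluation measures named above can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the labels and scores are toy values rather than data from the study, and the 0.5 decision threshold is an assumption. In practice, library routines such as scikit-learn's `accuracy_score`, `f1_score`, `roc_auc_score`, and `confusion_matrix` would normally be used instead.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) for binary labels, where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tn + fp + fn + tp)


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    _, fp, fn, tp = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case outscores a random negative one
    (ties count as one half). Quadratic in sample count; fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (illustrative only, not study data).
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.35, 0.7, 0.4, 0.2, 0.6, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("confusion (tn, fp, fn, tp):", confusion_counts(y_true, y_pred))
print("accuracy:", accuracy(y_true, y_pred))
print("F1:", f1(y_true, y_pred))
print("ROC-AUC:", roc_auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, while ROC-AUC is computed from the raw scores, so the measures can rank the same set of classifiers differently.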
Results: The Random Forest model outperformed the other models, achieving an accuracy of 90.07% and an F1-score of 95%.
Conclusion: By comparing several supervised learning algorithms, this study offers a more efficient approach to oral cancer diagnosis. The model shows strong potential for assisting in the early detection of oral cancer, enabling timely intervention and improved patient outcomes. Future work can focus on expanding the dataset, incorporating real-time screening data, and deploying the model in clinical decision support systems for community-based screening.