Paper Title
Bank Loan Default Risk Prediction Using XGBoost Algorithm
Abstract
The accurate prediction of bank loan defaults has become a central challenge for modern financial institutions. As banking ecosystems transition toward digitization and data-driven decision-making, the need for reliable, automated, and explainable credit evaluation mechanisms has intensified. Traditional rule-based systems and linear statistical models have been widely employed for credit scoring; however, these models struggle to generalize when faced with nonlinear, high-dimensional, and imbalanced financial data. In addition, manual decision-making introduces bias, inconsistency, and inefficiency, which can lead to inaccurate assessments of creditworthiness and rising non-performing assets (NPAs).
This study introduces a comprehensive machine learning framework that leverages the Extreme Gradient Boosting (XGBoost) algorithm for robust and interpretable prediction of loan approval outcomes and default risk. The framework includes rigorous data preprocessing: handling missing values, encoding categorical features, scaling numerical attributes, and mitigating class imbalance via the Synthetic Minority Oversampling Technique (SMOTE). A comparative evaluation of Logistic Regression, Decision Tree, Random Forest, and XGBoost models was conducted using accuracy, precision, recall, F1-score, and ROC-AUC as performance indicators.
The experimental findings reveal that ensemble-based classifiers, especially XGBoost and Random Forest, significantly outperform traditional models in predictive accuracy and generalization. Feature importance analysis further emphasizes the dominant influence of credit history, applicant income, and loan amount on loan approval outcomes. The proposed framework demonstrates that integrating advanced ensemble learning techniques with structured preprocessing can drastically improve decision consistency, reduce default risk, and support financial inclusion through automated credit assessment. This research provides a foundation for building transparent, data-driven, and scalable decision support systems within the banking sector.
Keywords - Loan approval prediction, credit risk modeling, XGBoost, ensemble learning, financial technology, predictive analytics, explainable AI
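The class-imbalance step named in the abstract can be illustrated with a minimal NumPy sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is not the study's own implementation, which the abstract does not specify; the function names `smote_oversample` and `balance`, the default `k=5`, and the toy data are assumptions made for illustration. In practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used, and oversampling would be applied to the training split only so that synthetic rows do not leak into evaluation.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by SMOTE-style interpolation.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest minority neighbours of every sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # starting sample for each new row
    pick = rng.integers(0, k, size=n_new)   # which of its neighbours to use
    nbr = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def balance(X, y, minority_label=1, k=5, seed=0):
    """Append synthetic minority rows until both classes have equal counts."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    is_min = y == minority_label
    n_new = int((~is_min).sum() - is_min.sum())
    if n_new <= 0:
        return X, y
    X_syn = smote_oversample(X[is_min], n_new, k=k, seed=seed)
    X_bal = np.vstack([X, X_syn])
    y_bal = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_bal, y_bal


if __name__ == "__main__":
    # Toy data (assumed): 90 majority rows and 10 minority rows, 3 features.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (90, 3)), rng.normal(3, 1, (10, 3))])
    y = np.array([0] * 90 + [1] * 10)
    X_bal, y_bal = balance(X, y)
    print("class counts after balancing:", np.bincount(y_bal))
```

Because every synthetic row is a convex combination of two real minority rows, it always lies within the per-feature range of the observed minority class; the method adds no information beyond what those rows contain, which is one reason the evaluation metrics listed in the abstract should be computed on untouched test data.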