Enhancing Credit Risk Prediction: A Multi-stage Ensemble Pipeline
Authors: Haibo Wang, Jun Huang, Lutfu S. Sua, Figen Balo, Burak Dolar
Abstract: Effective credit risk management is fundamental to financial decision-making, requiring robust models to predict default probabilities and classify financial entities. Traditional machine learning approaches face significant challenges when confronted with high-dimensional data, limited interpretability, rare-event detection, and multi-class risk imbalance. This research proposes a comprehensive multi-stage ensemble pipeline that combines several complementary models: econometric models, including ordered logit and ordered probit; supervised learning algorithms, including XGBoost, Random Forest, Support Vector Machine, and Decision Tree; unsupervised methods such as K-Nearest Neighbors; deep learning architectures such as Multilayer Perceptron; LASSO regularization for feature selection and dimensionality reduction; and Error-Correcting Output Codes as an ensemble classifier for handling imbalanced multi-class problems. We implement Permutation Feature Importance analysis for each prediction class across all constituent models to enhance model transparency. Our framework can optimize predictive performance while providing a more holistic approach to credit risk assessment. By addressing three fundamental challenges in credit risk modeling, this research contributes to the development of more accurate and reliable computational models for strategic financial decision support. We validate the approach empirically on the Corporate Credit Ratings dataset, which contains credit ratings for 2,029 publicly listed US companies. Results demonstrate that our multi-stage ensemble pipeline significantly enhances the accuracy of financial entity classification regarding credit rating migrations (upgrades and downgrades) and default probability estimation.
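The abstract names Error-Correcting Output Codes (ECOC) as the ensemble step but does not describe the code design. As a rough illustration only, the sketch below shows the general ECOC idea in plain Python: each class gets a binary codeword, each codeword column defines one binary learner's targets, and a prediction is decoded to the class with the nearest codeword. The four class names, the 7-bit code matrix, and the helper names are all hypothetical choices for this example, not the paper's rating classes or implementation.

```python
from itertools import combinations

# Hypothetical 4-class example. The class names and the 7-bit code matrix
# are illustrative assumptions, not the paper's rating classes or design.
CODE_MATRIX = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (0, 0, 0, 0, 1, 1, 1),
    "C": (0, 0, 1, 1, 0, 0, 1),
    "D": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def min_code_distance(code_matrix):
    """Smallest pairwise Hamming distance between class codewords."""
    return min(hamming(u, v) for u, v in combinations(code_matrix.values(), 2))


def binary_targets(labels, column):
    """Relabel multi-class labels as the 0/1 targets for one binary learner."""
    return [CODE_MATRIX[label][column] for label in labels]


def decode(bits):
    """Assign the class whose codeword is nearest to the predicted bits."""
    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label], bits))


# Suppose the seven binary learners emit C's codeword with the first bit wrong.
noisy = (1, 0, 1, 1, 0, 0, 1)
print(min_code_distance(CODE_MATRIX))  # minimum distance between codewords
print(decode(noisy))                   # nearest-codeword class for the noisy bits
```

With a minimum codeword distance of d, nearest-codeword decoding tolerates up to floor((d - 1) / 2) wrong binary predictions, which is why the technique is attractive when individual learners are unreliable on rare classes. In practice the binary learners would be trained models (for example the classifiers listed in the abstract) rather than the hand-written bit tuple used here.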
Comments: 39 pages
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2509.22381 [cs.LG]
(or arXiv:2509.22381v2 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2509.22381
arXiv-issued DOI via DataCite
Submission history
From: Haibo Wang
[v1] Fri, 26 Sep 2025 14:09:04 UTC (902 KB)
[v2] Fri, 27 Mar 2026 19:43:10 UTC (818 KB)