Interpretable Machine Learning Using Partial Linear Models
Journal article
Emmanuel Flachaire, Sullivan Hué, Sébastien Laurent and Gilles Hacheme, Oxford Bulletin of Economics and Statistics, Volume 69, 2023

Despite their high predictive performance, random forest and gradient boosting are often considered black boxes, which has raised concerns among practitioners and regulators. As an alternative, we suggest using partial linear models, which are inherently interpretable. Specifically, we propose to combine parametric and non-parametric functions to accurately capture the linearities and non-linearities prevailing between dependent and explanatory variables, together with a variable selection procedure to control for overfitting. Estimation relies on a two-step procedure building upon the double residual method. We illustrate the predictive performance and interpretability of our approach on a regression problem.
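
The two-step logic of a double residual estimator can be illustrated on simulated data. The sketch below is not the authors' procedure: it assumes a single linear regressor `x`, a single non-parametric regressor `z`, a Gaussian kernel smoother with a hand-picked bandwidth, and it omits the variable selection step entirely. All function names and the data-generating process are hypothetical.

```python
import math
import random

def kernel_smooth(z, target, bandwidth):
    """Nadaraya-Watson estimate of E[target | z] at each sample point (Gaussian kernel)."""
    fitted = []
    for zi in z:
        weights = [math.exp(-0.5 * ((zi - zj) / bandwidth) ** 2) for zj in z]
        total = sum(weights)
        fitted.append(sum(w * t for w, t in zip(weights, target)) / total)
    return fitted

def double_residual_beta(y, x, z, bandwidth=0.05):
    """Step 1: partial z out of y and x. Step 2: OLS slope of residual on residual."""
    y_res = [yi - fi for yi, fi in zip(y, kernel_smooth(z, y, bandwidth))]
    x_res = [xi - fi for xi, fi in zip(x, kernel_smooth(z, x, bandwidth))]
    return sum(a * b for a, b in zip(x_res, y_res)) / sum(a * a for a in x_res)

# Simulated partial linear model (assumed for illustration):
# y = beta * x + g(z) + noise, where x is itself correlated with z.
rng = random.Random(0)
n = 400
beta_true = 2.0
z = [rng.random() for _ in range(n)]
x = [math.sin(2 * math.pi * zi) + rng.gauss(0, 1) for zi in z]
y = [beta_true * xi + math.cos(2 * math.pi * zi) + rng.gauss(0, 0.5)
     for xi, zi in zip(x, z)]

beta_hat = double_residual_beta(y, x, z)
print(f"true beta = {beta_true}, estimated beta = {beta_hat:.3f}")
```

Once the linear coefficient is estimated, the non-parametric part can be recovered by smoothing `y - beta_hat * x` on `z`, which keeps both components directly readable.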

Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects
Journal article
Elena Dumitrescu, Sullivan Hué, Christophe Hurlin and Sessi Tokpavi, European Journal of Operational Research, Volume 297, Issue 3, pp. 1178-1192, 2022

In the context of credit scoring, ensemble methods based on decision trees, such as the random forest method, provide better classification performance than standard logistic regression models. However, logistic regression remains the benchmark in the credit risk industry mainly because the lack of interpretability of ensemble methods is incompatible with the requirements of financial regulators. In this paper, we propose a high-performance and interpretable credit scoring method called penalised logistic tree regression (PLTR), which uses information from decision trees to improve the performance of logistic regression. Formally, rules extracted from various short-depth decision trees built with original predictive variables are used as predictors in a penalised logistic regression model. PLTR allows us to capture non-linear effects that can arise in credit scoring data while preserving the intrinsic interpretability of the logistic regression model. Monte Carlo simulations and empirical applications using four real credit default datasets show that PLTR predicts credit risk significantly more accurately than logistic regression and compares competitively to the random forest method.
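
The rule-extraction idea can be sketched in a few lines. The example below is a simplified illustration, not the PLTR implementation: it assumes a greedy two-split tree grown with Gini impurity on a pair of predictors, and it stops at building the binary rule indicators. The penalised logistic regression that would take these indicators as additional predictors is not shown. All names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return the (feature, threshold) pair minimising weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold: midpoint of adjacent values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def extract_rules(rows, labels):
    """Grow a two-split tree; return (f1, t1, go_right, f2, t2).

    Assumes both splits exist (best_split does not return None).
    """
    f1, t1 = best_split(rows, labels)
    right = [(r, y) for r, y in zip(rows, labels) if r[f1] > t1]
    left = [(r, y) for r, y in zip(rows, labels) if r[f1] <= t1]
    # The second split is made inside the more impure child of the root.
    go_right = gini([y for _, y in right]) >= gini([y for _, y in left])
    child = right if go_right else left
    f2, t2 = best_split([r for r, _ in child], [y for _, y in child])
    return f1, t1, go_right, f2, t2

def rule_features(row, rules):
    """Binary predictors: a one-variable rule and a two-variable rule."""
    f1, t1, go_right, f2, t2 = rules
    in_child = (row[f1] > t1) == go_right
    return [int(in_child), int(in_child and row[f2] > t2)]

# Toy data: default (label 1) only when both predictors exceed 0.5.
grid = [(i + 0.5) / 10 for i in range(10)]
rows = [(a, b) for a in grid for b in grid]
labels = [int(a > 0.5 and b > 0.5) for a, b in rows]

rules = extract_rules(rows, labels)
features = [rule_features(r, rules) for r in rows]
print(rules)
```

On this toy data the two-variable indicator reproduces the interaction that a logistic regression on the raw predictors alone cannot represent, while each rule remains a readable threshold condition.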

Measuring network systemic risk contributions: A leave-one-out approach
Journal article
Sullivan Hué, Yannick Lucotte and Sessi Tokpavi, Journal of Economic Dynamics and Control, Volume 100, pp. 86-114, 2019

The aim of this paper is to propose a new network measure of systemic risk contributions that combines the pair-wise Granger causality approach with the leave-one-out concept. This measure is based on a conditional Granger causality test and consists of measuring how far the proportion of statistically significant connections in the system breaks down when a given financial institution is excluded. We analyse the performance of our measure of systemic risk by considering a sample of the largest banks worldwide over the 2003–2018 period. We obtain three important results. First, we show that our measure is able to identify a large number of banks classified as global systemically important banks (G-SIBs) by the Financial Stability Board (FSB). Second, we find that our measure is a robust and statistically significant early-warning indicator of downside returns during the last financial crisis. Finally, we investigate the potential determinants of our measure of systemic risk and find similar results to the existing literature. In particular, our empirical results suggest that the size and the business model of banks are significant drivers of systemic risk.
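
The leave-one-out idea can be illustrated with a small sketch. It rests on a strong simplifying assumption: the matrix of statistically significant directed links is taken as given and fixed, whereas with conditional Granger causality tests the links among the remaining institutions would have to be re-tested once an institution is excluded. The function names and the example network are hypothetical.

```python
def density(adj, nodes):
    """Share of ordered pairs (i, j), i != j, joined by a significant link i -> j."""
    k = len(nodes)
    if k < 2:
        return 0.0
    links = sum(adj[i][j] for i in nodes for j in nodes if i != j)
    return links / (k * (k - 1))

def leave_one_out_contributions(adj):
    """Drop in network density when each institution is removed in turn."""
    nodes = list(range(len(adj)))
    full = density(adj, nodes)
    return [full - density(adj, [m for m in nodes if m != k]) for k in nodes]

# adj[i][j] = 1 if institution i is found to Granger-cause institution j.
# Institution 0 is built as a hub: it sends three links and receives one.
adj = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

contributions = leave_one_out_contributions(adj)
most_systemic = max(range(len(adj)), key=lambda k: contributions[k])
print(contributions, most_systemic)
```

A positive value means that the share of significant connections falls when the institution is excluded. In this example only the hub has a positive contribution.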