Emmanuel Flachaire
Researcher, Aix-Marseille Université, Faculté d'économie et de gestion (FEG)
- Status: Professeur des universités (Full Professor)
- Research area(s): Econometrics
- PhD thesis: 1998, Aix-Marseille Université
- Address:
AMU - AMSE
5-9 Boulevard Maurice Bourdet, CS 50498
13205 Marseille Cedex 1
Maha Ouali, Badih Ghattas, Emmanuel Flachaire, Laurent Bozzi, Philippe Charpentier, 06/2025
Abstract
The growing demand for energy flexibility has led to the development of programs that reduce consumption during peak periods. EDF R&D has launched several initiatives using data from Linky smart meters. However, evaluating their impact remains difficult because of individualized consumption patterns, selection bias due to voluntary participation, and the limitations of classical causal inference methods. We simulate a controlled environment to examine these challenges and compare traditional approaches with machine learning methods such as SyncTwin, which builds synthetic twins to handle hidden variables. Our results show that complex time series defeat classical methods, especially when unobserved variables influence treatment. We explain theoretically why these approaches fail, owing to inaccessible covariates that affect the propensity score. This work paves the way for new techniques to improve the evaluation of energy-saving programs.
Keywords
Artificial intelligence, Synthetic control, Causal inference
Jean-Marie Dufour, Emmanuel Flachaire, Lynda Khalaf, Abdallah Zalghout, Econometrics and Statistics, Vol. 33, pp. 230-245, 01/2025
Abstract
For standard inequality measures, distribution-free inference methods are valid under conventional assumptions that fail to hold in applications. Resulting Bahadur-Savage type failures are documented, and correction methods are provided. Proposed solutions leverage the positive-support prior that can be defended with economic data such as income, in which case directional non-parametric tests can be salvaged. Simulation analysis with generalized entropy measures allowing for heavy tails and contamination reveals that the proposed lower confidence bounds provide concrete size and power improvements, particularly through bootstraps. Empirical analysis on within-country wage inequality and on world income inequality illustrates the usefulness of the proposed lower bound, as opposed to the erratic behavior of traditional upper bounds.
Arthur Charpentier, Emmanuel Flachaire, Economics Bulletin, Vol. 44, No. 1, 12/2024
Abstract
In this paper, we show that a decomposition of changes in inequality, with the mean log deviation index, can be obtained directly from the Oaxaca-Blinder decompositions of changes in means of incomes and log-incomes. This allows practitioners to simultaneously conduct empirical analyses explaining which factors account for changes in means and in inequality indices between two distributions with strictly positive values.
Keywords
MLD index, Oaxaca-Blinder decomposition, Inequality
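The decomposition rests on an exact identity: the MLD is the log of the mean minus the mean of the logs, so its change splits into the two pieces that Oaxaca-Blinder decompositions of means (of incomes and of log-incomes) explain. A minimal numerical check in Python; the income vectors are made up for illustration:

```python
import math

def mld(y):
    """Mean log deviation: log of the mean minus the mean of the logs."""
    n = len(y)
    return math.log(sum(y) / n) - sum(math.log(v) for v in y) / n

# Two hypothetical income distributions with strictly positive values.
y0 = [10, 20, 30, 40, 100]
y1 = [15, 22, 35, 50, 180]

# Change in log-means and change in mean log-incomes: the two quantities
# whose Oaxaca-Blinder decompositions drive the inequality decomposition.
d_log_mean = math.log(sum(y1) / len(y1)) - math.log(sum(y0) / len(y0))
d_mean_log = (sum(map(math.log, y1)) / len(y1)
              - sum(map(math.log, y0)) / len(y0))

d_mld = mld(y1) - mld(y0)
# The change in MLD is exactly the difference of the two pieces.
assert abs(d_mld - (d_log_mean - d_mean_log)) < 1e-12
```

The identity holds exactly, not as an approximation, which is what makes the two mean-level decompositions sufficient to explain the inequality change.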
Arthur Charpentier, Emmanuel Flachaire, Ewen Gallic, Studies in Systems, Decision and Control, Vol. 483, pp. 45-89, 10/2024
Abstract
Many problems ask a question that can be formulated as a causal question: what would have happened if...? For example, would the person have had surgery if he or she had been Black? To address this kind of question, calculating an average treatment effect (ATE) is often uninformative, because one would like to know how much impact a variable (such as skin color) has on a specific individual, characterized by certain covariates. Calculating a conditional ATE (CATE) seems more appropriate. In causal inference, the propensity score approach assumes that the treatment is influenced by x, a collection of covariates. Here, we take the dual view: doing an intervention, or changing the treatment (even just hypothetically, in a thought experiment, for example by asking what would have happened if a person had been Black), can have an impact on the values of x. We show that optimal transport allows us to change certain characteristics that are influenced by the variable whose effect we are trying to quantify. We propose a mutatis mutandis version of the CATE: in dimension one, the CATE is computed relative to a probability level, associated with the proportion of x (a single covariate) in the control population, by looking for the equivalent quantile in the test population. In higher dimension, it is necessary to go through transport, and an application is proposed on the impact of some variables on the probability of having an unnatural birth (the fact that the mother smokes, or that the mother is Black).
Keywords
Quantiles, Optimal Transport, Mutatis Mutandis, Counterfactual, CATE, Conditional Average Treatment Effects, Causality
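In dimension one, the construction described in the abstract amounts to matching probability levels: read off the empirical CDF of x in the control sample, then take the quantile at the same level in the test sample. A minimal sketch under that reading; the function name and toy samples are illustrative, not the paper's code:

```python
import math
from bisect import bisect_right

def quantile_map(x, control, treated):
    """One-dimensional counterfactual: find the probability level of x in
    the control sample, then return the treated-sample quantile there."""
    control, treated = sorted(control), sorted(treated)
    p = bisect_right(control, x) / len(control)   # empirical CDF F0(x)
    k = max(0, math.ceil(p * len(treated)) - 1)   # matching rank in treated
    return treated[min(k, len(treated) - 1)]

# A value at the 30th percentile of the control sample is mapped to the
# 30th percentile of the treated sample.
ctrl = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
trt = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
assert quantile_map(3, ctrl, trt) == 30
```

In higher dimension, quantiles are no longer well ordered, which is where the optimal transport machinery of the chapter takes over.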
Agathe Fernandes Machado, Arthur Charpentier, Emmanuel Flachaire, Ewen Gallic, François Hu, 05/2024
Abstract
A binary scoring classifier can appear well-calibrated according to standard calibration metrics, even when the distribution of scores does not align with the distribution of the true events. In this paper, we investigate the impact of postprocessing calibration (sometimes called "recalibration") on the score distribution. Using simulated data, where the true probability is known, followed by real-world datasets with prior knowledge of event distributions, we compare the performance of an XGBoost model before and after applying calibration techniques. The results show that while methods such as Platt scaling, Beta calibration, or isotonic regression can improve the model's calibration, they may also increase the divergence between the score distribution and the underlying event probability distribution.
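One of the recalibration methods compared above, isotonic regression, can be sketched with the classic pool-adjacent-violators algorithm. This is an illustrative stdlib implementation, not the code used in the paper:

```python
def pav(scores, outcomes):
    """Isotonic (non-decreasing) fit of binary outcomes on sorted scores,
    via the pool-adjacent-violators algorithm."""
    pairs = sorted(zip(scores, outcomes))
    merged = []                      # blocks of [mean outcome, weight]
    for _, y in pairs:
        merged.append([y, 1])
        # Pool adjacent blocks while monotonicity is violated.
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            y2, w2 = merged.pop()
            y1, w1 = merged.pop()
            merged.append([(y1 * w1 + y2 * w2) / (w1 + w2), w1 + w2])
    fitted = []                      # expand back to one value per point
    for y, w in merged:
        fitted.extend([y] * w)
    return [s for s, _ in pairs], fitted

xs, fit = pav([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
# The 1-then-0 violation in the middle is pooled into a flat step.
assert fit == [0, 0.5, 0.5, 1]
```

Mapping raw scores through such a step function improves calibration metrics, but, as the abstract notes, the resulting score distribution (here, piecewise constant) can drift away from the true event probability distribution.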
Jean-Marie Dufour, Emmanuel Flachaire, Lynda Khalaf, Abdallah Zalghout, Journal of Economic Inequality, Vol. 22, No. 2, pp. 433-452, 02/2024
Abstract
We propose Fieller-type methods for inference on generalized entropy inequality indices in the context of the two-sample problem which covers testing the statistical significance of the difference in indices, and the construction of a confidence set for this difference. In addition to irregularities arising from thick distributional tails, standard inference procedures are prone to identification problems because of the ratio transformation that defines the considered indices. Simulation results show that our proposed method outperforms existing counterparts including simulation-based permutation methods and results are robust to different assumptions about the shape of the null distributions. Improvements are most notable for indices that put more weight on the right tail of the distribution and for sample sizes that match macroeconomic type inequality analysis. While irregularities arising from the right tail have long been documented, we find that left tail irregularities are equally important in explaining the failure of standard inference methods. We apply our proposed method to analyze income per-capita inequality across U.S. states and non-OECD countries. Empirical results illustrate how Fieller-based confidence sets can: (i) differ consequentially from available ones leading to conflicts in test decisions, and (ii) reveal prohibitive estimation uncertainty in the form of unbounded outcomes which serve as proper warning against flawed interpretations of statistical tests.
Keywords
Inequality, Generalized entropy, Two samples, Fieller, Identification-robust
Emmanuel Flachaire, Sullivan Hué, Sébastien Laurent, Gilles Hacheme, Oxford Bulletin of Economics and Statistics, 12/2023
Abstract
Despite their high predictive performance, random forest and gradient boosting are often considered black boxes, which has raised concerns from practitioners and regulators. As an alternative, we suggest using partial linear models that are inherently interpretable. Specifically, we propose to combine parametric and non-parametric functions to accurately capture linearities and non-linearities prevailing between dependent and explanatory variables, and a variable selection procedure to control for overfitting issues. Estimation relies on a two-step procedure building upon the double residual method. We illustrate the predictive performance and interpretability of our approach on a regression problem.
Keywords
Machine learning, Lasso, Autometrics, GAM
Emmanuel Flachaire, Nora Lustig, Andrea Vigorito, Review of Income and Wealth, 10/2022 (forthcoming)
Abstract
Household surveys do not capture incomes at the top of the distribution well. This yields biased inequality measures. We compare the performance of the reweighting and replacing methods to address top incomes underreporting in surveys using information from tax records. The biggest challenge is that the true threshold above which underreporting occurs is unknown. Relying on simulation, we construct a hypothetical true distribution and a “distorted” distribution that mimics an underreporting pattern found in a novel linked data for Uruguay. Our simulations show that if one chooses a threshold that is not close to the true one, corrected inequality measures may be significantly biased. Interestingly, the bias using the replacing method is less sensitive to the choice of threshold. We approach the threshold selection challenge in practice using the Uruguayan linked data. Our findings are analogous to the simulation exercise. These results, however, should not be considered a general assessment of the two methods.
Keywords
Correction methods, Household surveys, Income underreporting, Inequality, Linked data, Replacing, Reweighting, Tax records
Arthur Charpentier, Emmanuel Flachaire, Journal of Economic Inequality, Vol. 20, No. 1, pp. 1-25, 03/2022
Abstract
Top incomes are often related to the Pareto distribution. To date, economists have mostly used the Pareto Type I distribution to model the upper tail of income and wealth distributions. It is a parametric distribution with interesting properties that can be easily linked to economic theory. In this paper, we first show that modeling top incomes with the Pareto Type I distribution can lead to biased estimation of inequality, even with millions of observations. Then, we show that the Generalized Pareto distribution and, even more, the Extended Pareto distribution, are much less sensitive to the choice of the threshold and can thus provide more reliable results. We discuss different types of bias that could be encountered in empirical studies and provide some guidance for practice. To illustrate, two applications are investigated: the distribution of income in South Africa in 2012 and the distribution of wealth in the United States in 2013.
Keywords
Inequality measures, Top incomes, Pareto distribution
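Tail-index estimation above a threshold can be illustrated with the standard Hill estimator (a textbook tool, not one of the paper's Generalized or Extended Pareto fits). For an exact Pareto Type I sample the estimate is accurate, and a tail inequality measure follows: the Gini coefficient of a Pareto tail with index alpha is 1/(2*alpha - 1). A sketch with simulated data:

```python
import math, random

def hill(sample, k):
    """Hill estimator of the Pareto tail index from the k largest order statistics."""
    xs = sorted(sample, reverse=True)
    logs = [math.log(x) for x in xs[:k + 1]]
    return k / sum(logs[i] - logs[k] for i in range(k))

random.seed(0)
alpha_true = 2.0
# Exact Pareto Type I draws via inverse CDF: X = u * (1 - U)**(-1/alpha), u = 1.
data = [(1.0 - random.random()) ** (-1.0 / alpha_true) for _ in range(100_000)]

alpha_hat = hill(data, 5_000)           # 5% of observations in the "tail"
gini_tail = 1.0 / (2.0 * alpha_hat - 1.0)
```

With real income data the tail is only approximately Pareto, and this estimate moves substantially with k, which is precisely the threshold sensitivity the paper addresses with second-order (Extended Pareto) models.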
Frank Cowell, Emmanuel Flachaire, Research on Economic Inequality: Poverty, Inequality and Shocks, Vol. 29, pp. 95-103, 12/2021
Abstract
In the case of ordered categorical data, the concepts of minimum and maximum inequality are not straightforward. In this chapter, the authors consider the Cowell and Flachaire (2017) indices of inequality and show that minimum and maximum inequality depend on preliminary choices made before using these indices: the status concept and the sensitivity parameter. Specifically, maximum inequality can be given by the distribution that is most concentrated in the top or bottom category, or by the uniform distribution.
Keywords
Uniform distribution, Minimum, Maximum, World Values Survey, Ordinal data, Inequality
Frank Cowell, Emmanuel Flachaire, Springer International Publishing, pp. 1-46, 01/2021 (forthcoming)
Abstract
In recent years there has been a surge of interest in the subject of inequality, fuelled by new facts and new thinking. The literature on inequality has expanded rapidly as official data on income, wealth, and other personal information have become richer and more easily accessible. Ideas about the meaning of inequality have expanded to encompass new concepts and different dimensions of economic inequality. The purpose of this chapter is to give a concise overview of the issues that are involved in translating ideas about inequality into practice using various types of data.
Arthur Charpentier, Emmanuel Flachaire, Dynamic Modeling and Econometrics in Economics and Finance, Vol. 27, pp. 355-387, 01/2021
Abstract
The Pareto model is very popular in risk management, since simple analytical formulas can be derived for financial downside risk measures (value-at-risk, expected shortfall), reinsurance premiums, and related quantities (large claim index, return period). Nevertheless, in practice, distributions are (strictly) Pareto only in the tails, above a (possibly very) large threshold. Therefore, it can be interesting to take second-order behavior into account to provide a better fit. In this article, we present how to go from a strict Pareto model to Pareto-type distributions. We discuss inference, derive formulas for various measures and indices, and finally provide applications on insurance losses and financial risks.
Keywords
EPD, Expected shortfall, Financial risks, GPD, Hill, Pareto, Quantile, Rare events, Regular variation, Reinsurance, Second order, Value-at-risk
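For a strict Pareto model with survival function S(x) = (x/u)**(-alpha) for x >= u, the analytical formulas alluded to in the abstract are short: VaR_p = u * (1-p)**(-1/alpha), and for alpha > 1, ES_p = alpha/(alpha-1) * VaR_p. A sketch that checks the expected shortfall formula against a numerical tail average (the midpoint grid is just a verification device):

```python
def pareto_var(u, alpha, p):
    """Value-at-risk at level p for a strict Pareto(u, alpha):
    survival function S(x) = (x / u) ** (-alpha) for x >= u."""
    return u * (1.0 - p) ** (-1.0 / alpha)

def pareto_es(u, alpha, p):
    """Expected shortfall (mean loss beyond VaR_p); requires alpha > 1."""
    return alpha / (alpha - 1.0) * pareto_var(u, alpha, p)

# ES_p equals the average of the quantile function over (p, 1);
# approximate that average with a midpoint rule on the probability scale.
u, alpha, p = 1.0, 3.0, 0.99
n = 100_000
approx_es = sum(pareto_var(u, alpha, p + (1 - p) * (i + 0.5) / n)
                for i in range(n)) / n
```

For Pareto-type (GPD, EPD) rather than strict Pareto tails, these closed forms only hold asymptotically, which is the paper's point.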
Jean-Marie Dufour, Emmanuel Flachaire, Lynda Khalaf, Journal of Business and Economic Statistics, Vol. 37, No. 3, pp. 457-470, 07/2019
Abstract
Asymptotic and bootstrap tests for inequality measures are known to perform poorly in finite samples when the underlying distribution is heavy-tailed. We propose Monte Carlo permutation and bootstrap methods for the problem of testing the equality of inequality measures between two samples. Results cover the Generalized Entropy class, which includes Theil's index, the Atkinson class of indices, and the Gini index. We analyze finite-sample and asymptotic conditions for the validity of the proposed methods, and we introduce a convenient rescaling to improve finite-sample performance. Simulation results show that size-correct inference can be obtained with our proposed methods despite heavy tails if the underlying distributions are sufficiently close in the upper tails. Substantial reduction in size distortion is achieved more generally. Studentized rescaled Monte Carlo permutation tests outperform the competing methods we consider in terms of power.
Keywords
Permutation test, Inequality measures, Income distribution, Bootstrap
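A stripped-down version of the permutation idea, using the raw (unstudentized, unrescaled) difference in Theil indices rather than the studentized rescaled statistics the paper recommends, looks like this; samples and the number of permutations are illustrative:

```python
import math, random

def theil(y):
    """Theil index: mean of (y/ybar) * log(y/ybar), a Generalized Entropy member."""
    m = sum(y) / len(y)
    return sum((v / m) * math.log(v / m) for v in y) / len(y)

def perm_test(y1, y2, n_perm=999, seed=42):
    """Monte Carlo permutation p-value for H0: equal inequality in two samples."""
    rng = random.Random(seed)
    observed = abs(theil(y1) - theil(y2))
    pooled = list(y1) + list(y2)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # reassign labels at random
        a, b = pooled[:len(y1)], pooled[len(y1):]
        if abs(theil(a) - theil(b)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)            # Monte Carlo p-value
```

Exchangeability of the pooled sample under H0 is what justifies the shuffle; the paper's studentization and rescaling are what make the test behave under heavy tails, and are omitted here for brevity.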
Martin Biewen, Emmanuel Flachaire, Econometrics, Vol. 6, No. 4, pp. 42, 12/2018
Abstract
It is well known that, after decades of limited interest in the topic, economics has experienced a proper surge in inequality research in recent years. [...]
Thomas Chuffart, Emmanuel Flachaire, Anne Peguin-Feissolle, Studies in Nonlinear Dynamics and Econometrics, Vol. 22, No. 5, 12/2018
Abstract
In this article, a misspecification test in conditional volatility and GARCH-type models is presented. We propose a Lagrange Multiplier type test based on a Taylor expansion to distinguish between (G)ARCH models and unknown GARCH-type models. This new test can be seen as a general misspecification test of a large set of GARCH-type univariate models. It focuses on the short-term component of the volatility. We investigate the size and the power of this test through Monte Carlo experiments and we compare it to two other standard Lagrange Multiplier tests, which are more restrictive. We show the usefulness of our test with an illustrative empirical example based on daily exchange rate returns.
Frank A. Cowell, Emmanuel Flachaire, Quantitative Economics, Vol. 9, No. 2, pp. 865-901, 07/2018
Abstract
Our new approach to mobility measurement involves separating out the valuation of positions in terms of individual status (using income, social rank, or other criteria) from the issue of movement between positions. The quantification of movement is addressed using a general concept of distance between positions and a parsimonious set of axioms that characterize the distance concept and yield a class of aggregative indices. This class of indices induces a superclass of mobility measures over the different status concepts consistent with the same underlying data. We investigate the statistical inference of mobility indices using two well-known status concepts, related to income mobility and rank mobility. We also show how our superclass provides a more consistent and intuitive approach to mobility, in contrast to other measures in the literature, and illustrate its performance using recent data from China.
Keywords
Measurement, Axiomatic approach, Rank mobility, Income mobility
Jean-Marie Dufour, Emmanuel Flachaire, Lynda Khalaf, Abdallah Zalghout, Springer International Publishing, pp. 143-155, 02/2018
Abstract
Asymptotic and bootstrap inference methods for inequality indices are for the most part unreliable due to the complex empirical features of the underlying distributions. In this paper, we introduce a Fieller-type method for the Theil Index and assess its finite-sample properties by a Monte Carlo simulation study. The fact that almost all inequality indices can be written as a ratio of functions of moments and that a Fieller-type method does not suffer from weak identification as the denominator approaches zero, makes it an appealing alternative to the available inference methods. Our simulation results exhibit several cases where a Fieller-type method improves coverage. This occurs in particular when the Data Generating Process (DGP) follows a finite mixture of distributions, which reflects irregularities arising from low observations (close to zero) as opposed to large (right-tail) observations. Designs that forgo the interconnected effects of both boundaries provide possibly misleading finite-sample evidence. This suggests a useful prescription for simulation studies in this literature.
Martin Biewen, Emmanuel Flachaire, 01/2018
Abstract
This is a reprint of articles from the Special Issue published online in the open-access journal Econometrics (ISSN 2225-1146) from 2017 to 2018 (available at: https://www.mdpi.com/journal/econometrics/special_issues/inequality)
Arthur Charpentier, Emmanuel Flachaire, Antoine Ly, Economie et Statistique / Economics and Statistics, No. 505d, pp. 147-169, 01/2018
Abstract
On the face of it, econometrics and machine learning share a common goal: to build a predictive model, for a variable of interest, using explanatory variables (or features). However, the two fields have developed in parallel, thus creating two different cultures. Econometrics set out to build probabilistic models designed to describe economic phenomena, while machine learning uses algorithms capable of learning from their mistakes, generally for classification purposes (sounds, images, etc.). Yet in recent years, learning models have been found to be more effective than traditional econometric methods (the price to pay being lower explanatory power) and are, above all, capable of handling much larger datasets. Given this, econometricians need to understand what the two cultures are, what differentiates them and, above all, what they have in common in order to draw on tools developed by the statistical learning community with a view to incorporating them into econometric models.
Keywords
Least squares, Modelling, Econometrics, Big data, Learning
Frank Cowell, Emmanuel Flachaire, Economica, Vol. 84, No. 334, pp. 290-321, 04/2017
Abstract
The standard theory of inequality measurement assumes that the equalisand is a cardinal quantity, with known cardinalization. However, one often needs to make inequality comparisons where either the cardinalization is unknown or the underlying data are categorical. We propose an alternative approach to inequality analysis that is rigorous, has a natural interpretation, and embeds both the ordinal data problem and the well-known cardinal data problem. We show how the approach can be applied to the inequality of happiness and of health status.
Frank A. Cowell, Emmanuel Flachaire, Elsevier, Vol. 2A, Ch. 6, pp. 359-465, 11/2015
Abstract
This Chapter is about the techniques, formal and informal, that are commonly used to give quantitative answers in the field of distributional analysis - covering subjects including inequality, poverty and the modelling of income distributions. It deals with parametric and non-parametric approaches and the way in which imperfections in data may be handled in practice.
Keywords
Quantitative economics
Arthur Charpentier, Emmanuel Flachaire, L'Actualité économique, Vol. 91, No. 1-2, pp. 141-159, 03/2015
Abstract
Standard kernel density estimation methods are very often used in practice to estimate a density function. They work well in numerous cases. However, they are known not to work so well with skewed, multimodal, and heavy-tailed distributions. Such features are usual with income distributions, defined over the positive support. In this paper, we show that a preliminary logarithmic transformation of the data, combined with standard kernel density estimation methods, can provide a much better fit of the density.
Keywords
Quantitative economics
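The log-transform device can be sketched in a few lines: fit a Gaussian kernel density to the log-incomes, then map back to the original scale with the Jacobian term 1/y. The bandwidth rule and simulated sample below are illustrative choices, not the paper's:

```python
import math, random

def kde_log(data, y):
    """Density at y > 0: Gaussian kernel density on log(data), mapped back
    to the original scale with the Jacobian term 1/y."""
    logs = [math.log(v) for v in data]
    n = len(logs)
    mean = sum(logs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in logs) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)      # Silverman's rule of thumb, on the log scale
    t = math.log(y)
    total = sum(math.exp(-0.5 * ((t - v) / h) ** 2) for v in logs)
    return total / (n * h * math.sqrt(2.0 * math.pi) * y)

random.seed(1)
sample = [math.exp(random.gauss(0.0, 1.0)) for _ in range(400)]  # lognormal "incomes"

# The back-transformed estimate is a genuine density on (0, inf):
# its numerical integral over a fine grid is close to one.
step = 0.05
grid = [step * (i + 0.5) for i in range(1200)]   # midpoints covering (0, 60)
mass = step * sum(kde_log(sample, y) for y in grid)
```

Because smoothing happens on the log scale, no probability mass leaks below zero and the heavy right tail is handled with a bandwidth that adapts multiplicatively, which is the intuition behind the better fit reported in the paper.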
Frank A. Cowell, Russell Davidson, Emmanuel Flachaire, Journal of Business and Economic Statistics, Vol. 33, No. 1, pp. 54-67, 01/2015
Abstract
An axiomatic approach is used to develop a one-parameter family of measures of divergence between distributions. These measures can be used to perform goodness-of-fit tests with good statistical properties. Asymptotic theory shows that the test statistics have well-defined limiting distributions which are, however, analytically intractable. A parametric bootstrap procedure is proposed for implementation of the tests. The procedure is shown to work very well in a set of simulation experiments, and to compare favorably with other commonly used goodness-of-fit tests. By varying the parameter of the statistic, one can obtain information on how the distribution that generated a sample diverges from the target family of distributions when the true distribution does not belong to that family. An empirical application analyzes a U.K. income dataset.
Keywords
Quantitative economics
Emmanuel Flachaire, Cecilia García-Peñalosa, Maty Konte, Journal of Comparative Economics, Vol. 42, No. 1, pp. 212-229, 01/2014
Abstract
After a decade of research on the relationship between institutions and growth, there is no consensus about the exact way in which these two variables interact. In this paper we re-examine the role that institutions play in the growth process using data for developed and developing economies over the period 1975–2005. Our results indicate that the data is best described by an econometric model with two growth regimes. Political institutions are the key determinant of which regime an economy belongs to, while economic institutions have a direct impact on growth rates within each regime. These findings support the hypothesis that political institutions are one of the deep causes of growth, setting the stage in which economic institutions and standard covariates operate.
Keywords
Growth, Institutions, Mixture regressions
Emmanuel Flachaire, Guillaume Hollard, Jason F. Shogren, Theory and Decision, Vol. 74, No. 3, pp. 431-437, 01/2013
Abstract
This paper tests whether individual perceptions of markets as good or bad for a public good are correlated with the propensity to report gaps between willingness to pay and willingness to accept revealed within an incentive-compatible mechanism. Identifying people based on a notion of market affinity, we find that a substantial part of the gap can be explained by controlling for variables that were not controlled for before. This result suggests the valuation gap for public goods can be reduced through well-defined variables.
Keywords
Willingness to pay, Willingness to accept, Experimental economics
Frank A. Cowell, Emmanuel Flachaire, Sanghamitra Bandyopadhyay, Journal of Economic Inequality, Vol. 11, No. 4, pp. 421-437, 01/2013
Abstract
We investigate a general problem of comparing pairs of distributions which includes approaches to inequality measurement, the evaluation of “unfair” income inequality, evaluation of inequality relative to norm incomes, and goodness of fit. We show how to represent the generic problem simply using (1) a class of divergence measures derived from a parsimonious set of axioms and (2) alternative types of “reference distributions.” The problems of appropriate statistical implementation are discussed and empirical illustrations of the technique are provided using a variety of reference distributions.
Keywords
C10, D63, Divergence measures, Generalised entropy measures, Income distribution, Inequality measurement
Ibrahim Ahamada, Emmanuel Flachaire, Oxford University Press, 180 pp., 01/2011
Russell Davidson, Emmanuel Flachaire, Journal of Econometrics, Vol. 146, No. 1, pp. 162-169, 09/2008
Abstract
The wild bootstrap is studied in the context of regression models with heteroskedastic disturbances. We show that, in one very specific case, perfect bootstrap inference is possible, and a substantial reduction in the error in the rejection probability of a bootstrap test is available much more generally. However, the version of the wild bootstrap with this desirable property is without the skewness correction afforded by the currently most popular version of the wild bootstrap. Simulation experiments show that this does not prevent the preferred version from having the smallest error in rejection probability in small and medium-sized samples.
Keywords
Wild bootstrap, Heteroskedasticity, Bootstrap inference
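The Rademacher scheme discussed in the abstract, the two-point version without skewness correction, can be sketched for a simple regression. The design, error process, and tuning constants below are illustrative, not those of the paper's experiments:

```python
import random

def ols(x, y):
    """Intercept, slope, and residuals from a simple regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    return a, b, resid

def wild_bootstrap_slopes(x, y, n_boot=499, seed=7):
    """Wild bootstrap: rebuild y from fitted values plus residuals flipped
    by Rademacher signs, preserving each observation's error scale."""
    rng = random.Random(seed)
    a, b, resid = ols(x, y)
    draws = []
    for _ in range(n_boot):
        ystar = [a + b * xi + ri * rng.choice((-1.0, 1.0))
                 for xi, ri in zip(x, resid)]
        draws.append(ols(x, ystar)[1])
    return b, draws

random.seed(3)
xs = [i / 10.0 for i in range(1, 51)]
# Heteroskedastic errors: the error standard deviation grows with x.
ys = [1.0 + 2.0 * xi + xi * random.gauss(0.0, 0.5) for xi in xs]
b_hat, draws = wild_bootstrap_slopes(xs, ys)
```

Because each bootstrap error keeps the magnitude of the corresponding residual and only its sign is randomized, the resampling scheme mimics the unknown heteroskedasticity pattern; the empirical distribution of `draws` around `b_hat` can then be used for tests or confidence intervals.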
Russell Davidson, Emmanuel Flachaire, Journal of Econometrics, Vol. 141, No. 1, pp. 141-166, 11/2007
Abstract
A random sample drawn from a population would appear to offer an ideal opportunity to use the bootstrap in order to perform accurate inference, since the observations of the sample are IID. In this paper, Monte Carlo results suggest that bootstrapping a commonly used index of inequality leads to inference that is not accurate even in very large samples, although inference with poverty indices is satisfactory. We find that the major cause is the extreme sensitivity of many inequality indices to the exact nature of the upper tail of the income distribution. This leads us to study two non-standard bootstraps, the m out of n bootstrap, which is valid in some situations where the standard bootstrap fails, and a bootstrap in which the upper tail is modelled parametrically. Monte Carlo results suggest that accurate inference can be achieved with this last method in moderately large samples.
Keywords
Income distribution, Poverty, Bootstrap inference
Emmanuel Flachaire, Guillaume Hollard, Stéphane Luchini, Recherches Economiques de Louvain - Louvain economic review, Vol. 73, No. 4, pp. 369-385, 01/2007
Abstract
This article addresses the important issue of anchoring in contingent valuation surveys that use the double-bounded elicitation format. Anchoring occurs when responses to the follow-up dichotomous choice valuation question are influenced by the bid presented in the initial dichotomous choice question. Specifically, we adapt a theory from psychology to characterize respondents as those who are likely to anchor and those who are not. Using a model developed by Herriges and Shogren (1996), our method appears successful in discriminating between those who anchor and those who did not. An important result is that when controlling for anchoring - and allowing the degree of anchoring to differ between respondent groups - the efficiency of the double-bounded welfare estimate is greater than for the initial dichotomous choice question. This contrasts with earlier research that finds that the potential efficiency gain from the double-bounded questions is lost when anchoring is controlled for and that we are better off not asking follow-up questions.
Keywords
Anchoring, Contingent valuation, Heterogeneity, Framing effects
Emmanuel Flachaire, Economie et Prévision, Vol. 142, pp. 183-194, 01/2001
Abstract
In practice, most test statistics have a probability distribution of unknown form. One usually relies on their asymptotic distribution as an approximation to the true one. But if the available sample is not large enough, this approximation can be poor and tests based on it severely biased. Bootstrap methods yield an approximation to the true distribution of the statistic that is generally more accurate than the asymptotic one. They can also be used to approximate the distribution of a statistic that cannot be computed analytically. In this article, we present a general bootstrap methodology in the context of regression models.
Keywords
Bootstrap, Regression model
Emmanuel Flachaire, Economics Letters, Vol. 64, pp. 257-262, 01/1999
Abstract
In this paper we are interested in heteroskedastic regression models, for which an appropriate bootstrap method is bootstrapping pairs, proposed by Freedman (1981). We propose an improved version of it, with better numerical performance.
Keywords
Bootstrap, Heteroskedasticity
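The baseline pairs bootstrap of Freedman (1981), the starting point the paper improves upon, resamples (x, y) pairs jointly so that each observation carries its own error along with it, which is why it remains valid under heteroskedasticity. A minimal sketch with an illustrative statistic:

```python
import random

def pairs_bootstrap(x, y, stat, n_boot=999, seed=0):
    """Bootstrap by resampling (x, y) pairs jointly; each pair keeps its
    own error, so heteroskedasticity is preserved in the resamples."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(stat([x[i] for i in idx], [y[i] for i in idx]))
    return draws

# Illustrative statistic: the ratio of the two sample means.
def mean_ratio(x, y):
    return (sum(y) / len(y)) / (sum(x) / len(x))

xs = list(range(1, 21))
ys = [2.0 * v for v in xs]                 # exact relation y = 2x
draws = pairs_bootstrap(xs, ys, mean_ratio, n_boot=200)
```

With the exact relation y = 2x, every resample returns a ratio of 2, confirming that the scheme resamples observations rather than errors; the paper's contribution is a modified version of this scheme with better numerical performance.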
Agathe Fernandes Machado, Arthur Charpentier, Emmanuel Flachaire, Ewen Gallic, François Hu
Abstract
The assessment of binary classifier performance traditionally centers on discriminative ability, using metrics such as accuracy. However, these metrics often disregard the model's inherent uncertainty, especially in sensitive decision-making domains such as finance or healthcare. Given that model-predicted scores are commonly seen as event probabilities, calibration is crucial for accurate interpretation. In our study, we analyze the sensitivity of various calibration measures to score distortions and introduce a refined metric, the Local Calibration Score. Comparing recalibration methods, we advocate for local regressions, emphasizing their dual role as effective recalibration tools and facilitators of smoother visualizations. We apply these findings in a real-world scenario, using a Random Forest classifier and regressor to predict credit default while simultaneously measuring calibration during performance optimization.
Keywords
Calibration, Binary classification, Local regression
Arthur Charpentier, Emmanuel Flachaire
Abstract
Standard kernel density estimation methods are very often used in practice to estimate a density function. They work well in numerous cases. However, they are known not to work so well with skewed, multimodal, and heavy-tailed distributions. Such features are usual with income distributions, defined over the positive support. In this paper, we show that a preliminary logarithmic transformation of the data, combined with standard kernel density estimation methods, can provide a much better fit of the density.
Keywords
Nonparametric density estimation, Heavy-tail, Income distribution, Data transformation, Lognormal kernel
Frank A. Cowell, Emmanuel Flachaire
Abstract
This Chapter is about the techniques, formal and informal, that are commonly used to give quantitative answers in the field of distributional analysis - covering subjects including inequality, poverty and the modelling of income distributions. It deals with parametric and non-parametric approaches and the way in which imperfections in data may be handled in practice.
Keywords
Goodness of fit, Dominance criteria, Poverty measure, Hypothesis testing, Welfare indices, Inequality measure, Bootstrap, Influence function, Non-parametric methods, Confidence intervals, Parametric modelling