AMU - AMSE

5-9 Boulevard Maurice Bourdet, CS 50498

13205 Marseille Cedex 1

# Flachaire

## Publications

We propose Fieller-type methods for inference on generalized entropy inequality indices in the two-sample problem, which covers testing the statistical significance of the difference in indices and constructing a confidence set for this difference. In addition to irregularities arising from thick distributional tails, standard inference procedures are prone to identification problems because of the ratio transformation that defines the considered indices. Simulation results show that our proposed method outperforms existing counterparts, including simulation-based permutation methods, and that results are robust to different assumptions about the shape of the null distributions. Improvements are most notable for indices that put more weight on the right tail of the distribution and for sample sizes that match macroeconomic-type inequality analysis. While irregularities arising from the right tail have long been documented, we find that left-tail irregularities are equally important in explaining the failure of standard inference methods. We apply our proposed method to analyze per-capita income inequality across U.S. states and non-OECD countries. Empirical results illustrate how Fieller-based confidence sets can: (i) differ consequentially from available ones, leading to conflicts in test decisions, and (ii) reveal prohibitive estimation uncertainty in the form of unbounded outcomes, which serve as a proper warning against flawed interpretations of statistical tests.
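The Fieller idea behind this approach can be sketched in a few lines: instead of studying the ratio a/b directly, one inverts a test of H0: a − r·b = 0 over r, which reduces to solving a quadratic inequality. The set can be a bounded interval, but also the complement of an interval or the whole real line, which is how unbounded outcomes arise. A minimal sketch, with purely illustrative estimates and variances (not taken from the paper):

```python
import numpy as np

def fieller_ci(a, b, va, vb, vab, crit=1.96):
    """Fieller confidence set for the ratio a/b.

    Invert the test of H0: a - r*b = 0, i.e. solve
    (a - r*b)^2 <= crit^2 * (va - 2*r*vab + r^2*vb) for r.
    """
    c2 = crit ** 2
    A = b ** 2 - c2 * vb            # quadratic coefficient in r
    B = -2.0 * (a * b - c2 * vab)
    C = a ** 2 - c2 * va
    disc = B ** 2 - 4.0 * A * C
    if A > 0 and disc >= 0:          # denominator significant: bounded interval
        r1 = (-B - np.sqrt(disc)) / (2.0 * A)
        r2 = (-B + np.sqrt(disc)) / (2.0 * A)
        return (r1, r2)
    return None                      # unbounded or disconnected set

# hypothetical estimates of numerator/denominator and their (co)variances
print(fieller_ci(a=2.0, b=4.0, va=0.10, vb=0.20, vab=0.02))
```

When the denominator is not significantly different from zero (A ≤ 0), the inversion yields an unbounded set, matching the warning role described in the abstract.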

In this paper, we show that a decomposition of changes in inequality, measured with the mean log deviation index, can be obtained directly from the Oaxaca-Blinder decompositions of changes in the means of incomes and log-incomes. This allows practitioners to conduct empirical analyses that simultaneously explain which factors account for changes in means and in inequality indices between two distributions with strictly positive values.
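The connection rests on the fact that the mean log deviation can be written as the log of the mean minus the mean of the logs, so decompositions of those two means translate directly into a decomposition of the index. A minimal illustration:

```python
import numpy as np

def mld(x):
    """Mean log deviation: log(mean) - mean(log), for strictly positive incomes."""
    x = np.asarray(x, dtype=float)
    return np.log(x.mean()) - np.log(x).mean()

incomes = np.array([10.0, 20.0, 30.0, 40.0])
# MLD depends only on the mean of incomes and the mean of log-incomes,
# which is why Oaxaca-Blinder decompositions of those two means carry over.
print(mld(incomes))
```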

Many problems ask a question that can be formulated as a causal question: what would have happened if...? For example, would the person have had surgery if he or she had been Black? To address this kind of question, calculating an average treatment effect (ATE) is often uninformative, because one would like to know how much impact a variable (such as skin color) has on a specific individual, characterized by certain covariates. Trying to calculate a conditional ATE (CATE) seems more appropriate. In causal inference, the propensity score approach assumes that the treatment is influenced by $$\boldsymbol{x}$$, a collection of covariates. Here, we take the dual view: doing an intervention, or changing the treatment (even just hypothetically, in a thought experiment, for example by asking what would have happened if a person had been Black) can have an impact on the values of $$\boldsymbol{x}$$. We will see that optimal transport allows us to change certain characteristics that are influenced by the variable whose effect we are trying to quantify. We propose a mutatis mutandis version of the CATE, which in dimension one simply requires computing the CATE relative to a probability level, associated with the proportion of x (a single covariate) in the control population, and looking for the equivalent quantile in the test population. In higher dimensions, it is necessary to go through optimal transport, and an application is proposed on the impact of some variables on the probability of having an unnatural birth (the fact that the mother smokes, or that the mother is Black).
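In dimension one, the quantile-matching step described above is simple: read off the probability level of a covariate value in the control population, then take the corresponding quantile in the test population. A minimal sketch with synthetic data (distributions and names are illustrative, not the paper's application):

```python
import numpy as np

rng = np.random.default_rng(0)
x_control = rng.normal(loc=10.0, scale=2.0, size=10_000)  # covariate, control group
x_test = rng.normal(loc=12.0, scale=3.0, size=10_000)     # covariate, test group

def transport_1d(x0, control, test):
    """Map x0 to the test population via quantile matching:
    find the probability level of x0 among controls, then take
    the quantile at that same level among the test observations."""
    p = (control <= x0).mean()      # empirical CDF of the control group at x0
    return np.quantile(test, p)     # matching quantile in the test group

# the control median (around 10) maps to the test median (around 12)
print(transport_1d(10.0, x_control, x_test))
```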

Despite their high predictive performance, random forests and gradient boosting are often considered black boxes, which has raised concerns from practitioners and regulators. As an alternative, we suggest using partial linear models that are inherently interpretable. Specifically, we propose combining parametric and non-parametric functions to accurately capture the linearities and non-linearities prevailing between dependent and explanatory variables, together with a variable selection procedure to control for overfitting. Estimation relies on a two-step procedure building upon the double residual method. We illustrate the predictive performance and interpretability of our approach on a regression problem.
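The double residual (Robinson-type) idea mentioned above can be sketched as follows: residualize both the outcome and the parametric regressor on the nonparametric component, then run OLS on the two residual series. Here a crude k-nearest-neighbour smoother stands in for the nonparametric estimator; all details are illustrative, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
z = rng.uniform(-2, 2, n)                      # variable entering nonparametrically
x = z + rng.normal(size=n)                     # parametric regressor, correlated with z
y = 1.5 * x + np.sin(2 * z) + 0.5 * rng.normal(size=n)

def knn_smooth(target, by, k=50):
    """Crude k-nearest-neighbour estimate of E[target | by]."""
    order = np.argsort(by)
    fitted = np.empty_like(target)
    for rank, i in enumerate(order):
        lo = max(0, rank - k // 2)
        nbrs = order[lo:lo + k]                # neighbours in the 'by' ordering
        fitted[i] = target[nbrs].mean()
    return fitted

# Step 1: residualize y and x on z with the nonparametric smoother.
ey = y - knn_smooth(y, z)
ex = x - knn_smooth(x, z)
# Step 2: OLS of the y-residuals on the x-residuals recovers the linear part.
beta = (ex @ ey) / (ex @ ex)
print(beta)  # close to the true coefficient 1.5
```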

Household surveys do not capture incomes at the top of the distribution well, which yields biased inequality measures. We compare the performance of the reweighting and replacing methods for addressing top-income underreporting in surveys, using information from tax records. The biggest challenge is that the true threshold above which underreporting occurs is unknown. Relying on simulation, we construct a hypothetical true distribution and a “distorted” distribution that mimics an underreporting pattern found in novel linked data for Uruguay. Our simulations show that if one chooses a threshold that is not close to the true one, corrected inequality measures may be significantly biased. Interestingly, the bias of the replacing method is less sensitive to the choice of threshold. We approach the threshold selection challenge in practice using the Uruguayan linked data. Our findings are analogous to the simulation exercise. These results, however, should not be considered a general assessment of the two methods.
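The replacing method described above can be sketched as follows: survey incomes above a chosen threshold are replaced by rank-matched values from the tax-record distribution. Everything below (data, distortion pattern, threshold) is synthetic and purely illustrative, and the undistorted values play the role of the tax records:

```python
import numpy as np

rng = np.random.default_rng(2)
survey = rng.lognormal(mean=10, sigma=0.8, size=5000)
survey_distorted = survey.copy()
top = survey_distorted > np.quantile(survey_distorted, 0.99)
survey_distorted[top] *= 0.6                 # mimic underreporting at the top

# stand-in "tax records" for the top of the distribution
tax_top = np.sort(survey[survey > np.quantile(survey, 0.99)])

def replace_top(incomes, tax_values, threshold):
    """Replace incomes above threshold with rank-matched tax-record values.

    Returns a sorted array; individual ordering is lost, which is fine
    for distributional analysis."""
    out = np.sort(np.asarray(incomes, dtype=float))
    k = int((out > threshold).sum())
    if k > 0:
        out[-k:] = np.sort(tax_values)[-k:]  # align the k largest observations
    return out

threshold = np.quantile(survey_distorted, 0.99)
corrected = replace_top(survey_distorted, tax_top, threshold)
```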

Comparing income and wealth distributions, and assessing the effects of policies that affect those distributions, require reliable inequality-measurement tools. However, commonly used inequality measures such as the Gini coefficient have an apparently counter-intuitive property: income growth among the rich may actually reduce measured inequality. We show that there are just two inequality measures that both avoid this anomalous behavior and satisfy the principle of transfers. We further show that the recent increases in US income inequality are understated by the conventional Gini coefficient, and we explain why a simple alternative inequality measure should be preferred in practice.
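The anomalous behavior of the Gini coefficient mentioned above is easy to reproduce numerically: raising the income of a rich (but not the richest) individual can lower the measured index. A small illustration with made-up incomes:

```python
import numpy as np

def gini(x):
    """Gini coefficient via the rank-weighted formula
    sum_i (2i - n - 1) x_(i) / (n^2 * mean), incomes sorted ascending."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    coef = 2 * np.arange(1, n + 1) - n - 1
    return (coef @ x) / (n * n * x.mean())

before = [1.0, 2.0, 10.0, 100.0]
after = [1.0, 2.0, 50.0, 100.0]   # the second-richest gets much richer
print(gini(before), gini(after))   # measured inequality falls
```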

Top incomes are often modeled with the Pareto distribution. To date, economists have mostly used the Pareto Type I distribution to model the upper tail of income and wealth distributions. It is a parametric distribution with interesting properties that can be easily linked to economic theory. In this paper, we first show that modeling top incomes with the Pareto Type I distribution can lead to biased estimation of inequality, even with millions of observations. Then, we show that the Generalized Pareto distribution and, even more so, the Extended Pareto distribution are much less sensitive to the choice of the threshold and can therefore provide more reliable results. We discuss different types of bias that can be encountered in empirical studies and provide some guidance for practice. To illustrate, two applications are investigated: the distribution of income in South Africa in 2012 and the distribution of wealth in the United States in 2013.
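For the strict Pareto Type I case discussed above, the tail index can be estimated by maximum likelihood from observations above a threshold (the Hill estimator), and tail inequality follows directly from the index: the Gini coefficient of a Pareto I distribution is 1/(2α − 1). A minimal sketch on simulated data (parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha_true, xm = 2.5, 1.0
u = rng.uniform(size=100_000)
x = xm * u ** (-1.0 / alpha_true)            # Pareto Type I draws by inversion

def hill_alpha(x, threshold):
    """MLE (Hill estimator) of the Pareto I tail index above a threshold."""
    tail = x[x > threshold]
    return tail.size / np.log(tail / threshold).sum()

alpha_hat = hill_alpha(x, threshold=2.0)
gini_tail = 1.0 / (2.0 * alpha_hat - 1.0)    # implied Gini of the Pareto tail
print(alpha_hat, gini_tail)
```

When the data are only Pareto-type rather than strictly Pareto, this estimate varies with the threshold, which is the sensitivity the paper documents.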

In recent years there has been a surge of interest in the subject of inequality, fuelled by new facts and new thinking. The literature on inequality has expanded rapidly as official data on income, wealth, and other personal information have become richer and more easily accessible. Ideas about the meaning of inequality have expanded to encompass new concepts and different dimensions of economic inequality. The purpose of this chapter is to give a concise overview of the issues that are involved in translating ideas about inequality into practice using various types of data.

In the case of ordered categorical data, the concepts of minimum and maximum inequality are not straightforward. In this chapter, the authors consider the Cowell and Flachaire (2017) indices of inequality. The authors show that minimum and maximum inequality depend on preliminary choices made before using these indices, namely the definition of status and the sensitivity parameter. Specifically, maximum inequality can be given either by the distribution that is most concentrated in the top or bottom category, or by the uniform distribution.

The Pareto model is very popular in risk management, since simple analytical formulas can be derived for financial downside risk measures (value-at-risk, expected shortfall), reinsurance premiums, and related quantities (large claim index, return period). Nevertheless, in practice, distributions are (strictly) Pareto only in the tails, above a (possibly very) large threshold. Therefore, it can be worthwhile to take second-order behavior into account to provide a better fit. In this article, we show how to move from a strict Pareto model to Pareto-type distributions. We discuss inference, derive formulas for various measures and indices, and provide applications to insurance losses and financial risks.
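Under the strict Pareto assumption above, the downside risk measures have closed forms: with survival function (x_m/x)^α, the value-at-risk at level p is VaR_p = x_m (1 − p)^(−1/α), and for α > 1 the expected shortfall is ES_p = α/(α − 1) · VaR_p. A minimal sketch (parameter values are illustrative):

```python
import numpy as np

def pareto_var(p, xm, alpha):
    """Value-at-risk of a strict Pareto(xm, alpha) at level p."""
    return xm * (1.0 - p) ** (-1.0 / alpha)

def pareto_es(p, xm, alpha):
    """Expected shortfall (mean loss beyond VaR); requires alpha > 1."""
    return alpha / (alpha - 1.0) * pareto_var(p, xm, alpha)

xm, alpha = 1.0, 2.0
print(pareto_var(0.99, xm, alpha))  # = 10.0
print(pareto_es(0.99, xm, alpha))   # = 20.0
```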