Anne Ruiz-Gazen

Thematic seminars
Big data and econometrics seminar

Anne Ruiz-Gazen

Toulouse School of Economics
Statistical data integration using a prediction approach for finite population inference
Joint with
Jean-François Beaumont, Alain Dessertaine, Camelia Goga, Estelle Médous, Pauline Puech
Venue

IBD Salle 21

Îlot Bernard du Bois - Salle 21

AMU - AMSE
5-9 boulevard Maurice Bourdet
13001 Marseille

Date(s)
Tuesday, April 26, 2022 | 2:00pm to 3:30pm
Contact(s)

Michel Lubrano: michel.lubrano[at]univ-amu.fr
Pierre Michel: pierre.michel[at]univ-amu.fr

Abstract

Combining survey sample data and big data is an important current challenge in finite population inference. While survey sample data are obtained through a probability sampling design, big data usually consist of non-probability samples. Many well-known unbiased or approximately unbiased methods exist for estimating finite population parameters from a probability sample. Inference from a non-probability sample, however, is often subject to selection bias. Recently, Kim and Tam (2021) proposed a data integration approach that incorporates a probability sample to handle the selection bias of non-probability samples. In the first part of the presentation, we revisit their approach and study in detail the efficiency gain of some estimators when probability and non-probability samples are combined. In the second part, we focus on the case where the target variable is not observable in the big data source, while the auxiliary information present in this source is not measured in the probability sample. In such a situation, new estimators can be defined by following a prediction approach. These estimators are either design-based, model-based, or cosmetic. Their properties in terms of bias and efficiency are studied using theoretical and simulation results. The usefulness of the new estimators is illustrated in the context of the French postal service, where the objective is to estimate monthly postal traffic by combining a survey of mail carriers' rounds with the database of automatically processed mail.