Multivariate Statistical Matching Using Graphical Modeling

01 Journal article
Conti Pier Luigi, Marella Daniela, Vicard Paola, Vitale Vincenzina
ISSN: 0888-613X

The goal of statistical matching, at a macro level, is the estimation of the joint distribution of variables separately observed in independent samples. The lack of joint information on the variables of interest leads to uncertainty about the data generating model. In this paper we propose the use of graphical models to deal with the statistical matching uncertainty for multivariate categorical variables. The use of Bayesian networks in the statistical matching context makes it possible both to introduce extra-sample information on the dependence structure between the variables of interest and to use this information to factorize the joint probability distribution according to the graph decomposition of a multivariate dependence into lower-dimensional components. This representation of the joint probability distribution, taking advantage of local relationships, simplifies both parameter estimation and the evaluation of statistical matching quality in a multivariate context. A simulation experiment is performed to evaluate the performance of the proposed methodology with and without auxiliary information, and to compare it with the saturated multinomial model in terms of uncertainty reduction. Finally, an application to a real case is provided. Results show a considerable improvement in the quality of statistical matching when the dependence structure is taken into account.
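
As a rough illustration of the factorization idea described in the abstract, the sketch below estimates a joint distribution from two samples that never observe all variables together. It is not the authors' method: it assumes the simplest possible graph, Y <- X -> Z (conditional independence of Y and Z given a common variable X), so that P(x, y, z) = P(x) P(y | x) P(z | x). The function name cia_joint and the toy data are hypothetical.

```python
from collections import Counter

def cia_joint(sample_a, sample_b):
    """Estimate P(x, y, z) = P(x) * P(y | x) * P(z | x).

    sample_a: list of (x, y) pairs (z is never observed here).
    sample_b: list of (x, z) pairs (y is never observed here).
    Assumes the graph Y <- X -> Z, an illustrative assumption only.
    """
    # X is observed in both samples, so pool them to estimate P(x).
    n_x_a = Counter(x for x, _ in sample_a)
    n_x_b = Counter(x for x, _ in sample_b)
    n_x = n_x_a + n_x_b
    n_total = len(sample_a) + len(sample_b)

    n_xy = Counter(sample_a)  # counts used for P(y | x), from sample A only
    n_xz = Counter(sample_b)  # counts used for P(z | x), from sample B only

    joint = {}
    for (x, y), c_xy in n_xy.items():
        for (x_b, z), c_xz in n_xz.items():
            if x_b != x:
                continue
            p_x = n_x[x] / n_total
            p_y_given_x = c_xy / n_x_a[x]
            p_z_given_x = c_xz / n_x_b[x]
            joint[(x, y, z)] = p_x * p_y_given_x * p_z_given_x
    return joint

# Toy data: every value of x appears in both samples.
sample_a = [(0, "a"), (0, "a"), (0, "b"), (1, "a")]
sample_b = [(0, "u"), (1, "u"), (1, "v"), (1, "v")]
joint = cia_joint(sample_a, sample_b)
```

Each factor involves only a small set of variables, which is what keeps estimation manageable as the number of variables grows. Without a structural assumption of this kind, the data alone do not identify the dependence between Y and Z, which is the uncertainty the abstract refers to.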
