In recent years, National Statistics Institutes (NSIs) have been exploring the possibility of producing statistics based on administrative data only. In particular, interest in population size estimation is increasing. In this setting, some issues naturally arise because those who collect the data and those who use them pursue different aims. On the one hand, the datasets are very likely to contain out-of-target units (overcoverage). On the other hand, and obviously enough, some units that belong to the target population are not observed (undercoverage). In practice, this yields incomplete contingency tables whose cell counts are partly overestimated. This project aims to explore the potential efficiency of data augmentation algorithms that deal with both overcoverage and undercoverage in a population size estimation problem. Furthermore, we aim to allow the inclusion of relevant, non-negligible prior information within a fully subjective Bayesian approach.
This project fits squarely within the most recent research interests in official statistics. Entity ambiguity is a key problem for National Statistics Institutes. However, although the issue has been addressed, it has not yet been solved. Each existing approach has some shortcomings, which we summarise as follows. On the one hand, the frequentist approach, which relies mainly on log-linear models, is limited by the available degrees of freedom. In other words, when only a few sources are available, the number of parameters that can be used to specify the model is very low (with k sources and no covariates, the incomplete table has only 2^k - 1 observed cells). On the other hand, for computational reasons, Bayesian work has focused mainly on decomposable graphical models, which are only a subclass of the models we may encounter in real applications. Combining the two approaches can overcome both issues. A Bayesian approach allows a larger number of parameters (albeit under weaker constraints), while the use of hierarchical models adds further flexibility. Overall, the result would be a highly flexible model that could be extended in many directions. Depending on the NSIs' needs, we may include covariates, introduce linkage or matching uncertainty, or handle structural zeros.
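To make the degrees-of-freedom constraint concrete, the sketch below counts, for k sources, the observed cells of the incomplete 2^k table and the parameters of a hierarchical log-linear model that includes every interaction up to a given order. It is a minimal illustration assuming binary capture indicators, no covariates and no structural zeros other than the unobserved "missed by all sources" cell; the function names are hypothetical.

```python
from math import comb


def observed_cells(k: int) -> int:
    # A k-source table has 2**k cells; the cell of units missed by
    # every source is unobserved, leaving 2**k - 1 observed counts.
    return 2 ** k - 1


def n_params(k: int, max_order: int) -> int:
    # Intercept (order 0) + k main effects + C(k, 2) two-way terms + ...
    # up to all interactions of order max_order.
    return sum(comb(k, j) for j in range(max_order + 1))


def residual_df(k: int, max_order: int) -> int:
    # Degrees of freedom left after fitting the model to the observed cells.
    return observed_cells(k) - n_params(k, max_order)


if __name__ == "__main__":
    for k in range(2, 5):
        for order in range(1, k):
            print(f"k={k} order={order} "
                  f"cells={observed_cells(k)} params={n_params(k, order)} "
                  f"df={residual_df(k, order)}")
```

With two sources, the independence model already uses all three observed cells, and with three sources the model with all two-way interactions leaves no residual degrees of freedom; the k-way interaction can never be estimated from the data alone.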
Another aspect worth highlighting is the intention to adopt a subjective approach.
Nowadays, a great deal of information is available, yet we are often reluctant to use it. Adopting a subjective approach with scientific rigour would bring gains in efficiency in several respects; above all, it would avoid wasting valuable information whose collection has already cost time and money.