record linkage

New methods for small area estimation with linkage uncertainty

In official statistics, interest for data integration has been increasingly growing, due to the need of extracting information from different sources. However, the effects of these procedures on the validity of the resulting statistical analyses has been disregarded for a long time. In recent years, it has been largely recognized that linkage is not an error-free procedure and linkage errors, as false links and/or missed links, can invalidate the reliability of estimates in standard statistical models.

A unified framework for de-duplication and population size estimation (with Discussion)

Data de-duplication is the process of finding records in one or more datasets belonging to the same entity. In this paper we tackle the de-duplication process via a latent entity model, where the observed data are perturbed versions of a set of key variables drawn from a finite population of $N$ different entities. The main novelty of our approach is to consider the population size $N$ as an unknown model parameter. As a result, one salient feature of the proposed method is the capability of the model to account for the de-duplication uncertainty in the population size estimation.

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma