Estimating population size in multiple record systems with uncertainty of state identification

2019


02 Publication in volume

DI CECCO, Davide

DOI: 10.1201/9781315120416

We consider the problem of estimating the size of a population of interest, or "target population", by integrating multiple data sources. Each source provides a list of the units of our population. In this context, we identify three possible scenarios:

- Each unit of our target population is included in at least one of the sources, but the identification of the units is not error free: some out-of-scope units are erroneously included in the lists and, vice versa, some units of our population are erroneously identified as out-of-scope.
- All observed units are correctly identified as belonging or not belonging to the target population. However, some units are not listed in any of the available sources, so we have a problem of undercoverage of our lists.
- Not all units are included in the data at hand, and the observed units are not all correctly classified with respect to the target population.

The first scenario can essentially be characterized as a case of misclassification. We can exploit the information redundancy at our disposal to estimate the misclassification errors by making some assumptions on the randomness of that redundancy and, as a result, we could even estimate unit-level probabilities of belonging to the target population.

The second scenario represents a typical capture-recapture setting, where we have a set of lists which are incomplete (they do not cover all units, and some units remain unobserved because they are not registered in any list) and overlapping (a unit can be registered in several sources). The event of being captured corresponds to the event of being registered in a list. Unlike in the previous scenario, here we can only estimate the number of unobserved units.

In the third scenario, which is the focus of this chapter, we assume that both issues, uncertainty of detection and uncertainty of state identification, are present in the data at hand. We essentially refer to a capture-recapture setting where the classic assumption of absence of error in unit identification is relaxed. In this context, a misclassification error can be rephrased as an "erroneous capture".
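
As a rough illustration of the capture-recapture setting in the second scenario, the sketch below applies the textbook two-list Lincoln-Petersen estimator. It is not the method developed in the chapter: it assumes exactly two independent lists, equal capture probability for all units, and no misclassification (the assumption the third scenario relaxes). The function name and the counts are hypothetical and chosen only for illustration.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Estimate total population size from two overlapping lists.

    n1, n2: number of units registered in list 1 and in list 2
    m:      number of units registered in both lists
    Assumes independent lists and error-free unit identification.
    """
    if m <= 0:
        raise ValueError("need at least one unit common to both lists")
    return n1 * n2 / m


# Hypothetical counts, for illustration only.
n_a, n_b, n_both = 400, 300, 120

n_hat = lincoln_petersen(n_a, n_b, n_both)   # estimated population size
observed = n_a + n_b - n_both                # units seen in at least one list
unobserved_hat = n_hat - observed            # estimated units missed by both lists

print(n_hat, observed, unobserved_hat)
```

With these counts the estimate is 1000 units in total, of which 580 are observed in at least one list and about 420 are estimated to be missing from both. If some of the 580 observed units were in fact out-of-scope ("erroneous captures"), this simple estimate would be biased, which is the situation the third scenario addresses.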

capture-recapture; latent variable