
In wildlife populations, as in human populations, appropriate statistical models are needed to infer the unknown size of a finite population when a complete enumeration and identification of every unit of the target population is unavailable or unreliable. This is the rule in wildlife management, and it is also often the case in the social and medical sciences, where multiple sources or registers cover a target population but none of them is exhaustive.
A large body of research has addressed the problem of rigorously inferring the complete enumeration by means of multiple recording systems. Borrowing its terminology from wildlife management, this class of statistical models is often referred to as capture-recapture models. Individual data derive from consecutive capture stages (sources, registers or recording systems): a unit captured for the first time is marked so that, when it is observed again at a later stage, it can be identified and counted as a recapture.
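As a minimal sketch of the data structure just described, the snippet below turns the units seen at each capture stage into binary capture histories and counts first captures and recaptures per stage. The function names and the toy unit labels are illustrative assumptions, not part of any specific software.

```python
def build_capture_histories(occasions):
    """Map each observed unit to its 0/1 capture history.

    occasions: list of sets, one per capture stage, holding the ids of the
    units seen at that stage (hypothetical input format).
    """
    histories = {}
    for stage, seen in enumerate(occasions):
        for unit in seen:
            histories.setdefault(unit, [0] * len(occasions))[stage] = 1
    return histories


def count_recaptures(occasions):
    """Return one (first_captures, recaptures) pair per stage."""
    marked = set()  # units already captured, hence marked
    summary = []
    for seen in occasions:
        recaptured = seen & marked
        summary.append((len(seen - marked), len(recaptured)))
        marked |= seen  # newly captured units are marked for later stages
    return summary


# Toy example with three stages and five distinct observed units.
toy_occasions = [{"a", "b", "c"}, {"b", "d"}, {"a", "d", "e"}]
toy_histories = build_capture_histories(toy_occasions)
toy_summary = count_recaptures(toy_occasions)
```

Units that are never captured have the all-zero history and do not appear in the data, which is why the population size N has to be inferred rather than counted.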
In this research we propose to study, from a Bayesian inferential perspective, the statistical models for counting distributions and for multiple binary outcomes. We propose to extend the available models and tools to handle several sources of heterogeneity as well as external auxiliary information on the characteristics of both the population units and the recording system. The advantages of the Bayesian approach have only recently received specific attention, in terms of 1) flexibility of the model framework, due to its natural ability to incorporate latent unobserved features, and 2) inferential improvements in the precision of the resulting estimators. We also propose to provide suitable Bayesian tools for planning the configuration of capture-recapture experiments.
(C1) [mixture models for counting data]
At present there is no reference Bayesian solution for estimating the unknown population size when the counting distribution arises from a compound mixture of Poisson distributions. A previous attempt in a nonparametric setting is provided in [29], where the focus is on subjective prior specification and no attempt is made to understand the inferential properties as the finite population size N increases.
We aim to innovate and expand current understanding by:
- (C1-1) relying on a moment-based parameterization so that a formal reference prior distribution can be derived for the nuisance parameters. This is currently not available in the literature.
- (C1-2) investigating the inferential improvements with respect to the classical solutions in [7], both in terms of point estimate precision and actual coverage of the interval estimates.
- (C1-3) understanding the limits of all the competing approaches for increasing population size, and relating these limits to the sources of nonidentifiability already known in the frequentist domain.
[29] Guindani M. et al. (2014) A Bayesian Semi-parametric Approach for the Differential Analysis of Sequence Counts Data, J R Stat Soc Ser C Appl Stat, 63(3), 385-404
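To make the counting-data setting concrete, the sketch below summarises per-unit capture counts into the frequencies f_k (number of observed units captured exactly k times) and computes Chao's lower bound, a well-known classical estimator for Poisson mixture counts. It is used here only as an illustrative frequentist benchmark and is not necessarily one of the solutions in [7]; the function names and the toy counts are assumptions.

```python
from collections import Counter


def frequency_of_frequencies(counts):
    """f_k = number of observed units captured exactly k times (k >= 1).

    Units with count 0 are never observed, so they are dropped here.
    """
    return Counter(c for c in counts if c > 0)


def chao_lower_bound(freq):
    """Chao's lower bound for N: n + f1^2 / (2 * f2)."""
    n_observed = sum(freq.values())
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    if f2 == 0:
        # Bias-corrected form n + f1 * (f1 - 1) / (2 * (f2 + 1)) with f2 = 0.
        return n_observed + f1 * (f1 - 1) / 2.0
    return n_observed + f1 ** 2 / (2.0 * f2)


# Toy data: 50 units seen once, 20 seen twice, 10 seen three times.
toy_counts = [1] * 50 + [2] * 20 + [3] * 10
toy_freq = frequency_of_frequencies(toy_counts)
toy_estimate = chao_lower_bound(toy_freq)  # 80 + 50**2 / (2 * 20) = 142.5
```

Only the zero-truncated counts enter the estimator, which is the same information a Bayesian model for the mixture would have to work with.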
(C2) [Flexible behavioral model framework]
The model framework in [26] is flexible enough to better fit and understand behavioral response, and hence to estimate the population size N more reliably. Its Bayesian extension will further enhance this flexibility by
- (C2-1) relying on a recently proposed latent variable strategy [30] for modelling and inference in a logistic model structure with partially conjugate priors, which reduces the MCMC computational effort needed to approximate the posterior distribution
- (C2-2) understanding the comparative performance with the corresponding frequentist approach
- (C2-3) overcoming some of the model selection drawbacks that have been raised in [22] and only partially addressed with ad hoc frequentist adjustments, by means of natural Bayesian candidate criteria such as those based on DIC and the marginal likelihood.
[30] Polson, N. et al. (2013) Bayesian Inference for Logistic Models Using Pólya-Gamma Latent Variables, JASA, 108(504), p. 1339-1349
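The latent variable strategy mentioned in (C2-1) can be illustrated on a deliberately simple case. The sketch below runs a Gibbs sampler for an intercept-only logistic model of a single capture probability, with y captures out of n trials and a normal prior on the logit. It is not the behavioral model of [26]. The Pólya-Gamma draw uses a truncated version of the infinite sum-of-gammas representation, which is an approximation chosen for brevity rather than an exact sampler, and all function and parameter names are hypothetical.

```python
import math
import random


def draw_polya_gamma(b, c, rng, n_terms=200):
    """Approximate PG(b, c) draw from the truncated sum-of-gammas series."""
    total = 0.0
    for k in range(1, n_terms + 1):
        g = rng.gammavariate(b, 1.0)  # Gamma(shape=b, scale=1)
        total += g / ((k - 0.5) ** 2 + c * c / (4.0 * math.pi ** 2))
    return total / (2.0 * math.pi ** 2)


def gibbs_capture_logit(y, n, prior_sd=10.0, n_iter=2000, burn_in=500, seed=1):
    """Posterior draws of beta = logit(p) for y ~ Binomial(n, p).

    Assumes beta ~ Normal(0, prior_sd^2). Given the latent omega, the
    conditional for beta is Gaussian, which is what makes the scheme
    conditionally conjugate.
    """
    rng = random.Random(seed)
    beta = 0.0
    kappa = y - n / 2.0
    draws = []
    for it in range(n_iter):
        omega = draw_polya_gamma(n, beta, rng)      # omega | beta ~ PG(n, beta)
        var = 1.0 / (omega + 1.0 / prior_sd ** 2)   # conditional variance
        beta = rng.gauss(var * kappa, math.sqrt(var))
        if it >= burn_in:
            draws.append(beta)
    return draws


# Toy data: 60 captures out of 200 trials.
beta_draws = gibbs_capture_logit(y=60, n=200)
p_draws = [1.0 / (1.0 + math.exp(-b)) for b in beta_draws]
p_posterior_mean = sum(p_draws) / len(p_draws)
```

With covariates, the same two-step update applies with a multivariate normal conditional for the regression coefficients; the scalar case is shown only to keep the example dependency-free.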
(C3) [Bayesian planning of capture-recapture experiments]
To the best of our knowledge, the use of a Bayesian predictive approach for planning a suitable number of occasions (independent sources) in a capture-recapture setting when the population size is unknown is new. The proposed approach can overcome theoretical drawbacks highlighted in [26], which stem from the difficulties of the normal approximation of the profile likelihood in a capture-recapture context, especially under some specific model assumptions (heterogeneity and behavioural effects). Moreover, we believe that
- (C3-1) expanding the methodology of planning recapture experiments using more transparent prior assumptions and relying on new criteria which can be shared with researchers represents an interesting addition to the field
- (C3-2) the presence of a prior distribution on N, rather than a fixed assumed pre-experimental value N_0, can make the planning more robust to likely deviations of the real data context from the assumed quantities
- (C3-3) the acknowledged (theoretical) superiority of the Bayesian estimation in simpler behavioral recapture models [18] should offer the opportunity of saving resources by providing smaller numbers of planned sampling occasions with respect to the frequentist solutions when the same estimate precision is guaranteed.
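The contrast in (C3-2) between a prior on N and a fixed value N_0 can be sketched with a toy planning rule. The criterion used below is a hypothetical stand-in, not one of the criteria proposed here: it picks the smallest number of occasions for which the expected number of units captured at least twice reaches a target for a given fraction of prior draws. It assumes equal capture probability p across units and occasions, and the prior choices and all names are illustrative assumptions.

```python
import random


def prob_recaptured(p, t):
    """P(a unit is captured at least twice in t occasions), equal p assumed."""
    return 1.0 - (1.0 - p) ** t - t * p * (1.0 - p) ** (t - 1)


def plan_occasions(prior_draws, min_recaptures, level=0.9, max_t=200):
    """Smallest t such that N * prob_recaptured(p, t) >= min_recaptures
    holds for at least a fraction `level` of the (N, p) draws."""
    for t in range(2, max_t + 1):
        hits = sum(
            1 for n_pop, p in prior_draws
            if n_pop * prob_recaptured(p, t) >= min_recaptures
        )
        if hits >= level * len(prior_draws):
            return t
    return None  # target not reachable within max_t occasions


rng = random.Random(0)
# Assumed priors: N uniform on 100..500, p ~ Beta(2, 18) (mean 0.1).
prior_draws = [(rng.randint(100, 500), rng.betavariate(2, 18)) for _ in range(5000)]

t_prior = plan_occasions(prior_draws, min_recaptures=30)
# Plug-in planning with fixed N_0 = 300 and p_0 = 0.1 for comparison.
t_fixed = plan_occasions([(300, 0.1)], min_recaptures=30)
```

In this toy setting the prior-based plan cannot ask for fewer occasions than the plug-in plan, because it must also cover prior draws with small N or small p; a full predictive approach would replace the surrogate criterion with one based on the posterior precision of N.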
(C4) [Overcoverage and undercoverage with latent class modelling]
- (C4-1) The use of latent classes for modelling heterogeneity in a Bayesian framework is not new [31, 32], but it is new in the context of modelling and understanding inferential difficulties in the simultaneous presence of overcoverage and undercoverage due to unit mis-identification (false positives and false negatives).
- (C4-2) Investigating, from a theoretical perspective, possible sources of non-identification of the model parameters.
- (C4-3) Averaging over a suitable subset of relevant submodels of interest, when dealing with a log-linear reparameterization with alternative dependence structures, can help assess the underlying estimate uncertainty more realistically.
[31] Bartolucci F., Mira A., Scaccia L. (2003) Answering two biological questions with a latent class model via MCMC applied to capture-recapture data, in Applied Bayesian Statistical Studies in Biology and Medicine, Kluwer Academic Publishers, pp. 7-23
[32] Manrique-Vallier D. (2016) Bayesian Population Size Estimation Using Dirichlet Process Mixtures, Biometrics 72, 1246-1254
[33] da-Silva C. Q. (2009) Bayesian analysis to correct false-negative errors in capture-recapture photo-ID abundance estimates, Braz. J. of Prob. Stat., 23(1), 36-48