Small area estimation: measurement error, benchmarking and record linkage.

Submitted by Anonymous (not verified) on Mon, 18/04/2022 - 18:24

Year:

2017

Name and position of the project proposer:

sb_p_602757

Abstract:

Small area models are mixed effects regression models that link the small areas and borrow strength from similar domains. When the auxiliary variables used in the models are measured with error, small area estimators that ignore the measurement error may perform worse than direct estimators. Alternative small area estimators accounting for measurement error have been proposed, but only for continuous auxiliary variables. Adopting a Bayesian approach, we plan to extend the unit-level model to account for measurement error in both continuous and categorical covariates. However, it is not always straightforward to choose which variables should be considered affected by measurement error: we explore the possibility of modeling the presence of measurement error using specific prior distributions, such as spike-and-slab or global-local shrinkage priors. Once the estimates have been obtained, they should be calibrated, since model-based estimates for the small areas do not usually add up to the single estimate for the larger area. Benchmarking applies a constraint to ensure that the total of the small areas matches the grand total. We propose two alternative benchmarking strategies: one based on a constrained prior distribution and one based on sequential Monte Carlo.
The results of the proposed methodologies will be discussed in light of an extensive simulation study and real data applications in demographic and economic contexts.
A second part of the project will deal with the possibility of calibrating small area models using information coming from two different sources, either survey data or administrative lists. To tackle this problem, we plan to improve and generalize a Bayesian record linkage strategy that properly propagates the uncertainty due to the linkage step into the standard errors of the small area model estimates and of the predictors.

Research group members:

sb_cp_is_758478

sb_cp_is_822230

sb_cp_is_757530

sb_cp_es_84512

sb_cp_es_84513

Innovativeness:

Measurement error is an important issue in model building. Whereas continuous covariates measured with error are more easily dealt with at the modelling stage, the literature on misclassified auxiliary variables is still lacking, especially in the context of small area models. Extending the approach in [10], we model the measurement error in discrete covariates in terms of misclassification probabilities. In particular, the project will focus on a unit-level model in which a Dirichlet prior distribution is placed over the misclassification probabilities, so that the joint posterior distribution of all unknown parameters can be obtained.
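To make the role of the Dirichlet prior concrete, the following is a minimal sketch of the conjugate update for one row of a misclassification matrix. It assumes, purely for illustration, that the true category of each unit is known; in the model described above the true categories are latent, so an update of this kind would appear only as one conditional step of a sampler. The function name `dirichlet_posterior` and the example counts are hypothetical.

```python
def dirichlet_posterior(alpha, counts):
    """Return Dirichlet posterior parameters and the posterior mean.

    alpha  : prior parameters, one per observed category
    counts : how often each observed category was recorded among units
             whose true category is fixed (assumed known in this sketch)
    """
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    # Conjugacy: Dirichlet prior + multinomial counts -> Dirichlet posterior.
    post = [a + n for a, n in zip(alpha, counts)]
    total = sum(post)
    mean = [p / total for p in post]
    return post, mean


# Hypothetical example: three categories, uniform prior, true category 0.
alpha = [1.0, 1.0, 1.0]
counts = [17, 2, 1]  # observed labels among units truly in category 0
post, mean = dirichlet_posterior(alpha, counts)
```

Here `mean[0]` estimates the probability of a correct classification and the remaining entries estimate the misclassification probabilities for that true category.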
Choosing which variables should be considered affected by measurement error is not easy.
Potentially, all covariates can be considered affected by measurement error because, like the response variable, they are sampled from a population. It is well recognized that accounting for measurement error increases the variability of the estimators; on the other hand, if the measurement error is ignored, the estimators are not consistent. A trade-off therefore emerges: the more persistent the error, the more severe the inconsistency of the estimators that ignore it; conversely, if the measurement error is modest and is accounted for in the model, the resulting estimates are more variable than those obtained by ignoring it, leading to wider credible intervals and biased inferences. Therefore, except in situations where the presence of measurement error is beyond doubt, choosing which variables must be treated as affected by measurement error remains an open problem and, to our knowledge, the literature on this topic is lacking. Datta et al. [7] propose a possibly related model-based approach to test for the presence of the mixed effects in a mixed linear regression, with particular focus on small area models. We extend their idea to the context of measurement error in covariates, developing a testing procedure for selecting which variables can be considered affected by measurement error.
We propose to model the presence or absence of measurement error directly in the model specification. In particular, we propose to describe the distribution of the covariate with a spike-and-slab prior or with global-local shrinkage priors. We start with regression models and then generalize the proposed approach to mixed effects models, with particular focus on small area models.
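As a generic illustration of how a spike-and-slab prior separates "no error" from "error present", the sketch below computes the posterior probability that a scalar quantity comes from the slab rather than the spike. It assumes a two-component normal mixture centred at zero, with a narrow spike and a wide slab; the proposal does not specify the components, so the function names, the variances and the mixture weight are all hypothetical.

```python
import math


def normal_pdf(x, variance):
    """Density of a zero-mean normal distribution with the given variance."""
    return math.exp(-0.5 * x * x / variance) / math.sqrt(2.0 * math.pi * variance)


def slab_probability(x, weight, var_spike, var_slab):
    """Posterior probability that x was generated by the slab component.

    weight    : prior probability of the slab (assumed value)
    var_spike : small variance, concentrating mass near zero
    var_slab  : large variance, allowing sizeable values
    """
    slab = weight * normal_pdf(x, var_slab)
    spike = (1.0 - weight) * normal_pdf(x, var_spike)
    return slab / (slab + spike)


# Hypothetical settings: equal prior weights, spike variance 0.01, slab variance 1.
p_near_zero = slab_probability(0.0, 0.5, 0.01, 1.0)
p_far = slab_probability(3.0, 0.5, 0.01, 1.0)
```

Values close to zero are attributed mostly to the spike, while values far from zero are attributed almost entirely to the slab, which is the mechanism that lets the data indicate whether an error component is needed.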

Once small area estimates have been obtained, benchmarking techniques are needed that modify the model-based estimates so that they reproduce the aggregate estimate for the larger geographical area. The approaches proposed in the literature mainly rely on ad hoc posterior adjustments of the model-based estimates. We propose to embed the benchmarking constraint directly in the model, so that the constraint is also taken into account in parameter estimation. In particular, we propose to use a constrained prior distribution over the small area means, such as the logistic normal distribution originally proposed in [3]. The main advantage of this prior distribution is that its parameters can be easily interpreted as a mean and a variance: indeed, we propose to model the small area means according to a logistic normal distribution whose parameters are specified as functions of the auxiliary variables. In this way, the resulting small area means are constrained and their interpretation in terms of regression parameters is unchanged. We will compare several constrained distributions in an extensive simulation study and compare the properties of the resulting small area mean estimates with those obtained with the methods in [6]. We also propose to embed the constraints in the estimation procedure, for example through a sequential Monte Carlo approach.
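The contrast between a post hoc adjustment and a constraint built into the prior can be sketched as follows. The first function is a plain ratio adjustment, shown only as one common example of a posterior adjustment; the second applies the additive logistic transform that defines a logistic normal vector, so the resulting shares sum to one by construction. Expressing the constraint as shares of a known total, and all names, weights and numbers below, are assumptions made for illustration and are not taken from the proposal.

```python
import math
import random


def ratio_benchmark(estimates, weights, target):
    """Rescale estimates so that their weighted sum equals the target."""
    current = sum(w * e for w, e in zip(weights, estimates))
    factor = target / current
    return [e * factor for e in estimates]


def additive_logistic(eta):
    """Map D-1 real numbers to D positive shares that sum to one."""
    m = max([0.0] + list(eta))  # shift for numerical stability
    expo = [math.exp(e - m) for e in eta]
    reference = math.exp(-m)    # contribution of the reference category
    denom = reference + sum(expo)
    return [v / denom for v in expo] + [reference / denom]


# Post hoc adjustment: hypothetical estimates, weights and area-level target.
estimates = [10.0, 20.0, 30.0]
weights = [0.2, 0.3, 0.5]
benchmarked = ratio_benchmark(estimates, weights, target=24.0)

# Constraint by construction: Gaussian linear predictors (hypothetical
# covariates and coefficient) pushed through the additive logistic transform.
rng = random.Random(0)
x = [0.5, -1.0]
beta = 0.8
eta = [beta * xi + rng.gauss(0.0, 0.3) for xi in x]
shares = additive_logistic(eta)
```

In the second case no adjustment step is needed after estimation, because every draw of `shares` already satisfies the constraint.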
We also plan to propose innovative methods for accounting for matching uncertainty in the record linkage step. This could be done by implementing a Metropolis-Hastings algorithm that explores the discrete parameter space given by a configuration matrix C, already introduced in [22], whose generic element C(i,j) is either 1 or 0 according to whether or not the i-th record of the first file and the j-th record of the second file correspond to the same statistical unit.
This algorithm should be tailored to the small area problem according to the specific model under study: when considering a unit-level small area model, one should assume that matches among records are only possible within the same area. In an area-level model this cannot be assumed and a different model must be considered.
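The proposal step of such a sampler can be sketched as follows. The code only generates candidate configuration matrices; the Metropolis-Hastings acceptance step, which needs the posterior under the chosen linkage model, is not shown. It assumes that each record matches at most one record in the other file and that, as in the unit-level case, only pairs from the same area may be linked. The function names and the area labels are hypothetical.

```python
import random


def is_valid(C):
    """Check that every row and every column of C contains at most one 1."""
    rows_ok = all(sum(row) <= 1 for row in C)
    cols_ok = all(sum(col) <= 1 for col in zip(*C))
    return rows_ok and cols_ok


def propose(C, area_a, area_b, rng):
    """Propose a configuration by toggling one within-area pair (i, j).

    Returns a copy of C unchanged when no pair is eligible or when the
    toggle would break the one-to-one matching assumption.
    """
    candidates = [
        (i, j)
        for i in range(len(area_a))
        for j in range(len(area_b))
        if area_a[i] == area_b[j]  # matches allowed only within an area
    ]
    current = [row[:] for row in C]
    if not candidates:
        return current
    i, j = rng.choice(candidates)
    new = [row[:] for row in C]
    new[i][j] = 1 - new[i][j]
    return new if is_valid(new) else current


# Hypothetical files with three records each and two areas.
rng = random.Random(1)
area_a = [0, 0, 1]
area_b = [0, 1, 1]
C = [[0] * len(area_b) for _ in area_a]
for _ in range(50):
    C = propose(C, area_a, area_b, rng)
```

Restricting the candidate pairs to the same area shrinks the space the sampler has to explore, which is the practical benefit of the unit-level assumption.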

Call code:

602757
