The project focuses on extending parametric models for ranked data within the Bayesian framework, with particular attention to methodological and computational innovations for an efficient implementation of ranking data analysis. The ranking literature offers numerous parametric distributions but, despite the wide range of options, models in their basic form often lack the flexibility needed to represent sample heterogeneity. It is therefore natural to extend them to the mixture context, in order to capture possible groups of rankers with similar preferences. Our interest lies in the finite mixture approach from the Bayesian inferential perspective, and in particular in developing a generalization of the popular Plackett-Luce model (PL) as a mixture component. The PL assumes that the ranking process is performed sequentially, by assigning the ranks from the top to the bottom one (forward order). A recent extension relaxed this assumption by adding a reference order parameter, yielding the novel Extended PL (EPL). A first contribution could be the investigation of a restricted version of the EPL, with order constraints on the reference order that reflect a sensible and interpretable rank assignment process. The parameter restrictions could be fruitfully combined with the data augmentation strategy for the mixture setting and with the existence of a conjugate prior, easing the construction of an MCMC algorithm and hence the Bayesian estimation of the new mixture model. From a computational perspective, ranking data analysis can be challenging because of the special structure of the observations, which take values in the set of permutations. This typically requires the development of specialised software that is not available for wider use. The project additionally aims at building an R package to promote the use of sophisticated ranking models in practice. The usefulness of the proposals will be extensively investigated through applications to real experiments.
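To make the sequential mechanism concrete, the following minimal Python sketch computes the probability of a complete ordering under the PL and under the EPL. It assumes the usual stagewise form of the PL (at each stage the next item is chosen with probability proportional to a positive item-specific parameter, here called support, among the items still available) and treats the EPL as a PL applied to the items listed in the order in which their ranks are assigned. The function names `pl_prob` and `epl_prob`, the 0-based item labels and the numerical support values are illustrative assumptions, not part of the planned R package.

```python
from itertools import permutations
from math import isclose


def pl_prob(sequence, support):
    """PL probability of choosing the items in `sequence` one after the other.

    sequence: item labels (0-based) in the order in which they are selected.
    support:  positive parameters, one per item (hypothetical values below).
    """
    weights = [support[item] for item in sequence]
    prob = 1.0
    for t, w in enumerate(weights):
        # Choice among the items not yet selected at stage t
        prob *= w / sum(weights[t:])
    return prob


def epl_prob(ordering, ref_order, support):
    """EPL probability of a complete ordering.

    ordering:  ordering[r - 1] is the item holding rank r.
    ref_order: ref_order[t] is the rank (1-based) assigned at stage t + 1.
    """
    # Items listed in the order in which their ranks are assigned
    sequence = [ordering[rank - 1] for rank in ref_order]
    return pl_prob(sequence, support)


support = [0.5, 0.3, 0.2]          # illustrative values only
ordering = (0, 1, 2)               # item 0 first, item 1 second, item 2 last

forward = epl_prob(ordering, (1, 2, 3), support)    # about 0.3, same as the PL
backward = epl_prob(ordering, (3, 2, 1), support)   # about 0.075

# The probabilities of all K! orderings sum to one for any reference order
total = sum(epl_prob(o, (3, 2, 1), support) for o in permutations(range(3)))
assert isclose(total, 1.0)
```

With the forward reference order the EPL reduces to the PL, whereas other reference orders generally assign a different probability to the same ordering.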
The main contributions of our research project concern the extension of parametric models for partially ranked data, together with an efficient handling of the computational complexity of their application. The methodological proposals aim to introduce relevant novelties in the probabilistic modelling of choice behavior and preferences, starting from the well-established PL class and its recent generalisations. Furthermore, we would like to make a concrete contribution at the computational level, to efficiently manage the implementation of a model-based ranking analysis.
We will add flexibility to the state of the art in ranking modelling in several directions:
i) introduction of constraints on the reference order parameter in the EPL formulation, specifying a meaningful process for the sequential assignment of the ranks;
ii) generalisation of the restricted EPL into the finite mixture context;
iii) exploration of the theoretical properties of the new parametric model class as well as of the issues due to the presence of partial observations and a mixed-type parameter space;
iv) development of effective inferential procedures within the Bayesian domain for a full account of estimation uncertainty.
Point (i) is motivated by the insights it may offer into the sequential mechanism of preference formation, in particular into whether the ranker assigns the most extreme ranks in a more or less naturally ordered way, and can be formalized by adding monotonicity restrictions on the reference order. Point (ii) further extends the constrained EPL proposal by relaxing the assumption of a homogeneous population, in order to improve the overall description of the data, especially in large samples. Inference on the new model class within the Bayesian framework (iv) requires a suitable handling of data censoring and of the mixed nature of the EPL parameters, which combine a discrete reference order with continuous parameters (iii). Point (iii) is expected to pose some difficulties for the implementation of a well-behaved MCMC approximation, calling for an ad hoc hybrid algorithm (for example, a Metropolis-Hastings-within-Gibbs sampler) based on an appropriate tuning and combination of the underlying transition kernels. This would allow for an effective evaluation of the uncertainty on the EPL parameters, which has not been addressed earlier in the Bayesian literature.
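As a rough illustration of such a hybrid scheme, the following Python sketch alternates a Metropolis step on the reference order with Gibbs updates of the continuous parameters, for a single EPL component and complete orderings only. Everything in it is an assumption made for illustration: the swap proposal, the uniform prior on the reference order, the exponential latent variables paired with independent Gamma priors, the hyperparameter values and all function names. It does not handle partial rankings, mixtures or the order constraints of point (i), and it should not be read as the algorithm planned for the project.

```python
import math
import random


def log_epl(orderings, ref_order, support):
    """Log-likelihood of complete orderings under an EPL (stagewise PL form)."""
    total = 0.0
    for ordering in orderings:
        # Items in the order in which their ranks are assigned
        seq = [ordering[rank - 1] for rank in ref_order]
        w = [support[i] for i in seq]
        for t in range(len(w)):
            total += math.log(w[t]) - math.log(sum(w[t:]))
    return total


def mh_within_gibbs(orderings, n_items, n_iter=200, shape=1.0, rate=1.0, seed=0):
    """Hypothetical hybrid sampler: Metropolis on the reference order, Gibbs elsewhere."""
    rng = random.Random(seed)
    support = [1.0] * n_items
    ref_order = list(range(1, n_items + 1))      # start from the forward order
    draws = []
    for _ in range(n_iter):
        # Metropolis step: symmetric proposal swapping two stages
        a, b = rng.sample(range(n_items), 2)
        proposal = ref_order[:]
        proposal[a], proposal[b] = proposal[b], proposal[a]
        log_ratio = (log_epl(orderings, proposal, support)
                     - log_epl(orderings, ref_order, support))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            ref_order = proposal

        # Gibbs step 1: exponential latent variables, one per stage and ranker
        counts = [0.0] * n_items      # times each item is selected (stages 1..K-1)
        exposure = [0.0] * n_items    # latent time during which each item is available
        for ordering in orderings:
            seq = [ordering[rank - 1] for rank in ref_order]
            for t in range(n_items - 1):
                remaining = seq[t:]
                y = rng.expovariate(sum(support[i] for i in remaining))
                counts[seq[t]] += 1
                for i in remaining:
                    exposure[i] += y

        # Gibbs step 2: conjugate Gamma update (gammavariate takes a scale, not a rate)
        support = [rng.gammavariate(shape + counts[i], 1.0 / (rate + exposure[i]))
                   for i in range(n_items)]
        draws.append((tuple(ref_order), tuple(support)))
    return draws


# Toy data: three items, four rankers (illustrative only)
toy = [(0, 1, 2), (0, 2, 1), (1, 0, 2), (0, 1, 2)]
chain = mh_within_gibbs(toy, n_items=3, n_iter=100)
```

The Metropolis step evaluates the likelihood with the latent variables integrated out and is immediately followed by a fresh draw of those variables, so the two steps together act as a joint update of the reference order and the augmented data given the continuous parameters.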
When approaching a ranking data analysis, several subtle issues may arise, mainly because of the peculiar structure of ranked sequences as multivariate ordinal data. Ranking data take values in the discrete set of permutations, whose size $K!$ grows rapidly with the number $K$ of ranked items. Care is therefore needed to deal with the possible occurrence of sparse data and with the need for a manageable exploration of the ranking space, especially when the model is indexed by a permutation parameter such as the reference order. Moreover, the presence of partial observations adds further complications. All these issues lead to computationally demanding methods and to the need for specialized software, which in turn has traditionally been an obstacle to a wider use of more sophisticated models. The existing free open-source software, such as R libraries, covers a wide range of parametric ranking models, also accounting for incomplete observations and for the generalization to the mixture framework. Nevertheless, with very few exceptions, the available packages address inference only from the frequentist point of view and typically lack computational efficiency, which sometimes makes a partial ranking analysis prohibitive.
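As a quick illustration of how fast the ranking space grows, a minimal Python sketch (the helper name `ranking_space_size` is hypothetical) evaluates $K!$ for a few values of $K$:

```python
from math import factorial


def ranking_space_size(n_items):
    """Number of complete rankings (permutations) of n_items items."""
    return factorial(n_items)


for k in (5, 10, 15, 20):
    print(k, ranking_space_size(k))
# 5 120
# 10 3628800
# 15 1307674368000
# 20 2432902008176640000
```

Already with $K = 10$ items there are more than 3.6 million possible complete rankings, so a sample of a few thousand rankers can cover only a small fraction of the space.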
From a computational point of view, we propose to address the aforementioned practical issues with the development of a novel R package with the following features:
(a) first R library devoted to partial ranking analysis mainly from a Bayesian inferential perspective;
(b) availability of routines to perform a comprehensive Bayesian mixture analysis;
(c) hybrid source code combining R and C++ to exploit the advantages of both programming languages;
(d) possibility of parallel execution for mixtures with different numbers of components.
Regarding points (a) and (b), the new package could help fill the gap concerning Bayesian estimation of ranking models in R, by focusing on the EPL finite mixture as the sampling distribution. It would not be limited to inferential techniques, but would provide users with a toolkit covering each step of the mixture approach (for instance, selection of the optimal number of components and goodness-of-fit assessment). To avoid prohibitive execution times, it could take advantage of hybrid code linking the flexibility of the R environment with the speed of compiled C++ code (c), as well as of a parallelization option for the simultaneous analysis of mixtures with different numbers of components (d).