Estimation of Quality Scores from Subjective Tests - beyond Subjects' MOS
Subjective tests for the assessment of quality of experience (QoE) are typically run with a pool of subjects who provide their opinion scores on a 5-level scale. The subjects' Mean Opinion Score (MOS) is generally assumed to be the best estimate of the average score in the target population. For a large enough sample, the mean of the variations across subjects can be assumed to approach zero, but this does not hold for the limited number of subjects typically considered in subjective tests. In this paper we propose a model for the estimation of the population average QoE. We apply this model to a dataset composed of the individual scores assigned by 25 subjects to a set of gaming videos evaluated under different resolutions and compression rates. The model recognizes the ordinal multinomial nature of the data and allows for correlation between the scores assigned by the same subject to different data. The resulting estimated average QoE is shown to follow more credible patterns than the MOS, in particular with respect to improved compression rates, where the model estimates behave more coherently. To favour reproducibility and application to other datasets, the software that implements the model is also made publicly available.
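The small-sample argument can be illustrated with a short simulation. This is a minimal sketch and not the model proposed in the paper: it assumes, purely for illustration, that each subject carries an additive, zero-mean Gaussian bias with standard deviation 0.5 on the rating scale. The function name `spread_of_panel_mean` and all parameter values are hypothetical. The sketch compares how far the average subject bias of a panel typically lies from zero for 25 subjects and for 1000 subjects.

```python
import random
import statistics


def spread_of_panel_mean(n_subjects, n_panels=500, bias_sd=0.5, seed=0):
    """Standard deviation, across simulated panels, of the mean subject bias.

    Assumption (illustrative only): subject biases are independent draws
    from a zero-mean Gaussian with standard deviation `bias_sd`.
    """
    rng = random.Random(seed)
    panel_means = []
    for _ in range(n_panels):
        # One simulated panel: one bias value per subject.
        biases = [rng.gauss(0.0, bias_sd) for _ in range(n_subjects)]
        panel_means.append(statistics.fmean(biases))
    return statistics.pstdev(panel_means)


# Typical distance from zero of the average bias for a small and a large panel.
small = spread_of_panel_mean(25)
large = spread_of_panel_mean(1000)
print(f"25 subjects:   {small:.3f}")
print(f"1000 subjects: {large:.3f}")
```

Under these assumptions the spread shrinks as bias_sd / sqrt(n_subjects), so a 25-subject panel leaves a residual average bias of roughly 0.1 scale points, while a 1000-subject panel leaves less than 0.02. The plain MOS absorbs this residual, which is the motivation for modelling subject effects explicitly.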