In recent years, there has been a substantial increase in applications involving high-dimensional data across several empirical domains. By high-dimensional data we mean hundreds or thousands of variables for each unit in the observed sample. The need for data reduction techniques in such applications has attracted the interest of a growing number of researchers, and several new approaches have been investigated in these areas.
Biclustering techniques have been proposed in several scientific fields, especially for analyzing data matrices in which the two modes, usually units (rows) and variables (columns), can play the same role. In such cases, subsets of units may be homogeneous only under a limited set of conditions (variables), while showing little similarity outside that set.
One of the main tasks for modern statistical approaches to biclustering is to develop techniques for handling categorical (nominal and ordinal) and mixed-type data. Such data are encountered very frequently in practice, for example whenever attitudes, abilities, or opinions are the quantities of interest. In this context, however, practitioners often apply techniques developed for continuous data, which are frequently inappropriate and can lead to wrong results. It is therefore worth taking the essential characteristics of these data into proper account in order to develop more appropriate techniques.
Our research project aims to define new biclustering approaches for categorical and mixed-type data. Specifically, we will start by extending clustering methods for categorical or mixed-type data to a two-mode setting, through both heuristic and model-based approaches. We will also consider extensions of these techniques that evaluate the impact of time in the model specification when longitudinal data (units followed over time) are available.