Large-scale graph mining and clustering
The ability to mine data to extract useful knowledge has become an important challenge and it is widely studied in the data mining and machine learning communities.
This proposal focuses on two basic problems: clustering and graph mining. If the data to be mined represents a set of independent entities and their attributes clustering is the basic problem of classification and data analysis. In particular, given a metric that measures similarity and dissimilarity of items the main goal of clustering is to categorize a set of data items into clusters such that items are grouped in the same cluster when they are similar and items that are in different clusters must be different as much as possible.
However, in many cases, there is interesting knowledge to be mined from the relationships between entities (e.g. friendship relations among members of a social network). The graph representation, that is, a collection of nodes and links between nodes, easily represents entities, their attributes, and their relationships to other entities. Graph mining is the extraction of novel and useful knowledge from a graph representation of data.
Clustering and graph mining have been extensively studied by researchers belonging to different areas. As a result there are many results but many issues are still open. Main open issues are the ability of dealing with big data and with sophisticated models. We will address these problems with a special focus on social networks and bioinformatics networks.
The group has an excellent record in the analysis and design of combinatorial algorithms and especially approximation algorithms and on-line algorithms. Many of the problems that we plan to study are computationally hard and therefore intractable when the input size is large. Also in many cases the input is presented on-line and irrevocable decisions should be taken immediately. For these reasons techniques of approximation algorithms and on-line algorithms are deemed.