clustering | Ricerc@Sapienza

Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

We present a technical survey of state-of-the-art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower bounding techniques.

Fully Dynamic Consistent Facility Location

We consider classic clustering problems in fully dynamic data streams, where data elements can be both inserted and deleted. In this context, several parameters are of importance: (1) the quality of the solution after each insertion or deletion, (2) the time it takes to update the solution, and (3) how different consecutive solutions are. The question of obtaining efficient algorithms in this context for facility location, k-median and k-means has been raised in a recent paper by Hubert-Chan et al.
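Parameter (3) can be made concrete with a small sketch. It assumes that the difference between consecutive solutions is measured as the number of centers (or facilities) that enter the solution after each update, which is one common way to quantify consistency and not necessarily the exact measure used in the paper. The function name `total_recourse` is hypothetical.

```python
from typing import Hashable, Sequence, Set

def total_recourse(solutions: Sequence[Set[Hashable]]) -> int:
    """Sum, over consecutive solutions, of the number of newly opened centers.

    solutions[t] is the set of centers kept after the t-th insertion or deletion.
    """
    total = 0
    for previous, current in zip(solutions, solutions[1:]):
        # Centers present now that were not present one update earlier.
        total += len(current - previous)
    return total

if __name__ == "__main__":
    history = [{1, 2}, {1, 3}, {1, 3}, {4, 5}]
    print(total_recourse(history))
```

A small total means the maintained solution changes little across the stream, even if each individual solution is of good quality.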

Polynomial Time Approximation Schemes for All 1-Center Problems on Metric Rational Set Similarities

In this paper, we investigate algorithms for finding centers of a given collection N of sets. In particular, we focus on metric rational set similarities, a broad class of similarity measures including Jaccard and Hamming. A rational set similarity S is called metric if D = 1 - S is a distance function. We study the 1-center problem on these metric spaces. The problem consists of finding a set C that minimizes the maximum distance from C to any set in N. We present a general framework that computes a (1 + ε)-approximation for any metric rational set similarity.
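The objective can be illustrated with a minimal sketch, assuming the Jaccard similarity as the rational set similarity. It computes D = 1 - S and picks the best center among the input sets only. This restriction is a simplification and not the framework of the paper: by the triangle inequality, the best input set has a radius of at most twice the optimum in any metric space, so it is a 2-approximation rather than a (1 + ε)-approximation. All function names are hypothetical.

```python
from typing import FrozenSet, Hashable, Sequence, Tuple

Item = FrozenSet[Hashable]

def jaccard_distance(a: Item, b: Item) -> float:
    """D = 1 - S with S = |a ∩ b| / |a ∪ b|; two empty sets have distance 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def radius(center: Item, sets: Sequence[Item]) -> float:
    """1-center cost of a candidate: its maximum distance to any input set."""
    return max(jaccard_distance(center, s) for s in sets)

def best_input_center(sets: Sequence[Item]) -> Tuple[Item, float]:
    """Return the input set with the smallest radius, together with that radius."""
    return min(((s, radius(s, sets)) for s in sets), key=lambda pair: pair[1])

if __name__ == "__main__":
    collection = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({1, 2, 3, 4})]
    center, cost = best_input_center(collection)
    print(sorted(center), cost)
```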

An Introduction to Clustering with R

The purpose of this book is to thoroughly prepare the reader for applied research in clustering. Cluster analysis comprises a class of statistical techniques for classifying multivariate data into groups or clusters based on their similar features. Clustering is nowadays widely used in several domains of research, such as social sciences, psychology, and marketing, highlighting its multidisciplinary nature.

Unsupervised Energy Trees: Clustering With Complex and Mixed-Type Variables

In the spirit of the recently developed and successful Object Oriented Data Analysis, we introduce Energy Trees as a model to perform classification and regression using complex and mixed-type covariates. Energy Trees may be seen as a generalization of Conditional Trees, where the testing

A parallel hardware implementation for 2D hierarchical clustering based on fuzzy logic

In this paper we propose a novel hardware implementation for a bidimensional unconstrained hierarchical clustering method, based on fuzzy logic and membership functions. Unlike classical clustering approaches, our work is based on an advanced algorithm that shows an intrinsic parallelism. Such parallelism can be exploited to design an efficient hardware implementation suitable for low-resource, low-power and computationally demanding applications such as smart sensors and IoT devices. We validated our design through an extensive simulation campaign on well-known 2D clustering datasets.

Cognitive analytics management of the customer lifetime value: an artificial neural network approach

Purpose: The purpose of this study is to show that the use of CAM (cognitive analytics management) methodology is a valid tool to describe new technology implementations for businesses. Design/methodology/approach: Starting from a dataset of receipts, we were able to describe consumers through a variant of the RFM (recency, frequency and monetary value) model. It has been possible to categorize the customers into clusters and to measure their profitability thanks to the customer lifetime value (CLV).
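For readers unfamiliar with RFM, the following sketch computes the three standard features per customer from a list of transactions. It shows the textbook RFM definition, not the variant used in the study, which the abstract does not specify. The record layout `(customer_id, purchase_date, amount)` and the function name `rfm_features` are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

Transaction = Tuple[str, date, float]  # (customer_id, purchase_date, amount), assumed layout

def rfm_features(transactions: Iterable[Transaction], today: date) -> Dict[str, Tuple[int, int, float]]:
    """Return {customer_id: (recency_in_days, frequency, monetary_value)}."""
    last_purchase: Dict[str, date] = {}
    frequency: Dict[str, int] = defaultdict(int)
    monetary: Dict[str, float] = defaultdict(float)
    for customer, day, amount in transactions:
        # Recency is measured from the most recent purchase.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1      # number of purchases
        monetary[customer] += amount  # total amount spent
    return {
        customer: ((today - last_purchase[customer]).days, frequency[customer], monetary[customer])
        for customer in last_purchase
    }

if __name__ == "__main__":
    data = [
        ("a", date(2024, 1, 1), 10.0),
        ("a", date(2024, 1, 10), 5.0),
        ("b", date(2024, 1, 5), 20.0),
    ]
    print(rfm_features(data, today=date(2024, 1, 11)))
```

The resulting feature vectors can then be passed to any clustering method to group customers.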

Automatic classification of herbal substances enhanced with an entropy criterion

This paper presents a novel automatic pattern recognition system for the classification of herbal substances. The system analyzes chemical data obtained from three analytical techniques, namely Thin Layer Chromatography (TLC), Gas Chromatography (GC) and Ultraviolet Spectrometry (UV), and is composed of the following stages. First, a preprocessing stage takes place that ranges from the conversion of the TLC plate image into a spectrum to the normalization and alignment of spectral data for all techniques.

Analysing the diffusion of the ideas and knowledge on economic open problems on female entrepreneur in US over time: the case of Wikipedia (Year 2015–2017)

An important problem in the entrepreneurship field is the precise comprehension of the diffusion dynamics of ideas and knowledge. Ideas can have an important impact on business and on managerial decisions, so the analysis of their evolution needs to be carefully considered and evaluated. In this work we propose a time-series cluster analysis of pageviews data for selected topics on Gender in Wikipedia. The results give insights into the evolution over time of relevant topics such as the gender pay gap and gender roles at work.

GINDCLUS: Generalized INDCLUS with External Information

A Generalized INDCLUS model, termed GINDCLUS, is presented for clustering three-way two-mode proximity data. In order to account for the heterogeneity of the data, both a partition of the subjects into homogeneous classes and a covering of the objects into groups are simultaneously determined.