data mining

Open data and energy analytics

This pioneering Special Issue aims to present the state of the art in open energy data analytics, the availability of open energy data in different contexts (i.e., country peculiarities), and its use at different scales (i.e., building, district, and regional) for data-aware planning and policy-making. Ten high-quality papers were published after a demanding peer-review process and are discussed in this Editorial.

The impact of the impact of meta-data mining from the SoReCom “A.S. de Rosa” @-Library

The objective of this chapter is to address the following question: what is the value of scientific networking, training and documentation activities in the new academic scenario dominated by the bibliometric assessment culture and by the impact of technology on the production and sharing of science (data-driven science, big data, open data, open access, etc.)?

Motif counting beyond five nodes

Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees.
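As a rough illustration of the color coding idea mentioned in this abstract, the sketch below counts "colorful" simple paths on k nodes, that is, paths whose nodes all carry distinct colors, with a dynamic program over color subsets. It is a simplified stand-in restricted to paths, not the graphlet sampling extension the paper describes. The function name `count_colorful_paths` and the adjacency-dict input are assumptions made for this example. Under a uniform random coloring with k colors, a fixed k-node path is colorful with probability k!/k^k, so the colorful count can be rescaled by k^k/k! to estimate the total.

```python
def count_colorful_paths(adj, colors, k):
    """Count undirected simple paths on k >= 2 nodes whose colors are all distinct.

    adj: dict mapping each node to a list of neighbors (undirected graph).
    colors: dict or list mapping each node to a color in range(k).
    """
    # table[v][mask] = number of colorful paths ending at v that use
    # exactly the set of colors encoded by the bitmask `mask`.
    table = {v: {1 << colors[v]: 1} for v in adj}
    for _ in range(k - 1):
        nxt = {v: {} for v in adj}
        for u in adj:
            for mask, cnt in table[u].items():
                for v in adj[u]:
                    bit = 1 << colors[v]
                    if mask & bit:
                        continue  # color already used, so the path would not be colorful
                    nxt[v][mask | bit] = nxt[v].get(mask | bit, 0) + cnt
        table = nxt
    total = sum(cnt for v in adj for cnt in table[v].values())
    return total // 2  # each undirected path is discovered once from each end


# Toy input (hypothetical): a triangle whose three nodes have distinct colors.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
n_paths = count_colorful_paths(triangle, [0, 1, 2], 3)
```

Because distinct colors imply distinct nodes, the table never needs to remember which nodes a path visited, only which colors it used; that is what keeps the state space small.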

On the power laws of language: word frequency distributions

About eight decades ago, Zipf postulated that the word frequency distribution of languages is a power law, i.e., it is a straight line on a log-log plot. Over the years, this phenomenon has been documented and studied extensively. For many corpora, however, the empirical distribution barely resembles a power law: when plotted on a log-log scale, the distribution is concave and appears to be composed of two differently sloped straight lines joined by a smooth curve. A simple generative model is proposed to capture this phenomenon.
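The "straight line on a log-log plot" statement can be checked numerically. The following minimal sketch ranks words by frequency and fits a least-squares slope of log(frequency) against log(rank). The helper names `rank_frequency` and `loglog_slope` and the toy corpus are assumptions made for this example; this is not the generative model proposed in the paper.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word counts sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank); needs >= 2 ranks."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Toy corpus (hypothetical): the word of rank r appears exactly 120 / r times,
# which is an idealized power law with exponent 1.
tokens = []
for rank in range(1, 7):
    tokens += [f"w{rank}"] * (120 // rank)

slope = loglog_slope(rank_frequency(tokens))
```

On a real corpus, fitting the slope separately over low ranks and over high ranks is one simple way to look for the two differently sloped regimes the abstract describes.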

Facing Big Data by an agent-based multimodal evolutionary approach to classification

Multi-agent systems recently gained a lot of attention for solving machine learning and data mining problems. Furthermore, their peculiar divide-and-conquer approach is appealing when large datasets have to be analyzed. In this paper, we propose a multi-agent classification system able to tackle large datasets where each agent independently explores a random small portion of the overall dataset, searching for meaningful clusters in proper subspaces where they are well-formed (i.e., compact and populated).

A learning intelligent system for classification and characterization of localized faults in Smart Grids

The worldwide power grid can be thought of as a System of Systems deeply embedded in a time-varying, non-deterministic and stochastic environment. The availability of ubiquitous and pervasive technology for heterogeneous data gathering and information processing in Smart Grids enables new methodologies for the challenging task of fault detection and modeling. In this study, a fault recognition system for Medium Voltage feeders operational in the power grid in Rome, Italy, is presented.

An agent-based algorithm exploiting multiple local dissimilarities for clusters mining and knowledge discovery

We propose a multi-agent algorithm able to automatically discover relevant regularities in a given dataset, determining at the same time the set of configurations of the adopted parametric dissimilarity measure that yield compact and separated clusters. Each agent operates independently by performing a Markovian random walk on a weighted graph representation of the input dataset. Such a weighted graph representation is induced by a specific parameter configuration of the dissimilarity measure adopted by an agent for the search.
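A Markovian random walk on a weighted graph can be sketched as follows: at each step the walker moves to a neighbor chosen with probability proportional to the edge weight, independently of how it reached the current node. The graph, its weights, and the function name `random_walk` are assumptions made for this illustration; how the paper derives the weights from the dissimilarity measure and how agents turn walks into clusters is not reproduced here.

```python
import random


def random_walk(graph, start, steps, rng):
    """Return the list of nodes visited, starting at `start`, after `steps` moves.

    graph: dict mapping node -> dict of neighbor -> positive edge weight.
    rng: a random.Random instance, passed in so that runs are reproducible.
    """
    path = [start]
    node = start
    for _ in range(steps):
        neighbors = list(graph[node])
        weights = [graph[node][n] for n in neighbors]
        # The next node depends only on the current node (Markov property).
        node = rng.choices(neighbors, weights=weights, k=1)[0]
        path.append(node)
    return path


# Toy graph (hypothetical): heavy edges a-b and c-d, light edges between the pairs.
graph = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.9, "c": 0.1},
    "c": {"a": 0.1, "b": 0.1, "d": 0.8},
    "d": {"c": 0.8},
}
path = random_walk(graph, "a", 20, random.Random(0))
```

With weights that are larger between similar items, a walker tends to stay inside a tightly connected group for many steps, which is the intuition behind using such walks to find compact clusters.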

On Mining IoT Data for Evaluating the Operation of Public Educational Buildings

Public educational systems operate thousands of buildings with vastly different characteristics in terms of size, age, location, construction, thermal behavior and user communities. Their strategic planning and sustainable operation are extremely complex and require quantitative evidence on the performance of buildings, such as the interaction between the indoor and outdoor environment. Internet of Things (IoT) deployments can provide the necessary data to evaluate, redesign and eventually improve the organizational and managerial measures.

Big Data Management System - BDMS

The Big Data Management System is an integrated hardware system whose main components are high-performance servers and storage. It is a powerful hardware facility suited to managing a wide range of materials and information flows: 1. the massive relational administrative databases; 2. the masses of numerical and textual data coming from the Web, relating to the various forms of online communication across multiple …

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma