svm | Ricerc@Sapienza

Bot and gender detection of twitter accounts using distortion and LSA notebook for PAN at CLEF 2019

In this work, we present our approach to the Author Profiling task of PAN 2019. The task is divided into two sub-problems, bot detection and gender detection, for two languages: English and Spanish. We address each sub-problem and each language differently. For bot detection, we use an ensemble architecture for accounts that write in English and a single SVM for those that write in Spanish. For gender detection, we use a single SVM architecture for both languages, but we pre-process the tweets differently.

Cross-domain authorship attribution combining instance-based and profile-based features notebook for PAN at CLEF 2019

Being able to identify the author of an unknown text is crucial. Although it is a well-studied field, it is still an open problem, since a standard approach has yet to be found. In this notebook, we propose our model for the Authorship Attribution task of PAN 2019, which focuses on a cross-domain setting covering four languages: French, Italian, English, and Spanish. We use n-grams of characters, words, stemmed words, and distorted text. Our model trains an SVM for each feature and combines them in an ensemble architecture. Our final results outperform the baseline given by PAN in almost every problem.
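The feature types named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`char_ngrams`, `word_ngrams`, `distort`) and the frequent-word list are hypothetical, and the distortion step assumes one common masking scheme (letters of words outside a frequent-word list become `*`, digits become `#`), which the abstract does not specify. Stemming and the per-feature SVMs are left out.

```python
import re
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams of length n."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams over lowercased, whitespace-split tokens."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def distort(text, frequent_words):
    """Mask topic-bearing content (assumed scheme, not taken from the paper).

    Words outside `frequent_words` have each letter replaced by '*';
    every digit is replaced by '#'. Punctuation and spacing are kept.
    """
    def mask(match):
        word = match.group(0)
        return word if word.lower() in frequent_words else "*" * len(word)

    masked = re.sub(r"[^\W\d_]+", mask, text)  # runs of Unicode letters
    return re.sub(r"\d", "#", masked)


if __name__ == "__main__":
    sample = "The cat sat on 3 mats"
    print(distort(sample, {"the", "on"}))
    print(char_ngrams(distort(sample, {"the", "on"}), 3).most_common(3))
```

Each counter could then be turned into a feature vector for a separate classifier, with the classifiers' outputs combined in an ensemble, as the abstract describes at a high level.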

A multivariate statistical approach for the estimation of the ethnic origin of unknown genetic profiles in forensic genetics

DNA typing and the interpretation of genetic profile data are among the most relevant topics in forensic science. Among other applications, the capability of genetic profiles to distinguish biogeographic information about population groups, subgroups, and affiliations has been extensively explored in the last decade. For investigative and intelligence purposes, it is extremely useful to identify subjects and estimate their biogeographic origins by examining the DNA profiles recovered from evidence at a crime scene.

A cluster-based dissimilarity learning approach for localized fault classification in Smart Grids

Modeling and recognizing faults and outages in a real-world power grid is a challenging task, in line with the modern concept of Smart Grids. The availability of Smart Sensors and data networks makes it possible to “x-ray scan” the power grid states. The present paper deals with a system for recognizing fault states described by heterogeneous information in the real-world power grid managed by the ACEA company in Italy.