machine learning

EnCoD: Distinguishing Compressed and Encrypted File Fragments

Reliable identification of encrypted file fragments is a requirement for several security applications, including ransomware detection, digital forensics, and traffic analysis. A popular approach uses high entropy as a proxy for randomness. However, many modern content types (e.g., office documents and media files) are highly compressed for storage and transmission efficiency. Because compression algorithms also output high-entropy data, they reduce the accuracy of entropy-based encryption detectors.
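
As a minimal sketch of the entropy heuristic described above (not EnCoD itself, whose method this listing does not detail), the Python below computes the byte-level Shannon entropy of a fragment and flags it as "encrypted" when it reaches a fixed threshold. The function names and the 7.5 bits/byte threshold are illustrative assumptions.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy(fragment: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not fragment:
        return 0.0
    n = len(fragment)
    return -sum((c / n) * math.log2(c / n) for c in Counter(fragment).values())


def looks_encrypted(fragment: bytes, threshold: float = 7.5) -> bool:
    """Naive detector: flag fragments whose entropy reaches a fixed threshold.

    The 7.5 bits/byte default is an illustrative assumption, not a tuned value.
    """
    return shannon_entropy(fragment) >= threshold


if __name__ == "__main__":
    plaintext = " ".join(f"record {i}: status=ok" for i in range(2000)).encode()
    samples = {
        "plaintext": plaintext,
        "zlib-compressed": zlib.compress(plaintext, 9),
        "random bytes (ciphertext stand-in)": os.urandom(4096),
    }
    # Print the entropy score and detector verdict for each sample.
    for name, data in samples.items():
        h = shannon_entropy(data)
        print(f"{name:36s} {h:.3f} bits/byte  flagged={looks_encrypted(data)}")
```

Running the script prints scores for a plaintext, its zlib-compressed form and random bytes; comparing the compressed and random scores is a quick way to see the overlap that makes a single entropy threshold an unreliable separator.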

A Survey of Machine Learning Approaches for Student Dropout Prediction in Online Courses

The recent diffusion of online education (both MOOCs and e-courses) has led to increased economic and scientific interest in e-learning environments. As widely documented, online students are much more likely to drop out than those attending conventional classrooms. Finding more effective methodologies to reduce withdrawals is of paramount interest to institutions, students, and faculty members. Following the growing attention to the Student Dropout Prediction (SDP) problem, the literature has seen a significant increase in contributions on this subject.

Mitch: A Machine Learning Approach to the Black-Box Detection of CSRF Vulnerabilities

Cross-Site Request Forgery (CSRF) is one of the oldest and simplest attacks on the Web, yet it is still effective on many websites and can lead to severe consequences, such as economic losses and account takeovers. Unfortunately, the tools and techniques proposed so far to identify CSRF vulnerabilities either require manual review by human experts or assume that the source code of the web application is available. In this paper, we present Mitch, the first machine learning solution for the black-box detection of CSRF vulnerabilities.

SF-UDA-3D: Source-Free Unsupervised Domain Adaptation for LiDAR-Based 3D Object Detection

3D object detectors based only on LiDAR point clouds hold the state of the art on modern street-view benchmarks. However, LiDAR-based detectors generalize poorly across domains because of domain shift. For LiDAR, in fact, domain shift is caused not only by changes in the environment and in object appearance, as with visual data from RGB cameras, but also by the geometry of the point clouds (e.g., variations in point density).

Digital biomarker-based individualized prognosis for people at risk of dementia

Background: Research investigating treatments and interventions for cognitive decline fails because of difficulties in accurately recognizing behavioral signatures in the presymptomatic stages of the disease. In this validation study, we took our previously constructed digital biomarker-based prognostic models and focused on their generalizability and robustness.

Molecular design aided by random forests and synthesis of potent trypanocidal agents as cruzain inhibitors for Chagas disease treatment

Cruzain is an established target for the identification of novel trypanocidal agents, but how good are in vitro/in vivo correlations? This work describes the development of a random forests model for predicting the bioavailability of cruzain inhibitors that kill Trypanosoma cruzi. Some common properties that characterize drug-likeness are poorly represented in many established cruzain inhibitors. This is consistent with the evidence that many high-affinity cruzain inhibitors are not trypanocidal against T. cruzi.

Claim watching and individual claims reserving using classification and regression trees

We present an approach to individual claims reserving and claim watching in general insurance based on classification and regression trees (CART). We propose a compound model consisting of a frequency section, which predicts events concerning reported claims, and a severity section, which predicts paid and reserved amounts. The formal structure of the model rests on a set of probabilistic assumptions that give the results of the CART algorithms a sound statistical meaning.
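
The frequency/severity decomposition can be sketched in plain Python under heavy simplifying assumptions: a single hypothetical numeric feature (claim age in months), one hypothetical event (a further payment), toy data, and depth-1 trees grown with the CART squared-error criterion. The paper's actual events, features, tree depths and probabilistic assumptions are not given here. As a further assumption, the expected outstanding amount is taken to be the predicted probability of a payment times the predicted amount given a payment.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Stump:
    """Depth-1 regression tree: one split on one numeric feature."""
    threshold: float
    left_value: float   # mean target for x <= threshold
    right_value: float  # mean target for x > threshold

    def predict(self, x: float) -> float:
        return self.left_value if x <= self.threshold else self.right_value


def _sse(values: Sequence[float]) -> float:
    """Sum of squared errors around the node mean (CART regression criterion)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(xs: Sequence[float], ys: Sequence[float]) -> Stump:
    """Choose the split that minimizes the summed SSE of the two child nodes."""
    pairs = sorted(zip(xs, ys))
    best_cost, best = float("inf"), None  # type: float, Optional[Stump]
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = _sse(left) + _sse(right)
        if cost < best_cost:
            best_cost = cost
            best = Stump((pairs[i - 1][0] + pairs[i][0]) / 2,
                         sum(left) / len(left), sum(right) / len(right))
    if best is None:  # every x identical: fall back to a single leaf
        mean = sum(ys) / len(ys)
        return Stump(float("inf"), mean, mean)
    return best


def expected_outstanding(freq: Stump, sev: Stump, x: float) -> float:
    """Frequency section times severity section: P(payment) * E[amount | payment]."""
    return freq.predict(x) * sev.predict(x)


# Hypothetical toy data: claim age in months, payment flag, paid amount.
ages = [1, 2, 3, 10, 11, 12]
paid = [1, 1, 1, 0, 0, 1]
amounts = [100.0, 120.0, 110.0, 0.0, 0.0, 500.0]

freq_tree = fit_stump(ages, paid)
# Severity is fitted only on claims that actually had a payment.
sev_x = [a for a, p in zip(ages, paid) if p]
sev_y = [m for m, p in zip(amounts, paid) if p]
sev_tree = fit_stump(sev_x, sev_y)

print(expected_outstanding(freq_tree, sev_tree, 2))   # young claim
print(expected_outstanding(freq_tree, sev_tree, 11))  # old claim
```

On binary 0/1 targets the squared-error criterion selects the same split as the Gini index (a node's Gini impurity times its size equals twice its sum of squared errors), which is why one stump routine can serve both sections in this toy version.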

Multiresolution topological data analysis for robust activity tracking

Multidimensional sensors are an increasingly popular yet challenging data source in modern statistics. Using tools from the emerging field of Topological Data Analysis (TDA), we address two issues frequently encountered when analysing sensor data: their (often) high dimension and their sensitivity to the reference system. We show how topological invariants provide a tool for detecting change points that is robust to both the chosen time resolution and the sensor placement.
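
To give a concrete, hedged picture of a topological invariant of a sensor signal, the sketch below computes the 0-dimensional persistence of the sublevel-set filtration of a 1-D time series using a union-find structure. This is a standard TDA construction, not the authors' multiresolution change-point method; the function name is illustrative.

```python
import math
from typing import List, Sequence, Tuple


def sublevel_persistence_0d(signal: Sequence[float]) -> List[Tuple[float, float]]:
    """(birth, death) pairs of connected components in the sublevel-set
    filtration of a 1-D signal, where sample i is adjacent to sample i + 1.

    Each local minimum starts a component; when two components merge, the one
    born later (higher minimum) dies (elder rule). The global minimum never
    dies and is reported with death = inf.
    """
    n = len(signal)
    parent = list(range(n))
    birth = list(signal)  # birth value of the component rooted at each index
    added = [False] * n
    pairs: List[Tuple[float, float]] = []

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Add samples in increasing order of value (sweeping the sublevel threshold).
    for i in sorted(range(n), key=lambda k: signal[k]):
        added[i] = True
        for j in (i - 1, i + 1):
            if 0 <= j < n and added[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the larger birth value dies now.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < signal[i]:  # skip zero-length pairs
                    pairs.append((birth[young], signal[i]))
                parent[young] = old
    if n:
        pairs.append((min(signal), math.inf))
    return pairs


print(sublevel_persistence_0d([0, 2, 1, 3]))
```

Each pair records a local minimum and the level at which its basin merges into a deeper one; long-lived pairs (large death minus birth) mark prominent features, while short-lived ones typically reflect noise.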

Machine learning and network medicine: a novel approach for precision medicine and personalized therapy in cardiomyopathies

The early identification of pathogenic mechanisms is essential to predict the incidence and progression of cardiomyopathies and to plan appropriate preventive interventions. Noninvasive cardiac imaging, such as cardiac computed tomography, cardiac magnetic resonance, and nuclear imaging, plays an important role in the diagnosis and management of cardiomyopathies and provides useful prognostic information. Most molecular factors exert their functions by interacting with other cellular components; thus, many diseases reflect perturbations of intracellular networks.

Coronavirus disease (COVID-19): a machine learning bibliometric analysis

Background/Aim: To evaluate research trends in coronavirus disease (COVID-19). Materials and Methods: A bibliometric analysis was performed using a machine learning bibliometric methodology. Information on publication outputs, countries, institutions, journals, keywords, funding, and citation counts was retrieved from the Scopus database. Results: A total of 1883 eligible papers were returned. COVID-19 publications increased exponentially in recent months.

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma