Computer Vision

OmnAI Lab

The OmnAI Lab is a research laboratory dedicated to studying computational models of learning and inference, with the goal of understanding and developing forms of artificial intelligence. Its research activities focus primarily on Trustworthy and Robust AI, with applications in computer vision, while also drawing inspiration from related fields such as computer graphics and natural language processing.

Perception and Intelligence Lab (PINlab)

The Perception and Intelligence Lab (PINlab, www.pinlab.org) conducts fundamental research and innovation transfer in computer vision and machine learning, including applications to language models and robotics. Specific research interests include distributed and multi-agent intelligent systems, human-robot interaction, embodied perception, multi-modal learning involving hierarchical knowledge and uncertainty estimation, foundation and world models, general intelligence and reasoning, and interpretable and safe AI.

EViLBERT: Learning Task-Agnostic Multimodal Sense Embeddings

The problem of grounding language in vision is increasingly attracting scholarly efforts. As of now, however, most approaches have been limited to word embeddings, which are not capable of handling polysemous words. This is mainly due to the limited coverage of the available semantically annotated datasets, which forces research to rely on alternative technologies (e.g., image search engines). To address this issue, we introduce EViLBERT, an approach that is able to perform image classification over an open set of concepts, both concrete and non-concrete.
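The abstract does not detail the architecture, but the open-set classification idea can be illustrated roughly: instead of a fixed softmax head, an image embedding is scored against an arbitrary, extensible set of concept embeddings in a shared multimodal space. The following is a minimal sketch of that idea only, not the authors' actual model; all names, shapes, and the random stand-in vectors are hypothetical.

```python
import numpy as np

def classify_open_set(image_emb: np.ndarray, concept_embs: dict) -> str:
    """Return the concept whose embedding is closest to the image embedding.

    Assumes `image_emb` and every value in `concept_embs` live in the same
    multimodal space (hypothetical 512-d vectors here).
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Unlike a fixed classification head, the concept set can be extended at
    # will, including non-concrete concepts such as 'remorse' or 'recess'.
    return max(concept_embs, key=lambda c: cosine(image_emb, concept_embs[c]))

# Hypothetical usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
concepts = {c: rng.normal(size=512) for c in ("cat", "bridge", "remorse")}
print(classify_open_set(rng.normal(size=512), concepts))
```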

Fatality Killed the Cat or: BabelPic, a Multimodal Dataset for Non-Concrete Concepts

Thanks to the wealth of high-quality annotated images available in popular repositories such as ImageNet, multimodal language-vision research is in full bloom. However, events, feelings and many other kinds of concepts which can be visually grounded are not well represented in current datasets. Nevertheless, we would expect a wide-coverage language understanding system to be able to classify images depicting recess and remorse, not just cats, dogs and bridges.

SF-UDA-3D: Source-Free Unsupervised Domain Adaptation for LiDAR-Based 3D Object Detection

3D object detectors based only on LiDAR point clouds hold the state of the art on modern street-view benchmarks. However, LiDAR-based detectors generalize poorly across domains due to domain shift. For LiDAR, in fact, domain shift is not only due to changes in the environment and in object appearance, as for visual data from RGB cameras, but is also related to the geometry of the point clouds (e.g., variations in point density).
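To make the point-density aspect of LiDAR domain shift concrete, here is a minimal numpy sketch that mimics a sparser target-domain sensor by randomly dropping points from a scan. This is a crude stand-in for illustration only (real beam re-sampling would operate per scan line), and it is not the SF-UDA-3D method; the function name and the 0.5 ratio are assumptions.

```python
import numpy as np

def subsample_to_density(points: np.ndarray, target_ratio: float,
                         seed: int = 0) -> np.ndarray:
    """Randomly keep a fraction of LiDAR points to mimic a sparser sensor.

    `points` is an (N, 3) array of x, y, z coordinates; `target_ratio` is the
    fraction of points the target domain's sensor would capture (assumed).
    """
    rng = np.random.default_rng(seed)
    keep = rng.random(len(points)) < target_ratio
    return points[keep]

# A dense scan crudely downsampled towards a half-density sensor.
scan = np.random.default_rng(1).normal(size=(120_000, 3))
sparse = subsample_to_density(scan, target_ratio=0.5)
print(scan.shape, "->", sparse.shape)
```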

Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images

Beyond sample curation and basic pathologic characterization, the digitized H&E-stained images of TCGA samples remain underutilized. To highlight this resource, we present mappings of tumor-infiltrating lymphocytes (TILs) based on H&E images from 13 TCGA tumor types. These TIL maps are derived through computational staining using a convolutional neural network trained to classify patches of images. Affinity propagation revealed local spatial structure in TIL patterns.

Transformer Networks for Trajectory Forecasting

Most recent successes in forecasting people's motion are based on LSTM models, and most recent progress has been achieved by modelling the social interactions among people and their interactions with the scene. We question the use of LSTM models and propose the novel use of Transformer Networks for trajectory forecasting. This is a fundamental switch from the sequential step-by-step processing of LSTMs to the attention-only memory mechanisms of Transformers.
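The contrast with LSTMs can be made concrete in a few lines of PyTorch: a self-attention encoder sees the whole observed track at once instead of consuming it step by step. This is a minimal sketch of the general idea, not the paper's model; all hyperparameters are illustrative and positional encoding is omitted for brevity (a real model would need it so attention can tell time steps apart).

```python
import torch
import torch.nn as nn

class TrajectoryTransformer(nn.Module):
    """Encode an observed (x, y) track with self-attention and regress all
    future offsets at once. Hyperparameters are illustrative."""

    def __init__(self, d_model: int = 64, horizon: int = 12):
        super().__init__()
        self.embed = nn.Linear(2, d_model)           # (x, y) -> token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, horizon * 2)  # predict every future step

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T_obs, 2). Attention attends over the whole observed
        # track in parallel, unlike the step-by-step recurrence of an LSTM.
        h = self.encoder(self.embed(obs))
        return self.head(h[:, -1]).view(obs.size(0), -1, 2)

model = TrajectoryTransformer()
future = model(torch.randn(8, 8, 2))  # 8 tracks, 8 observed steps each
print(future.shape)                   # torch.Size([8, 12, 2])
```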

Integration of close-range underwater photogrammetry with inspection and mesh processing software: a novel approach for quantifying ecological dynamics of temperate biogenic reefs

Characterizing and monitoring changes in biogenic 3-dimensional (3D) structures at multiple scales over time is challenging within the practical constraints of conventional ecological tools. Therefore, we developed a structure-from-motion (SfM)-based photogrammetry method, coupled with inspection and mesh processing software, to estimate important ecological parameters of underwater worm colonies (hummocks) constructed by the sabellariid polychaete Sabellaria alveolata, using non-destructive 3D modeling and mesh analysis.
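Once an SfM reconstruction exists as a mesh, the kinds of structural parameters the study targets (surface area, enclosed volume, extents) can be queried programmatically. A minimal sketch with the `trimesh` library follows; the file path is a hypothetical placeholder for an SfM output, the study's actual software pipeline differs, and the units depend on how the model was scaled.

```python
import trimesh

# Load a reconstructed hummock mesh (hypothetical path standing in for an
# SfM export) and report basic structural metrics of the colony.
mesh = trimesh.load("hummock_colony.ply")

print("surface area:", mesh.area)                       # units follow model scale
print("watertight:", mesh.is_watertight)
if mesh.is_watertight:                                  # volume needs a closed mesh
    print("enclosed volume:", mesh.volume)
print("bounding box extents:", mesh.bounding_box.extents)
```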

'Seeing is believing': pedestrian trajectory forecasting using visual frustum of attention

In this paper we show the importance of head pose estimation in the task of trajectory forecasting. This cue, when produced by an oracle and injected into a novel socially-based energy minimization approach, yields state-of-the-art performance on four different forecasting benchmarks, without relying on additional information such as expected destination and desired speed, which most current forecasting techniques assume to be known beforehand.
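The geometric intuition behind the title is that head pose defines a cone of attention on the ground plane: a neighbour matters for a pedestrian's future path mainly if they fall inside it. The following numpy sketch computes such a frustum test; the 120-degree aperture and all names are assumptions for illustration, not the paper's energy terms.

```python
import numpy as np

def in_frustum(pos: np.ndarray, head_yaw: float, other: np.ndarray,
               fov_deg: float = 120.0) -> bool:
    """Check whether `other` lies inside a pedestrian's visual frustum.

    `pos` and `other` are 2-D ground-plane positions; `head_yaw` is the head
    pose in radians; `fov_deg` is an assumed field-of-view aperture.
    """
    gaze = np.array([np.cos(head_yaw), np.sin(head_yaw)])
    to_other = other - pos
    to_other = to_other / np.linalg.norm(to_other)
    angle = np.degrees(np.arccos(np.clip(np.dot(gaze, to_other), -1.0, 1.0)))
    return angle <= fov_deg / 2.0

# A pedestrian at the origin looking along +x sees one neighbour, not the other.
print(in_frustum(np.zeros(2), 0.0, np.array([2.0, 0.5])))   # True
print(in_frustum(np.zeros(2), 0.0, np.array([-2.0, 0.0])))  # False
```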

MX-LSTM: mixing tracklets and vislets to jointly forecast trajectories and head poses

Recent approaches to trajectory forecasting use tracklets to predict the future positions of pedestrians, exploiting Long Short-Term Memory (LSTM) architectures. This paper shows that adding vislets, that is, short sequences of head pose estimates, significantly improves trajectory forecasting performance. We then propose to use vislets in a novel framework called MX-LSTM, which captures the interplay between tracklets and vislets through a joint unconstrained optimization of full covariance matrices during LSTM backpropagation.
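One standard way to make the optimization of a full covariance unconstrained, as the abstract describes, is to parametrize it through a Cholesky factor with a positive diagonal, so positive-definiteness holds by construction during backpropagation. The PyTorch sketch below shows that parametrization trick in isolation; the 4-D target (position plus a 2-D head-pose encoding), the function names, and all shapes are illustrative assumptions, not the MX-LSTM internals.

```python
import torch

def gaussian_nll_full_cov(target, mean, raw_tril):
    """NLL of a Gaussian whose full covariance comes from an unconstrained
    Cholesky parametrization (illustrative sketch, not MX-LSTM code)."""
    d = target.size(-1)
    tril = torch.tril_indices(d, d)
    L = torch.zeros(target.size(0), d, d)
    L[:, tril[0], tril[1]] = raw_tril                       # fill lower triangle
    diag = torch.exp(torch.diagonal(L, dim1=-2, dim2=-1))   # force positive diagonal
    L = L.tril(-1) + torch.diag_embed(diag)                 # valid Cholesky factor
    dist = torch.distributions.MultivariateNormal(mean, scale_tril=L)
    return -dist.log_prob(target).mean()

# 4-D joint target: (x, y) position plus a 2-D head-pose encoding, batch of 8.
target = torch.randn(8, 4)
mean = torch.randn(8, 4, requires_grad=True)
raw = torch.randn(8, 10, requires_grad=True)  # 4*(4+1)/2 free parameters
loss = gaussian_nll_full_cov(target, mean, raw)
loss.backward()                               # gradients flow through L unconstrained
print(float(loss))
```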
