On-line Hardware-assisted profiling for Low-impact computing systems monitoring

Submitted by Anonymous (not verified) on Tue, 19/04/2022 - 10:30

Name and title of the project proposer:

sb_p_1781174

Year:

2019

Abstract:

Modern systems combine complex parallel workloads running on massively parallel heterogeneous processors equipped with multiple memory levels. In such environments, concurrently running programs hinder one another, exacerbating the effects of nondeterminism and often defeating the operating system's attempts at optimization. Understanding the structure of software and thoroughly analysing its behaviour at runtime is therefore a fundamental, yet challenging, task when addressing efficiency problems or security issues. Profilers are special tools designed to carry out these investigations. One technique they rely on is instrumentation, either software- or hardware-based. While the former may operate at several abstraction levels (e.g. source code or compiled object), the latter stems from the introduction of Performance Monitor Units (PMUs), on-chip hardware support available on most current CPUs. PMUs operate directly at the hardware level, observing the effects of the executed code on the hardware. However, regardless of the instrumentation technique, profiling is a critical task which may either tamper with the original execution of the software or introduce self-generated noise into the gathered information. In this research project, we aim to devise novel methodologies and to build on them a solution that enables online profiling while preserving the quality of the gathered data, without contaminating it with the effects of the profiler's own execution. This result will be pursued by taking advantage of PMUs and their ability to collect data with low overhead and low impact on the system. Throughout the project, hardware support will be a vertical attribute of the engine. Moreover, a strong focus will be placed on finding correlations between hardware, software and high-level metrics, building an associative table for choosing the metrics that should be considered when studying a specific problem, so as to maximize the efficiency of the solution.

ERC:

PE6_2

PE6_3

Research group members:

sb_cp_is_2273810

Innovation:

Most profiling solutions available today provide a rich set of metrics for exploring the evolution of the software, combined with sophisticated analysis tools that make it possible to study a program with a focus on one or more specific domains (e.g. energy efficiency, security audit, code optimization). However, most of the time they provide offline analysis, so that corrective actions are possible only in a pre-execution phase, or in any case the execution is driven by consulting information obtained in previous runs. Although this approach may be enough in many contexts, offline analysis may be the wrong choice when we want to provide the system with an automated way to self-adjust its parameters to satisfy given conditions. The operating system, as well as the programs, immersed in a general and variable workload, need up-to-date data traces and runtime actions to be properly managed with the entire environment taken into account.
On the other hand, relying on the user's guidance to manage the profiling session may turn out to be unsuitable under some circumstances. In fact, knowledge of the underlying architecture is required in order to know what to ask the system and how, but even that may not be enough, because the way we query the system may produce very different results depending on the current workload.
For this reason, the profiler must be the sole executive director of the profiling actions requested by the user, exposing to them only an interface for receiving a profiling session setup.

This project provides solutions for a synergistic resolution of a set of problems related to continuous profiling and runtime analysis which have not been tackled in the literature.
They are:
- An OS-integrated solution able to access high-privilege resources without extra penalties and to be aware of the entire workload present in the system. The analysis activity is part of the profiler itself, and it is performed with attention to the impact it produces on the system, reducing the side effects of the support.
- Management of the inner components according to the overall state of the system. This implies the orchestration of profiling, analysis and potential corrective actions, as well as the multiplexing of the limited hardware resources to improve the efficiency of the monitoring phase. This avoids the use of a third-party library, which would provide a more portable interface but add an extra abstraction layer that is harder to supervise directly.
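
The multiplexing of limited hardware counters mentioned in the list above can be illustrated with a minimal sketch. It assumes a simple round-robin policy over fixed time slices and a linear extrapolation of the raw counts; the function names, event labels and policy are hypothetical and are not taken from the project itself.

```python
def round_robin_schedule(events, num_counters, num_slices):
    """Return, for each time slice, the events that occupy the counters.

    Hypothetical policy: rotate through the event list so that every
    event gets a share of the available hardware counters.
    """
    if not events:
        return [[] for _ in range(num_slices)]
    active_per_slice = min(num_counters, len(events))
    schedule = []
    for s in range(num_slices):
        start = (s * active_per_slice) % len(events)
        schedule.append([events[(start + i) % len(events)]
                         for i in range(active_per_slice)])
    return schedule


def extrapolate(raw_counts, schedule):
    """Scale each raw count by (total slices / slices in which it was counted)."""
    total = len(schedule)
    active = {}
    for slice_events in schedule:
        for event in slice_events:
            active[event] = active.get(event, 0) + 1
    return {event: raw_counts[event] * total / active[event]
            for event in raw_counts if active.get(event, 0) > 0}


# Illustrative event labels only; real event names depend on the CPU model.
events = ["cycles", "instructions", "cache_misses", "branch_misses"]
schedule = round_robin_schedule(events, num_counters=2, num_slices=4)
estimates = extrapolate({"cycles": 1000, "cache_misses": 40}, schedule)
```

The extrapolation assumes that an event occurs at a roughly constant rate over the whole window, so the estimates become less reliable as the workload becomes more bursty.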

A completely new point of view introduced by the methodology of this project considers the relation between high-level (e.g. memory footprint) and low-level (hardware or software) metrics. Indeed, a key point is relieving users of the need for technical knowledge by providing them with an abstraction layer that answers what they are looking for. Finding a correlation between architectural events (i.e. hardware events generated inside the CPU by instruction execution) is not always an easy task and requires deep knowledge of both the software and the specific architecture model. Delegating this task directly to the profiling support helps users conduct analyses without worrying about what to ask the system.
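
As an illustration of such an abstraction layer, the following sketch shows one possible shape for an associative table that maps a high-level metric to candidate low-level events. The table entries, event labels and the `resolve_events` helper are hypothetical placeholders, not correlations established by the project.

```python
# Hypothetical mapping from high-level metrics to low-level event labels.
# The entries are placeholders; real correlations would have to be derived
# experimentally for each supported architecture.
METRIC_TABLE = {
    "memory_footprint": ["page_faults", "llc_misses"],
    "branch_behaviour": ["branch_instructions", "branch_misses"],
    "compute_efficiency": ["cycles", "instructions"],
}


def resolve_events(goal, available_events, table=METRIC_TABLE):
    """Return the low-level events for a goal that the current CPU exposes."""
    if goal not in table:
        raise KeyError(f"no mapping for high-level metric: {goal}")
    return [event for event in table[goal] if event in available_events]


# The user asks for a high-level metric; the support picks the events.
selected = resolve_events("memory_footprint", {"llc_misses", "cycles"})
```

Keeping the table as plain data separates the architecture-specific knowledge from the lookup logic, so supporting a new CPU model would mean providing a new table rather than new code.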

On the downside, the heavy reliance on hardware support, as well as on heuristic rules based on a specific computing architecture, makes the solution hard to port, requiring an effort approximately proportional to the number of models we want to support.
Moreover, handling intensive workloads is impossible without the profiler activity affecting the execution: even if low, it requires system resources, interfering with the normal execution of the application.
To minimize system impact in such scenarios, the profiler will be able to use part of the gathered data to manage its own execution at runtime, lowering the overhead, especially when complex workloads exhaust most of the available system resources.
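
One way to picture this self-regulation is a feedback rule that adjusts the sampling period according to the overhead the profiler measures on itself. The sketch below is an assumption made for illustration: the `next_period` function, the doubling and halving policy, the bounds and the overhead budget are all hypothetical and do not describe the project's actual mechanism.

```python
def next_period(period, overhead, budget,
                min_period=1_000, max_period=10_000_000):
    """Return the sampling period to use in the next interval.

    period   -- current number of events between two samples
    overhead -- fraction of time spent in the profiler (0.0 to 1.0)
    budget   -- maximum tolerated overhead fraction

    Hypothetical policy: sample less often when over budget, and more
    often when comfortably (less than half the budget) below it.
    """
    if overhead > budget:
        period *= 2       # fewer samples, lower overhead
    elif overhead < budget / 2:
        period //= 2      # more samples, finer-grained data
    return max(min_period, min(max_period, period))


# Example: a 5% measured overhead against a 2% budget doubles the period.
period = next_period(100_000, overhead=0.05, budget=0.02)
```

The dead band between half the budget and the full budget keeps the period from oscillating when the overhead sits close to the limit.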

The results of this project will be released as open source, and an extensive comparative experimental phase will be carried out in order to support the achieved results with real data.

Call code:

1781174

Keywords:

OPTIMIZATION

PARALLEL AND DISTRIBUTED COMPUTING

OPERATING SYSTEMS