Scalable program analysis for software performance and reliability
Member | Category
---|---
Adolfo Piperno | Research group member
Member | Position | Department | Category
---|---|---|---
Emilio Coppa | Research fellow | DIAG | Other Sapienza staff or external personnel
Daniele Cono D'Elia | Research fellow | DIAG | Other Sapienza staff or external personnel
Since the early years of the new millennium, there has been a proliferation of programming models and software frameworks for large-scale data analysis. Ease of programming and the ability to express broad classes of algorithms have been the primary concerns of high-level big-data systems, which have largely succeeded in simplifying the development and execution of applications on massively distributed platforms. However, efficient provisioning and fine-tuning of computations at large scale remain a non-trivial undertaking, even for experienced programmers, since distinctive performance issues emerge as the volume of data and the degree of parallelism increase. At the core of the performance engineering challenge lies a dichotomy between the high-level programming abstractions exposed to developers and the complexity of the underlying hardware/software stack.
In this project, we propose to design and implement a software infrastructure that addresses these issues, producing cutting-edge methodologies and toolkits for identifying and optimizing crucial performance and reliability properties of big-data applications. In particular, we will construct software tools that help developers understand the multiple, interdependent effects that their algorithmic, programming, and deployment choices have on an application's performance.
This goal will be achieved by developing program-analysis and profile-driven optimization techniques that exploit information collected from the application, its workloads, and the underlying runtime system at different levels of granularity of the big-data software stack. By combining static and dynamic analyses and by leveraging novel data-streaming methods and compact data sketches, we plan to manage huge volumes of profile data under different time/space/accuracy tradeoffs, enabling analyses and optimizations that are currently infeasible.
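To illustrate the kind of space/accuracy tradeoff we have in mind, the minimal sketch below shows how a Count-Min sketch, one well-known compact data structure for frequency estimation over streams, could summarize a stream of profile events (e.g., call-site hits) in sublinear space. This is an illustrative example, not a component of the proposed infrastructure; the event labels such as `WordCount.map:42` are hypothetical.

```java
import java.util.Random;

/**
 * Minimal Count-Min sketch for approximate frequency counting of profile
 * events in sublinear space. Width and depth trade memory for accuracy:
 * estimates overcount the true frequency by at most (e/width)*N with
 * probability at least 1 - e^(-depth), where N is the stream length.
 */
public class CountMinSketch {
    private final int depth, width;
    private final long[][] counts;
    private final int[] hashSeeds;

    public CountMinSketch(int depth, int width) {
        this.depth = depth;
        this.width = width;
        this.counts = new long[depth][width];
        this.hashSeeds = new int[depth];
        Random rng = new Random(42);
        for (int i = 0; i < depth; i++) hashSeeds[i] = rng.nextInt();
    }

    private int bucket(int row, String key) {
        int h = key.hashCode() ^ hashSeeds[row];
        h ^= (h >>> 16);                  // mix high bits before reduction
        return Math.floorMod(h, width);
    }

    /** Record one occurrence of a profile event. */
    public void add(String key) {
        for (int i = 0; i < depth; i++) counts[i][bucket(i, key)]++;
    }

    /** Estimated frequency; never underestimates the true count. */
    public long estimate(String key) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < depth; i++)
            min = Math.min(min, counts[i][bucket(i, key)]);
        return min;
    }

    public static void main(String[] args) {
        CountMinSketch sketch = new CountMinSketch(4, 1 << 12);
        // Simulate a profile stream dominated by one hot call site.
        for (int i = 0; i < 100_000; i++) sketch.add("WordCount.map:42");
        for (int i = 0; i < 500; i++) sketch.add("WordCount.reduce:7");
        System.out.println(sketch.estimate("WordCount.map:42"));   // ~100000
        System.out.println(sketch.estimate("WordCount.reduce:7")); // >= 500
    }
}
```

A profiler built on such summaries keeps memory bounded regardless of how many distinct events the instrumented application generates, which is what makes whole-stack profiling of large-scale runs tractable.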