Huge-scalable Non-Blocking Share-everything PDES Platform on NUMA

Submitted by Anonymous (not verified) on Tue, 19/04/2022 - 07:14

Year:

2018

Name and position of the project proposer:

sb_p_1206358

Abstract:

Advances in computing architectures have been accompanied by new paradigms for running Parallel Discrete Event Simulation (PDES) models efficiently, and many proposals for exploiting the underlying hardware have appeared in the literature. Among these, the Share-Everything paradigm, in which the simulation state and the pending event queue are fully shared among all worker threads, targets massively parallel shared-memory machines and supports speculative simulation while taking into account the limits and benefits of this family of architectures. Previous results have shown that this paradigm outperforms traditional speculative strategies (such as data-separated Time Warp systems) whenever the granularity of the executed events is small. In particular, it has been shown to benefit from non-blocking algorithms. However, recent work has shown that as soon as the event granularity becomes too large and/or the level of parallelism increases, this paradigm no longer scales efficiently, a problem linked to memory access latency and to the way processed unsafe events are managed.

In this research project, we aim to design and implement a lightweight, highly scalable share-everything speculative PDES engine that can make progress in the simulation regardless of the number of unsafe processed events. In particular, to pursue scalability, special attention will be paid to NUMA architectures, with the goal of reducing the latencies involved in memory management.

Throughout the project, the non-blocking property will be treated as an attribute that cuts across the whole engine. Moreover, a strong focus will be placed on reducing both explicit and implicit synchronization, in order to use the underlying hardware more efficiently.

ERC:

SH1_11

PE6_12

PE6_2

Innovativeness:

Almost all parallel discrete event simulation research focuses on platforms based on a distributed execution model. Only a few share-everything platforms have been proposed, and none of them is efficient in a real environment. This project will mark a decisive change of direction, shifting the execution model from the distributed one to the share-everything one and thus following an almost unexplored path. It will be a source of stimuli for the whole research community, opening the way for numerous new solutions. In fact, although this project runs counter to the historical trend, it does not exclude the possibility of combining its outcomes with the distributed approach, with the goal of devising efficient distributed platforms based on multiple share-everything kernels.

Moreover, as with the works in [1] and [2], non-blocking solutions designed for specific simulation problems will be easily exportable and adaptable to more general scenarios.

This project provides solutions to a set of problems related to speculative share-everything PDES systems that have not been tackled in the literature.
They are:
- the definition of non-blocking algorithms for managing a fully shared pending-event set that contains both schedule-committed events and non-committed ones, which might need to be (logically) canceled, while still guaranteeing constant-time access for fetch operations;
- the definition of non-blocking algorithms for dispatching the events to be processed across threads in such a way that threads never collide on the same simulation object;
- the definition of NUMA-aware policies (e.g. gang scheduling on NUMA nodes) able to deliver work (namely events to be processed) to worker threads that have low-latency access to the corresponding simulation object, namely those on the same NUMA node, while keeping the rollback probability low.
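
As a rough illustration of the first item, the following single-threaded Python sketch shows a time-bucketed pending-event set with logical (lazy) cancellation. It is not the non-blocking algorithm the project targets: a real engine would replace the plain lists with lock-free structures updated through atomic compare-and-swap operations. The names `BucketedEventSet`, `bucket_width`, `insert`, `cancel` and `fetch_min` are hypothetical, and fetches stay cheap only under the assumption that each bucket holds a small, bounded number of events.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity comparison: two equal-looking events stay distinct
class Node:
    timestamp: float
    payload: str
    cancelled: bool = False  # logical deletion flag


class BucketedEventSet:
    """Toy pending-event set: events are grouped into fixed-width time buckets."""

    def __init__(self, bucket_width):
        self.width = bucket_width
        self.buckets = {}   # bucket index -> list of Node
        self.current = 0    # never larger than the smallest stored bucket index

    def insert(self, timestamp, payload):
        node = Node(timestamp, payload)
        idx = int(timestamp // self.width)
        self.buckets.setdefault(idx, []).append(node)
        self.current = min(self.current, idx)  # an event in the past moves the cursor back
        return node

    def cancel(self, node):
        # O(1): the node is only marked; fetch_min discards it later
        node.cancelled = True

    def fetch_min(self):
        while self.buckets:
            bucket = self.buckets.get(self.current)
            if bucket:
                live = [n for n in bucket if not n.cancelled]
                if live:
                    node = min(live, key=lambda n: n.timestamp)
                    bucket.remove(node)
                    return node
            # Bucket missing, empty, or fully cancelled: drop it and advance
            self.buckets.pop(self.current, None)
            self.current += 1
        return None
```

A cancelled event costs nothing at cancellation time beyond setting a flag; it is skipped when the fetch operation reaches its bucket, which is the sense in which the cancellation is "logical".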

As a matter of fact, the results of this project will cope with hard-workload scenarios on NUMA many-core machines, in which there are (sudden) skews in the distribution of events across simulation objects along virtual time. These skews may create relatively short bursts of events to be processed at a subset of the objects, while other objects have no (or few) events to process in the same virtual-time window. In these scenarios, traditional PDES-oriented load-balancing approaches, based on medium-term binding between objects and threads, have little capability to react to the sudden imbalance that may materialize, which can increase the likelihood of wasted computation under speculative processing.
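
The effect of such a burst can be illustrated with a toy model, sketched here in Python under strong simplifying assumptions: every event costs one time unit, timestamps and rollbacks are ignored, and objects are bound to threads round-robin in the static case. The function names and the example workload are hypothetical; the model only counts how many rounds each scheme needs to drain the burst when no two threads may serve the same object in the same round.

```python
def static_binding_rounds(events_per_object, threads):
    """Each object is bound to thread (object index mod threads);
    a thread can only serve events of its own objects."""
    load = [0] * threads
    for obj, count in enumerate(events_per_object):
        load[obj % threads] += count
    return max(load)


def shared_pool_rounds(events_per_object, threads):
    """Any thread may serve any object, but an object is served
    by at most one thread per round."""
    remaining = list(events_per_object)
    rounds = 0
    while any(remaining):
        # Serve the busiest objects first, one event each
        busiest = sorted(range(len(remaining)),
                         key=lambda o: remaining[o], reverse=True)[:threads]
        for o in busiest:
            if remaining[o] > 0:
                remaining[o] -= 1
        rounds += 1
    return rounds


# Hypothetical burst: 8 objects, 4 threads, all events land on objects 0 and 4,
# which the round-robin binding assigns to the same thread.
burst = [12, 0, 0, 0, 12, 0, 0, 0]
static_rounds = static_binding_rounds(burst, threads=4)
shared_rounds = shared_pool_rounds(burst, threads=4)
```

In this example the statically bound thread has to process all 24 events by itself, while the shared pool lets two threads work on the two hot objects in parallel and finishes in 12 rounds. With a uniform workload the two schemes need the same number of rounds, so the gap appears only under skew.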

As a completely new point of view, this project treats events as fully shared workload units, and can therefore concentrate the computing power, that is, threads, on any burst of events that materializes among subsets of objects. Any thread can in fact process any event in these bursts, thus helping to promptly advance the currently hot portions of the simulation model. Furthermore, the speculative processing capabilities considered in this project enable threads to process these bursts with no blocking phase along virtual time, while reducing the latency of accesses to the relevant portion of the simulation state.
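
One way to picture how a worker might pick its next event is sketched below in Python. This is only an assumption-laden illustration, not the project's design: a per-object try-lock (`threading.Lock.acquire(blocking=False)`) stands in for non-blocking dispatching, the plain sorted list `pending` stands in for the shared event set and is not itself thread-safe, and `numa_node`, `window` and `pick_event` are hypothetical names. The worker first looks for an event on an object of its own NUMA node within a bounded virtual-time window, and otherwise falls back to any object that is currently free.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class SimObject:
    obj_id: int
    numa_node: int  # NUMA node assumed to hold this object's state
    lock: threading.Lock = field(default_factory=threading.Lock)


@dataclass(frozen=True)
class Event:
    timestamp: float
    obj_id: int


def pick_event(pending, objects, worker_node, window):
    """Return (event, object) with the object's lock held, or None.

    pending: list of Event sorted by timestamp.
    window:  how far past the minimum timestamp a worker may look
             for a NUMA-local event (hypothetical tuning knob).
    """
    if not pending:
        return None
    horizon = pending[0].timestamp + window
    # First pass: NUMA-local objects inside the window. Second pass: anything.
    for local_only in (True, False):
        busy = set()  # objects whose lock was taken by someone else
        for ev in pending:
            obj = objects[ev.obj_id]
            if ev.obj_id in busy:
                continue  # never skip ahead of an earlier event of the same object
            if local_only and (ev.timestamp > horizon
                               or obj.numa_node != worker_node):
                continue
            if obj.lock.acquire(blocking=False):  # try-lock: never waits
                pending.remove(ev)
                return ev, obj
            busy.add(ev.obj_id)
    return None
```

The caller is expected to release `obj.lock` once the event has been processed. A wider `window` favors locality but lets a worker run further ahead of the minimum timestamp, which in a speculative engine would be expected to raise the rollback probability; the sketch does not model rollbacks.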

On the downside, the price our share-everything PDES system pays is that it cannot exploit large or extreme-scale clusters of distributed-memory resources. However, the prospects of our design are strengthened by the steadily rising trend towards larger numbers of CPU cores on the same shared-memory chipset, motivated by the power wall that already limits the growth of the computing speed of individual CPU cores. Moreover, future PDES architectures could be envisaged in which each individual shared-memory machine runs an instance of our share-everything PDES platform, and the instances are in turn clustered via additional coordination mechanisms on a distributed-memory platform. At the same time, increasing the efficiency of a share-everything platform makes it possible to reduce the energy footprint of large-scale computing systems, making parallel simulation energy-efficient as well.

The results of this project will be released as open source, and an extensive comparative experimental phase will be carried out to support the achieved results with real data.

References
[1] R. Marotta, M. Ianni, A. Pellegrini, and F. Quaglia,
"A lock-free O(1) event pool and its application to share-everything PDES platforms,"
in DS-RT, 2016.
[2] R. Marotta, M. Ianni, A. Pellegrini, and F. Quaglia,
"A conflict-resilient lock-free calendar queue for scalable share-everything PDES platforms,"
in PADS, 2017.

Call code:

1206358

Keywords:

COMPUTER ENGINEERING

NUMERICAL SIMULATION

ENERGY EFFICIENCY

PARALLEL AND DISTRIBUTED COMPUTING

PARALLEL AND DISTRIBUTED SYSTEMS