We propose to acquire a state-of-the-art HPC cluster dedicated to material-design activities of the Condensed Matter Theory sector of the Physics Department.
Thanks to astonishing progress in computational techniques, theoretical condensed matter physics is evolving from material understanding to actual material design. Problems once considered unsolvable, such as crystal structure prediction and reaction dynamics, can now be tackled on a computer, providing an invaluable tool to guide experiments. Our project takes full advantage of these techniques to address different problems (superconductivity, phase-change materials, self-assembly of nanoparticles, bioinformatics), reflecting the variety of interests in our sector.
A dependable HPC infrastructure is vital to stay competitive in our field. Many essential activities, e.g. student training, code development, benchmarking, and urgent runs, cannot rely exclusively on grants but require local, free resources. This is recognized by most European universities, where start-up packages for new faculty members include access to computational resources.
Unfortunately, this is not the case at Sapienza. In the last five years, following a substantial generational turnover, our sector hired eight faculty members, coming mostly from foreign institutions, who are internationally recognized experts in the development and application of advanced numerical methods to condensed matter physics (CMP) problems. Their skills nicely complement the existing expertise of a very active area of our department. However, most of these researchers do not have access to local resources provided by Sapienza or INFN HPC services, and the only cluster in the Physics Department, financed by a previous Ateneo call, is heavily geared towards machine learning applications.
Our configuration is tailored to the needs of our sector and guided by four principles: 1) maximum CPU power; 2) maximum flexibility; 3) expandability; 4) integrability into Sapienza's future cloud computing initiative.
The cluster we plan to acquire is a state-of-the-art architecture specifically designed to meet the needs of our sector, and to be easily expandable and integrable into a possible future Sapienza's cloud computing environment. Typical condensed matter applications make massive use of CPU power and often require parallelization over many cores.
The current setup has been specifically designed to maximize CPU power. It comprises three high-end servers, each equipped with 128 GB of ECC RAM, two CPUs with 48 (96) physical (logical) cores each, 32 TB of hard-disk storage, and a RAID system for efficient data backup, connected by an InfiniBand switch. In total, we will thus have access to 288 (576) physical (logical) cores and 96 TB of disk space.
For reference, the latest iteration of the main CINECA CPU-based cluster, Marconi-A3, features half the CPU cores per node (2x24), and of an older generation.
With the installation of a batch queuing system, we will have at our disposal a very flexible architecture, able to handle a variety of different applications: crystal structure prediction algorithms are based on numerous small (few CPUs, short wall time) DFT relaxations, while the large supercells needed to describe interfaces and/or to diagonalize effective force-constant matrices for SSCHA applications typically require large jobs on many CPUs. Classical Monte Carlo and molecular dynamics algorithms can also be parallelized very efficiently over CPUs.
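To illustrate how the queuing system would accommodate this mix of workloads, the following is a minimal sketch of a batch submission script, assuming SLURM as the scheduler and Quantum ESPRESSO's pw.x as the DFT code; both choices, as well as the partition name and input file, are illustrative assumptions, not final configuration decisions.

```bash
#!/bin/bash
# Hypothetical SLURM script for a small DFT relaxation job of the kind
# used in crystal structure prediction (few cores, short wall time).
# Scheduler (SLURM), code (pw.x), and partition name are assumptions.
#SBATCH --job-name=dft-relax
#SBATCH --nodes=1
#SBATCH --ntasks=8            # small job: a few cores per relaxation
#SBATCH --time=02:00:00       # short wall time
#SBATCH --partition=short     # hypothetical partition for small jobs

module load quantum-espresso  # assumes an environment-modules setup
mpirun -np "$SLURM_NTASKS" pw.x -in relax.in > relax.out
```

A large SSCHA or interface calculation would use the same mechanism with a larger node count and a longer wall-time limit; the scheduler's partitions and fair-share policies let both job classes coexist on the same machines.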
Thanks also to the use and development of numerical techniques, our sector has gained significant international visibility and recognition over the years, making an important contribution to the excellent ANVUR evaluation of our Department. In highly competitive fields, stable computational resources can have a dramatic impact in terms of priority and international visibility. This is recognized by most institutions, which offer computational resources as part of the start-up package of a new hire in computational physics.*
However, this is currently not the case at Sapienza: at present, most of us do not have access to any local computing resources provided by Sapienza** or by other Italian institutions such as INFN, to which none of us is affiliated. Instead, we mostly rely on resources provided for a limited time by our former foreign institutions, or on national and international HPC clusters accessed through competitive calls. However, many essential activities, such as student training, code development, benchmarking, and urgent runs, cannot rely on grant-based access to external resources but require local, immediately available resources.
Experience in the use of HPC clusters is also becoming an increasingly important requirement on the job market, in research and development but also in other sectors, such as financial services, market analysis, and consulting, which are traditional employment destinations for our students. We have restructured the condensed matter curriculum to include advanced computational courses in soft and hard condensed matter. Access to a dedicated cluster will give master's and doctoral students the possibility to apply and test the acquired knowledge in a suitable environment, which is currently missing.
Sapienza has recently recognized the importance of HPC by launching an inter-departmental initiative for cloud computing, announced at the Ateneo Conference of 17 June 2021. Once the initiative is operational, our cluster can be easily integrated into a distributed computing environment.
*A cursory review of start-up computational resources for a professor starting a theoretical condensed matter group at an EU institution ranges from £30k for a junior lecturer in Oxford to €100-200k for a full professor in Heidelberg. Most universities also provide free or low-cost access to departmental or university clusters. (For example, the University of Erlangen, a mid-size German university, grants unlimited, unrestricted access to several high-end CPU clusters to all groups, as well as limited, restricted access to more specialized architectures for newly established groups: https://hpc.fau.de/systems-services/systems-documentation-instructions/c...).
** Our Department hosts a GPU cluster, obtained through an internal Sapienza call in 2018 (PI Stefano Giagu). However, this cluster is geared exclusively towards machine learning applications and is therefore not suitable for the applications described in this proposal, which require many cores and gain little from GPU acceleration.