Agile High-Performance Data Analytics (Rocket)
Continuous, interactive, and exploratory analysis of extreme data sets lies at the heart of a market worth $42B in 2018 and projected to grow at a 10.48% annual rate through 2027. Yet real-time, interactive, extreme-scale data analytics remains an elusive goal.
Our academic and industrial collaborators report challenges of agility, scale, and complexity. Application logic demands rapid and often exploratory development of new queries, yet the size of the data sets to be analyzed makes agile query development difficult. While HPC systems are key to the big data promise, their cost, complexity, and steep learning curve for non-specialists are obstacles that inhibit adoption.
Rocket aims to lower the barrier to entry for data analytics on HPC systems by providing language technologies to address key technical research priorities of the ETP4HPC Strategic Research Agenda.
Rocket decouples data scientists, who specify what is to be computed, from data engineers, who define how it is to be computed efficiently. Both interact through a shared, flexible, and deployment-agnostic scripting language. Data scientists use it as part of an agile software engineering process to prototype and explore the design space of data analysis solutions. Rocket leverages mainstream distributed infrastructures to run their queries, trading latency for accuracy by means of a novel model-driven sampling infrastructure. Data engineers use a novel annotation mechanism to specify deployment characteristics of the code and to tune performance for a particular hardware platform.
By increasing productivity, Rocket aims to contribute to expanding the HPC ecosystem towards SMEs, fostering novel opportunities, improved return on investment, and sustainable development in sectors such as intelligence, finance, healthcare, and multimedia.