Benchmarking in the Data Center: Expanding to the Cloud

Workshop held in conjunction with ICPE 2024: the 15th ACM/SPEC International Conference on Performance Engineering 2024

Workshop Scope

High performance computing (HPC) is no longer confined to universities and national research laboratories; it is increasingly used in industry and in the cloud. Education of users also needs to take this into account. Users need to be able to evaluate what benefits HPC can bring to their companies, what type of computational resources (e.g. multi-core and many-core CPUs, GPUs, hybrid systems) would be best for their workloads, and what they should pay for these resources. Another issue that arises in shared computing environments is privacy: in commercial HPC environments, the data produced and the software used typically have commercial value and hence need to be protected.

Recent general adoption of machine learning has motivated migration of HPC workloads to cloud data centers, and there is growing interest in the community in performance evaluation in this area, especially for end-to-end workflows. In addition to traditional performance benchmarking and high performance system evaluation (including absolute performance and energy efficiency), as well as configuration optimizations, this workshop will discuss issues that are of particular importance in commercial HPC. Benchmarking has typically involved running specific workloads that are representative of typical HPC workloads, yet with the growing diversity of workloads, theoretical performance modeling is also of interest because it allows performance prediction from a minimal set of measurements. The workshop will consist of submitted papers, invited talks and a panel of industry representatives.

Submission

We invite novel, unpublished research paper submissions within the scope of this workshop. Paper submission topics include, but are not limited to, the following areas:

Authors are invited to submit work as a regular paper (up to 8 pages including references). All papers must be prepared in ACM Primary Article Template format: proceedings template. Submitted work must be in English. Inclusion of an artefact evaluation/description appendix is encouraged and does not count towards the page limit.

Submitted papers will be peer-reviewed by the technical program committee (TPC). Review of supplementary material is at the discretion of the reviewers; papers must be complete and self-contained.

All accepted papers will be indexed and published in the ACM Digital Library after the workshop as part of the ICPE 2024 workshop track.

Workshop submission site: BID 24 (EasyChair)

Schedule

Paper submission deadline: 01. February 2024 (23:59 AoE); final extension: 15. February 2024 (23:59 AoE)
Author notification: 01. March 2024
Camera ready deadline: 07. March 2024 (23:59 AoE)
Workshop date: 07. May 2024

Date and Location

7th May 2024
Department of Computing, Imperial College London, 180 Queen's Gate, SW7 2RH, United Kingdom (Room: LT308)
Co-located with ICPE 2024

Workshop Program

Please refer to the full ICPE programme for other co-located events.
Time
Title
~9:00 Welcome remarks

Jens Domke - RIKEN R-CCS, Workshop chair
Tom Lin - University of Bristol, Workshop chair
Session 1 - [Room LT308]
9:00 ~ 9:30
Challenges of Benchmarking New and Emerging Technologies: A Case Study on DAPHNE
Abstract

In the rapidly evolving landscape of applications and systems, innovative software platforms like DAPHNE open opportunities and challenges. DAPHNE is an extensible system infrastructure for integrated data analysis pipelines, including data management and processing, high-performance computing (HPC), and machine learning (ML) training and scoring. This talk will introduce DAPHNE and delve into the multifaceted challenges encountered in the adoption and benchmarking of new and emerging technologies, using DAPHNE as an example. We will discuss aspects such as low community and industry adoption of emerging technologies and how this affects benchmarking and development in DAPHNE. Finally, we will also discuss broader topics such as the challenges behind benchmarking specific components such as the DAPHNE system scheduler and how to report results from large sets of experiments.

Jonas Korndörfer - University of Basel
Bio

I am a PhD researcher at the Department of Mathematics and Computer Science of the University of Basel, Switzerland, where I work as a member of the High Performance Computing (HPC) group. My main topics of interest are simulations, optimization and performance analysis, parallel programming, and scheduling techniques for large-scale platforms.

9:30 ~ 10:00
Speeding up your time to experiment in the cloud
Abstract

Navigating the cloud can sometimes be confusing for benchmarkers or performance tuning specialists. But when you work in new environments, you can make life much simpler if you understand your tools, their limits, and some of their hidden capabilities. In this talk, we'll discuss AWS ParallelCluster, its structure, limits, and the hidden toolbox that will help you get your environments running smoothly and quickly.

Brendan Bouffler - AWS HPC Engineering
Bio

Boof (Brendan Bouffler) is the head of our Developer Relations group in HPC Engineering at AWS. He's been responsible for designing and building hundreds of HPC systems in all kinds of environments, all over the world. However, he joined AWS when it became clear to him that cloud would become the exceptional tool the global research & engineering community needed to bring on the discoveries that would change the world for us all. He holds a degree in Physics and an interest in testing several of its laws as they apply to bicycles. This has, unfortunately, frequently resulted in hospitalization.

10:00 ~ 10:30
Nunc scio - Application Performance Analysis using Cray CPE - the road from CPU to GPU
Abstract

This presentation is an introduction to the HPE Cray Programming Environment Performance Analysis tools. This flexible toolset works at very large scale and covers everything from simple analysis to supporting the porting of applications from CPU to GPU. We’ll present the different components and how to use them, from simple code profiling to advanced features.

Gabriel Koren - HPE
Bio

Master Technologist at HPE. Member of the EMEA Performance Engineering Team, working on HPC benchmarks and optimization. Previously at SGI from 2001 to 2017. Education: Ph.D. in Physics, Institute for Theoretical and Applied Optics, University of Paris XI, France.

10:30 ~ 11:00 Break
11:00 ~ 11:30
Isambard-AI - bringing leadership AI Supercomputing to all
Abstract

Isambard-AI is a new leadership-class AI supercomputer being installed for open research use in the UK. It comprises >5000 Grace Hoppers connected via a high-performance Slingshot interconnect. It will be one of the world's first large-scale, open AI supercomputers. It will be used by a wide range of organisations from across the UK to harness the power of AI, which is already the main driver of emerging technologies such as training large language models (LLMs), big data and robotics. The new supercomputing facility will also play a vital role in important areas such as accelerating automated drug discovery and climate research. Key to this will be creating interfaces and software stacks that are welcoming and easy to use for researchers looking to explore how AI can transform their research. In this talk, I will discuss what Isambard-AI is and, more importantly, how it is being presented to researchers so that they can make the most of the opportunity that this level of compute capability can provide for their work.

Christopher Woods - University of Bristol
Bio

I am an EPSRC Research Software Engineer (RSE) Fellow. My role is to help researchers develop more flexible, performant and sustainable software. I work in the Advanced Computing Research Centre and the School of Chemistry, and have developed a range of programming and computational chemistry teaching resources. I am a strong supporter of the campaign to recognise the importance of research software, and to improve career pathways for research software engineers and developers, and am one of the founding trustees of the Society of Research Software Engineering.

Session 2 - [Room LT308]
11:30 ~ 12:30
Performance Optimization for Grace CPU Superchip
Abstract

The Grace CPU is NVIDIA's first CPU for the data center. Whether paired with an NVIDIA Hopper H100 GPU in the Grace-Hopper Superchip or used in combination with a second Grace CPU in the Grace Superchip, it provides outstanding performance. We'll dive deep into Grace CPU best practices to squeeze the highest level of performance out of its 72 ARMv9 cores and the up to 500 GB/s memory bandwidth of its LPDDR5 memory. The covered topics range from recommendations for compilers from NVIDIA to open-source performance libraries and developer tools, while also emphasizing coding considerations when targeting NVIDIA Grace CPU, along with SVE vectorization.

Gabriele Paciucci - NVIDIA
Bio

Experienced and accomplished Senior Solution Architect Engineer with 21 years of total experience, 10 years leading complex projects and managing cross-functional resources to optimize datacenter technology. Committed to quality and service excellence with an aptitude for launching new technology operations. Proven ability to execute with urgency to achieve customer satisfaction. As a team leader, able to mentor junior members and to use influence and credibility across the industry in order to improve products, fix issues or create new solutions.




Registration

ICPE 2024 Registration

Attending

We will support presenters with travel restrictions by setting up a way to present remotely. However, regular registration fees, as stated on the ICPE website, will still apply.

Organizing Committee

Wei-Chen (Tom) Lin (Department of Computer Science, University of Bristol)

Jens Domke (RIKEN Center for Computational Science)

Contact: chairs2024(at)parallel.computer

Program Committee

Aksel Alpay (Universität Heidelberg)

Aleksandar Ilic (Universidade de Lisboa)

Andrey Alekseenko (KTH Royal Institute of Technology)

Filippo Spiga (NVIDIA)

Swapna Raj (NVIDIA)

Joseph Schuchart (University of Tennessee)

Mustafa Abduljabbar (Ohio State University)

Mubarak Ojewale (KAUST)

Ravi Reddy Manumachu (University College Dublin)

Tom Deakin (University of Bristol)

Advisory Committee

Samar Aseeri (King Abdullah University of Science and Technology)

Juan (Jenny) Chen (National University of Defense Technology, China)

Benson Muite (Kichakato Kizito)

Kevin A. Brown (Argonne National Laboratory, US)