Executive Summary
Kyoto University (Kyoto U), with collaborative campuses across Japan and extended schools around the world, hosts the Academic Center for Computing and Media Studies (ACCMS). The ACCMS supports academic work in computing and media and operates several High Performance Computing (HPC) systems for computational research.
The supercomputers currently run by the ACCMS were installed in 2016. The simulation codes and computational practices used by investigators have evolved considerably since then, and many of those codes are now memory-bandwidth-bound on the existing HPC resources.
After two years of technology research, design work, and a tender process, the ACCMS will install three new supercomputers in 2023. The new systems are built on the latest generation of Intel® Xeon® processors to address user needs for high memory bandwidth, large memory capacity, and high parallel performance in a balanced HPC infrastructure.
Challenge
“Many users of our supercomputers have developed their own codes with very long run times for their simulations,” explained Keiichiro Fukazawa, ACCMS Associate Professor. “Most of our researchers run codes for plasma physics, molecular dynamics, and fluid dynamics. Other users employ ISV applications like Gaussian, LS-DYNA, ANSYS, Mathematica, and others.”
Kyoto’s Laurel 3 supercomputer, based on 4th Gen Intel® Xeon® processors, will run commercial software and other user codes.
As researchers using the existing ACCMS supercomputers deployed in 2016 developed more intensive codes for their projects, they needed more powerful resources that could deliver results in shorter times. In addition, vendor support for the older systems was running out.
“We think the most important thing is to be able to obtain scientific results quickly,” Fukazawa added. “Thus, we need to at least increase the execution efficiency of applications through the hardware and system design. Additionally, by optimizing the software, we can further improve performance and shorten run times.”
Shorter run times not only deliver faster results; they also reduce the power consumed, and thus the computing cost, per project. It was time for the ACCMS to benefit from many generations of technology evolution.
Solution
Professor Fukazawa and his team designed the new systems around technologies that address the critical need for higher memory bandwidth. NEC was awarded the tender and worked with Dell Technologies to install the new systems, which will replace the existing supercomputers.
“We have three configurations,” he explained, “one for users’ own codes, one for general commercial applications from software vendors, and one for those that require large memory.”
“The largest system includes high-bandwidth memory because the existing systems limit the performance of many applications.”
The three systems will replace their predecessors Camphor 2, Laurel 2, and Cinnamon 2. The new systems include:
- Camphor 3: A 7.63 petaFLOPS system comprising 1,120 Dell PowerEdge C6620 server nodes with 56-core Intel® Xeon® CPU Max 9480 processors and 128 GB of memory. The Intel Xeon Max Series is a processor family with integrated High Bandwidth Memory (HBM); when it is added later this year, Camphor 3 will provide 3.2 terabytes per second of memory bandwidth to accelerate solution times on many codes.
- Laurel 3: A 2.65 petaFLOPS system comprising 370 Dell PowerEdge C6620 server nodes with 56-core Intel® Xeon® Platinum 8480+ processors and 512 GB of memory. This is the general-purpose system, which will run commercial software and other user codes.
- Cinnamon 3: A 114.6 teraFLOPS system comprising 16 Dell PowerEdge C6620 server nodes with 56-core Intel Xeon Platinum 8480+ processors and 2 TB of memory. Cinnamon 3 will support applications that need very large memory.
A fourth system (Gardenia), built with other technologies, will support codes that use GPUs, while an existing Cloud System built on Intel® Xeon® 6354 processors serves users connecting from around the country. All of the supercomputers are backed by a 40.32 PB Lustre file system and a 4.06 PB flash storage system, and the nodes and systems are interconnected on a 400 Gbps InfiniBand fabric.
To arrive at the new designs, Fukazawa and his team researched the latest technologies and ran benchmarks of the Intel Xeon processors.
Figure 1. Benchmarked code performance on Laurel 3 compared to Laurel 2 (provided by Kyoto U).1
“For our user applications, in particular,” he commented, “the ratio of memory bandwidth to CPU throughput performance is most important. Thus, we desired a high-performance CPU with high bandwidth memory. The new HBM2e memory technology that we will add later this year will give us high execution efficiency.”
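As a rough illustration of why that ratio matters, the short sketch below computes bytes per FLOP for two node configurations. The peak-throughput and DDR5-bandwidth figures are assumptions made only for this example, not published ACCMS or Intel specifications; the 3.2 TB/s value echoes the HBM bandwidth cited for Camphor 3 above and is treated here, as a further assumption, as a per-node figure.

    /* Illustrative bytes-per-FLOP comparison. All figures are example
     * assumptions, not published specifications; the 3.2 TB/s value is the
     * HBM bandwidth cited for Camphor 3 in this article, treated here as a
     * per-node figure. */
    #include <stdio.h>

    int main(void) {
        double peak_tflops = 6.8;  /* assumed peak FP64 throughput per node, TFLOP/s */
        double ddr_bw_tbs  = 0.6;  /* assumed DDR5 memory bandwidth per node, TB/s   */
        double hbm_bw_tbs  = 3.2;  /* HBM bandwidth cited for Camphor 3, TB/s        */

        /* Bytes available per floating-point operation: the higher this is,
         * the less a memory-bandwidth-bound code stalls waiting on memory. */
        printf("DDR5 node: %.2f bytes/FLOP\n", ddr_bw_tbs / peak_tflops);
        printf("HBM node : %.2f bytes/FLOP\n", hbm_bw_tbs / peak_tflops);
        return 0;
    }

On this figure of merit, the HBM-equipped node offers several times more bytes per FLOP, which is exactly what bandwidth-bound simulation kernels need to keep their execution units fed.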
Many investigators write their own vector processing codes. On the older Laurel 2 system, the processors support Intel® Advanced Vector Extensions 2 (Intel® AVX2). The 4th Gen Intel® Xeon® processors of Laurel 3 integrate Intel® Advanced Vector Extensions 512 (Intel® AVX-512), which doubles the width of the vector registers. According to Professor Fukazawa, the team expects vector codes to run at least 1.5X faster than on the older system, helped by the Intel® Math Kernel Library (Intel® MKL).1
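To make the vector-width point concrete, here is a minimal sketch of the kind of loop such user codes rely on; the compile line shown in the comment is illustrative and assumes the Intel oneAPI C compiler, not a command published by ACCMS.

    /* Minimal sketch of an auto-vectorizable, memory-bandwidth-bound loop.
     * With AVX2 (Laurel 2) one 256-bit register holds 4 doubles; with
     * AVX-512 (Laurel 3) one 512-bit register holds 8, so each vector
     * instruction can process twice as many elements.
     *
     * Illustrative compile line (assumes the Intel oneAPI C compiler):
     *   icx -O3 -xCORE-AVX512 -qopt-zmm-usage=high triad.c -c
     */
    #include <stddef.h>

    void triad(double *restrict a, const double *restrict b,
               const double *restrict c, double scalar, size_t n) {
        /* STREAM-style triad: a[i] = b[i] + scalar * c[i] */
        for (size_t i = 0; i < n; ++i)
            a[i] = b[i] + scalar * c[i];
    }

Where a loop matches a standard kernel, it can also be handed to a tuned routine from Intel MKL instead of being hand-vectorized, which is one way the library helps reach the expected speedup.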
A complete listing of their systems can be found on the ACCMS supercomputer website.
Results
The first of Kyoto U’s new supercomputers will achieve production status in mid-2023. Fukazawa and his team will begin acceptance tests and open it to early users to run their codes. According to Professor Fukazawa, benchmarked vectorized codes on the new Laurel 3 system are already achieving 3.8X better node performance than Laurel 2, on average (Figure 1).1
“My magnetohydrodynamic (MHD) simulation code can achieve about five times better node performance than the old system,” he concluded. “I was also able to use the test model with the Intel Xeon Max Series CPUs with HBM2e and saw three times better node performance than the new Laurel 3. Thus, I really expect high performance with Intel Max Series CPUs.”—ACCMS Associate Professor Keiichiro Fukazawa
Solution Summary
Kyoto U’s ACCMS supports the supercomputing needs of university researchers with HPC resources across many systems. In particular, users have been running codes that need higher memory bandwidth than their existing systems can deliver. ACCMS designed and will deploy three new systems built on Intel Xeon processors. The upcoming system with Intel Xeon CPU Max 9480 processors will address the need for high bandwidth memory, while two systems with Intel Xeon Platinum 8480+ processors will serve general-purpose and large-memory computing needs. The Laurel 3 and Cinnamon 3 systems are active now, and Camphor 3 will become operational in October 2023.
Solution Ingredients
- Supercomputers support Kyoto U Academic Center for Computing and Media Studies (ACCMS)
- 386 nodes with Intel® Xeon® Platinum 8480+ processors (16 nodes configured with large memory)