Executive Summary
The Japan Aerospace Exploration Agency (JAXA) is the core agency that supports Japan’s overall space development and utilization. The JAXA Supercomputing Systems (JSS) deliver the computational resources that enable JAXA to conduct everything from basic research to development and utilization in the field. The former JSS2 High Performance Computing (HPC) systems, named SORA (Supercomputer for earth Observation, Rockets, and Aeronautics), comprise several clusters across multiple facilities.
A new JSS3 system, called TOKI, was recently installed. It hosts a large-memory, general-purpose HPC cluster called TOKI-RURI, which is built on Intel HPC technologies with 2nd Gen Intel® Xeon® Gold 6240 and 6240L processors and Intel® Optane™ persistent memory (Intel Optane PMem).
Challenge
The predecessor JSS2 system comprises several clusters across multiple facilities. These locations include the Chofu, Tsukuba, and Kakuda space centers and the Sagamihara Campus of the JAXA Institute of Space and Astronautical Science. SORA-MA, the main cluster, is located at the Chofu Aerospace Center in Tokyo along with pre-post processing, large memory, and login clusters.
The JSS2 SORA-MA cluster was upgraded in 2016 to a 3.49 PFLOPS machine, but even with the upgrade, JAXA scientists were constantly constrained by limited computing resources. The system could not meet their needs for traditional computational workloads (e.g., computational fluid dynamics) involving massively parallel operations. Nor could it readily support emerging methods, such as artificial intelligence (AI) and machine learning (ML). Limited power capacity prevented further expansion of the existing systems, while newer and more efficient technologies for AI, such as Intel® DL Boost, had become available. Additionally, the older archival cluster was running out of storage capacity. To continue to advance space exploration, discovery, design, and implementation, JAXA needed mainstream support for AI and higher-performance computing for its computational workloads.
The next-generation H3 rocket scheduled to be launched from the Tanegashima Space Center in 2021. (Photo courtesy of JAXA)
Solution
JSS3 TOKI is a multi-cluster system with racks located in both TOkyo and IbaraKI prefectures (thus the name TOKI). Toki is a Japanese expression of “time and space” and “solution.” The word is also the common name of the Japanese Crested Ibis, a bird that has come back from the brink of extinction, thanks to efforts of Japanese wildlife conservationists. For JAXA, TOKI is about new opportunities and discoveries.
Fujitsu designed JSS3 TOKI to meet the requirements of highest performance within the center’s available power resources. The new system was designed to support the following computational domains:
- Numerical simulations to empower Japan’s international competitiveness in aerospace.
- Large-scale data analytics.
- Research and development for discovery of solutions to emerging needs.
TOKI comprises the following clusters at the Chofu Aerospace Center:
- TOKI-SORA—a large HPC system specifically built for supporting SORA activities, such as computational fluid dynamics (CFD).
- 1.24 PFLOPS TOKI-RURI (all-RoUnd Role Infrastructure)—a general-purpose supercomputer built on Fujitsu PRIMERGY RX2540 M5 nodes with 2nd Gen Intel Xeon Gold 6240 and 6240L processors. TOKI-RURI hosts general-purpose nodes (TOKI-RURI GP), large memory nodes with 1.5 TB of Intel Optane PMem plus 192 GB of DRAM each (TOKI-RURI LM), and extreme memory nodes with 6 TB of Intel Optane PMem plus 768 GB of DRAM each (TOKI-RURI XM). Total memory capacity is 104 TB.
- TOKI-FS (File System)—also built on PRIMERGY RX2540 M5 nodes and 2nd Gen Intel Xeon Scalable processors with 10 PB of all-flash and 40 PB of hard drive storage.
- TOKI-LI (Login system)—14 PRIMERGY RX2540 M5 nodes with 2nd Gen Intel Xeon Scalable processors.
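The per-node memory figures quoted for the TOKI-RURI LM and XM nodes can be compared with a little arithmetic. The following is a minimal sketch, assuming 1 TB = 1024 GB; the names `NODE_TYPES` and `node_memory_summary` are illustrative only. Whether PMem and DRAM add up to usable capacity depends on how the persistent memory is configured, which the text does not state.

```python
# Per-node figures quoted for TOKI-RURI. 1 TB = 1024 GB is an assumption.
GB_PER_TB = 1024

NODE_TYPES = {
    "LM": {"pmem_gb": 1.5 * GB_PER_TB, "dram_gb": 192},
    "XM": {"pmem_gb": 6 * GB_PER_TB, "dram_gb": 768},
}


def node_memory_summary(pmem_gb, dram_gb):
    """Return the combined capacity in GB and the PMem-to-DRAM ratio for one node."""
    return {"total_gb": pmem_gb + dram_gb, "pmem_to_dram": pmem_gb / dram_gb}


for name, spec in NODE_TYPES.items():
    summary = node_memory_summary(**spec)
    print(f"{name}: {summary['total_gb']:.0f} GB per node, "
          f"PMem:DRAM = {summary['pmem_to_dram']:.0f}:1")
```

Under that assumption, both node types pair each gigabyte of DRAM with eight gigabytes of persistent memory.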
Figure 1. TOKI-RURI system at Chofu Aerospace Center
Results
TOKI-RURI’s large memory nodes with Intel Optane PMem will provide improved performance and capacity to support the commercial ISV applications and highly portable workloads that JAXA depends on. These applications include Ansys ICEM CFD, Fluent, and Chemkin, plus FieldView, CRUNCH CFD, Siemens STAR-CCM+, Metacomp Technologies CFD++, Dassault Systemes ABAQUS CAE, and Mechanica.
Martian orbital maneuver of the latest version of the MMX spacecraft in 2019. (Photo courtesy of JAXA)
For other HPC workloads, JAXA computational scientists use Intel HPC software tools to help optimize performance on the new supercomputer. With the new capabilities of the 2nd Gen Intel Xeon Scalable processors, they can also take advantage of new software development approaches, including oneAPI, an open, unified programming model built on standards to simplify development and deployment of data-centric workloads across CPUs, GPUs, FPGAs, and other accelerators.
“JAXA scientists are efficiently developing wider area applications using Intel Advanced Vector Extensions 512 (Intel AVX-512) and Intel DL Boost with Intel oneAPI Base toolkit and the Intel oneAPI HPC toolkit,” explained Naoyuki Fujita, manager of Supercomputer Division of JAXA.
The Intel oneAPI Base Toolkit is a core set of tools and libraries for developing high-performance, data-centric applications across diverse architectures. It features an industry-leading C++ compiler and the Data Parallel C++ (DPC++) language, an evolution of C++ for heterogeneous computing. The Intel oneAPI HPC Toolkit is an add-on to the Base Toolkit. It also includes access to the Intel Distribution for Python, the Intel oneAPI DPC++/C++ Compiler, powerful data-centric libraries, and advanced analysis tools.
While TOKI is readied for production, benchmarking indicates the system’s performance meets JAXA users’ needs based on five in-house workloads and traditional benchmarks: HINOCA (combustion simulation), FaSTAR (a high-efficiency CFD tool), UPACS (fluid analysis software), P-FLOW (moving particle simulation), and LS-FLOW (CFD code).
Astronaut Soichi Noguchi starts cultivation of Asian Herb in Space Plant experiment project. (Photo courtesy of JAXA)
Intel Optane Persistent Memory Speeds Non-Parallelized Workloads
Many JAXA applications are well-parallelized, distributed-computing workloads targeted at the large HPC cluster. Other programs are not yet parallelized or are heavily serial applications that cannot be parallelized. The large memory (LM) and extreme memory (XM) nodes of TOKI-RURI will speed up these applications. These nodes offer high performance at low cost for JAXA’s serial programs and large-capacity memory demands.
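Amdahl's law gives one way to see why heavily serial programs gain little from a larger cluster and are better served by fast individual nodes. The following is a minimal sketch; the function name and the sample parallel fractions are illustrative and are not measurements of any JAXA application.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup under Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical parallel fractions, evaluated on 1,000 workers.
for p in (0.99, 0.50, 0.0):
    print(f"parallel fraction {p:.2f}: speedup {amdahl_speedup(p, 1000):.2f}x")
```

A program that is half serial can never run more than twice as fast, however many nodes are added, so its run time depends mainly on single-node speed and memory capacity.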
“With the new TOKI systems, JAXA brings innovation to research on Earth observation data processing, remote sensing, and climate change prediction,” stated Fujita-san. “With the Intel DL Boost and Intel Optane persistent memory, JAXA will be able to contribute to accelerated research in these fields.”
Solution Summary
- Fujitsu PRIMERGY RX2540 and CX2750 M5 server nodes in multiple clusters: TOKI-RURI, TOKI-TRURI, and TOKI-LI.
- 2nd Gen Intel Xeon Gold 6240 and 6240L processors.
- Intel Optane PMem (6 TB/node in XM nodes; 1.5 TB in LM nodes).
- 1.24 petaFLOPS peak performance.
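A peak performance figure of this kind is conventionally the product of socket count, cores per socket, clock rate, and floating-point operations per cycle. The following is a back-of-the-envelope sketch for a single node. The core count, base clock, and FLOPs per cycle are assumptions about a two-socket Xeon Gold 6240 node and are not stated in the text; sustained clocks under AVX-512 load are typically lower than the base clock, so this is an upper-bound estimate rather than the method used to derive the quoted number.

```python
def peak_flops(sockets, cores_per_socket, clock_hz, flops_per_cycle):
    """Theoretical peak FLOPS for one node: sockets x cores x clock x FLOPs per cycle."""
    return sockets * cores_per_socket * clock_hz * flops_per_cycle


# Assumed values for a two-socket Xeon Gold 6240 node (not stated in the text):
# 18 cores per socket, 2.6 GHz base clock, and 32 double-precision FLOPs per
# cycle per core (two AVX-512 FMA units x 8 doubles x 2 operations).
NODE_PEAK = peak_flops(sockets=2, cores_per_socket=18, clock_hz=2.6e9, flops_per_cycle=32)

QUOTED_SYSTEM_PEAK = 1.24e15  # 1.24 petaFLOPS, the figure quoted for TOKI-RURI

print(f"Per-node peak: {NODE_PEAK / 1e12:.2f} TFLOPS")
print(f"Share of quoted system peak from one such node: {NODE_PEAK / QUOTED_SYSTEM_PEAK:.3%}")
```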