OpenACC* (for open accelerators) is a parallel programming standard that enables offloading workloads to GPUs from NVIDIA*, AMD* and accelerators from a few other vendors. It is extensively used for Fortran* applications but also supports C and C++ programming languages. It is a light-weight framework focussing on GPU and accelerator offload parallelism only.
A broader, more flexible, and feature-rich framework for adding parallelism to C/C++ and Fortran workloads is OpenMP* introduced in 1997. As it originated as an open multi-architecture standard for efficient parallel programming across CPUs, it is not limited to GPU offload only. Thus, a major advantage for developers provided by OpenMP but not by OpenACC is that they can harness the potential of the latest Intel® hardware including CPUs, GPUs and other accelerators. Since its version 4.0 was released in 2013, OpenMP has progressively added more and more GPU and other accelerator offload capabilities. It continues to evolve quickly.
→ Check out this article on the latest OpenMP 5.2 and 6.0 specification device offload support with the Intel® oneAPI DPC++/C++ Compiler and Intel® Fortran Compiler: Advanced OpenMP* Device Offload with Intel® Compilers
OpenMP may be a slightly heavier-weight paradigm than OpenACC, but this is considerably outweighed by its open, standards-based, feature-rich, and highly flexible code parallelization framework. It is widely adopted across the high performance computing developer ecosystem, letting you take advantage of Intel’s latest built-in accelerators and architectural advances as well as its range of GPUs.
In this article, we will discuss three code samples that illustrate how to migrate your C/C++ code based on OpenACC directives to OpenMP using the Intel® oneAPI Base Toolkit and the Intel® Application Migration Tool for OpenACC to OpenMP API, so that you can take full advantage of the parallel programming capabilities of OpenMP.
Intel® Application Migration Tool for OpenACC* to OpenMP*: An Overview
The Intel Application Migration Tool automatically migrates OpenACC constructs in C/C++ and Fortran code to appropriate OpenMP constructs. It is a Python* 3 based tool that uses offloading mechanisms for code migration. The tool is part of an actively evolving project, reflecting the ever-changing differences between the latest OpenACC and OpenMP specifications. It strives to be independent of any specific compiler’s infrastructure. Hence, it may not achieve the best performance on a given hardware platform, but it aims for a semantically correct, equivalent translation of OpenACC constructs to OpenMP. You may want to analyze the ported output code using performance analysis and debugging tools such as Intel® VTune Profiler, and then manually tune the output to optimize application performance and efficiency for the exact hardware platform you are targeting.
This is however a small price to pay for having a universally applicable utility like the Intel Application Migration Tool for OpenACC to OpenMP.
→ For more information on the functionality and how-to instructions, refer to the GitHub repo of the Intel Application Migration Tool project.
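To make the idea of a construct-level migration concrete, here is a minimal hand-written sketch in C. It is not output of the migration tool: the function name saxpy is hypothetical, and the OpenMP directive is one commonly used counterpart of the OpenACC directive shown in the comment, chosen here as an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y = a * x + y element by element.
   Hypothetical example: the OpenACC directive is kept as a comment and the
   OpenMP directive below it is a hand-picked counterpart, not tool output. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* OpenACC original:
       #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n]) */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The data clauses carry over almost one-to-one: copyin becomes map(to: ...) and copy becomes map(tofrom: ...). With the Intel oneAPI DPC++/C++ Compiler, offload is typically enabled with the -fiopenmp -fopenmp-targets=spir64 options; a compiler that ignores the directive simply runs the loop serially on the host with the same result.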
About The Code Samples
Each of the three code samples we will discuss in this article illustrates how to utilize the Intel Application Migration Tool for porting OpenACC constructs in C/C++ code to OpenMP. For a given input file in each of the samples, the migration tool generates two output files:
- a .translated file with the OpenACC directives migrated to OpenMP ones.
- a .report file that explains
  - which OpenACC clauses were translated to which OpenMP clauses,
  - which OpenACC constructs could not be migrated, and
  - the location of each of the migrated/unmigrated constructs in the code.
For the unmigrated constructs, it also suggests similar OpenMP constructs that you can manually implement.
The OpenMP runtime library is distributed as a part of the Intel® Fortran Compiler and Intel® oneAPI DPC++/C++ Compiler. Since the three code samples discussed here are written in C, they use the Intel oneAPI DPC++/C++ Compiler available in the Intel® oneAPI Base Toolkit. Intel® Core™ Processors with integrated Intel® Graphics and Intel® Data Center GPU Max Series are the hardware used for GPU offload and parallel execution of the code samples. However, you can also migrate these and other workloads to other Intel® architectures of your choice.
Let us dig into the three code samples one-by-one.
Atomic Sample
Atomic operations supported by OpenMP enable multiple threads to safely update a shared numerical variable. Each atomic directive applies only to the single statement that follows it and guarantees that the read, write, or update of the variable happens as one indivisible operation, so concurrent accesses by multiple threads cannot leave the variable with an indeterminate value. Atomic operations thus help in achieving fine-grained synchronization.
The Atomic Sample demonstrates how to migrate the parallel loop, read, write, update and capture atomic clauses from OpenACC to OpenMP for offloading the source code to Intel GPUs. You will learn how to install the Intel Application Migration Tool and invoke it for migrating the OpenACC directives to OpenMP.
For example, the tool translates
#pragma acc atomic read
to
#pragma omp atomic read
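The update and capture forms follow the same pattern. The sketch below is a hand-written illustration, not the sample's code or the tool's output: the function names count_even and collect_even are hypothetical, and the OpenACC lines in the comments are assumed originals.

```c
#include <assert.h>

/* Counts the even entries of data[] with an atomic update on a shared counter. */
int count_even(const int *data, int n)
{
    int count = 0;
    assert(n >= 0);

    /* OpenACC original (assumed):
       #pragma acc parallel loop copyin(data[0:n]) copy(count) */
    #pragma omp target teams distribute parallel for \
        map(to: data[0:n]) map(tofrom: count)
    for (int i = 0; i < n; ++i) {
        if (data[i] % 2 == 0) {
            /* OpenACC original (assumed): #pragma acc atomic update */
            #pragma omp atomic update
            count += 1;
        }
    }
    return count;
}

/* Copies the even entries of data[] into out[] (which must hold n ints).
   Atomic capture hands each thread a unique slot; the order of the copied
   values is unspecified when the loop runs in parallel. */
int collect_even(const int *data, int n, int *out)
{
    int next = 0;
    assert(n >= 0);

    #pragma omp target teams distribute parallel for \
        map(to: data[0:n]) map(tofrom: out[0:n], next)
    for (int i = 0; i < n; ++i) {
        if (data[i] % 2 == 0) {
            int slot;
            /* OpenACC original (assumed): #pragma acc atomic capture */
            #pragma omp atomic capture
            slot = next++;
            out[slot] = data[i];
        }
    }
    return next;
}
```

Without the atomic directives, two threads could read the same old value of the counter and one increment would be lost; with them, each increment is applied exactly once.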
→ For hardware-software specifications and key implementation details, refer to the code sample on GitHub.
Monte Carlo GPU Sample
The Monte Carlo simulation method is a statistical technique that helps predict possible outcomes of an uncertain event using random sampling in complex mathematical and physical systems. It models a system or problem under consideration and then runs multiple simulations on it to estimate the range of possible outcomes. This method is widely used for risk prediction and mitigation in applications such as cost estimation, investment planning, project management, risk assessment, emergency response planning, and portfolio management.
The Monte Carlo GPU Sample uses the Monte Carlo method for estimating the price of a European call option and the confidence interval around the predicted value. A confidence interval is the range of values that, at a chosen confidence level, is expected to contain the true value being estimated, based on multiple simulation runs. The code sample demonstrates how to parallelize the simulations by offloading the code to GPUs using OpenMP directives.
Using the Intel Application Migration Tool, most of the OpenACC directives in the sample are migrated to OpenMP. However, some calls remain unmigrated and hence require manual adjustment to select the best translation for the use case. The .report output file shows the untranslated OpenACC API calls and the corresponding closest OpenMP calls.
→ Detailed information on the code sample is available on GitHub.
Bilateral Filter Sample
Bilateral filtering is a non-linear, noise-reducing, edge-preserving technique used for smoothing images. It eliminates most texture, noise, and fine details yet preserves large sharp edges without blurring the image. It does so by replacing the intensity of each pixel with a weighted average of intensity values from nearby pixels.
The result of bilateral filtering depends on the following three parameters:
- Gaussian delta, i.e., a function that gives more weight to pixels that are spatially close and similar in intensity – A larger Gaussian delta blurs the fine texture.
- Euclidean delta i.e. Euclidean distance between the spatial locations of two pixels – Larger Euclidean delta filters away the fine texture yet keeps the contours as crisp as in the original image.
- Iterations – Having multiple iterations flattens the colors significantly without blurring the edges.
The Bilateral Filter sample demonstrates how to migrate to OpenMP the OpenACC-based source code that implements a bilateral filter on Intel architectures. For example, the following directive with an OpenACC loop construct
#pragma acc loop independent, gang
is translated to
#pragma omp loop order(concurrent)
As there is no direct translation of the OpenACC kernels construct to OpenMP, the migration tool converts it to an OpenMP target construct. All such details are included in the .report output file generated by the migration tool.
→ Check out the code sample on GitHub for stepwise set up and implementation details.
What’s Next?
Dive deeper into the OpenACC to OpenMP code samples and learn how to migrate your OpenACC code to OpenMP with minimal code changes using the Intel Application Migration Tool. Get started with the Intel oneAPI DPC++/C++ Compiler and the Intel Fortran Compiler – efficiently compile your code and achieve high-performance parallelism across Intel CPUs and GPUs.
We also encourage you to check out other HPC and AI tools included in Intel’s oneAPI-powered software portfolio for multiarchitecture, cross-vendor, accelerated parallel computing.
Get The Software
Install the Intel oneAPI Base Toolkit, which includes the Intel oneAPI DPC++/C++ Compiler and hence the OpenMP library. Fortran developers can use the OpenMP library that comes with the Intel Fortran Compiler, available as a part of the Intel® oneAPI HPC Toolkit.
You can also download standalone versions of the compilers: