DPC++ Part 2: Programming Best Practices
Overview
In Part 2 of the Data Parallel C++ (DPC++) overview series, software engineer Anoop Madhusoodhanan Prabha discusses best practices for using this language to program oneAPI applications.
DPC++ is part of the oneAPI initiative and provides an open, cross-industry alternative to single-architecture, proprietary languages. Based on familiar C++ and incorporating SYCL* from The Khronos Group*, DPC++ lets developers more easily port code across various architectures from an existing application’s code base.
With that capability come unique considerations, such as how data should be made available on the device side, and the need for synchronization points between compute kernels running across a host and devices to ensure correct results and deterministic behavior.
Take the next step in learning DPC++ by joining Anoop as he explains:
- How to efficiently use buffers, sub-buffers, and unified shared memory
- Implicit synchronization points in DPC++, in depth
- Atomics, mutexes, work-group barriers, and work-group memory fences
Other Resources
- Download the Presentation
- Additional DPC++ Best Practices PDF
- Download the first four chapters of a new DPC++ book, written by an expert author team
- Learn more about the oneAPI Initiative
- Explore this initiative led by Intel, including free software toolkits such as the essential Intel® oneAPI Base Toolkit, which includes the DPC++ compiler and libraries
- Sign up for an Intel® Developer Cloud account—a free development sandbox with access to the latest Intel® hardware and oneAPI software. No downloads. No configuration steps. No installations.
Anoop Madhusoodhanan Prabha
Software engineer, Intel Corporation
With over 10 years of experience as a software engineer, Anoop’s work has included application development, system analysis and design, database administration, data migration, automation, function point analysis, and critical projects in the telecommunications domain. Since joining Intel in 2009, he has worked on optimizing various customer applications by enabling multithreading, vectorization, and other microarchitectural tuning. He has experience working with OpenMP*, Threading Building Blocks (TBB), CUDA*, and more. Today, Anoop focuses on floating-point reproducibility across various Intel® architectures, containerized solutions for Intel® compiler-based workloads, and adoption of continuous integration and continuous deployment.
Anoop holds a master of science in electrical engineering from State University of New York College at Buffalo with an emphasis in high-performance computing, and a bachelor of technology in electronics and communication engineering from Malaviya National Institute of Technology in Jaipur.