Overview
The Concurrent Kernels sample demonstrates the use of SYCL queues for concurrent execution of several kernels on GPU devices. The original CUDA source code is migrated to SYCL for portability across GPUs from multiple vendors, and the sample further demonstrates how to optimize the code to improve processing time.
| Area                | Description                 |
|:--------------------|:----------------------------|
| What you will learn | How to migrate CUDA to SYCL |
| Time to complete    | 15 minutes                  |
| Category            | Concepts and Functionality  |
Key Implementation Details
This sample demonstrates the migration of the following prominent CUDA features:
- Stream and Event Management
- Reduction
ConcurrentKernels involves a kernel that does no real work but runs for at least a specified number of clock cycles.
The sample demonstrates the use of multiple streams to enable simultaneous execution of kernels, where each stream provides an independent context in which a kernel executes. Assigning kernels to different streams allows them to run concurrently, making effective use of GPU resources. To achieve the desired execution times, the sample measures the clock frequency of the device and calculates the number of clock cycles each kernel must spin. Finally, the kernels are queued, and a reduction operation is performed using the last stream.
Original CUDA source files: ConcurrentKernels.
Migrated SYCL source files, including step-by-step instructions: guided_ConcurrentKernel_SYCLmigration.
References
- Data Parallel C++, by James Reinders et al.
- oneAPI GPU Optimization Guide
- CUDA Toolkit documentation
- Install oneAPI for NVIDIA GPUs