Overview
Separable convolution is a technique in which a single convolution is split into two or more convolutions that together produce the same output. This sample migrates the original CUDA* source code to SYCL for portability across GPUs from multiple vendors.
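To make the idea concrete, the following is a minimal CPU-only sketch, not code from the sample. It assumes the 2D kernel is the outer product of a vertical vector `kv` and a horizontal vector `kh`, and that pixels outside the image count as zero. All function names (`convolveFull`, `convolveRows`, `convolveColumns`, `maxAbsDiff`) are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper names for illustration; they are not part of the sample.
using Image = std::vector<std::vector<double>>;

// Read a pixel, treating everything outside the image as zero
// (an assumed border policy for this sketch).
double pixelOrZero(const Image& img, int y, int x) {
    const int h = static_cast<int>(img.size());
    const int w = static_cast<int>(img[0].size());
    return (y < 0 || y >= h || x < 0 || x >= w) ? 0.0 : img[y][x];
}

// Direct 2D convolution with the separable kernel K[i][j] = kv[i] * kh[j].
Image convolveFull(const Image& src, const std::vector<double>& kv,
                   const std::vector<double>& kh) {
    assert(kh.size() % 2 == 1 && kv.size() == kh.size());
    const int r = static_cast<int>(kh.size()) / 2;
    const int h = static_cast<int>(src.size());
    const int w = static_cast<int>(src[0].size());
    Image dst(h, std::vector<double>(w, 0.0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx)
                    dst[y][x] += pixelOrZero(src, y + dy, x + dx) * kv[r - dy] * kh[r - dx];
    return dst;
}

// 1D convolution along each row with the horizontal vector kh.
Image convolveRows(const Image& src, const std::vector<double>& kh) {
    assert(kh.size() % 2 == 1);
    const int r = static_cast<int>(kh.size()) / 2;
    const int h = static_cast<int>(src.size());
    const int w = static_cast<int>(src[0].size());
    Image dst(h, std::vector<double>(w, 0.0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int dx = -r; dx <= r; ++dx)
                dst[y][x] += pixelOrZero(src, y, x + dx) * kh[r - dx];
    return dst;
}

// 1D convolution along each column with the vertical vector kv.
Image convolveColumns(const Image& src, const std::vector<double>& kv) {
    assert(kv.size() % 2 == 1);
    const int r = static_cast<int>(kv.size()) / 2;
    const int h = static_cast<int>(src.size());
    const int w = static_cast<int>(src[0].size());
    Image dst(h, std::vector<double>(w, 0.0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int dy = -r; dy <= r; ++dy)
                dst[y][x] += pixelOrZero(src, y + dy, x) * kv[r - dy];
    return dst;
}

// Largest absolute per-pixel difference between two images of equal size.
double maxAbsDiff(const Image& a, const Image& b) {
    double m = 0.0;
    for (std::size_t y = 0; y < a.size(); ++y)
        for (std::size_t x = 0; x < a[y].size(); ++x)
            m = std::fmax(m, std::fabs(a[y][x] - b[y][x]));
    return m;
}
```

Calling `convolveColumns(convolveRows(img, kh), kv)` should give the same image as `convolveFull(img, kv, kh)` up to floating-point rounding, while doing roughly 2(2r+1) multiplications per pixel instead of (2r+1)^2 for a kernel of radius r.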
Area | Description |
What you will learn | Migrate convolution separable sample from CUDA to SYCL |
Time to complete | 15 minutes |
Category | Code Optimization |
Key Implementation Details
This sample implements a separable convolution filter of a 2D image with an arbitrary kernel. Two functions in the code, convolutionRowsGPU and convolutionColumnsGPU, call the kernel functions convolutionRowsKernel and convolutionColumnsKernel, which load the input data and perform the computations. The results are validated against a reference CPU separable convolution implementation by calculating the relative L2 norm.
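The validation step can be sketched as follows. This is an illustrative sketch rather than the sample's code: it assumes the relative L2 norm is defined as the L2 norm of the difference between the result and the reference, divided by the L2 norm of the reference. The names `relativeL2Norm` and `matchesReference`, and the `tolerance` parameter, are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: relative L2 norm of (result - reference) with respect
// to reference, i.e. ||result - reference||_2 / ||reference||_2.
double relativeL2Norm(const std::vector<float>& result,
                      const std::vector<float>& reference) {
    assert(result.size() == reference.size());
    double delta = 0.0;  // sum of squared differences
    double sum = 0.0;    // sum of squared reference values
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const double ref = static_cast<double>(reference[i]);
        const double diff = static_cast<double>(result[i]) - ref;
        delta += diff * diff;
        sum += ref * ref;
    }
    if (sum == 0.0) {
        // An all-zero reference has no scale: report 0 for an exact match,
        // infinity otherwise (a choice made for this sketch).
        return delta == 0.0 ? 0.0 : std::numeric_limits<double>::infinity();
    }
    return std::sqrt(delta / sum);
}

// Hypothetical pass/fail check; the tolerance is chosen by the caller.
bool matchesReference(const std::vector<float>& result,
                      const std::vector<float>& reference,
                      double tolerance) {
    return relativeL2Norm(result, reference) < tolerance;
}
```

A relative measure is convenient here because it does not depend on the magnitude of the image data, so the same tolerance can be reused across inputs.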
For more information on the migrated convolutionSeparable SYCL sample and details on building it for CPU and GPU, refer here.
Original CUDA source files: convolutionSeparable.
Migrated SYCL source files including step by step instructions: guided_convolutionSeparable_SYCLmigration.
References
- Data Parallel C++, by James Reinders et al.
- oneAPI GPU Optimization Guide
- CUDA Toolkit documentation
- Install oneAPI for NVIDIA GPUs