Bindless Images for SYCL
The SYCL standard has advanced significantly in the transition from SYCL 1.2.1 to SYCL 2020. The current iteration refines image handling to better accommodate developers' diverse and complex needs. However, the rising popularity of computer vision and the increasing demand for accelerated image processing require graphics and compute APIs to provide additional flexibility for streamlined software development.
The developer community identified areas where SYCL 2020's image management capabilities could be enhanced further, particularly support for dynamic image arrays and for accessing images through handles instead of accessors. Complex image processing solutions have moved beyond traditional texture binding models and favor a more efficient bindless texture handle approach.
To properly address this need, a new bindless image extension has been proposed to cover these use cases. The proposed extension is not intended to replace the existing SYCL 2020 image functionality. Rather, it serves as the template for a foundational component upon which SYCL 2020 images can be further developed and enhanced.
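To make the idea of handle-based access concrete, here is a minimal CPU-only sketch in plain C++. It does not use the SYCL extension API; `Image`, `ImageHandle`, `ImageRegistry`, and `sum_first_texels` are hypothetical names chosen for illustration. What it models is the property described above: the code that consumes images receives a runtime-sized array of opaque handles as ordinary data, rather than a fixed set of bound image arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical CPU-side model; none of these names come from SYCL or the extension.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<float> texels;  // row-major, width * height values
};

// An opaque handle is just an integer that can be stored, copied, or placed in arrays.
using ImageHandle = std::uint64_t;

class ImageRegistry {
public:
    // Registers an image and returns a handle that identifies it.
    ImageHandle create(Image image) {
        assert(image.texels.size() == image.width * image.height);
        images_.push_back(std::move(image));
        return static_cast<ImageHandle>(images_.size() - 1);
    }

    // Fetches one texel through a handle (no filtering, no coordinate wrapping).
    float fetch(ImageHandle handle, std::size_t x, std::size_t y) const {
        assert(handle < images_.size());
        const Image& img = images_[handle];
        assert(x < img.width && y < img.height);
        return img.texels[y * img.width + x];
    }

private:
    std::vector<Image> images_;
};

// Stand-in for a kernel: the number of images is known only at run time,
// because the handles arrive as data rather than as bound arguments.
float sum_first_texels(const ImageRegistry& registry,
                       const std::vector<ImageHandle>& handles) {
    float sum = 0.0f;
    for (ImageHandle h : handles) {
        sum += registry.fetch(h, 0, 0);
    }
    return sum;
}
```

In this model, adding another image only means appending one more handle to the vector; the signature of `sum_first_texels` does not change.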
Support for bindless textures has been added to the Intel® oneAPI DPC++/C++ Compiler and the Intel® DPC++ Compatibility Tool.
This brief overview covers using the Intel® DPC++ Compatibility Tool to migrate CUDA* code that uses a similar construct to SYCL*, and the use of bindless textures in Blender*.
Usage Examples
This document walks through the CUDA* to SYCL migration of the simpleTexture sample from NVIDIA’s cuda-samples on GitHub*. Unlike other graphics-related samples, simpleTexture does not depend on X11 or GL, which makes it easy to set up.
Note: For general information about CUDA to SYCL migration, refer to the Workflow for a CUDA* to SYCL* Migration.
Prerequisites:
- OS: Ubuntu* 22.04
- Hardware: Intel® Arc™ Graphics
- Software:
  - Intel® oneAPI Base Toolkit 2025.1. Please refer to the Intel® oneAPI Toolkits Installation Guide for Linux* OS for detailed installation instructions.
  - The latest Intel Graphics Driver. For installation guidance, refer to Installation for Client GPU.
Migrate the simpleTexture sample
The Intel® DPC++ Compatibility Tool automatically migrates 100% of the CUDA runtime APIs to SYCL for this sample. Follow these steps to generate the SYCL code using the compatibility tool:
- Clone the original sample code
$ git clone https://github.com/NVIDIA/cuda-samples.git
- Navigate to the sample folder
$ cd cuda-samples/Samples/0_Introduction/simpleTexture
- Generate a compilation database with intercept-build
$ intercept-build make
This step creates a JSON file named compile_commands.json that records every compiler invocation, including the input file names and the compiler options.
- Pass the JSON file as input to the Intel® DPC++ Compatibility Tool. The result is written to a folder named simpleTexture_bindless. The option --in-root specifies the path to the root of the source tree to be migrated. The option --use-experimental-features is required to use the bindless images extension in the migrated code.
$ c2s -p compile_commands.json --in-root ../../.. --gen-helper-function --use-experimental-features=bindless_images --out-root=simpleTexture_bindless
Build the simpleTexture Sample for Intel® Arc™ Graphics on Linux*
- Change to simpleTexture_bindless
- Set environment variables:
$ export NEOReadDebugKeys=1
$ export UseBindlessMode=1
$ export UseExternalAllocatorForSshAndDsh=1
- Compile the code:
$ icpx -fsycl -Iinclude -Iinclude/dpct -ICommon \
    Samples/0_Introduction/simpleTexture/simpleTexture.dp.cpp \
    -o simpleTexture_2025.1
- Copy input data to simpleTexture_bindless:
$ cp -r cuda-samples/Common/data simpleTexture_bindless
$ ls simpleTexture_bindless
Common data include MainSourceFiles.yaml Makefile.dpct Samples simpleTexture
- Run the code:
$ ONEAPI_DEVICE_SELECTOR=level_zero:gpu SYCL_UR_TRACE=1 ./simpleTexture
simpleTexture starting...
<LOADER>[INFO]: loaded adapter 0x0x10fd1d0 (libur_adapter_level_zero.so.0) from /opt/intel/oneapi/compiler/2025.1/lib/libur_adapter_level_zero.so.0
SYCL_UR_TRACE: Requested device_type: info::device_type::automatic
SYCL_UR_TRACE: Requested device_type: info::device_type::automatic
SYCL_UR_TRACE: Requested device_type: info::device_type::automatic
SYCL_UR_TRACE: Selected device: -> final score = 1550
SYCL_UR_TRACE: platform: Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL_UR_TRACE: device: Intel(R) Arc(TM) A770 Graphics
MapSMtoCores for SM 12.55 is undefined. Default to use 128 Cores/SM
MapSMtoCores for SM 12.55 is undefined. Default to use 128 Cores/SM
MapSMtoCores for SM 12.2 is undefined. Default to use 128 Cores/SM
MapSMtoArchName for SM 12.55 is undefined. Default to use Hopper
GPU Device 0: "Hopper" with compute capability 12.55
Loaded 'teapot512.pgm', 512 x 512 pixels
Processing time: 0.305000 (ms)
859.49 Mpixels/sec
Wrote './data/teapot512_out.pgm'
Comparing files
    output: <./data/teapot512_out.pgm>
    reference: <./data/ref_rotated.pgm>
simpleTexture completed, returned OK
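The throughput figure in the log follows directly from the image size and the reported processing time. The short sketch below reproduces that arithmetic; the function name `mpixels_per_second` is illustrative, and the sample's own timing code may be structured differently.

```cpp
#include <cassert>
#include <cstddef>

// Converts an image size and a processing time in milliseconds
// into a throughput in megapixels per second.
double mpixels_per_second(std::size_t width, std::size_t height, double time_ms) {
    assert(time_ms > 0.0);
    const double pixels = static_cast<double>(width) * static_cast<double>(height);
    const double seconds = time_ms / 1000.0;
    return pixels / seconds / 1.0e6;
}
```

For the run shown here, `mpixels_per_second(512, 512, 0.305)` gives 262144 pixels / 0.000305 s, or roughly 859.49 Mpixels/sec, which agrees with the value in the log.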
Blender: Notice the Impact
Blender is an open-source 3D creation toolset maintained by Blender Foundation. It supports 3D modeling, animation, simulation, rendering, motion tracking, video editing, and game creation. Intel® Arc™ GPU support for Cycles using oneAPI was introduced in 2022 with Blender’s 3.3 LTS release.
Historically, Blender's implementation of Cycles, a path-tracing render engine, has leveraged CUDA bindless textures to enhance its rendering capabilities on GPUs like the NVIDIA GeForce* GTX TITAN Z (based on the microarchitecture code-named "Kepler") and beyond.
Blender stands to gain significantly from the adoption of bindless images, which streamline rendering kernels that require access to a dynamic number of textures, something that is not feasible with native image accessors. The current implementation passes textures as 1D buffers and requires an additional 400 lines of code for image access. This adds maintenance complexity and also increases pressure on the data and instruction caches.
Bindless images, integrated in one of the latest updates, address these issues by enabling hardware-based texture operations. Blender developers evaluated various Intel® Arc™ Graphics based GPUs. They observed that the new implementation can lead to rendering speed improvements ranging from 1% to 11%, depending on the scene, when using the Intel® Arc™ A770 Graphics and Intel® Arc™ B580 Graphics. Although they have identified some minor performance regressions, specifically with NanoVDB texture operations on the Intel® Arc™ B580 Graphics and in the shade surface MNEE and Raytrace kernels on the Intel® Arc™ A770 Graphics, they consider these issues to be manageable and expect them to be resolved in future updates.
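As a rough illustration of what image access through 1D buffers involves, the sketch below performs bilinear filtering in software over one flat buffer that packs several textures. It is not Blender's code: `TextureInfo`, `texel`, `sample_bilinear`, the clamp-to-edge addressing, and the texel-center convention are all assumptions chosen for brevity. With hardware-based texture operations, the sampler performs this arithmetic instead of the kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-texture metadata for textures packed into one flat buffer.
struct TextureInfo {
    std::size_t offset;  // index of the texture's first texel in the flat buffer
    int width;
    int height;
};

// Reads one texel with clamp-to-edge addressing (an assumption of this sketch).
inline float texel(const std::vector<float>& data, const TextureInfo& t, int x, int y) {
    x = std::max(0, std::min(x, t.width - 1));
    y = std::max(0, std::min(y, t.height - 1));
    return data[t.offset + static_cast<std::size_t>(y * t.width + x)];
}

// Bilinear sample at normalized coordinates (u, v), with texel centers
// located at (i + 0.5) / size. The texture is selected by index at run time.
float sample_bilinear(const std::vector<float>& data,
                      const std::vector<TextureInfo>& textures,
                      std::size_t id, float u, float v) {
    assert(id < textures.size());
    const TextureInfo& t = textures[id];
    assert(t.width > 0 && t.height > 0);
    const float fx = u * t.width - 0.5f;
    const float fy = v * t.height - 0.5f;
    const int x0 = static_cast<int>(std::floor(fx));
    const int y0 = static_cast<int>(std::floor(fy));
    const float wx = fx - x0;  // horizontal blend weight in [0, 1)
    const float wy = fy - y0;  // vertical blend weight in [0, 1)
    const float top = (1.0f - wx) * texel(data, t, x0, y0)
                    + wx * texel(data, t, x0 + 1, y0);
    const float bottom = (1.0f - wx) * texel(data, t, x0, y0 + 1)
                       + wx * texel(data, t, x0 + 1, y0 + 1);
    return (1.0f - wy) * top + wy * bottom;
}
```

Even this reduced version needs per-texture bookkeeping, address clamping, and four memory reads per sample. Supporting more formats, addressing modes, and 3D data grows the code further, which is consistent with the maintenance and cache pressure concerns described above.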
Next Steps
Check out bindless texture and bindless image processing with SYCL. Use the Intel® DPC++ Compatibility Tool to move your legacy CUDA image processing and rendering implementations to a flexible approach without vendor lock-in.
Useful Resources
- SYCL Bindless Images - An Introduction by Codeplay
- SYCL Extensions: sycl_ext_oneapi_bindless_images
- Level Zero Specification for Bindless Image
- Intel® Arc™ GPU support for Cycles using oneAPI
- 12th IWOCL Workshop: Check out the YouTube* video and SYCL Bindless Images slides
- Codeplay blog announcing bindless images support in oneAPI 2024.0.1 Release
- NVIDIA® Texture Cache in SYCL™