Using Standard Library Functions in SYCL Kernels

Developer Guide

oneAPI GPU Optimization Guide

ID 771772

Date 7/13/2023

Some, but not all, standard C++ functions can be called inside SYCL kernels. See Chapter 18 (Libraries) of the book Data Parallel C++ for an overview of supported functions. The simple example here illustrates what happens when an unsupported function is called from a SYCL kernel. The following program attempts to generate a sequence of random numbers using the rand() function:

#include <sycl/sycl.hpp>
#include <cstdlib>  // rand, srand, RAND_MAX
#include <ctime>    // time
#include <iostream>

constexpr int N = 5;

extern SYCL_EXTERNAL int rand(void);

int main(void) {
#if defined CPU
  sycl::queue Q(sycl::cpu_selector_v);
#elif defined GPU
  sycl::queue Q(sycl::gpu_selector_v);
#else
  sycl::queue Q(sycl::default_selector_v);
#endif

  std::cout << "Running on: "
            << Q.get_device().get_info<sycl::info::device::name>() << std::endl;

  // Attempt to use rand() inside a SYCL kernel
  auto test1 = sycl::malloc_shared<float>(N, Q.get_device(), Q.get_context());

  srand((unsigned)time(NULL));
  Q.parallel_for(N, [=](auto idx) {
     test1[idx] = (float)rand() / (float)RAND_MAX;
   }).wait();

  // Show the random number sequence
  for (int i = 0; i < N; i++)
    std::cout << test1[i] << std::endl;

  // Cleanup
  sycl::free(test1, Q.get_context());
}

Depending on whether CPU or GPU is defined at compile time, the program runs the SYCL kernel on a CPU device (cpu_selector_v) or a GPU device (gpu_selector_v). It compiles without errors for both devices and runs correctly on the CPU, but fails when run on the GPU:

$ icpx -fsycl -DCPU -std=c++17 external_rand.cpp -o external_rand
$ ./external_rand
Running on: Intel(R) Xeon(R) E-2176G CPU @ 3.70GHz
0.141417
0.821271
0.898045
0.218854
0.304283

$ icpx -fsycl -DGPU -std=c++17 external_rand.cpp -o external_rand
$ ./external_rand
Running on: Intel(R) Graphics Gen9 [0x3e96]
terminate called after throwing an instance of 'cl::sycl::compile_program_error'
  what():  The program was built for 1 devices
Build program log for 'Intel(R) Graphics Gen9 [0x3e96]':

error: undefined reference to `rand()'

error: backend compiler failed build.
 -11 (CL_BUILD_PROGRAM_FAILURE)
Aborted

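The failure occurs at run time, during Just-In-Time (JIT) compilation of the kernel, because of an undefined reference to rand(). Declaring the function SYCL_EXTERNAL only tells the compiler that the function may be called from device code and is defined elsewhere; it does not supply a definition, and there is no implementation of rand() for the GPU device.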

Fortunately, the oneAPI libraries provide alternatives to many standard C++ functions that can run on the device, including random number generators. The following example shows equivalent functionality using the Intel® oneAPI DPC++ Library (oneDPL) and the Intel® oneAPI Math Kernel Library (oneMKL):

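#include <sycl/sycl.hpp>
#include <cstdint>  // std::uint32_t
#include <ctime>    // time, clock, CLOCKS_PER_SEC
#include <iostream>
#include <oneapi/dpl/random>
#include <oneapi/mkl/rng.hpp>
#include <string>   // std::stoi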

int main(int argc, char **argv) {
  unsigned int N = (argc == 1) ? 20 : std::stoi(argv[1]);
  if (N < 20)
    N = 20;

  // Generate sequences of random numbers in the range [0.0, 1.0) using oneDPL
  // and oneMKL
  sycl::queue Q(sycl::gpu_selector_v);
  std::cout << "Running on: "
            << Q.get_device().get_info<sycl::info::device::name>() << std::endl;

  auto test1 = sycl::malloc_shared<float>(N, Q.get_device(), Q.get_context());
  auto test2 = sycl::malloc_shared<float>(N, Q.get_device(), Q.get_context());

  std::uint32_t seed = (unsigned)time(NULL); // Get RNG seed value

  // oneDPL random number generator on GPU device
  clock_t start_time = clock(); // Start timer

  Q.parallel_for(N, [=](auto idx) {
     oneapi::dpl::minstd_rand rng_engine(seed, idx); // Initialize RNG engine
     oneapi::dpl::uniform_real_distribution<float>
         rng_distribution;                      // Set RNG distribution
     test1[idx] = rng_distribution(rng_engine); // Generate RNG sequence
   }).wait();

  clock_t end_time = clock(); // Stop timer
  std::cout << "oneDPL took " << float(end_time - start_time) / CLOCKS_PER_SEC
            << " seconds to generate " << N
            << " uniformly distributed random numbers." << std::endl;

  // oneMKL random number generator on GPU device
  start_time = clock(); // Start timer

  oneapi::mkl::rng::mcg31m1 engine(Q, seed); // Initialize RNG engine
  oneapi::mkl::rng::uniform<float, oneapi::mkl::rng::uniform_method::standard>
      rng_distribution(0.0, 1.0); // Set RNG distribution
  oneapi::mkl::rng::generate(rng_distribution, engine, N, test2)
      .wait(); // Generate RNG sequence

  end_time = clock(); // Stop timer
  std::cout << "oneMKL took " << float(end_time - start_time) / CLOCKS_PER_SEC
            << " seconds to generate " << N
            << " uniformly distributed random numbers." << std::endl;

  // Show first ten random numbers from each method
  std::cout << std::endl
            << "oneDPL"
            << "\t"
            << "oneMKL" << std::endl;
  for (int i = 0; i < 10; i++)
    std::cout << test1[i] << " " << test2[i] << std::endl;

  // Show last ten random numbers from each method
  std::cout << "..." << std::endl;
  for (size_t i = N - 10; i < N; i++)
    std::cout << test1[i] << " " << test2[i] << std::endl;

  // Cleanup
  sycl::free(test1, Q.get_context());
  sycl::free(test2, Q.get_context());
}

The necessary oneDPL and oneMKL functions are declared in <oneapi/dpl/random> and <oneapi/mkl/rng.hpp>, respectively. The oneDPL and oneMKL examples perform the same sequence of operations: get a random number seed from the clock, initialize a random number engine, select the desired random number distribution, then generate the random numbers. The oneDPL code performs the device offload explicitly using a SYCL kernel, in which each work-item constructs its own engine from the shared seed and its index. In the oneMKL code, the oneapi::mkl::rng functions handle the device offload implicitly.
