Visible to Intel only — GUID: GUID-DB3A2866-A5C1-4A78-BD7A-458C3431ABF1
Migration Workflow Guidelines
Overview
The CUDA* to SYCL* code migration workflow consists of the following high-level stages:
Stage 1: Prepare for Migration. Prepare your project and configure the tool for a successful migration.
Stage 2: Migrate Your Code. Review tool options and migrate your code with the tool.
Stage 3: Review the Migrated Code. Review and manually convert any unmigrated code.
Stage 4: Build the New SYCL Code Base. Build your project with the migrated code.
Stage 5: Validate the New SYCL Application. Validate your new SYCL application to check for correct functionality after migration.
This document describes the steps in each stage with general recommendations and optional steps.
Prerequisites
Install Intel® DPC++ Compatibility Tool. Intel® DPC++ Compatibility Tool is included in the Intel® oneAPI Base Toolkit (Base Kit). If you have not installed the Base Kit, follow the instructions in Install Intel® oneAPI Toolkits and Components.
Intel® DPC++ Compatibility Tool is also available as a stand-alone download. Download the stand-alone tool.
Set up the tool environment. Refer to Get Started with Intel® DPC++ Compatibility Tool for setup instructions.
Stage 1: Prepare for Migration
Before migrating your CUDA code to SYCL, prepare your CUDA source code for the migration process.
Prepare Your CUDA Project
Before migration, it is recommended to prepare your CUDA project to minimize errors during migration:
Make sure your CUDA source code has no syntax errors.
Make sure your CUDA source code is Clang compatible.
Fix Syntax Errors
Syntax errors in your original CUDA source code can cause the migration to fail.
Before you start migration, make sure that your original CUDA source code builds and runs correctly:
Compile your original source code using the compiler defined for your original CUDA project.
Run your compiled application and verify that it functions as expected.
When your code compiles with no build errors and you have verified that your application works as expected, your CUDA project is ready for migration.
Clang Compatibility
Intel® DPC++ Compatibility Tool uses the latest version of the Clang* parser to analyze your CUDA source code during migration. The Clang parser is not always compatible with the NVIDIA* CUDA compiler driver (nvcc), so the tool reports errors about incompatibilities between nvcc and Clang during migration.
In some cases, additional manual edits to the CUDA source may be needed before migration. For example:
The Clang parser may need namespace qualification in certain usage scenarios where nvcc does not require it.
The Clang parser may need additional forward class declarations where nvcc does not require them.
Spaces within the triple angle brackets of a kernel invocation are tolerated by nvcc but not by Clang. For example, cuda_kernel<< <num_blocks, threads_per_block>> >(args…) is accepted by nvcc, but the Clang parser requires the spaces to be removed.
If you run the migration tool on CUDA source code that has unresolved incompatibilities between nvcc and Clang parsers, you will get a mixture of errors in the migration results:
Clang errors, which must be resolved in the CUDA source code
DPCT warnings, which must be resolved in the migrated SYCL code
For detailed information about dialect differences between Clang and nvcc, refer to llvm.org’s Compiling CUDA with clang page.
Run CodePin to Capture Application Signature
CodePin is a feature that helps reduce the effort of debugging inconsistencies in runtime behavior. CodePin generates reports from the CUDA and SYCL programs that, when compared, can help identify the source of divergent runtime behavior.
Enable the CodePin tool during the migration in order to capture the project signature.
This signature will be used later for validation after migration.
Enable CodePin with the --enable-codepin option.
For detailed information about debugging using the CodePin tool, refer to Debug Migrated Code Runtime Behavior.
Configure the Tool
CUDA header files used by your project must be accessible to the tool. If you have not already done so, configure the tool and ensure header files are available.
Refer to Get Started with the Intel® DPC++ Compatibility Tool for installation and setup information.
Record Compilation Commands
Use intercept-build to Generate a Compilation Database to capture the detailed build options for your project. The migration tool uses build information from the database (such as header file paths, include paths, macro definitions, compiler options, and the original compiler) to guide the migration of your CUDA code.
If your development environment prevents you from using intercept-build, use the alternate method described in Generate a Compilation Database with Other Build Systems.
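If you change your CUDA build script after generating the compilation database, do one of the following:
Re-run intercept-build to get an updated compilation database to use in your migration.
Manually update the compilation database to capture the changes from the updated CUDA build script.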
Set Up Revision Control
After migration, it is recommended that you maintain and develop your migrated application in SYCL to avoid vendor lock-in, though you may choose to continue application development in CUDA. If you continue to develop in CUDA, you will need to migrate from CUDA to SYCL again.
Revision control allows comparison between versions of migrated code, which can help you decide what previous manual changes to the SYCL code you want to merge into the newly migrated code.
Make sure to have revision control for your original CUDA source before the first migration. After the first migration, be sure to place the migrated SYCL code, with all subsequent manual SYCL changes, under revision control as well.
Run Analysis Mode
Use Analysis Mode to generate a report before migration. The report indicates how much of your code will be migrated, how much will be partially migrated, and how much manual effort is estimated to complete the migration after you run the tool. Use it to estimate the work your migration requires.
Stage 2: Migrate Your Code
Plan Your Migration
Before executing your migration, review the available tool features and options that can be used to plan your specific migration.
Migration Rules
The tool uses a default set of migration rules for all migrations. If default rules do not give the migration results you need, you can define custom rules for your migration. This is helpful in multiple scenarios, for example:
After migration, you discover multiple instances of similar or identical CUDA source code that were not migrated, and you know how the CUDA source code should be migrated to SYCL. In this case, you can define a custom rule and re-run the migration for better results. This is useful for Incremental Migration or scenarios where you may run multiple migrations over time.
You know before migration that some code patterns in your original CUDA source will not be accurately migrated to SYCL using the built-in rules. In this case, you can define a custom migration rule to handle specific patterns in your CUDA source during migration.
For detailed information about defining custom rules, refer to Migration Rules.
For working examples of custom rules, refer to the optional predefined rules located in the extensions/opt_rules folder on the installation path of the tool.
Incremental Migration
Intel® DPC++ Compatibility Tool provides incremental migration, which automatically merges the results from multiple migrations into a single migrated project.
Incremental migration can be used to
migrate a CUDA* project incrementally, for example 10 files at a time
migrate new CUDA files into an already migrated project
migrate multiple code paths
Incremental migration is enabled by default. Disable incremental migration using the --no-incremental-migration option.
For detailed information and examples of incremental migration, refer to Incremental Migration.
Command-Line Options
Intel® DPC++ Compatibility Tool provides many command-line options that let you direct and control your migration.
Refer to the Alphabetical Option List for a full list of all available command-line options.
Buffer vs USM Code Generation
Intel promotes both the buffer and USM memory models in the SYCL/oneAPI context. Some oneAPI libraries preferentially support buffer versus USM, so consider which model fits your project when you configure your migration. USM is used by default, but buffer may be a better fit for some projects.
The buffer model sets up a one- to three-dimensional array (buffer) and accesses its components via a C++ accessor class. This grants more control over the exact nature and size of the allocated memory, and over how host and offload target compute units access it. However, the buffer model can also create extra class management overhead, which can require more manual intervention and may reduce performance.
USM (unified shared memory) is a newer model, introduced in SYCL 2020. USM is a pointer-based memory management model that uses the malloc_device, malloc_shared, and malloc_host allocator functions, similar to how C++ code usually handles memory when no GPU device offload is involved. Choosing the USM model can make it easier to integrate with existing code and to migrate from CUDA code. However, the SYCL runtime manages much of the USM memory space, which reduces the granularity of control available to the developer.
For more information on USM versus buffer modes, see the following sections of the GPU Optimization Guide:
Unified Shared Memory Allocations
Buffer Accessor Modes
What to Expect in Migrated Code
When the tool migrates CUDA code to SYCL code, it inserts DPCT diagnostic messages as comments in the migrated source files and outputs them as warnings to the console during migration. These messages identify areas in the migrated code that may require your attention to make the code SYCL compliant or correct. This step is detailed in Stage 3: Review the Migrated Code.
The migrated code also uses DPCT helper functions to provide utility support for the generated SYCL code. The helper functions use the dpct:: namespace. Helper function source files are located at <tool-installation-directory>/latest/include/dpct. DPCT helper functions can be left in migrated code but should not be used in new SYCL code. Use standard SYCL and C++ when writing new code. For information about the DPCT namespace, refer to the DPCT Namespace Reference.
Run Migration
After reviewing the available migration tool functionality and options, run your migration.
You can run the tool from the command line or within the Microsoft Visual Studio or Eclipse IDEs.
If your project uses a Makefile or CMake file, use the corresponding option to automatically migrate the file to work with the migrated code:
To migrate a Makefile, use the --gen-build-scripts option.
To migrate a CMake file, use the --migrate-build-script or --migrate-build-script-only option. (Note that these options are experimental.)
For example:
c2s -p compile_commands.json --in-root ../../.. --gen-helper-function --gen-build-scripts
This example migration command:
uses the tool alias c2s. dpct can also be used.
uses a compilation database, specified with the -p option
specifies the source to be migrated with the --in-root option
instructs the tool to generate helper function files with the --gen-helper-function option
instructs the tool to migrate the Makefile using the --gen-build-scripts option
The following samples show migrations of CUDA code using the tool and target Intel and NVIDIA* hardware:
Stage 3: Review the Migrated Code
After running Intel® DPC++ Compatibility Tool, manual editing is usually required before the migrated SYCL code can be compiled. DPCT warnings are logged as comments in the migrated source files and output to the console during migration. These warnings identify the portions of code that require manual intervention. Review these comments and make the recommended changes to ensure the migrated code is consistent with the original logic.
For example, this original CUDA* code:
void foo() {
  float *f;
  cudaError_t err = cudaMalloc(&f, 4);
  printf("%s\n", cudaGetErrorString(err));
}
results in the following migrated SYCL code:
void foo() {
  float *f;
  int err = (f = (float *)sycl::malloc_device(4, dpct::get_default_queue()), 0);
  /*
  DPCT1009:1: SYCL uses exceptions to report errors and does not use the error
  codes. The original code was commented out and a warning string was inserted.
  You need to rewrite this code.
  */
  printf("%s\n",
         "cudaGetErrorString is not supported" /*cudaGetErrorString(err)*/);
}
Note the DPCT1009 warning inserted where additional review is needed.
For a detailed explanation of the comments, including suggestions to fix the issues, refer to the Diagnostics Reference.
At this stage, you may observe that the same DPCT warnings were generated repeatedly in your code or that the same manual edits were needed in multiple locations to fix a specific pattern in your original source code. Consider defining the manual edits needed to fix repeated DPCT warnings as a user-defined migration rule. This allows you to save your corrections and automatically apply them to a future migration of your CUDA source.
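To find which warnings repeat most often, you can count diagnostic IDs across the migrated sources. The following is a minimal sketch in standard C++, not a feature of the tool. The function name count_dpct_warnings is hypothetical, and the pattern assumes that every warning follows the "DPCT1009:1:" form shown in the example above (a four-digit ID followed by a sequence number).

```cpp
#include <map>
#include <regex>
#include <string>

// Counts how often each DPCT diagnostic ID (for example "DPCT1009") appears in
// the text of a migrated source file. Hypothetical helper for this sketch.
std::map<std::string, int> count_dpct_warnings(const std::string &source) {
  // Assumed format: "DPCT" + four digits + ":" + sequence number + ":".
  static const std::regex pattern(R"((DPCT\d{4}):\d+:)");
  std::map<std::string, int> counts;
  for (std::sregex_iterator it(source.begin(), source.end(), pattern), end;
       it != end; ++it) {
    ++counts[(*it)[1].str()];  // Capture group 1 holds the ID without the sequence number.
  }
  return counts;
}
```

IDs with high counts are natural candidates for a user-defined migration rule, because one rule can replace many identical manual edits.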
Stage 4: Build the New SYCL Code Base
After you have completed any manual migration steps, build your converted code.
Install New SYCL Code Base Dependencies
Converted code makes use of oneAPI library APIs and Intel SYCL extensions. Before compiling, install the appropriate oneAPI libraries and a compiler that supports the Intel SYCL extensions.
If your CUDA source uses … | … install this oneAPI library |
---|---|
cuBLAS, cuFFT, cuRAND, cuSolver, cuSparse | Intel® oneAPI Math Kernel Library (oneMKL) |
Thrust, CUB | Intel® oneAPI DPC++ Library (oneDPL) |
cuDNN | Intel® oneAPI Deep Neural Network Library (oneDNN) |
NCCL | Intel® oneAPI Collective Communications Library (oneCCL) |
The following compilers support Intel SYCL extensions:
Most libraries and the Intel® oneAPI DPC++/C++ Compiler are included in the Intel® oneAPI Base Toolkit (Base Kit). Libraries and the compiler are also available as stand-alone downloads.
Compile for Intel CPU and GPU
If your program targets Intel GPUs, install the latest Intel GPU drivers before compiling.
Use your updated Makefile or CMake file to build your program, or compile it manually at the command line using a compiler that supports the Intel SYCL extensions. Make sure that all linker and compilation commands use the -fsycl compiler option with the C++ driver. For example:
icpx -fsycl migrated-file.cpp
For detailed information about compiling with the Intel® oneAPI DPC++/C++ Compiler, refer to the Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference.
Compile for AMD* or NVIDIA* GPU
If your program targets AMD* or NVIDIA GPUs, install the appropriate Codeplay* plugin for the target GPU before compiling. Instructions for installing the AMD and NVIDIA GPU plugins, as well as how to compile for those targets, can be found in the Codeplay plugin documentation:
Install the oneAPI for AMD GPUs plugin from Codeplay.
Install the oneAPI for NVIDIA GPUs plugin from Codeplay.
Stage 5: Validate the New SYCL Application
After you have built your converted code, validate your new SYCL application to check for correct functionality after migration.
Use a Debugger to Validate Migrated Code
After you have successfully compiled your new SYCL application, run the application in debug mode using a debugger such as Intel® Distribution for GDB* to verify that it runs as expected after migration.
Learn more about Debugging with Intel Distribution for GDB.
Use CodePin to Validate Migrated Code
If you enabled the CodePin feature during migration, the project signature is logged when the application runs.
The signature contains the data values at each execution checkpoint, which can be verified manually or with an auto-analysis tool.
For detailed information about debugging using the CodePin tool, refer to Debug Migrated Code Runtime Behavior.
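The comparison step can be automated once both signatures are loaded into memory. The sketch below is a hypothetical illustration in standard C++ and does not parse or reproduce the actual CodePin report format. It assumes that you have already read the CUDA and SYCL reports into maps from checkpoint name to logged values; the Signature alias and the function name diverging_checkpoints are invented for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory form of one report: checkpoint name -> logged values.
using Signature = std::map<std::string, std::vector<double>>;

// Returns the checkpoints whose values differ by more than `tolerance`, or that
// are missing or differently sized in one of the two reports.
std::vector<std::string> diverging_checkpoints(const Signature &cuda_run,
                                               const Signature &sycl_run,
                                               double tolerance) {
  std::vector<std::string> diverging;
  for (const auto &entry : cuda_run) {
    const auto it = sycl_run.find(entry.first);
    bool differs = it == sycl_run.end() ||
                   it->second.size() != entry.second.size();
    for (std::size_t i = 0; !differs && i < entry.second.size(); ++i) {
      // Written as a negated <= so that NaN values are also flagged.
      differs = !(std::fabs(entry.second[i] - it->second[i]) <= tolerance);
    }
    if (differs) diverging.push_back(entry.first);
  }
  for (const auto &entry : sycl_run) {
    if (cuda_run.find(entry.first) == cuda_run.end()) {
      diverging.push_back(entry.first);  // Present only in the SYCL report.
    }
  }
  return diverging;
}
```

A tolerance is used instead of exact equality because floating-point results can legitimately differ slightly between devices. The first checkpoint reported is usually the best place to start debugging.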
Optimize Your Code
Optimize your migrated code for Intel GPUs using Intel® tools such as Intel® VTune™ Profiler and Intel® Advisor. These tools help identify areas of code to improve for optimizing your application performance.
Additional hardware- or library-specific optimization information is available:
For detailed information about optimizing your code for Intel GPUs, refer to the oneAPI GPU Optimization Guide.
For detailed information about optimizing your code for AMD GPUs, refer to the Codeplay AMD GPU Performance Guide.
For detailed information about optimizing your code for NVIDIA GPUs, refer to the Codeplay NVIDIA GPU Performance Guide.
Find More
Content | Description |
---|---|
Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference | Developer guide and reference for the Intel® oneAPI DPC++/C++ Compiler. |
 | The SYCL 2020 Specification PDF. |
 | Intel branded C++ compiler built from the open-source oneAPI DPC++ Compiler, with additional Intel hardware optimization. |
 | Open-source Intel LLVM-based compiler project that implements compiler and runtime support for the SYCL* language. |
 | Sample CUDA projects with instructions on migrating to SYCL using the tool. |
Guided migration samples | Guided migration of two sample NVIDIA CUDA projects: |
 | A Jupyter* Notebook that guides you through the migration of a simple example and four step-by-step sample migrations from CUDA to SYCL. |
 | Catalog of CUDA projects that have been migrated to SYCL. |
 | Forum to get assistance when migrating your CUDA code to SYCL. |
 | Intel® oneAPI Math Kernel Library tool to help determine how to include oneMKL libraries for your specific use case. |
 | This tutorial describes the basic scenarios of debugging applications using Intel® Distribution for GDB*. |
 | Tutorials demonstrating an end-to-end workflow using Intel® VTune™ Profiler that you can ultimately apply to your own applications. |