Workflow for a CUDA* to SYCL* Migration

Overview

Use this basic workflow to migrate the entire code base of a CUDA* application to SYCL* and then optimize the migrated kernels for Intel® GPUs.

Target Audience

Software developers with a background in CUDA development.

Prerequisites

The Intel® Tiber™ AI Cloud provides a virtual sandbox with access to Intel CPUs and GPUs and to Intel software developer tools, such as the Intel® oneAPI Base Toolkit (Base Kit).

Alternatively, to use a local development system, you must have access to the following:

  • An Intel GPU. For developer guidance and best practices, see the oneAPI GPU Optimization Guide.
  • The Base Kit, which provides core tools and libraries for developing high-performance applications across diverse architectures.
  • The Intel® DPC++ Compatibility Tool, which is available as a stand-alone component or as part of the Base Kit. The tool provides guided CUDA to SYCL source code migration.

For a complete set of migration resources, see Migrate from CUDA to C++ with SYCL. 

Migration Workflow

Step 1: Decide How to Migrate Your CUDA Sources

Step 2: Migrate Your Code

Step 3: Optimize for Hardware Targets

 

Figure: The three steps in the migration process.

Step 1: Decide How to Migrate Your CUDA Sources

To port your source code to C++ with SYCL, ensure you have a working CUDA application. You can migrate your CUDA sources by either:

  • Auto-generating most of the SYCL code using the Intel DPC++ Compatibility Tool, which provides a side-by-side comparison of CUDA to SYCL code.

    See a Migration Example
  • Manually analyzing the CUDA sources and replacing all CUDA-specific calls with the equivalent SYCL calls.

The Intel DPC++ Compatibility Tool usually migrates 90%-95% of the code and generates warnings for code regions that need manual intervention to complete the migration. [1]

Code migrated by the tool uses helper functions defined in the <dpct/dpct.hpp> header file, which wrap some SYCL calls in an extra layer. Manually migrated SYCL code instead uses SYCL calls and syntax that map directly to CUDA calls.

Download and try a migration using the simple Vector Add sample.

[1] Intel estimates are as of September 2021 and based on measurements on a set of 70 HPC benchmarks and samples, with examples such as Rodinia, Scalable Heterogeneous Computing (SHOC), and Pennant. Results may vary.

Step 2: Migrate Your Code

In this step, migrate your source code to SYCL using a manual or assisted method. After finishing the migration, continue your development work on the SYCL source code.

Assisted Migration

Migrate existing CUDA code to SYCL using the Intel DPC++ Compatibility Tool. The tool ports CUDA language kernels and library API calls, and migrates most of the CUDA code to architecture- and vendor-portable SYCL code.

 

Learn with a Code Sample

To become familiar with the migration process for your CUDA sources, use the following resources:

  • Guide to migrating a Jacobi sample: CUDA to SYCL Migration–Jacobi Iterative Method.
    This guide provides a detailed step-by-step analysis of the migration with explicit explanations of the migration process and CUDA to SYCL mappings.
  • Additional background, the source code for the examples ported with the Intel DPC++ Compatibility Tool, and further migration and optimization techniques are available on GitHub*.

   

Manual Migration

Manually migrated SYCL code uses SYCL calls and syntax that map directly to CUDA calls. This approach produces cleaner migrated code that may be easier to maintain in the long term. The resulting code is functionally nearly identical under either approach.

For technical details on the CUDA to SYCL mappings, illustrated with the Jacobi sample, see the instructions in the CUDA to SYCL Migration–Jacobi Iterative Method. This guide explains the underlying concepts of CUDA and SYCL, and the essential terms for migrating the code.

Although setting up an offload involves common steps, such as creating asynchronous streams and allocating and copying memory, the actual work happens in the offloaded computation. CUDA and SYCL share some basic concepts for creating offload kernels that run on a GPU. To learn the SYCL syntax efficiently, map these concepts to each other by identifying their similarities and differences:

  • CUDA thread block and SYCL work group
  • Shared local memory (SLM) access
  • CUDA thread block and SYCL barrier synchronization
  • CUDA cooperative group and SYCL subgroup
  • CUDA warp primitives and SYCL group algorithms
  • CUDA and SYCL atomics
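To make the first few mappings in this list more concrete, here is a minimal host-only sketch in plain C++. It deliberately uses neither the CUDA nor the SYCL headers, so it runs on any C++ compiler. `LaunchShape`, `global_id`, and `group_sums` are hypothetical names introduced only for this illustration. The comments note which CUDA and SYCL 2020 constructs each piece stands in for, assuming a one-dimensional launch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 1-D launch configuration (not a CUDA or SYCL type).
struct LaunchShape {
    std::size_t num_groups;  // CUDA: gridDim.x  | SYCL: number of work-groups
    std::size_t group_size;  // CUDA: blockDim.x | SYCL: local range (work-group size)
};

// CUDA: blockIdx.x * blockDim.x + threadIdx.x
// SYCL: nd_item::get_global_id(0), which for a 1-D nd_range equals
//       get_group(0) * get_local_range(0) + get_local_id(0)
std::size_t global_id(const LaunchShape& s, std::size_t group, std::size_t local) {
    assert(group < s.num_groups && local < s.group_size);
    return group * s.group_size + local;
}

// Emulates a per-group tree reduction in shared local memory.
// Assumes group_size is a power of two and
// data.size() == num_groups * group_size.
std::vector<int> group_sums(const LaunchShape& s, const std::vector<int>& data) {
    assert(data.size() == s.num_groups * s.group_size);
    std::vector<int> out(s.num_groups);
    for (std::size_t g = 0; g < s.num_groups; ++g) {
        // Stand-in for a __shared__ array (CUDA) or a local accessor (SYCL).
        std::vector<int> slm(s.group_size);
        for (std::size_t l = 0; l < s.group_size; ++l)
            slm[l] = data[global_id(s, g, l)];
        for (std::size_t stride = s.group_size / 2; stride > 0; stride /= 2) {
            // On a GPU, the work-items with local id < stride run this in parallel.
            for (std::size_t l = 0; l < stride; ++l)
                slm[l] += slm[l + stride];
            // A barrier belongs here on a GPU:
            // __syncthreads() in CUDA, sycl::group_barrier(group) in SYCL.
        }
        out[g] = slm[0];  // the work-item with local id 0 holds the group's sum
    }
    return out;
}
```

For example, with `LaunchShape{2, 4}` and the input `{1, 2, 3, 4, 5, 6, 7, 8}`, `group_sums` returns one sum per group. In real migrated code the serial loops disappear, because each work-item executes the loop body for its own local id.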

Resources

  • Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference
  • SYCL 2020 Specification

 

   

Step 3: Optimize for Hardware Targets

At this stage, you have working code that compiles and runs. Optimize the migrated code for Intel GPUs using Intel® tools such as Intel® VTune™ Profiler and Intel® Advisor. These tools help identify the areas of code where changes are most likely to improve application performance. Both tools include graphical user interfaces to help visualize your optimization strategy.

Performance Analysis with Intel® VTune™ Profiler

Use this profiler to create a snapshot of your application performance baseline and identify focus areas for further analysis.

Follow these steps:

  1. Set up your system for GPU analysis.
  2. Launch the Intel VTune Profiler command-line interface.
  3. Run the Performance Snapshot analysis.
  4. View the results.

Roofline Analysis with Intel® Advisor

Use this tool to measure the actual performance of offloaded code using the GPU Roofline Insights analysis. You can evaluate GPU code to see how close the performance is to hardware maximums.

Follow these steps:

  1. Set up your environment to analyze GPU kernels.
  2. Run Roofline Analysis.
  3. Review results to evaluate throughput based on hardware models.
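As background for reading the results, the following is a minimal sketch of the textbook single-roof roofline model: attainable throughput is the smaller of the peak compute rate and the arithmetic intensity multiplied by the peak memory bandwidth. It is not how Intel Advisor computes its charts, which plot several roofs. The function names are hypothetical, and any hardware numbers you pass in are your own assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Attainable throughput in GFLOP/s under the basic roofline model:
//   min(peak compute, arithmetic intensity * peak memory bandwidth)
// peak_gflops: peak compute rate (GFLOP/s)
// peak_gbps:   peak memory bandwidth (GB/s)
// flops_per_byte: arithmetic intensity of the kernel (FLOP per byte moved)
double roofline_gflops(double peak_gflops, double peak_gbps, double flops_per_byte) {
    assert(peak_gflops > 0 && peak_gbps > 0 && flops_per_byte > 0);
    return std::min(peak_gflops, flops_per_byte * peak_gbps);
}

// Fraction of the attainable roof that a measured kernel reaches (1.0 = on the roof).
double roof_fraction(double measured_gflops, double peak_gflops,
                     double peak_gbps, double flops_per_byte) {
    return measured_gflops / roofline_gflops(peak_gflops, peak_gbps, flops_per_byte);
}

// True when the kernel sits left of the ridge point, so that memory bandwidth,
// not compute, sets the roof.
bool is_memory_bound(double peak_gflops, double peak_gbps, double flops_per_byte) {
    return flops_per_byte * peak_gbps < peak_gflops;
}
```

With made-up figures of 1000 GFLOP/s peak compute and 100 GB/s bandwidth, a kernel at 0.5 FLOP/byte has a roof of 50 GFLOP/s and is memory bound. A measured 25 GFLOP/s would put it at half of that roof, which suggests looking at memory access patterns before compute.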

Note: For more information on optimizing the Jacobi sample, see the "Optimizations" section of Guided Jacobi CUDA Graphs SYCL Migration on GitHub. The output of these optimization steps is in sycl_migrated_optimized.

Resources

  • Optimize Your GPU Application with the Base Kit
  • oneAPI GPU Optimization Guide
  • Essentials of SYCL for Intel Tiber AI Cloud (Training)

 
