oneAPI Debug Tools
The following tools are available to help with debugging the SYCL* and OpenMP* offload process.
Tool |
When to Use |
---|---|
Environment variables |
Environment variables allow you to gather diagnostic information from the OpenMP and SYCL runtimes at program execution with no modifications to your program. |
The onetrace tool from Profiling Tools Interfaces for GPU (PTI for GPU) |
When using the oneAPI Level Zero and OpenCL™ backends for SYCL and OpenMP Offload, this tool can be used to debug backend errors and for performance profiling on both the host and device. |
Intercept Layer for OpenCL™ Applications |
When using the OpenCL™ backend for SYCL and OpenMP Offload, this library can be used to debug backend errors and for performance profiling on both the host and device. It offers broader functionality than onetrace. |
Intel® Distribution for GDB* |
Used for source-level debugging of the application, typically to inspect logical bugs, on the host and any devices you are using (CPU, GPU, FPGA emulation). |
Intel® Inspector |
This tool helps to locate and debug memory and threading problems, including those that can cause offloading to fail.
NOTE:
Intel Inspector is included in the Intel oneAPI HPC Toolkit or the Intel oneAPI IoT Toolkit.
|
In-application debugging |
In addition to these tools and runtime based approaches, the developer can locate problems using other approaches. For example:
|
Intel® Advisor |
Use to ensure Fortran, C, C++, OpenCL™, and SYCL applications realize full performance potential on modern processors. |
Intel® VTune™ Profiler |
Use to gather performance data either on the native system or on a remote system. |
Debug Environment Variables
Both the OpenMP* and SYCL offload runtimes, as well as Level Zero, OpenCL, and the Shader Compiler, provide environment variables that help you understand the communication between the host and offload device. The variables also allow you to discover or control the runtime chosen for offload computations.
OpenMP* Offload Environment Variables
There are several environment variables that you can use to understand how OpenMP Offload works and control which backend it uses.
Environment Variable |
Description |
---|---|
LIBOMPTARGET_DEBUG |
This environment variable enables debug output from the OpenMP Offload runtime. It reports:
Values: (0, 1, 2) Default: 0 |
LIBOMPTARGET_INFO |
This variable controls whether basic offloading information will be displayed from the offload runtime.
Values: (0, 1, 2, 4, 8, 32) Default: 0 |
LIBOMPTARGET_PLUGIN_PROFILE |
This variable enables the display of performance data for offloaded OpenMP code. It displays:
Values:
Default: F Example: export LIBOMPTARGET_PLUGIN_PROFILE=T,usec |
LIBOMPTARGET_PLUGIN |
This environment variable allows you to choose the backend used for OpenMP offload execution.
NOTE:
The Level Zero backend is only supported for GPU devices.
Values:
Default:
|
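As a minimal sketch of how the variables in the table above might be applied, the shell function below enables runtime debug output and the performance-data display for a single run, without changing the rest of the shell session. The function name `run_with_omp_diagnostics` and the application name in the usage comment are hypothetical. The variable values are taken from the table.

```shell
# Run a command with OpenMP offload diagnostics enabled for that command only.
# `env` scopes the variables to the child process, so the calling shell
# is left unchanged.
run_with_omp_diagnostics() {
    env LIBOMPTARGET_DEBUG=1 \
        LIBOMPTARGET_PLUGIN_PROFILE=T,usec \
        "$@"
}

# Usage (./my_offload_app is a placeholder for your own binary):
#   run_with_omp_diagnostics ./my_offload_app > omp_diag.log 2>&1
```

Capturing both output streams in one file, as in the usage comment, keeps the diagnostic text together with the program's own output for later inspection.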
SYCL* and DPC++ Environment Variables
The DPC++ compiler supports all standard SYCL environment variables. The full list is available from GitHub. Of interest for debugging are the following SYCL environment variables, plus an additional Level Zero environment variable.
Environment Variable |
Description |
---|---|
SYCL_DEVICE_FILTER |
This complex environment variable allows you to limit the runtimes, compute device types, and compute device IDs used by the runtime to a subset of all available combinations. The compute device IDs correspond to those returned by the SYCL API, clinfo, or sycl-ls (with the numbering starting at 0) and have no relation to whether the device with that ID is of a certain type or supports a specific runtime. Using a programmatic special selector (like gpu_selector) to request a device filtered out by SYCL_DEVICE_FILTER will cause an exception to be thrown. Refer to the Environment Variables descriptions in GitHub for additional details: https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md Example values include:
Default: use all available runtimes and devices |
SYCL_PI_TRACE |
This environment variable enables debug output from the runtime. Values:
Default: disabled |
ZE_DEBUG |
This environment variable enables debug output from the Level Zero backend when used with the runtime. It reports:
Value: variable defined with any value - enabled Default: disabled |
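The two helpers below sketch one way to apply the variables in the table above to a single run. The function names and the application name in the usage comments are hypothetical. The filter value `opencl:cpu` is only one example of a valid setting.

```shell
# Restrict the SYCL runtime to the OpenCL backend on CPU devices for one run.
run_on_opencl_cpu() {
    env SYCL_DEVICE_FILTER=opencl:cpu "$@"
}

# Enable Level Zero backend debug output for one run. Defining ZE_DEBUG
# with any value enables it; 1 is used here only for readability.
run_with_ze_debug() {
    env ZE_DEBUG=1 "$@"
}

# Usage (./my_sycl_app is a placeholder for your own binary):
#   run_on_opencl_cpu ./my_sycl_app
#   run_with_ze_debug ./my_sycl_app 2> ze_debug.log
```

Keep in mind that if the application requests a device that the filter excludes, the runtime throws an exception, as described in the table.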
Environment Variables that Produce Diagnostic Information for Support
The Level Zero backend provides a few environment variables that can be used to control behavior and aid in diagnosis.
Level Zero Specification, core programming guide: https://spec.oneapi.com/level-zero/latest/core/PROG.html#environment-variables
Level Zero Specification, tool programming guide: https://spec.oneapi.com/level-zero/latest/tools/PROG.html#environment-variables
An additional source of debug information comes from the Intel® Graphics Compiler, which is called by the Level Zero or OpenCL backends (used by both the OpenMP Offload and SYCL/DPC++ Runtimes) at runtime or during Ahead-of-Time (AOT) compilation. Intel Graphics Compiler creates the appropriate executable code for the target offload device. The full list of these environment variables can be found at https://github.com/intel/intel-graphics-compiler/blob/master/documentation/configuration_flags.md. The two that are most often needed to debug performance issues are:
IGC_ShaderDumpEnable=1 (default=0) causes all LLVM, assembly, and ISA code generated by the Intel® Graphics Compiler to be written to /tmp/IntelIGC/<application_name>
IGC_DumpToCurrentDir=1 (default=0) writes all the files created by IGC_ShaderDumpEnable to your current directory instead of /tmp/IntelIGC/<application_name>. Since this is potentially a lot of files, it is recommended to create a temporary directory just for the purpose of holding these files.
If you have a performance issue with your OpenMP offload or SYCL offload application that arises between different versions of Intel® oneAPI, when using different compiler options, when using the debugger, and so on, then you may be asked to enable IGC_ShaderDumpEnable and provide the resulting files. For more information on compatibility, see oneAPI Library Compatibility.
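A possible way to collect the shader dumps described above into a dedicated directory is sketched here. The function name `dump_igc_shaders`, the directory name, and the application path in the usage comment are hypothetical. Only the two IGC variables come from the text above.

```shell
# Usage: dump_igc_shaders DUMP_DIR COMMAND [ARGS...]
# Creates DUMP_DIR if needed, changes into it inside a subshell, and runs
# COMMAND with both IGC dump variables set, so the generated files land in
# DUMP_DIR. Because the working directory changes, COMMAND should be given
# as an absolute path or be found through PATH.
dump_igc_shaders() {
    mkdir -p "$1" || return 1
    (
        cd "$1" || exit 1
        shift
        env IGC_ShaderDumpEnable=1 IGC_DumpToCurrentDir=1 "$@"
    )
}

# Usage (the application path is a placeholder):
#   dump_igc_shaders "$HOME/igc_dump" /path/to/my_offload_app
```

Running in a subshell leaves the caller's working directory untouched, and the function returns the exit status of the application.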
Offload Intercept Tools
In addition to the debuggers and diagnostics built into the offload software itself, it can be useful to monitor offload API calls and the data sent through the offload pipeline. For Level Zero, run your application as an argument to the onetrace or ze_tracer tool, which intercepts and reports on the Level Zero API calls made by your application. For OpenCL™, you can add a library to LD_LIBRARY_PATH that intercepts and reports on all OpenCL calls, and then use environment variables to control what diagnostic information is written to a file. You can also use onetrace or cl_tracer to report on the OpenCL API calls made by your application. As with Level Zero, your application is run as an argument to the tool.
Intercept Layer for OpenCL™ Applications
This library collects debugging and performance data when OpenCL is used as the backend to your SYCL or OpenMP offload program. It can help you detect buffer overwrites, memory leaks, and mismatched pointers, and can provide more detailed information about runtime error messages, whether a CPU, FPGA, or GPU device is used for computation. Note that you will get nothing useful if you use ze_tracer on a program that uses the OpenCL backend, or if you use the Intercept Layer for OpenCL Applications library or cl_tracer on a program that uses the Level Zero backend.
Additional resources:
Extensive information on building and using the Intercept Layer for OpenCL Applications is available from https://github.com/intel/opencl-intercept-layer.
NOTE:
For best results, run cmake with the following flags: -DENABLE_CLIPROF=TRUE -DENABLE_CLILOADER=TRUE
Information about a similar tool (CLIntercept) is available from https://github.com/gmeeker/clintercept and https://sourceforge.net/p/clintercept/wiki/Home/.
Information on the controls for the Intercept Layer for OpenCL Applications can be found at https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md.
Information about optimizing for GPUs is available from the Intel oneAPI GPU Optimization Guide.
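The following sketch shows how the Intercept Layer for OpenCL Applications might be applied to a single run. It assumes the `cliloader` launcher was built with the cmake flags noted above and is reachable through PATH. It also assumes, based on the controls documentation linked above, that controls such as `ErrorLogging` and `CallLogging` can be set as environment variables with a `CLI_` prefix. The function name and the `CLILOADER` override variable are hypothetical.

```shell
# Run a command under the Intercept Layer loader with error and call
# logging enabled for that run only.
# CLILOADER (hypothetical override) may point at a cliloader binary that
# is not on PATH.
run_with_cl_intercept() {
    env CLI_ErrorLogging=1 CLI_CallLogging=1 \
        "${CLILOADER:-cliloader}" "$@"
}

# Usage (./my_offload_app is a placeholder; the filter forces the OpenCL
# backend so that there are OpenCL calls to intercept):
#   SYCL_DEVICE_FILTER=opencl:cpu run_with_cl_intercept ./my_offload_app
```

Check the control names against the controls documentation for the version you built before relying on them.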
Profiling Tools Interfaces for GPU (onetrace, cl_tracer, and ze_tracer)
Like the Intercept Layer for OpenCL™ Applications, these tools collect debugging and performance data from applications that use the OpenCL and Level Zero backends for offload via OpenMP* or SYCL. Note that Level Zero can only be used as the backend for computations that happen on the GPU (there is no Level Zero backend for the CPU or FPGA at this time). The onetrace tool is part of the Profiling Tools Interfaces for GPU (PTI for GPU) project, found at https://github.com/intel/pti-gpu. This project also contains the ze_tracer and cl_tracer tools, which trace activity from only the Level Zero or OpenCL offload backend, respectively. The ze_tracer and cl_tracer tools produce no output if they are used with an application that runs on the other backend, while onetrace provides output no matter which offload backend you use.
The onetrace tool is distributed as source. Instructions for how to build the tool are available from https://github.com/intel/pti-gpu/tree/master/tools/onetrace. The tool provides the following features:
Call logging: This mode allows you to trace all standard Level Zero (L0) and OpenCL™ API calls along with their arguments and return values annotated with time stamps. Among other things, this can give you supplemental information on any failures that occur when a host program tries to make use of an attached compute device.
Host and device timing: These provide the duration of all API calls, the duration of each kernel, and the total runtime of the application.
Device Timeline mode: Gives time stamps for each device activity. All the time stamps are in the same (CPU) time scale.
Chrome Call Logging mode: Dumps API calls in JSON format that can be opened in the chrome://tracing browser tool.
These data can help debug offload failures or performance issues.
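As a rough sketch of the modes listed above, the helpers below run an application as an argument to onetrace. The option names (`--call-logging`, `--device-timing`, `--chrome-call-logging`) are assumed from the PTI for GPU README and should be verified against the version you build. The function names and the `ONETRACE` override variable are hypothetical.

```shell
# ONETRACE (hypothetical override) may point at the onetrace binary you
# built from source; otherwise onetrace is looked up through PATH.

# Log API calls with arguments and return values, plus kernel durations.
trace_offload() {
    "${ONETRACE:-onetrace}" --call-logging --device-timing "$@"
}

# Dump API calls as JSON for viewing in the chrome://tracing browser tool.
trace_for_chrome() {
    "${ONETRACE:-onetrace}" --chrome-call-logging "$@"
}

# Usage (./my_offload_app is a placeholder for your own binary):
#   trace_offload ./my_offload_app 2> onetrace.log
#   trace_for_chrome ./my_offload_app
```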
Additional resources:
Intel® Distribution for GDB*
The Intel Distribution for GDB* is an application debugger that allows you to inspect and modify the program state. With the debugger, both the host part of your application and kernels that are offloaded to a device can be debugged seamlessly in the same debug session. The debugger supports the CPU, GPU, and FPGA-emulation devices. Major features of the tool include:
Automatically attaching to the GPU device to listen to debug events
Automatically detecting JIT-compiled, or dynamically loaded, kernel code for debugging
Defining breakpoints (both inside and outside of a kernel) to halt the execution of the program
Listing the threads; switching the current thread context
Listing active SIMD lanes; switching the current SIMD lane context per thread
Evaluating and printing the values of expressions in multiple thread and SIMD lane contexts
Inspecting and changing register values
Disassembling the machine instructions
Displaying and navigating the function call-stack
Source- and instruction-level stepping
Non-stop and all-stop debug mode
Recording the execution using Intel Processor Trace (CPU only)
For more information and links to full documentation for Intel Distribution for GDB, see Get Started with Intel Distribution for GDB on Linux* Host | Windows* Host.
Intel® Inspector for Offload
Intel® Inspector is a dynamic memory and threading error checking tool for users developing serial and multithreaded applications. It can be used to verify correctness of the native part of the application as well as dynamically generated offload code.
Unlike the tools and techniques above, Intel Inspector cannot be used to catch errors in offload code that is communicating with a GPU or an FPGA. Instead, Intel Inspector requires the SYCL or OpenMP runtime to be configured to execute kernels on a CPU target. In general, this means defining the following environment variables before an analysis run.
To configure a SYCL application to run kernels on a CPU device
export SYCL_DEVICE_FILTER=opencl:cpu
To configure an OpenMP application to run kernels on a CPU device
export OMP_TARGET_OFFLOAD=MANDATORY
export LIBOMPTARGET_DEVICETYPE=cpu
To enable code analysis and tracing in JIT compilers or runtimes
export CL_CONFIG_USE_VTUNE=True
export CL_CONFIG_USE_VECTORIZER=false
Use one of the following commands to start analysis from the command line. You can also start from the Intel Inspector graphical user interface.
Memory: inspxe-cl -c mi3 -- <app> [app_args]
Threading: inspxe-cl -c ti3 -- <app> [app_args]
View the analysis result using the following command: inspxe-cl -report=problems -report-all
If your SYCL or OpenMP Offload program passes bad pointers to the OpenCL™ backend, or passes the wrong pointer to the backend from the wrong thread, Intel Inspector should flag the issue. This may make the problem easier to find than trying to locate it using the intercept layers or the debugger.
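The environment settings and analysis commands above can be combined into small wrappers, sketched here. The variables and `inspxe-cl` arguments are those given above. The function names and the `INSPXE` override variable are hypothetical, and the application name in the usage comments is a placeholder.

```shell
# INSPXE (hypothetical override) may point at an inspxe-cl binary that is
# not on PATH.

# Memory analysis of a SYCL application with kernels forced onto the CPU.
inspect_sycl_memory() {
    env SYCL_DEVICE_FILTER=opencl:cpu \
        CL_CONFIG_USE_VTUNE=True \
        CL_CONFIG_USE_VECTORIZER=false \
        "${INSPXE:-inspxe-cl}" -c mi3 -- "$@"
}

# Threading analysis of an OpenMP offload application on the CPU.
inspect_openmp_threading() {
    env OMP_TARGET_OFFLOAD=MANDATORY \
        LIBOMPTARGET_DEVICETYPE=cpu \
        CL_CONFIG_USE_VTUNE=True \
        CL_CONFIG_USE_VECTORIZER=false \
        "${INSPXE:-inspxe-cl}" -c ti3 -- "$@"
}

# Usage:
#   inspect_sycl_memory ./my_sycl_app
#   inspect_openmp_threading ./my_openmp_app
#   inspxe-cl -report=problems -report-all
```

Setting the variables through `env` keeps them confined to the analysis run, so later runs in the same shell still use the default offload device.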
Additional details are available from the Intel Inspector User Guide for Linux* OS | Windows* OS.