Visible to Intel only — GUID: GUID-9BC8EF2C-66BF-4AC5-8481-7BF100867507
Visible to Intel only — GUID: GUID-9BC8EF2C-66BF-4AC5-8481-7BF100867507
How to Use the Intercept Layer for OpenCLTM Applications
Linux* and OS X*: Linux OS X Build Status | Windows*: Windows Build Status
The Intercept Layer for OpenCL Applications is a tool that can intercept and modify OpenCL calls for debugging and performance analysis. Using the Intercept Layer for OpenCL Applications requires no application or driver modifications.
To operate, the Intercept Layer for OpenCL Applications masquerades as the OpenCL ICD loader (usually) or as an OpenCL implementation (rarely) and is loaded when the application intends to load the real OpenCL ICD loader. As part of the Intercept Layer for OpenCL Application’s initialization, it loads the real OpenCL ICD loader and gets function pointers to the real OpenCL entry points. Then, whenever the application makes an OpenCL call, the call is intercepted and can be passed through to the real OpenCL with or without changes.
To access the OpenCL Intercept Layer repository:
git clone https://github.com/intel/opencl-intercept-layer
All controls are documented here: https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md
- See intercept documentation for information about controls.
To run, use the following setup:
export CLI_OpenCLFileName=/opt/intel/inteloneapi/compiler/latest/linux/lib/libOpenCL.so.1 export LD_LIBRARY_PATH=/home/opencl-intercept-layer/build/intercept:$LD_LIBRARY_PATH export SYCL_BE=PI_OPENCL CLI_ReportToStderr=0 CLI_ReportToFile=1 CLI_HostPerformanceTiming=1 CLI_DevicePerformanceTiming=1 CLI_DumpDir=. ./matrix.dpcpp
This will generate a file called cliintercept_report.txt. The file will include the following data and tables shown below.
Total Enqueues: 2
Total Time (ns): 1604325652
Function Name |
Calls |
Time (ns) |
Time (%) |
Average (ns) |
Min (ns) |
Max (ns) |
---|---|---|---|---|---|---|
clBuildProgram |
1 |
337069812 |
21.01% |
337069812 |
337069812 |
337069812 |
clCreateBuffer |
3 |
3393909 |
0.21% |
1131303 |
140325 |
2036170 |
clCreateCommandQueue WithProperties |
1 |
5221 |
0.00% |
5221 |
5221 |
5221 |
clCreateContext |
1 |
33639 |
0.00% |
33639 |
33639 |
33639 |
clCreateKernel |
1 |
11713 |
0.00% |
11713 |
11713 |
11713 |
clCreateProgramWithIL |
1 |
153337 |
0.01% |
153337 |
153337 |
153337 |
clEnqueueNDRangeKernel ( _ZTS9Matrix1_2IfE ) |
3 |
3102488 |
0.19% |
3102488 |
3102488 |
3102488 |
clEnqueueReadBufferRect |
1 |
1099684 |
0.07% |
1099684 |
1099684 |
1099684 |
clGetContextInfo |
8 |
4720 |
0.00% |
590 |
160 |
1997 |
clGetDeviceIDs |
12 |
53004 |
0.00% |
4417 |
504 |
14853 |
clGetDeviceInfo |
30 |
85695 |
0.01% |
2856 |
133 |
19920 |
clGetExtensionFunction AddressForPlatform |
3 |
6446 |
0.00% |
2148 |
1317 |
3687 |
clGetKernelInfo |
2 |
716 |
0.00% |
358 |
169 |
547 |
clGetPlatformIDs |
2 |
1198290216 |
74.69% |
599145108 |
715 |
1198289501 |
clGetPlatformInfo |
12 |
22538 |
0.00% |
1878 |
404 |
7326 |
clReleaseCommandQueue |
1 |
1744 |
0.00% |
1744 |
1744 |
1744 |
clReleaseContext |
1 |
331 |
0.00% |
331 |
331 |
331 |
clReleaseDevice |
6 |
6365 |
0.00% |
1060 |
491 |
1352 |
clReleaseEvent |
2 |
2398 |
0.00% |
1199 |
992 |
1406 |
clReleaseKernel |
1 |
2733 |
0.00% |
2733 |
2733 |
2733 |
clReleaseMemObject |
3 |
45464 |
0.00% |
15154 |
10828 |
22428 |
clReleaseProgram |
1 |
51380 |
0.00% |
51380 |
51380 |
51380 |
clRetainDevice |
6 |
8680 |
0.00% |
1446 |
832 |
2131 |
clSetKernelArg |
20 |
6976 |
0.00% |
348 |
180 |
1484 |
clSetKernelExecInfo |
3 |
1588 |
0.00% |
529 |
183 |
1149 |
clWaitForEvents |
6 |
60864855 |
3.79% |
10144142 |
928 |
60855555 |
Function Name |
Calls |
Time (ns) |
Time (%) |
Average (ns) |
Min (ns) |
Max (ns) |
---|---|---|---|---|---|---|
_ZTS9Matrix1_2IfE |
1 |
58691515 |
99.98% |
58691515 |
58691515 |
58691515 |
clEnqueueReadBufferRect |
1 |
13390 |
0.02% |
13390 |
13390 |
13390 |
The report includes detailed timing data on both your host and device.