Intel® Intercept Layer for OpenCL™ Applications
The Intercept Layer for OpenCL Applications is a tool that can intercept and modify OpenCL calls for debugging and performance analysis. Using the Intercept Layer for OpenCL Applications requires no application or driver modifications.
To operate, the Intercept Layer for OpenCL Applications masquerades as the OpenCL ICD loader (usually) or as an OpenCL implementation (rarely) and is loaded when the application intends to load the real OpenCL ICD loader. During its own initialization, the Intercept Layer for OpenCL Applications loads the real OpenCL ICD loader and gets function pointers to the real OpenCL entry points. Then, whenever the application makes an OpenCL call, the call is intercepted and can be passed through to the real OpenCL implementation with or without changes.
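The intercept-and-forward pattern described above can be illustrated with a small, self-contained Python sketch. This is not how the tool itself is implemented; the class `InterceptLayer` and the `fake_*` functions are hypothetical stand-ins for the real entry points, used only to show a wrapper that records timing and then passes the call through unchanged.

```python
import time
from collections import defaultdict


class InterceptLayer:
    """Toy intercept layer: wraps a table of 'real' entry points (hypothetical)."""

    def __init__(self, real_entry_points):
        # Analogous to loading the real ICD loader and fetching its function pointers.
        self._real = dict(real_entry_points)
        self.calls = defaultdict(int)      # call count per function name
        self.total_ns = defaultdict(int)   # accumulated host time per function name

    def __getattr__(self, name):
        # Only reached for names that are not ordinary attributes of this object.
        real_fn = self.__dict__.get("_real", {}).get(name)
        if real_fn is None:
            raise AttributeError(name)

        def intercepted(*args, **kwargs):
            start = time.perf_counter_ns()
            try:
                # Pass the call through to the real entry point without changes.
                return real_fn(*args, **kwargs)
            finally:
                self.calls[name] += 1
                self.total_ns[name] += time.perf_counter_ns() - start

        return intercepted


# Hypothetical stand-ins for real OpenCL entry points.
def fake_clGetPlatformIDs():
    return ["platform0"]


def fake_clCreateBuffer(size):
    return bytearray(size)


layer = InterceptLayer({
    "clGetPlatformIDs": fake_clGetPlatformIDs,
    "clCreateBuffer": fake_clCreateBuffer,
})
platforms = layer.clGetPlatformIDs()
buf = layer.clCreateBuffer(16)
```

The application-facing names stay the same, so the caller does not need to change; only the object it talks to is swapped, which mirrors why no application or driver modifications are required.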
To access the OpenCL Intercept Layer repository:
git clone https://github.com/intel/opencl-intercept-layer
All controls are documented in the repository: https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md
To run, use the following setup:
export CLI_OpenCLFileName=/opt/intel/inteloneapi/compiler/latest/linux/lib/libOpenCL.so.1
export LD_LIBRARY_PATH=/home/opencl-intercept-layer/build/intercept:$LD_LIBRARY_PATH
export SYCL_BE=PI_OPENCL
CLI_ReportToStderr=0 CLI_ReportToFile=1 CLI_HostPerformanceTiming=1 CLI_DevicePerformanceTiming=1 CLI_DumpDir=. ./matrix.dpcpp
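If you prefer not to export these variables into your interactive shell, the same setup can be applied to a single child process. The following is a minimal sketch in Python; the helper names `build_intercept_env` and `run_with_intercept` are hypothetical, and the variable names and values are the ones used in the commands above.

```python
import os
import subprocess


def build_intercept_env(icd_loader_path, intercept_lib_dir, base_env=None):
    """Return a copy of the environment with the intercept settings applied."""
    env = dict(os.environ if base_env is None else base_env)

    # Path to the real OpenCL ICD loader that the intercept layer should load.
    env["CLI_OpenCLFileName"] = icd_loader_path

    # Put the intercept build directory first so its library is found first.
    old_path = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = intercept_lib_dir + (":" + old_path if old_path else "")

    # Select the OpenCL backend, as in the shell setup above.
    env["SYCL_BE"] = "PI_OPENCL"

    # Controls used in the example: report to a file, with host and device timing.
    env.update({
        "CLI_ReportToStderr": "0",
        "CLI_ReportToFile": "1",
        "CLI_HostPerformanceTiming": "1",
        "CLI_DevicePerformanceTiming": "1",
        "CLI_DumpDir": ".",
    })
    return env


def run_with_intercept(command, icd_loader_path, intercept_lib_dir):
    """Run `command` (a list of arguments) with the intercept environment."""
    env = build_intercept_env(icd_loader_path, intercept_lib_dir)
    return subprocess.run(command, env=env, check=False).returncode
```

For the example above, this would be called as `run_with_intercept(["./matrix.dpcpp"], "/opt/intel/inteloneapi/compiler/latest/linux/lib/libOpenCL.so.1", "/home/opencl-intercept-layer/build/intercept")`, assuming those paths match your installation.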
This generates a file called cliintercept_report.txt that contains the totals and tables shown below.
Total Enqueues: 2
Total Time (ns): 1604325652
| Function Name | Calls | Time (ns) | Time (%) | Average (ns) | Min (ns) | Max (ns) |
|---|---|---|---|---|---|---|
| clBuildProgram | 1 | 337069812 | 21.01% | 337069812 | 337069812 | 337069812 |
| clCreateBuffer | 3 | 3393909 | 0.21% | 1131303 | 140325 | 2036170 |
| clCreateCommandQueueWithProperties | 1 | 5221 | 0.00% | 5221 | 5221 | 5221 |
| clCreateContext | 1 | 33639 | 0.00% | 33639 | 33639 | 33639 |
| clCreateKernel | 1 | 11713 | 0.00% | 11713 | 11713 | 11713 |
| clCreateProgramWithIL | 1 | 153337 | 0.01% | 153337 | 153337 | 153337 |
| clEnqueueNDRangeKernel( _ZTS9Matrix1_2IfE ) | 1 | 3102488 | 0.19% | 3102488 | 3102488 | 3102488 |
| clEnqueueReadBufferRect | 1 | 1099684 | 0.07% | 1099684 | 1099684 | 1099684 |
| clGetContextInfo | 8 | 4720 | 0.00% | 590 | 160 | 1997 |
| clGetDeviceIDs | 12 | 53004 | 0.00% | 4417 | 504 | 14853 |
| clGetDeviceInfo | 30 | 85695 | 0.01% | 2856 | 133 | 19920 |
| clGetExtensionFunctionAddressForPlatform | 3 | 6446 | 0.00% | 2148 | 1317 | 3687 |
| clGetKernelInfo | 2 | 716 | 0.00% | 358 | 169 | 547 |
| clGetPlatformIDs | 2 | 1198290216 | 74.69% | 599145108 | 715 | 1198289501 |
| clGetPlatformInfo | 12 | 22538 | 0.00% | 1878 | 404 | 7326 |
| clReleaseCommandQueue | 1 | 1744 | 0.00% | 1744 | 1744 | 1744 |
| clReleaseContext | 1 | 331 | 0.00% | 331 | 331 | 331 |
| clReleaseDevice | 6 | 6365 | 0.00% | 1060 | 491 | 1352 |
| clReleaseEvent | 2 | 2398 | 0.00% | 1199 | 992 | 1406 |
| clReleaseKernel | 1 | 2733 | 0.00% | 2733 | 2733 | 2733 |
| clReleaseMemObject | 3 | 45464 | 0.00% | 15154 | 10828 | 22428 |
| clReleaseProgram | 1 | 51380 | 0.00% | 51380 | 51380 | 51380 |
| clRetainDevice | 6 | 8680 | 0.00% | 1446 | 832 | 2131 |
| clSetKernelArg | 20 | 6976 | 0.00% | 348 | 180 | 1484 |
| clSetKernelExecInfo | 3 | 1588 | 0.00% | 529 | 183 | 1149 |
| clWaitForEvents | 6 | 60864855 | 3.79% | 10144142 | 928 | 60855555 |

| Function Name | Calls | Time (ns) | Time (%) | Average (ns) | Min (ns) | Max (ns) |
|---|---|---|---|---|---|---|
| _ZTS9Matrix1_2IfE | 1 | 58691515 | 99.98% | 58691515 | 58691515 | 58691515 |
| clEnqueueReadBufferRect | 1 | 13390 | 0.02% | 13390 | 13390 | 13390 |

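The report includes detailed timing data for both the host and the device. The first table lists the time spent in each host-side OpenCL API call, and the second lists the device execution time of each enqueued command.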
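To see how the derived columns relate to the raw numbers, the following sketch recomputes Time (%) and Average (ns) for a few host rows and ranks them by cost. The row values and the total are copied from the report above. The function name `summarize` is hypothetical, and the assumption that Time (%) is the call's time divided by Total Time (ns) is inferred from the numbers rather than taken from the tool's documentation.

```python
TOTAL_HOST_TIME_NS = 1_604_325_652  # "Total Time (ns)" from the report

# (function name, calls, total time in ns), copied from the host table
host_rows = [
    ("clBuildProgram", 1, 337_069_812),
    ("clCreateBuffer", 3, 3_393_909),
    ("clGetPlatformIDs", 2, 1_198_290_216),
    ("clWaitForEvents", 6, 60_864_855),
]


def summarize(rows, total_ns):
    """Recompute percentage and average per row, most expensive first."""
    summary = []
    for name, calls, time_ns in rows:
        summary.append({
            "name": name,
            "percent": round(100 * time_ns / total_ns, 2),
            "average_ns": time_ns // calls,  # integer average, truncated
        })
    return sorted(summary, key=lambda row: row["percent"], reverse=True)


for row in summarize(host_rows, TOTAL_HOST_TIME_NS):
    print(f'{row["name"]:<20} {row["percent"]:>6.2f}% {row["average_ns"]:>12} ns')
```

For these rows, the recomputed values match the table: clGetPlatformIDs and clBuildProgram together account for roughly 96% of host time in this run, which suggests that platform discovery and program build, not kernel execution, dominate the host-side cost of this small example.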