Analyzing OpenCL™ Kernel Performance
To analyze OpenCL™ kernel performance with the Intel® SDK for OpenCL™ Applications Standalone Version, do the following:
- Click the Analyze button.
- Click Refresh kernel(s) to get the list of kernels in the currently open *.cl file.
- Select the target kernel from the pull-down menu. If only one kernel is available, it is selected by default.
- Click cells in the Assigned Variables column to create or add variables as kernel arguments. You can assign one-dimensional variables (such as integer, float, char, half, and so on) on the fly by typing single values into the table. See section "Creating Variables" for details.
- Set the number of iterations, the global size, and the local sizes per workload dimension in the Workgroup size definitions group box.
- Click Analyze to wrap the selected kernel and run the analysis.
You can use the local size(s) text boxes for several different test configurations:
- Set a single size value for a single test.
- Add several comma-separated sizes for multiple tests.
- Set 0 to use the default framework-assigned local size.
- Click Auto to have the tool iterate over all sizes that are smaller than the global size and the device maximum local size.
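The four options above can be read as rules for expanding one local-size entry into a list of test runs. The following is a minimal sketch of that reading, not the tool's actual implementation: the function names, the use of the string "auto" to stand for the Auto button, and the choice of `None` to mean "framework-assigned" are all hypothetical. It also assumes that Auto only tries sizes that evenly divide the global size and that the size limits are inclusive.

```python
def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def expand_local_sizes(field, global_size, device_max_local):
    """Turn one local-size entry into the list of sizes to test.

    None in the result stands for "let the OpenCL framework choose".
    All names here are illustrative; the tool itself is driven from the GUI.
    """
    text = field.strip().lower()
    if text == "auto":
        # Assumption: keep only sizes that evenly divide the global size
        # and do not exceed the global size or the device maximum.
        limit = min(global_size, device_max_local)
        return [d for d in divisors(global_size) if d <= limit]
    # A single value or a comma-separated list; 0 maps to the default.
    sizes = [int(part) for part in text.split(",") if part.strip()]
    return [None if s == 0 else s for s in sizes]
```

For example, under these assumptions `expand_local_sizes("16, 32, 64", 1024, 256)` yields three tests, while `expand_local_sizes("0", 1024, 256)` yields a single test with the framework-assigned size.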
Also consider the following:
- Each option is available for each dimension.
- To analyze the kernel under its designed conditions, set a single value.
- To find the local size that provides the best performance, click Auto or set a list of comma-separated values.
- To improve the analysis accuracy, run each global and local work size combination several times by increasing the Number of iterations value. Running several iterations reduces the impact of other system processes or tasks on the measured kernel execution time.
- Use the Device Information dialog to compare device properties and choose the appropriate device for the kernel.
- When running analysis on the Experimental OpenCL 2.1 Platform, you may use the local work-group size as described in the OpenCL 2.0 specification:
  - The local work-group size doesn't have to be a divisor of the global work-group size.
  - When you choose Auto, the analysis runs all divisors of the global work-group size and all powers of 2 smaller than the global work-group size.
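As a rough illustration of the last two notes and of the Number of iterations advice, the sketch below enumerates the Auto candidates described for the Experimental OpenCL 2.1 Platform and then picks a local size from repeated timings. It is a sketch under stated assumptions, not the tool's code: all function names are hypothetical, `time_once` is a hypothetical callback that returns one measured execution time, and using the median to combine iterations is an assumption, since the documentation does not say how the tool aggregates runs.

```python
from statistics import median


def auto_candidates_nonuniform(global_size):
    """Divisors of global_size plus powers of 2 below it, ascending."""
    sizes = {d for d in range(1, global_size + 1) if global_size % d == 0}
    power = 1
    while power < global_size:
        sizes.add(power)
        power *= 2
    return sorted(sizes)


def remainder_group_size(global_size, local_size):
    """Size of the trailing, smaller work-group (0 if the split is even)."""
    return global_size % local_size


def fastest_local_size(candidates, time_once, iterations):
    """Time each candidate `iterations` times; return the lowest median.

    time_once(local_size) is a hypothetical callback returning one
    measured kernel execution time in seconds.
    """
    medians = {
        size: median(time_once(size) for _ in range(iterations))
        for size in candidates
    }
    return min(medians, key=medians.get)
```

With a global size of 10, the candidates are 1, 2, 4, 5, 8, and 10. A local size of 4 does not divide 10, so the last work-group would hold only 2 work-items, which is the non-uniform case that the OpenCL 2.0 specification permits.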