Use Case and Prerequisites
You can use Intel® VTune™ Profiler to identify and analyze performance bottlenecks in a serial or parallel application. This tutorial guides you through that workflow step by step, using a sample matrix multiplication application named matrix.
Prerequisites
This tutorial requires the Intel software tools listed below. The recommended way to get them is the Intel® oneAPI Base Toolkit, which includes Intel® VTune™ Profiler 2021 and the Intel® oneAPI DPC++/C++ Compiler for free.
Intel® VTune™ Profiler 2021 or later
Intel® oneAPI DPC++/C++ Compiler
(Optional) Microsoft Visual Studio* IDE
This tutorial uses the Intel® oneAPI DPC++/C++ Compiler to establish a common baseline for analysis and performance gain tracking. Your results and workflow may differ depending on the compiler you use.
Workflow
Follow these steps to identify and fix the most prominent performance issues in the sample matrix application.
Establish the application performance baseline
Identify the main bottleneck in the matrix application
Eliminate the memory access bottleneck
Assess the performance improvement
Address the vectorization problem
Identify next steps
Visualize the performance gain
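To make the memory access step in the workflow above more concrete, here is a minimal C++ sketch of the kind of problem a matrix multiplication kernel can have. It is not the source of the matrix sample: the row-major flat storage, the loop-interchange fix, and the names Matrix, multiply_naive, multiply_interchanged, and time_ms are all assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical storage: an n x n matrix kept row-major in one flat vector.
using Matrix = std::vector<double>;

// Naive i-j-k order. The innermost loop reads b with stride n (down a column),
// so consecutive iterations usually land on different cache lines.
void multiply_naive(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

// Interchanged i-k-j order. The innermost loop now reads b and updates c with
// stride 1 (along a row), which is friendlier to the cache and to vectorization.
void multiply_interchanged(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];  // loop-invariant for the j loop
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}

// Wall-clock time of one kernel call, in milliseconds. This is a rough
// stand-in for a baseline measurement, not a replacement for a profiler.
template <typename Kernel>
double time_ms(Kernel kernel, const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    const auto start = std::chrono::steady_clock::now();
    kernel(a, b, c, n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Both functions accumulate into c, so the output matrix should be zero-initialized before each call. Calling time_ms with each kernel on the same inputs gives a rough before-and-after comparison; the size of any gain depends on the matrix size, the compiler, the optimization flags, and the hardware.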