Use Case and Prerequisites
You can use Intel® VTune™ Profiler to identify and analyze performance bottlenecks in a serial or parallel application by following a workflow. This tutorial guides you through the workflow steps using a sample matrix multiplication application named matrix.
Prerequisites
This tutorial requires the following Intel software tools, which you can download and use for free:
Intel® VTune™ Profiler 2021 or later
Intel® C++ Compiler Classic
Follow these links to download the components:
This tutorial uses the Intel® C++ Compiler Classic to establish a common baseline for analysis and performance gain tracking. Your results and workflow may be different depending on the compiler you use.
Workflow
Follow these steps to identify and fix the most prominent performance issues in the sample matrix application.
Establish the application performance baseline
Identify the main bottleneck in the matrix application
Eliminate the memory access bottleneck
Assess the performance improvement
Address the vectorization problem
Identify next steps
Visualize the performance gain
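The workflow names a memory access bottleneck and a vectorization problem. As a rough illustration of how such issues commonly arise in matrix multiplication, the sketch below contrasts two loop orders over the same data. It is not the code of the matrix sample: the function names multiply_naive and multiply_interchanged, the Matrix alias, and the row-major single-buffer layout are all assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: an n x n matrix stored row-major in one contiguous buffer.
using Matrix = std::vector<double>;

// Naive i-j-k order. The inner loop reads b[k * n + j] with k changing,
// so consecutive reads of b are n elements apart (strided access).
// c must be zero-initialized by the caller.
void multiply_naive(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

// Interchanged i-k-j order. The inner loop now walks b and c along a row
// (unit stride), which is friendlier to caches and easier for a compiler
// to vectorize. c must be zero-initialized by the caller.
void multiply_interchanged(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];  // invariant across the inner loop
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```

Both functions compute the same product; only the order in which memory is touched differs. Whether the second version is actually faster, and by how much, depends on the matrix size, the compiler, and the hardware, which is what the baseline and assessment steps of the workflow are meant to measure.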