Intel® VTune™ Profiler
Find and Fix Performance Bottlenecks Quickly and Realize All the Value of Your Hardware
Performance Analysis for Applications & Systems
Intel® VTune™ Profiler optimizes application performance, system performance, and system configuration for AI, HPC, cloud, IoT, media, storage, and more.
- CPU, GPU, and NPU: Tune the entire application’s performance, not just the accelerated portion.
- Multilingual: Profile SYCL*, C, C++, C#, Fortran, OpenCL™ code, Python*, Google Go* programming language, Java*, .NET, Assembly, or any combination of languages.
- System or Application: Get coarse-grained system data for an extended period or detailed results mapped to source code.
- Power: Optimize performance while avoiding power- and thermal-related throttling.
Download as Part of the Toolkit
Intel VTune Profiler is included in the Intel® oneAPI Base Toolkit, which is a core set of tools and libraries for developing high-performance, data-centric applications across diverse architectures.
Download the Stand-Alone Version
A stand-alone download of Intel VTune Profiler is available. You can download binaries from Intel or choose your preferred repository.
Features
Algorithm Optimization
- Locate hot spots—the most time-consuming parts of your code.
- Use Flame Graph to visualize hot code paths and the time spent in each function and its callees.
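As a rough illustration of locating hot spots from the command line, the sketch below builds a `vtune` Hotspots collection command and the matching report command in Python. The helper names (`hotspots_command`, `hotspots_report_command`), the result directory `r000hs`, and the program `./my_app` are assumptions made for this example; check the option spellings against the command-line reference for your installed version.

```python
import shlex
from typing import List, Sequence


def hotspots_command(app: Sequence[str], result_dir: str = "r000hs") -> List[str]:
    """Build an argv list that runs a Hotspots collection on `app`.

    `app` is the program to profile followed by its own arguments.
    """
    if not app:
        raise ValueError("app must contain at least the program to profile")
    # Everything after "--" is passed to the profiled program unchanged.
    return ["vtune", "-collect", "hotspots", "-result-dir", result_dir, "--", *app]


def hotspots_report_command(result_dir: str = "r000hs") -> List[str]:
    """Build an argv list that prints the Hotspots report for a finished collection."""
    return ["vtune", "-report", "hotspots", "-result-dir", result_dir]


if __name__ == "__main__":
    # Only print the commands; nothing is executed here.
    # "./my_app" and its arguments are hypothetical.
    print(shlex.join(hotspots_command(["./my_app", "--size", "1000"])))
    print(shlex.join(hotspots_report_command()))
```

Passing either list to `subprocess.run` would launch the tool, provided `vtune` is on the PATH. Building the command as a list rather than a single string avoids shell quoting problems when the profiled program takes arguments with spaces.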
Microarchitecture and Memory Bottlenecks
- Identify the most significant hardware issues that affect the performance of your application with microarchitecture exploration analysis.
- Pinpoint memory-access-related issues such as cache misses and high-bandwidth problems.
Accelerators and XPUs
- Optimize GPU offload schema and data transfers for SYCL, OpenCL code, Microsoft DirectX*, or OpenMP* offload code. Identify the most time-consuming GPU kernels for further optimization.
- Analyze GPU-bound code for performance bottlenecks caused by microarchitectural constraints or inefficient kernel algorithms.
- Explore CPU and FPGA interactions and FPGA use.
- Understand how much data is transferred between a neural processing unit (NPU) and DDR memory and identify the most time-consuming tasks running on the NPU.
Parallelism
- Examine how efficiently the code is threaded. Identify threading issues that impact performance.
- Evaluate compute-intensive or throughput HPC applications for efficient CPU use, vectorization, and memory use.
Method for OpenMP Code Analysis
Schedule Overhead in Intel® oneAPI Threading Building Blocks (oneTBB) Applications
Platform and I/O
- Locate performance bottlenecks in I/O-intensive applications. Explore how effectively the hardware processes I/O traffic generated by external PCIe* devices or integrated accelerators.
- Get a fine-grained overview for short-running workloads with System Overview.
Multi-Node
- Characterize performance aspects of large-scale message passing interface (MPI) and OpenMP workloads.
- Identify scalability issues and get recommendations for in-depth analysis.
What's New in 2025
- Adds support for Intel® Core™ Ultra 200V processor (formerly code named Lunar Lake), Intel Core Ultra 200 processor (formerly code named Arrow Lake-S series), and 6th generation Intel Xeon Scalable processors (formerly code named Granite Rapids).
- Helps identify GPU-bound bottlenecks, optimize rendering pipelines, and improve overall application responsiveness for media and content creation applications on Intel Core Ultra 200V processors.
- Helps identify and optimize device-side inefficiencies for Microsoft DirectX* APIs.
- Adds profiling support for Python 3.11. Improves productivity by letting you focus Python profiling on only the areas of interest and control performance data collection with the Instrumentation and Tracing Technology (ITT) APIs.
For a more complete and up-to-date list, see the release notes.
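Focusing collection on an area of interest generally means resuming data collection just before the code you care about and pausing it again right after. The sketch below shows that pattern in Python with a context manager. The `resume` and `pause` callables are placeholders, not the ITT API itself: in a real setup you would pass in whatever resume and pause functions your ITT binding provides, and `region_of_interest` is a hypothetical helper name used only for this example.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


def _noop() -> None:
    """Default hook that does nothing, so the sketch runs without a profiler."""


@contextmanager
def region_of_interest(
    resume: Callable[[], None] = _noop,
    pause: Callable[[], None] = _noop,
) -> Iterator[None]:
    """Call `resume` on entry and `pause` on exit, even if the body raises."""
    resume()
    try:
        yield
    finally:
        pause()


if __name__ == "__main__":
    # Record the call order with plain lists instead of a real profiler hook.
    events: List[str] = []
    with region_of_interest(
        resume=lambda: events.append("resume"),
        pause=lambda: events.append("pause"),
    ):
        events.append("work")
    print(events)
```

The `try`/`finally` keeps collection from staying enabled when the profiled region raises an exception, so the rest of the run is not recorded by accident.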
Get Started
Download
Get Intel VTune Profiler as a stand-alone tool or as part of the Intel oneAPI Base Toolkit.
Try It Out
Get started with Intel VTune Profiler and use an introductory code sample to see how it works.
Learn Analysis Techniques
Use these learning tools and workflows to understand and analyze performance bottlenecks in your application.
Profile Machine Learning Applications
What Customers Are Saying
"Ensuring the best possible performance of systems for our users is a top priority for us. Intel VTune Amplifier helps us do that with effective workload management."
— Dennis O’Connell, senior director of performance engineering, Verizon*
"Intel VTune Profiler is an invaluable tool for identifying hotspots when optimizing code. Its user interface is easy to use and informative, quickening the pace of development. Without access to Intel VTune's line-by-line performance counters, we would never have been able to identify the reasons why our mixed-precision code was running slower than our original double-precision code."
— Dr. Perri Needham, postdoctoral researcher, Walker Molecular Dynamic Laboratory
"We recommend using Intel® MPI for best performance, and tools such as Intel VTune Profiler and Intel® Advisor to help better understand performance optimizations and how to best migrate your workloads to the cloud."
— Ilias Katsardis, HPC solution lead, Google Cloud*
"Intel VTune Profiler [helped us] to analyze code performance and further enhance it to run optimally on our products."
— Won-Chul Bang, PhD, vice president and head of product strategy, Samsung Medison*
"The Application Performance Snapshot feature of Intel VTune Profiler helped us analyze HemeLB running at 96K MPI ranks on SuperMUC-NG of the Leibniz Supercomputing Centre. It was straightforward and effective in its operation and analysis output."
— Dr. Jon McCullough, University College London
"We are always looking for new methods to accelerate workloads in our data center. Our teams used Intel VTune Profiler’s Flame Graph feature and found it intuitive to use and practical for interpreting performance data. This tool [part of the Intel oneAPI Base Toolkit] has become essential to optimizing code and workflows, and its ability to work across Intel CPUs and GPUs adds to our productivity and performance optimization efforts."
— Dr. Markus Rampp, head of HPC Applications Division and deputy director, Max Planck Computing & Data Facility
"We rely super heavily on Intel VTune Profiler and some of the other Intel products that are our primary way to understand performance at very large scale."
— Dan Stanzione, executive director, Texas Advanced Computing Center (TACC)
"Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and Vector Neural Network Instructions (VNNI) acceleration techniques and advanced debugging and profiling capabilities of Intel VTune Profiler helped Netflix* optimize and boost performance in a variety of use cases such as video encoding, microservices latency and throughput improvements, and accelerating machine learning inference tasks."
– Amer Ather, senior cloud architect, Netflix
Case Studies
Specifications
Processor:
- 3rd generation Intel® Xeon® processor family v3 (or later)
- 4th generation (or later) Intel® Core™ processor
GPUs:
- Intel® UHD Graphics for 11th generation Intel processors or newer
- Intel® Iris® Xe graphics
- Intel® Arc™ graphics
- Intel® Server GPU
- Intel® Data Center GPU Flex Series
- Intel® Data Center GPU Max Series
FPGAs:
- Intel® Arria® 10 FPGA and Intel® Stratix® FPGA
Languages:
- SYCL
- C and C++
- C#
- Fortran
- OpenCL code
- Google Go programming language
- Java
- Python
- .NET
Development environments:
- Windows*: Microsoft Visual Studio*, Visual Studio Code
- Linux*: Eclipse*
- Virtual machine support: Kernel-based virtual machine (KVM), Hyper-V*, VMware*
- Container support: Docker*, Singularity*, LXC, Apache Mesos*
- Interface: Desktop or web GUI, command line
For more information, see the system requirements.
Host operating systems:
- Windows
- Linux
Target operating systems:
- Windows
- Linux
- FreeBSD*
- Android*
- Wind River Linux*
- Yocto Project*
Compilers:
- Intel® compilers
- Microsoft* compilers
- GNU Compiler Collection (GCC)*
Threading analysis:
- OpenMP
- Intel® oneAPI Threading Building Blocks
- Native threads
Distributed environments:
- MPI (MPICH-based, Open MPI)
Get Help
Your success is our success. Access these support resources when you need assistance.
Related Tools
Intel® Advisor is a design and analysis tool that helps achieve high application performance through efficient threading, vectorization, memory use, and GPU offload on current and future Intel hardware. It supports C, C++, Fortran, DPC++, OpenMP, and Python.
- Offload Advisor: Get your code ready for efficient GPU offload even before you have the hardware.
- Automated Roofline Analysis: See performance headroom against hardware limitations and get insights for an effective optimization roadmap.
- Vectorization Advisor: Enable more vector parallelism and get guidance to improve its efficiency.
- Threading Advisor: Model, tune, and test threading design options.