Intel® VTune™ Profiler

User Guide

ID 766319
Date 3/22/2024
Public



Highly Accurate CPU Time Data Collection

Configure the Intel® VTune™ Profiler on Windows* OS to get highly accurate CPU time data in the user-mode sampling and tracing results.

By default, VTune Profiler measures CPU time at the granularity of the OS scheduler tick. As a result, CPU time values may be inaccurate for targets that execute in quanta shorter than the OS scheduler tick interval (for example, frame-by-frame computation in video decoders).

Highly accurate CPU time collection is available for the user-mode sampling and tracing analysis types (Hotspots and Threading). It is enabled by default in the predefined analysis configurations when you run both VTune Profiler and the application you analyze with administrator privileges.

To collect more accurate CPU time information, VTune Profiler uses the Event Tracing for Windows* (ETW) capability. For example, without ETW, a sample is taken every 10 ms. For each sample, VTune Profiler queries the OS for the amount of time the thread has executed and calculates the difference from the previous sample, which gives the delta. The information the OS returns through this mechanism has coarse granularity. VTune Profiler sums these deltas and displays the total in the user interface. With ETW enabled, however, VTune Profiler uses the context switch information acquired from ETW to filter out time spent executing other threads and to calculate accurately the time of the monitored threads within each 10 ms sample. With this additional information, the CPU time metric calculated for each function or thread is more accurate.
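The following Python sketch illustrates the idea behind the context-switch-based calculation under simplifying assumptions: a thread's context switch events have already been reduced to (switch-in, switch-out) run intervals in milliseconds, and CPU time is attributed to fixed 10 ms sample windows by clipping each interval to the window boundaries. The function name `cpu_time_per_window` and the interval format are hypothetical; this is not VTune Profiler's internal implementation.

```python
from typing import List, Tuple

# Hypothetical format: (switched_in_ms, switched_out_ms) for one monitored thread,
# assumed to be derived from context switch events.
Interval = Tuple[float, float]


def cpu_time_per_window(runs: List[Interval], window_ms: float, total_ms: float) -> List[float]:
    """Return the CPU time (ms) the thread actually ran inside each sample window.

    Each run interval is clipped to every window it overlaps, so time spent
    by other threads in the same window is never counted for this thread.
    """
    n_windows = int(total_ms // window_ms)
    per_window = [0.0] * n_windows
    for start, end in runs:
        for w in range(n_windows):
            w_start, w_end = w * window_ms, (w + 1) * window_ms
            overlap = min(end, w_end) - max(start, w_start)
            if overlap > 0:
                per_window[w] += overlap
    return per_window


# Three runs of the monitored thread within 30 ms, split into 10 ms sample windows.
print(cpu_time_per_window([(2.0, 5.0), (8.0, 13.0), (25.0, 26.5)], 10.0, 30.0))
# [5.0, 3.0, 1.5]
```

The run from 8 ms to 13 ms crosses a window boundary, so it contributes 2 ms to the first window and 3 ms to the second, which is the kind of sub-sample precision that coarse per-sample OS queries cannot provide.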

VTune Profiler needs exclusive access to the Microsoft* NT Kernel Logger. Therefore, only one VTune Profiler collection at a time can run in this mode on the system, and no other tool can use the NT Kernel Logger during that collection. If VTune Profiler cannot get access to the NT Kernel Logger, the collection continues with this mode disabled.

This type of collection requires more processing time and disk space. VTune Profiler may generate up to 5 MB of temporary data per minute per logical CPU, depending on the system configuration and the profiled target.
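As a rough planning aid, the documented worst case of 5 MB per minute per logical CPU scales linearly with both collection duration and the number of logical CPUs. The helper below is a hypothetical sketch of that arithmetic, not a VTune Profiler setting; actual usage depends on the system and the target and may be much lower.

```python
def max_temp_data_mb(logical_cpus: int, minutes: float, mb_per_cpu_minute: float = 5.0) -> float:
    """Upper bound on temporary data for highly accurate CPU time collection.

    Assumes the documented worst case of about 5 MB per minute per logical CPU.
    """
    return mb_per_cpu_minute * minutes * logical_cpus


# A 10-minute collection on a machine with 16 logical CPUs.
print(max_temp_data_mb(16, 10))  # 800.0
```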

The effect of enabling highly accurate CPU time collection depends on what is executing on the system during data collection and on the structure of your application. In some cases, the variation between "normal" and "highly accurate" CPU time is about 3%, but in corner cases the difference can be as high as 30% or 40%. For example, without ETW, if a thread is executing but happens to be inactive at every 10 ms point when a sample is taken, the results grossly understate its execution time. Conversely, if a thread is mostly inactive but runs exactly at the 10 ms sampling frequency, it may appear to consume a large amount of time when in reality it does not.

The best approach is to test it yourself, if possible: collect Basic Hotspots data with and without this option enabled and compare the results. This tells you whether results collected without highly accurate CPU time are accurate enough to direct your optimization efforts, or whether you need administrator privileges to enable this option. However, if your corporation's policies prevent you from using highly accurate CPU time, you can, in general, be confident that an analysis of your application's performance based on "normal" Hotspots data collection is valid.
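The two corner cases described above can be reproduced with a simplified model, sketched below in Python. For illustration only, it assumes that tick-based accounting charges an entire 10 ms tick to whichever thread is running at the tick instant, while ETW-style accounting sums exact run intervals taken from context switches. Real Windows time accounting is more involved, and the model and all names in it are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_ms, end_ms), half-open [start, end)


def exact_cpu_ms(runs: List[Interval]) -> float:
    """ETW-style estimate: sum of precise switch-in/switch-out intervals."""
    return sum(end - start for start, end in runs)


def tick_cpu_ms(runs: List[Interval], tick_ms: float, total_ms: float) -> float:
    """Simplified tick-based estimate: each tick charges a full tick
    to the thread only if it is running at that exact instant."""
    charged = 0.0
    for k in range(int(total_ms // tick_ms)):
        t = k * tick_ms
        if any(start <= t < end for start, end in runs):
            charged += tick_ms
    return charged


PERIOD, TOTAL = 10.0, 100.0
# Thread A: mostly inactive, but runs for 1 ms starting exactly on every tick.
thread_a = [(k * PERIOD, k * PERIOD + 1.0) for k in range(10)]
# Thread B: busy 9 ms of every period, but never running when a tick occurs.
thread_b = [(k * PERIOD + 1.0, (k + 1) * PERIOD) for k in range(10)]

for name, runs in (("A", thread_a), ("B", thread_b)):
    print(name, exact_cpu_ms(runs), tick_cpu_ms(runs, PERIOD, TOTAL))
# A 10.0 100.0
# B 90.0 0.0
```

In this model, thread A actually runs for 10 ms out of 100 ms but is charged 100 ms, while thread B runs for 90 ms and is charged nothing. Real workloads rarely align this perfectly with the tick, which is why the typical difference is much smaller.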

To disable highly accurate CPU time collection for a custom analysis:

  1. Create a new custom analysis (based on an existing configuration such as Hotspots or Threading).
  2. Deselect the Collect highly accurate CPU time option.