Intel® VTune™ Profiler

User Guide

ID 766319
Date 12/20/2024
Public

Analyze Linux Kernel I/O

Use the Input and Output analysis of Intel® VTune™ Profiler to match user-level code to I/O operations executed by the hardware.

This collection mode uses hardware event-based sampling collection and system-wide Ftrace* collection to provide a consistent view of the storage system combined with hardware events, as well as an easy-to-use method to match user-level source code to I/O operations executed by the hardware.

NOTE:

This analysis relies on data provided by the kernel block driver sub-system. If your platform uses a non-standard block driver sub-system, for example user-space storage drivers, I/O metrics are not available in this analysis type.

VTune Profiler provides the following system-wide metrics for the kernel I/O analysis:

  • I/O Wait — this system-wide metric represents the amount of time during which the CPU cores were idle due to threads being in an I/O wait state.

  • I/O Queue Depth — this metric shows the number of I/O requests submitted to the storage device. If the number of requests in a queue is zero, this means that there are no requests scheduled, and the disk is not utilized at all.

  • I/O Data Transfer — this metric shows the number of bytes read from or written to the storage device(s).

  • Page Faults — this metric shows the number of page faults that have occurred on the system. It is particularly useful when analyzing access to memory-mapped files.

  • CPU Activity — this metric represents the portion of time the system spent in one of the following states:

    • Idle state — the CPU core is idle

    • Active state — the CPU core is executing a thread

    • I/O Wait — the CPU core is idle, but there is a thread that could potentially be executed on this core that is blocked by disk access.
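
To make the three CPU Activity states concrete, here is a minimal Python sketch that splits elapsed CPU time into idle, active, and I/O wait shares. It assumes two snapshots of the aggregate `cpu` line from the Linux `/proc/stat` file as input. This is only an illustration of the state breakdown. VTune Profiler gathers its data through its own collectors, and the function name `cpu_activity` is hypothetical.

```python
# Illustration only: not how VTune Profiler computes CPU Activity.
# Input: two aggregate "cpu" lines in /proc/stat format, taken at different times.
# Column order after the "cpu" label: user nice system idle iowait irq softirq steal ...

def cpu_activity(before: str, after: str) -> dict:
    """Return the idle, active, and I/O wait shares between two snapshots."""
    def fields(line: str) -> list:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(value) for value in parts[1:]]

    delta = [b - a for a, b in zip(fields(before), fields(after))]
    idle, iowait = delta[3], delta[4]
    # Guest time is already included in user time, so sum only the first 8 columns.
    total = sum(delta[:8])
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    active = total - idle - iowait
    return {
        "idle": idle / total,
        "active": active / total,
        "io_wait": iowait / total,
    }

# Sample snapshots (made-up numbers for the example).
before = "cpu 1000 0 500 8000 500 0 0 0 0 0"
after = "cpu 1300 0 600 8400 700 0 0 0 0 0"
shares = cpu_activity(before, after)
print(shares)
```

On a live Linux system, the two input lines could be read from `/proc/stat` a few seconds apart.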

All I/O metrics collected by VTune Profiler, such as I/O Wait Time, I/O Waits, and I/O Queue Depth, are collected in a system-wide mode and are not target-specific.

Analyze I/O Wait Time

To analyze I/O Wait Time, start with the Summary window. This window provides a quick overview of the target system performance and introduces the I/O Wait Time metric that helps you identify whether your application is I/O-bound.

The I/O Wait Time metric represents the portion of time during which threads are in an I/O wait state while the system has cores in an idle state. In this case, the number of such threads is not greater than the number of idle cores. This aggregated I/O Wait Time metric is an integral of the I/O Wait metric that is available in the Timeline pane of the Bottom-up window.
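
The relationship between the two metrics can be sketched as a numerical integration. The following Python example assumes a list of hypothetical `(timestamp, io_wait_fraction)` samples and uses the trapezoidal rule. The exact aggregation that VTune Profiler applies is not described here, so treat both the sampling format and the integration rule as assumptions.

```python
# Sketch: aggregate a sampled I/O Wait series into a single I/O Wait Time value.
# The sample format and the trapezoidal rule are assumptions for illustration.

def io_wait_time(samples):
    """samples: list of (timestamp_seconds, io_wait_fraction), sorted by time.
    Returns the integral of the I/O Wait fraction over time, in seconds."""
    total = 0.0
    for (t0, w0), (t1, w1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by timestamp")
        total += (t1 - t0) * (w0 + w1) / 2.0
    return total

# Made-up samples: I/O Wait rises to 100% and then drops back to 0%.
samples = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 0.0)]
wait_seconds = io_wait_time(samples)
elapsed = samples[-1][0] - samples[0][0]
print(wait_seconds, wait_seconds / elapsed)
```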

To estimate how quickly storage requests are served by the kernel sub-system, see the Disk Input and Output Histogram. Use the Operation Type drop-down menu to select the type of I/O operation you are interested in. For example, for I/O writes, the histogram may show 2-4 storage requests that took 0.06 seconds or longer, which VTune Profiler classifies as slow.

To explore this type of I/O request in greater detail, switch to the Bottom-up window.
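
The idea behind the histogram classification can be sketched in a few lines of Python. The example assumes a hypothetical list of `(operation, duration)` records and reuses the 0.06-second value from the write example above as the slow threshold. The thresholds that the tool actually applies may differ by operation type and configuration.

```python
from collections import Counter

# Assumption: 0.06 s comes from the write example in the text and is used
# here only for illustration.
SLOW_THRESHOLD_S = 0.06

def classify(requests, slow_threshold=SLOW_THRESHOLD_S):
    """requests: iterable of (operation, duration_seconds).
    Returns a Counter keyed by (operation, label)."""
    counts = Counter()
    for operation, duration in requests:
        label = "slow" if duration >= slow_threshold else "not slow"
        counts[(operation, label)] += 1
    return counts

# Made-up request durations.
requests = [
    ("write", 0.002),
    ("write", 0.071),
    ("read", 0.010),
    ("write", 0.060),
    ("read", 0.090),
]
counts = classify(requests)
for key in sorted(counts):
    print(key, counts[key])
```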

Analyze Slow I/O Requests

In the Bottom-up window, select an area of interest on the timeline, then use the Zoom In and Filter by Selection context menu option. The Summary histogram is updated to show the data for the selected time range.

In this example, there were 2-4 slow write requests executed during the 6th second of application execution.

By zooming in on an area of interest, you can get a closer look at different metrics and understand the reason behind high I/O wait time.

VTune Profiler collects context switches of the I/O Wait type, which are caused by I/O accesses from a thread, and provides a system-wide I/O Wait metric in the CPU Activity area. Use this data to identify an imbalance between I/O and compute operations.

System-wide I/O Wait shows the time during which the system cores were idle, but there were threads in a context switch due to I/O access. Use this metric to estimate the dependency of performance on the storage medium.

For example, an I/O Wait value of 100% means that all cores of the system are idle, but there are threads blocked by I/O requests. To solve this issue, change the logic of the application to run compute threads in parallel with I/O tasks. Alternatively, consider using faster storage.

An I/O Wait value of 0% could mean one of the following:

  • Regardless of the number of threads blocked on storage access, all CPU cores are actively executing application code.

  • No threads are blocked on storage access.
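
The recommendation to run compute work in parallel with I/O tasks can be sketched as a simple prefetching pipeline. This is one possible pattern, not a prescribed fix. The helper names (`read_chunk`, `process`, `pipeline`) are hypothetical, and SHA-256 hashing stands in for the application's compute step.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib
import io

def read_chunk(stream, size):
    """I/O step: blocks until the next chunk is available."""
    return stream.read(size)

def process(chunk):
    """Compute step: a stand-in for real application work."""
    return hashlib.sha256(chunk).hexdigest()

def pipeline(stream, chunk_size=1 << 20):
    """Prefetch the next chunk on a worker thread while the caller
    processes the current one, so compute overlaps with I/O."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_chunk, stream, chunk_size)
        while True:
            chunk = pending.result()
            if not chunk:
                break
            # Start the next read before computing on the current chunk.
            pending = pool.submit(read_chunk, stream, chunk_size)
            results.append(process(chunk))
    return results

# In-memory stream used so the example runs anywhere.
data = bytes(range(256)) * 16  # 4096 bytes
digests = pipeline(io.BytesIO(data), chunk_size=1024)
print(len(digests))
```

In a real application, the stream would typically be a file opened in binary mode. Only one read is in flight at a time, so the stream is never accessed by two threads at once.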

Explore the I/O Queue Depth area to see the number of storage requests submitted to the storage device. Spikes correspond to the maximum number of requests. Zero-value gaps on the I/O Queue Depth chart correspond to points in the application run when storage was not utilized at all.
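
To illustrate how spikes and zero-value gaps arise, the following Python sketch rebuilds a queue depth timeline from hypothetical `(submit_time, complete_time)` pairs. The input format and function names are assumptions made for this example and do not reflect the internal data model of VTune Profiler.

```python
# Sketch: derive queue depth over time from request submit/complete times.

def queue_depth_timeline(requests):
    """requests: list of (submit_time, complete_time) in seconds.
    Returns a list of (time, depth) pairs, one per event."""
    events = []
    for submit, complete in requests:
        if complete <= submit:
            raise ValueError("a request must complete after it is submitted")
        events.append((submit, 1))
        events.append((complete, -1))
    # At equal timestamps, handle completions before submissions.
    events.sort(key=lambda event: (event[0], event[1]))
    depth = 0
    timeline = []
    for time, change in events:
        depth += change
        timeline.append((time, depth))
    return timeline

def idle_gaps(timeline):
    """Return (start, end) intervals during which the queue depth was zero."""
    gaps = []
    for (t0, d0), (t1, _) in zip(timeline, timeline[1:]):
        if d0 == 0 and t1 > t0:
            gaps.append((t0, t1))
    return gaps

# Made-up requests: three overlapping requests, a pause, then one more.
requests = [(0.0, 0.4), (0.1, 0.3), (0.2, 0.5), (1.0, 1.2)]
timeline = queue_depth_timeline(requests)
peak = max(depth for _, depth in timeline)
gaps = idle_gaps(timeline)
print(peak, gaps)
```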

To identify the exact points in time when slow I/O packets were scheduled for execution, enable the Slow markers for the I/O Queue Depth metric.

To identify points of high bandwidth, analyze the I/O Data Transfer area that shows the number of bytes read from or written to the storage device.
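
As a rough illustration of locating a bandwidth peak, the sketch below sums transferred bytes per time interval and operation type. The record format `(timestamp, operation, byte_count)` and the one-second interval are assumptions chosen for the example.

```python
from collections import defaultdict

def bytes_per_interval(transfers, interval_s=1.0):
    """transfers: iterable of (timestamp_seconds, operation, byte_count).
    Returns {(interval_index, operation): total_bytes}."""
    totals = defaultdict(int)
    for timestamp, operation, byte_count in transfers:
        bucket = int(timestamp // interval_s)
        totals[(bucket, operation)] += byte_count
    return dict(totals)

# Made-up transfer records.
transfers = [
    (0.2, "read", 4096),
    (0.7, "write", 8192),
    (1.1, "write", 65536),
    (1.9, "write", 65536),
    (2.5, "read", 4096),
]
totals = bytes_per_interval(transfers)
peak_key = max(totals, key=totals.get)
print(peak_key, totals[peak_key])
```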

Analyze Call Stack for I/O Functions

VTune Profiler instruments all user-space I/O functions. This enables you to correlate slow I/O requests with instrumented user-space activities. You can do that by examining the full call stack that points to the exact API invocation.

To view a Task Time call stack for a particular I/O call, select the required I/O API marker on the timeline and explore the stack in the Call Stack pane.