Intel® Trace Analyzer and Collector User and Reference Guide

ID 767272
Date 10/31/2024
Public
Document Table of Contents

CPU Cycle Counter

This is a high-resolution counter inside the CPU which counts CPU cycles. This counter is called Timer Stamp Counter (TSC) on x86/Intel®64 architectures. It can be read through an assembler instruction, so the overhead is much lower than gettimeofday(). On the other hand, these counters were never meant to measure long time intervals, so the clock speed also varies a lot, as seen earlier in Figure 5.2.

Figure 5.3 CPU Timer: Error with Linear (above x-axis) and Non-linear Interpolation (below)

Additional complications are:

Multi-CPU machines: the counter is CPU-specific, so if threads migrate from one CPU to another the clock that Intel® Trace Collector reads might jump arbitrarily. Intel® Trace Collector cannot compensate this as it would have to identify the current CPU and read the register in one atomic operation, which cannot be done from user space without considerable overhead.

CPU cycle counters might still be useful on multi-CPU systems: Linux* OS tries to set the registers of all CPUs to the same value when it boots. If all CPUs receive their clock pulse from the same source their counters do not drift apart later on and it does not matter on which CPU a thread reads the CPU register, the value will be the same one each.

This problem could be addressed by locking threads onto a specific CPU, but that could have an adverse effect on application performance and thus is not supported by Intel® Trace Collector itself. If done by the application or some other component, then care has to be taken that all threads in a process run on the same CPU, including those created by Intel® Trace Collector itself. If the application already is single-threaded, then the additional Intel® Trace Collector threads could be disabled to avoid this complication.

Frequency scaling: power-saving mode might lead to a change in the frequency of the cycle count register during the run and thus a non-linear clock drift. Machines meant for HPC probably do not support frequency scaling or will not enter power-saving mode. Even then, on Intel CPUs, TSC often continues to run at the original frequency.

Figure 5.4 getimeofday() without NTP