Architecture Agnostic Spin-Wait Loops

ID 672290
Updated 4/26/2018
Version Latest
Public


To fully utilize the power of today's multicore processors, game developers are using more advanced tasking systems that distribute work across multiple threads in a thread pool. As the thread count increases, so does the chance of contention between threads on shared resources such as job queue locks. There are many ways to handle this contention, but a common construct is the spin-wait loop.

while (!acquire_lock())
{
	// Spin for a specific number of iterations
	for (int i = 0; i < max_spin_count; i++)
	{
		// Pause intrinsic
		_mm_pause();
	}
}

The _mm_pause intrinsic, which compiles to the PAUSE instruction, is used here because it hints to the processor that the calling thread is in a spin-wait loop. The processor briefly delays execution of the next instruction, so the waiting thread places less demand on the pipeline and consumes less power.

The latency of _mm_pause was similar on most Intel platforms for several generations, and because of this historical consistency, many developers tuned their spin loops with that latency in mind. However, starting with the 6th Generation Intel® Core™ i processor family, the latency of the pause instruction increased by an order of magnitude to provide better power-saving opportunities in many scenarios.
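To see which latency a particular machine exhibits, the pause instruction can be timed with the time stamp counter. The following is a minimal sketch, not taken from the article: the helper name average_pause_ticks is hypothetical, the header selection assumes an x86 target built with MSVC, GCC, or Clang, and the result includes loop overhead, so it should be read as a rough estimate only.

```cpp
#include <cstdint>

#if defined(_MSC_VER)
#include <intrin.h>     // _mm_pause and __rdtsc on MSVC
#else
#include <x86intrin.h>  // _mm_pause and __rdtsc on GCC/Clang
#endif

// Hypothetical helper (not from the article): returns the average number of
// time stamp counter ticks consumed per _mm_pause call, loop overhead included.
inline uint64_t average_pause_ticks(uint32_t iterations)
{
    if (iterations == 0)
    {
        return 0;  // Nothing to measure; also avoids dividing by zero
    }

    const uint64_t start = __rdtsc();
    for (uint32_t i = 0; i < iterations; ++i)
    {
        _mm_pause();
    }
    const uint64_t elapsed = __rdtsc() - start;

    return elapsed / iterations;
}
```

Calling average_pause_ticks with a large iteration count on two different processor generations should make any latency difference visible, which is the variation a fixed-count spin loop silently inherits.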

As a result of this latency change, the fixed-count spin loop above now consumes an order of magnitude more cycles, which could be detrimental to the performance of your application. To avoid issues with future architectural changes to this instruction, check that your spin-wait loops are not implemented with a fixed count of pause instructions. An appropriate modification to the above spin-wait loop would be:

while (!acquire_lock())
{
	// __rdtsc intrinsic is used to read the time stamp counter
	// This allows the loop to run for a fixed number of cycles
	uint64_t prev = __rdtsc();
	do
	{
		// Pause intrinsic
		_mm_pause();
	}
	while ((__rdtsc() - prev) < max_spin_time);
}

While the above spin-wait loop is very simple, and software developers would typically use more advanced spin loops with techniques such as exponential backoff, it does show how to make software more robust against future architectural changes in instruction latencies.
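As an illustration of the exponential backoff idea mentioned above, the sketch below doubles a time-based spin budget after each failed lock attempt, up to a cap. It is one possible approach under stated assumptions, not the article's implementation: the names BackoffSpinLock, spin_for_ticks, kMinSpinTicks, and kMaxSpinTicks are hypothetical, the tick limits are arbitrary placeholders that would need tuning, and std::atomic_flag stands in for the acquire_lock() call used earlier.

```cpp
#include <atomic>
#include <cstdint>

#if defined(_MSC_VER)
#include <intrin.h>     // _mm_pause and __rdtsc on MSVC
#else
#include <x86intrin.h>  // _mm_pause and __rdtsc on GCC/Clang
#endif

// Hypothetical helper: spin with _mm_pause until at least `ticks` time stamp
// counter ticks have elapsed, independent of the pause instruction's latency.
inline void spin_for_ticks(uint64_t ticks)
{
    const uint64_t start = __rdtsc();
    do
    {
        _mm_pause();
    }
    while ((__rdtsc() - start) < ticks);
}

// Hypothetical spin lock with exponential backoff (illustrative values only).
class BackoffSpinLock
{
public:
    void lock()
    {
        uint64_t budget = kMinSpinTicks;
        // test_and_set returns the previous value: true means already held
        while (flag_.test_and_set(std::memory_order_acquire))
        {
            spin_for_ticks(budget);
            if (budget < kMaxSpinTicks)
            {
                budget *= 2;  // Back off: wait twice as long next time
            }
        }
    }

    bool try_lock()
    {
        return !flag_.test_and_set(std::memory_order_acquire);
    }

    void unlock()
    {
        flag_.clear(std::memory_order_release);
    }

private:
    static constexpr uint64_t kMinSpinTicks = 64;         // Assumed starting budget
    static constexpr uint64_t kMaxSpinTicks = 64 * 1024;  // Assumed upper bound
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```

Because each wait is bounded by elapsed ticks rather than by a count of pause instructions, the time spent between lock attempts should stay roughly the same even if the pause latency changes again.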

 

Please read the following article for a more detailed look into the _mm_pause instruction and spin-wait loops: Benefitting Power and Performance Sleep Loops

Further information about programming for Intel® architecture can be found in the Software Development Manuals: Intel® 64 and IA-32 Architectures Software Developer Manuals