Intel® Inspector User Guide for Windows* OS

ID 767798
Date 10/31/2024

APIs for Custom Synchronization

While the Intel Inspector supports a significant portion of the Windows* OS and POSIX* APIs, it is often useful to define your own synchronization constructs. The Intel Inspector does not normally track custom constructs that you build yourself. However, it provides synchronization APIs that let you supply semantic information about those constructs.

Synchronization constructs can generally be modeled as a series of signals: one or more threads wait for a signal from another group of threads before proceeding with some action. The synchronization APIs record when a thread is about to send a signal and when a thread has received it.

Using User-Defined Synchronization APIs in Your Code

__itt_sync_acquired

   C/C++:
      void __itt_sync_acquired(void *addr)

   Fortran:
      subroutine itt_sync_acquired(addr)
         integer(kind=itt_ptr), intent(in), value :: addr
      end subroutine itt_sync_acquired

   Purpose: Tell the Intel Inspector that the code received a signal on the specified synchronization object.

__itt_sync_releasing

   C/C++:
      void __itt_sync_releasing(void *addr)

   Fortran:
      subroutine itt_sync_releasing(addr)
         integer(kind=itt_ptr), intent(in), value :: addr
      end subroutine itt_sync_releasing

   Purpose: Tell the Intel Inspector that the code is about to send a signal on the specified synchronization object.

__itt_sync_destroy

   C/C++:
      void __itt_sync_destroy(void *addr)

   Fortran:
      subroutine itt_sync_destroy(addr)
         integer(kind=itt_ptr), intent(in), value :: addr
      end subroutine itt_sync_destroy

   Purpose: Tell the Intel Inspector that the synchronization object will not be used again, so the Intel Inspector can dispose of the bookkeeping information associated with it.
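The following C code is a minimal sketch of how the three calls fit together. It models a one-shot event in which one thread sends a signal and another waits for it. It assumes C11 atomics and POSIX threads so it can be built outside Windows. The SYNC_* macros, the USE_ITT switch, and the Event functions are hypothetical names introduced for illustration and are not part of the ITT API. Without USE_ITT, the annotations compile to nothing.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical switch: build with -DUSE_ITT and the Intel ITT headers and
   library to emit real annotations; otherwise they compile to no-ops. */
#ifdef USE_ITT
#include <ittnotify.h>
#define SYNC_ACQUIRED(a)  __itt_sync_acquired((void *)(a))
#define SYNC_RELEASING(a) __itt_sync_releasing((void *)(a))
#define SYNC_DESTROY(a)   __itt_sync_destroy((void *)(a))
#else
#define SYNC_ACQUIRED(a)  ((void)(a))
#define SYNC_RELEASING(a) ((void)(a))
#define SYNC_DESTROY(a)   ((void)(a))
#endif

/* Hypothetical one-shot event; its address is the addr passed to the APIs. */
typedef struct {
    atomic_int signaled;            /* 0 until the event is signaled */
} Event;

Event *event_create(void)
{
    Event *ev = malloc(sizeof *ev);
    if (ev != NULL)
        atomic_init(&ev->signaled, 0);
    return ev;
}

void event_signal(Event *ev)
{
    SYNC_RELEASING(ev);             /* about to send the signal */
    atomic_store_explicit(&ev->signaled, 1, memory_order_release);
}

void event_wait(Event *ev)
{
    while (!atomic_load_explicit(&ev->signaled, memory_order_acquire))
        sched_yield();              /* not signaled yet */
    SYNC_ACQUIRED(ev);              /* the signal has been received */
}

void event_destroy(Event *ev)
{
    SYNC_DESTROY(ev);               /* before free: the address may be reused */
    free(ev);
}

static int shared_value;            /* published through the event */

static void *producer(void *arg)
{
    shared_value = 42;
    event_signal((Event *) arg);
    return NULL;
}

/* Starts a producer thread, waits for its signal, and returns the value. */
int event_demo(void)
{
    Event *ev = event_create();
    pthread_t t;
    if (ev == NULL)
        return -1;
    if (pthread_create(&t, NULL, producer, ev) != 0) {
        event_destroy(ev);
        return -1;
    }
    event_wait(ev);
    int result = shared_value;      /* ordered after the producer's write */
    pthread_join(t, NULL);
    event_destroy(ev);
    return result;
}
```

The destroy call comes before free. Once memory is freed, a later allocation could reuse the same address for an unrelated object.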

The addr parameter is simply a value that uniquely identifies the synchronization object being modeled. Unique values allow the Intel Inspector to track distinct custom synchronization objects. To use the same custom object to protect access in different parts of your code, pass the same addr value at each of those annotation sites.

A custom synchronization construct may involve any number of synchronization objects, so each object must be identified by a unique memory address, which the synchronization APIs use to track it. You can track any number of synchronization objects at the same time, as long as each object uses a unique memory pointer. This is similar to waiting on several handles with the WaitForMultipleObjects function in the Windows* OS API. You can build more complex synchronization constructs from a group of synchronization objects.
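The sketch below illustrates tracking several objects at once. Each worker signals through its own flag, and the waiting thread acquires each flag by that flag's own address, much like waiting on every handle passed to WaitForMultipleObjects. The names done_flag, wait_all_demo, SYNC_*, and USE_ITT are hypothetical. The code assumes C11 atomics and POSIX threads rather than Windows handles.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical switch: -DUSE_ITT emits real ITT annotations. */
#ifdef USE_ITT
#include <ittnotify.h>
#define SYNC_ACQUIRED(a)  __itt_sync_acquired((void *)(a))
#define SYNC_RELEASING(a) __itt_sync_releasing((void *)(a))
#define SYNC_DESTROY(a)   __itt_sync_destroy((void *)(a))
#else
#define SYNC_ACQUIRED(a)  ((void)(a))
#define SYNC_RELEASING(a) ((void)(a))
#define SYNC_DESTROY(a)   ((void)(a))
#endif

#define NUM_WORKERS 4

/* One flag per worker: each flag is a separate synchronization object,
   identified to the APIs by its own address. */
static atomic_int done_flag[NUM_WORKERS];
static int partial[NUM_WORKERS];    /* partial[i] is published by done_flag[i] */

static void *worker(void *arg)
{
    int id = (int)(intptr_t) arg;
    partial[id] = (id + 1) * 10;    /* this worker's result */
    SYNC_RELEASING(&done_flag[id]); /* about to signal on its own object */
    atomic_store_explicit(&done_flag[id], 1, memory_order_release);
    return NULL;
}

/* Waits for every worker, acquiring each object separately, and returns
   the sum of the results (10 + 20 + 30 + 40 with four workers). */
int wait_all_demo(void)
{
    pthread_t t[NUM_WORKERS];
    int sum = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        atomic_store(&done_flag[i], 0);
        if (pthread_create(&t[i], NULL, worker, (void *)(intptr_t) i) != 0)
            abort();
    }
    for (int i = 0; i < NUM_WORKERS; i++) {
        while (!atomic_load_explicit(&done_flag[i], memory_order_acquire))
            sched_yield();
        SYNC_ACQUIRED(&done_flag[i]);   /* signal received on object i */
        sum += partial[i];
    }
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(t[i], NULL);
        SYNC_DESTROY(&done_flag[i]);    /* object i will not be used again */
    }
    return sum;
}
```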

API Usage Tips

Follow these guidelines to properly insert synchronization APIs within your code:

  • Insert an acquired API immediately after your code stops waiting for a synchronization object.

  • Insert a releasing API immediately before the code signals that it no longer holds a synchronization object.

If you place the synchronization APIs improperly, the Intel Inspector may report threading problems where there are none or fail to detect real threading problems.
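A single-slot mailbox shows both guidelines applied in each direction: every wait is followed immediately by an acquired call, and every store that hands the slot to the other thread is preceded immediately by a releasing call. The sketch below makes the same assumptions as before: C11 atomics, POSIX threads, and the hypothetical SYNC_* macros and Mailbox type. Because the producer and consumer signal each other through the same mailbox, both directions use the mailbox address as addr.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical switch: -DUSE_ITT emits real ITT annotations. */
#ifdef USE_ITT
#include <ittnotify.h>
#define SYNC_ACQUIRED(a)  __itt_sync_acquired((void *)(a))
#define SYNC_RELEASING(a) __itt_sync_releasing((void *)(a))
#else
#define SYNC_ACQUIRED(a)  ((void)(a))
#define SYNC_RELEASING(a) ((void)(a))
#endif

/* Hypothetical single-slot mailbox; state is 0 when empty, 1 when full. */
typedef struct {
    atomic_int state;
    int value;                      /* handed off through state */
} Mailbox;

void mailbox_put(Mailbox *mb, int v)
{
    while (atomic_load_explicit(&mb->state, memory_order_acquire) != 0)
        sched_yield();              /* wait for the slot to empty */
    SYNC_ACQUIRED(mb);              /* immediately after the wait ends */
    mb->value = v;
    SYNC_RELEASING(mb);             /* immediately before signaling "full" */
    atomic_store_explicit(&mb->state, 1, memory_order_release);
}

int mailbox_get(Mailbox *mb)
{
    while (atomic_load_explicit(&mb->state, memory_order_acquire) != 1)
        sched_yield();              /* wait for the slot to fill */
    SYNC_ACQUIRED(mb);              /* immediately after the wait ends */
    int v = mb->value;
    SYNC_RELEASING(mb);             /* immediately before signaling "empty" */
    atomic_store_explicit(&mb->state, 0, memory_order_release);
    return v;
}

#define MESSAGES 100

static Mailbox box;

static void *producer(void *arg)
{
    (void) arg;
    for (int i = 1; i <= MESSAGES; i++)
        mailbox_put(&box, i);
    return NULL;
}

/* Sends 1..MESSAGES through the mailbox and returns their sum. */
long mailbox_demo(void)
{
    pthread_t t;
    long sum = 0;
    atomic_store(&box.state, 0);
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        abort();
    for (int i = 0; i < MESSAGES; i++)
        sum += mailbox_get(&box);
    pthread_join(t, NULL);
    return sum;
}
```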

Usage Example: User-Defined Synchronized Critical Section

The following code snippets show how to create a critical section construct that can be tracked with synchronization APIs:

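C/C++ Example

#include <windows.h>
#include <ittnotify.h>

// User-defined lock: owner is 0 when the lock is free,
// otherwise the ID of the thread that holds it
typedef struct MyCriticalSection
{
    volatile LONG owner;
} MyCriticalSection;

void CSEnter(MyCriticalSection *cs)
{
    LONG me = (LONG) GetCurrentThreadId();

    // Spin until this thread atomically changes owner from 0 (free) to its own ID
    while (InterlockedCompareExchange(&cs->owner, me, 0) != 0)
    {
        YieldProcessor();
    }
    // The lock is now held: report it immediately after the wait ends
    __itt_sync_acquired((void *) cs);
}

void CSLeave(MyCriticalSection *cs)
{
    if (cs->owner == (LONG) GetCurrentThreadId())
    {
        // Report the signal before another thread can see the lock as free
        __itt_sync_releasing((void *) cs);
        InterlockedExchange(&cs->owner, 0);
    }
}

Fortran Example

subroutine CSEnter(cs)
    use ittnotify
    integer cs
    ! TryLock stands for your own routine that atomically takes the
    ! user lock and returns 1 on success, 0 if another thread holds it
    integer, external :: TryLock

    do while (TryLock(cs) .ne. 1)
    end do
    ! The lock is now held: report it immediately after the wait ends
    call itt_sync_acquired(LOC(cs))
end subroutine CSEnter

subroutine CSLeave(cs)
    use ittnotify
    integer cs
    ! LockIsMine stands for your own routine that returns 1 if this
    ! thread holds the user lock
    integer, external :: LockIsMine

    if (LockIsMine(cs) .eq. 1) then
        call itt_sync_releasing(LOC(cs))
        ! Code to release the lock goes here
    end if
end subroutine CSLeave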

Note the following when looking at this simple critical section example:

  • The acquired API is placed immediately after the code obtains the user lock.

  • The releasing API is placed immediately before the code releases the user lock. This ensures that another thread cannot call the acquired API before the Intel Inspector is told that this thread has released the lock.
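The same critical-section pattern can also be sketched with a portable C11 test-and-set lock, which may help readers on other platforms. This is an illustrative variant, not the construct above. SpinLock, spin_enter, spin_leave, USE_ITT, and the SYNC_* macros are hypothetical, and the usage demo assumes POSIX threads.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical switch: -DUSE_ITT emits real ITT annotations. */
#ifdef USE_ITT
#include <ittnotify.h>
#define SYNC_ACQUIRED(a)  __itt_sync_acquired((void *)(a))
#define SYNC_RELEASING(a) __itt_sync_releasing((void *)(a))
#else
#define SYNC_ACQUIRED(a)  ((void)(a))
#define SYNC_RELEASING(a) ((void)(a))
#endif

/* Hypothetical test-and-set lock; its address is the addr for the APIs. */
typedef struct {
    atomic_flag locked;
} SpinLock;

void spin_enter(SpinLock *cs)
{
    /* Spin until test-and-set finds the flag clear, i.e. this thread took it */
    while (atomic_flag_test_and_set_explicit(&cs->locked, memory_order_acquire))
        sched_yield();
    SYNC_ACQUIRED(cs);              /* immediately after obtaining the lock */
}

void spin_leave(SpinLock *cs)
{
    SYNC_RELEASING(cs);             /* before another thread can take the lock */
    atomic_flag_clear_explicit(&cs->locked, memory_order_release);
}

#define NUM_THREADS 4
#define ITERATIONS 10000

static SpinLock demo_lock = { ATOMIC_FLAG_INIT };
static long shared_count;           /* protected by demo_lock */

static void *increment(void *arg)
{
    (void) arg;
    for (int i = 0; i < ITERATIONS; i++) {
        spin_enter(&demo_lock);
        shared_count++;
        spin_leave(&demo_lock);
    }
    return NULL;
}

/* Runs NUM_THREADS threads that each increment shared_count ITERATIONS times. */
long spinlock_demo(void)
{
    pthread_t t[NUM_THREADS];
    shared_count = 0;
    for (int i = 0; i < NUM_THREADS; i++)
        if (pthread_create(&t[i], NULL, increment, NULL) != 0)
            abort();
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(t[i], NULL);
    return shared_count;
}
```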

Usage Example: User-Level Synchronized Barrier

Higher-level constructs, such as barriers, are also easy to model using synchronization APIs. The following code snippets show how to create a barrier construct that can be tracked using synchronization APIs:

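C/C++ Example

#include <windows.h>
#include <ittnotify.h>

volatile LONG counter = 0;   // number of threads that have entered the barrier
volatile LONG teamflag = 0;  // flipped by the last thread to release the team
LONG thread_count;           // number of threads in the team, set before first use

void Barrier(void)
{
    // Flag value that marks the end of this barrier episode, so the
    // barrier can be reused in a loop
    LONG release_value = !teamflag;

    __itt_sync_releasing((void *) &counter);
    // Use the atomic increment primitive appropriate to your OS and compiler;
    // its return value tells this thread whether it arrived last
    if (InterlockedIncrement(&counter) == thread_count)
    {
        __itt_sync_acquired((void *) &counter);
        __itt_sync_releasing((void *) &teamflag);
        counter = 0;
        InterlockedExchange(&teamflag, release_value);
    }
    else
    {
        // Wait for the team flag
        while (teamflag != release_value)
        {
            YieldProcessor();
        }
        __itt_sync_acquired((void *) &teamflag);
    }
}

Fortran Example

subroutine barrier()
    use ittnotify
    integer counter, thread_count, my_count, release_value
    integer, volatile :: teamflag
    common /x/ teamflag, counter, thread_count

    ! Flag value that marks the end of this barrier episode
    release_value = 1 - teamflag

    call itt_sync_releasing(LOC(counter))
    ! Atomically increment counter and capture the result, using the
    ! atomic primitive appropriate to your OS and compiler
    ! (OpenMP atomic capture is shown here)
!$omp atomic capture
    counter = counter + 1
    my_count = counter
!$omp end atomic

    if (my_count .eq. thread_count) then
        call itt_sync_acquired(LOC(counter))
        call itt_sync_releasing(LOC(teamflag))
        counter = 0
!$omp flush
        teamflag = release_value
    else
        ! Wait for the team flag
        do while (teamflag .ne. release_value)
        end do
        call itt_sync_acquired(LOC(teamflag))
    end if
end subroutine barrier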

Note the following when looking at this example:

  • There are two synchronization objects in this barrier code. The counter object is used to do a gather-like signaling from all the threads to the final thread indicating that each thread has entered the barrier. Once the last thread hits the barrier, it uses the teamflag object to signal all the other threads that they may proceed.

  • As each thread enters the barrier, it calls the releasing API to tell the Intel Inspector it is about to signal the last thread by incrementing counter.

  • The last thread to enter the barrier calls the acquired API to tell the Intel Inspector it was successfully signaled by all the other threads.

  • The last thread to enter the barrier then calls the releasing API to tell the Intel Inspector it is going to signal the barrier completion to all the other threads by setting teamflag.

  • Finally, before leaving the barrier, each of the other threads calls the acquired API to tell the Intel Inspector it successfully received the end-of-barrier signal.
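The following portable sketch keeps the same two objects and the same annotation order. It uses C11 atomics and POSIX threads instead of the Windows Interlocked functions. The names team_barrier, barrier_demo, USE_ITT, and the SYNC_* macros are hypothetical. The demo runs many barrier episodes back to back and counts any slot a thread sees as stale after passing the barrier; a working reusable barrier should report zero.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical switch: -DUSE_ITT emits real ITT annotations. */
#ifdef USE_ITT
#include <ittnotify.h>
#define SYNC_ACQUIRED(a)  __itt_sync_acquired((void *)(a))
#define SYNC_RELEASING(a) __itt_sync_releasing((void *)(a))
#else
#define SYNC_ACQUIRED(a)  ((void)(a))
#define SYNC_RELEASING(a) ((void)(a))
#endif

#define THREAD_COUNT 4
#define ROUNDS 100

static atomic_int counter;          /* threads that have entered the barrier */
static atomic_int teamflag;         /* flipped by the last thread to arrive */

void team_barrier(void)
{
    /* The flag value that will mark the end of this barrier episode */
    int release_value = !atomic_load_explicit(&teamflag, memory_order_acquire);

    SYNC_RELEASING(&counter);       /* about to signal arrival to the last thread */
    if (atomic_fetch_add_explicit(&counter, 1, memory_order_acq_rel) + 1 == THREAD_COUNT) {
        SYNC_ACQUIRED(&counter);    /* signaled by every other thread */
        SYNC_RELEASING(&teamflag);  /* about to release the team */
        atomic_store_explicit(&counter, 0, memory_order_relaxed);
        atomic_store_explicit(&teamflag, release_value, memory_order_release);
    } else {
        while (atomic_load_explicit(&teamflag, memory_order_acquire) != release_value)
            sched_yield();          /* wait for the team flag */
        SYNC_ACQUIRED(&teamflag);   /* end-of-barrier signal received */
    }
}

static int slots[THREAD_COUNT];     /* slot i is written only by thread i */
static atomic_int mismatches;

static void *worker(void *arg)
{
    int id = (int)(intptr_t) arg;
    for (int r = 0; r < ROUNDS; r++) {
        slots[id] = r;              /* publish this round */
        team_barrier();
        for (int j = 0; j < THREAD_COUNT; j++)
            if (slots[j] != r)      /* every slot must already show round r */
                atomic_fetch_add(&mismatches, 1);
        team_barrier();             /* finish reading before the next writes */
    }
    return NULL;
}

/* Returns the number of stale slots observed; 0 means the barrier held. */
int barrier_demo(void)
{
    pthread_t t[THREAD_COUNT];
    atomic_store(&mismatches, 0);
    for (int i = 0; i < THREAD_COUNT; i++)
        if (pthread_create(&t[i], NULL, worker, (void *)(intptr_t) i) != 0)
            abort();
    for (int i = 0; i < THREAD_COUNT; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&mismatches);
}
```

Each thread reads teamflag on entry and waits for the opposite value. This is often called sense reversal, and it is what makes the barrier safe to reuse in a loop. A version that simply clears teamflag on entry can deadlock if a fast thread re-enters the barrier before a slow thread has seen the flag set.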