Collecting Lightweight Statistics
Intel® Trace Collector can gather and store statistics about the function calls and their communication. These statistics are gathered even if no trace data is collected, so it is a good starting point for trying to understand an unknown application that might produce an unmanageable trace.
Usage Instructions
To collect lightweight statistics for your application, set the following environment variables before tracing:
$ export VT_STATISTICS=ON
$ export VT_PROCESS=OFF
Alternatively, specify the equivalent settings in a configuration file and point the VT_CONFIG environment variable at it:
# Enable statistics gathering
STATISTICS ON
# Do not gather trace data
PROCESS 0:N OFF
$ export VT_CONFIG=<configuration_file_path>/config.conf
The statistics are written into the *.stf file. Use the stftool utility with the --print-statistics option to convert the data to ASCII text. For example:
$ stftool tracefile.stf --print-statistics
The resulting output has an easy-to-process format, so you can post-process it with text processing tools and scripts such as awk*, Perl*, or Microsoft Excel* for better readability. A Perl script with this capability, convert-stats, is provided in the bin folder.
Output Format
Each line contains the following information:
Thread or process
Function ID
Receiver (if applicable)
Message size (if applicable)
Number of involved processes (if applicable)
And the following statistics:
Count – number of communications or number of calls as applicable
Minimum execution time excluding callee times
Maximum execution time excluding callee times
Total execution time excluding callee times
Minimum execution time including callee times
Maximum execution time including callee times
Total execution time including callee times
Within each line the fields are separated by colons.
The receiver is set to 0xffffffff for file operations and to 0xfffffffe for collective operations. If the message size equals 0xffffffff, the only defined receiver value is 0xfffffffe, which marks the entry as a collective operation.
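As a sketch of how one such line might be parsed, the snippet below follows the field order listed above (process/thread, function ID, receiver, message size, number of involved processes, then the seven statistics values). The sample input line is hypothetical, invented for illustration only; check the actual stftool output of your version before relying on exact field positions.

```python
# Sketch of a parser for one colon-separated statistics line, assuming the
# field order given above. Sentinel values follow the documentation text.
FILE_OP = 0xFFFFFFFF     # receiver sentinel for file operations
COLLECTIVE = 0xFFFFFFFE  # receiver sentinel for collective operations
NO_MESSAGE = 0xFFFFFFFF  # message-size sentinel: no message was sent

def parse_stat_line(line):
    fields = line.strip().split(":")
    proc, func, receiver, size, nprocs = (int(f) for f in fields[:5])
    count, *times = (int(f) for f in fields[5:12])
    kind = ("collective" if receiver == COLLECTIVE
            else "file" if receiver == FILE_OP
            else "message")
    return {"process": proc, "function": func, "kind": kind,
            "size": None if size == NO_MESSAGE else size,
            "count": count,
            # min/max/total excluding callees, then min/max/total including
            "times": times}

# Hypothetical input line, not real stftool output:
print(parse_stat_line("0:12:4294967294:1024:4:10:5:9:70:6:11:85"))
```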
The message size is the number of bytes sent or received per single message. For collective operations, the following values (message-size buckets) are used for individual instances:
Collective operation | Process-local bucket | Same value on all processes?
---|---|---
MPI_Barrier | 0 | Yes
MPI_Bcast | Broadcast bytes | Yes
MPI_Gather | Bytes sent | Yes
MPI_Gatherv | Bytes sent | No
MPI_Scatter | Bytes received | Yes
MPI_Scatterv | Bytes received | No
MPI_Allgather | Bytes sent + received | Yes
MPI_Allgatherv | Bytes sent + received | No
MPI_Alltoall | Bytes sent + received | Yes
MPI_Alltoallv | Bytes sent + received | No
MPI_Reduce | Bytes sent | Yes
MPI_Allreduce | Bytes sent + received | Yes
MPI_Reduce_scatter | Bytes sent + received | Yes
MPI_Scan | Bytes sent + received | Yes
The message size is set to 0xffffffff if no message was sent, for example, for non-MPI functions or for functions like MPI_Comm_rank.
If more than one communication event (message or collective operation) occurs in the same function call (for example, in MPI_Waitall, MPI_Waitany, MPI_Testsome, or MPI_Sendrecv), the time spent in that function is distributed evenly over all communications and counted once for each message or collective operation. Therefore, a correct traditional function profile cannot be computed from data referring to such function instances (that is, instances involved in more than one message per actual function call). Only the Total execution time including callee times and the Total execution time excluding callee times can be interpreted like a traditional function profile in all cases.
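The effect of this even distribution can be sketched numerically; the figures below are hypothetical, not from a real trace:

```python
# Sketch of the even time distribution described above (hypothetical
# numbers). One MPI_Waitall call taking 12 us completes 3 messages: each
# message is charged 12/3 = 4 us, and the call is counted once per message.
call_time_us = 12.0
n_messages = 3
per_message_time = call_time_us / n_messages  # 4.0 us charged per message

# Summing the per-message totals recovers the true time spent in the call,
# which is why the two "Total execution time" columns stay meaningful.
total_from_stats = per_message_time * n_messages
assert total_from_stats == call_time_us

# But the recorded count is 3 even though MPI_Waitall was entered once, so
# a true call count or average time per call cannot be reconstructed.
print(per_message_time, total_from_stats)
```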
The number of involved processes is negative for received messages; it is -2 if messages were received from a different process or thread.
Statistics are gathered on the thread level for all MPI functions, and for all functions instrumented through the API or compiler instrumentation.