Improve Your Application Performance by Changing Communications

Get Started with Intel® Trace Analyzer and Collector

ID 758510

Date 9/27/2021

Improve the performance of an MPI application by replacing blocking communication with non-blocking communication.

In your code, replace the serialized blocking MPI_Send and MPI_Recv calls with non-blocking communication: MPI_Isend and MPI_Irecv. For example:

Original code snippet:


// boundary exchange
// (left_col and right_col are temporary buffers declared outside this snippet)
void exchange(para* p, grid* gr)
{
  int i, j;
  MPI_Status status_100, status_200, status_300, status_400;

  // send first row down, receive ghost row from up
  MPI_Send(gr->x_new[1], gr->lcol+2, MPI_DOUBLE, gr->down, 100, MPI_COMM_WORLD);
  MPI_Recv(gr->x_new[gr->lrow+1], gr->lcol+2, MPI_DOUBLE, gr->up, 100, MPI_COMM_WORLD, &status_100);

  // send last row up, receive ghost row from down
  MPI_Send(gr->x_new[gr->lrow], gr->lcol+2, MPI_DOUBLE, gr->up, 200, MPI_COMM_WORLD);
  MPI_Recv(gr->x_new[0], gr->lcol+2, MPI_DOUBLE, gr->down, 200, MPI_COMM_WORLD, &status_200);

  // copy left column to tmp array and send it left
  if (gr->left != MPI_PROC_NULL)
  {
    for (i = 0; i < gr->lrow+2; i++)
    {
      left_col[i] = gr->x_new[i][1];
    }
    MPI_Send(left_col, gr->lrow+2, MPI_DOUBLE, gr->left, 300, MPI_COMM_WORLD);
  }

  if (gr->right != MPI_PROC_NULL)
  {
    MPI_Recv(right_col, gr->lrow+2, MPI_DOUBLE, gr->right, 300, MPI_COMM_WORLD, &status_300);
    // copy received column to right ghost cells,
    // then copy right column to tmp array
    for (i = 0; i < gr->lrow+2; i++)
    {
      gr->x_new[i][gr->lcol+1] = right_col[i];
      right_col[i] = gr->x_new[i][gr->lcol];
    }
    // send right
    MPI_Send(right_col, gr->lrow+2, MPI_DOUBLE, gr->right, 400, MPI_COMM_WORLD);
  }

  if (gr->left != MPI_PROC_NULL)
  {
    MPI_Recv(left_col, gr->lrow+2, MPI_DOUBLE, gr->left, 400, MPI_COMM_WORLD, &status_400);
    // copy received column to left ghost cells
    for (i = 0; i < gr->lrow+2; i++)
    {
      gr->x_new[i][0] = left_col[i];
    }
  }
}

Updated code snippet:


MPI_Request req[7];
// send down first row
MPI_Isend(gr->x_new[1], gr->lcol+2, MPI_DOUBLE, gr->down, 100, MPI_COMM_WORLD, &req[0]);
MPI_Irecv(gr->x_new[gr->lrow+1], gr->lcol+2, MPI_DOUBLE, gr->up, 100, MPI_COMM_WORLD, &req[1]);
.....
MPI_Waitall(7, req, MPI_STATUSES_IGNORE);

Once corrected, a single iteration of the revised application looks like the following example:

[Figure: Intel Trace Analyzer view of a single iteration of the revised application]

Use the Intel Trace Analyzer Comparison View to compare the serialized application with the revised one. To compare the two traces, go to View > Compare. The Comparison View looks similar to the following:

[Figure: Comparison View of the serialized and revised application traces]

In the Comparison View, you can see that non-blocking communication removes the serialization and reduces the time the processes spend in communication.

NOTE:

For more information about node-level performance of your application, see documentation for the respective tools: Intel® VTune™ Profiler MPI Code Analysis and Analyzing Intel® MPI applications using Intel® Advisor.
