Get Started with Intel® Trace Analyzer and Collector

ID 758510
Date 10/31/2024
Public

Improve Your Application Performance by Changing Communications

Improve the performance of the MPI application by replacing blocking communication with non-blocking communication.

  1. In your code, replace the serialized blocking MPI_Send and MPI_Recv calls with non-blocking communication: MPI_Isend and MPI_Irecv. For example:

    Original code snippet:

    // boundary exchange
    void exchange(para* p, grid* gr)
    {
      int i,j;
      MPI_Status status_100, status_200, status_300, status_400;
      // send down first row
      MPI_Send(gr->x_new[1], gr->lcol+2, MPI_DOUBLE, gr->down, 100, MPI_COMM_WORLD);
      MPI_Recv(gr->x_new[gr->lrow+1], gr->lcol+2, MPI_DOUBLE, gr->up, 100, MPI_COMM_WORLD, &status_100);
      // send up last row
      MPI_Send(gr->x_new[gr->lrow], gr->lcol+2, MPI_DOUBLE, gr->up, 200, MPI_COMM_WORLD);
      MPI_Recv(gr->x_new[0], gr->lcol+2, MPI_DOUBLE, gr->down, 200, MPI_COMM_WORLD, &status_200);
      // copy left column to tmp arrays
      if(gr->left != MPI_PROC_NULL)
      {
        for(i=0; i< gr->lrow+2; i++)
        {
          left_col[i] = gr->x_new[i][1];
        }
        MPI_Send(left_col, gr->lrow+2, MPI_DOUBLE, gr->left, 300, MPI_COMM_WORLD);
      }
      if(gr->right != MPI_PROC_NULL)
      {
        MPI_Recv(right_col, gr->lrow+2, MPI_DOUBLE, gr->right, 300, MPI_COMM_WORLD, &status_300);
        // copy received left column to ghost cells
        // copy right column to tmp
        for(i=0; i< gr->lrow+2; i++)
        {
          gr->x_new[i][gr->lcol+1] = right_col[i];
          right_col[i] = gr->x_new[i][gr->lcol];
        }
        // send right
        MPI_Send(right_col, gr->lrow+2, MPI_DOUBLE, gr->right, 400, MPI_COMM_WORLD);
      }
      if(gr->left != MPI_PROC_NULL)
      {
        MPI_Recv(left_col, gr->lrow+2, MPI_DOUBLE, gr->left, 400, MPI_COMM_WORLD, &status_400);
        for(i=0; i< gr->lrow+2; i++)
        {
          gr->x_new[i][0] = left_col[i];
        }
      }
    }
    

    Updated code snippet:

    MPI_Request req[7];
    // send down first row
    MPI_Isend(gr->x_new[1], gr->lcol+2, MPI_DOUBLE, gr->down, 100, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(gr->x_new[gr->lrow+1], gr->lcol+2, MPI_DOUBLE, gr->up, 100, MPI_COMM_WORLD, &req[1]);
    .....
    MPI_Waitall(7, req, MPI_STATUSES_IGNORE);
    
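    For reference, the following sketch shows one possible shape of the complete non-blocking exchange. It is illustrative only, not the exact code of the sample: it assumes the same grid layout as above, introduces hypothetical staging buffers (send_left, send_right, recv_left, recv_right, assumed to be allocated elsewhere with at least lrow+2 elements each) so that column data packed for sending is not overwritten by data arriving while the requests are in flight, and it sizes the request array for the maximum of eight requests, waiting only on those actually posted.

    // Sketch only: send_left, send_right, recv_left, recv_right are
    // hypothetical staging buffers declared elsewhere.
    void exchange(para* p, grid* gr)
    {
      int i, nreq = 0;
      MPI_Request req[8];
      // rows are contiguous in memory, so they are sent and received in place
      MPI_Isend(gr->x_new[1], gr->lcol+2, MPI_DOUBLE, gr->down, 100, MPI_COMM_WORLD, &req[nreq++]);
      MPI_Irecv(gr->x_new[gr->lrow+1], gr->lcol+2, MPI_DOUBLE, gr->up, 100, MPI_COMM_WORLD, &req[nreq++]);
      MPI_Isend(gr->x_new[gr->lrow], gr->lcol+2, MPI_DOUBLE, gr->up, 200, MPI_COMM_WORLD, &req[nreq++]);
      MPI_Irecv(gr->x_new[0], gr->lcol+2, MPI_DOUBLE, gr->down, 200, MPI_COMM_WORLD, &req[nreq++]);
      // columns are strided, so pack them into send buffers before posting
      if(gr->left != MPI_PROC_NULL)
      {
        for(i=0; i< gr->lrow+2; i++)
          send_left[i] = gr->x_new[i][1];
        MPI_Isend(send_left, gr->lrow+2, MPI_DOUBLE, gr->left, 300, MPI_COMM_WORLD, &req[nreq++]);
        MPI_Irecv(recv_left, gr->lrow+2, MPI_DOUBLE, gr->left, 400, MPI_COMM_WORLD, &req[nreq++]);
      }
      if(gr->right != MPI_PROC_NULL)
      {
        for(i=0; i< gr->lrow+2; i++)
          send_right[i] = gr->x_new[i][gr->lcol];
        MPI_Isend(send_right, gr->lrow+2, MPI_DOUBLE, gr->right, 400, MPI_COMM_WORLD, &req[nreq++]);
        MPI_Irecv(recv_right, gr->lrow+2, MPI_DOUBLE, gr->right, 300, MPI_COMM_WORLD, &req[nreq++]);
      }
      // wait for all posted requests before touching the buffers
      MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE);
      // unpack the received columns into the ghost cells
      if(gr->left != MPI_PROC_NULL)
        for(i=0; i< gr->lrow+2; i++)
          gr->x_new[i][0] = recv_left[i];
      if(gr->right != MPI_PROC_NULL)
        for(i=0; i< gr->lrow+2; i++)
          gr->x_new[i][gr->lcol+1] = recv_right[i];
    }

    Because every send and receive is posted before the single MPI_Waitall, the row and column exchanges can proceed concurrently instead of completing one blocking call at a time.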

    Once corrected, a single iteration of the revised application looks like the following example:

  2. Use the Intel Trace Analyzer Comparison View to compare the trace of the serialized application with the trace of the revised one (a sample command for collecting the new trace is shown below). To compare the two traces, go to View > Compare. The Comparison View looks similar to:

    In the Comparison View, you can see that using non-blocking communication removes the serialization and reduces the time the processes spend communicating.
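
    To produce the second trace for this comparison, collect a trace of the revised application in the same way you collected the original one. For example, with the Intel MPI Library you can pass the -trace option to mpirun to enable trace collection; the process count and application name below are placeholders:

    # collect a trace of the revised application; this produces an .stf trace file
    mpirun -trace -n 4 ./your_revised_app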

NOTE:
For more information about node-level performance of your application, see documentation for the respective tools: Intel® VTune™ Profiler MPI Code Analysis and Analyzing Intel® MPI applications using Intel® Advisor.