Summary
You have completed the Analyzing OpenMP* and MPI Applications tutorial with Application Performance Snapshot, Intel® Trace Analyzer and Collector, and Intel® VTune™ Profiler. Here are some important things to remember when working with your own hybrid application:
Step | Tutorial Recap | Key Tutorial Take-Aways |
---|---|---|
1. Build and configure application | You made sure all of the relevant tools were installed. You built the application and tested running the application with various process/thread combinations to determine optimization opportunities. | Test various combinations of MPI processes and OpenMP threads for your hybrid application. Different combinations can produce very different performance results for the same application. |
2. Get a performance overview with Application Performance Snapshot | You ran the heart_demo application with the -aps option to collect load balance information, memory and disk usage information, and other metrics. | Use the Application Performance Snapshot HTML report to review where your application is inefficient and determine which tool to use next. |
3. Identify communication issues with Intel Trace Analyzer and Collector | You ran the application with the -trace option to understand MPI library wait times and communication patterns. You reviewed the results using the Message Profile chart and identified communication issues. | |
4. Tune MPI-bound code | You optimized the application by applying the Cuthill-McKee algorithm for reordering a mesh before performing calculations. You used Intel Trace Analyzer and Collector and Application Performance Snapshot to confirm the performance improvement. | After completing an optimization, it is beneficial to check the performance of the best MPI process and OpenMP thread combinations again to see if there has been any change. Run the application without any analysis software to get an accurate elapsed time. |
5. Analyze vector instruction set with Intel VTune Profiler | You ran a performance analysis on the heart_demo application using Intel VTune Profiler on the thread suggested by the Application Performance Snapshot report. You updated to the latest vector instruction set. | Using legacy vector instruction sets can lead to inefficient application performance. Be sure to use the latest vector instruction sets for your application. |
6. Analyze serial and parallel code efficiency with Intel VTune Profiler | You reviewed issues with parallelism using Intel VTune Profiler. You updated the sample code to fix problem functions. You reviewed the process/thread combinations and observed efficiency improvements. | Review the Bottom-up tab in Intel VTune Profiler to find sections of your application that would benefit from threading and explore threaded code efficiency. |
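
The take-aways for steps 1 and 4 both come down to timing the same application under several MPI process and OpenMP thread combinations, with no analysis tools attached. The following Python sketch shows one possible way to enumerate those combinations and time each run. It is not part of the tutorial: the function names are hypothetical, and it assumes that the launcher is called `mpirun`, accepts `-n <processes>`, and forwards `OMP_NUM_THREADS` from its own environment to the ranks. Adjust these to match your MPI installation.

```python
import os
import subprocess
import time


def process_thread_combinations(total_cores):
    """Return every (mpi_processes, omp_threads) pair whose product is total_cores."""
    return [
        (procs, total_cores // procs)
        for procs in range(1, total_cores + 1)
        if total_cores % procs == 0
    ]


def build_run(procs, threads, app_command, launcher="mpirun"):
    """Build the launch command and environment for one combination.

    Assumption: the launcher accepts `-n <processes>` and passes
    OMP_NUM_THREADS from its environment on to the MPI ranks.
    """
    command = [launcher, "-n", str(procs), *app_command]
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    return command, env


def time_run(command, env):
    """Run the command once, without analysis tools, and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(command, env=env, check=True)
    return time.perf_counter() - start
```

To use it, loop over `process_thread_combinations(n)` for the number of cores you want to fill, pass each pair to `build_run` together with your application's command line (for example `["./heart_demo"]` followed by whatever arguments your build expects), and record the value returned by `time_run`. Repeating the sweep after each optimization shows whether the best combination has changed.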