Profile an Application with Instrumentation
This topic shows how to profile an application and gives sample commands for each of the three phases.
Instrumentation Compilation and Linking (Phase One)
Use [Q]prof-gen to produce an executable that includes instrumentation.
Linux
ifort -prof-gen -prof-dir /usr/profiled a1.f90 a2.f90 a3.f90
ifort -o a1 a1.o a2.o a3.o
Windows
ifort /Qprof-gen /Qprof-dir:c:\profiled a1.f90 a2.f90 a3.f90
ifort a1.obj a2.obj a3.obj
Use the /Qcov-gen option to obtain the minimum instrumentation needed for code coverage only.
ifort /Qcov-gen /Qcov-dir:c:\cov_data a1.f90 a2.f90 a3.f90
ifort a1.obj a2.obj a3.obj
Use the [Q]prof-dir or /Qcov-dir option if the application includes source files in multiple directories; the option ensures that the profile information is generated in one consistent place. The example commands demonstrate how to combine these options on multiple sources.
The compiler gathers extra information when you use the -prof-gen=srcpos or /Qprof-gen:srcpos option. The extra information is collected to support specific Intel tools, including the code coverage tool. If you do not expect to use such tools, do not specify -prof-gen=srcpos or /Qprof-gen:srcpos. The extended option does not provide better optimization and could slow parallel compile times. If you are interested in using the instrumentation only for code coverage, use the /Qcov-gen option, instead of the /Qprof-gen:srcpos option, to minimize instrumentation overhead.
PGO data collection is optimized for serial applications, at the cost of some precision in areas of high parallelism. For files or applications that contain parallel constructs, such as OpenMP features, you can specify the threadsafe keyword with the -prof-gen or /Qprof-gen compiler option (-prof-gen=threadsafe or /Qprof-gen:threadsafe). The threadsafe keyword produces instrumented object files that support PGO data collection on highly parallel applications, but it may increase the overhead of data collection.
Unlike serial programs, parallel programs using OpenMP may involve dynamic scheduling of code paths, and counts collected may not be perfectly reproducible for the same training data set.
Instrumented Execution (Phase Two)
Run your instrumented program with a representative set of data to create one or more dynamic information files.
Linux
./a1
Windows
a1.exe
Executing the instrumented application generates a dynamic information file that has a unique name and a .dyn suffix. A new dynamic information file is created every time you execute the instrumented program.
You can run the program more than once with different input data.
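As a rough sketch of repeated training runs, the hypothetical helper below (run_training is not part of the compiler) runs a program once per input file. It assumes the program reads its data from standard input; adjust the invocation if your application takes file names as arguments instead.

```shell
# run_training PROGRAM INPUT...
# Hypothetical helper: runs PROGRAM once for each INPUT file,
# feeding the file to the program on standard input.
# Stops and returns non-zero as soon as one run fails.
run_training() {
    prog=$1
    shift
    for input in "$@"; do
        "$prog" < "$input" || return 1
    done
}
```

For example, run_training ./a1 small.dat large.dat (where ./a1 stands for your instrumented executable and the .dat names are placeholders) would execute the program twice, and each execution would create its own .dyn file.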
By default, the .dyn filename follows this naming convention: <timestamp>_<pid>.dyn. The .dyn file is placed in a directory specified by an environment variable, in a directory specified at compile time, or in the current directory.
To make it easy to distinguish files from different runs, you can specify a prefix for the .dyn filename in the INTEL_PROF_DYN_PREFIX environment variable. In that case, executing the instrumented application generates a .dyn file named <prefix>_<timestamp>_<pid>.dyn, where <prefix> is the identifier that you specified. Be sure to set the INTEL_PROF_DYN_PREFIX environment variable before starting your instrumented application.
The value specified in the INTEL_PROF_DYN_PREFIX environment variable must not contain any of the characters < > : " / \ | ? *. If an invalid prefix is specified, the default naming scheme is used.
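Because an invalid prefix silently falls back to the default naming scheme, it can help to check the value before a run. The following is a minimal sketch of such a check; is_valid_dyn_prefix is a hypothetical helper name, not something the compiler provides, and it tests only for the forbidden characters listed above.

```shell
# is_valid_dyn_prefix PREFIX
# Hypothetical helper: returns 0 if PREFIX contains none of the
# characters < > : " / \ | ? * and returns 1 otherwise.
is_valid_dyn_prefix() {
    case "$1" in
        *\<* | *\>* | *:* | *\"* ) return 1 ;;
        */* | *\\* | *\|* | *\?* | *\** ) return 1 ;;
    esac
    return 0
}
```

A typical use would be to run is_valid_dyn_prefix "$INTEL_PROF_DYN_PREFIX" after exporting the variable and before starting the instrumented application, and to stop if the check fails.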
Feedback Compilation (Phase Three)
Before this step, copy all .dyn and .dpi files into the same directory. Compile and link the source files with [Q]prof-use; the option instructs the compiler to use the generated dynamic information to guide the optimization:
Linux
ifort -prof-use -ipo -prof-dir /usr/profiled a1.f90 a2.f90 a3.f90
Windows
ifort /Qprof-use /Qipo /Qprof-dir:c:\profiled a1.f90 a2.f90 a3.f90
This final phase compiles and links the source files using the data from the dynamic information files generated during instrumented execution (phase two).
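The copy step that precedes the feedback compilation can be scripted. The sketch below assumes the profile files are scattered across several run directories; collect_dyn is a hypothetical helper name, and the directory layout is an assumption made for illustration.

```shell
# collect_dyn DEST SRC...
# Hypothetical helper: copies every .dyn and .dpi file found directly
# inside each SRC directory into DEST, creating DEST if needed.
# A file already in DEST with the same name is overwritten.
collect_dyn() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for src in "$@"; do
        for f in "$src"/*.dyn "$src"/*.dpi; do
            [ -e "$f" ] || continue   # glob matched nothing; skip it
            cp -- "$f" "$dest"/ || return 1
        done
    done
}
```

For example, collect_dyn /usr/profiled run1 run2 would gather the files from two placeholder run directories into the directory passed to -prof-dir in the commands above.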
In addition to the optimized executable, the compiler produces a pgopti.dpi file.
Most of the time, you should specify the default optimization level, O2, for phase one, and specify more advanced optimizations, such as [Q]ipo, during the phase three compilation. The phase one example relied on the default O2 setting, and the phase three example added the [Q]ipo option.
The compiler ignores the [Q]ipo or [Q]ip option during phase 1 with [Q]prof-gen.
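To show how the three phases fit together on Linux, here is a minimal driver sketch. The run and pgo_build names and the DRY_RUN switch are hypothetical, phase one is collapsed into a single compile-and-link command for brevity, and a single training run with no input is assumed. Setting DRY_RUN=1 prints the commands instead of executing them, which is useful for checking the sequence on a machine without the compiler.

```shell
# run CMD ARG...
# Hypothetical wrapper: prints the command when DRY_RUN=1,
# otherwise executes it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# pgo_build PROF_DIR
# Hypothetical driver for the three PGO phases, using the source
# file names from the examples in this topic.
pgo_build() {
    prof_dir=$1
    # Phase one: instrumented compilation and linking in one step.
    run ifort -prof-gen -prof-dir "$prof_dir" -o a1 a1.f90 a2.f90 a3.f90 || return 1
    # Phase two: instrumented execution; writes a .dyn file.
    run ./a1 || return 1
    # Phase three: feedback compilation that uses the collected data.
    run ifort -prof-use -ipo -prof-dir "$prof_dir" a1.f90 a2.f90 a3.f90
}
```

In a real build you would typically repeat phase two with several representative data sets before starting phase three.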