Visible to Intel only — GUID: GUID-CB5D324C-A860-465A-9BD1-676AC3B58584
run_oa.py Options
Collect basic data, do markup, and collect refinement data. Then proceed to run analysis on profiling data. This script combines the separate scripts collect.py and analyze.py.
Usage
advisor-python <APM>/run_oa.py <project-dir> [--options] -- <target> [target-options]
Options
The following table describes options that you can use with the run_oa.py script. The target application to analyze and application options, if any, must be preceded by two dashes and a space and placed at the end of a command.
Option | Description
---|---
<project-dir> |
Required. Specify the path to the Intel® Advisor project directory. |
-h --help |
Show all script options. |
-v <verbose> --verbose <verbose> |
Specify the output verbosity level.
NOTE:
This option affects the console output, but does not affect logs and report results.
|
--assume-dependencies (default) | --no-assume-dependencies |
Assume that a loop has a dependency if the loop type is not known. When disabled, assume that a loop does not have dependencies if the loop dependency type is unknown. |
--assume-hide-taxes [<loop-id> | <file-name>:<line-number>] |
Use an optimistic approach to estimate invocation taxes: hide all invocation taxes except the first one. You can provide a comma-separated list of loop IDs and source locations to hide taxes for. If you do not provide a list, taxes are hidden for all loops. |
--assume-never-hide-taxes (default) |
Use a pessimistic approach to estimate invocation taxes: do not hide invocation taxes. |
--assume-parallel | --no-assume-parallel (default) |
Assume that a loop is parallel if the loop type is not known. |
--check-profitability (default) | --no-check-profitability |
Check the profitability of offloading regions. Only regions that can benefit from the increased speed are added to a report. When disabled, add all evaluated regions to a report, regardless of the profitability of offloading specific regions. |
-c {basic, refinement, full} --collect {basic, refinement, full} |
Specify the type of data to collect for the application.
NOTE:
For --collect full, make sure to also use --data-reuse-analysis and --track-memory-objects.
For --collect basic, make sure to also use --track-memory-objects. |
--config <config> |
Specify a configuration file by absolute path or name. If you choose the latter, the model configuration directory is searched for the file first, then the current directory. The following device configurations are available: xehpg_512xve (default), xehpg_256xve, gen12_tgl, gen12_dg1.
NOTE:
You can specify several configurations by using the option more than once.
|
--cpu-scale-factor <integer> |
Assume a host CPU that is faster than the original CPU by the specified value. All original CPU times are divided by the scale factor. |
--data-reuse-analysis (default) | --no-data-reuse-analysis |
Estimate data reuse between offloaded regions. Disabling can decrease analysis overhead.
IMPORTANT:
Use with --collect full.
|
--data-transfer (default) | --no-data-transfer |
Analyze data transfer.
NOTE:
Disabling can decrease analysis overhead.
|
--dry-run |
Show the Intel® Advisor CLI commands for advisor appropriate for the specified configuration. No actual collection is performed. |
--enable-batching | --disable-batching (default) |
Enable job batching for top-level offloads. Emulate the execution of more than one instance simultaneously. |
--enable-edram |
Enable eDRAM modeling in the memory hierarchy model. |
--enable-slm |
Enable SLM modeling in the memory hierarchy model. Use with both collect.py and analyze.py. |
--exclude-from-report <items-to-exclude> |
Specify items to exclude from a report. Available items: memory_objects, sources, call_graph, dependencies, strides.
Use this option if your report is heavyweight, for example, due to containing a lot of memory objects or sources, which slows down opening it in a browser.
NOTE:
This option affects only data shown in the HTML report and does not affect data collection.
|
--executable-of-interest <executable-name> |
Specify an executable process name to profile if it is not the same as the application to run. Use this option if you run your application via a script or another binary.
NOTE:
Specify the name only, not the full path.
|
--flex-cachesim <cache-configuration> |
Use flexible cache simulation to model cache data for several target devices. The flexible cache simulation allows you to change a device for an analysis without recollecting data. By default, when no configuration is set, cache data is simulated for all supported target platforms. You can also specify a list of cache configurations separated with a forward slash in the format <size_of_level1>:<size_of_level2>:<size_of_level3>. For each memory level size, specify a unit of measure as b - bytes, k - kilobytes, or m - megabytes. For example, 8k:512k:8m/24k:1m:8m/32k:1536k:8m. |
--gpu (recommended) | --profile-gpu | --analyze-gpu-kernels-only |
Model performance only for code regions running on a GPU. Use one of the three options.
NOTE:
This is a preview feature. --analyze-gpu-kernels-only is deprecated and will be removed in future releases.
|
--ignore <list> |
Specify a comma-separated list of runtimes or libraries to ignore. Time spent in regions from these runtimes and libraries is not counted when calculating per-program speedup.
NOTE:
This does not affect estimated speedup of individual offloads.
|
--include-to-report <items-to-include> |
Specify items to include in a report. Available items: memory_objects, sources, call_graph, dependencies, strides.
Use this option if you want to add more data to the report or if some data, for example, sources or memory objects, is missing from the report even though you collected it.
NOTE:
This option affects only data shown in the HTML report and does not affect data collection.
|
-m [{all, generic, regions, omp, icpx -fsycl, daal, tbb}] --markup [{all, generic, regions, omp, icpx -fsycl, daal, tbb}] |
Mark up loops after survey or other data collection. Use this option to limit the scope of further collections by selecting loops according to a provided parameter:
omp, icpx -fsycl, or generic selects loops in the project so that the corresponding collection can be run without loop selection options. You can specify several parameters in a comma-separated list. Loops are selected if they fit any of the specified parameters. |
--model-system-calls (default) | --no-model-system-calls |
Analyze regions with system calls inside. The actual presence of system calls inside a region may reduce model accuracy. |
--mpi-rank <mpi-rank> |
Specify an MPI rank to analyze if multiple ranks are analyzed. |
--no-cache-sources |
Disable keeping source code cache within a project. |
--no-cachesim |
Disable cache simulation during collection. The model assumes 100% hit rate for cache.
NOTE:
Using this option decreases analysis overhead.
|
--no-profile-jit |
Disable JIT function analysis. |
--no-stacks |
Run data collection without collecting data distribution over stacks. You can use this option to reduce overhead at the potential expense of accuracy. |
-o <output-dir> --out-dir <output-dir> |
Specify the directory to put all generated files into. By default, results are saved in <advisor-project>/e<NNN>/pp<MMM>/data.0. If you specify an existing directory or absolute path, results are saved in the specified directory. The new directory is created if it does not exist. If you only specify the directory <name>, results are stored in <advisor-project>/e<NNN>/pp<MMM>/<name>.
NOTE:
If you use this option, you might not be able to open the analysis results in the Intel Advisor GUI.
|
-p <output-name-prefix> --out-name-prefix <output-name-prefix> |
Specify a string to add to the beginning of output result filenames.
NOTE:
If you use this option, you might not be able to open the analysis results in the Intel Advisor GUI.
|
--set-parameter <CLI-config> |
Specify a single-line configuration parameter to modify in a format "<group>.<parameter>=<new-value>". For example: "min_required_speed_up=0", "scale.Tiles_per_process=0.5". You can use this option more than once to modify several parameters. |
--track-heap-objects (default) | --no-track-heap-objects |
Deprecated. Use --track-memory-objects. |
--track-memory-objects (default) | --no-track-memory-objects |
Attribute heap-allocated objects to the analyzed loops that accessed the objects. Disable to decrease analysis overhead.
IMPORTANT:
Currently, this option affects only the analysis step.
|
--track-stack-accesses (default) | --no-track-stack-accesses |
Track accesses to stack memory.
IMPORTANT:
Currently, this option does not affect the collection.
|
Examples
Collect full data on myApplication, run analysis with the default configuration, and save the project to the ./advi_results directory. The generated output is saved to the default advi_results/perf_models/mNNNN directory.
advisor-python $APM/run_oa.py ./advi_results -- ./myApplication
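Collect only basic data on myApplication with memory object tracking explicitly enabled, as recommended for --collect basic, and save the project to the ./advi_results directory. This command is an illustrative sketch that combines options described in the table above:
advisor-python $APM/run_oa.py ./advi_results --collect basic --track-memory-objects -- ./myApplication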
Collect full data on myApplication, run analysis with the default configuration, save the project to the ./advi_results directory, and save the generated output to the advi_results/perf_models/report directory.
advisor-python $APM/run_oa.py ./advi_results --out-dir report -- ./myApplication
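Preview the Intel® Advisor CLI commands that would run for the default configuration without performing any collection. This command is an illustrative sketch based on the --dry-run option described above:
advisor-python $APM/run_oa.py ./advi_results --dry-run -- ./myApplication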
Collect refinement data for SYCL code regions on myApplication, run analysis with a custom configuration file config.toml, and save the project to the ./advi_results directory. The generated output is saved to the default advi_results/perf_models/mNNNN directory.
advisor-python $APM/run_oa.py ./advi_results --collect refinement --markup icpx -fsycl --config ./config.toml -- ./myApplication
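Collect full data on myApplication, model performance for the gen12_dg1 device configuration, and override a single model parameter. This command is an illustrative sketch; the parameter value shown is taken from the --set-parameter description above:
advisor-python $APM/run_oa.py ./advi_results --config gen12_dg1 --set-parameter "min_required_speed_up=0" -- ./myApplication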