advisor Command Option Reference
The advisor command currently supports the options shown below.
Option |
Description |
---|---|
Set an accuracy level for the Offload Modeling collection preset. |
|
Add loops (by file and line number) to the loops selected for deeper analysis. |
|
Specify the directory where the target application runs during analysis, if it is different from the current working directory. |
|
Assume that a loop has dependencies if the loop dependency type is unknown. |
|
Estimate invocation taxes assuming the invocation tax is paid only for the first kernel launch. |
|
When searching for an optimal N-dimensional offload, assume there are dependencies between inner and outer loops. |
|
Assume data is only transferred once for each offload, and all instances share that data. |
|
Finalize Survey and Trip Counts & FLOP analysis data after collection is complete. |
|
Emulate the execution of more than one instance simultaneously for a top-level offload. |
|
Run benchmarks on only one concurrently executing Intel Advisor instance to avoid concurrency issues with regard to platform limits. |
|
Generate a Survey report in bottom-up view. |
|
Enable binary visibility in a read-only snapshot you can view any time. |
|
Select what binary files will be added to a read-only snapshot. |
|
Set the cache hierarchy to collect modeling data for CPU cache behavior during Trip Counts & FLOP analysis. |
|
Simulate device cache behavior for your application. |
|
Enable source code visibility in a read-only snapshot you can view any time (with the --snapshot action). Enable keeping source code cache within a project (with the --collect action). |
|
Enable cache simulation for Performance Modeling. |
|
Set the cache associativity for modeling CPU cache behavior during the Memory Access Patterns analysis. |
|
Set the cache line size (in bytes) for modeling CPU cache behavior during Memory Access Patterns analysis. |
|
Set the focus for modeling CPU cache behavior during Memory Access Patterns analysis. |
|
Specify what percentage of total memory accesses should be processed during cache simulation. |
|
Set the cache set size (in bytes) for modeling CPU cache behavior during Memory Access Patterns analysis. |
|
Check the profitability of offload regions and add only profitable regions to a report. |
|
Clear all loops previously selected for deeper analysis. |
|
Specify a device configuration to model your application performance for. |
|
Use the projection of x86 logical instructions to GPU logical instructions. |
|
Project x86 memory instructions to GPU SEND/SENDS instructions. |
|
Count the number of accesses to memory objects created by code regions. |
|
Project x86 MOV instructions to GPU MOV instructions. |
|
Select how to model SEND instruction latency. |
|
Specify a scale factor to approximate a host CPU that is faster than the baseline CPU by this factor. |
|
Set the delimiter for a report in CSV format. |
|
Specify the absolute path or name for a custom TOML configuration file with additional modeling parameters. |
|
Limit the maximum amount (in MB) of raw data collected during Survey analysis. |
|
Analyze potential data reuse between code regions. |
|
Set the level of detail for modeling data transfers during Characterization. |
|
Estimate data transfers in detail, including latencies for each transferred object. |
|
Specify memory page size to set the traffic measurement granularity for the data transfer simulator. |
|
Show only floating-point data, only integer data, or data for the sum of both data types in a Roofline interactive HTML report. |
|
Remove previously collected trip counts data when re-running a Survey analysis with changed binaries. |
|
Do not account for optimized traffic for transcendentals on a GPU. |
|
Show a callstack for each loop/function call in a report. |
|
List all steps included in Offload Modeling batch collection at a specified accuracy level without running them. |
|
Specify the maximum amount of time (in seconds) an analysis runs. |
|
Show (in a Survey report) how many instructions of a given type actually executed during Trip Counts & FLOP analysis. |
|
enable-batching |
Deprecated. |
Model CPU cache behavior on your target application. |
|
Model data transfer between host memory and device memory. |
|
Enable a simulator to model GRF. |
|
enable-slm |
Deprecated. SLM is modeled by default if available. |
Examine specified annotated sites for opportunities to perform task-chunking modeling in a Suitability report. |
|
Use the same local size and SIMD width as measured on a baseline device. |
|
Emulate data distribution over stacks if stacks collection is disabled. |
|
Offload all selected code regions even if offloading their child loops/functions is more profitable. |
|
Estimate region speedup with relaxed constraints. |
|
Consider loops recommended for offloading only if they reach the minimum estimated speedup specified in a configuration file. |
|
Exclude the specified files or directories from annotation scanning during analysis. |
|
Specify an application for analysis that is not the starting application. |
|
Specify a path to an unpacked result snapshot or an MPI rank result to generate a report or model performance. |
|
Filter data by the specified column name and value in a Survey and Trip Counts & FLOP report. |
|
Enable filtering detected stack variables by scope (warning vs. error) in a Dependencies analysis. |
|
Mark all potential reductions with a specific diagnostic during Dependencies analysis. |
|
Enable flexible cache simulation to change cache configuration without re-running collection. |
|
Collect data about floating-point and integer operations, memory traffic, and mask utilization metrics for AVX-512 platforms during Trip Counts & FLOP analysis. |
|
Consider all arithmetic operations as single-precision floating-point or int32 operations. |
|
Consider all arithmetic operations as double-precision floating-point or int64 operations. |
|
Set a report output format. |
|
With the Offload Modeling perspective, analyze OpenCL™ and oneAPI Level Zero programs running on Intel® Graphics. With the GPU Roofline Insights perspective, create a Roofline interactive HTML report for data collected on GPUs. |
|
Collect memory traffic generated by OpenCL™ and Intel® Media SDK programs executed on Intel® Processor Graphics. |
|
gpu-kernels |
Deprecated. Use --profile-gpu or --gpu instead. |
Specify time interval, in milliseconds, between GPU samples during Survey analysis. |
|
Disable data transfer tax estimation. |
|
Specify runtimes or libraries to ignore time spent in these regions when calculating per-program speedup. |
|
Ignore mismatched target or application parameter errors before starting analysis. |
|
Ignore mismatched module checksums before starting analysis. |
|
Analyze the Nth child process during Memory Access Patterns and Dependencies analysis. |
|
Model traffic on all levels of the memory hierarchy for a Roofline report. |
|
Set the length of time (in milliseconds) to wait before collecting each sample during Survey analysis. |
|
Set the maximum number of top items to show in a report. |
|
Set the maximum number of instances to analyze for all marked loops. |
|
Specify total time, in milliseconds, to filter out loops that fall below this value. |
|
Select loops (by criteria instead of human input) for deeper analysis. |
|
Enable/disable user selection as a way to control loops/functions identified for deeper analysis. |
|
After running a Survey analysis and identifying loops of interest, select loops (by file and line number or ID) for deeper analysis. |
|
Model specific memory level(s) in a Roofline interactive HTML report, including L1, L2, L3, and DRAM. |
|
Model only load memory operations, store memory operations, or both, in a Roofline interactive HTML report. |
|
Show dynamic or static instruction mix data in a Survey report. |
|
Collect Intel® oneAPI Math Kernel Library (oneMKL) loops and functions data during the Survey analysis. |
|
Use the baseline GPU configuration as a target device for modeling. |
|
Analyze child loops of the region head to determine whether any of them provide a more profitable offload. |
|
Model calls to math functions such as EXP, LOG, SIN, and COS as extended math instructions, if possible. |
|
Analyze code regions that contain system calls, assuming the system calls are separated from the offload code and executed on a host device. |
|
Specify application (or child application) module(s) to include in or exclude from analysis. |
|
Limit, by inclusion or exclusion, application (or child application) module(s) for analysis. |
|
Specify MPI process data to import. |
|
Set the Microsoft* runtime environment mode for analysis. |
|
When searching for an optimal N-dimensional offload, limit the maximum loop depth that can be converted to one offload. |
|
Specify a text file containing command line arguments. |
|
Enable asynchronous execution to overlap offload overhead with execution time. |
|
Pack a snapshot into an archive. |
|
Analyze OpenCL™ and oneAPI Level Zero programs running on Intel® Processor Graphics. |
|
Show Intel® performance libraries loops and functions in Intel® Advisor reports. |
|
Collect metrics about Just-In-Time (JIT) generated code regions during the Trip Counts & FLOP analysis. |
|
Collect Python* loop and function data during Survey analysis. |
|
Collect metrics for stripped binaries. |
|
Specify the top-level directory where a result is saved if you want to save the collection somewhere other than the current working directory. |
|
Minimize status messages during command execution. |
|
Recalculate total time after filtering a report. |
|
Enable heap allocation tracking to identify heap-allocated variables for which access strides are detected during Memory Access Patterns analysis. |
|
Capture stack frame pointers to identify stack variables for which access strides are detected during Memory Access Patterns analysis. |
|
Examine specified annotated sites for opportunities to reduce lock contention or find deadlocks in a Suitability report. |
|
Examine specified annotated sites for opportunities to reduce lock overhead in a Suitability report. |
|
Examine specified annotated sites for opportunities to reduce site overhead in a Suitability report. |
|
Examine specified annotated sites for opportunities to reduce task overhead in a Suitability report. |
|
Refinalize a Survey result that was collected with a previous Intel® Advisor version, or when you need to correct or update source and binary search paths. |
|
Remove loops (by file and line number) from the loops selected for deeper analysis. |
|
Redirect report output from stdout to another location. |
|
Specify the path or name of a custom report template file. |
|
Specify a directory to identify the running analysis. |
|
Resume collection after the specified number of milliseconds. |
|
Return the target exit code instead of the command line interface exit code. |
|
Specify the location(s) for finding target support files. |
|
Enable searching for an optimal N-dimensional offload. |
|
Select loops (by file and line number, ID, or criteria) for deeper analysis. |
|
Assume loops with specified IDs or source locations have a dependency. |
|
Assume loops with specified IDs or source locations are parallel. |
|
Specify a single-line parameter to modify in a target device configuration. |
|
Show data for all available columns in a Survey report. |
|
Show data for all available rows, including data for child loops, in a Survey report. |
|
Show only functions in a report. |
|
Show only loops in a report. |
|
Show not-executed child loops in a Survey report. |
|
Generate a Survey report for data collected for GPU kernels. |
|
Specify the total time threshold, in milliseconds, to filter out nodes that fall below this value from PDF and DOT Offload Modeling reports. |
|
Sort data in ascending order (by specified column name) in a report. |
|
Sort data in descending order (by specified column name) in a report. |
|
Enable register flow analysis to calculate the number of consecutive load/store operations in registers and related memory traffic in bytes during Survey analysis. |
|
Specify stack access size to set stack memory access measurement granularity for the data transfer simulation. |
|
Restructure the call flow during Survey analysis to attach stacks to a point introducing a parallel workload. |
|
Set stack size limit for analyzing stacks after collection. |
|
Perform advanced collection of callstack data during Roofline and Trip Counts & FLOP analysis. |
|
Choose between online and offline modes to analyze stacks during Survey analysis. |
|
Start executing the target application for analysis purposes, but delay data collection. |
|
Statically calculate the number of specific instructions present in the binary during Survey analysis. |
|
Specify processes and/or children for instrumentation during Survey analysis. |
|
Collect a variety of data during Survey analysis for loops that reside in non-executed code paths. |
|
Specify the device configuration for which to model cache during Trip Counts collection. |
|
Specify a target GPU to collect data for if you have multiple GPUs connected to your system. |
|
Attach Survey or Trip Counts & FLOP collection to a running process specified by the process ID. |
|
Attach Survey or Trip Counts & FLOP collection to a running process specified by the process name. |
|
Specify the hardware configuration to use for modeling purposes in a Suitability report. |
|
Specify the threading model to use for modeling purposes in a Suitability report. |
|
Specify the number of parallel threads to use for offload heads. |
|
Generate a Survey report in top-down view. |
|
Set how to trace loop iterations during Memory Access Patterns analysis. |
|
Configure collectors to trace MPI code and determine MPI rank IDs for non-Intel® MPI library implementations. |
|
Attribute memory objects to the analyzed loops that accessed the objects. |
|
Track accesses to stack memory. |
|
Enable parallel data sharing analysis for stack variables during Dependencies analysis. |
|
Collect loop trip counts data during Trip Counts & FLOP analysis. |
|
use-collect-configs |
Deprecated. |
user-data-dir |
Deprecated. |
Maximize status messages during command execution. |
|
Show call stack data in a Roofline interactive HTML report (if call stack data is collected). |
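
The options described above are passed to the advisor command together with an action such as a collection or a report. The following is a minimal sketch, not a definitive recipe. The table above does not show the option names, so the spellings used here (`--collect`, `--report`, `--project-dir`, `--format`, `--csv-delimiter`, `--report-output`) are assumptions to verify against `advisor --help` for your installed version. The project directory `./advi_results`, the target `./myApplication`, and the output file `./survey.csv` are hypothetical placeholders. The script only prints the commands it composes, so it is safe to run on a machine without Intel Advisor installed.

```shell
#!/bin/sh
# Hypothetical placeholders: adjust to your own project and target.
PROJECT_DIR=./advi_results
TARGET=./myApplication

# Print a command line instead of executing it.
# Assumption: option spellings below should be checked with `advisor --help`.
run() {
    echo "$@"
}

# Survey collection: save the result under the project directory
# and run the target application given after the `--` separator.
survey_cmd=$(run advisor --collect=survey --project-dir="$PROJECT_DIR" -- "$TARGET")

# Survey report in CSV format with a tab delimiter, redirected
# from stdout to a file.
report_cmd=$(run advisor --report=survey --project-dir="$PROJECT_DIR" \
    --format=csv --csv-delimiter=tab --report-output=./survey.csv)

echo "$survey_cmd"
echo "$report_cmd"
```

To execute the commands for real, replace the body of `run` with `"$@"` once the option names have been confirmed for your version.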