Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference

ID 767253
Date 3/22/2024
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

Interprocedural Optimization

Interprocedural Optimization (IPO) is an automatic, multi-step process that allows the compiler to analyze your code to determine where you can benefit from specific optimizations.

The compiler may apply the following optimizations:

  • Address-taken analysis

  • Array dimension padding

  • Alias analysis

  • Automatic array transposition

  • Automatic memory pool formation

  • C++ class hierarchy analysis

  • Common block variable coalescing

  • Common block splitting

  • Constant propagation

  • Dead call deletion

  • Dead formal argument elimination

  • Dead function elimination

  • Formal parameter alignment analysis

  • Forward substitution

  • Indirect call conversion

  • Inlining

  • Mod/ref analysis

  • Partial dead call elimination

  • Passing arguments in registers to optimize calls and register usage

  • Points-to analysis

  • Routine key-attribute propagation

  • Specialization

  • Stack frame alignment

  • Structure splitting and field reordering

  • Symbol table data promotion

  • Un-referenced variable removal

  • Whole program analysis

IPO Compilation Models

IPO supports two compilation models: single-file compilation and multi-file compilation.

The compiler performs some single-file interprocedural optimization at the O2 default optimization level; additionally the compiler may perform some inlining for the O1 optimization level, such as inlining functions marked with inlining pragmas or attributes (GNU C and C++) and C++ class member functions with bodies included in the class declaration.

Multi-file compilation uses the [Q]ipo option, and results in one or more mock object files rather than normal object files. (See the Compilation section below for information about mock object files.) Additionally, the compiler collects information from the individual source files that make up the program. Using this information, the compiler performs optimizations across functions and procedures in different source files.

Compile with IPO

As each source file is compiled with IPO, the compiler stores an intermediate representation (IR) of the source code in a mock object file. The mock object files contain the IR instead of the normal object code. Mock object files can be ten times or more larger than the size of normal object files.

During the IPO compilation phase only the mock object files are visible.

Link with IPO

When you link with the [Q]ipo compiler option the compiler is invoked a final time. The compiler performs IPO across all mock object files. The mock objects must be linked with the compiler or by using LLVM linking tools. While linking with IPO, the compiler and other linking tools compile mock object files as well as invoke the real/true object files linkers provided on the user's platform.

NOTE:

Starting with version 2024.0, options specified with the Clang -mllvm flag are no longer passed through to linker option processing. Instead, use the -Wl option to pass options to the linker. For example, to pass the -lto-debug-options option to the linker, use:

-Wl,-plugin-opt,-lto-debug-options

Whole Program Analysis

The compiler supports a large number of IPO optimizations that can be applied or have its effectiveness greatly increased when the whole program condition is satisfied.

During the analysis process, the compiler reads all Intermediate Representation (IR) in the mock file, object files, and library files to determine if all references are resolved and whether or not a given symbol is defined in a mock object file. Symbols that are included in the IR in a mock object file for both data and functions are candidates for manipulation based on the results of whole program analysis.

There are two types of whole program analysis - object reader method and table method. Most optimizations can be applied if either type of whole program analysis determines that the whole program conditions exists; however, some optimizations require the results of the object reader method, and some optimizations require the results of table method.

Object reader method

In the object reader method, the object reader emulates the behavior of the native linker and attempts to resolve the symbols in the application. If all symbols are resolved, the whole program condition is satisfied. This type of whole program analysis is more likely to detect the whole program condition.

Table method

In the table method the compiler analyzes the mock object files and generates a call-graph.

The compiler contains detailed tables about all of the functions for all important language-specific libraries, like libc. In this second method, the compiler constructs a call-graph for the application. The compiler then compares the function table and application call-graph. For each unresolved function in the call-graph, the compiler attempts to resolve the calls by finding an entry for each unresolved function in the compiler tables. If the compiler can resolve the functions call, the whole program condition exists.