Intel® VTune™ Profiler

User Guide

ID 766319
Date 10/31/2024
Public

Python* Code Analysis

Explore performance analysis options provided by the Intel® VTune™ Profiler for Python* applications to identify the most time-consuming code sections and critical call paths.

VTune Profiler supports the Hotspots, Threading, and Memory Consumption (Linux only) analyses for Python* applications through the Launch Application mode. For example, when your application performs intensive numerical modeling, you need to know how effectively it uses the available CPU resources. A good sign of effective CPU usage is that the computation spends most of its time executing native extension code rather than interpreting Python glue code.

To get the maximum performance out of your Python application, consider using native extensions such as NumPy, or writing and compiling the performance-critical modules of your Python project in a native language such as C or even assembly. This helps your application take advantage of vectorization and make fuller use of the available CPU resources.
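As a rough illustration of the difference between interpreted glue code and native code, the sketch below computes the same dot product twice: once with an explicit Python loop, and once with built-ins whose inner loop runs in C. It uses only the standard library so that it runs anywhere. In a real project, NumPy would be the more typical choice. The function names are illustrative and do not come from this guide.

```python
import operator
import timeit


def dot_interpreted(xs, ys):
    """Dot product where every iteration is executed by the interpreter."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_native_loop(xs, ys):
    """Same value up to floating-point rounding; sum() and map() iterate in C."""
    return sum(map(operator.mul, xs, ys))


if __name__ == "__main__":
    xs = [float(i) for i in range(100_000)]
    ys = [float(i % 7) for i in range(100_000)]
    for fn in (dot_interpreted, dot_native_loop):
        seconds = timeit.timeit(lambda: fn(xs, ys), number=20)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

In a Hotspots result, the first variant would be expected to attribute most of its time to the Python function itself, while the second moves more of it into native interpreter code.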

To analyze Python code performance with VTune Profiler and interpret the data, configure the data collection and then identify hotspots, as described in the sections that follow.

Configure Python Data Collection

Configure VTune Profiler through the GUI or command-line (vtune) interface to analyze the performance of your Python code.

In the GUI:

  1. Click the Configure Analysis button on the toolbar.

    The Configure Analysis window opens.

  2. Choose a target system and target type, like Local Host and Launch Application.

    NOTE:

    You can profile Windows* and Linux* target systems only.

  3. In the WHAT pane, provide these details:

    • In the Application field, enter the path to the installed Python interpreter.
    • In the Application parameters field, enter the path to your Python script.
      NOTE:

      If you specify a relative path to your Python script, VTune Profiler resolves full function and method names for the imported modules only. The names inside the main script are not resolved. To avoid this, specify the absolute path to your Python script.

    • In the Advanced settings, in the Managed code profiling mode drop-down menu, select Auto. This way, VTune Profiler automatically detects the type of target executable (managed or native) and switches to the corresponding mode.
    • If necessary, select Analyze child processes to collect data on processes launched by the target process.

    NOTE:

    When you attach the VTune Profiler to the Python process, make sure you initialize the Global Interpreter Lock (GIL) inside your script before you start the analysis. If GIL is not initialized, the VTune Profiler collector initializes it only when a new Python function is called.

  4. If your Python application must already be running before profiling starts, or if you cannot launch the application at the start of the analysis, attach VTune Profiler to the Python process. To do this, in the WHAT pane, select the Attach to Process target type and specify the Python process name or PID.

  5. In the HOW pane on the right, select the Hotspots, Threading, or Memory Consumption analysis type.

  6. If necessary, configure these options or use their default settings:

    User-Mode Sampling mode

    Select this mode to enable user-mode sampling and tracing collection for hotspots and call stack analysis (formerly known as Basic Hotspots). This collection mode uses a fixed sampling interval of 10 ms. If you need to change the interval, click the Copy button and create a custom analysis configuration.

    Show additional performance insights check box

    Get additional performance insights, such as vectorization, and learn next steps. This option collects additional CPU events, which may enable the multiplexing mode.

    The option is enabled by default.

    Details button

    Expand/collapse a section listing the default non-editable settings used for this analysis type. If you want to modify or enable additional settings for the analysis, you need to create a custom configuration by copying an existing predefined configuration. VTune Profiler creates an editable copy of this analysis type configuration.

  7. Click Start to run the analysis.
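The GUI steps above have a command-line counterpart in the vtune tool. The sketch below only assembles and prints the command lines; it does not run them. It assumes the `-collect`, `-mrte-mode`, `-result-dir`, and `-target-pid` options of the vtune CLI. Check the exact spellings against `vtune -help collect` for your installed version. The helper names, result directory names, and script name are hypothetical.

```python
import os
import shlex
import sys


def build_launch_command(script, analysis="hotspots", result_dir="r000hs"):
    """Command line that launches a Python script under the profiler."""
    return [
        "vtune",
        "-collect", analysis,      # hotspots, threading, or memory-consumption
        "-mrte-mode=auto",         # assumed CLI name of the "Auto" managed code mode
        "-result-dir", result_dir,
        "--",
        sys.executable,            # path to the installed Python interpreter
        os.path.abspath(script),   # absolute path, so names in the main script resolve
    ]


def build_attach_command(pid, analysis="hotspots", result_dir="r001hs"):
    """Command line that attaches the profiler to a running Python process."""
    return [
        "vtune",
        "-collect", analysis,
        "-mrte-mode=auto",
        "-result-dir", result_dir,
        "-target-pid", str(pid),
    ]


if __name__ == "__main__":
    print(shlex.join(build_launch_command("my_script.py")))
    print(shlex.join(build_attach_command(os.getpid())))
```

Passing the script through `os.path.abspath` mirrors the note about relative paths: the interpreter is the application, and the script is its parameter.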

Identifying Hot Spots

Hotspots analysis in the user-mode sampling mode helps identify sections of your Python code that take a long time to execute (hotspots), along with their timing metrics and call stacks. It also displays the workload distribution over threads in the Timeline pane.
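To try the Hotspots analysis on something small, a script along the following lines can serve as a target. It is a hypothetical pure-Python workload, not the sample used in this guide (which calls into oneMKL). It prices a batch of European call options with the Black-Scholes formula through a short chain of function calls, so the result has a visible call stack. All names and parameter values are illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, years, rate, vol):
    """Black-Scholes price of a European call option."""
    sigma_sqrt_t = vol * math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / sigma_sqrt_t
    d2 = d1 - sigma_sqrt_t
    return spot * normal_cdf(d1) - strike * math.exp(-rate * years) * normal_cdf(d2)


def price_batch(count):
    """Price `count` options with varying spot prices and sum the results."""
    total = 0.0
    for i in range(count):
        total += black_scholes_call(100.0 + i % 50, 100.0, 1.0, 0.05, 0.2)
    return total


def main():
    # Long enough to gather a reasonable number of samples at a 10 ms interval.
    print(f"sum of prices: {price_batch(500_000):.2f}")


if __name__ == "__main__":
    main()
```

With a workload like this, `black_scholes_call` and `normal_cdf` would be expected near the top of the hotspot list, with `price_batch` and `main` as their callers in the stack.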

By default, VTune Profiler uses the Auto managed code profiling mode, which enables you to view and analyze mixed stacks for Python/C++ applications. In this guide's example, a native hotspot function from the Intel® oneAPI Math Kernel Library (oneMKL) appears in the left pane. The mixed call stack in the right pane reveals the Python black_scholes function that actually calls the hotspot function.

Double-click the black_scholes function in the Call Stack pane to open the source view at the call site (line 66 in this example).

To view call stacks only inside your Python code, filter out Python core and system functions by selecting the Only user functions option for the Call Stack Mode on the filter bar.
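A similar view can be requested from the command line once a result has been collected. The sketch below assembles a report command and prints it without running it. It assumes the vtune CLI's `-report hotspots` action and that `-call-stack-mode user-only` corresponds to the Only user functions setting in the GUI. Verify both with `vtune -help report`. The helper name and the result directory name are hypothetical.

```python
import shlex


def build_report_command(result_dir, user_functions_only=True):
    """Command line that prints a hotspots report for a collected result."""
    cmd = ["vtune", "-report", "hotspots", "-result-dir", result_dir]
    if user_functions_only:
        # Assumed CLI counterpart of "Only user functions" in the filter bar.
        cmd += ["-call-stack-mode", "user-only"]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_report_command("r000hs")))
```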

Python Code Profiling Considerations

  • Profiling support exists for Python distribution 2.6 and newer versions.

  • If you use Python extensions that compile Python code to native code (JIT, C/C++), VTune Profiler may show incorrect analysis results. Consider using the JIT Profiling API to solve this problem.

  • You can profile Python code on Windows and Linux target systems.

  • In some cases, VTune Profiler may not resolve full names of Python functions and modules on Windows OS. However, the source information displays properly. You can view the source directly from viewpoints in VTune Profiler.

  • The Timeline pane does not always display proper thread names.

  • If your application has very low stack depth, which includes called functions and imported modules, the VTune Profiler does not collect Python data. Consider using deeper calls to enable the profiling.

  • When collecting data remotely, VTune Profiler may fail to resolve full function or method names, or to display the source code of your Python script. To solve this problem for Linux targets, before running the analysis, copy the source files to a directory on your host system whose path is identical to the path on your target system.