Intel® VTune™ Profiler Performance Analysis Cookbook

ID 766316
Date 9/05/2023
Public

Analyzing CPU and FPGA (Intel® Arria® 10 GX) Interaction

This recipe shows how to configure your platform to analyze the interaction between your CPU and FPGA, using the Intel® Arria® 10 GX FPGA as an example.

Ingredients

This section lists the hardware and software tools used for the performance analysis scenario.

  • Application: Matrix Multiplication OpenCL™ application. The Matrix Multiplication sample application is available for download from the Intel® FPGA SDK for OpenCL™ website.
  • Tools: Intel® FPGA SDK for OpenCL™, Intel® VTune™ Amplifier 2019 or higher

    NOTE:
    • Starting with the 2020 release, Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler.

    • Most recipes in the Intel® VTune™ Profiler Performance Analysis Cookbook are flexible. You can apply them to different versions of Intel® VTune™ Profiler. In some cases, minor adjustments may be required.

    • Get the latest version of Intel® VTune™ Profiler.

  • Operating System: CentOS* 7, Red Hat* Enterprise Linux 7 or higher
  • CPU: Intel® server platform code named Skylake
  • FPGA: Intel® Arria® 10 GX

Configure the Intel® Arria® 10 GX FPGA and Intel® FPGA SDK for OpenCL™

  1. On your Intel Arria 10 GX FPGA, set up the DIP switches and connect the power and USB cables. See detailed instructions.

  2. Download the Intel® FPGA SDK for OpenCL™ (which includes CodeBuilder, the Quartus Prime software, and device support) from the FPGA Software Download Center.

  3. Run the setup_pro.sh script to install the SDK.

  4. Run source init_opencl.sh to set the appropriate environment variables.

  5. Run aocl version to verify the installation. The output should look similar to the following:

    aocl 17.1.0.240 (Intel(R) FPGA SDK for OpenCL(TM), Version 17.1.0 Build 240, Copyright (C) 2017 Intel Corporation)

  6. Run aocl install to install the FPGA board.

  7. Run aocl diagnose to verify the hardware installation. The output should look similar to the following:

    Device Name:
    acl0
    
    Package Pat:
    /home/tce/intelFPGA_pro/17.1/hld/board/a10_ref
    
    Vendor: Intel(R) Corporation
    
    Phys Dev Name  Status   Information
    
    acla10_ref0    Passed   Arria 10 Reference Platform (acla10_ref0)
                            PCIe dev_id = 2494, bus:slot.func = 44:00.00, Gen3 x4
                            FPGA temperature = 44.3555 degrees C.
    
    DIAGNOSTIC_PASSED
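
Steps 5 and 7 can also be checked from a script, for example on a headless server. This is a minimal sketch, assuming init_opencl.sh has been sourced so that aocl is on PATH. The helper names aocl_sdk_version and aocl_diagnose_passed are hypothetical, and the parsing relies only on the output format shown above (the "Version" field of aocl version and the DIAGNOSTIC_PASSED marker of aocl diagnose).

```shell
#!/bin/sh
# Hypothetical helpers for checking the SDK and board setup.
# Assumes init_opencl.sh has been sourced so that `aocl` is on PATH.

# Print the SDK version (e.g. 17.1.0) found after "Version " in the `aocl version` banner.
aocl_sdk_version() {
    printf '%s\n' "$1" | sed -n 's/.*Version \([0-9.]*\).*/\1/p'
}

# Succeed only if the `aocl diagnose` output contains the DIAGNOSTIC_PASSED marker.
aocl_diagnose_passed() {
    printf '%s\n' "$1" | grep -q 'DIAGNOSTIC_PASSED'
}

# Usage sketch:
#   aocl_sdk_version "$(aocl version)"
#   aocl_diagnose_passed "$(aocl diagnose)" && echo "board OK"
```

Checking for the marker string keeps the sketch independent of the device-specific lines (bus address, temperature) that differ from system to system.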
    

Build the Sample Application and Flash to the FPGA

  1. Run make with the default makefile to build the host executable. The executable output filename is host.

  2. Build the binary for the FPGA using the following command:

    aoc -v -board=a10gx device/matrix_mult.cl -o bin/matrix_mult.aocx
  3. Set up udev rules that give you access to the USB-Blaster download cable used for flashing.

    1. Open the udev rules file as root, for example:

      sudo vim /etc/udev/rules.d/51-usbblaster.rules
    2. Add the following lines:

      # usb blaster
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6001", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6002", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6003", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6010", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6810", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      
  4. Lower the JTAG clock speed to 6 MHz using the following command:

    jtagconfig --setparam 1 JtagClock 6M
  5. Flash the binary to the FPGA using the following command:

    aocl flash acl0 ./bin/matrix_mult.aocx
  6. Reboot the host system that contains the FPGA so that the flashed image is loaded.
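
The build-and-flash steps above can be scripted. This is a minimal sketch, assuming it runs from the sample's top-level directory (where the makefile, device/ and bin/ are located) with init_opencl.sh already sourced. The function names run, usbblaster_rules, and build_and_flash and the DRY_RUN variable are hypothetical. usbblaster_rules reproduces the five rules from step 3 from their product IDs; udevadm control --reload-rules followed by udevadm trigger is the standard udev way to apply new rules without re-plugging the cable.

```shell
#!/bin/sh
# Hypothetical helpers for the build-and-flash steps.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print the five USB-Blaster rules from step 3 (vendor 09fb), one per product ID.
usbblaster_rules() {
    echo '# usb blaster'
    for pid in 6001 6002 6003 6010 6810; do
        # Single quotes keep $env{...} literal for udev; %%c prints a literal %c.
        printf 'SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="%s", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %%c"\n' "$pid"
    done
}

# Build the host executable and FPGA binary, then flash; stop at the first failure.
build_and_flash() {
    run make || return 1                                   # host executable "host"
    run aoc -v -board=a10gx device/matrix_mult.cl -o bin/matrix_mult.aocx || return 1
    run jtagconfig --setparam 1 JtagClock 6M || return 1   # 6 MHz JTAG clock for flashing
    run aocl flash acl0 ./bin/matrix_mult.aocx             # reboot the host afterwards
}

# Usage sketch (the udev step needs root):
#   usbblaster_rules | sudo tee /etc/udev/rules.d/51-usbblaster.rules >/dev/null
#   sudo udevadm control --reload-rules && sudo udevadm trigger
#   DRY_RUN=1 build_and_flash    # review the commands first
#   build_and_flash
```

The dry-run mode lets you review the exact commands, including the board name acl0 reported by aocl diagnose, before anything is written to the FPGA.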

Run CPU/FPGA Interaction Analysis

  1. Launch the VTune Amplifier GUI. For example:

    /opt/intel/vtune_amplifier_2019/bin64/amplxe-gui
  2. Create a project for your analysis, for example: hello_world_opencl.

  3. Click Configure Analysis to start a new analysis.

  4. Set up the CPU/FPGA Interaction analysis.

    Configure Analysis window showing matrix multiply file path

    1. In the WHERE pane, select Local Host.

    2. In the WHAT pane, select Launch Application and browse to the host executable of the matrix multiplication sample. Typically, it is located at <sample app>/bin/host.

    3. In the HOW pane, select CPU/FPGA Interaction from the available analysis types.

  5. Click Start to begin the analysis.
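
On a server without a display, the same collection may be started from the command line. This is a minimal sketch under stated assumptions: the command-line tool amplxe-cl is installed next to amplxe-gui, and the analysis type is named fpga-interaction (run amplxe-cl -help collect to confirm the name in your version). The function name collect_fpga_interaction, the VTUNE_BIN and DRY_RUN variables, and the result directory name r_fpga are hypothetical.

```shell
#!/bin/sh
# Hypothetical command-line equivalent of the GUI steps above.
# ASSUMPTIONS: amplxe-cl lives in VTUNE_BIN and the analysis type is "fpga-interaction".
VTUNE_BIN=${VTUNE_BIN:-/opt/intel/vtune_amplifier_2019/bin64}

collect_fpga_interaction() {
    app=$1                      # host executable, e.g. <sample app>/bin/host
    result_dir=${2:-r_fpga}     # hypothetical result directory name
    set -- "$VTUNE_BIN/amplxe-cl" -collect fpga-interaction -r "$result_dir" -- "$app"
    # DRY_RUN=1 prints the command instead of running it.
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage sketch:
#   collect_fpga_interaction ./bin/host
```

The resulting directory can then be opened in the GUI for the viewpoints described below.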

Interpret Results

After data collection completes, the results are finalized and shown in the CPU/FPGA Interaction viewpoint. Start with the Summary tab to view the top FPGA compute tasks as well as the top tasks and hotspots on the CPU.

Summary window showing CPU/FPGA Interaction viewpoint with Top Hotspots and FPGA Top Compute lists

Switch to the Bottom-up tab to review the work size of each compute task and the data transfer throughput. Use the timeline pane to review FPGA utilization for compute tasks and data transfers.

Bottom-up tab of CPU/FPGA Interaction viewpoint showing timeline of FPGA utilization

Use the Platform tab to check the computing queue for the FPGA and host application. You can also find the start time and duration of each transfer and synchronization.

Platform tab of CPU/FPGA Interaction viewpoint showing computing queue, thread, and FPGA utilization timelines
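
If you collected the result from the command line, a text summary can be printed without opening the GUI. This is a minimal sketch, assuming amplxe-cl is installed in the directory shown earlier; the function name report_summary, the VTUNE_BIN and DRY_RUN variables, and the result directory name r_fpga are hypothetical. The Bottom-up and Platform timelines are still best reviewed in the GUI.

```shell
#!/bin/sh
# Hypothetical helper: print the summary report for an existing result directory.
# ASSUMPTION: amplxe-cl lives in VTUNE_BIN.
VTUNE_BIN=${VTUNE_BIN:-/opt/intel/vtune_amplifier_2019/bin64}

report_summary() {
    result_dir=${1:-r_fpga}     # hypothetical result directory name
    set -- "$VTUNE_BIN/amplxe-cl" -report summary -r "$result_dir"
    # DRY_RUN=1 prints the command instead of running it.
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage sketch:
#   report_summary r_fpga > summary.txt
```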