FPGA-CPU Interaction

Intel® oneAPI Programming Guide

ID 771723

Date 7/14/2023

Public

One of the main influences on the overall performance of an FPGA design is how kernels executing on the FPGA interact with the host on the CPU.

Host and Kernel Interaction

FPGA devices typically communicate with the host (CPU) via PCIe.

Figure: FPGA Device Communication with the Host

The PCIe link is an important factor in the performance of SYCL* programs targeting FPGAs. In addition, the first time you run a particular SYCL program, the FPGA must be configured with the program's hardware bitstream, which can take several seconds.

Data Transfer

Typically, the FPGA board has its own private Double Data Rate (DDR) memory, on which the kernel primarily operates. The CPU must move all data that the kernel needs into the FPGA’s local DDR memory, through bulk transfers or direct memory access (DMA). After the kernel completes its operations, the results must be transferred back to the CPU over DMA. The transfer speed is bounded by the PCIe link itself and by the efficiency of the DMA solution. For example, the Intel® PAC with Intel® Arria® 10 GX FPGA has a PCIe Gen3 x8 link, and transfers are typically limited to 6-7 GB/s.
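
To put the 6-7 GB/s figure in context, the following is a back-of-the-envelope sketch in plain C++ (no SYCL required). It assumes the standard PCIe Gen3 parameters of 8 GT/s per lane and 128b/130b line encoding, which are not stated in this guide. The function names are hypothetical and exist only for illustration. Packet overhead and DMA efficiency, which the sketch ignores, would account for the gap between the theoretical peak and the observed rate.

```cpp
#include <cassert>

// Theoretical payload bandwidth of a PCIe Gen3 link, in bytes per second.
// Assumption (not from this guide): 8 GT/s per lane, 128b/130b encoding.
double pcie_gen3_bytes_per_second(int lanes) {
    assert(lanes > 0);
    const double transfers_per_second = 8.0e9;         // 8 GT/s per lane
    const double encoding_efficiency = 128.0 / 130.0;  // 128 data bits per 130 line bits
    return transfers_per_second * lanes * encoding_efficiency / 8.0;  // bits -> bytes
}

// Time in seconds to move `bytes` at a sustained rate of `bytes_per_second`.
double transfer_seconds(double bytes, double bytes_per_second) {
    assert(bytes_per_second > 0.0);
    return bytes / bytes_per_second;
}
```

Under these assumptions an x8 link peaks at roughly 7.9 GB/s, and moving 3 GB of input at a sustained 6 GB/s takes about half a second before the kernel can start on that data.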

Use the following techniques to manage data transfer times:

SYCL allows accessors to buffers to be tagged as read-only or write-only, which eliminates some unnecessary transfers.
Improve the overall system efficiency by maximizing the number of concurrent operations. Since PCIe supports simultaneous transfers in opposite directions and PCIe transfers do not interfere with kernel execution, you can apply techniques such as double buffering. Refer to the Double Buffering Host Utilizing Kernel Invocation Queue topic in the FPGA Optimization Guide for Intel® oneAPI Toolkits and the double_buffering tutorial for additional information about these techniques.
Improve data transfer throughput by prepinning system memory on board variants that support Restricted USM. Refer to the Prepinning topic in the FPGA Optimization Guide for Intel® oneAPI Toolkits for additional information.
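
The benefit of overlapping transfers with kernel execution can be estimated with a simple timing model. The sketch below is plain C++ and is an idealized model, not the double_buffering tutorial code. The struct and function names are hypothetical, and the overlapped estimate assumes that input transfer, kernel execution, and output transfer of different batches overlap perfectly. It is therefore a best-case bound, and a design with only two buffer sets may do somewhat worse.

```cpp
#include <algorithm>
#include <cassert>

// Per-batch stage times in seconds (hypothetical illustration type).
struct BatchTimes {
    double to_device;  // host -> FPGA DDR transfer
    double kernel;     // kernel execution
    double to_host;    // FPGA DDR -> host transfer
};

// No overlap: each batch is transferred in, processed, and transferred out
// before the next batch starts.
double serial_seconds(int batches, const BatchTimes& t) {
    assert(batches >= 0);
    return batches * (t.to_device + t.kernel + t.to_host);
}

// Idealized overlap: the three stages form a pipeline, so after the first
// batch fills the pipeline, each further batch costs only the slowest stage.
double overlapped_seconds(int batches, const BatchTimes& t) {
    assert(batches >= 0);
    if (batches == 0) return 0.0;
    const double slowest = std::max({t.to_device, t.kernel, t.to_host});
    return (t.to_device + t.kernel + t.to_host) + (batches - 1) * slowest;
}
```

For example, with four batches that each need 1 s in, 2 s of kernel time, and 1 s out, the serial estimate is 16 s and the overlapped estimate is 10 s. The model also shows that once transfers are hidden, the slowest stage sets the throughput.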

Configuration Time

Before a kernel can run, its hardware bitstream must be programmed onto the FPGA device in a process called configuration. Configuration is a lengthy operation requiring several seconds of communication with the FPGA device. The SYCL runtime manages configuration automatically and decides when it occurs. For example, configuration might be triggered when a kernel is first launched, while subsequent launches of the same kernel may not trigger it because the bitstream has not changed. Therefore, during development, Intel® recommends timing kernel execution only after the FPGA has been configured, for example, by performing a warm-up execution of the kernel before the timed run. Remove this warm-up execution from production code.
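
The warm-up pattern described above can be sketched as a small timing helper. This is a minimal illustration in standard C++. The name seconds_after_warmup is hypothetical, and `launch` stands in for whatever code submits the kernel and blocks until it finishes (in a SYCL program, presumably a queue submission followed by a wait).

```cpp
#include <chrono>

// Runs `launch` once untimed, then times a second run.
// `launch` must block until the work it starts has completed.
template <typename Launch>
double seconds_after_warmup(Launch&& launch) {
    launch();  // warm-up: absorbs one-time costs such as FPGA configuration
    const auto start = std::chrono::steady_clock::now();
    launch();  // measured run
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

The helper is meant for development-time measurement only, since the extra launch doubles the work and should not appear in production code.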

Multiple Kernel Invocations

If a SYCL program submits the same kernel to a SYCL queue multiple times (for example, by calling single_task within a loop), only one kernel invocation is active at a time. Each subsequent invocation of the kernel waits for the previous run of the kernel to complete.
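
The effect of this serialization on completion times can be modeled in a few lines. The sketch below is plain C++ with hypothetical names. It assumes that invocations run in submission order, that times are measured in seconds from zero, and it ignores launch overhead.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One submission of the kernel (hypothetical illustration type).
struct Invocation {
    double submit;    // time the host enqueued the invocation
    double duration;  // time the kernel runs once it has started
};

// Completion time of each invocation when at most one invocation of the
// kernel is active at a time and invocations run in submission order.
std::vector<double> completion_times(const std::vector<Invocation>& invocations) {
    std::vector<double> done;
    done.reserve(invocations.size());
    double previous_end = 0.0;
    for (const Invocation& inv : invocations) {
        assert(inv.duration >= 0.0);
        // An invocation starts only after it is submitted and the previous one ends.
        const double start = std::max(inv.submit, previous_end);
        previous_end = start + inv.duration;
        done.push_back(previous_end);
    }
    return done;
}
```

In this model, three invocations submitted together at time zero that each run for 2 s complete at 2 s, 4 s, and 6 s, so submitting the same kernel in a loop does not by itself make the invocations run in parallel.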
