Split Kernel into Multiple FPGA Images (Linux only)

Developer Guide

Intel® oneAPI DPC++/C++ Compiler Handbook for FPGAs

ID 785441

Date 10/24/2024


Use this feature of the Intel® oneAPI DPC++/C++ Compiler when you want to split your FPGA compilation into different FPGA images. This feature is particularly useful when your design does not fit on a single FPGA: you can split a very large design into multiple smaller images and use them to partially reconfigure your FPGA device.

You can split your design using one of the following approaches, each with different benefits:

- Dynamic Linking Flow
- Dynamic Loading Flow

Of the two flows, dynamic linking is easier to implement than dynamic loading. However, dynamic linking can require more host memory because all of the FPGA images must be loaded into memory. Dynamic loading addresses this limitation but requires some extra source-level changes. The following table highlights the differences between the two flows:

Dynamic Linking vs. Dynamic Loading Flow

| Feature | Dynamic Linking | Dynamic Loading |
|---|---|---|
| Can dynamically change FPGA image at runtime? | Yes | Yes |
| Defining the type and number of FPGA images | At compile time | At runtime |
| Host-program memory footprint | All FPGA images are stored in memory at runtime. | Only explicitly loaded FPGA images are stored in memory. |
| Calling host code | Call functions in the dynamic library directly. | Explicitly load the dynamic library and the functions to call. |

Dynamic Linking Flow

This flow lets you split your design into separate source files and map each file to a separate FPGA image. Intel® recommends this flow for designs with a small number of FPGA images.

To use this flow, perform the following steps:

1. Split your source code so that, for each FPGA image you want, you create a separate .cpp file that submits the kernels for that image. Separate the host code into one or more .cpp files that call the functions in the kernel files.
For example, suppose you now have the following three files:
- main.cpp containing your host code. For example:
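```
// main.cpp
#include <sycl/sycl.hpp>

// Defined in vector_add.cpp and vector_mul.cpp (built into vector_add.so and vector_mul.so)
extern "C" void add(sycl::queue queueA);
extern "C" void mul(sycl::queue queueA);

int main() {
  sycl::queue queueA;
  add(queueA);
  mul(queueA);
}
```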
- vector_add.cpp containing a function that submits the vector_add kernel. For example:
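```
// vector_add.cpp
#include <sycl/sycl.hpp>

class VectorAdd;  // Kernel name, forward-declared at namespace scope
extern "C" {
  // Submits the vector_add kernel to the queue passed in by the host code
  void add(sycl::queue queueA) {
    queueA.submit([&](sycl::handler &h) {
      h.single_task<VectorAdd>([=]() {
        // Kernel code
      });
    });
    queueA.wait();  // Block until the kernel finishes
  }
}
```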
- vector_mul.cpp containing a function that submits the vector_mul kernel. For example:
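```
// vector_mul.cpp
#include <sycl/sycl.hpp>

class VectorMul;  // Kernel name, forward-declared at namespace scope
extern "C" {
  // Submits the vector_mul kernel to the queue passed in by the host code
  void mul(sycl::queue queueA) {
    queueA.submit([&](sycl::handler &h) {
      h.single_task<VectorMul>([=]() {
        // Kernel code
      });
    });
    queueA.wait();  // Block until the kernel finishes
  }
}
```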

2. Compile the source files using the following commands:

```
icpx -fPIC -fintelfpga -c vector_add.cpp -o vector_add.o
icpx -fPIC -fintelfpga -c vector_mul.cpp -o vector_mul.o

# FPGA image compiles take a long time to complete
icpx -fPIC -shared -fintelfpga vector_add.o -o vector_add.so \
     -Xshardware -Xstarget=<bsp:board_variant>
icpx -fPIC -shared -fintelfpga vector_mul.o -o vector_mul.so \
     -Xshardware -Xstarget=<bsp:board_variant>

# Final link step; main.cpp contains SYCL code, so it needs -fsycl
icpx -fsycl -o main.exe main.cpp vector_add.so vector_mul.so

# At runtime, the directory containing the .so files must be on the library
# search path, for example: export LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH
```

With this flow, each long FPGA compile runs as a separate command, so you can run the compiles on different systems, or rerun a compile only when its source file changes.

Dynamic Loading Flow

Use this flow to avoid loading all of the FPGA images into memory at once. As with the dynamic linking flow, you must split your code. However, in this flow, the host program must explicitly load the .so (shared object) files at runtime. The advantage is that you can load large FPGA images only as they are needed instead of linking all of them at compile time.
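This flow relies on the POSIX dlopen() and dlsym() functions. As a minimal sketch, a small helper such as the hypothetical LazyLibraryLoader class below (not part of the oneAPI toolkit) could open each shared library only the first time one of its functions is requested, reuse that handle afterward, and close every library with dlclose() when the helper goes out of scope. It works with plain function pointers, so it does not depend on SYCL; it assumes Linux with glibc (link with -ldl on glibc versions before 2.34).

```cpp
// Hypothetical helper, not part of the oneAPI toolkit
#include <dlfcn.h>

#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Opens each shared library on first use and closes all of them on destruction.
class LazyLibraryLoader {
 public:
  LazyLibraryLoader() = default;
  LazyLibraryLoader(const LazyLibraryLoader &) = delete;
  LazyLibraryLoader &operator=(const LazyLibraryLoader &) = delete;

  ~LazyLibraryLoader() {
    for (auto &entry : handles_) dlclose(entry.second);
  }

  // Returns `symbol` from `libPath` as type Fn, loading the library if needed.
  // Throws std::runtime_error if the library or the symbol cannot be found.
  template <typename Fn>
  Fn get(const std::string &libPath, const std::string &symbol) {
    void *handle = openLibrary(libPath);
    dlerror();  // Clear any stale error state before calling dlsym
    void *sym = dlsym(handle, symbol.c_str());
    if (const char *err = dlerror()) throw std::runtime_error(err);
    return reinterpret_cast<Fn>(sym);
  }

  // Number of distinct libraries currently held open by this loader.
  std::size_t loadedCount() const { return handles_.size(); }

 private:
  void *openLibrary(const std::string &libPath) {
    auto it = handles_.find(libPath);
    if (it != handles_.end()) return it->second;  // Already loaded: reuse handle
    void *handle = dlopen(libPath.c_str(), RTLD_NOW);
    if (!handle) throw std::runtime_error(dlerror());
    handles_.emplace(libPath, handle);
    return handle;
  }

  std::map<std::string, void *> handles_;  // Library path -> dlopen handle
};
```

In main.cpp, a call could then look like loader.get<void (*)(sycl::queue)>("./vector_add.so", "add")(queueA);. Because the destructor unloads the libraries, it is probably safest to make sure all submitted kernels have finished (for example, with queueA.wait()) before the loader is destroyed.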

To use this flow, perform the following steps:

1. Split your source code in the same way as in step 1 of the dynamic linking flow.

2. Modify main.cpp so that it loads the shared libraries at runtime, as follows:


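```
// main.cpp
#include <dlfcn.h>

#include <cstdio>
#include <sycl/sycl.hpp>

// Signature of the extern "C" functions exported by vector_add.so and vector_mul.so
using KernelFunc = void (*)(sycl::queue);

int main() {
  sycl::queue queueA;
  // Placeholder values: set runAdd and runMul at runtime (for example, from user input)
  bool runAdd = true, runMul = true;
  if (runAdd) {
    void *add_lib = dlopen("./vector_add.so", RTLD_NOW);
    if (!add_lib) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }
    auto add = reinterpret_cast<KernelFunc>(dlsym(add_lib, "add"));
    if (!add) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }
    add(queueA);
  }
  if (runMul) {
    void *mul_lib = dlopen("./vector_mul.so", RTLD_NOW);
    if (!mul_lib) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }
    auto mul = reinterpret_cast<KernelFunc>(dlsym(mul_lib, "mul"));
    if (!mul) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }
    mul(queueA);
  }
}
```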

3. Compile the source files using the following commands:

NOTE: You do not have to link the .so files at compile time because the host program loads them at runtime.

```
icpx -fPIC -fintelfpga -c vector_add.cpp -o vector_add.o
icpx -fPIC -fintelfpga -c vector_mul.cpp -o vector_mul.o

# FPGA image compiles take a long time to complete
icpx -fPIC -shared -fintelfpga vector_add.o -o vector_add.so \
     -Xshardware -Xstarget=<bsp:board_variant>
icpx -fPIC -shared -fintelfpga vector_mul.o -o vector_mul.so \
     -Xshardware -Xstarget=<bsp:board_variant>

# Only main.cpp is linked; -ldl is required on glibc versions before 2.34
icpx -fsycl -o main.exe main.cpp -ldl

# Before running: main.cpp passes relative paths (./vector_add.so) to dlopen(),
# so run main.exe from the directory that contains the .so files. If you pass
# bare file names instead, add their directory to LD_LIBRARY_PATH, for example:
# export LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH
```

With this approach, you can load any number of .so files at runtime and decide which ones while the program runs. This is useful when you have a large library of FPGA images and want to use only a subset of them.
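One possible way to pick that subset at runtime, assuming all FPGA image libraries live in one directory and each .so file is named after its kernel (a naming convention this guide does not require), is to scan the directory with std::filesystem (C++17) and keep only the libraries the current run needs. The hypothetical selectImages function below sketches this; each returned path could then be passed to dlopen() as in main.cpp.

```cpp
// Hypothetical helper, not part of the oneAPI toolkit
#include <algorithm>
#include <filesystem>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Returns, in sorted order, the .so files in `libraryDir` whose stem (the file
// name without its extension) is listed in `wanted`; for example, "vector_add"
// selects vector_add.so. All other files in the directory are ignored.
std::vector<fs::path> selectImages(const fs::path &libraryDir,
                                   const std::set<std::string> &wanted) {
  std::vector<fs::path> selected;
  for (const fs::directory_entry &entry : fs::directory_iterator(libraryDir)) {
    const fs::path &p = entry.path();
    if (entry.is_regular_file() && p.extension() == ".so" &&
        wanted.count(p.stem().string()) != 0) {
      selected.push_back(p);
    }
  }
  // directory_iterator order is unspecified, so sort for a predictable result
  std::sort(selected.begin(), selected.end());
  return selected;
}
```

Keeping the selection separate from loading means the program only pays the memory cost of dlopen() for the images it actually selected.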
