Visible to Intel only — GUID: GUID-04759B2C-C1ED-49F9-AFBB-CEB273C07B06
Split Kernel into Multiple FPGA Images (Linux only)
Use this feature of the Intel® oneAPI DPC++/C++ Compiler when you want to split your FPGA compilation into different FPGA images. This feature is particularly useful when your design does not fit on a single FPGA. You can use it to split your very large design into multiple smaller images, which you can use to partially reconfigure your FPGA device.
You can split your design using one of the following approaches, each giving you different benefits:
- Dynamic Linking Flow
- Dynamic Loading Flow
Of the two flows, dynamic linking is easier to implement. However, it can require more host memory because all of the device images must be loaded into memory. Dynamic loading addresses this limitation but requires some extra source-level changes. The following table compares the two flows:
| | Dynamic Linking | Dynamic Loading |
|---|---|---|
| Can dynamically change FPGA image at runtime? | Yes | Yes |
| Defining the type and number of FPGA images | At compile time | At runtime |
| Host-program memory footprint | All FPGA images are stored in memory at runtime. | Only explicitly loaded FPGA images are stored in memory. |
| Calling host code | Call function in the dynamic library directly. | Explicitly load the dynamic library and functions to call. |
Dynamic Linking Flow
This flow allows you to split your design into different source files and map each file to a separate FPGA image. Intel® recommends this flow for designs with a small number of FPGA images.
To use this flow, perform the following steps:
- Split your source code such that for each FPGA image you want, you create a separate .cpp file that submits various kernels. Separate the host code into one or more .cpp files that can then interface with functions in the kernel files.
Consider that you now have the following three files:
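- main.cpp containing your host code. For example:

      // main.cpp
      #include <sycl/sycl.hpp>
      using namespace sycl;

      // Defined in vector_add.cpp and vector_mul.cpp
      extern "C" void add(queue queueA);
      extern "C" void mul(queue queueA);

      int main() {
        queue queueA;
        add(queueA);
        mul(queueA);
      }
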
- vector_add.cpp containing a function that submits the vector_add kernel. For example:

      // vector_add.cpp
      #include <sycl/sycl.hpp>
      using namespace sycl;

      extern "C" {
      void add(queue queueA) {
        queueA.submit(
            // Kernel Code
        );
      }
      }

- vector_mul.cpp containing a function that submits the vector_mul kernel. For example:

      // vector_mul.cpp
      #include <sycl/sycl.hpp>
      using namespace sycl;

      extern "C" {
      void mul(queue queueA) {
        queueA.submit(
            // Kernel Code
        );
      }
      }

- Compile the source files using the following commands:

      icpx -fsycl -fPIC -fintelfpga -c vector_add.cpp -o vector_add.o
      icpx -fsycl -fPIC -fintelfpga -c vector_mul.cpp -o vector_mul.o

      # FPGA image compiles take a long time to complete
      icpx -fsycl -fPIC -shared -fintelfpga vector_add.o -o vector_add.so \
        -Xshardware -Xstarget=<bsp:board_variant>
      icpx -fsycl -fPIC -shared -fintelfpga vector_mul.o -o vector_mul.so \
        -Xshardware -Xstarget=<bsp:board_variant>

      # Final link step
      icpx -fsycl -o main.exe main.cpp vector_add.so vector_mul.so

With this flow, the long FPGA compile steps are split into separate commands that you can potentially run on different systems or only when you change the files.
Dynamic Loading Flow
Use this flow to avoid loading all of the different FPGA images into memory at once. Like the dynamic linking flow, this flow requires you to split your code. However, in this flow, you must load the .so (shared object) files explicitly in the host program. The advantage is that you can load large FPGA image files dynamically as necessary instead of linking all image files at compile time.
To use this flow, perform the following steps:
- Split your source code in the same manner as done in step 1 of the dynamic linking flow.
- Modify the main.cpp file to appear as follows:

      // main.cpp
      #include <dlfcn.h>
      #include <sycl/sycl.hpp>
      using namespace sycl;

      int main() {
        queue queueA;
        // Set runAdd and runMul dynamically at runtime
        bool runAdd = false, runMul = false;

        if (runAdd) {
          // Load the FPGA image library and look up its entry point
          auto add_lib = dlopen("./vector_add.so", RTLD_NOW);
          auto add = (void (*)(queue))dlsym(add_lib, "add");
          add(queueA);
        }
        if (runMul) {
          auto mul_lib = dlopen("./vector_mul.so", RTLD_NOW);
          auto mul = (void (*)(queue))dlsym(mul_lib, "mul");
          mul(queueA);
        }
      }

- Compile the source files using the following commands:

  NOTE: You do not have to link the .so files at compile time because they are loaded dynamically at runtime.

      icpx -fsycl -fPIC -fintelfpga -c vector_add.cpp -o vector_add.o
      icpx -fsycl -fPIC -fintelfpga -c vector_mul.cpp -o vector_mul.o

      # FPGA image compiles take a long time to complete
      icpx -fsycl -fPIC -shared -fintelfpga vector_add.o -o vector_add.so \
        -Xshardware -Xstarget=<bsp:board_variant>
      icpx -fsycl -fPIC -shared -fintelfpga vector_mul.o -o vector_mul.so \
        -Xshardware -Xstarget=<bsp:board_variant>

      icpx -fsycl -o main.exe main.cpp

      # Before running the design, add the path containing the .so files to LD_LIBRARY_PATH
      # e.g., export LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH

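The main.cpp listing in step 2 uses the results of dlopen and dlsym without checking them, so a missing .so file or a misspelled function name would surface as a crash. The following is a minimal sketch of checked wrappers around the POSIX dlopen, dlsym, and dlerror calls. The helper names open_library and load_symbol are hypothetical and are not part of the compiler or any Intel library.

```cpp
#include <dlfcn.h>

#include <stdexcept>
#include <string>

// Hypothetical helper: opens a shared library and throws if the loader fails.
// Passing nullptr as the path returns a handle to the running program itself.
inline void* open_library(const char* path) {
  void* handle = dlopen(path, RTLD_NOW);
  if (handle == nullptr) {
    const char* err = dlerror();
    throw std::runtime_error(std::string("dlopen failed: ") +
                             (err != nullptr ? err : "unknown error"));
  }
  return handle;
}

// Hypothetical helper: looks up a symbol and casts it to the function
// pointer type Fn. Throws if the symbol cannot be resolved.
template <typename Fn>
Fn load_symbol(void* handle, const char* name) {
  dlerror();  // Clear any stale error state before the lookup.
  void* symbol = dlsym(handle, name);
  const char* err = dlerror();
  if (err != nullptr || symbol == nullptr) {
    throw std::runtime_error(std::string("dlsym failed for ") + name + ": " +
                             (err != nullptr ? err : "symbol is null"));
  }
  return reinterpret_cast<Fn>(symbol);
}
```

Under these assumptions, the add lookup in main.cpp could be written as `auto add = load_symbol<void (*)(queue)>(open_library("./vector_add.so"), "add");`, with a try/catch around the call to report which image failed to load.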
With this approach, you can load an arbitrary number of .so files at runtime. This is useful when you have a large library of FPGA images and want to select only a subset of them.
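One way to organize such a library of images is a small host-side catalog that maps an operation name to the .so file and the entry function to look up. The sketch below is illustrative only: ImageEntry, select_images, and the catalog layout are hypothetical names chosen for this example, and the selection step is plain C++ that does not load anything by itself.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical catalog entry: where an FPGA image library lives and which
// extern "C" function in it submits the kernel.
struct ImageEntry {
  std::string library_path;  // for example "./vector_add.so"
  std::string entry_point;   // for example "add"
};

// Returns the catalog entries whose names were requested, in the catalog's
// key order. Requested names that are not in the catalog are skipped.
inline std::vector<ImageEntry> select_images(
    const std::map<std::string, ImageEntry>& catalog,
    const std::set<std::string>& requested) {
  std::vector<ImageEntry> selected;
  for (const auto& item : catalog) {
    if (requested.count(item.first) != 0) {
      selected.push_back(item.second);
    }
  }
  return selected;
}
```

Each selected entry can then be passed to dlopen and dlsym as in the main.cpp listing, so only the images that a given run needs are loaded into host memory.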