Visible to Intel only — GUID: GUID-4377C142-6E9D-4A2B-9A92-FCEF4DD7C90D
Why is FPGA Compilation Different?
Types of SYCL* FPGA Compilation
FPGA Compilation Flags
Emulate and Debug Your Design
Evaluate Your Kernel Through Simulation
Device Selectors for FPGA
FPGA IP Authoring Flow
Fast Recompile for FPGA
Generate Multiple FPGA Images (Linux only)
FPGA BSPs and Boards
Targeting Multiple Homogeneous FPGA Devices
Targeting Multiple Platforms
FPGA-CPU Interaction
FPGA Performance Optimization
Use of RTL Libraries for FPGA
Use SYCL Shared Library With Third-Party Applications
FPGA Workflows in IDEs
Customize RTL Interfaces
Suggested Coding Styles
Lambda Coding Style Example
Functor Coding Style Example
Memory-Mapped Host Interfaces
Host Pipes
Agent IP Component Kernels
Streaming IP Component Kernels
Kernel Argument Interfaces
Pipelined Kernels
Stable Arguments
IP Component Reset Behavior
The printf Command
Intel oneAPI DPC++ Library (oneDPL)
Intel oneAPI Math Kernel Library (oneMKL)
Intel oneAPI Threading Building Blocks (oneTBB)
Intel oneAPI Data Analytics Library (oneDAL)
Intel oneAPI Collective Communications Library (oneCCL)
Intel oneAPI Deep Neural Network Library (oneDNN)
Intel oneAPI Video Processing Library (oneVPL)
Other Libraries
Visible to Intel only — GUID: GUID-4377C142-6E9D-4A2B-9A92-FCEF4DD7C90D
Suggested Coding Styles
For creating your IP, use one of the following recommended general coding styles:
Lambda Coding Style Example: The lambda coding style is typically used in most full-system SYCL programs.
Functor Coding Style Example: You can write your IP component (kernel) code out-of-line from the host code with the functor coding style.
Lambda Coding Style Example
#include <iostream>
#include <vector>
// oneAPI headers
#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>
using namespace sycl;
// Forward declare the kernel name in the global scope. This is an FPGA best
// practice that reduces name mangling in the optimization reports.
class VectorAddID;
void VectorAdd(const int *vec_a_in, const int *vec_b_in, int *vec_c_out,
int len) {
for (int idx = 0; idx < len; idx++) {
int a_val = vec_a_in[idx];
int b_val = vec_b_in[idx];
int sum = a_val + b_val;
vec_c_out[idx] = sum;
}
}
constexpr int kVectSize = 256;
int main() {
bool passed = true;
try {
// Use compile-time macros to select either:
// - the FPGA emulator device (CPU emulation of the FPGA)
// - the FPGA device (a real FPGA)
// - the simulator device
#if FPGA_SIMULATOR
auto selector = sycl::ext::intel::fpga_simulator_selector_v;
#elif FPGA_HARDWARE
auto selector = sycl::ext::intel::fpga_selector_v;
#else // #if FPGA_EMULATOR
auto selector = sycl::ext::intel::fpga_emulator_selector_v;
#endif
// create the device queue
sycl::queue q(selector);
// make sure the device supports USM host allocations
auto device = q.get_device();
std::cout << "Running on device: "
<< device.get_info<sycl::info::device::name>().c_str()
<< std::endl;
if (!device.has(sycl::aspect::usm_host_allocations)) {
std::terminate();
}
// declare arrays and fill them
// allocate in shared memory so the kernel can see them
int *vec_a = malloc_shared<int>(kVectSize, q);
int *vec_b = malloc_shared<int>(kVectSize, q);
int *vec_c = malloc_shared<int>(kVectSize, q);
for (int i = 0; i < kVectSize; i++) {
vec_a[i] = i;
vec_b[i] = (kVectSize - i);
}
std::cout << "add two vectors of size " << kVectSize << std::endl;
q.single_task<VectorAddID>([=]() {
VectorAdd(vec_a, vec_b, vec_c, kVectSize);
})
.wait();
// verify that vec_c is correct
for (int i = 0; i < kVectSize; i++) {
int expected = vec_a[i] + vec_b[i];
if (vec_c[i] != expected) {
std::cout << "idx=" << i << ": result " << vec_c[i] << ", expected ("
<< expected << ") A=" << vec_a[i] << " + B=" << vec_b[i]
<< std::endl;
passed = false;
}
}
std::cout << (passed ? "PASSED" : "FAILED") << std::endl;
free(vec_a, q);
free(vec_b, q);
free(vec_c, q);
} catch (sycl::exception const &e) {
// Catches exceptions in the host code.
std::cerr << "Caught a SYCL host exception:\n" << e.what() << "\n";
// Most likely the runtime couldn't find FPGA hardware!
if (e.code().value() == CL_DEVICE_NOT_FOUND) {
std::cerr << "If you are targeting an FPGA, please ensure that your "
"system has a correctly configured FPGA board.\n";
std::cerr << "Run sys_check in the oneAPI root directory to verify.\n";
std::cerr << "If you are targeting the FPGA emulator, compile with "
"-DFPGA_EMULATOR.\n";
}
std::terminate();
}
return passed ? EXIT_SUCCESS : EXIT_FAILURE;
}
Functor Coding Style Example
With this style, you can specify all the interfaces in one location and make a call to your IP component from your SYCL* host program.
#include <iostream>
// oneAPI headers
#include <sycl/ext/intel/fpga_extensions.hpp>
#include <sycl/sycl.hpp>
// Forward declare the kernel name in the global scope. This is an FPGA best
// practice that reduces name mangling in the optimization reports.
class VectorAddID;
struct VectorAdd {
int *const vec_a_in;
int *const vec_b_in;
int *const vec_c_out;
int len;
void operator()() const {
for (int idx = 0; idx < len; idx++) {
int a_val = vec_a_in[idx];
int b_val = vec_b_in[idx];
int sum = a_val + b_val;
vec_c_out[idx] = sum;
}
}
};
constexpr int kVectSize = 256;
int main() {
bool passed = true;
try {
// Use compile-time macros to select either:
// - the FPGA emulator device (CPU emulation of the FPGA)
// - the FPGA device (a real FPGA)
// - the simulator device
#if FPGA_SIMULATOR
auto selector = sycl::ext::intel::fpga_simulator_selector_v;
#elif FPGA_HARDWARE
auto selector = sycl::ext::intel::fpga_selector_v;
#else // #if FPGA_EMULATOR
auto selector = sycl::ext::intel::fpga_emulator_selector_v;
#endif
// create the device queue
sycl::queue q(selector);
// make sure the device supports USM host allocations
auto device = q.get_device();
std::cout << "Running on device: "
<< device.get_info<sycl::info::device::name>().c_str()
<< std::endl;
if (!device.has(sycl::aspect::usm_host_allocations)) {
std::terminate();
}
// declare arrays and fill them
// allocate in shared memory so the kernel can see them
int *vec_a = sycl::malloc_shared<int>(kVectSize, q);
int *vec_b = sycl::malloc_shared<int>(kVectSize, q);
int *vec_c = sycl::malloc_shared<int>(kVectSize, q);
for (int i = 0; i < kVectSize; i++) {
vec_a[i] = i;
vec_b[i] = (kVectSize - i);
}
std::cout << "add two vectors of size " << kVectSize << std::endl;
q.single_task<VectorAddID>(VectorAdd{vec_a, vec_b, vec_c, kVectSize})
.wait();
// verify that vec_c is correct
for (int i = 0; i < kVectSize; i++) {
int expected = vec_a[i] + vec_b[i];
if (vec_c[i] != expected) {
std::cout << "idx=" << i << ": result " << vec_c[i] << ", expected ("
<< expected << ") A=" << vec_a[i] << " + B=" << vec_b[i]
<< std::endl;
passed = false;
}
}
std::cout << (passed ? "PASSED" : "FAILED") << std::endl;
sycl::free(vec_a, q);
sycl::free(vec_b, q);
sycl::free(vec_c, q);
} catch (sycl::exception const &e) {
// Catches exceptions in the host code.
std::cerr << "Caught a SYCL host exception:\n" << e.what() << "\n";
// Most likely the runtime couldn't find FPGA hardware!
if (e.code().value() == CL_DEVICE_NOT_FOUND) {
std::cerr << "If you are targeting an FPGA, please ensure that your "
"system has a correctly configured FPGA board.\n";
std::cerr << "Run sys_check in the oneAPI root directory to verify.\n";
std::cerr << "If you are targeting the FPGA emulator, compile with "
"-DFPGA_EMULATOR.\n";
}
std::terminate();
}
return passed ? EXIT_SUCCESS : EXIT_FAILURE;
}