FPGA Support Package for the Intel® oneAPI DPC++/C++ Compiler Release Notes

Where To Find the FPGA Support Package

The FPGA Support Package for the Intel oneAPI DPC++/C++ Compiler requires the Intel® oneAPI DPC++/C++ Compiler as provided by the Intel® oneAPI Base Toolkit (Base Kit). Visit the FPGA Support Package for Intel® oneAPI DPC++/C++ Compiler website to download the toolkit and the FPGA support package.

Both the base toolkit and the FPGA support package are required for FPGA design flows.

Supported Hardware and Operating System

See Intel® oneAPI DPC++/C++ Compiler System Requirements.

FPGA Support Package Release Notes

2025.0 Release Notes

2025.0 New and Changed Features

Removed support for the Quartus® Prime Pro Edition versions 21.2 to 21.4.
Added support for the following Quartus® Prime software versions and operating systems:
- Quartus® Prime Standard Edition version 23.1
- Quartus® Prime Pro Edition versions 22.3, 22.4, 23.1, 23.2, 23.3, 23.4, 24.1, and 24.2
- Windows* 10 and 11
- Ubuntu* 22.04
- RHEL* 9.1, 9, and 8.6
- SUSE* 15 SP5

2025.0 Bug Fixes

The -Xsallow-wide-device-globals compiler option is now supported in emulation.
Fixed the issue where Microsoft* Windows* systems using the %ld format specifier with the printf function would show incorrect results.
Fixed the issue where compiling with the -Xsfast-compile compiler option would fail when targeting the BSP for the Intel® FPGA SmartNIC N6000-PL Platform (formerly code-named Arrow Creek).
Fixed the issue where the Quartus® Summary section of the FPGA Optimization Report was not getting populated when compiling a N6000-PL Platform FPGA hardware image.
Fixed the issue where the host program would crash at runtime if the design used ac_int variables larger than 256 bits.
Fixed the issue where applying the [[intel::fpga_register]] attribute to a variable would cause the compiler to crash with an error message.
Fixed the issue where freeing a USM pointer allocated but not used as a kernel argument would result in a segmentation fault error.
Fixed the issue where the compiler JIT engine would issue a warning message in the FPGA acceleration flow if you used host pipes.
Fixed the issue where the compiler would issue an error even when the pipe is specified with a protocol that includes a ready signal.
Fixed the issue where calling ext::oneapi::experimental::printf with a float, char or short value would not print the correct value.
Fixed the issue where programs using the fpga_datapath template would crash.
Fixed the issue where a design using macros defined in #include <sycl/ext/intel/prototype/interfaces.hpp> and having a device_global variable with the [[intel::fpga_register]] attribute applied would crash and the compiler would issue an error message.
Fixed the issue where the compiler would issue an error message when applying memory attributes, such as the [[intel::fpga_register]] attribute, to member variables of structs.
Fixed the issue in the FPGA Optimization Report where designs with multiple lambda kernels would report inaccurate results unless the lambda kernels were all given unique names.
Fixed the issue where you would encounter functional failures in the FPGA emulation flow when resetting a device_global and loading a new device_image without the device_image scope property.

2025.0 Known Issues and Limitations

When multiple oneAPI versions are installed on Windows, the following assertion failure may happen when running oneAPI FPGA simulation:
```
HAL : Getting info version: 2024.2
  Runtime version: 20.3
  MMD version:     2024.2
Assertion failed: 0 && "MMD version mismatch", file <path>\opencl-fpga-runtime\src\acl_hal_mmd.cpp, line 1435
```
As a workaround, use the oneAPI installer to uninstall all the undesired oneAPI versions from your system until only a single oneAPI installation remains with the target version.

When simulating designs that use host pipes, a segmentation fault with a stack trace similar to the following might occur on rare occasions. As a workaround, rerun the application.

```
std::atomic<bool>::operator bool (this=0x5f5) at /nfs/site/disks/psg_ctools_1/gcc/7.5.0/linux64/rhel8/include/c++/7.5.0/atomic:86
0x00007ffff2976d77 in ACL_MSIM_DEVICE::do_check_for_writes (this=0x4, ch=0x7ffff29fe340 <(anonymous namespace)::s_handle_map>) at src/acl_msim_device.cpp:835
0x00007ffff2976f18 in ACL_MSIM_DEVICE::check_for_writes (arg=0x7ffee4050a00) at src/acl_msim_device.cpp:864
```

The compiler produces the following generic error in many cases, for example, when you are missing device support files.
Error message:

```
...
aoc: Compiling for Simulator.
Error: Simulation system generation FAILED.
Refer to <...>/logs/<ip_name>.log for details
llvm-foreach:
icpx: error: fpga compiler failed with exit code 1 (use -v to
```

As a workaround for the missing device support files, install the appropriate device support files from the FPGA Software Download Center. You can determine the missing device support files by inspecting the log file mentioned in the error message for the following message:

```
Info: qsys-generate /tmp/fpga_template-96985d-fda046/ip/mpsim/done_cfan.ip
--simulation=VERILOG --allow-mixed-language-simulation
--output-directory=/tmp/fpga_template-96985d-fda046/ip/mpsim/done_cfan
--family=Agilex --part=Unknown
Error: done_cfan: deviceFamily "Agilex" is out of range: "None", "Unknown"
Error: qsys-generate failed with exit code 3: 1 Error, 0 Warnings
```

For an FPGA SYCL HLS program compiled for an Arria 10, Stratix 10, or Agilex 7 FPGA, simulation or hardware runs might not update host memory before the kernel completes. The memory is written, but the host program might check the memory before the data arrives.
In the following sample subroutine, taken from a complete program, the program can print Failed: 15 if the read from *Result occurs before the write reaches the memory:
```
// Sample invocation

void runKernel(queue &q) {
  int *IntegerVar = malloc_shared<int>(5,q);
  for (int i = 0; i < 5; ++i)
    IntegerVar[i] = i+1;

  int *Result = malloc_shared<int>(1,q);
  {
    q.single_task<test_k1_test>([=]() {
        int tmp = 0;
        for (int i = 0; i < 5; ++i) {
          tmp += IntegerVar[i];
        }
        *Result = tmp;
      }).wait_and_throw();
  }
  if (*Result != 15) {
    std::cout << "Failed: " << *Result << std::endl;
  }
}
```
As a workaround, do one of the following:
- On Arria 10 boards, add a sleep(5); after the call to wait_and_throw();. The number of seconds to sleep can be less than 5.
- On Stratix 10 or Agilex 7 FPGA boards, add the -Xshyper-optimized-handshaking=off option to the icpx command line or add a sleep(5); function after the call to wait_and_throw();. The number of seconds to sleep can be less than 5.

If you target the FPGA SYCL HLS flow, your code contains a task sequence that has an infinite loop, and the design has no host or I/O pipes, the compiler issues the following error. As a workaround, add a dummy host pipe.
```
Compiler Error: Board 'custom_ipa' does not contain host pipes so pipe 'return.xxxx' must be accessed from both endpoints, it is currently only read by Kernel 'xxxx'
```
On Red Hat Enterprise Linux* systems, you must install the libnsl library before you run the compiler. To install the libnsl library, run the sudo yum install libnsl command.
When targeting the BSP for the Intel® FPGA SmartNIC N6000-PL Platform (formerly code-named Arrow Creek), the compiler does not automatically detect and warn about timing violations occurring when compiling with BSPs or custom platforms based on the Open FPGA Stack. If you target one of these platforms, validate that your timing passed before executing the compiled design on hardware. When targeting the N6000-PL Platform BSP, you can validate your timing by reviewing the clock report in the fim_platform/build/syn/board/n6001/syn_top/output_files/timing_report subdirectory of your compiler output directory (that is, your .prj folder).
An FPGA hardware compilation that targets the default Stratix® 10 device (-Xstarget=Stratix10) might fail with the following error message:
```
Error (22730): RAM Primitive "foo_di_inst|DotProductIP_std_ic_inst|DotProductIP_inst_0|kernel|theDotProductIP_function|thebb_DotProductIP_B1|<other names>|ram_block2a63" parameter operation_mode value QUAD_PORT is no longer supported for the target device. File: <filename>/altera_syncram_impl_klrp.tdf Line: 37
```
You can use one of the following workarounds for this issue:
- Specify a specific Stratix® 10 OPN with the -Xsdevice= compiler command option. Do not use the 1SG280LU3F50I2VG OPN.
- If you must use the 1SG280LU3F50I2VG Stratix® 10 OPN, run the following commands when compiling:
```
echo "skip_nd_sqp_power_temp_check=on" > /tmp/file.ini
icpx -Xsadd-ini=/tmp/file.ini <other compiler command options>
```
- Use the Quartus Prime Pro Edition version 24.3 or later.
Writing an argument to a CRA interface one cycle after the start signal is sent incorrectly causes the kernel to use the argument values written after the start signal.
If your code contains an assert macro that contains error message text, you might experience a compiler crash that includes an Error: Optimizer FAILED message as part of the crash message. As a workaround, avoid using text strings in your assert macros.
If your code reads a host-access device_global variable but never uses the result, your program will crash with an Error: Verilog generator FAILED error message as part of the crash message. To prevent this error, ensure that your code uses the result of the host-access device_global variable read elsewhere in the program.
The IO pipe classes sycl::ext::intel::kernel_readable_io_pipe and sycl::ext::intel::kernel_writable_io_pipe are not compatible with the pipe properties defined in the sycl::ext::intel::experimental namespace. This is planned to be addressed in a future release.

The SYCL* ext::oneapi::experimental::printf function is subject to the following limitations:

Output might be reordered in the Windows emulator. If you see a different order of output when printing to the console and redirecting output to a file, recompile your program with the -O0 compiler option.
For example, the following code generates different output order:


```
#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

#ifdef __SYCL_DEVICE_ONLY__
#define CL_CONSTANT __attribute__((opencl_constant))
#else
#define CL_CONSTANT
#endif

using namespace sycl;
#define PRINTF(format, ...)                                    \
  {                                                            \
    static const CL_CONSTANT char _format[] = format;          \
    ext::oneapi::experimental::printf(_format, ##__VA_ARGS__); \
  }

class BasicKernel;

int main(int argc, char* argv[]) {
  queue q;
  q.submit([&](handler& h) {
     h.single_task<BasicKernel>([=]() {
       PRINTF("Result1: Hello, World!\n");
       PRINTF("Result2: %%\n");
     });
   }).wait();
  return 0;
}
```

On Windows, this program prints the following output to the console:
```
Result1: Hello, World!
Result2: %
```
If you redirect the output to a file, the program produces the following results:
```
Result2: %
Result1: Hello, World!
```

The FPGA runtime can hang when multiple invocations of the same kernels are enqueued with explicit event dependences between them. As a workaround, remove the explicit event dependences. This workaround is safe for FPGA devices, but generally is not safe for CPU/GPU compiler targets.
On Windows, if you link a static library (.a file) that contains your main function and was produced with the -fsycl-link=image flag, you might see linker errors such as the following:
```
error LNK2001: unresolved external symbol __start_omp_offloading_entries
error LNK2001: unresolved external symbol __stop_omp_offloading_entries
```
```
error LNK1561: entry point must be defined
```
As a workaround, split your source code across multiple files so that the translation unit you compile with -fsycl-link=image does not contain your main function.

Designs that access internal memory in a series of nested loops might experience inefficient memory accesses (that is, stallable loads and stores) regardless of the memory attributes specified to set the memory bank configuration. A potential workaround for this is to transpose the memory system so that the lowest dimension is accessed in parallel.
The following example experiences inefficient memory accesses:


```
[[intel::fpga_memory("BLOCK_RAM")]]  // memory
unsigned int line_buffer[8][COLS];
...
for (int num_col = 0; num_col < COLS; num_col++) {
    fpga_tools::UnrolledLoop<0, 4>([&](auto l) {  // loop
        line_buffer[l][num_col] = line_buffer[l + 1][num_col];
    });
    line_buffer[4][num_col] = pixel_a_traiter;
    fpga_tools::UnrolledLoop<0, 5>([&](auto li) {
        fpga_tools::UnrolledLoop<0, 4>([&](auto co) {  // loop
            fenetre[li][4] = line_buffer[li][num_col];
        });
    });
}
```

Implementing the workaround changes the code into the following example:


```
[[intel::fpga_memory("BLOCK_RAM")]]  // memory
unsigned int line_buffer[COLS][8];   // transposed: the unrolled index is now the lowest dimension
...
for (int num_col = 0; num_col < COLS; num_col++) {
    fpga_tools::UnrolledLoop<0, 4>([&](auto l) {  // loop
        line_buffer[num_col][l] = line_buffer[num_col][l + 1];
    });
    line_buffer[num_col][4] = pixel_a_traiter;
    fpga_tools::UnrolledLoop<0, 5>([&](auto li) {
        fpga_tools::UnrolledLoop<0, 4>([&](auto co) {  // loop
            fenetre[li][4] = line_buffer[num_col][li];
        });
    });
}
```

For FPGA devices, you might run into performance issues when using switch statements instead of if statements. If the cases of the switch statement access external memories at different buffer_locations then the compiler might not be able to resolve the address space to the loads and stores, which results in the creation of extra loads and stores to dynamically resolve the address space at run time. If you encounter these issues, use if statements instead of switch statements.
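As a rough illustration of the rewrite suggested above, the following host-only sketch expresses the same selection logic once with a switch statement and once with an if/else chain. The names LoadWithSwitch, LoadWithIf, and SwitchAndIfAgree are hypothetical, and the three pointer parameters are only stand-ins for pointers into external memories at different buffer locations. The sketch uses no SYCL API and only shows that the two forms are functionally equivalent, not how the compiler treats them.

```cpp
#include <cassert>

// Hypothetical stand-in: mem0, mem1, and mem2 represent pointers into
// external memories at different buffer locations (an assumption made for
// illustration; no SYCL annotations are applied here).
inline int LoadWithSwitch(int bank, const int* mem0, const int* mem1,
                          const int* mem2, int index) {
  int value = 0;
  switch (bank) {
    case 0:
      value = mem0[index];
      break;
    case 1:
      value = mem1[index];
      break;
    default:
      value = mem2[index];
      break;
  }
  return value;
}

// The same selection written with if statements, as the note recommends.
inline int LoadWithIf(int bank, const int* mem0, const int* mem1,
                      const int* mem2, int index) {
  int value = 0;
  if (bank == 0) {
    value = mem0[index];
  } else if (bank == 1) {
    value = mem1[index];
  } else {
    value = mem2[index];
  }
  return value;
}

// Checks that both forms return the same value for every bank and index.
inline bool SwitchAndIfAgree() {
  const int a[2] = {10, 11};
  const int b[2] = {20, 21};
  const int c[2] = {30, 31};
  for (int bank = 0; bank < 3; ++bank) {
    for (int index = 0; index < 2; ++index) {
      if (LoadWithSwitch(bank, a, b, c, index) !=
          LoadWithIf(bank, a, b, c, index)) {
        return false;
      }
    }
  }
  return true;
}
```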
The atomic_ref class is not supported for FPGA devices.
In the FPGA SYCL* HLS flow, the compiler might generate a wider than requested address bus for the Avalon MM Host interfaces when the ring interconnect is used to connect the LSUs. You can ignore the extra MSBs on the bus by leaving them unconnected.
Designs with host pipe reads and writes in an unrolled loop cause a compiler error message that contains text similar to the following text:
```
…pipe 'acl_c_MyID_pipe_channel' must be accessed from both endpoints…
```
If you receive this error message, unroll the loop manually to resolve this error.
When compiling for emulation, you might not receive an error message for this issue.
When you use the -fsycl-device-code-split=per_kernel compiler command option for a design that launches and collects multiple kernels, the first kernel that is returned provides correct results. However, subsequent kernels may intermittently return incorrect results.
For FPGA pipelined kernels in simulation, the reported II may not reflect the lowest II achievable by the hardware because the runtime cannot feed data to the simulator fast enough. One possible workaround, which allows lower II to be achieved, is to use pipelined kernels with streaming arguments only. If wall clock time is not a restriction, using the -Xsghdl=0 compiler command option should slow down the simulator sufficiently for the runtime to feed it data at the lowest achievable II.
Converting an ap_float number to an ac_fixed data type in SYCL device code in the form of ApFloatT x = (AcFixedT) y; may produce incorrect results in the FPGA emulation flow. This type of conversion works correctly in FPGA simulation and hardware compilation flows.
A DPC++ program that runs kernels on one or more FPGA devices does not support multithreaded execution. This lack of support can be particularly problematic when you create host code to test a streaming kernel (that is, a kernel that continually reads input from a pipe, does some computation, and writes output to another pipe).
The typical way to express a testbench for such a streaming kernel is to use one thread to write to the kernel input pipe while another thread reads from the kernel output pipe. However, such multithreaded execution of the host program is not supported. If you use the same thread to write to the kernel input pipe and read the kernel output pipe, your SYCL program might hang if the capacity of the pipes is exceeded (for example, if you write more data to the kernel input pipe than the pipe capacity). You can avoid such a hang without needing to increase pipe capacity by applying the following idiom:
```
struct my_kernel {
  void operator()() const {
    while(1) {
      auto in = in_pipe::read();

      out_pipe::write(...);
    }
  }
};

  // Host code
  int in_count= 0;
  int out_count = 0;
  q.single_task(my_kernel{});
  while (out_count < N) {
    bool success;
    if (in_count < N) {
      in_pipe::write(q, 1, success);
      in_count += success;
    }
    out_pipe::read(q, success);
    out_count += success;
  }
```
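To see why interleaving non-blocking writes and reads on one thread avoids the hang, the following host-only sketch models the idiom with ordinary C++ queues. It is not SYCL code: BoundedPipe, KernelStep, and RunInterleaved are hypothetical names, the pipes are simulated with std::deque, and the kernel is stepped manually because the model is single-threaded. It assumes a pipe capacity of at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical model of a fixed-capacity pipe with non-blocking accessors.
class BoundedPipe {
 public:
  explicit BoundedPipe(std::size_t capacity) : capacity_(capacity) {}

  // Returns false instead of blocking when the pipe is full.
  bool TryWrite(int value) {
    if (data_.size() >= capacity_) return false;
    data_.push_back(value);
    return true;
  }

  // Returns false instead of blocking when the pipe is empty.
  bool TryRead(int& value) {
    if (data_.empty()) return false;
    value = data_.front();
    data_.pop_front();
    return true;
  }

  bool Empty() const { return data_.empty(); }
  bool Full() const { return data_.size() >= capacity_; }

 private:
  std::size_t capacity_;
  std::deque<int> data_;
};

// Models one iteration of the streaming kernel: read one input, write one
// output. The step only runs when both operations can succeed.
inline void KernelStep(BoundedPipe& in, BoundedPipe& out) {
  if (in.Empty() || out.Full()) return;
  int value = 0;
  in.TryRead(value);
  out.TryWrite(2 * value);
}

// Single-threaded host loop: attempt a write only while inputs remain, then
// always attempt a read, counting only the operations that succeeded.
inline std::vector<int> RunInterleaved(int n, std::size_t capacity) {
  BoundedPipe in(capacity);
  BoundedPipe out(capacity);
  std::vector<int> results;
  int in_count = 0;
  while (static_cast<int>(results.size()) < n) {
    if (in_count < n && in.TryWrite(in_count)) ++in_count;
    KernelStep(in, out);
    int value = 0;
    if (out.TryRead(value)) results.push_back(value);
  }
  return results;
}
```

In this model, RunInterleaved(100, 1) completes even though 100 items pass through pipes that hold a single element, because the host never commits to a blocking write while output is waiting to be drained.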

For Stratix® 10 FPGA reference boards, a rare failure can occur when initializing internal memory where the memory is initialized into an unknown state that can cause unexpected behavior. As a workaround, compile your design with the -Xsbsp-flow=flat compiler option to avoid this issue.
For large FPGA simulations, such as those that target Agilex™ 7 boards, you might receive a linker error that contains a PC-relative offset overflow message. If you receive this message, compile your simulation with the -fsycl-link-huge-device-code compiler command option.
For lambda kernels generated in a loop, use templated classes to give the kernels procedurally generated names.
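One way to read the advice about procedurally generated names is to use a class template parameterized on an integer, so that each instantiation is a distinct type. The following host-only sketch shows the naming mechanism in plain C++. StageKernel, RecordLaunch, LaunchStages, and CountDistinctKernelNames are hypothetical names, and RecordLaunch only records the name type where real code would submit a kernel with that type as its kernel name.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical name template: StageKernel<0>, StageKernel<1>, ... are
// distinct types, so each can serve as a unique kernel name.
template <int Id>
struct StageKernel {};

// Stand-in for a kernel submission. It only records which name type was
// used; a real program would launch a kernel named KernelName here.
template <typename KernelName>
void RecordLaunch(std::set<std::type_index>& launched) {
  launched.insert(std::type_index(typeid(KernelName)));
}

// Compile-time "loop": expands to one RecordLaunch call per index.
template <int... Ids>
void LaunchStages(std::set<std::type_index>& launched,
                  std::integer_sequence<int, Ids...>) {
  int expand[] = {0, (RecordLaunch<StageKernel<Ids>>(launched), 0)...};
  (void)expand;
}

// Launches four stages and reports how many distinct names were used.
inline std::size_t CountDistinctKernelNames() {
  std::set<std::type_index> launched;
  LaunchStages(launched, std::make_integer_sequence<int, 4>{});
  return launched.size();
}
```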
For ap_float data types, the -fp-model=fast compiler command option does not enable dot product inference. There is currently no workaround for this issue.
For FPGA, counting the leading zeros of an unsigned native integer type using a loop such as the one in the following example can lead to a compiler error such as Compiler Error: undefined reference to 'llvm.ctlz.iN'.
```
unsigned int leading_zeros = 0;
while (number) {
  leading_zeros += 1;
  number >>= 1;
}
```
You can work around this issue by using the built-in function to count the leading zeros: __builtin_clz(unsigned) or __builtin_clzll(unsigned long long). When counting the leading zeros of unsigned char or unsigned short using the built-in functions, deduct the number of bits extended during type conversion from the return value.
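The adjustment for narrow types can be sketched as a set of overloads. This is a minimal host-side sketch, assuming a GCC- or Clang-compatible compiler that provides the two built-ins. The CountLeadingZeros name is hypothetical, and the explicit zero check is an addition, because the result of the built-ins is undefined for an input of 0.

```cpp
#include <cassert>
#include <climits>

// Bit widths of the types that the built-ins operate on.
constexpr int kUnsignedBits = static_cast<int>(sizeof(unsigned int) * CHAR_BIT);
constexpr int kULongLongBits =
    static_cast<int>(sizeof(unsigned long long) * CHAR_BIT);

// Hypothetical helper overloads. Each returns the full bit width for 0
// because the built-ins are undefined for a zero argument.
inline int CountLeadingZeros(unsigned int value) {
  return value == 0 ? kUnsignedBits : __builtin_clz(value);
}

inline int CountLeadingZeros(unsigned long long value) {
  return value == 0 ? kULongLongBits : __builtin_clzll(value);
}

inline int CountLeadingZeros(unsigned short value) {
  constexpr int kBits = static_cast<int>(sizeof(unsigned short) * CHAR_BIT);
  if (value == 0) return kBits;
  // Deduct the bits added when the value is widened to unsigned int.
  return __builtin_clz(static_cast<unsigned int>(value)) -
         (kUnsignedBits - kBits);
}

inline int CountLeadingZeros(unsigned char value) {
  constexpr int kBits = CHAR_BIT;
  if (value == 0) return kBits;
  // Deduct the bits added when the value is widened to unsigned int.
  return __builtin_clz(static_cast<unsigned int>(value)) -
         (kUnsignedBits - kBits);
}
```

For example, with an 8-bit unsigned char and a 32-bit unsigned int, the value 1 widens to a word with 31 leading zeros, and deducting the 24 extended bits yields 7.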
On Windows, compiling FPGA designs in a directory with a long path name might fail, and you might see the following error:
```
dpcpp: error: fpga compiler command failed with exit code 1 (use -v to see invocation)
NMAKE : fatal error U1077: '…\oneAPI\compiler\latest\windows\bin\dpcpp.EXE' : return code '0x1'
```
As a workaround, either compile the design in a directory with a short path name or reset TMP and TEMP environment variables to point to a shorter path (for example, C:\temp).
When compiling for FPGA, the compiler might pack structs differently on Windows than on Linux. This difference can result in structs with members that might not be well-aligned for optimal memory accesses. As a result, some designs that compile with an II=1 on Linux might have, for example, II=10 on Windows. As a workaround, force an alignment on the misaligned structs, as shown in the following example:
```
//Code with misaligned struct
struct Item {
  bool valid;
  int value1;
  unsigned char value2;
};

//Forced alignment of struct
struct Item {
  bool valid;
  bool __empty__[3];
  int value1;
  unsigned char value2;
  unsigned char __empty2__[3];
};
```
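To confirm that a padded struct has the layout you intend on each platform, you can check member offsets at compile time. The following is a minimal sketch, assuming a 1-byte bool and a 4-byte int. PaddedItem, pad0, pad1, and LayoutMatchesExpectation are hypothetical names; the padding members are renamed here because identifiers containing double underscores are reserved in C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant of the padded struct with explicit padding members.
struct PaddedItem {
  bool valid;
  bool pad0[3];            // explicit padding so value1 starts on a 4-byte boundary
  int value1;
  unsigned char value2;
  unsigned char pad1[3];   // explicit tail padding
};

// Compile-time checks that the int member and the struct size are aligned.
static_assert(offsetof(PaddedItem, value1) % alignof(int) == 0,
              "value1 must be aligned for int accesses");
static_assert(sizeof(PaddedItem) % alignof(int) == 0,
              "struct size must be a multiple of the int alignment");

// Runtime check of the exact layout expected under the stated assumptions
// (1-byte bool, 4-byte int).
inline bool LayoutMatchesExpectation() {
  return offsetof(PaddedItem, value1) == 4 &&
         offsetof(PaddedItem, value2) == 8 && sizeof(PaddedItem) == 12;
}
```

Building the same checks on Windows and on Linux gives an early signal if the two compilers disagree about the layout.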
Due to a known issue pertaining to HTML files within the Jupyter Notebook, you cannot launch the FPGA Optimization Report in a Jupyter Notebook. As a workaround for this issue, copy the FPGA optimization reports directory to a local file system and launch it using a supported browser.

Intel® oneAPI DPC++/C++ Compiler Known Issues

For Intel® oneAPI DPC++/C++ Compiler related issues, refer to Intel® oneAPI DPC++/C++ Compiler Release Notes.

Code Samples

Download the oneAPI samples for FPGAs available on GitHub at oneAPI Samples for FPGA.

Previous Release Notes

Additional Documentation

Refer to the following guides for additional information:

Notices and Disclaimers

Intel® technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Currently, characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from a course of performance, course of dealing, or usage in trade.
