Intel® High Level Synthesis Compiler Pro Edition: Version 21.4 Release Notes

ID 683682
Date 12/13/2021
Public


1.4. Known Issues and Workarounds

This section provides information about known issues that affect the Intel® HLS Compiler Pro Edition Version 21.4.

Description Workaround
When you use the deprecated class mm_master, the compiler emits a warning message like the following:
'operator[]' has been explicitly marked deprecated here
[[deprecated("Use mm_host instead.")]]
This message does not indicate which part of your code needs to change.
Avoid this warning message by using the class mm_host, which replaces the deprecated class mm_master.

(Windows only) Compiling a design in a directory with a long path name can result in compile failures.

Check the debug.log file for "could not find file" errors. These errors can indicate that your path is too long.

Compile the design in a directory with a short path name.
(Windows only) A long path for your Intel® Quartus® Prime installation directory can prevent you from successfully compiling and running the Intel® HLS Compiler tutorials and example designs.

Check the debug.log file for "could not find file" errors. These errors can indicate that your path is too long.

Move the tutorials and examples to a short path name before trying to run them.
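The two path-length checks above can be sketched in plain C++. This is a minimal sketch and not part of the Intel® HLS Compiler: it assumes that the classic 260-character Windows path limit is the relevant one, and the helper names (path_has_headroom, count_missing_file_errors) and the default headroom of 100 characters are hypothetical choices for illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Classic Windows MAX_PATH limit (assumed here to be the limit that matters).
constexpr std::size_t kWindowsMaxPath = 260;

// Returns true if `path` leaves at least `headroom` characters for the files
// that a compile generates beneath it. The default headroom is a guess.
bool path_has_headroom(const std::string& path, std::size_t headroom = 100) {
  return path.size() + headroom < kWindowsMaxPath;
}

// Counts the lines of a log (passed in as text) that contain the
// "could not find file" signature described above.
std::size_t count_missing_file_errors(const std::string& log_text) {
  std::istringstream in(log_text);
  std::string line;
  std::size_t hits = 0;
  while (std::getline(in, line)) {
    if (line.find("could not find file") != std::string::npos) {
      ++hits;
    }
  }
  return hits;
}
```

A nonzero count from the contents of debug.log, combined with a project path that fails the headroom check, suggests moving the design to a shorter path before recompiling.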
Libraries that target OpenCL* and are written in HLS cannot use streams or pipes as an interface between OpenCL* code and the library written in HLS.

However, the library in HLS can use streams or pipes if both endpoints are within the library (for example, a stream that connects two task functions).

N/A
Applying the ihc::maxburst parameter to Avalon® Memory-Mapped host interfaces can cause your design to hang in simulation. N/A
In some uncommon cases, if you have two classes whose constructors each require instances of the other class as input, the compiler might crash.
For example, compiling the following code snippet causes the compiler to crash:
struct foo;

struct bar {
  int a, b, c;
  bar() : a(0), b(0), c(0) {};
  bar(const foo x);
};

struct foo {
  int a, b, c;
  foo() : a(0), b(0), c(0) {};
  foo(const bar x) {};
};

bar::bar(const foo x) {};
Avoid creating a circular definition. Instead, use a pointer or reference as the constructor parameter.
For example, transform the earlier code snippet into the following code and pass in the struct as a reference to the constructor:
struct foo;

struct bar {
  int a, b, c;
  bar() : a(0), b(0), c(0) {};
  bar(const foo &x);
};

struct foo {
  int a, b, c;
  foo() : a(0), b(0), c(0) {};
  foo(const bar &x) {};
};

bar::bar(const foo &x) {};
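As a self-contained check of the reference-based pattern, the following sketch compiles with a standard C++ compiler. It keeps the forward declaration of foo and fills in the constructor bodies by copying the three members, which is an assumption for illustration (the snippets above leave the bodies empty). The round_trip helper is hypothetical.

```cpp
struct foo;  // forward declaration so that bar can name foo

struct bar {
  int a, b, c;
  bar() : a(0), b(0), c(0) {}
  bar(const foo &x);  // defined below, once foo is a complete type
};

struct foo {
  int a, b, c;
  foo() : a(0), b(0), c(0) {}
  // Assumed behavior: copy the members across.
  foo(const bar &x) : a(x.a), b(x.b), c(x.c) {}
};

// foo is complete here, so its members can be read.
bar::bar(const foo &x) : a(x.a), b(x.b), c(x.c) {}

// Hypothetical helper: passes a value through both converting constructors.
int round_trip(int value) {
  bar b1;
  b1.a = value;
  foo f(b1);
  bar b2(f);
  return b2.a;
}
```

Because each constructor takes a reference, neither struct needs the other to be a complete type at the point of declaration.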
Libraries that target OpenCL* and are written in HLS might cause OpenCL* kernels that include the library to have a more conservative incremental compilation. N/A
When developing a library, if you have a #define defining a value that you use later in a #pragma, the fpga_crossgen command fails.
For example, the following code cannot be compiled by the fpga_crossgen command:
#define unroll_factor 5

int foo(int array_size) {
  int tmp[100];
  int sum =0;
//pragma unroll unroll_factor
#pragma ivdep array(tmp) safelen(unroll_factor)
  for (int i=0;i<array_size;i++) {
    sum+=tmp[i];
  }
  return sum;
}
Use __pragma instead of #pragma.
For example, the following compiles successfully with the fpga_crossgen command:
#define unroll_factor 5

int foo(int array_size) {
  int tmp[100];
  int sum =0;
//pragma unroll unroll_factor
__pragma ivdep array(tmp) safelen(unroll_factor)
  for (int i=0;i<array_size;i++) {
    sum+=tmp[i];
  }
  return sum;
}
When you use the -c command option to separate the compilation and linking stages of your workflow, the linking stage might fail, with or without error messages, if you omit the -march option in the linking stage or specify a different -march option value than in the compilation stage. Ensure that you use the same -march option value in both the compilation stage (the stage that uses the -c command option) and the linking stage.
Applying the hls_merge memory attribute to an array declared within an unrolled or partially unrolled loop causes copies of the array to be merged across the unrolled loop iterations.
#pragma unroll 2
for (int i = 0; i < 8; i++) {
   hls_merge("WidthMerged", "width") int MyMem1[128];
   hls_merge("WidthMerged", "width") int MyMem2[128];
   ...
   hls_merge("DepthMerged", "depth") int MyMem3[128];
   hls_merge("DepthMerged", "depth") int MyMem4[128];
   ...
}
Avoid using the hls_merge memory attribute in unrolled loops.

If you need to merge memories in an unrolled loop, explicitly declare an array of struct type for width merging, or declare a deeper array for depth merging.

struct Type {int A; int B;};
#pragma unroll 2
for (int i = 0; i < 8; i++) {
   Type WidthMerged[128];  // Manual width merging
   ...
   int DepthMerged[256];   // Manual depth merging
   ...
}
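The manual merging described above can be illustrated in plain C++ without any HLS attributes. This is a sketch under stated assumptions: the names Pair, kDepth, second_memory_index, and merged_sum are hypothetical, and placing the second logical memory at an offset of 128 elements in the deeper array is one possible layout, not one prescribed by the compiler.

```cpp
constexpr int kDepth = 128;  // depth of each original memory

// Manual width merging: one element carries the values of both memories.
struct Pair {
  int A;
  int B;
};

// Manual depth merging: the second logical memory occupies the upper half
// of a single array that is twice as deep (assumed layout).
constexpr int second_memory_index(int i) { return kDepth + i; }

// Hypothetical helper: fills both merged arrays and sums their contents.
int merged_sum() {
  Pair width_merged[kDepth];
  int depth_merged[2 * kDepth];

  for (int i = 0; i < kDepth; ++i) {
    width_merged[i].A = i;                         // was MyMem1[i]
    width_merged[i].B = 2 * i;                     // was MyMem2[i]
    depth_merged[i] = i;                           // was MyMem3[i]
    depth_merged[second_memory_index(i)] = 2 * i;  // was MyMem4[i]
  }

  int sum = 0;
  for (int i = 0; i < kDepth; ++i) {
    sum += width_merged[i].A + width_merged[i].B;
    sum += depth_merged[i] + depth_merged[second_memory_index(i)];
  }
  return sum;
}
```

In this layout, each access to the former second memory becomes an access to the same array at an adjusted index, so no hls_merge attribute is needed inside the unrolled loop.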
In the Function Memory Viewer high-level design report, some function-scoped memories might appear as "optimized away". None.

When a file contains functions that are components and functions that are not components, all function-scoped variables are listed in the Function Memory List pane, but only variables from components have information about them to show in the Function Memory View pane.

Some high-level design reports fail in Microsoft* Internet Explorer*. Use one of the following browsers to view the reports:
  • Google Chrome*
  • Microsoft Edge*
  • Mozilla* Firefox*
The Loop Viewer in the High-Level Design Reports has the following restrictions:
  • The behavior of stall-free clusters is not modeled in the Loop Viewer. The final latency shown in the Loop Viewer for a stall-free cluster is typically more pessimistic (that is, higher) than the actual latency of your design.

    For a description of clustering and stall-free clusters, refer to Clustering the Datapath in the Intel High Level Synthesis Compiler Pro Edition Best Practices Guide.

  • Stalls from memory reads and writes, or from print statements, are not modeled.
  • High iteration counts (>1000) cause slow performance of the Loop Viewer.
  • You cannot specify an iteration count of zero (0) in the Loop Viewer.
None.
Links in some reports in the High-Level Design Reports generated on Windows systems do not work. Generate the High-Level Design Reports (that is, compile your code) on a Linux system.
Using a struct of a single ac_int data type in a streaming interface that uses packets (ihc::usesPackets<true>) does not work.
For example, the following code snippet does not work:
// class definition
class DataType {
      ac_int<155, false> data;
...
};
// stream definition
typedef ihc::stream_in<DataType, 
                       ihc::usesPackets<true>,
                       ihc::usesEmpty<true>
                      > DataStreamIn;
To use this combination in your design, obey the following restrictions:
  • The internal ac_int data size must be a multiple of 8.
  • The stream interface type declaration must specify ihc::bitsPerSymbol<8>.
    For example, the following code snippet works:
    // class definition
    class DataType {
          ac_int<160, false> data;  // data width must be a multiple of 8
    ...
    };
    // stream definition
    typedef ihc::stream_in<DataType,
                           ihc::usesPackets<true>,
                           ihc::usesEmpty<true>,
                           ihc::bitsPerSymbol<8>  // added ihc::bitsPerSymbol<8>
                          > DataStreamIn;
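The width restriction above amounts to rounding the payload width up to the next multiple of 8, which is how the 155-bit example becomes 160 bits. The following plain C++ sketch shows that arithmetic and a compile-time guard. The names round_up_to_byte, kRequestedWidth, and kStreamWidth are hypothetical, and the sketch does not use the ac_int or ihc::stream_in types themselves.

```cpp
// Rounds a bit width up to the next multiple of 8.
constexpr int round_up_to_byte(int bits) { return (bits + 7) / 8 * 8; }

constexpr int kRequestedWidth = 155;  // width from the failing example
constexpr int kStreamWidth = round_up_to_byte(kRequestedWidth);

// Compile-time guards: fail the build if the width breaks the restriction.
static_assert(kStreamWidth % 8 == 0,
              "stream payload width must be a multiple of 8 bits");
static_assert(kStreamWidth == 160, "155 bits rounds up to 160 bits");
```

A guard like the first static_assert, placed next to the data type definition, can catch a width change that violates the restriction before the design reaches simulation.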
When running a high-throughput simulation of your component using enqueue function calls, if you do not use the ihc_hls_component_run_all function to run the enqueued component calls after all of the ihc_hls_enqueue calls for that component, the following behaviors occur:
  • In emulation, the enqueued component functions are run.
  • In simulation, the enqueued component functions are not run, with no error or warning messages provided.
Ensure that you use the ihc_hls_component_run_all function after all of the ihc_hls_enqueue calls for that component to run enqueued component function calls.
Launching a task function with ihc::launch_always_run strips away optimization attributes applied to the task function.
In the following code example, the attribute applied to the function is ignored. The High-Level Design Reports show an II of 1 for this task instead of the requested II of 4.
hls_component_ii(4) void noop()
{
    bool sop, eop;
    int empty;
    auto const data = data_in.read(sop, eop, empty);

    data_out.write(data, sop, eop, empty);
}

component void main_component()
{
    ihc::launch_always_run<noop>();
}
To avoid stripping away the optimization, add a while(1) loop to the affected function and apply the corresponding control pragma to the while(1) loop instead of the function.
The following code example shows how you can implement this change for the earlier code example:
void noop()
{
#pragma ii 4
    while (1)
    {
        bool sop, eop;
        int empty;
        auto const data = data_in.read(sop, eop, empty);

        data_out.write(data, sop, eop, empty);
    }
}

component void main_component()
{
    ihc::launch<noop>();
}
For Cyclone® V projects that contain multiple HLS components, when you use the i++ command to compile your project to hardware (i++ -march=CycloneV), you might receive an error.

While the error text differs depending on your project, the error signature is an Intel® Quartus® Prime compilation failure due to bad Verilog syntax. A module tries to use a function that the Intel® Quartus® Prime compiler cannot find.

If you encounter this issue, put each HLS component in a separate project.