2. High Level Synthesis (HLS) Design Examples and Tutorials
The Intel® High Level Synthesis (HLS) Compiler Pro Edition includes design examples and tutorials. They provide example components and demonstrate ways to model or code your components to get the best results from the Intel® HLS Compiler for your design.
High Level Synthesis Design Examples
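The high level synthesis (HLS) design examples give you a quick way to see how various algorithms can be effectively implemented to get the best results from the Intel® HLS Compiler. You can find the HLS design examples in the following location on your Intel® Quartus® Prime system: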
<quartus_installdir>/hls/examples/<design_example_name>
Where <quartus_installdir> is the directory where you installed the Intel® Quartus® Prime Design Suite. For example, /home/<username>/intelFPGA_pro/22.1 or C:\intelFPGA_pro\22.1.
Focus area | Name | Description |
---|---|---|
Linear algebra | QRD | Uses the Modified Gram-Schmidt algorithm for QR factorization of a matrix. |
Signal processing | interp_decim_filter | Implements a simple and efficient interpolation/decimation filter. |
Simple design | counter | Implements a simple and efficient 32-bit counter component. |
Video processing | YUV2RGB | Implements a basic YUV422 to RGB888 color space conversion. |
Video processing | image_downsample | Implements an image downsampling algorithm to scale an image to a smaller size using bilinear interpolation. |
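The QRD entry in the table above names the Modified Gram-Schmidt algorithm. The following is a minimal host-side sketch of that algorithm in plain C++, not the code shipped with the design example. The fixed matrix size `kN`, the `Matrix` alias, and the function names `qr_mgs`, `multiply`, and `max_abs_diff` are hypothetical names chosen for illustration, and the sketch assumes the input columns are linearly independent.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>

constexpr std::size_t kN = 3;  // hypothetical fixed matrix size
using Matrix = std::array<std::array<double, kN>, kN>;

// Factor a = q * r with Modified Gram-Schmidt.
// Assumes the columns of a are linearly independent (no zero pivots).
void qr_mgs(const Matrix& a, Matrix& q, Matrix& r) {
  Matrix v = a;  // working copy whose columns are orthogonalized in place
  for (auto& row : r) row.fill(0.0);

  for (std::size_t i = 0; i < kN; ++i) {
    // The norm of the current column becomes the diagonal entry of R.
    double norm_sq = 0.0;
    for (std::size_t k = 0; k < kN; ++k) norm_sq += v[k][i] * v[k][i];
    r[i][i] = std::sqrt(norm_sq);

    // Normalize the column to get column i of Q.
    for (std::size_t k = 0; k < kN; ++k) q[k][i] = v[k][i] / r[i][i];

    // "Modified" step: remove the q_i component from every remaining
    // column immediately, instead of projecting against the original columns.
    for (std::size_t j = i + 1; j < kN; ++j) {
      double dot = 0.0;
      for (std::size_t k = 0; k < kN; ++k) dot += q[k][i] * v[k][j];
      r[i][j] = dot;
      for (std::size_t k = 0; k < kN; ++k) v[k][j] -= dot * q[k][i];
    }
  }
}

// Plain matrix product, used only to check the factorization.
Matrix multiply(const Matrix& x, const Matrix& y) {
  Matrix out{};
  for (std::size_t i = 0; i < kN; ++i)
    for (std::size_t j = 0; j < kN; ++j)
      for (std::size_t k = 0; k < kN; ++k) out[i][j] += x[i][k] * y[k][j];
  return out;
}

// Largest element-wise absolute difference between two matrices.
double max_abs_diff(const Matrix& x, const Matrix& y) {
  double worst = 0.0;
  for (std::size_t i = 0; i < kN; ++i)
    for (std::size_t j = 0; j < kN; ++j)
      worst = std::max(worst, std::fabs(x[i][j] - y[i][j]));
  return worst;
}
```

Calling `qr_mgs(a, q, r)` and then checking that `max_abs_diff(multiply(q, r), a)` is close to zero is one way to confirm the factorization on the host before worrying about hardware mapping.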
HLS Design Tutorials
The HLS design tutorials show you important HLS-specific programming concepts as well as demonstrating good coding practices.
Each tutorial has a README file that gives you details about what the tutorial covers and instructions on how to run the tutorial.
Name | Description |
---|---|
You can find these tutorials in the following location on your Intel® Quartus® Prime system: |
|
ac_fixed_constructor | Demonstrates the use of the ac_fixed constructor where you can get a better QoR by using minor variations in coding style. |
ac_fixed_math_library | Demonstrates the use of the Intel® HLS Compiler ac_fixed_math fixed point math library functions. |
ac_int_basic_ops | Demonstrates the operators available for the ac_int class. |
ac_int_overflow | Demonstrates the usage of the DEBUG_AC_INT_WARNING and DEBUG_AC_INT_ERROR keywords to help detect overflow during emulation runtime. |
You can find these tutorials in the following location on your Intel® Quartus® Prime system: |
|
1_reduced_double | Demonstrates how your application can benefit from hls_float by changing the underlying type from double to hls_float<11, 44> (reduced double). |
2_explicit_arithmetic | Demonstrates how to use the explicit versions of hls_float binary operators to perform floating-point arithmetic operations based on your needs. |
3_conversions | Demonstrates when conversions appear in designs with hls_float types and how to use different conversion modes to generate compile-time constants using various hls_float types. |
Name | Description |
---|---|
You can find these tutorials in the following location on your Intel® Quartus® Prime system: |
|
attributes_on_mm_agent_arg | Demonstrates how to apply memory attributes to Avalon® Memory Mapped (MM) agent arguments. |
exceptions | Demonstrates how to use memory attributes on constants and struct members. |
memory_bank_configuration | Demonstrates how to control the number of load/store ports of each memory bank and optimize your component area usage, throughput, or both by using one or more of the following memory attributes: |
memory_geometry | Demonstrates how to split your memory into banks and control the number of load/store ports of each memory bank by using one or more of the following memory attributes: |
memory_implementation | Demonstrates how to implement variables or arrays in registers, MLABs, or RAMs by using the following memory attributes: |
memory_merging | Demonstrates how to improve resource utilization by implementing two logical memories as a single physical memory by merging them depth-wise or width-wise with the hls_merge memory attribute. |
non_power_of_two_memory | Demonstrates how to use the force_pow2_depth memory attribute to control the padding of memories that are non-power-of-two deep, and how that impacts the FPGA memory resource usage. |
non_trivial_initialization | Demonstrates how to use the C++ keyword constexpr to achieve efficient initialization of read-only variables. |
static_var_init | Demonstrates how to control the initialization behavior of statics in a component using the hls_init_on_reset or hls_init_on_powerup memory attribute. |
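The non_trivial_initialization entry in the table above refers to the C++ `constexpr` keyword. The following is a minimal plain C++ sketch of the language feature, under the assumption that the tutorial builds a read-only table in a similar way. The names `SquareTable`, `kSquares`, and `lookup_square` are hypothetical.

```cpp
// A read-only lookup table whose contents are computed by the compiler.
struct SquareTable {
  int values[16];

  // A constexpr constructor lets the loop run at compile time (C++14 or later).
  constexpr SquareTable() : values() {
    for (int i = 0; i < 16; ++i) values[i] = i * i;
  }
};

// Because kSquares is constexpr, its contents are constants, not the result
// of initialization code that runs when the program starts.
constexpr SquareTable kSquares{};

// The compiler can verify the table contents during compilation.
static_assert(kSquares.values[5] == 25, "table is evaluated at compile time");

// Index must be in the range 0..15.
int lookup_square(int i) { return kSquares.values[i]; }
```

The `static_assert` only compiles if the table really is a compile-time constant, which makes it a cheap check that the initialization did not fall back to run-time code.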
Name | Description |
---|---|
You can find these tutorials in the following location on your Intel® Quartus® Prime system: |
|
overview | Demonstrates the effects on quality-of-results (QoR) of choosing different component interfaces even when the component algorithm remains the same. |
explicit_streams_buffer | Demonstrates how to use explicit stream_in and stream_out interfaces in the component and testbench. |
explicit_streams_packets_empty | Demonstrates how to use the usesPackets, usesEmpty, and firstSymbolInHighOrderBits stream template parameters. |
explicit_streams_packets_ready_valid | Demonstrates how to use the usesPackets, usesValid, and usesReady stream template parameters. |
mm_host_testbench_operators | Demonstrates how to invoke a component at different indices of an Avalon Memory Mapped (MM) Host (mm_host class) interface. |
mm_agents | Demonstrates how to create Avalon-MM Agent interfaces (agent registers and agent memories). |
mm_agents_double_buffering | Demonstrates the effect of using the hls_readwrite_mode macro to control how memory hosts access the agent memories. |
mm_agents_csr_volatile | Demonstrates the effect of using the volatile keyword to allow concurrent agent memory accesses while your component is running. |
multiple_stream_call_sites | Demonstrates the benefits of using multiple stream call sites. |
pointer_mm_host | Demonstrates how to create Avalon-MM Host interfaces and control their parameters. |
stable_arguments | Demonstrates how to use the stable attribute for unchanging arguments to improve resource utilization. |
Name | Description |
---|---|
You can find these tutorials in the following location on your Intel® Quartus® Prime system: |
|
ac_datatypes | Demonstrates the effect of using the ac_int data type instead of the int data type. |
control_of_dsp_usage | Demonstrates the effects of controlling whether some supported data types and math functions are implemented by DSPs or soft logic with the --dsp-mode option of the i++ command and the ihc::math_dsp_control function. |
const_global | Demonstrates the performance and resource utilization improvements of using const qualified global variables. |
divergent_loops | Demonstrates a source-level optimization for designs with divergent loops. |
floating_point_contract | Demonstrates how to use the -ffp-contract option to improve the performance of your design for double-precision floating-point operations. |
floating_point_ops | Demonstrates the impact of the -ffp-contract=fast and -ffp-reassociate flags of the i++ command on floating-point operations, using a 32-tap finite impulse response (FIR) filter design that is optimized for throughput. |
fpga_reg | Demonstrates how to use the fpga_reg macro to precisely tune pipelining in your design. |
hyper_optimized_handshaking | Demonstrates how to use the --hyper-optimized-handshaking option of the Intel HLS Compiler i++ command. |
loop_coalesce | Demonstrates the performance and resource utilization improvements of using the loop_coalesce pragma on nested loops. Although #pragma loop_coalesce is provided with both the Standard and Pro editions, this design tutorial is provided only with the Pro Edition. |
loop_fusion | Demonstrates the latency and resource utilization improvements of loop fusion. |
loop_memory_dependency | Demonstrates breaking loop carried dependencies using the ivdep pragma. |
lsu_control | Demonstrates the effects of controlling the types of LSUs instantiated for variable-latency Avalon® MM Host interfaces. |
parallelize_array_operation | Demonstrates how to improve fMAX by correcting a bottleneck that arises when performing operations on an array in a loop. |
optimize_ii_using_hls_register | Demonstrates how to use the hls_register attribute to reduce loop II and how to use hls_max_concurrency to improve component throughput. |
parameter_aliasing | Demonstrates the use of the __restrict keyword on component arguments. |
random_number_generator | Demonstrates how to use the random number generator library. |
reduce_exit_fifo_width | Demonstrates how to improve fMAX by reducing the width of the FIFO belonging to the exit node of a stall-free cluster. |
relax_reduction_dependency | Demonstrates a method to reduce the II of a loop that includes a floating point accumulator, or other reduction operation that cannot be computed at high speed in a single clock cycle. |
remove_loop_carried_dependency | Demonstrates how you can improve loop performance by removing accesses to the same variable across nested loops. |
resource_sharing_filter | Demonstrates an optimized-for-area variant of a 32-tap finite impulse response (FIR) filter design. |
set_component_target_fmax_1 | Demonstrates how to set the target fMAX in various ways by leveraging the Loop Analysis report in the High-Level Design Reports. |
set_component_target_fmax_2 | Demonstrates how the compiler handles the tradeoff between fMAX and II based on the presence or absence of the hls_scheduler_target_fmax_mhz component attribute and the ii loop pragma. |
shift_register | Demonstrates the recommended coding style for implementing shift registers. |
sincos_func | Demonstrates the effects of using sinpi or cospi functions in your component instead of sin or cos functions. |
single_vs_double_precision_math | Demonstrates the effect of using single precision literals and functions instead of double precision literals and functions. |
stall_enable | Demonstrates how to replace stall-free clusters with stall-enabled clusters to improve latency in some small designs. |
struct_interface | Demonstrates how to use ac_int to implement interfaces with no padding bits. |
submnormal_and_rounding | Demonstrates the effects of using the --daz and --rounding i++ command options. |
swap_vs_copy | Demonstrates the impact of using deep copying with registers on the performance and resource utilization of a component design. |
triangular_loop | Demonstrates a method for describing triangular loop patterns with dependencies. |
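The relax_reduction_dependency entry in the table above describes reducing the II of a loop that contains a floating-point accumulator. One common way to relax such a dependency is to spread the accumulation over several independent partial sums, so that each partial sum is updated only once every few iterations. The following is a host-side sketch of that general idea in plain C++. It is not the tutorial's code, and the lane count `kLanes` and both function names are hypothetical. Whether a given lane count is enough for an II of 1 depends on the adder latency on the target device.

```cpp
#include <cstddef>

constexpr std::size_t kLanes = 8;  // hypothetical number of partial sums

// Baseline: every iteration depends on the result of the previous addition,
// so the loop cannot start a new iteration until that addition finishes.
float sum_single_accumulator(const float* data, std::size_t n) {
  float total = 0.0f;
  for (std::size_t i = 0; i < n; ++i) total += data[i];
  return total;
}

// Relaxed: iteration i only depends on iteration i - kLanes, which gives
// a multi-cycle adder kLanes iterations to produce its result.
float sum_partial_accumulators(const float* data, std::size_t n) {
  float partial[kLanes] = {};
  for (std::size_t i = 0; i < n; ++i) partial[i % kLanes] += data[i];

  // Short final reduction over the partial sums.
  float total = 0.0f;
  for (std::size_t lane = 0; lane < kLanes; ++lane) total += partial[lane];
  return total;
}
```

Because floating-point addition is not associative, the two functions can return slightly different results for general inputs. They agree exactly when every intermediate sum is exactly representable, for example when summing small integer values.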
Name | Description |
---|---|
You can find these tutorials in the following location on your Intel® Quartus® Prime system: |
|
full-design | Demonstrates a simple sort component in a minimal system as described in the HLS Walkthrough video series that is available through the Intel FPGA YouTube channel. |
compiler_interoperability | Demonstrates how to build your design using testbench code compiled with the Intel® HLS Compiler, GCC, or Microsoft* Visual Studio* and component code compiled separately with the Intel® HLS Compiler. |
enqueue_call | Demonstrates how to run components asynchronously and exercise their pipeline performance in the test bench using enqueue functionality. |
platform_designer_2xclock | Demonstrates the recommended clock and reset generation for a component with a clock2x input. |
platform_designer_stitching | Demonstrates how to combine multiple components to function as a single cohesive design. |
Name | Description |
---|---|
You can find these tutorials in the following location on your Intel® Quartus® Prime system: |
|
balancing_loop_delay | Demonstrates how to improve the throughput of a component that uses a system of tasks by buffering streams. |
balancing_pipeline_latency | Demonstrates how to improve the throughput of a component that uses a system of tasks by buffering streams. |
interfaces_sot | Demonstrates how to transfer information between, into, and out of tasks using Avalon® streaming and Avalon® memory-mapped host interfaces. |
internal_stream | Demonstrates how to use "internal streams" in HLS tasks with the ihc::stream object. |
launch_and_collect_capacity | Demonstrates how to use the capacity template parameter of the ihc::launch and ihc::collect functions to improve throughput in components that have systems of tasks. |
parallel_loop | Demonstrates how you can run sequential loops in a pipelined manner by using a system of HLS tasks in your component. |
resource_sharing | Demonstrates how you can share expensive compute blocks in your component to save area usage. |
task_reuse | Demonstrates how to invoke multiple copies of the same task function. |
Name | Description |
---|---|
You can find these tutorials in the following location on your Intel® Quartus® Prime system: |
|
basic_rtl_library_flow | Demonstrates the process of developing an RTL library and using it in an HLS component. |
rtl_struct_mapping | Demonstrates how to obtain a mapping from C++ struct fields to bit-slices of RTL module interface signals. |
Name | Description |
---|---|
You can find these tutorials in the following location on your Intel® Quartus® Prime system: |
|
max_interleaving | Demonstrates a method to reduce the area utilization of a loop that meets the following conditions: |
small_speculated_iterations | Demonstrates how decreasing the number of speculated iterations improves latency when a loop body has low latency and is expected to be frequently invoked. |
speculated_iterations | Demonstrates how increasing the number of speculated iterations improves II when the exit condition calculation is the bottleneck preventing a lower II. |