2. High Level Synthesis (HLS) Design Examples and Tutorials

Intel® High Level Synthesis Compiler Pro Edition: Getting Started Guide

ID 683680

Date 10/04/2021


The Intel® High Level Synthesis (HLS) Compiler Pro Edition includes design examples and tutorials to provide you with example components and demonstrate ways to model or code your components to get the best results from the Intel® HLS Compiler for your design.

High Level Synthesis Design Examples

The high level synthesis (HLS) design examples give you a quick way to see how various algorithms can be effectively implemented to get the best results from the Intel® HLS Compiler.

You can find the HLS design examples in the following location:

<quartus_installdir>/hls/examples/<design_example_name>

Where <quartus_installdir> is the directory where you installed the Intel® Quartus® Prime Design Suite. For example, /home/<username>/intelFPGA_pro/21.3 or C:\intelFPGA_pro\21.3 .

For instructions on running the examples, see the following sections:

Table 2. HLS design examples
Focus area	Name	Description
Linear algebra	QRD	Uses the Modified Gram-Schmidt algorithm for QR factorization of a matrix.
Signal processing	interp_decim_filter	Implements a simple and efficient interpolation/decimation filter.
Simple design	`counter`	Implements a simple and efficient 32-bit counter component.
Video processing	YUV2RGB	Implements a basic YUV422 to RGB888 color space conversion.
Video processing	image_downsample	Implements an image downsampling algorithm to scale an image to a smaller size using bilinear interpolation.

HLS Design Tutorials

The HLS design tutorials show you important HLS-specific programming concepts as well as demonstrating good coding practices.

Each tutorial has a README file that gives you details about what the tutorial covers and instructions on how to run the tutorial.

Table 3. Arbitrary precision datatypes design tutorials
Name	Description
You can find these tutorials in the following location on your Intel® Quartus® Prime system: `<quartus_installdir>/hls/examples/tutorials/ac_datatypes`
ac_fixed_constructor	Demonstrates the use of the `ac_fixed` constructor where you can get a better QoR by using minor variations in coding style.
ac_fixed_math_library	Demonstrates the use of the Intel® HLS Compiler `ac_fixed_math` fixed point math library functions.
ac_int_basic_ops	Demonstrates the operators available for the `ac_int` class.
ac_int_overflow	Demonstrates the usage of the `DEBUG_AC_INT_WARNING` and `DEBUG_AC_INT_ERROR` keywords to help detect overflow during emulation runtime.
You can find these tutorials in the following location on your Intel® Quartus® Prime system: `<quartus_installdir>/hls/examples/tutorials/hls_float`
`1_reduced_double`	Demonstrates how your application can benefit from `hls_float` by changing the underlying type from `double` to `hls_float<11, 44>` (reduced double).
`2_explicit_arithmetic`	Demonstrates how to use the explicit versions of `hls_float` binary operators to perform floating-point arithmetic operations based on your needs.
`3_conversions`	Demonstrates when conversions appear in designs with `hls_float` types and how to use different conversion modes to generate compile-time constants using various `hls_float` types.
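
To give a rough feel for the precision that the `1_reduced_double` tutorial trades away, the following standard C++ sketch emulates a 44-bit mantissa on an ordinary IEEE 754 `double`. It assumes that the two template arguments of `hls_float<11, 44>` are the exponent and mantissa widths, so that the type keeps the 11 exponent bits of `double` but drops 8 of its 52 mantissa bits. The sketch does not use the Intel® HLS Compiler headers. The function name `truncate_mantissa_to_44_bits` is hypothetical, and plain truncation is an assumption made for simplicity, so the rounding behavior of the real `hls_float` type may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// This emulation only makes sense for IEEE 754 binary64 doubles.
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "sketch assumes a 64-bit IEEE 754 double");

// Hypothetical helper: keep the sign, the 11 exponent bits, and the upper
// 44 mantissa bits of a double; clear the lowest 8 mantissa bits.
// Assumption: truncation toward zero, which may not match hls_float rounding.
double truncate_mantissa_to_44_bits(double value) {
  std::uint64_t bits = 0;
  std::memcpy(&bits, &value, sizeof bits);   // read the raw bit pattern
  bits &= ~std::uint64_t{0xFF};              // drop 52 - 44 = 8 low mantissa bits
  double result = 0.0;
  std::memcpy(&result, &bits, sizeof result);
  return result;
}
```

With this helper, a value such as 1 + 2^-50 collapses to 1.0, while 1 + 2^-44 survives unchanged, which illustrates the smallest relative step that a 44-bit mantissa can still represent.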

Table 4. Component memories design tutorials
Name	Description
You can find these tutorials in the following location on your Intel® Quartus® Prime system: `<quartus_installdir>/hls/examples/tutorials/component_memories`
`attributes_on_mm_agent_arg`	Demonstrates how to apply memory attributes to Avalon® Memory Mapped (MM) agent arguments.
`exceptions`	Demonstrates how to use memory attributes on constants and `struct` members.
memory_bank_configuration	Demonstrates how to control the number of load/store ports of each memory bank and optimize your component area usage, throughput, or both by using one or more of the following memory attributes: `hls_max_replicates`, `hls_singlepump`, `hls_doublepump`, `hls_simple_dual_port_memory`.
memory_geometry	Demonstrates how to split your memory into banks and control the number of load/store ports of each memory bank by using one or more of the following memory attributes: `hls_bankwidth`, `hls_numbanks`, `hls_bankbits`.
`memory_implementation`	Demonstrates how to implement variables or arrays in registers, MLABs, or RAMs by using the following memory attributes: `hls_register`, `hls_memory`, `hls_memory_impl`.
`memory_merging`	Demonstrates how to improve resource utilization by implementing two logical memories as a single physical memory by merging them depth-wise or width-wise with the `hls_merge` memory attribute.
`non_power_of_two_memory`	Demonstrates how to use the `force_pow2_depth` memory attribute to control the padding of memories that are non-power-of-two deep, and how that impacts the FPGA memory resource usage.
`non_trivial_initialization`	Demonstrates how to use the C++ keyword `constexpr` to achieve efficient initialization of read-only variables.
`static_var_init`	Demonstrates how to control the initialization behavior of statics in a component using the `hls_init_on_reset` or `hls_init_on_powerup` memory attribute.
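
As a minimal illustration of the idea behind the `non_trivial_initialization` tutorial, the following standard C++14 sketch fills a read-only lookup table inside a `constexpr` function, so the values are computed by the compiler rather than at run time. It is not taken from the tutorial and uses no Intel® HLS Compiler headers or memory attributes. The names `kTableSize`, `SquareTable`, `make_square_table`, and `lookup_square` are hypothetical, and the table contents (squares) are chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical table size, chosen only for illustration.
constexpr std::size_t kTableSize = 16;

// Simple aggregate so that the table can be built in a C++14 constexpr function.
struct SquareTable {
  int values[kTableSize];
};

// Computes the whole table at compile time.
constexpr SquareTable make_square_table() {
  SquareTable table{};
  for (std::size_t i = 0; i < kTableSize; ++i) {
    table.values[i] = static_cast<int>(i * i);
  }
  return table;
}

// Read-only table; the initializer is a constant expression.
constexpr SquareTable kSquares = make_square_table();

// Proof that the contents are known at compile time.
static_assert(kSquares.values[3] == 9, "table is evaluated at compile time");

// Run-time read access to the precomputed table.
int lookup_square(std::size_t index) {
  assert(index < kTableSize);
  return kSquares.values[index];
}
```

Because the initializer is a constant expression, a compiler has the option of treating the table as initialized read-only data instead of generating logic that fills it when the function first runs. How the Intel® HLS Compiler maps such a table to hardware is described in the tutorial itself.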

Table 5. Interface design tutorials
Name	Description
You can find these tutorials in the following location on your Intel® Quartus® Prime system: `<quartus_installdir>/hls/examples/tutorials/interfaces`
overview	Demonstrates the effects on quality-of-results (QoR) of choosing different component interfaces even when the component algorithm remains the same.
explicit_streams_buffer	Demonstrates how to use explicit stream_in and stream_out interfaces in the component and testbench.
explicit_streams_packets_empty	Demonstrates how to use the usesPackets, usesEmpty, and firstSymbolInHighOrderBits stream template parameters.
explicit_streams_packets_ready_valid	Demonstrates how to use the usesPackets, usesValid, and usesReady stream template parameters.
mm_host_testbench_operators	Demonstrates how to invoke a component at different indices of an Avalon Memory Mapped (MM) Host (`mm_host` class) interface.
mm_agents	Demonstrates how to create Avalon-MM Agent interfaces (agent registers and agent memories).
mm_agents_double_buffering	Demonstrates the effect of using the `hls_readwrite_mode` macro to control how memory hosts access the agent memories.
mm_agents_csr_volatile	Demonstrates the effect of using the `volatile` keyword to allow concurrent agent memory accesses while your component is running.
multiple_stream_call_sites	Demonstrates the benefits of using multiple stream call sites.
pointer_mm_host	Demonstrates how to create Avalon-MM Host interfaces and control their parameters.
stable_arguments	Demonstrates how to use the `stable` attribute for unchanging arguments to improve resource utilization.

Table 6. Best practices design tutorials
Name	Description
You can find these tutorials in the following location on your Intel® Quartus® Prime system: `<quartus_installdir>/hls/examples/tutorials/best_practices`
ac_datatypes	Demonstrates the effect of using the `ac_int` data type instead of the `int` data type.
control_of_dsp_usage	Demonstrates the effects of controlling whether some supported data types and math functions are implemented by DSPs or soft logic with the --dsp-mode option of the i++ command and the ihc::math_dsp_control function.
const_global	Demonstrates the performance and resource utilization improvements of using `const` qualified global variables.
divergent_loops	Demonstrates a source-level optimization for designs with divergent loops.
floating_point_contract	Demonstrates how to use the `fp_contract` option to improve the performance of your design for double-precision floating-point operations.
floating_point_ops	Demonstrates the impact of the `--fpc` and `--fp-relaxed` flags of the i++ command on floating-point operations using a 32-tap finite impulse response (FIR) filter design that is optimized for throughput.
fpga_reg	Demonstrates how to use the `fpga_reg` macro to precisely tune pipelining in your design.
hyper_optimized_handshaking	Demonstrates how to use the `--hyper-optimized-handshaking` option of the Intel HLS Compiler i++ command.
loop_coalesce	Demonstrates the performance and resource utilization improvements of using the `loop_coalesce` pragma on nested loops. While `#pragma loop_coalesce` is provided with both the Standard and Pro editions, the design tutorial is provided only with the Pro edition.
loop_fusion	Demonstrates the latency and resource utilization improvements of loop fusion.
loop_memory_dependency	Demonstrates breaking loop carried dependencies using the `ivdep` pragma.
lsu_control	Demonstrates the effects of controlling the types of LSUs instantiated for variable-latency Avalon® MM Host interfaces.
parallelize_array_operation	Demonstrates how to improve f_MAX by correcting a bottleneck that arises when performing operations on an array in a loop.
optimize_ii_using_hls_register	Demonstrates how to use the `hls_register` attribute to reduce loop II and how to use `hls_max_concurrency` to improve component throughput.
parameter_aliasing	Demonstrates the use of the __restrict keyword on component arguments.
random_number_generator	Demonstrates how to use the random number generator library.
reduce_exit_fifo_width	Demonstrates how to improve f_MAX by reducing the width of the FIFO belonging to the exit node of a stall-free cluster.
`relax_reduction_dependency`	Demonstrates a method to reduce the II of a loop that includes a floating point accumulator, or other reduction operation that cannot be computed at high speed in a single clock cycle.
remove_loop_carried_dependency	Demonstrates how you can improve loop performance by removing accesses to the same variable across nested loops.
resource_sharing_filter	Demonstrates an optimized-for-area variant of a 32-tap finite impulse response (FIR) filter design.
`set_component_target_fmax_1`	Demonstrates how to set the target f_MAX in various ways by leveraging the Loop Analysis report in the High-Level Design Reports.
`set_component_target_fmax_2`	Demonstrates how the compiler handles the tradeoff between f_MAX and II based on the presence or absence of the `hls_scheduler_target_fmax_mhz` component attribute and the `ii` loop pragma.
shift_register	Demonstrates the recommended coding style for implementing shift registers.
sincos_func	Demonstrates the effects of using `sinpi` or `cospi` functions in your component instead of `sin` or `cos` functions.
single_vs_double_precision_math	Demonstrates the effect of using single precision literals and functions instead of double precision literals and functions.
stall_enable	Demonstrates how to replace stall-free clusters with stall-enabled clusters to improve latency in some small designs.
struct_interface	Demonstrates how to use `ac_int` to implement interfaces with no padding bits.
submnormal_and_rounding	Demonstrates the effects of using the `--daz` and `--rounding` i++ command options.
swap_vs_copy	Demonstrates the impact of using deep copying with registers on the performance and resource utilization of a component design.
`triangular_loop`	Demonstrates a method for describing triangular loop patterns with dependencies.
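
The kind of rewrite that the `relax_reduction_dependency` description refers to can be sketched in standard C++. A single floating-point accumulator makes every loop iteration wait for the previous addition. One common way to relax that dependency is to rotate through several independent partial sums and combine them after the loop. The following sketch is an illustration under that assumption and is not the tutorial's code. The names `kPartialSums` and `accumulate_relaxed` are hypothetical, and the number of partial sums (8) is arbitrary rather than derived from any real adder latency.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical number of independent partial sums. In a real design this
// would be chosen to cover the latency of the floating-point adder.
constexpr std::size_t kPartialSums = 8;

// Sums n floats. Consecutive iterations update different partial sums, so
// each accumulator is reused only once every kPartialSums iterations.
float accumulate_relaxed(const float* data, std::size_t n) {
  assert(data != nullptr || n == 0);

  float partial[kPartialSums] = {};  // all partial sums start at zero

  for (std::size_t i = 0; i < n; ++i) {
    partial[i % kPartialSums] += data[i];
  }

  // Final reduction over the small, fixed-size array of partial sums.
  float total = 0.0f;
  for (std::size_t k = 0; k < kPartialSums; ++k) {
    total += partial[k];
  }
  return total;
}
```

Note that this changes the order in which the floating-point values are added, so the result can differ in the last bits from a strictly sequential sum. Whether that is acceptable depends on the application.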

Table 7. Usability design tutorials
Name	Description
You can find these tutorials in the following location on your Intel® Quartus® Prime system: `<quartus_installdir>/hls/examples/tutorials/usability`
compiler_interoperability	Demonstrates how to build your design using testbench code compiled with the Intel® HLS Compiler, GCC, or Microsoft* Visual Studio* and component code compiled separately with the Intel® HLS Compiler.
enqueue_call	Demonstrates how to run components asynchronously and exercise their pipeline performance in the test bench using enqueue functionality.
platform_designer_2xclock	Demonstrates the recommended clock and reset generation for a component with a `clock2x` input.
platform_designer_stitching	Demonstrates how to combine multiple components to function as a single cohesive design.

Table 8. System of tasks design tutorials
Name	Description
You can find these tutorials in the following location on your Intel® Quartus® Prime system: `<quartus_installdir>/hls/examples/tutorials/system_of_tasks`
balancing_loop_delay	Demonstrates how to improve the throughput of a component that uses a system of tasks by buffering streams.
balancing_pipeline_latency	Demonstrates how to improve the throughput of a component that uses a system of tasks by buffering streams.
interfaces_sot	Demonstrates how to transfer information between, into, and out of tasks using Avalon® streaming and Avalon® memory-mapped host interfaces.
internal_stream	Demonstrates how to use "internal streams" in HLS tasks with the `ihc::stream` object.
launch_and_collect_capacity	Demonstrates how to use the capacity template parameter of the ihc::launch and ihc::collect functions to improve throughput in components that have systems of tasks.
parallel_loop	Demonstrates how you can run sequential loops in a pipelined manner by using a system of HLS tasks in your component.
resource_sharing	Demonstrates how you can share expensive compute blocks in your component to save area usage.
task_reuse	Demonstrates how to invoke multiple copies of the same task function.

Table 9. HLS Libraries design tutorials
Name	Description
You can find these tutorials in the following location on your Intel® Quartus® Prime system: `<quartus_installdir>/hls/examples/tutorials/libraries`
basic_rtl_library_flow	Demonstrates the process of developing an RTL library and using it in an HLS component.
rtl_struct_mapping	Demonstrates how to obtain a mapping from C++ struct fields to bit-slices of RTL module interface signals.

Table 10. HLS Loop Control tutorials
Name	Description
You can find these tutorials in the following location on your Intel® Quartus® Prime system: `<quartus_installdir>/hls/examples/tutorials/loop_controls`
max_interleaving	Demonstrates a method to reduce the area utilization of a loop that meets the following conditions: the loop has an II > 1; the loop is contained in a pipelined loop; and the loop execution is serialized across the invocations of the pipelined loop.
small_speculated_iterations	Demonstrates how decreasing the number of speculated iterations improves latency when a loop body has low latency and is expected to be frequently invoked.
speculated_iterations	Demonstrates how increasing the number of speculated iterations improves II when the exit condition calculation is the bottleneck preventing a lower II.
