3.3.3.1. Data Parallelism

Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152

Date 6/20/2022

A newer version of this document is available.

3.3.3.1.1. Executing Independent Operations Simultaneously

3.3.3.1.2. Pipelining

3.3.3.1.2.1. Pipelining Loops Within A Component

3.3.3.1.2.2. Pipelining Across Component Invocations