Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 10/04/2021
Public



10.3. Simplifying Memory Access to Local Memories

On Intel® Stratix® 10 FPGA hardware, each M20K memory has two ports. For other Intel® FPGA device families, the Intel® FPGA SDK for OpenCL™ Offline Compiler can run the Memory System Clock at 2x the main clock frequency, which effectively provides four ports per M20K memory. For more information, refer to Double Pumping. For Intel® Stratix® 10 FPGAs, however, the offline compiler currently tends not to infer a 2x Memory System Clock, because transferring data at high speed between the 2x clock domain and the main clock domain generally leads to significant fMAX degradation. You can still force double pumping for a given Memory System by applying the memory attribute __attribute__((doublepump)).
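As a rough sketch of the attribute in use, the following single work-item kernel forces double pumping on one local array that has one store site and two load sites. The kernel name `sum_mirrored`, the depth `N`, and the `DOUBLEPUMP` macro are hypothetical names chosen for illustration. The `#ifndef __OPENCL_VERSION__` shim exists only so that the same source also compiles as plain C for a functional check; it assumes the offline compiler predefines `__OPENCL_VERSION__`, as the OpenCL C specification requires.

```c
/* Host-side shim (assumption for illustration): an OpenCL compiler defines
   __OPENCL_VERSION__ and keeps the real qualifiers and attribute. */
#ifdef __OPENCL_VERSION__
#define DOUBLEPUMP __attribute__((doublepump))
#else
#define DOUBLEPUMP   /* no meaning outside the offline compiler */
#define __kernel
#define __global
#define __local
#endif

#define N 64  /* hypothetical array depth */

/* Hypothetical single work-item kernel: out[i] = in[i] + in[N-1-i]. */
__kernel void sum_mirrored(__global const int *in, __global int *out)
{
    /* Request a 2x Memory System Clock for this array only. */
    __local int DOUBLEPUMP lmem[N];

    for (int i = 0; i < N; i++) {
        lmem[i] = in[i];                    /* one store site */
    }
    for (int i = 0; i < N; i++) {
        out[i] = lmem[i] + lmem[N - 1 - i]; /* two load sites */
    }
}
```

Whether the forced 2x clock pays off depends on the fMAX the design achieves, so it is worth comparing the High Level Design Report with and without the attribute.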

Due to the potential fMAX implications of double pumping on Intel® Stratix® 10, Intel® recommends limiting the number of concurrent stores.

Multiple concurrent stores often require more than two ports to implement. They also cause the compiler to create stallable memories, in which memory accesses must be arbitrated on every clock cycle. To determine whether a memory is arbitrated, examine its load and store units in the system or memory viewer of the High Level Design Report. In the viewer, load and store units that are highlighted in red belong to stallable memories. The report presents this information when you hover over each highlighted unit.
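One way to limit concurrent stores, sketched here under stated assumptions, is to fold two store sites into a single store by selecting the data before writing. The kernel name `fill_one_store`, the depth `N`, and the access pattern are hypothetical; the shim at the top is only there so that the source also compiles as plain C for a functional check. The rewrite trades loop iterations for ports: the loop runs twice as many times, but each iteration issues only one store to the local array.

```c
/* Host-side shim (assumption for illustration): an OpenCL compiler defines
   __OPENCL_VERSION__ and keeps the real address-space qualifiers. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#define __local
#endif

#define N 64  /* hypothetical array depth */

/* Hypothetical single work-item kernel: fills lmem with a[] followed by b[],
   then writes the contents out in reverse order. */
__kernel void fill_one_store(__global const int *a,
                             __global const int *b,
                             __global int *out)
{
    __local int lmem[N];

    /* Two-store form being avoided (two store sites per iteration):
         for (int i = 0; i < N / 2; i++) {
             lmem[i]         = a[i];
             lmem[i + N / 2] = b[i];
         }
    */
    for (int i = 0; i < N; i++) {
        /* Pick the data first, then issue the only store to lmem. */
        int val = (i < N / 2) ? a[i] : b[i - N / 2];
        lmem[i] = val;
    }

    for (int i = 0; i < N; i++) {
        out[i] = lmem[N - 1 - i];  /* the only load from lmem */
    }
}
```

Whether this removes arbitration in a particular design is something to confirm in the report by checking that the load and store units of the memory are no longer highlighted.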