Intel® FPGA SDK for OpenCL™ Standard Edition: Custom Platform Toolkit User Guide

ID 683398
Date 5/04/2018
Public

1.4.1.1.1. General Quality of Results Considerations for the Exported Board Partition

When you generate a post-place-and-route partition, several design considerations for the exported board partition might have unexpected consequences on the Intel® FPGA SDK for OpenCL™ Standard Edition compilation results. The best approach to optimizing the board partition is to experiment with a range of different OpenCL kernels.

The list below captures some of the parameters that might impact the quality of SDK compilation results:

  • Resources Used

    Minimize the number of resources the partition uses to maximize the resources available for the OpenCL kernels.

  • Kernel Clock Frequency

    The amount of logic in the partition that the kernel clock drives should be relatively small, and this logic should not limit the kernel clock speed of even the simplest OpenCL kernels. For this reason, Intel® recommends that the kernel clock have a high clock constraint, at least within the partition (for example, greater than 350 MHz for a Stratix® V device).

  • Host-to-Memory Bandwidth

    The host-to-memory bandwidth is the transfer speed between the host processor and the physical memories on the accelerator card. To measure this memory bandwidth, compile and run the host application included with the Custom Platform Toolkit.

  • Kernel-to-Memory Bandwidth

    The kernel-to-memory bandwidth is the maximum transfer speed possible between the OpenCL kernels and global memory.

    To measure this memory bandwidth, compile and run the host program included in the /tests/boardtest/host directory of the Custom Platform Toolkit.

  • Fitter Quality of Results (QoR)

    To ensure that OpenCL designs consuming much of the device's resources can still achieve high clock frequencies, region-constrain the partition to the edges of the FPGA. The constraint allows OpenCL kernel logic to occupy the center of the device, which has the most connectivity with all other nodes.

    Test-compile large designs to ensure that other Fitter-induced artifacts in the partition do not interfere with the QoR of the kernel compilations.

  • Routability

    The routing resources that the partition consumes can affect the routability of a compiled OpenCL design. A kernel might use every digital signal processing (DSP) block or memory block on the FPGA; however, routing resources that the partition uses might render one of these blocks unroutable. This routing issue causes compilation of the Intel® Quartus® Prime project to fail at the fitting step. Therefore, it is imperative that you test a partition with designs that use all DSP and memory blocks.
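
As a rough illustration of what the two bandwidth figures in the list above represent, the following Python sketch times a transfer and converts the result to MB/s. It is not part of the Custom Platform Toolkit and does not reproduce its host application. The function names (`bandwidth_mb_per_s`, `measure_transfer`, `fake_transfer`) are hypothetical, the transfer is a host-only buffer copy standing in for a real host-to-memory write, and 1 MB is assumed to mean 10^6 bytes.

```python
import time


def bandwidth_mb_per_s(num_bytes, elapsed_seconds):
    """Return throughput in MB/s (assumes 1 MB = 10**6 bytes)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_bytes / elapsed_seconds / 1e6


def measure_transfer(transfer, num_bytes, repeats=5):
    """Time transfer(num_bytes) several times and return the best MB/s seen."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        transfer(num_bytes)
        elapsed = time.perf_counter() - start
        best = max(best, bandwidth_mb_per_s(num_bytes, elapsed))
    return best


def fake_transfer(num_bytes):
    # Hypothetical stand-in for a real host-to-device write:
    # it only copies one host buffer into another.
    src = bytearray(num_bytes)
    return bytearray(src)


if __name__ == "__main__":
    size = 16 * 1024 * 1024  # 16 MiB test buffer (arbitrary choice)
    rate = measure_transfer(fake_transfer, size)
    print(f"{rate:.1f} MB/s (host-only stand-in, not a board measurement)")
```

Taking the best of several runs reduces the effect of one-off delays on the reported figure. A real measurement would replace `fake_transfer` with the actual transfer between the host and the accelerator card.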