DSP Builder for Intel® FPGAs (Advanced Blockset): Handbook

ID 683337
Date 9/30/2024
Public

8.7. Flow Control in DSP Builder Designs

Use the DSP Builder valid and channel signals alongside data to indicate when the data is valid and to keep it synchronized. Use these signals to process valid data and ignore invalid data cycles in a streaming style that uses the FPGA efficiently. You can build designs that run as fast as the data allows, are not sensitive to latency or to the device fMAX, and can respond to backpressure.

This style uses FIFO buffers to capture valid outputs and control their flow, and Loop and ForLoop blocks to build simple and complex nested counter structures. Also add latches to enable only components with state, which minimizes enable line fan-out that can otherwise become a performance bottleneck.

Flow Control Using Latches

Hardware designers generally avoid latches. However, the DSP Builder latch subsystems synthesize to flip-flops.

Designs often need to stall or enable signals. Routing an enable signal to every block in the design can create high fan-out nets, which become the critical timing path in the design. To avoid this situation, enable only blocks with state, and mark output data as invalid when necessary.
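The idea of enabling only stateful blocks can be illustrated with a small cycle-by-cycle behavioral model. This is a minimal sketch in Python, not DSP Builder code: the `Accumulator` class, the `scale` function, and the `run` helper are hypothetical names chosen for illustration. The stateless block runs on every cycle with no enable, the stateful block updates only when valid is high, and the valid flag travels with the data so that downstream logic can ignore invalid cycles.

```python
from dataclasses import dataclass


@dataclass
class Accumulator:
    """Hypothetical stateful block: the only block that receives an enable."""
    total: int = 0

    def step(self, data: int, enable: bool) -> int:
        if enable:  # state changes only on valid cycles
            self.total += data
        return self.total


def scale(data: int) -> int:
    """Hypothetical stateless block: runs every cycle, so it needs no enable."""
    return 2 * data


def run(stream):
    """stream: iterable of (data, valid) pairs, one pair per clock cycle."""
    acc = Accumulator()
    out = []
    for data, valid in stream:
        scaled = scale(data)             # computed even on invalid cycles
        total = acc.step(scaled, valid)  # enable is driven by valid
        out.append((total, valid))       # valid accompanies the output data
    return out


samples = [(1, True), (99, False), (2, True), (77, False), (3, True)]
valid_totals = [total for total, valid in run(samples) if valid]
```

In this sketch the invalid samples (99 and 77) pass through the stateless block but never reach the accumulator state, so only one block in the pipeline needs the enable net.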

DSP Builder provides the following utility functions, which are masked subsystems, in the Additional Blocks Control library:

  • Zero-Latency Latch (latch_0L)
  • Single-Cycle Latency Latch (latch_1L)
  • Reset-Priority Latch (SRlatch_PS)
  • Set-Priority Latch (SRlatch)
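The difference between the latch variants can be sketched with a behavioral model. The following Python is an illustration under stated assumptions, not the implementation of the library blocks: it assumes that a zero-latency latch shows newly enabled data in the same cycle, that a single-cycle latency latch shows it one cycle later, and that priority decides the output only when set and reset are both high. The class names `Latch` and `SRLatch` are hypothetical.

```python
class Latch:
    """Assumed behavior of an enable-controlled hold element.

    latency=0: the output reflects the input in the cycle it is enabled.
    latency=1: the output reflects the input one cycle after it is enabled.
    """

    def __init__(self, latency: int, initial: int = 0):
        if latency not in (0, 1):
            raise ValueError("latency must be 0 or 1")
        self.latency = latency
        self.stored = initial

    def step(self, data: int, enable: bool) -> int:
        previous = self.stored
        if enable:
            self.stored = data
        return self.stored if self.latency == 0 else previous


class SRLatch:
    """Assumed behavior of a set/reset latch with configurable priority."""

    def __init__(self, set_priority: bool):
        self.set_priority = set_priority
        self.q = False

    def step(self, s: bool, r: bool) -> bool:
        if s and r:
            self.q = self.set_priority  # priority resolves the conflict
        elif s:
            self.q = True
        elif r:
            self.q = False
        return self.q                   # holds state when s and r are low


inputs = [(5, True), (6, False), (7, True), (8, False)]
zero, one = Latch(0), Latch(1)
zero_out = [zero.step(d, e) for d, e in inputs]
one_out = [one.step(d, e) for d, e in inputs]
```

With these assumptions, both latches ignore the values presented while enable is low (6 and 8), and the single-cycle variant lags the zero-latency variant by one cycle.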

Some of these blocks use the Simulink Data Type Prop Duplicate block, which takes the data type of a reference signal (ref) and back-propagates it to another signal (prop). Use this feature to match data types without forcing an explicit type, so that you can use the blocks in other areas of your design.

Forward Flow Control Using Latches

The demo_forward_pressure example design shows how to use latches to implement forward flow control.

Flow Control Using FIFO Buffers

You can use FIFO buffers to build flexible, self-timed designs that are insensitive to latency. They are an essential component in building parameterizable designs with feedback, such as those that implement backpressure.

Flow Control and Backpressure Using FIFO Buffers

The demo_back_pressure design example shows how to use FIFO buffers to implement backpressure flow control.

Acknowledge reads only of valid output data. Consider a FIFO buffer with the following parameters:

  • Depth = 8
  • Fill threshold = 2
  • Fill period = 7

A three-cycle latency exists between the first write and valid going high. The q output has a similar latency in response to writes. The latency in response to read acknowledgements is only one cycle for all output ports. The valid output goes low in response to the first read, even though the design writes two items to the FIFO buffer, because the second write is not older than three cycles when the read occurs.

With the fill threshold set to a low value, the t output can go high even though the v output is still zero. Also, the q output stays at the last value read when valid goes low in response to a read.

Problems can occur when you use no feedback on the read line, or when you take the feedback from the t output with the fill threshold set to a very low value (< 3). A read acknowledgement may then arrive shortly after a write but before the valid output goes high. In this situation, the internal state of the FIFO buffer does not recover for many cycles. Instead of attempting to reproduce this behavior, Simulink issues a warning when a read acknowledgement is received while the valid output is zero. This intermediate state, between the first write to an empty FIFO buffer and valid going high, highlights that the input-to-output latency across the FIFO buffer is different in this case. It is the only time when the FIFO buffer behaves with a latency greater than one cycle. With other primitive blocks, which have a consistent, constant latency across each input-to-output path, you never have to consider these intermediate states.

You can mitigate this issue by taking care when using the FIFO buffer. The model must use simple feedback to ensure that the read is never high when valid is low. If you derive the read input from the t output, ensure that you use a sufficiently high threshold.

Because latency differs across pairs of ports (three cycles from w to v, one cycle from r to t, and one cycle from w to t), setting the fill threshold to a low number (< 3) can produce a state where output t is high and output v is low. If this situation arises, do not send a read acknowledgement signal to the FIFO buffer: ensure that when the v output is low, the r input is also low. A warning appears in the MATLAB command window if you ever violate this rule. If you derive the read acknowledgement signal from feedback from the t output, ensure that the fill threshold is set to a sufficiently high number (3 or above). The same applies to the f output and the full period.
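The port latencies described above can be explored with a simplified cycle-based model. The following Python sketch is an assumption-laden illustration, not the DSP Builder FIFO implementation: `FifoModel`, `simulate`, and the two read policies are hypothetical names. The model encodes only the stated latencies (three cycles from w to v, one cycle from w to t, one cycle from r to v and t), and it counts a warning when a read acknowledgement arrives while v is low instead of reproducing the hardware behavior. It does not model the q output timing or the f output.

```python
from collections import deque

WRITE_TO_VALID = 3  # cycles from w to v, as stated in the text


class FifoModel:
    """Simplified, hypothetical model of the FIFO port latencies."""

    def __init__(self, depth: int, fill_threshold: int):
        self.depth = depth
        self.fill_threshold = fill_threshold
        self.items = deque()  # entries are (value, cycle_written)
        self.cycle = 0
        self.warnings = 0

    def outputs(self):
        """Return (v, t) as seen at the start of the current cycle."""
        v = bool(self.items) and (
            self.cycle - self.items[0][1] >= WRITE_TO_VALID)
        t = len(self.items) >= self.fill_threshold
        return v, t

    def clock(self, w: bool, d, r: bool):
        """Apply one cycle of inputs; return the value consumed, if any."""
        v, _ = self.outputs()
        consumed = None
        if r:
            if v:
                consumed = self.items.popleft()[0]
            else:
                self.warnings += 1  # read acknowledged while v is low
        if w and len(self.items) < self.depth:
            self.items.append((d, self.cycle))
        self.cycle += 1
        return consumed


def simulate(read_policy, writes, cycles=8):
    """writes maps cycle number to data; read_policy(v, t) returns r."""
    fifo = FifoModel(depth=8, fill_threshold=2)
    trace, received = [], []
    for cycle in range(cycles):
        v, t = fifo.outputs()
        r = read_policy(v, t)
        value = fifo.clock(w=cycle in writes, d=writes.get(cycle), r=r)
        if value is not None:
            received.append(value)
        trace.append((cycle, v, t, r))
    return trace, received, fifo.warnings


writes = {0: 10, 1: 11}
# Read derived from t with a low threshold: violates the rule.
t_trace, t_received, t_warnings = simulate(lambda v, t: t, writes)
# Read derived from v: r is never high while v is low.
v_trace, v_received, v_warnings = simulate(lambda v, t: v, writes)
```

In this sketch, with a fill threshold of 2, the t-driven policy asserts r in cycle 2 while v is still low, which is the rule violation the text warns about. The v-driven policy waits until cycle 3 and never triggers the warning.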

If you supply vector data to the d input, you see vector data on the q output. DSP Builder does not support vector signals on the w or r inputs, as the behavior is unspecified. The v, t, and f outputs are always scalar.

Flow Control Using Simple Loop

Designs may require counters, or nested counters to implement indexing of multidimensional data. The Loop block provides a simple nested counter—equivalent to a simple software loop.
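The software analogy can be made concrete with a small model of a nested counter that advances only while it is enabled. This Python sketch is illustrative only: the `nested_counter` generator is a hypothetical name, and it assumes that each counter starts at zero and counts up to one below its limit. The actual ports and timing of the Loop block are not modeled.

```python
from itertools import product


def nested_counter(limits, enables):
    """Yield (valid, counts) once per cycle.

    limits:  loop limits, outermost first.
    enables: one boolean per clock cycle; counting stalls while it is low.
    """
    counts = [0] * len(limits)
    done = False
    for enable in enables:
        if not enable or done:
            yield (False, tuple(counts))  # stalled or finished: not valid
            continue
        yield (True, tuple(counts))
        # Increment the innermost counter and carry outward on wrap.
        for i in reversed(range(len(limits))):
            counts[i] += 1
            if counts[i] < limits[i]:
                break
            counts[i] = 0
        else:
            done = True  # every counter wrapped: the loop nest is complete


enables = [True, True, False, True, True, True, True, True]
cycles = list(nested_counter([2, 3], enables))
valid_counts = [counts for valid, counts in cycles if valid]

# Equivalent software loop: for i in range(2): for j in range(3): ...
software_loop = list(product(range(2), range(3)))
```

The stalled cycle in the middle of `enables` delays the sequence without changing it, so the valid counts match the index pairs that the equivalent software loop visits.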

The demo_kronecker design example demonstrates flow control using a loop and its enable input.

Flow Control Using the ForLoop Block

You can use either Loop or ForLoop blocks for building nested loops.

The Loop block has the following advantages:

  • A single Loop block can implement an entire stack of nested loops.
  • No wasted cycles when the loop is active but the count is not valid.
  • The implementation cost is lower because no overhead for the token-passing scheme exists.

The ForLoop block has the following advantages:

  • Loops may count either up or down.
  • You may specify the initial value and the step, not just the limit value.
  • The token-passing scheme allows the construction of control structures that are more sophisticated than just nesting rectangular loops.

When a stack of nested loops is the appropriate control structure (for example, matrix multiplication), use a single Loop block. When a more complex control structure is required, use multiple ForLoop blocks.
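The extra flexibility of a loop with an initial value, a step, and a limit can be sketched in software. The following Python is a behavioral analogy only: `for_loop` and `triangular` are hypothetical names, the limit is assumed to be inclusive, and the token-passing scheme of the ForLoop block is not modeled. The example shows a count-down loop and a non-rectangular (triangular) nest in which the inner loop's initial value depends on the outer count.

```python
def for_loop(initial: int, step: int, limit: int):
    """Yield counts from initial toward limit (assumed inclusive) by step."""
    if step == 0:
        raise ValueError("step must be nonzero")
    count = initial
    while (count <= limit) if step > 0 else (count >= limit):
        yield count
        count += step


def triangular(n: int):
    """Visit the upper triangle of an n-by-n index space.

    The inner loop starts at the outer count, which a single rectangular
    nested counter with fixed limits cannot express directly.
    """
    return [(i, j)
            for i in for_loop(0, 1, n - 1)
            for j in for_loop(i, 1, n - 1)]


count_down = list(for_loop(6, -2, 0))
upper = triangular(3)
```

A rectangular nest, such as the index pattern for matrix multiplication, needs none of this flexibility, which is consistent with the guidance to prefer a single Loop block in that case.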