6.7.4. Floating-Point Mandelbrot Set
A complex number C is in the Mandelbrot set if for the following equation the value remains finite when repeatedly iterated:
$z_{n+1} = z_n^2 + C$
where n is the iteration number and C is the complex number under test (a point in the complex plane).
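The membership test above can be sketched in software as an escape-time loop. This is only an illustration of the arithmetic, not the hardware design: the starting value z = 0, the iteration cap `max_iter`, and the bailout radius of 2 are conventional assumptions that the text does not state, and the function name is hypothetical.

```python
def escape_iterations(c: complex, max_iter: int = 50, bailout: float = 2.0) -> int:
    """Return the iteration at which |z| first exceeds `bailout`.

    Returns `max_iter` if z stays bounded for every iteration tried,
    which is taken here as "probably in the set" (an approximation).
    """
    z = 0j  # assumed starting value
    for n in range(max_iter):
        if abs(z) > bailout:
            return n
        z = z * z + c  # the square-and-add step
    return max_iter
```

For example, `escape_iterations(0j)` runs to the cap because z never leaves the origin, while a point such as `2 + 2j` escapes after a single step.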
Floating-point calculations take longer than the corresponding fixed-point calculations. To achieve maximum efficiency, the design cannot wait for partial results to be ready. Instead, you must ensure your algorithm fully uses the floating-point calculation engines. The design contains two floating-point math subsystems: one scales and offsets pixel indexes to give a point in the complex plane; the other performs the main square-and-add iteration operation.
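The scale-and-offset step of the first subsystem can be illustrated with a short sketch. The function name, the image dimensions, and the default bounds of the viewed region are all assumptions chosen for the example; the design example's actual parameters are not given in this section.

```python
def pixel_to_complex(col: int, row: int, width: int, height: int,
                     re_min: float = -2.0, re_max: float = 1.0,
                     im_min: float = -1.5, im_max: float = 1.5) -> complex:
    """Map a pixel index pair to a point in the complex plane.

    Each axis is a multiply (scale) followed by an add (offset).
    Requires width > 1 and height > 1.
    """
    re = re_min + col * (re_max - re_min) / (width - 1)
    im = im_min + row * (im_max - im_min) / (height - 1)
    return complex(re, im)
```

With these assumed bounds, pixel (0, 0) maps to the corner -2 - 1.5j and the last pixel of the last row maps to 1 + 1.5j.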
For this design example, the total latency is approximately 19 clock cycles, depending on the target device and clock speed. This latency is not excessive, but it is long enough that waiting for partial results is inefficient.
FIFO buffers control the circulation of data through the iterative process. If a partial result is available for a further iteration in the $z_{n+1} = z_n^2 + C$ progression, the FIFO buffers ensure that the design works on that point.
Otherwise, the design starts a new point (a new value of C). Thus, the design maintains a full flow of data through the floating-point arithmetic. This main iteration loop can exert back pressure on the new point calculation engine: if the design does not read new points off the command queue FIFO buffers quickly enough, so that they fill up, the generation of new points stalls.
The design does not explicitly signal the calculation of each point at the moment it is required, which would mean waiting through the latency cycles before the point can be used. Nor does the design attempt to calculate this latency exactly in clock cycles and issue generate-point commands that exact number of clock cycles before the points are needed; you would have to change such values each time you retarget the device or change the target clock rate. Instead, the design calculates the points quickly from the start and catches them in a FIFO buffer. If the FIFO buffer starts to get full (a sufficient number of cycles ahead of full), the design stops the calculation upstream without loss of data. This self-regulating flow mitigates latency while remaining flexible.
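The flow described above can be approximated with a small cycle-level model. This is a toy sketch under several assumptions, not the DSP Builder design: the pipeline is modeled as a fixed-length queue, the point generator stops immediately when the new-point FIFO reaches an assumed `almost_full` threshold (real hardware needs the threshold precisely because the stop takes several cycles to take effect), and all names and default values are hypothetical.

```python
from collections import deque

def simulate(points, latency=19, max_iter=20, almost_full=6):
    """Toy model: one square-and-add pipeline shared by many points."""
    pending = deque(enumerate(points))  # points not yet generated
    new_fifo = deque()                  # generated points waiting to start
    feedback = deque()                  # partial results awaiting another pass
    pipe = deque([None] * latency)      # pipelined square-and-add engine
    results = {}                        # point index -> iterations used
    busy = cycles = 0
    while len(results) < len(points):
        cycles += 1
        done = pipe.popleft()           # value leaving the pipeline this cycle
        if done is not None:
            idx, c, z, n = done
            z, n = z * z + c, n + 1
            if abs(z) > 2.0 or n >= max_iter:
                results[idx] = n        # point finished
            else:
                feedback.append((idx, c, z, n))
        if feedback:                    # partial results take priority
            pipe.append(feedback.popleft())
            busy += 1
        elif new_fifo:                  # otherwise start a new point
            idx, c = new_fifo.popleft()
            pipe.append((idx, c, 0j, 0))
            busy += 1
        else:
            pipe.append(None)           # bubble: nothing to work on
        # Self-regulating generator: runs ahead until the FIFO nears full.
        if pending and len(new_fifo) < almost_full:
            new_fifo.append(pending.popleft())
    return results, (busy / cycles if cycles else 0.0)
```

The second return value is the fraction of cycles in which the pipeline accepted real work. Changing `latency` changes that utilization and the order in which points finish, but not the iteration count recorded for each point, which is the property that makes the scheme independent of the exact latency.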
Avoid inefficiencies by designing the algorithm implementation around the latency and availability of partial results; otherwise, data dependencies can stall processing.
The design example uses the FinishedThisPoint signal as the valid signal. Although the system constantly produces data on the output, it marks the data as valid only when the design finishes a point. Downstream components can then process only the valid data, just as the enabled subsystem in the testbench captures and plots the valid points.
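As a minimal sketch of how a downstream consumer might treat such a stream, the following keeps only the samples whose valid flag is set. The representation of the stream as (valid, data) pairs and the function name are assumptions made for the example.

```python
def capture_valid(samples):
    """Keep the data of samples whose valid flag is set.

    `samples` is an iterable of (valid, data) pairs, one per clock cycle,
    loosely mimicking an enabled subsystem driven by the valid signal.
    """
    return [data for valid, data in samples if valid]
```

For instance, `capture_valid([(False, 7), (True, 3), (False, 3), (True, 20)])` keeps only the values 3 and 20.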
In both feedback loops, you must provide sufficient delay for the scheduler to redistribute as pipelining. In feed-forward paths you can add pipelining without changing the algorithm; DSP Builder changes only the timing of the algorithm. But in feedback loops, inserting a delay can alter the meaning of an algorithm. For example, adding N cycles of delay to an accumulator loop makes the loop accumulate N different numbers, each updating once every N clock cycles. The design must provide enough slack in each loop for the scheduler, which redistributes delays and pipelines operators, to close timing by redistributing this slack. The scheduler must not change the total latency around the loop, and it must ensure that the function of the algorithm is unaltered.
Such slack delays are in the top-level design of the synthesizable design, in the feedback loop controlling the generation of new points, and in the FeedBackFIFO subsystem controlling the main iteration calculation. The example uses the minimum delay feature on the SampleDelay blocks to set these slack delays to the minimum latency that satisfies the schedule, which DSP Builder solves as part of the integer linear programming problem that finds an optimum pipelining and scheduling solution. You can group delays into numbered equivalence groups to match other delays. In this design example, the single delay around the coordinate generation loop is in one equivalence group, and all the slack delays around the main calculation loop are in another equivalence group. The equivalence group field can contain any MATLAB expression that evaluates to a string. The SampleDelay block displays the delay that DSP Builder uses.
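The accumulator example mentioned above, where extra delay in a feedback loop changes what the loop computes, can be demonstrated with a short behavioral sketch. The function name and the list-based delay line are illustrative assumptions and do not correspond to any DSP Builder block.

```python
def delayed_accumulator(inputs, delay):
    """Accumulator whose feedback path holds `delay` registers (delay >= 1)."""
    regs = [0] * delay            # feedback delay line, oldest value first
    outputs = []
    for x in inputs:
        total = regs.pop(0) + x   # add the value fed back `delay` cycles ago
        regs.append(total)
        outputs.append(total)
    return outputs
```

With a constant input of 1, `delay=1` gives the ordinary running sum 1, 2, 3, 4, 5, 6, whereas `delay=3` gives 1, 1, 1, 2, 2, 2: three interleaved accumulators, each updating once every three cycles. This is why the total latency around a feedback loop must stay fixed while the scheduler moves delays within it.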
The FIFO buffers operate in show-ahead mode: they display the next value to be read. The read signal is a read acknowledgement, which discards the displayed output value and shows the next value. The design uses multiple FIFO buffers with the same control signals, which therefore become full and give a valid output at the same time. The design needs the output control signals from only one of the FIFO buffers and can ignore the corresponding signals from the other FIFO buffers.
Because floating-point simulation is not bit accurate to the hardware, some points in the complex plane take fewer or more iterations to complete in hardware than in the Simulink simulation. As a result, the finished points may come out in a different order. You must build a testbench mechanism that is robust to this behavior. Use the testbench override feature in the Run All Testbenches block:
- Set the condition on mismatches to Warning
- Use the Run All Testbenches block to set an import variable, which brings the ModelSim results back into MATLAB, and a custom verification function, which sets the pass or fail criteria.
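The kind of check a custom verification function might perform can be sketched as follows. This is a Python illustration of the idea only, not the MATLAB function that the Run All Testbenches block calls. The result format of (point_id, iterations) pairs, the assumption that each point identifier appears once, and the tolerance `max_iter_diff` are all hypothetical choices for the example.

```python
def results_match(sim, hw, max_iter_diff=1):
    """Compare simulation and hardware results regardless of output order.

    `sim` and `hw` are iterables of (point_id, iterations) pairs in the
    order the points finished. Each point_id is assumed to appear once.
    """
    sim_map, hw_map = dict(sim), dict(hw)
    if sim_map.keys() != hw_map.keys():
        return False  # a point is missing or unexpected
    # Allow a small difference in iteration count per point (assumed tolerance).
    return all(abs(sim_map[k] - hw_map[k]) <= max_iter_diff for k in sim_map)
```

Keying the comparison on the point identifier rather than on position makes the check insensitive to completion order, and the tolerance absorbs points that finish slightly earlier or later in hardware. A suitable tolerance depends on the design and is not specified in this section.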
The model file is Mandelbrot_S.mdl.