50.3.4. External Memory for Warp IP

Video and Vision Processing Suite Intel® FPGA IP User Guide

ID 683329

Date 5/08/2024

The IP requires access to two separate areas of external memory: one for its input and output video buffers and one for its coefficient tables. The processor system running the Warp Software API must be able to access the coefficient tables but does not need access to the buffer area.

Memory Space Allocation in External Memory

Table 1007. Warp IP Video Buffer Memory Region - maximum usage
The table defines how much space is required in external memory by the Warp IP for the video buffer region when you turn off Use easy warp and turn off Use single memory bounce. This space depends on the size of the images to be processed in a system. It is defined by the Memory frame buffer size parameter. Six buffers require space in total: four input and two output.
Buffer Space Configuration	Region Size (MB)	Memory Region Required	Alignment (multiples of)
SD buffer size (1024x1024)	24	0x0180_0000	0x0200_0000
HD buffer size (2048x2048)	96	0x0600_0000	0x0800_0000
UHD buffer size (4096x4096)	384	0x1800_0000	0x2000_0000

Table 1008. **Warp IP Video Buffer Memory Region - reduced usage**
The table defines how much space is required in external memory by the Warp IP for the video buffer region when you turn on **Use easy warp** or turn on **Use single memory bounce**. This space depends on the size of the images to be processed in a system. It is defined by the **Memory frame buffer size** parameter. Four buffers require space in total.
Buffer Space Configuration	Region Size (MB)	Memory Region Required	Alignment (multiples of)
SD buffer size (1024x1024)	16	0x0100_0000	0x0100_0000
HD buffer size (2048x2048)	64	0x0400_0000	0x0400_0000
UHD buffer size (4096x4096)	256	0x1000_0000	0x1000_0000

The IP passes the base address of the memory region allocated to the frame buffers to the software API using the ram_addr element in the structure.
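
The region sizes in Tables 1007 and 1008 are consistent with each buffer holding one square frame at 4 bytes per pixel. The following Python sketch reproduces the tables under that assumption. The function and dictionary names are illustrative only and are not part of the Warp Software API. The power-of-two alignment rule is inferred from the tabulated values and is not stated elsewhere in this section.

```python
# Sketch: reproduce the video buffer region sizes from Tables 1007 and 1008.
# Assumptions (inferred from the tables, not stated by the IP documentation):
# each buffer holds one square frame at 4 bytes per pixel, and the alignment
# is the smallest power of two that is >= the region size.
# All names here are hypothetical.

BYTES_PER_PIXEL = 4
BUFFER_EDGE_PIXELS = {"SD": 1024, "HD": 2048, "UHD": 4096}


def video_buffer_region(size: str, reduced: bool) -> tuple[int, int]:
    """Return (region_bytes, alignment_bytes) for a Memory frame buffer size."""
    edge = BUFFER_EDGE_PIXELS[size]
    # Four buffers when Use easy warp or Use single memory bounce is on,
    # otherwise six (four input and two output).
    buffer_count = 4 if reduced else 6
    region = buffer_count * edge * edge * BYTES_PER_PIXEL
    alignment = 1 << (region - 1).bit_length()
    return region, alignment


def is_aligned(base_address: int, alignment: int) -> bool:
    """Check that a candidate base address is a multiple of the alignment."""
    return base_address % alignment == 0


def as_hex(value: int) -> str:
    """Format a 32-bit value in the 0xHHHH_HHHH style that the tables use."""
    digits = f"{value:08X}"
    return f"0x{digits[:4]}_{digits[4:]}"


if __name__ == "__main__":
    for size in BUFFER_EDGE_PIXELS:
        for reduced in (False, True):
            region, alignment = video_buffer_region(size, reduced)
            print(size, "reduced" if reduced else "maximum",
                  region // 2**20, "MB", as_hex(region), as_hex(alignment))
```

A check such as is_aligned can be applied to a candidate base address before it is written to the ram_addr element.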

The memory region that the coefficient tables require is related to the number of warp engines, the resolution of the images, and the type of warp. The IP only requires this memory region when you turn off Use easy warp.

Table 1009. Warp IP Coefficient Tables Memory Region
The table shows the maximum size of the coefficient table memory region for one and two warp engines (16 MB per engine).
Warp Engines	Region Size (MB)	Memory Region Required	Alignment (multiples of)
1	16	0x0100_0000	0x0100_0000
2	32	0x0200_0000	0x0200_0000
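
As an illustration of how the size and alignment columns interact, the following Python sketch assigns base addresses to the two regions. The region sizes and alignments are taken from Tables 1007 and 1009. The starting address, the placement order, and every function name are hypothetical choices for the example, not requirements of the IP.

```python
# Sketch: place the two Warp IP regions in external memory at aligned bases.
# Sizes and alignments come from Tables 1007 and 1009. The starting address,
# ordering, and function names are hypothetical.

def align_up(address: int, alignment: int) -> int:
    """Round address up to the next multiple of a power-of-two alignment."""
    return (address + alignment - 1) & ~(alignment - 1)


def place_regions(start: int,
                  regions: list[tuple[str, int, int]]) -> dict[str, int]:
    """Give each (name, size, alignment) region the lowest free aligned base."""
    bases = {}
    cursor = start
    for name, size, alignment in regions:
        base = align_up(cursor, alignment)
        bases[name] = base
        cursor = base + size  # next region starts after this one ends
    return bases


if __name__ == "__main__":
    regions = [
        ("coefficients", 0x0200_0000, 0x0200_0000),   # two warp engines
        ("video_buffers", 0x1800_0000, 0x2000_0000),  # UHD, maximum usage
    ]
    for name, base in place_regions(0x0100_0000, regions).items():
        print(f"{name}: 0x{base:08X}")
```

In a real system, the coefficient region must also sit at an address that the processor running the Warp Software API can reach.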

Bandwidth to External Memory

The performance of the interface from the Warp IP to the external memory is important for the correct operation of a system using the Warp IP.

The Warp IP generates a substantial amount of memory traffic. It has up to four video streams passing to and from external memory. In addition, each engine has three read streams to access the coefficient tables. All these streams combine to make Warp IP memory accesses complex. The streams affect how much efficiency you can obtain when accessing memory such as DDR4.

The Warp IP memory controller mitigates potential inefficiencies caused by these complex access patterns. It uses burst lengths of 8 beats for all its read and write accesses to improve the burst performance of DDR4 memory. It also attempts to cluster individual read and write bursts together to eliminate some of the issues with read and write turnaround dead time at the DDR4 interface.

These memory access patterns depend on the image transform that you apply. Some complex image transforms may reduce memory traffic because of the skip region functionality. When Use single memory bounce is off, one of the worst transforms for generated memory traffic is a unity warp that gives a 1:1 mapping between input and output pixels. When Use single memory bounce is on, memory traffic is proportional to the amount of compression in the vertical direction of the transform. The higher the amount of compression in the transform vertically, the higher the memory bandwidth.

The operation of the Warp IP is easier to predict when it is the only user of the DDR4 memory in a system. Ensure the Warp IP is the only high bandwidth user of the DDR4 memory in a system. When other high bandwidth accesses are made to the memory at the same time as the Warp IP, ensure that any interactions don’t adversely affect performance.

Total Warp IP Memory Traffic

Three configurations of the Warp IP affect how it connects to the external memory and hence the total memory traffic:

Use easy warp is on.
Use easy warp is off and Use single memory bounce is on.
Use easy warp is off and Use single memory bounce is off.

Figure 136. External Memory Data Streams

The figure shows a schematic view of all the possible data streams the IP uses. Depending on the settings of Use easy warp and Use single memory bounce, the IP does not require all of these streams.

Table 1010. Active Data Streams for Different Configurations
The table shows which external memory data streams are active for the different configuration settings.
Data Stream	Use easy warp on	Use single memory bounce on	Use single memory bounce off
Input video	Active	Active	Active
Cache loads		Active	Active
Warped image			Active
Coefficient reads		Active	Active
Output video	Active		Active

The bandwidth of each data stream falls into one of three categories:

A video stream (input video, warped image, and output video).
A coefficient stream (coefficient reads).
Cache loads (a scaled version of a video stream).

Peak Memory Bandwidth Approximation for Video Streams

The following equation approximates the peak bandwidth, in bits per second, of one video stream passing through the Warp IP, where number_of_lines includes blanking. Each pixel of data is transferred to or from external memory as a 32-bit word.

Peak bandwidth = 32 * pixels_per_line * number_of_lines * frame_rate

Table 1011. Video Stream Memory Bandwidth Examples
The table shows the peak memory bandwidth per video stream for various resolutions and frame rates.
Resolution	Frame Rate (fps)	Peak Data Rate (Gbps)
3840x2160 (2250 lines)	60	16.6
3840x2160 (2250 lines)	30	8.3
1920x1080 (1125 lines)	60	4.2
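
The following Python sketch evaluates the peak bandwidth equation for the three rows of Table 1011. The function name is hypothetical. The inputs assume that pixels_per_line is the active line width (3840 or 1920) and that number_of_lines includes blanking, which is the combination that reproduces the tabulated figures.

```python
# Sketch of the peak video stream bandwidth equation.
# The function name is hypothetical; the formula is the one given above.

BITS_PER_PIXEL = 32  # each pixel moves to or from memory as a 32-bit word


def peak_video_bandwidth_bps(pixels_per_line: int, number_of_lines: int,
                             frame_rate: float) -> float:
    """Peak bandwidth of one video stream in bits per second."""
    return BITS_PER_PIXEL * pixels_per_line * number_of_lines * frame_rate


if __name__ == "__main__":
    cases = [(3840, 2250, 60), (3840, 2250, 30), (1920, 1125, 60)]
    for width, lines, fps in cases:
        gbps = peak_video_bandwidth_bps(width, lines, fps) / 1e9
        print(f"{width} pixels x {lines} lines @ {fps} fps: {gbps:.2f} Gbps")
```

The 1920x1080 case evaluates to about 4.15 Gbps, which the table lists as 4.2 Gbps.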

Peak Memory Bandwidth Approximation for Coefficient Streams

You can approximate the memory bandwidth required by the coefficient streams as 9% of a video stream.

Table 1012. Coefficient Stream Memory Bandwidth Examples
The table shows the peak memory bandwidth per stream for various resolutions and frame rates.
Resolution	Frame Rate (fps)	Peak Data Rate (Gbps)
3840x2160	60	1.5
3840x2160	30	0.75
1920x1080	60	0.4

Peak Memory Bandwidth Approximation for Cache Loads

The memory bandwidth required by the IP for the cache loads is closely linked to the warp transform that the IP applies. The memory bandwidth is proportional to the bandwidth required by a standard video stream. It is also affected by Use single memory bounce.

When Use single memory bounce is off, the worst-case cache load bandwidth occurs for a 1:1 (unity) warp and is approximately 25% higher than the bandwidth of a standard video stream. The unity warp gives an upper bound for the bandwidth. Other transforms require less bandwidth.

When Use single memory bounce is on, the amount of vertical compression in the transform that the IP performs affects the bandwidth required for cache loads. A 1:1 or unity warp has a 1:1 vertical compression. The resultant cache load bandwidth is equivalent to the bandwidth for a standard video stream. In contrast, a 53 degree vertical keystone warp has a vertical compression approaching 2:1 in some regions. The maximum cache load bandwidth for this warp is two times the standard video stream bandwidth. The 2:1 ratio provides an upper bound for the cache load bandwidth for recommended operation.

The recommended 2:1 compression limit comes from the limitations of the low pass filtering that takes place in the bicubic pixel interpolation process when generating output pixels. However, you can generate transforms that have a compression ratio of greater than 2:1, which can potentially affect the single memory bounce memory bandwidth and reduce the final picture quality.
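
The cache load rules above can be sketched in Python as follows. All names are hypothetical. The sketch assumes that, with Use single memory bounce on, the cache load bandwidth scales linearly with the vertical compression ratio. That is an interpolation between the two data points given in the text (1:1 and 2:1), and the sketch only covers ratios of 1:1 and above.

```python
# Sketch: worst-case cache load bandwidth relative to one video stream.
# Assumption: with Use single memory bounce on, bandwidth scales linearly
# with the vertical compression ratio (the text gives 1:1 -> 1x, 2:1 -> 2x).
# All names are hypothetical.

RECOMMENDED_MAX_VERTICAL_COMPRESSION = 2.0
DOUBLE_BOUNCE_WORST_CASE_FACTOR = 1.25  # unity warp, single memory bounce off


def cache_load_bandwidth_bps(video_bps: float, single_bounce: bool,
                             vertical_compression: float = 1.0) -> float:
    """Estimate the cache load bandwidth in bits per second."""
    if not single_bounce:
        # The unity warp is the upper bound; other transforms need less.
        return DOUBLE_BOUNCE_WORST_CASE_FACTOR * video_bps
    return vertical_compression * video_bps


def within_recommended_compression(vertical_compression: float) -> bool:
    """True when the transform stays inside the recommended 2:1 limit."""
    return vertical_compression <= RECOMMENDED_MAX_VERTICAL_COMPRESSION


if __name__ == "__main__":
    video = 16.5888e9  # 3840 pixels x 2250 lines at 60 fps
    print(cache_load_bandwidth_bps(video, single_bounce=False) / 1e9)
    print(cache_load_bandwidth_bps(video, True, vertical_compression=2.0) / 1e9)
```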

Memory Interface Bandwidth Requirements

Table 1013. Memory Interface Bandwidth Requirements – Easy Warp
The table shows the burst data rate scaling across resolutions and frame rates when **Use easy warp** is on. When **Use easy warp** is on, you can approximate the memory bandwidth using two video streams.
Resolution	Frame Rate (fps)	Maximum Burst Data Rate (Gbps)
3840x2160	60	34
3840x2160	30	17
1920x1080	60	8.5

Table 1014. Memory Interface Bandwidth Requirements – Single Memory Bounce
The table shows the burst data rate scaling across resolutions and frame rates when **Use single memory bounce** is on. The cache load data rate assumes a 2:1 vertical compression factor, which gives worst-case figures. When **Use single memory bounce** is on, you can approximate the memory bandwidth using one video stream, a coefficient stream, and the cache load stream (with an upper bound of twice the bandwidth of a standard video stream).
Resolution	Frame Rate (fps)	Maximum Burst Data Rate (Gbps)
3840x2160	60	51.3
3840x2160	30	25.7
1920x1080	60	12.8

Table 1015. Memory Interface Bandwidth Requirements - Double Memory Bounce
The table shows the burst data rate scaling across resolutions and frame rates when **Use single memory bounce** is off. The cache load data rate assumes a 1:1 warp transform, which gives the worst-case figures. You can approximate the memory bandwidth using three video streams, a coefficient stream, and the cache load stream (with a 25% overhead above the bandwidth of a standard video stream).
Resolution	Frame Rate (fps)	Maximum Burst Data Rate (Gbps)
3840x2160	60	72
3840x2160	30	36
1920x1080	60	18

Intel references these burst data rates to the memory interface of the IP. The total data rates available are affected by other factors outside of the IP such as the performance of the interconnect fabric and the efficiency of the memory controller.
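
The stream combinations described for the three configurations can be added up as in the following Python sketch. The function names and configuration labels are hypothetical. The coefficient stream uses the 9% approximation, and the cache load factors are the worst-case values quoted for each configuration.

```python
# Sketch: approximate worst-case burst data rate for each configuration.
# Function names and configuration labels are hypothetical. The factors are
# the approximations quoted in this section.

COEFFICIENT_FRACTION = 0.09  # coefficient stream is ~9% of a video stream


def video_stream_bps(pixels_per_line: int, number_of_lines: int,
                     frame_rate: float) -> float:
    """Peak bandwidth of one video stream (32 bits per pixel)."""
    return 32 * pixels_per_line * number_of_lines * frame_rate


def total_bandwidth_bps(video: float, configuration: str) -> float:
    """Sum the active streams for one configuration, in bits per second."""
    coefficient = COEFFICIENT_FRACTION * video
    if configuration == "easy_warp":
        return 2 * video  # two video streams
    if configuration == "single_bounce":
        # one video stream, coefficients, cache loads at 2:1 compression
        return video + coefficient + 2.0 * video
    if configuration == "double_bounce":
        # three video streams, coefficients, cache loads for a unity warp
        return 3 * video + coefficient + 1.25 * video
    raise ValueError(f"unknown configuration: {configuration}")


if __name__ == "__main__":
    uhd60 = video_stream_bps(3840, 2250, 60)
    for configuration in ("easy_warp", "single_bounce", "double_bounce"):
        gbps = total_bandwidth_bps(uhd60, configuration) / 1e9
        print(f"{configuration}: {gbps:.1f} Gbps")
```

For 3840x2160 at 60 fps, the sketch gives approximately 51.3 Gbps for single memory bounce and 72.0 Gbps for double memory bounce, in line with Tables 1014 and 1015. The easy warp result is approximately 33.2 Gbps, slightly below the 34 Gbps that Table 1013 lists.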

Example system sharing access to memory

In this example system, the Warp IP shares the DDR4 interface with a frame buffer, and the system processes UHD frames at 60 fps. The system runs on an Intel Arria 10 GX Development Kit with the DDR4 EMIF running a 2,133 MT/s (DDR4-2133) interface to a DDR4 memory. Four memory-mapped hosts access the memory: the processor, the frame buffer read interface, the frame buffer write interface, and the Warp IP.

Figure 137. Warp and Video Frame Buffer Platform Designer
The figure shows the Platform Designer connectivity where the Frame Buffer II component shares access to the DDR4 EMIF with the Warp IP. The Frame Buffer is part of the same video processing pipeline as the Warp IP. For clarity, the figure shows only the Avalon memory-mapped interfaces, and Show Arbitration Shares is on.

For this system to work:

Configure the Frame Buffer to use bursts of 32 beats for reads and writes.
Configure the Frame Buffer to use read and write FIFO depths of 256.
Set the arbitration weighting at the front end of the DDR4 EMIF to 16:1 in favor of the Warp IP (versus the processor and the Frame Buffer’s read and write interfaces connected through the mm_vfb_bridge component).
Set the Maximum pending read transactions parameter in the pipelined transfers section of the Avalon memory-mapped ports to 8.
Set the Limit interconnect pipeline stages to parameter to 4 for the domain at the front end of the DDR4 EMIF. This limit helps to meet timing on the memory interface clock domain.

Memory accesses from the processor may adversely affect the performance of the rest of the system. If the performance is affected, increase the arbitration weightings at the mm_vfb_bridge component to be 1:2:2 or higher in favor of the Frame Buffer’s read and write interfaces. Alternatively, implement a fixed priority arbitration scheme at the mm_vfb_bridge to reduce the effect of the processor’s memory accesses.

Figure 138. Video Frame Buffer Parameterization

Figure 139. Maximum Pending Read Transactions

Figure 140. Parameterizing interconnect pipeline stages

Multiple Warp IPs sharing access to memory

Figure 141. Multiple Warp IPs sharing access to memory
The figure shows an example with two Warp IPs that share a DDR4 interface. To match the burst access patterns of the Warp IP, set the arbitration values at the combining interface to 8.
