Video and Vision Processing Suite Intel® FPGA IP User Guide

ID 683329
Date 7/08/2024
Public
Document Table of Contents
1. About the Video and Vision Processing Suite 2. Getting Started with the Video and Vision Processing IPs 3. Video and Vision Processing IPs Functional Description 4. Video and Vision Processing IP Interfaces 5. Video and Vision Processing IP Registers 6. Video and Vision Processing IPs Software Programming Model 7. Protocol Converter Intel® FPGA IP 8. 1D LUT Intel® FPGA IP 9. 3D LUT Intel® FPGA IP 10. Adaptive Noise Reduction Intel® FPGA IP 11. Advanced Test Pattern Generator Intel® FPGA IP 12. AXI-Stream Broadcaster Intel® FPGA IP 13. Bits per Color Sample Adapter Intel FPGA IP 14. Black Level Correction Intel® FPGA IP 15. Black Level Statistics Intel® FPGA IP 16. Chroma Key Intel® FPGA IP 17. Chroma Resampler Intel® FPGA IP 18. Clipper Intel® FPGA IP 19. Clocked Video Input Intel® FPGA IP 20. Clocked Video to Full-Raster Converter Intel® FPGA IP 21. Clocked Video Output Intel® FPGA IP 22. Color Plane Manager Intel® FPGA IP 23. Color Space Converter Intel® FPGA IP 24. Defective Pixel Correction Intel® FPGA IP 25. Deinterlacer Intel® FPGA IP 26. Demosaic Intel® FPGA IP 27. FIR Filter Intel® FPGA IP 28. Frame Cleaner Intel® FPGA IP 29. Full-Raster to Clocked Video Converter Intel® FPGA IP 30. Full-Raster to Streaming Converter Intel® FPGA IP 31. Genlock Controller Intel® FPGA IP 32. Generic Crosspoint Intel® FPGA IP 33. Genlock Signal Router Intel® FPGA IP 34. Guard Bands Intel® FPGA IP 35. Histogram Statistics Intel® FPGA IP 36. Interlacer Intel® FPGA IP 37. Mixer Intel® FPGA IP 38. Pixels in Parallel Converter Intel® FPGA IP 39. Scaler Intel® FPGA IP 40. Stream Cleaner Intel® FPGA IP 41. Switch Intel® FPGA IP 42. Tone Mapping Operator Intel® FPGA IP 43. Test Pattern Generator Intel® FPGA IP 44. Unsharp Mask Intel® FPGA IP 45. Video and Vision Monitor Intel FPGA IP 46. Video Frame Buffer Intel® FPGA IP 47. Video Frame Reader Intel FPGA IP 48. Video Frame Writer Intel FPGA IP 49. Video Streaming FIFO Intel® FPGA IP 50. Video Timing Generator Intel® FPGA IP 51. Vignette Correction Intel® FPGA IP 52. Warp Intel® FPGA IP 53. White Balance Correction Intel® FPGA IP 54. White Balance Statistics Intel® FPGA IP 55. Design Security 56. Document Revision History for Video and Vision Processing Suite User Guide

52.3. Warp IP Block Description

The Warp IP accepts RGB or YUV format video input from its Intel FPGA streaming video interface and directly stores the video data in external memory. The video input process has a pool of four frame buffers that it uses to store the incoming video data. The IP accesses the buffers in a cyclic order.

The video output process generates RGB or YUV format Intel FPGA streaming video using the warped image data.

The IP generates arbitrary warps based on a transform mesh. If you turn on Use easy warp, it offers a fixed set of transforms (rotations and mirroring) without needing a transform mesh. If you turn off Use easy warp, the IP processes the buffered input video by the configured number of warp engines to apply the required warp. Three coefficient tables control the warp engines to define the warp that the IP applies. External memory stores the three coefficient tables that are generated using the Warp IP software API.

The IP defines the required warp with a backward mapping from the output to the input pixel positions. It represents the warp as a subsampled mesh that defines the mapping in 8x8 or 16x16 regions. For output pixel mappings within the 8x8 or 16x16 positions, the warp engine applies bilinear interpolation. For any input image dimensions (height or width) that are greater than 3840, use a 16x16 mesh.

If you turn off Use single memory bounce, the IP operates with two bounces through external memory. The IP buffers the input video and it writes back the resultant warped image to external memory into one of two output video buffers. The IP writes to these dual output buffers alternately. The IP reads the warped image in the output video buffers and passes out of the IP.

If you turn on Use single memory bounce, the IP operates with just a single bounce through external memory—the buffering of the input video. The IP transfers the resultant warped image directly to the video output process without passing through external memory.

Figure 134. Warp IP block diagram (Double Memory Bounce)

The figure shows a high-level block diagram for the Warp IP with its connection to external memory when Use single memory bounce is off. In this configuration, the engines read and write video data through the external memory.

Figure 135. Warp IP block diagram (Single Memory Bounce)The figure shows a high-level block diagram for the Warp IP with its connection to external memory when Use single memory bounce is on. In this configuration, the engines only read video data from the external memory. The engines transmit their video data directly to the output process without the data passing through external memory.
Figure 136. Warp image transform examplesFrom left: arbitrary warp with a 5x5 array of control points, four corner warp with some radial distortion.

Coefficient Tables

Each engine within the Warp IP has read access to its own set of three coefficient tables that define and control the image transform that the IP applies. The three different tables are:

  • Mesh coefficients that define the output to input pixel transform
  • Fetch coefficients that control the loading of the input image into the cache memory within the engine(s).
  • Filter coefficients that control the mapping from the cache memory as the IP generates the interpolated or filtered output pixels.

The format of the mesh coefficients is different to the mesh data that you provide to the software API. The Software API uses 32-bit signed integers for the mesh values; the Warp IP uses a 16-bit offset binary format.

The IP needs just the mesh data to define the warp. The software API uses this mesh data to generate the required coefficient tables.

Warp Mesh Interpolation

The IP defines the warp transform using either an 8x8 or 16x16 subsampled mesh. This mesh defines the mapping from the output pixel positions to the corresponding input pixel positions. Thesubsampled mesh requires that only the mappings for the following output pixel positions are defined:

8x8 mesh

(0,0), (8,0), (16,0) … (W, 0)
(0,8), (8,8), (16,8) … (W, 8)
.
(0,H), (8, H), (16, H) … (W, H)

where W=8*ceil(image width/8) and H=8*ceil(image height/8)

16x16 mesh
(0,0), (16,0), (32,0) … (W, 0)
(0,16), (16,16), (32,16) … (W, 16)
 . 
(0,H), (16, H), (32, H) … (W, H) 

where W=16*ceil(image width/16) and H=16*ceil(image height/16)

Using either the 8x8 or 16x16 mesh is based on the input image size. If either the input image width or height is greater than 3840, you must use a 16x16 mesh. If both the input image width and height are equal to or less than 3840, you must use an 8x8 mesh.

To generate the output pixel positions that lie in between these 8x8 or 16x16 positions, the Warp IP uses bilinear interpolation.

Output Pixel Interpolation and Filtering

The IP generates output pixels with the pixel data from the associated input pixel positions as defined by the warp that the IP applies. The IP generates output pixel values with a bicubic interpolation calculation using a 4x4 kernel of the associated input pixel values.

The weightings for the interpolation over the 4x4 kernel are a combination of a bicubic function and a variable low pass filtering function. The software API automatically applies low pass filtering, which it bases on the amount of downscaling that results for that region of the warp.

When Use mipmaps is on, the Warp IP software automatically identifies regions of the output image that require the downscaled, mipmap images and the interpolation and filtering settings are chosen accordingly.

Blank Skip Regions

When you configure the Warp IP to substantially downscale regions of an image, large areas of the output image can map to points outside the input image. These unmapped regions result in the IP producing black.

Because these regions in the output image do not require any processing of the input image by the Warp IP, for efficiency the IP skips the processing associated with these regions. This skipping process is setup automatically by the software API which determines, from the desired warp mapping, which regions the IP skips. You see this behavior only when Use single memory bounce is off.

Easy Warp

When you turn on Use easy warp:

  • The IP supports rotations of 0°, 90°, 180° and 270° and you can apply a horizontal mirror operation before the selected rotation.
  • The IP does not configure any engines.
  • The IP applies the required rotational or mirrored transform by the video output block as it reads data from external memory.
  • The IP does not require any processing engines, giving resource savings and memory bandwidth savings.

When Use easy warp is on, ensure that both Use mipmaps and Use single memory bounce are off.

For easy warp rotations of 0 or 180°, the maximum input width dimension must not exceed the maximum output width. You set the maximum width with the Maximum output video width parameter. For easy warp rotations of 90° or 270°, the transposing of vertical and horizontal dimensions places restrictions on the input resolution.

Table 1064.  Input Resolution Restrictions for 90° or 270°
Maximum Output Video Width Input Height Restriction
2048 1088
3840 2176
Figure 137. Warp IP block diagram (Easy Warp)

The figure shows a high-level block diagram for the Warp IP with Use Easy warp with its connection to external memory.

Figure 138. Easy Warp image transform examplesFrom top left clockwise: original image, mirrored, 90° rotate and 180° rotate.

Single Memory Bounce

Generally, turning on Use single memory bounce gives reduced memory bandwidth compared to when it is off. However, single memory bounce may require larger internal RAM. The RAM usage depends on the specific warp transform. For some transforms the IP requires larger amounts of cache. Any increased cache requirement requires increased internal block RAM usage.

With Use single memory bounce on, three cache size options are available: 256, 512 and 1024 cache blocks per engine. With Use single memory bounce off, the IP has a fixed cache size of 256 cache blocks per engine.

Whether you turn on Use single memory bounce depends on the actual transform the IP performs. Intel provides a software tool that determines how much cache the IP needs to process the desired transform. You provide the tool with information such as the input and output resolutions, the transform required, and the number of engines to use. The tool then provides guidance on how much cache to use to process the transform.

For information on the Warp block cache tool, refer to Block Cache Tool. For information on the memory bandwidth implications of running with a single memory bounce, refer to External Memory for Warp IP.

Single Memory Bounce Cache Example

An example of cache usage is given based on a 45 degree rotation with UHD input and output resolutions at 60 fps on an Intel Arria 10 device. This throughput requires that the IP uses two engines.

With Use single memory bounce off, the block RAM usage by the IP is 365 M20Ks.

With Use single memory bounce on, the IP requires 512 cache blocks per engine because of the constraints of the 45 degree rotation. This cache block requirement translates to a block RAM usage of 407 M20Ks.

Table 1065.  Block RAM Usage – UHD 45 Degree Rotate
Use single memory bounce Cache blocks per engine Total block RAM usage
On 512 407
Off 256 (fixed) 365

Transform Compression Limits and Mipmaps

There is a fundamental compression limit of 2:1 in the way the Warp IP applies the required transform in generating the output image. This limit comes from the low-pass filtering performed during the interpolation calculation and from the structure of the cache blocks used in the pixel processing. The limit may be overcome by using downscaled versions of the input image as the source data in the interpolation and filtering stage.

The figures show how you may use un this way the downscaled versions of the input image, available as a pyramid of the various downscale factorsy.

Figure 139. Transform Compression Limit with Use mipmaps offUsing only the original image in the application of the transform gives a final compression limit of 2:1.
Figure 140. Transform Compression Limit with Use mipmaps onUsing the original image and a pyramid of downscaled images in the application of the transform gives a final compression limit of 256:1.

When Use mipmaps is on, the Warp IP automatically generates and uses this pyramid of downscaled input images as required by the transform it performs.