30.3. Scaler IP Functional Description

Video and Vision Processing Suite Intel® FPGA IP User Guide

ID 683329

Date 10/02/2023

The scaler supports three options for the algorithm that resizes the video fields: nearest neighbor, bilinear and polyphase.

Nearest neighbor

Nearest neighbor is the lowest cost and lowest quality algorithm. Each pixel in the output field is a direct copy of a pixel from the input field, with no filtering or interpolation between pixels. The algorithm repeats or drops input pixels according to the required scaling ratio. If out[i,j] is the pixel value at horizontal position i and vertical position j in the output field, and in[x,y] is the pixel value at horizontal position x and vertical position y in the input field, the following equations define the pixel values in the output field:

Equation 3. Nearest Neighbor Equations

$x = \left\lfloor \frac{i \times width_{in}}{width_{out}} \right\rfloor$

$y = \left\lfloor \frac{j \times height_{in}}{height_{out}} \right\rfloor$

$out[i, j] = in[x, y]$

The nearest neighbor algorithm requires a line buffer with storage for a single line of the input field to allow for this line to repeat multiple times at the output during upscales. Because nearest neighbor scaling does not involve any interpolation or filtering, the resulting image can have a blocky appearance.
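The nearest neighbor equations can be illustrated with a short software model. This is a minimal sketch, not the IP's hardware implementation: the function name `nearest_neighbor_scale` and the row-major list-of-lists image layout (`src[y][x]`) are assumptions made for illustration, and integer floor division stands in for the floor function.

```python
def nearest_neighbor_scale(src, width_out, height_out):
    """Resize a 2-D list of pixels by repeating or dropping input pixels.

    src is indexed as src[y][x] (an assumed layout, rows first).
    """
    height_in = len(src)
    width_in = len(src[0])
    out = []
    for j in range(height_out):
        # Floor division picks the input row for output row j.
        y = (j * height_in) // height_out
        row = []
        for i in range(width_out):
            # Floor division picks the input column for output column i.
            x = (i * width_in) // width_out
            row.append(src[y][x])  # direct copy, no interpolation
        out.append(row)
    return out


if __name__ == "__main__":
    # 2x upscale: every input pixel is repeated twice in each direction.
    for line in nearest_neighbor_scale([[1, 2], [3, 4]], 4, 4):
        print(line)
```

In the 2x upscale, each input row is selected for two consecutive output rows, which matches the need for a single-line buffer so that a line can repeat at the output.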

Bilinear

Bilinear scaling offers improved quality compared to nearest neighbor by interpolating between neighboring pixels to remove the blocky look of nearest neighbor scaling. Bilinear scaling is better suited to upscaling (increasing image size) than downscaling (reducing image size). The cut-off frequency of the basic bilinear filter is generally too high to remove all the aliasing artifacts that downscaling can introduce. However, even for upscales the results can look somewhat blurred, with the edges in the image softened.

For each output pixel, the bilinear algorithm starts from the same input pixel that the nearest neighbor algorithm selects. It builds a 2x2 pixel window around this target input pixel, with the target pixel in the top left corner of the window. The floor function calculates the input pixel position (the values of x and y) to give integer indices, as pixels only exist at integer locations. However, the integer indices have some error compared to the ideal location that preserves all the fractional position information. For example, if the ideal value before applying the floor function is 1.5, the integer value after applying the floor function is 1, and the error is 0.5. The bilinear algorithm uses the horizontal and vertical position error values, err_h and err_v respectively, to create coefficients that, when applied to the 2x2 pixel window created around the integer pixel location, produce a resulting pixel that is effectively located at the desired fractional position. The following equations show how the coefficients are created and applied to the input window of pixels to create the output pixel.

Equation 4. Bilinear Equations

$coeff_{0} = ((1 \ll frac_{h}) - err_{h}) \times ((1 \ll frac_{v}) - err_{v})$

$coeff_{1} = err_{h} \times ((1 \ll frac_{v}) - err_{v})$

$coeff_{2} = ((1 \ll frac_{h}) - err_{h}) \times err_{v}$

$coeff_{3} = err_{h} \times err_{v}$

$out[i, j] = in[x, y] \times coeff_{0} + in[x + 1, y] \times coeff_{1} + in[x, y + 1] \times coeff_{2} + in[x + 1, y + 1] \times coeff_{3}$
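
As a quick consistency check on Equation 4 (a derivation from the equations, not a statement taken from the IP documentation), the four coefficients factor so that their sum does not depend on the error values:

$coeff_{0} + coeff_{1} + coeff_{2} + coeff_{3} = ((1 \ll frac_{h}) - err_{h} + err_{h}) \times ((1 \ll frac_{v}) - err_{v} + err_{v}) = 2^{frac_{h} + frac_{v}}$

The weighted sum is therefore scaled by 2^(frac_h + frac_v) relative to the input pixel range. Presumably the result is shifted right by frac_h + frac_v bits to restore that range, although the text does not state how the IP normalizes or rounds it.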

Calculating the values of err_h and err_v with exact precision for all possible scaling ratios would require an infinite number of fractional bits in the hardware implementing the mathematics. You must therefore specify via parameters how many fraction bits to include for the calculations in the horizontal and vertical directions, frac_h and frac_v respectively. The IP takes the value for frac_h from the Horizontal coefficient fraction bits parameter, and the value for frac_v from the Vertical coefficient fraction bits parameter. With the desired level of precision set, the following equations give the values of err_h and err_v:

$err_{h} = \frac{((i \times width_{in}) \bmod width_{out}) \ll frac_{h}}{width_{out}}$

$err_{v} = \frac{((j \times height_{in}) \bmod height_{out}) \ll frac_{v}}{height_{out}}$

To create the 2x2 pixel window required for the bilinear filter, the bilinear algorithm requires a line buffer with storage for two lines of input video.
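
The bilinear steps described in this section can be combined into a small fixed-point software model. This is a sketch under stated assumptions rather than the IP's actual datapath: the function name `bilinear_scale`, the `src[y][x]` layout, the replication of the last row and column at the image edges, and the final truncating right shift by frac_h + frac_v bits are all assumptions that the text does not specify.

```python
def bilinear_scale(src, width_out, height_out, frac_h=8, frac_v=8):
    """Fixed-point bilinear resize of a 2-D list indexed as src[y][x]."""
    height_in = len(src)
    width_in = len(src[0])
    one_h = 1 << frac_h  # fixed-point 1.0 with frac_h fraction bits
    one_v = 1 << frac_v  # fixed-point 1.0 with frac_v fraction bits
    out = []
    for j in range(height_out):
        y = (j * height_in) // height_out
        err_v = (((j * height_in) % height_out) << frac_v) // height_out
        y1 = min(y + 1, height_in - 1)  # assumption: replicate bottom edge
        row = []
        for i in range(width_out):
            x = (i * width_in) // width_out
            err_h = (((i * width_in) % width_out) << frac_h) // width_out
            x1 = min(x + 1, width_in - 1)  # assumption: replicate right edge
            # Coefficients for the 2x2 window, target pixel at top left.
            coeff0 = (one_h - err_h) * (one_v - err_v)
            coeff1 = err_h * (one_v - err_v)
            coeff2 = (one_h - err_h) * err_v
            coeff3 = err_h * err_v
            acc = (src[y][x] * coeff0 + src[y][x1] * coeff1
                   + src[y1][x] * coeff2 + src[y1][x1] * coeff3)
            # Assumption: normalize by truncating the fraction bits.
            row.append(acc >> (frac_h + frac_v))
        out.append(row)
    return out


if __name__ == "__main__":
    # Upscale a 2-pixel line to 4 pixels with 8 fraction bits.
    print(bilinear_scale([[0, 100]], 4, 1))
```

With a 2-pixel input line stretched to 4 pixels, the second output pixel has a position error of 0.5 (128 with 8 fraction bits), so it lands halfway between the two input values. Note that the loop only ever touches rows y and y + 1, which corresponds to the two-line buffer requirement.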

Polyphase

The polyphase algorithm requires the most resources, but it produces the highest quality results. It uses interpolation filters that are larger than the 2x2 tap filter used for bilinear scaling. Depending on the coefficients you select, these filters can provide improved frequency response, resulting in less blurring on the edges during upscales, and fewer aliasing artifacts during downscales. The increased size of the filters requires an increase in the number of input video lines that must be stored to create the vertical window, and increased DSP block (multiplier) usage to implement the filter mathematics.

The polyphase algorithm uses the same initial integer pixel position as nearest neighbor scaling and calculates positional error values in the same way as the bilinear algorithm. However, instead of using these error values directly to calculate the filter coefficients, the polyphase algorithm uses the error values as addresses into horizontal and vertical filter coefficient memories. Each address in a coefficient memory is referred to as a phase (for reasons that are explained in the coefficient selection section), and you define the number of horizontal and vertical phases, num_phases_h and num_phases_v respectively, via parameters. The Number of horizontal phases parameter sets the value for num_phases_h and the Number of vertical phases parameter sets the value for num_phases_v. The following equations give the horizontal phase, phase_h, and the vertical phase, phase_v, for each output pixel:

Equation 5. Phase Equations

$phase_{h} = \frac{((i \times width_{in}) \bmod width_{out}) \times num\_phases_{h}}{width_{out}}$

$phase_{v} = \frac{((j \times height_{in}) \bmod height_{out}) \times num\_phases_{v}}{height_{out}}$
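
To make the phase equations concrete, the sketch below computes the phase for every output position along one axis. The helper name `phase_index` is hypothetical, and the use of integer floor division is an assumption based on the phase serving as a memory address.

```python
def phase_index(pos_out, size_in, size_out, num_phases):
    """Phase (coefficient memory address) for one output position.

    Works for either axis: pass (i, width_in, width_out, num_phases_h)
    or (j, height_in, height_out, num_phases_v).
    """
    remainder = (pos_out * size_in) % size_out  # fractional position error
    # Assumption: truncate to an integer address with floor division.
    return (remainder * num_phases) // size_out


if __name__ == "__main__":
    # Upscaling 3 input pixels to 8 output pixels with 16 phases.
    print([phase_index(i, 3, 8, 16) for i in range(8)])
```

For a 3-to-8 upscale with 16 phases, the phases cycle through the coefficient memory without repeating within the 8 outputs, because each output position has a different fractional offset from its integer input position.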

You define the number of taps used in the horizontal and vertical scaling filters (num_taps_h and num_taps_v respectively). The number of taps can be any value between 4 and 64. A higher number of taps can allow for a more precise filter transfer function but comes at the cost of extra DSP block utilization and, in the case of the vertical filter, increased block memory utilization in the line buffer required to create the vertical sample window. Each vertical or horizontal phase in the coefficient memory contains one coefficient for each tap of the vertical or horizontal filter.

The scaler implements the vertical scaling function first (if selected), followed by the horizontal scaling function (if selected). The result of the vertical scaling is an intermediate image with the desired output height but retaining the original input width. If inter[x,j] is the pixel value in the intermediate image at horizontal position x and vertical position j, and coeff_v[n] is the vertical scaling filter coefficient for tap n (selected from phase_v), the following equation shows how the IP calculates the intermediate image. The filter taps are indexed with 0 the oldest data (closest to the top edge of the image) and num_taps_v - 1 the newest data (closest to the bottom edge of the image).

Equation 6. Intermediate Image Equation

$inter[x, j] = \sum_{n = 0}^{num\_taps_{v} - 1} in\left[x,\; y - \frac{num\_taps_{v} - 1}{2} + n\right] \times coeff_{v}[n]$

If coeff_h[n] is the horizontal scaling filter coefficient for tap n (selected from phase_h), the following equation shows how the IP calculates the final output image from the intermediate image. The filter taps are again indexed with 0 the oldest data (closest to the left edge of the image) and num_taps_h - 1 the newest data (closest to the right edge of the image).

Equation 7. Final Output Image Equation

$out[i, j] = \sum_{n = 0}^{num\_taps_{h} - 1} inter\left[x - \frac{num\_taps_{h} - 1}{2} + n,\; j\right] \times coeff_{h}[n]$
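
The two-pass structure of Equations 6 and 7 can be sketched as a software model. This is an illustration under several assumptions, not the IP's implementation: the function name `polyphase_scale`, the `src[y][x]` layout, the coefficient tables indexed as `coeffs[phase][tap]`, integer division for the (num_taps - 1) / 2 offset, and clamping of out-of-range sample positions to the image edge are all choices made here, since the text does not specify them. Coefficients are taken as already normalized (each phase summing to 1).

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def polyphase_scale(src, width_out, height_out, coeffs_h, coeffs_v):
    """Separable polyphase resize: vertical pass, then horizontal pass.

    coeffs_h[phase][tap] and coeffs_v[phase][tap] are hypothetical
    coefficient tables standing in for the coefficient memories.
    """
    height_in, width_in = len(src), len(src[0])
    num_phases_v, num_taps_v = len(coeffs_v), len(coeffs_v[0])
    num_phases_h, num_taps_h = len(coeffs_h), len(coeffs_h[0])

    # Vertical pass: output height, original input width.
    inter = []
    for j in range(height_out):
        y = (j * height_in) // height_out
        phase_v = (((j * height_in) % height_out) * num_phases_v) // height_out
        taps = coeffs_v[phase_v]
        row = []
        for x in range(width_in):
            acc = 0
            for n in range(num_taps_v):
                # Assumption: clamp rows that fall outside the image.
                yy = clamp(y - (num_taps_v - 1) // 2 + n, 0, height_in - 1)
                acc += src[yy][x] * taps[n]
            row.append(acc)
        inter.append(row)

    # Horizontal pass over the intermediate image.
    out = []
    for j in range(height_out):
        row = []
        for i in range(width_out):
            x = (i * width_in) // width_out
            phase_h = (((i * width_in) % width_out) * num_phases_h) // width_out
            taps = coeffs_h[phase_h]
            acc = 0
            for n in range(num_taps_h):
                # Assumption: clamp columns that fall outside the image.
                xx = clamp(x - (num_taps_h - 1) // 2 + n, 0, width_in - 1)
                acc += inter[j][xx] * taps[n]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    # Two phases of a 4-tap filter that mimic bilinear interpolation.
    coeffs = [[0, 1, 0, 0], [0, 0.5, 0.5, 0]]
    print(polyphase_scale([[0, 100]], 4, 1, coeffs, [[0, 1, 0, 0]]))
```

With 4 taps, the integer offset (4 - 1) // 2 = 1 places tap 1 on the integer pixel position, so a table whose every phase is [0, 1, 0, 0] reduces the filter to nearest neighbor scaling. That makes a convenient sanity check before trying real filter coefficients.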
