Tiling and Threading
The API of Integration Wrappers (IW) is designed to simplify tile-based processing of images. Tiling is based on the concept of region of interest (ROI).
Most IW image processing functions operate not only on whole images but also on image areas - ROIs. Image ROI is a rectangular area that is either some part of the image or the whole image.
ROI of an image is defined by the size and offset from the image origin, as shown in the figure below. The origin of an image is in the top left corner, with x values increasing from left to right and y values increasing downwards.
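The offset-plus-size definition can be illustrated without the IW library. The following is a minimal sketch, assuming an interleaved 8-bit image stored row by row; the `Roi` and `Image8u` types and the `roiView` helper are hypothetical names introduced for illustration and are not part of the IW API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical ROI descriptor: offset from the image origin (top left) plus size, in pixels.
struct Roi {
    std::size_t x, y;          // offset of the ROI's top-left corner
    std::size_t width, height; // ROI size
};

// Hypothetical descriptor of an interleaved 8-bit image.
struct Image8u {
    std::uint8_t* data;     // pointer to the pixel at the origin
    std::size_t   step;     // distance between row starts, in bytes
    std::size_t   width;    // width in pixels
    std::size_t   height;   // height in pixels
    std::size_t   channels; // interleaved channels per pixel
};

// Return a view of the ROI: the pointer is shifted by the offset,
// the row step is unchanged, and the size becomes the ROI size.
inline Image8u roiView(const Image8u& img, const Roi& roi)
{
    assert(roi.x + roi.width <= img.width);
    assert(roi.y + roi.height <= img.height);
    Image8u view = img;
    view.data   = img.data + roi.y * img.step + roi.x * img.channels;
    view.width  = roi.width;
    view.height = roi.height;
    return view;
}
```

Because the row step is preserved, the view addresses the same memory as the parent image and no pixels are copied.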
Borders Overlapping
Image filters use the borders concept to correctly process image pixels around the current pixel. A filter kernel can be applied to pixels that are outside of image boundaries, and the function must either extrapolate pixels using one of the border extrapolation methods (replicate, mirror, etc.) or use pixels from memory if the image border physically exists in memory.
Borders can complicate tiling, because for each tile you need to apply proper border InMem flags according to the current tile position relative to the image. If the filter border size is greater than 1 pixel, for some tile positions filter and image borders can overlap, which means that the filter border can be inside and outside of the image at the same time. Intel IPP functions do not support input with undefined borders. In such cases, filtering may result in distorted pixels around the image borders.
Overlapping may happen only if the filter border size is more than 1 pixel and the following conditions are true:
- For left and top borders: tile_size < border_size
- For right and bottom borders: (image_size%tile_size > 0) && (image_size%tile_size < border_size)
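The two conditions can be written as predicates for one axis at a time. This is a sketch that only restates the conditions above; the function names `overlapsNearSide` and `overlapsFarSide` are hypothetical and do not exist in the IW API.

```cpp
#include <cassert>

// Left/top side: overlap is possible when the tile is smaller than the filter border.
inline bool overlapsNearSide(long tileSize, long borderSize)
{
    assert(tileSize > 0 && borderSize >= 0);
    return borderSize > 1 && tileSize < borderSize;
}

// Right/bottom side: overlap is possible when the last (partial) tile
// is non-empty but smaller than the filter border.
inline bool overlapsFarSide(long imageSize, long tileSize, long borderSize)
{
    assert(imageSize > 0 && tileSize > 0 && borderSize >= 0);
    const long remainder = imageSize % tileSize;
    return borderSize > 1 && remainder > 0 && remainder < borderSize;
}
```

For example, with a border size of 2, an image height of 241 and a tile height of 60 leave a remainder of 1 row, so the bottom border can overlap; a height of 240 or 242 does not trigger the condition.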
You can ignore overlapped borders if you do not need the bit-exact quality of tiling around image boundaries. But to provide the same result as without tiling, you must tune the tile size manually to avoid overlapping or use special Integration Wrappers APIs, which can handle this problem for you. For more details, see the sections below.
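One possible way to tune the tile size manually is to grow it until neither overlap condition holds. The sketch below works on a single axis; `tuneTileSize` is a hypothetical helper, not an IW function, and the search strategy is only one of several reasonable choices.

```cpp
#include <algorithm>
#include <cassert>

// Return the smallest tile size >= the requested one for which the filter
// border cannot overlap the image border along this axis.
inline long tuneTileSize(long imageSize, long tileSize, long borderSize)
{
    assert(imageSize > 0 && tileSize > 0 && borderSize >= 0);
    // A tile smaller than the border would overlap on the left/top side.
    for (long t = std::max(tileSize, borderSize); t < imageSize; ++t) {
        const long remainder = imageSize % t;
        // Safe: no partial tile, or the partial tile is at least as large as the border.
        if (remainder == 0 || remainder >= borderSize)
            return t;
    }
    return imageSize; // Fall back to a single tile along this axis.
}
```

Growing the tile rather than shrinking it keeps the number of tiles from exceeding the number originally planned, which matters when one tile is assigned per thread.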
The sections below explain the following IW tiling techniques:
- Manual tiling
- IwiTile-based tiling, including basic tiling and pipeline tiling
Manual tiling
IW functions are designed to be tiled using the IwiTile and IwsTile interfaces for image and signal functions, respectively. However, if automatic tiling with IwiTile is not suitable for some reason, there are special APIs to perform tiling manually.
When using manual tiling you need to:
- Shift images to a correct position for a tile using iwiImage_GetRoiImage
- If necessary, pass correct border InMem flags to a function using iwiTile_GetTileBorder
- If necessary, check the filter border around the image border using iwiTile_CorrectBordersOverlap
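To illustrate what choosing border InMem flags for a tile involves, here is a library-independent sketch. The `BorderFlags` values, the `TileRect` type, and `tileBorderFlags` are hypothetical and do not match the actual Intel IPP constants or the behavior of iwiTile_GetTileBorder in every detail. The sketch assumes a symmetric border of at least 1 pixel and that overlap has already been ruled out.

```cpp
#include <cassert>

// Hypothetical flags: a set bit means the border pixels on that side
// exist in image memory and do not need to be extrapolated.
enum BorderFlags : unsigned {
    InMemLeft   = 1u,
    InMemTop    = 2u,
    InMemRight  = 4u,
    InMemBottom = 8u
};

// Hypothetical tile rectangle in image coordinates.
struct TileRect { long x, y, width, height; };

inline unsigned tileBorderFlags(const TileRect& tile, long imageWidth,
                                long imageHeight, long border)
{
    assert(border >= 1);
    unsigned flags = 0;
    // A side is "in memory" when the image holds at least `border` pixels beyond it.
    if (tile.x >= border)                              flags |= InMemLeft;
    if (tile.y >= border)                              flags |= InMemTop;
    if (tile.x + tile.width + border <= imageWidth)    flags |= InMemRight;
    if (tile.y + tile.height + border <= imageHeight)  flags |= InMemBottom;
    return flags;
}
```

For a 320x240 image split into four full-width tiles of 60 rows with a 1-pixel border, the first tile gets only the bottom flag, the middle tiles get the top and bottom flags, and the last tile gets only the top flag.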
Here is an example of IW threading with OpenMP* using manual tiling:
#include <iostream>
#include "iw++/iw.hpp"
#ifdef _OPENMP
#include <omp.h>
#endif
int main(int, char**)
{
    int fail = 0;
    // Create images
    ipp::IwiImage srcImage, cvtImage, dstImage;
    srcImage.Alloc(ipp::IwiSize(320, 240), ipp8u, 3);
    cvtImage.Alloc(srcImage.m_size, ipp8u, 1);
    dstImage.Alloc(srcImage.m_size, ipp16s, 1);
#ifdef _OPENMP
    int threads = omp_get_max_threads(); // Get the number of threads
#else
    int threads = 4; // Just divide to process by tiles
#endif
    ipp::IwiSize tileSize(dstImage.m_size.width, (dstImage.m_size.height + threads - 1)/threads); // One tile per thread
    ipp::IwiBorderSize sobBorderSize = iwiSizeToBorderSize(iwiMaskToSize(ippMskSize3x3)); // Convert mask size to border size
    ipp::IwiBorderType border = ippBorderRepl;
#ifdef _OPENMP
#pragma omp parallel num_threads(threads)
#endif
    {
        // Declare thread-scope variables
        ipp::IwiBorderType threadBorder;
        ipp::IwiImage srcTile, cvtTile, dstTile;
        try
        {
            // Color convert threading
#ifdef _OPENMP
#pragma omp for
#endif
            for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
            {
                ipp::IwiRoi tile(0, row, tileSize.width, tileSize.height); // Create actual tile rectangle
                // Get images for current ROI
                srcTile = srcImage.GetRoiImage(tile);
                cvtTile = cvtImage.GetRoiImage(tile);
                // Run functions
                ipp::iwiColorConvert(srcTile, iwiColorRGB, cvtTile, iwiColorGray);
            }
            // Sobel threading
#ifdef _OPENMP
#pragma omp for
#endif
            for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
            {
                ipp::IwiRoi tile(0, row, tileSize.width, tileSize.height); // Create actual tile rectangle
                ipp::IwiTile::CorrectBordersOverlap(tile, border, sobBorderSize, cvtImage.m_size); // Check borders overlap and correct the tile if necessary
                threadBorder = ipp::IwiTile::GetTileBorder(tile, border, sobBorderSize, cvtImage.m_size); // Get actual tile border
                // Get images for current ROI
                cvtTile = cvtImage.GetRoiImage(tile);
                dstTile = dstImage.GetRoiImage(tile);
                // Run functions
                ipp::iwiFilterSobel(cvtTile, dstTile, iwiDerivHorFirst, ippMskSize3x3, ipp::IwDefault(), threadBorder);
            }
        }
        catch(...)
        {
            fail = 1;
        }
    }
    if(fail)
    {
        std::cout << "Failure!\n";
        return 1;
    }
    std::cout << "Success!\n";
    return 0;
}
IwiTile-based tiling
IwiTile is the main interface structure for tiling in IW. This interface has two associated APIs:
- Basic tiling API with the iwiTile_ prefix
- Pipeline tiling API with the iwiTilePipeline_ prefix
Most IW image processing functions have the IwiTile parameter. For example, see the API of the iwiFilterSobel function:
iwiFilterSobel(
const IwiImage *pSrcImage,
IwiImage *pDstImage,
IwiDerivativeType opType,
IppiMaskSize kernelSize,
const IwiFilterSobelParams *pAuxParams,
IwiBorderType border,
const Ipp64f *pBorderVal,
const IwiTile *pTile
);
- pSrcImage and pDstImage are initialized with the size of the whole source and destination images, respectively.
- pTile is a pointer to the IwiTile structure. You do not need to shift input/output buffers and check borders manually. The IwiTile initialization function and processing function will place input and output buffers automatically. If you do not need to use tiling, pass NULL to pTile, and the whole image will be processed at once.
If a function does not have the IwiTile parameter, it means that the function cannot be tiled because of algorithmic limitations. You can use manual tiling for such functions, but it may produce incorrect results.
Basic tiling
You can use basic tiling to tile or thread one standalone function or a group of functions without borders. To apply basic tiling, initialize the IwiTile structure with the current tile ROI and pass it to the processing function.
For functions operating with different sizes for source and destination images, use the destination size as a base for tile parameters.
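As an illustration of deriving tile rectangles from the destination size, the sketch below splits a destination image into horizontal strips, one per requested tile, and clips the last strip to the image. The `DstTile` type and `makeRowTiles` helper are hypothetical and are not part of the IW API; the rounding-up division mirrors the tile size computation used in the examples in this section.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical tile rectangle in destination image coordinates.
struct DstTile { long x, y, width, height; };

// Split the destination image into at most `tiles` full-width strips.
inline std::vector<DstTile> makeRowTiles(long dstWidth, long dstHeight, long tiles)
{
    assert(dstWidth > 0 && dstHeight > 0 && tiles > 0);
    const long tileHeight = (dstHeight + tiles - 1) / tiles; // round up
    std::vector<DstTile> result;
    for (long row = 0; row < dstHeight; row += tileHeight) {
        // The last strip may be shorter than the others.
        const long height = std::min(tileHeight, dstHeight - row);
        result.push_back(DstTile{0, row, dstWidth, height});
    }
    return result;
}
```

With a 320x241 destination and 4 tiles, this yields three strips of 61 rows and a final strip of 58 rows.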
Here is an example of IW threading with OpenMP* using basic tiling with IwiTile:
#include <iostream>
#include "iw++/iw.hpp"
#ifdef _OPENMP
#include <omp.h>
#endif
int main(int, char**)
{
    int fail = 0;
    // Create images
    ipp::IwiImage srcImage, cvtImage, dstImage;
    srcImage.Alloc(ipp::IwiSize(320, 240), ipp8u, 3);
    cvtImage.Alloc(srcImage.m_size, ipp8u, 1);
    dstImage.Alloc(srcImage.m_size, ipp16s, 1);
#ifdef _OPENMP
    int threads = omp_get_max_threads(); // Get the number of threads
#else
    int threads = 4; // Just divide to process by tiles
#endif
    ipp::IwiSize tileSize(dstImage.m_size.width, (dstImage.m_size.height + threads - 1)/threads); // One tile per thread
    ipp::IwiBorderType border = ippBorderRepl;
#ifdef _OPENMP
#pragma omp parallel num_threads(threads)
#endif
    {
        // Declare thread-scope variables
        ipp::IwiRoi roi;
        try
        {
            // Color convert threading
#ifdef _OPENMP
#pragma omp for
#endif
            for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
            {
                // Run functions with the current tile rectangle
                ipp::iwiColorConvert(srcImage, iwiColorRGB, cvtImage, iwiColorGray, IwValueMax, ipp::IwDefault(), ipp::IwiRoi(0, row, tileSize.width, tileSize.height));
            }
            // Sobel threading
#ifdef _OPENMP
#pragma omp for
#endif
            for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
            {
                // Run functions with the current tile rectangle
                ipp::iwiFilterSobel(cvtImage, dstImage, iwiDerivHorFirst, ippMskSize3x3, ipp::IwDefault(), border, ipp::IwiRoi(0, row, tileSize.width, tileSize.height));
            }
        }
        catch(...)
        {
            fail = 1;
        }
    }
    if(fail)
    {
        std::cout << "Failure!\n";
        return 1;
    }
    std::cout << "Success!\n";
    return 0;
}
Pipeline tiling
With the IwiTile interface, you can tile a whole pipeline by applying the current tile to the entire pipeline at once instead of tiling each function one by one. This requires border handling and tracking of pipeline dependencies, which increases the complexity of the API. When used properly, however, pipeline tiling can improve the scalability of threading or the performance of non-threaded functions by keeping all operations inside the CPU cache.
Here are some important details that you should take into account when performing pipeline tiling:
- Pipeline tiling is performed in reverse order, from destination to source. Therefore:
  - Use the tile size based on the destination image size
  - Initialize the IwiTile structure with iwiTilePipeline_Init for the last operation
  - Initialize the IwiTile structure for other operations from the last to the first with iwiTilePipeline_InitChild
- Obtain the border size for each operation from its mask size, kernel size, or using the specific function returning the border size, if any.
- In case of threading, copy initialized IwiTile structures to a local thread or initialize them on a per-thread basis. Access to structures is not thread-safe.
- Do not exceed the maximum tile size specified during initialization. Otherwise, this can lead to buffer overflow.
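The reverse order can be understood through a small library-independent sketch: starting from the destination tile, each stage's required output region is the next stage's output region expanded by that next stage's border. The code below models this for rows only. `RowSpan` and `backPropagate` are hypothetical names, and the sketch is an assumption about the general technique rather than a description of how IW implements it internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical half-open row range [begin, end) in image coordinates.
struct RowSpan { long begin, end; };

// stageBorders[i] is the vertical border size of stage i (0 = first operation).
// Returns, for every stage, the rows it must produce so that the last stage
// can complete dstTile. Spans are clipped to the image.
inline std::vector<RowSpan> backPropagate(RowSpan dstTile,
                                          const std::vector<long>& stageBorders,
                                          long imageHeight)
{
    assert(dstTile.begin >= 0 && dstTile.end <= imageHeight);
    std::vector<RowSpan> produce(stageBorders.size());
    RowSpan need = dstTile;
    // Walk from the last operation to the first.
    for (std::size_t i = stageBorders.size(); i-- > 0;) {
        produce[i] = need; // rows stage i must write
        // Stage i reads its border on both sides, so its input is larger.
        need.begin = std::max(0L, need.begin - stageBorders[i]);
        need.end   = std::min(imageHeight, need.end + stageBorders[i]);
    }
    return produce;
}
```

For a two-stage pipeline of a color conversion (border 0) followed by a 3x3 filter (border 1) on a 240-row image, a destination tile covering rows 60 to 119 requires the conversion to produce rows 59 to 120. This is why the intermediate buffer in a tiled pipeline is generally larger than the destination tile.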
The following example demonstrates IW threading with OpenMP* using IwiTile pipeline tiling.
#include <iostream>
#include "iw++/iw.hpp"
#ifdef _OPENMP
#include <omp.h>
#endif
int main(int, char**)
{
    int fail = 0;
    // Create images
    ipp::IwiImage srcImage, dstImage;
    srcImage.Alloc(ipp::IwiSize(320, 240), ipp8u, 3);
    dstImage.Alloc(srcImage.m_size, ipp16s, 1);
#ifdef _OPENMP
    int threads = omp_get_max_threads(); // Get the number of threads
#else
    int threads = 4; // Just divide to process by tiles
#endif
    ipp::IwiSize tileSize(dstImage.m_size.width, (dstImage.m_size.height + threads - 1)/threads); // One tile per thread
    ipp::IwiBorderSize sobBorderSize = iwiSizeToBorderSize(iwiMaskToSize(ippMskSize3x3)); // Convert mask size to border size
    ipp::IwiBorderType border = ippBorderRepl;
#ifdef _OPENMP
#pragma omp parallel num_threads(threads)
#endif
    {
        // Declare thread-scope variables
        ipp::IwiImage cvtImage;
        ipp::IwiTilePipeline roiConvert, roiSobel;
        try
        {
            roiSobel.Init(tileSize, dstImage.m_size, border, sobBorderSize); // Initialize last operation ROI first
            roiConvert.InitChild(roiSobel); // Initialize next operation as a dependent
            // Allocate intermediate buffer
            cvtImage.Alloc(roiConvert.GetDstBufferSize(), ipp8u, 1);
            // Joined pipeline threading
#ifdef _OPENMP
#pragma omp for
#endif
            for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
            {
                roiSobel.SetTile(ipp::IwiRoi(0, row, tileSize.width, tileSize.height)); // Set IwiRoi chain to current tile coordinates
                // Run functions
                ipp::iwiColorConvert(srcImage, iwiColorRGB, cvtImage, iwiColorGray, IwValueMax, ipp::IwDefault(), roiConvert);
                ipp::iwiFilterSobel(cvtImage, dstImage, iwiDerivHorFirst, ippMskSize3x3, ipp::IwDefault(), border, roiSobel);
            }
        }
        catch(...)
        {
            fail = 1;
        }
    }
    if(fail)
    {
        std::cout << "Failure!\n";
        return 1;
    }
    std::cout << "Success!\n";
    return 0;
}
Product and Performance Information |
---|
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Notice revision #20201201 |