End-to-End Object Detection for Unity* with IceVision* and OpenVINO™ Toolkit Part 2

ID 763014
Updated 11/17/2022
Version Latest
Public

author-image

By

By Christian Mills, with Introduction by Peter Cross

Overview

Part 1 of this tutorial series covered fine-tuning an object detection model using the IceVision library and exporting it as an OpenVINO™ IR model. This part covers creating a dynamic link library (DLL) file in Visual Studio to perform inference with this model using OpenVINO™ toolkit.

Important: This post assumes Visual Studio is present on your system.

Install OpenVINO™ Toolkit

We need to download the OpenVINO™ Toolkit before creating our Visual Studio project. Go to the OpenVINO™ toolkit download page linked below.

Download OpenVINO™ Toolkit

Select the options outlined in the image below and click the Download button.

Double-click the file once it finishes downloading and click the Extract button in the popup window.

The installer will then verify the computer meets the system requirements. The toolkit includes the Python scripts for converting models, which require Python 3.6, 3.7, 3.8, or 3.9 to run. We will only use the files for C++ development in this post.

 We can stick with the default Recommended Installation option.

The installer will then ask whether Intel can collect some information before starting the installation process.

Click Finish once the installation process completes.

Inspect OpenVINO™ Folder

If we look at the installation folder for the toolkit, we can see it also includes a version of OpenCV. We will use OpenCV to prepare image data from Unity* before feeding it to the model.

I like to copy the OpenVINO™ folder to a separate directory with other dependencies for my C++ projects.

Now, we can create our Visual Studio DLL project.

Create DLL Project

Open Visual Studio and select the Create a new project option.

Type DLL into the text box and select the Dynamic-Link Library (DLL) option. This option automatically configures a few parameters for us compared to starting with a standard console application.

Choose a name and location for the project and click the Create button. By default, the DLL file will use the project name.

Configure the Project

At the top of the window, open the Solution Configurations dropdown menu, and select Release.

Then, open the Solution Platform dropdown menu and select x64.

Add Include Directories

We need to tell Visual Studio where OpenVINO™ and OpenCV are so we can access their APIs. Right-click the project name in the Solution Explorer panel.

Select the Properties option in the popup menu.

In the Properties Window, open on the C/C++ dropdown. Select the Additional Include Directories section and click on <Edit..> in the dropdown.

Add the paths for the following folders, replacing <parent-folder-path> with the full path to the parent folder for the OpenVINO Toolkit, and click OK.

  • <parent-folder-path>\openvino_2022.1.0.643\runtime\include\ie
  • <parent-folder-path>\openvino_2022.1.0.643\runtime\include
  • <parent-folder-path>\openvino_2022.1.0.643\opencv\include
  • <parent-folder-path>\openvino_2022.1.0.643\runtime\3rdparty\tbb\include

Link Libraries

Next, open the Linker dropdown in the Properties window and select Input. Select Additional Dependencies and click <Edit..>.

Add the paths to the following files, replacing <parent-folder-path> with the full path to the parent folder for the OpenVINO Toolkit, and click OK.

  • <parent-folder-path>\openvino_2022.1.0.643\opencv\lib\*
  • <parent-folder-path>\openvino_2022.1.0.643\runtime\lib\intel64\Release\*
  • <parent-folder-path>\openvino_2022.1.0.643\runtime\3rdparty\tbb\lib\*.lib

Post Build Events

Our DLL file will depend on the following DLL files included with the OpenVINO™ and OpenCV libraries.

OpenCV DLL files

OpenVINO™ DLL files

We can add a post-build event in Visual Studio to automatically copy these DLL files to the build folder for the project at compile time. Open the Build Events dropdown in the Properties window and select Post-Build Event. Select Command Line and click <Edit..>.

Add the following commands, replacing <parent-folder-path> with the full path to the parent folder for the OpenVINO Toolkit, and click OK.

  • xcopy <parent-folder-path>\openvino_2022.1.0.643\opencv\bin\opencv_core453.dll $(SolutionDir)$(Platform)\$(Configuration)\ /c /y
  • xcopy <parent-folder-path>\openvino_2022.1.0.643\opencv\bin\opencv_imgproc453.dll $(SolutionDir)$(Platform)\$(Configuration)\ /c /y
  • xcopy <parent-folder-path>\openvino_2022.1.0.643\opencv\bin\opencv_imgcodecs453.dll $(SolutionDir)$(Platform)\$(Configuration)\ /c /y
  • xcopy <parent-folder-path>\openvino_2022.1.0.643\runtime\bin\intel64\Release\* $(SolutionDir)$(Platform)\$(Configuration)\ /c /y
  • xcopy <parent-folder-path>\openvino_2022.1.0.643\runtime\3rdparty\tbb\bin\tbb.dll $(SolutionDir)$(Platform)\$(Configuration)\ /c /y

Finally, click the Apply button and close the Properties window.

With the dependencies taken care of, we can start modifying the code.

Update Precompiled Header File

We will first update the pch.h Precompiled Header file with the required header files. We can open the pch.h file by selecting it in the Solution Explorer window.

Comment or remove the #include line for the framework.h header file.

// pch.h: This is a precompiled header file.
// Files listed below are compiled only once, improving build performance for future builds.
// This also affects IntelliSense performance, including code completion and many code browsing features.
// However, files listed here are ALL re-compiled if any one of them is updated between builds.
// Do not add files here that you will be updating frequently as this negates the performance advantage.

#ifndef PCH_H
#define PCH_H

// add headers that you want to pre-compile here
//#include "framework.h"

#endif //PCH_H

Add required header files

Next, we will add the required header files for OpenVINO and OpenCV below //#include "framework.h" line.

// pch.h: This is a precompiled header file.
// Files listed below are compiled only once, improving build performance for future builds.
// This also affects IntelliSense performance, including code completion and many code browsing features.
// However, files listed here are ALL re-compiled if any one of them is updated between builds.
// Do not add files here that you will be updating frequently as this negates the performance advantage.

#ifndef PCH_H
#define PCH_H

// add headers that you want to pre-compile here
//#include "framework.h"

#include "openvino/openvino.hpp"
#include <opencv2/opencv.hpp>

#endif //PCH_H

Update dllmain File

By default, the dllmain.cpp file contains the following code.

// dllmain.cpp : Defines the entry point for the DLL application.
#include "pch.h"

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}

We can delete everything below the #include "pch.h" line.

Create a macro to mark functions we want to make accessible in Unity*

// dllmain.cpp : Defines the entry point for the DLL application.
#include "pch.h"


// Create a macro to quickly mark a function for export
#define DLLExport __declspec (dllexport)

Wrap the code in extern "C" to prevent name-mangling issues with the compiler

The rest of our code will go inside here.

// Wrap code to prevent name-mangling issues
extern "C" {

}

Define variables

Inside the wrapper, we will declare the persistent variables needed for the DLL.

// Inference engine instance
ov::Core core;
// The user define model representation
std::shared_ptr<ov::Model> model;
// A device-specific compiled model
ov::CompiledModel compiled_model;

// List of available compute devices
std::vector<std::string> available_devices;
// An inference request for a compiled model
ov::InferRequest infer_request;
// Stores the model input data
ov::Tensor input_tensor;
// A pointer for accessing the input tensor data
float* input_data;

// model has only one output
ov::Tensor output_tensor;
// A pointer for accessing the output tensor data
float* out_data;

// The current source image width
int img_w;
// The current source image height
int img_h;
// The current model input width
int input_w;
// The current model input height
int input_h;
// The total number pixels in the input image
int n_pixels;
// The number of color channels 
int n_channels = 3;

// Stores information about a single object prediction
struct Object
{
    float x0;
    float y0;
    float width;
    float height;
    int label;
    float prob;
};

// Store grid offset and stride values to decode a section of the model output
struct GridAndStride
{
    int grid0;
    int grid1;
    int stride;
};

// The scale values used to adjust the model output to the source image resolution
float scale_x;
float scale_y;

// The minimum confidence score to consider an object proposal
float bbox_conf_thresh = 0.3;
// The maximum intersection over union value before an object proposal will be ignored
float nms_thresh = 0.45;

// Stores the grid and stride values to navigate the raw model output
std::vector<GridAndStride> grid_strides;
// Stores the object proposals with confidence scores above bbox_conf_thresh
std::vector<Object> proposals;
// Stores the indices for the object proposals selected using non-maximum suppression
std::vector<int> proposal_indices;

// The stride values used to generate the gride_strides vector
std::vector<int> strides = { 8, 16, 32 };

Define a function to get the number of compute devices

The first function we'll define will create a list of available device names and return the number of devices accessible by OpenVINO. We'll use this information to select which device to use to perform inference from the Unity application. There might be an option named GNA (Gaussian & Neural Accelerator). GNA is a highly specialized neural coprocessor for tasks like noise cancellation. We'll exclude it from the list of devices presented to the end user.

  • ov::Core::get_available_devices(): Returns devices available for inference
/// <summary>
/// Get the number of available compute devices
/// </summary>
/// <returns>The number of available devices</returns>
DLLExport int GetDeviceCount() 
{
    // Reset list of available compute devices
    available_devices.clear();

    // Populate list of available compute devices
    for (std::string device : core.get_available_devices()) {
        // Skip GNA device
        if (device.find("GNA") == std::string::npos) {
            available_devices.push_back(device);
        }
    }
    // Return the number of available compute devices
    return available_devices.size();
}

Define a function to get the name of a compute device

Next, we will define a function to return the name of a device at a specified index for the list of available devices.

/// <summary>
/// Get the name of the compute device name at the specified index
/// </summary>
/// <param name="index"></param>
/// <returns>The name of the device at the specified index</returns>
DLLExport std::string* GetDeviceName(int index) {
    return &available_devices[index];
}

Define method to generate stride values to navigate the raw model output

The method for generating the offset values used to traverse the output array is almost identical to the Python implementation from part 1.

/// <summary>
/// Generate offset values to navigate the raw model output
/// </summary>
/// <param name="height">The model input height</param>
/// <param name="width">The model input width</param>
void GenerateGridsAndStride(int height, int width)
{
    // Remove the values for the previous input resolution
    grid_strides.clear();

    // Iterate through each stride value
    for (auto stride : strides)
    {
        // Calculate the grid dimensions
        int grid_height = height / stride;
        int grid_width = width / stride;

        // Store each combination of grid coordinates
        for (int g1 = 0; g1 < grid_height; g1++)
        {
            for (int g0 = 0; g0 < grid_width; g0++)
            {
                grid_strides.push_back(GridAndStride{ g0, g1, stride });
            }
        }
    }
}

Define a function to set the minimum confidence score from Unity*

We might want to try different confidence thresholds for keeping object proposals from the Unity application, so we will add a function to enable this.

/// <summary>
/// Set minimum confidence score for keeping bounding box proposals
/// </summary>
/// <param name="min_confidence">The minimum confidence score for keeping bounding box proposals</param>
DLLExport void SetConfidenceThreshold(float min_confidence)
{
    bbox_conf_thresh = min_confidence;
}

Define a function to load an OpenVINO model

OpenVINO™ needs to compile models for the target device. This process can take several seconds when using GPU inference. We can create a cache directory, so we only need to compile models for a specific resolution-device pair once.

We'll place the code for loading an OpenVINO™ model inside a try-catch block to avoid crashing the application if we pass an incorrect file path.

If the model loads successfully, we will attempt to reshape the model input to the desired input dimensions. After reshaping the model input, we can compile the model for the target device.

We can get pointers to the model input tensor and create an inference request using the compiled model.

  • ov::Core::set_property(): Sets properties for a device
  • ov::Core::read_model(): Reads models from IR/ONNX/PDPD formats
  • ov::Model::reshape(): Updates input shapes and propagates them down to the outputs of the model through all intermediate layers
  • ov::Core::compile_model(): Creates a compiled model from a source model object
  • ov::CompiledModel::create_infer_request(): Creates an inference request object used to infer the compiled model
  • ov::InferRequest::get_input_tensor(): Gets an input tensor for inference
/// <summary>
/// Load a model from the specified file path
/// </summary>
/// <param name="model_path">The full model path to the OpenVINO IR model</param>
/// <param name="index">The index for the available_devices vector</param>
/// <param name="image_dims">The source image dimensions</param>
/// <returns>A status value indicating success or failure to load and reshape the model</returns>
DLLExport int LoadModel(char* model_path, int index, int image_dims[2]) 
{

    int return_val = 0;
    // Set the cache directory for compiled GPU models
    core.set_property("GPU", ov::cache_dir("cache"));

    // Try loading the specified model
    try { model = core.read_model(model_path); }
    catch (...) { return 1; }

    // The dimensions of the source input image
    img_w = image_dims[0];
    img_h = image_dims[1];
    // Calculate new input dimensions based on the max stride value
    input_w = (int)(strides.back() * std::roundf(img_w / strides.back()));
    input_h = (int)(strides.back() * std::roundf(img_h / strides.back()));
    n_pixels = input_w * input_h;

    // Calculate the value used to adjust the model output to the source image resolution
    scale_x = input_w / (img_w * 1.0);
    scale_y = input_h / (img_h * 1.0);

    // Generate the grid and stride values based on input resolution
    grid_strides.clear();
    GenerateGridsAndStride(input_h, input_w);

    // Try updating the model input dimensions
    try { model->reshape({ 1, 3, input_h, input_w }); }
    catch (...) { return_val = 2; }

    // Create a compiled model for the taret compute device
    auto compiled_model = core.compile_model(model, "MULTI",
                                             ov::device::priorities(available_devices[index]),
                                             ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY),
                                             ov::hint::inference_precision(ov::element::f32));

    // Create an inference request
    infer_request = compiled_model.create_infer_request();

    // Get input tensor by index
    input_tensor = infer_request.get_input_tensor(0);
    // Get a pointer to the input tensor data
    input_data = input_tensor.data<float>();

    // Get output tensor
    output_tensor = infer_request.get_output_tensor();
    // Get a pointer to the output tensor data
    out_data = output_tensor.data<float>();

    // Replace the initial input dims with the updated values
    image_dims[0] = input_w;
    image_dims[1] = input_h;

    // Return a value of 0 if the model loads successfully
    return return_val;
}

Define method to generate object detection proposals from the raw model output

The method to generate object proposals is nearly identical to the Python* implementation from part 1.

/// <summary>
/// Generate object detection proposals from the raw model output
/// </summary>
/// <param name="out_ptr">A pointer to the output tensor data</param>
void GenerateYoloxProposals(float* out_ptr, int proposal_length)
{
    // Remove the proposals for the previous model output
    proposals.clear();

    // Obtain the number of classes the model was trained to detect
    int num_classes = proposal_length - 5;

    for (int anchor_idx = 0; anchor_idx < grid_strides.size(); anchor_idx++)
    {
        // Get the current grid and stride values
        int grid0 = grid_strides[anchor_idx].grid0;
        int grid1 = grid_strides[anchor_idx].grid1;
        int stride = grid_strides[anchor_idx].stride;

        // Get the starting index for the current proposal
        int start_idx = anchor_idx * proposal_length;

        // Get the coordinates for the center of the predicted bounding box
        float x_center = (out_ptr[start_idx + 0] + grid0) * stride;
        float y_center = (out_ptr[start_idx + 1] + grid1) * stride;

        // Get the dimensions for the predicted bounding box
        float w = exp(out_ptr[start_idx + 2]) * stride;
        float h = exp(out_ptr[start_idx + 3]) * stride;

        // Calculate the coordinates for the upper left corner of the bounding box
        float x0 = x_center - w * 0.5f;
        float y0 = y_center - h * 0.5f;

        // Get the confidence score that an object is present
        float box_objectness = out_ptr[start_idx + 4];

        // Initialize object struct with bounding box information
        Object obj = { x0, y0, w, h, 0, 0 };

        // Find the object class with the highest confidence score
        for (int class_idx = 0; class_idx < num_classes; class_idx++)
        {
            // Get the confidence score for the current object class
            float box_cls_score = out_ptr[start_idx + 5 + class_idx];
            // Calculate the final confidence score for the object proposal
            float box_prob = box_objectness * box_cls_score;

            // Check for the highest confidence score
            if (box_prob > obj.prob)
            {
                obj.label = class_idx;
                obj.prob = box_prob;
            }
        }

        // Only add object proposals with high enough confidence scores
        if (obj.prob > bbox_conf_thresh) proposals.push_back(obj);
    }

    // Sort the proposals based on the confidence score in descending order
    std::sort(proposals.begin(), proposals.end(), [](Object& a, Object& b) -> bool
              { return a.prob > b.prob; });
}

Define function to sort bounding box proposals using Non-Maximum Suppression

The C++ API for OpenCV has built-in functionality to perform comparison operations between rectangles. Therefore, we don't need to define helper functions to calculate the intersection and union areas of two bounding boxes. Otherwise, the method to sort bounding box proposals using Non-Maximum Suppression is almost identical to the Python* implementation from part 1.

/// <summary>
/// Filter through a sorted list of object proposals using Non-maximum suppression
/// </summary>
void NmsSortedBboxes()
{
    // Remove the picked proposals for the previous model outptut
    proposal_indices.clear();

    // Iterate through the object proposals
    for (int i = 0; i < proposals.size(); i++)
    {
        Object& a = proposals[i];

        // Create OpenCV rectangle for the Object bounding box
        cv::Rect_<float> rect_a = cv::Rect_<float>(a.x0, a.y0, a.width, a.height);

        bool keep = true;

        // Check if the current object proposal overlaps any selected objects too much
        for (int j : proposal_indices)
        {
            Object& b = proposals[j];

            // Create OpenCV rectangle for the Object bounding box
            cv::Rect_<float> rect_b = cv::Rect_<float>(b.x0, b.y0, b.width, b.height);

            // Calculate the area where the two object bounding boxes overlap
            float inter_area = (rect_a & rect_b).area();
            // Calculate the union area of both bounding boxes
            float union_area = rect_a.area() + rect_b.area() - inter_area;
            // Ignore object proposals that overlap selected objects too much
            if (inter_area / union_area > nms_thresh)
                keep = false;
        }

        // Keep object proposals that do not overlap selected objects too much
        if (keep) proposal_indices.push_back(i);
    }
}	

Define a function to perform inference

We will access the pixel data for the input image from Unity with a pointer to a uchar (unsigned 1-byte integer) array and wrap the data in a cv::Mat variable for processing.

We don't need to normalize the input image since the IR model does it internally.

After processing the model output, we'll return the final number of detected objects to Unity so we can initialize an array of objects.

/// <summary>
/// Perform inference with the provided texture data
/// </summary>
/// <param name="image_data">The source image data from Unity</param>
/// <returns>The final number of detected objects</returns>
DLLExport int PerformInference(uchar* image_data) 
{

    // Store the pixel data for the source input image in an OpenCV Mat
    cv::Mat input_image = cv::Mat(img_h, img_w, CV_8UC4, image_data);

    // Remove the alpha channel
    cv::cvtColor(input_image, input_image, cv::COLOR_RGBA2RGB);

    // Resize the input image
    cv::resize(input_image, input_image, cv::Size(input_w, input_h));

    // Iterate over each pixel in image
    for (int p = 0; p < n_pixels; p++)
    {
        input_data[0*n_pixels + p] = input_image.data[p*n_channels + 0] / 255.0f;
        input_data[1*n_pixels + p] = input_image.data[p*n_channels + 1] / 255.0f;
        input_data[2*n_pixels + p] = input_image.data[p*n_channels + 2] / 255.0f;
    }

    // Perform inference
    infer_request.infer();

    // Generate new proposals for the current model output
    GenerateYoloxProposals(out_data, output_tensor.get_shape()[2]);

    // Pick detected objects to keep using Non-maximum Suppression
    NmsSortedBboxes();

    // return the final number of detected objects
    return (int)proposal_indices.size();
}

Define a function to populate an array of objects from Unity

Next, we will define a function to populate an array of objects from Unity. We call this function after initializing the list based on the current number of detected objects. We will also scale the bounding box information from the input dimensions to the source image resolution.

/// <summary>
/// Fill the provided array with the detected objects
/// </summary>
/// <param name="objects">A pointer to a list of objects from Unity</param>
DLLExport void PopulateObjectsArray(Object* objects) 
{

    for (int i = 0; i < proposal_indices.size(); i++)
    {
        Object obj = proposals[proposal_indices[i]];

        // Adjust offset to source image resolution and clamp the bounding box
        objects[i].x0 = std::min(obj.x0 / scale_x, (float)img_w);
        objects[i].y0 = std::min(obj.y0 / scale_y, (float)img_h);
        objects[i].width = std::min(obj.width / scale_x, (float)img_w);
        objects[i].height = std::min(obj.height / scale_y, (float)img_h);

        objects[i].label = obj.label;
        objects[i].prob = obj.prob;
    }
}

Define a function to reset the vectors when the Unity application exits

This last function will free the memory allocated by the vectors. We will call it when the Unity application shuts down.

/// <summary>
/// Reset vectors
/// </summary>
DLLExport void FreeResources() 
{
    available_devices.clear();
    grid_strides.clear();
    proposals.clear();
    proposal_indices.clear();
}

That is all the code needed for the plugin. We can now build the solution to generate the DLL file.

Build Solution

Open the Build menu at the top of the Visual Studio window and click Build Solution. Visual Studio will generate a new x64 folder in the project directory containing the DLL file and its dependencies.

Gather Dependencies

Right-click the project name in the Solution Explorer panel and select Open Folder in File Explorer from the popup menu.

In the new File Explorer window, go to the parent folder.

Open the x64 → Release subfolder.

We will need to copy all the DLL files in this folder and the plugins.xml file to the Unity* project.

Summary

This post covered creating a dynamic link library (DLL) file to perform inference with a YOLOX model using OpenVINO™ toolkit. In part 3, we build a project in Unity* that uses this DLL.

In This Series

End-to-End Object Detection for Unity* with IceVision* and OpenVINO™ Toolkit Part 1 

End-to-End Object Detection for Unity* with IceVision* and OpenVINO™ Toolkit Part 2 

End-to-End Object Detection for Unity* with IceVision* and OpenVINO™ Toolkit Part 3 

Beginner Tutorial

Fastai to Unity* Beginner Tutorial Pt. 1

Related Links

In-Game Style Transfer Tutorial in Unity*

Targeted GameObject Style Transfer Tutorial in Unity*

Developing OpenVINO™ Toolkit Inferencing Plugin for Unity* Tutorial

Developing OpenVINO™ Toolkit Object Detection Inferencing Plugin for Unity*

Project Resources

 GitHub Repository