Quick step to improve the model load time on GPU
Loading an input model's Intermediate Representation (IR) to GPU takes longer than loading the same model to a CPU.
Manually create a cl_cache directory in the working directory of your application.
The driver will use this directory to store the binary representations of the compiled kernels. This works on all supported operating systems.
Refer to this article for more information on managing the cl_cache.
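As an illustration, here is a minimal sketch that creates the cl_cache directory and then loads an IR model to the GPU, assuming the OpenVINO™ Inference Engine Python API (IECore); the model.xml and model.bin file names are placeholders for your own model.

    import os
    from openvino.inference_engine import IECore

    # Create the cl_cache directory in the application's working directory.
    # The driver stores the compiled-kernel binaries here on the first load.
    os.makedirs("cl_cache", exist_ok=True)

    ie = IECore()

    # Placeholder IR file names -- substitute the paths to your own model.
    net = ie.read_network(model="model.xml", weights="model.bin")

    # Compiles the OpenCL kernels on the first run; later runs of the
    # application reuse the binaries cached in cl_cache.
    exec_net = ie.load_network(network=net, device_name="GPU")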
Loading a model in Intermediate Representation (IR) format to the GPU takes longer than loading the same model to the CPU because the GPU stack is based on OpenCL*, and the load time depends on how long it takes to compile the OpenCL* kernels.
With cl_cache enabled, the first load of the model still takes a long time because the OpenCL* kernels must be compiled and cached. Every subsequent load of the same model is much faster, because the driver reuses the cached binaries instead of recompiling the kernels.
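To verify the speedup, you can time the load step yourself. The sketch below (again assuming the IECore Python API and placeholder model.xml and model.bin files) prints how long load_network takes; run the script twice from the same working directory and compare the first run, which compiles and caches the kernels, with the second, which reuses the binaries from cl_cache.

    import os
    import time
    from openvino.inference_engine import IECore

    os.makedirs("cl_cache", exist_ok=True)  # enable kernel caching

    ie = IECore()
    net = ie.read_network(model="model.xml", weights="model.bin")  # placeholder paths

    # Measure how long it takes to load (compile) the model for the GPU.
    start = time.perf_counter()
    exec_net = ie.load_network(network=net, device_name="GPU")
    print(f"GPU load time: {time.perf_counter() - start:.2f} s")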