When running OpenCL™ designs built with versions 19.1 and 19.2 of the Intel® FPGA SDK for OpenCL™, customers may find that kernel execution and memory data transfers do not run concurrently, even when the host code defines no dependency between them. For example, a host program may issue clEnqueueWriteBuffer and clEnqueueNDRangeKernel on different command queues with no event dependency between the two calls. Even so, the profile report shows the data transfer and the kernel execution running one after the other rather than in parallel.
This is caused by a bug in versions 19.1 and 19.2 of the Intel® FPGA SDK for OpenCL™. The bug delays kernel launches whenever the host and a kernel access DDR memory concurrently, even when they access different regions of DDR memory.
This incorrect kernel-launch dependency is fixed in version 19.3 of the Intel® FPGA SDK for OpenCL™.