Visible to Intel only — GUID: GUID-63761051-E00B-471C-88DA-BC1A9427EC1B
Writing to a Shared Resource
According to the OpenCL™ specification, commands that change the content of a shared memory object must complete in their own command queue before the memory object is used by commands executed in another command queue. One way to achieve this is to use events:
    cl_event eventGuard;
    cl_mem bufferShared = clCreateBuffer(shared_context, CL_MEM_READ_WRITE…);
    // Populate the buffer from the host; the queue is a regular in-order queue
    clEnqueueWriteBuffer(cpu_queue, bufferShared,…);
    // Set the arguments and process the buffer with a kernel
    clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&bufferShared);
    …
    clEnqueueNDRangeKernel(cpu_queue, kernel, … &eventGuard);
    // Make sure the first device is done
    clWaitForEvents(1, &eventGuard);
    // Alternatively, you can use clFinish(cpu_queue) if in the same thread
    …
    // Now use the buffer on the second device
    clEnqueueWriteBuffer(gpu_queue, bufferShared,…);
    clEnqueueNDRangeKernel(gpu_queue, kernel, … &eventGuard);
    …
If you want to write data (or output kernel results) to the same buffer simultaneously on two devices, use properly aligned, non-overlapping sub-buffers.
    cl_mem bufferShared = clCreateBuffer(shared_context, CL_MEM_WRITE…);
    // Query the alignment required by the respective device
    cl_uint gpu_align;
    clGetDeviceInfo(gpuDeviceId, CL_DEVICE_MEM_BASE_ADDR_ALIGN,…&gpu_align);
    gpu_align /= 8; // in bytes (the query reports bits)
    // Make sure that cpuPortion is properly aligned first!
    cl_int err;
    cl_buffer_region cpuBufferRegion = { 0, cpuPortion };
    cl_buffer_region gpuBufferRegion = { cpuPortion, theRest };
    cl_mem subbufferCPU = clCreateSubBuffer(bufferShared, 0,
        CL_BUFFER_CREATE_TYPE_REGION, &cpuBufferRegion, &err);
    cl_mem subbufferGPU = clCreateSubBuffer(bufferShared, 0,
        CL_BUFFER_CREATE_TYPE_REGION, &gpuBufferRegion, &err);
    // Now work with the 2 sub-buffers on 2 devices simultaneously
    // (refer to the previous section)
    …
    // The sub-resources should be released properly
    clReleaseMemObject(subbufferCPU);
    clReleaseMemObject(subbufferGPU);
    clReleaseMemObject(bufferShared);
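The listing above leaves the alignment of cpuPortion to the reader. The following is a minimal sketch of one way to compute it, assuming the split is expressed as a percentage of the whole buffer. The helper names align_down and split_cpu_portion are hypothetical and not part of the OpenCL API; the only OpenCL fact relied on is that CL_DEVICE_MEM_BASE_ADDR_ALIGN is reported in bits. The sketch is plain C and does not call the OpenCL runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round `size` down to a multiple of `align_bytes`. */
static size_t align_down(size_t size, size_t align_bytes)
{
    assert(align_bytes > 0);
    return size - (size % align_bytes);
}

/*
 * Hypothetical helper: choose the size of the CPU sub-buffer.
 *
 * total_bytes        size of the shared buffer
 * cpu_share_percent  desired CPU share of the buffer, 0..100 (an assumption
 *                    of this sketch; any load-balancing policy could be used)
 * align_bits         value returned for CL_DEVICE_MEM_BASE_ADDR_ALIGN by the
 *                    device that owns the second sub-buffer (reported in bits)
 *
 * The result is also the origin of the second sub-buffer, so it is rounded
 * down to the alignment boundary. The remainder is total_bytes - result.
 */
static size_t split_cpu_portion(size_t total_bytes,
                                unsigned cpu_share_percent,
                                unsigned align_bits)
{
    assert(cpu_share_percent <= 100);
    assert(align_bits >= 8 && align_bits % 8 == 0);

    size_t align_bytes = align_bits / 8; /* bits to bytes */

    /* Compute the percentage without overflowing for large buffers. */
    size_t wanted = (total_bytes / 100) * cpu_share_percent
                  + (total_bytes % 100) * cpu_share_percent / 100;

    return align_down(wanted, align_bytes);
}
```

With these helpers, cpuPortion could be set to split_cpu_portion(bufferSize, 30, alignBits) and theRest to bufferSize minus that value, where bufferSize and alignBits stand for the buffer size and the queried alignment. For small buffers the result can round down to zero, which is not a valid sub-buffer size, so that case needs to be handled separately, for example by running on a single device.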
See Also
The OpenCL™ 1.2 Specification at http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf