OpenMP Execution Model
The OpenMP execution model has a single host device and may have multiple target devices. A device is a logical execution engine with its own local storage and data environment.
When executing on Intel® Data Center GPU Max Series, either the entire GPU (which may have multiple stacks) can be treated as a single device, or each stack can be treated as a separate device.
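As a rough illustration of how devices appear to a program, the sketch below asks the runtime how many target devices it sees and runs a trivial target region on each one. The function names count_target_devices and count_offloaded_regions are hypothetical, introduced only for this example; omp_get_num_devices and omp_is_initial_device are standard OpenMP runtime routines. Whether a multi-stack GPU shows up as one device or several depends on how the runtime is configured, which this sketch does not control.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: number of target (non-host) devices visible to the
   OpenMP runtime, or 0 when the code is built without OpenMP support. */
int count_target_devices(void) {
#ifdef _OPENMP
    int n = omp_get_num_devices();
    assert(n >= 0);
    return n;
#else
    return 0;
#endif
}

/* Hypothetical helper: runs an empty-ish target region on every device and
   counts how many of them actually executed off the host. */
int count_offloaded_regions(void) {
    int offloaded = 0;
#ifdef _OPENMP
    int n = count_target_devices();
    for (int d = 0; d < n; ++d) {
        int on_host = 1;
        /* device(d) selects which target device runs this region. */
        #pragma omp target device(d) map(from: on_host)
        {
            on_host = omp_is_initial_device();
        }
        if (!on_host) {
            ++offloaded;
        }
    }
#endif
    return offloaded;
}
```

On a system with no offload devices, or in a build without OpenMP enabled, both functions simply return 0.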
An OpenMP program starts executing on the host. When a host thread encounters a target construct, data is transferred from the host to the device (if specified, for example, by map clauses), and the code in the construct is offloaded onto the device. At the end of the target region, data is transferred from the device back to the host (if so specified).
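The data movement described above can be sketched with a small vector update. This is a minimal example, not a tuned kernel: the function name saxpy_offload and its parameters are hypothetical. The map(to: ...) clause copies x to the device on entry only, while map(tofrom: ...) copies y to the device on entry and back to the host at the end of the target region.

```c
#include <assert.h>

/* Hypothetical example: computes y[i] = a * x[i] + y[i] inside a target region.
   x is read-only on the device, so it is mapped "to";
   y is read and written, so it is mapped "tofrom". */
void saxpy_offload(float a, const float *x, float *y, int n) {
    assert(n >= 0);
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
    /* The encountering host thread resumes here only after the target
       region has finished and y has been copied back. */
}
```

If no device is available, or the code is compiled without OpenMP enabled, the loop runs on the host and produces the same result.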
By default, the host thread that encounters the target construct waits for the target region to finish before proceeding. The nowait clause on a target construct specifies that the host thread does not need to wait for the target region to finish. In other words, the nowait clause allows the target region to execute asynchronously with respect to the host thread.
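A minimal sketch of asynchronous offload with nowait follows. The function name fill_async and the two output arrays are hypothetical. The target region is launched without blocking, the host thread does unrelated work on a separate array, and a taskwait directive then waits for the target region before the function returns.

```c
#include <assert.h>

/* Hypothetical example: the device fills dev_out while the host fills host_out.
   The two arrays must not overlap, because the two loops may run concurrently. */
void fill_async(int *dev_out, int *host_out, int n) {
    assert(n >= 0);

    /* nowait: the host thread may continue past this construct immediately. */
    #pragma omp target teams distribute parallel for nowait \
        map(from: dev_out[0:n])
    for (int i = 0; i < n; ++i) {
        dev_out[i] = 2 * i;
    }

    /* Host work that may overlap with the target region. */
    for (int i = 0; i < n; ++i) {
        host_out[i] = i + 1;
    }

    /* Wait for the deferred target region; dev_out is valid on the host
       only after this point. */
    #pragma omp taskwait
}
```

Reading dev_out on the host before the taskwait would be a data race, since the target region may still be running or its results may not have been copied back yet.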
Synchronization between regions of code that execute asynchronously can be achieved with the taskwait directive, depend clauses, (implicit or explicit) barriers, or other synchronization mechanisms.
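As one possible illustration of these mechanisms, the sketch below chains two asynchronous target regions with depend clauses and then uses taskwait so the host sees the final result. The function name two_stage is hypothetical, and mapping the array separately in each region is chosen for simplicity rather than performance.

```c
#include <assert.h>

/* Hypothetical example: stage 1 initializes a, stage 2 updates it.
   Both regions are asynchronous; the depend clauses order them. */
void two_stage(int *a, int n) {
    assert(n >= 0);

    /* Stage 1: produces a. depend(out) marks this task as the writer. */
    #pragma omp target teams distribute parallel for nowait \
        depend(out: a[0:n]) map(from: a[0:n])
    for (int i = 0; i < n; ++i) {
        a[i] = i;
    }

    /* Stage 2: depend(inout) on the same section makes this region wait
       for stage 1, even though neither region blocks the host thread. */
    #pragma omp target teams distribute parallel for nowait \
        depend(inout: a[0:n]) map(tofrom: a[0:n])
    for (int i = 0; i < n; ++i) {
        a[i] += 10;
    }

    /* The host waits here for both deferred target regions to complete. */
    #pragma omp taskwait
}
```

Without the depend clauses, the two nowait regions would have no ordering guarantee between them, and the second could read a before the first had written it.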