Visible to Intel only — GUID: GUID-0F6B1037-19A3-4A27-85BB-7B12E78BA6F7
Troubleshooting Discrepancies in Hardware and Emulator Results
When you emulate a kernel, it might produce results that differ from those of the same kernel compiled for hardware. Before you compile for hardware, you can debug your kernel further by running it through simulation.
The most common reasons for differences between emulator and hardware results are as follows:
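Your kernel code uses the ivdep attribute. The emulator does not model your kernel when the ivdep attribute breaks a true dependence, because it runs the loop as ordinary sequential CPU code and the dependence is still honored. During a full hardware or simulator compilation, the compiler relies on ivdep to schedule loop iterations as if they were independent, and you observe the broken dependence as an incorrect result.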
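To see why a broken true dependence shows up only in hardware, the following sketch contrasts the sequential loop the emulator effectively runs with a simplified, hypothetical model of a pipelined loop in which a store becomes visible to later loads only after `latency` iterations. The model, the `PendingStore` type, and the latency values are illustrative assumptions, not a description of how the compiler actually schedules loops.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Sequential semantics, as the emulator runs the loop:
// each iteration sees the store made by the previous one.
std::vector<int> run_sequential(std::vector<int> a) {
  for (std::size_t i = 1; i < a.size(); ++i) a[i] = a[i - 1] + 1;
  return a;
}

// Hypothetical model of the same loop pipelined as if ivdep were trusted:
// a store issued in iteration i only becomes visible at iteration i + latency.
struct PendingStore {
  std::size_t visible_at;  // iteration at which the store lands in memory
  std::size_t index;
  int value;
};

std::vector<int> run_pipelined_model(std::vector<int> mem, std::size_t latency) {
  std::deque<PendingStore> pending;
  for (std::size_t i = 1; i < mem.size(); ++i) {
    // Commit every store whose latency has elapsed by iteration i.
    while (!pending.empty() && pending.front().visible_at <= i) {
      mem[pending.front().index] = pending.front().value;
      pending.pop_front();
    }
    int loaded = mem[i - 1];  // stale if the store to i - 1 is still in flight
    pending.push_back({i + latency, i, loaded + 1});
  }
  for (const PendingStore& s : pending) mem[s.index] = s.value;  // drain
  return mem;
}
```

With a latency of 1 the two functions agree. With a latency of 3 and a six-element zero-filled input, the model yields `{0, 1, 1, 1, 1, 1}` instead of the sequential `{0, 1, 2, 3, 4, 5}`, which is the kind of mismatch that appears only after a hardware or simulator compilation.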
Your kernel code relies on uninitialized data. Examples of uninitialized data include uninitialized variables, and uninitialized or partially initialized global buffers and arrays.
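One way to rule out this cause is to set every element on the host before the buffer reaches the kernel. The sketch below assumes the kernel reads all `size` elements even when only some of them carry meaningful values; `make_padded_input` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Builds a host-side input buffer of `size` elements from `values`.
// Every element is set explicitly (copied or zero-filled), so a kernel
// that reads the whole buffer never sees indeterminate data.
std::vector<float> make_padded_input(const std::vector<float>& values,
                                     std::size_t size) {
  std::vector<float> buf(size, 0.0f);  // zero-fill, including the tail
  std::copy_n(values.begin(), std::min(values.size(), size), buf.begin());
  return buf;
}
```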
Your kernel code behavior depends on the precise results of floating-point operations. The emulator uses the CPU's floating-point hardware, whereas the hardware run uses floating-point operations implemented as FPGA cores.
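Because the CPU and the FPGA cores can legitimately disagree in the last bits, comparing emulator and hardware outputs with `==` can flag correct results as mismatches. A common alternative, sketched here as one possible approach, is to compare within a small number of units in the last place (ULPs). The helper names and the tolerance you pass are assumptions to tune for your kernel.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Maps a float's bit pattern to an integer whose order matches the
// numeric order of the floats, so adjacent floats differ by exactly 1.
std::int64_t ordered_bits(float f) {
  std::int32_t i;
  std::memcpy(&i, &f, sizeof i);
  // Negative floats are remapped so ordering stays monotonic;
  // -0.0f and +0.0f both map to 0.
  return i < 0
             ? static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::min()) - i
             : i;
}

// True when a and b are at most max_ulps representable floats apart.
bool almost_equal_ulps(float a, float b, std::int64_t max_ulps) {
  if (std::isnan(a) || std::isnan(b)) return false;
  std::int64_t diff = ordered_bits(a) - ordered_bits(b);
  return (diff < 0 ? -diff : diff) <= max_ulps;
}
```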
NOTE: The SYCL* standard allows one or more least significant bits of floating-point computations to differ between platforms while the results are still considered correct on both platforms.
Your kernel code behavior depends on the order of pipe accesses in different kernels. The emulation of pipe behavior has limitations, especially for conditional pipe operations, where the kernel does not call the pipe operation in every loop iteration. In such cases, the emulator might execute pipe operations in a different order than the hardware does. For instance, if two kernels are connected by pipes and each contains a loop whose read() or write() call does not happen in every iteration (for example, because an if-statement gates it), the emulator might interleave the read() and write() calls differently than the hardware.
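The following hypothetical model shows how a result can depend on the interleaving of pipe operations rather than on the data itself. Two kernels share one pipe, modeled with `std::queue`; `schedule` lists which kernel runs each loop iteration ('P' for the producer, 'C' for the consumer). The producer writes only on even iterations, and the consumer polls without blocking and counts empty polls. This is a simplified illustration, not the emulator's or the hardware's actual scheduling.

```cpp
#include <queue>
#include <string>

// Hypothetical model: each character of `schedule` runs one loop
// iteration of either the producer ('P') or the consumer ('C').
int count_empty_polls(const std::string& schedule) {
  std::queue<int> pipe;  // stands in for a pipe between two kernels
  int producer_iter = 0;
  int empty_polls = 0;
  for (char who : schedule) {
    if (who == 'P') {
      // Conditional write: only even iterations write to the pipe.
      if (producer_iter % 2 == 0) pipe.push(producer_iter);
      ++producer_iter;
    } else if (who == 'C') {
      // Non-blocking-style poll: count iterations that find no data.
      if (pipe.empty()) {
        ++empty_polls;
      } else {
        pipe.pop();
      }
    }
  }
  return empty_polls;
}
```

In the schedules "PPCC" and "CCPP", each kernel runs two iterations and the same single value passes through the pipe, yet the empty-poll count is 1 in the first case and 2 in the second. Any count or branch that depends on when data arrives, rather than on what arrives, can differ in the same way between the emulator and hardware.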
Your kernel or host code is accessing global memory buffers out-of-bounds.
NOTE: Uninitialized memory read and write behaviors are platform-dependent. Verify that every address your kernels access falls within the size of the corresponding global memory buffer.
You can use software memory debugging tools, such as Valgrind, on the emulated version of your kernel to analyze memory-related problems. The absence of warnings from such tools does not mean the absence of issues; it only means that the tool could not detect any. In such a scenario, Intel recommends manually verifying your kernel or host code.
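A frequent source of out-of-bounds global accesses is a global range rounded up to a multiple of the work-group size without a matching bounds check. The sketch below models the kernel body as a plain C++ function so it runs on any host; `round_up` and `scale_item` are hypothetical names, and in a real kernel `gid` would come from the work-item's global ID.

```cpp
#include <cstddef>
#include <vector>

// Rounds n up to a multiple of the work-group size wg, as is common when
// choosing a global range for an ND-range kernel.
std::size_t round_up(std::size_t n, std::size_t wg) {
  return (n + wg - 1) / wg * wg;
}

// Hypothetical kernel body, called once per work-item ID `gid`.
// The guard stops the padding work-items (gid >= buffer size) from
// reading or writing past the end of the buffers.
void scale_item(std::vector<float>& out, const std::vector<float>& in,
                std::size_t gid) {
  if (gid >= in.size() || gid >= out.size()) return;
  out[gid] = 2.0f * in[gid];
}
```

With 10 elements and a work-group size of 4, the global range becomes 12, so two work-items have no element to process; without the guard they would access memory past the end of both buffers.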
Your kernel code is accessing local variables out-of-bounds. For example, accessing a local array out-of-bounds or accessing a variable after it has gone out of scope.
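Off-by-one loop bounds are a typical way to run past a local array. SYCL device code cannot throw exceptions, so one option, sketched here under that assumption, is to replicate the kernel's indexing logic in host code and use `std::array::at`, which throws `std::out_of_range` instead of silently touching a neighboring stack slot. The function names and `kTaps` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

constexpr std::size_t kTaps = 4;  // hypothetical local array size

// Buggy: `<=` visits index kTaps, one past the end of `window`.
// With .at(), the overrun throws std::out_of_range instead of silently
// reading a neighboring stack slot.
int sum_window_buggy(const std::array<int, kTaps>& window) {
  int sum = 0;
  for (std::size_t i = 0; i <= kTaps; ++i) sum += window.at(i);
  return sum;
}

// Fixed: the loop stays within [0, kTaps).
int sum_window_fixed(const std::array<int, kTaps>& window) {
  int sum = 0;
  for (std::size_t i = 0; i < kTaps; ++i) sum += window.at(i);
  return sum;
}
```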
NOTE: In software terms, these are stack corruption issues, because accessing a variable out of bounds usually affects unrelated variables located next to it on the stack. Emulated kernels are implemented as regular CPU functions and have an actual stack that can be corrupted. When targeting hardware, no stack exists, so stack corruption issues are guaranteed to manifest differently. When you suspect stack corruption, use memory analysis tools such as Valgrind. However, stack-related issues are usually difficult to identify, so Intel recommends manually verifying your kernel code to debug them.
Your kernel code uses shift amounts equal to or larger than the width of the type being shifted, for example, shifting a 64-bit integer by 65 bits. According to the SYCL specification version 1.0, the behavior of such shifts is undefined.
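On x86, the shift instructions mask a 64-bit shift count to its low 6 bits, so a run-time `x << 65` in the emulator typically behaves like `x << 1`, while the FPGA implementation is free to produce something else. A minimal sketch of one fix is to define the out-of-range case explicitly; returning 0 is an assumption here, and masking the count is an equally valid choice if that is what your algorithm needs. `shl_u64` is a hypothetical helper name.

```cpp
#include <cstdint>

// Shifting a 64-bit value by 64 or more bits is undefined behavior in C++.
// This helper pins down one well-defined result (0) for out-of-range
// shift amounts so the emulator and hardware agree.
std::uint64_t shl_u64(std::uint64_t x, unsigned amount) {
  return amount >= 64 ? 0 : x << amount;
}
```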
When you compile your kernel for emulation, the default pipe depth differs from the default pipe depth used when your kernel is compiled for hardware. This difference in pipe depths might lead to scenarios where execution on the hardware hangs while kernel emulation works without any issue. Refer to Emulate Pipe Depth for information about fixing the pipe depth difference.
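The sketch below is a simplified, hypothetical model of a design whose progress depends on pipe capacity. A producer writes `n` items to pipe A and then one item to pipe B, while the consumer reads pipe B before draining pipe A. It is not the emulator's or the hardware's actual pipe implementation; `runs_to_completion` and the depth values are illustrative.

```cpp
#include <cstddef>
#include <queue>

// Hypothetical two-pipe model. The producer writes `n` items to pipe A, then
// one item to pipe B. The consumer reads pipe B first, then the n items from A.
// Returns false when both sides are blocked, i.e., the design deadlocks.
bool runs_to_completion(std::size_t depth_a, std::size_t depth_b, std::size_t n) {
  std::queue<std::size_t> a, b;
  std::size_t prod = 0;  // 0..n-1: write A, n: write B, n+1: finished
  std::size_t cons = 0;  // 0: read B, 1..n: read A, n+1: finished
  while (prod <= n || cons <= n) {
    bool progressed = false;
    if (prod < n && a.size() < depth_a) {
      a.push(prod++);
      progressed = true;
    } else if (prod == n && b.size() < depth_b) {
      b.push(prod++);
      progressed = true;
    }
    if (cons == 0 && !b.empty()) {
      b.pop();
      ++cons;
      progressed = true;
    } else if (cons >= 1 && cons <= n && !a.empty()) {
      a.pop();
      ++cons;
      progressed = true;
    }
    if (!progressed) return false;  // both kernels blocked: hang
  }
  return true;
}
```

With four items, the model completes when pipe A can hold at least four entries and deadlocks when it can hold only two. A design like this can therefore pass emulation under one default depth and hang in hardware under another.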
The order of lines printed with the cout stream might differ between the emulator and hardware. In the hardware, cout stream data is stored in a global memory buffer and flushed from the buffer only when kernel execution completes or when the buffer is full. In the emulator, the cout stream writes directly to the x86 stdout.
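Because line order is not reliable across the two targets, one workaround, sketched here as an assumption rather than a feature of the stream API, is to prefix each printed line with a numeric tag (for example, a loop iteration or work-item ID) and reorder the captured output on the host. The `<tag>: <message>` format and the function names are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Each captured line is assumed to look like "<tag>: <message>", where
// <tag> is a non-negative integer such as an iteration or work-item ID.
unsigned long line_tag(const std::string& line) {
  return std::stoul(line.substr(0, line.find(':')));
}

// Sorts captured lines by tag; stable_sort keeps lines that share a tag
// in the order they were captured.
std::vector<std::string> order_by_tag(std::vector<std::string> lines) {
  std::stable_sort(lines.begin(), lines.end(),
                   [](const std::string& a, const std::string& b) {
                     return line_tag(a) < line_tag(b);
                   });
  return lines;
}
```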
The hardware and emulator might produce different results if you perform an unaligned load or store by upcasting pointer types (for example, casting a char pointer to an int pointer). A load or store of this type is undefined in the C99 specification.
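To avoid this, copy the bytes into a properly typed variable instead of dereferencing an upcast pointer. The sketch below uses `std::memcpy`, which is well defined for any alignment; `load_u32` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a 32-bit value starting at an arbitrary byte offset. Copying the
// bytes avoids casting `bytes + offset` to a possibly misaligned
// std::uint32_t* and dereferencing it.
std::uint32_t load_u32(const unsigned char* bytes, std::size_t offset) {
  std::uint32_t value;
  std::memcpy(&value, bytes + offset, sizeof value);
  return value;
}
```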