Optimization Flags
The following table lists OpenCL optimization flags and their equivalents in SYCL*:
OpenCL | SYCL | Description |
---|---|---|
-clock=<clock_target> | -Xsclock=<clock_target> | Determines the pipelining effort the scheduler attempts during the scheduling process. |
-no-interleaving=<global_memory_type> | -Xsno-interleaving=<global_memory_name> | Disables burst-interleaving for all global memory banks of the same type so that you can manage them manually. |
-global-ring | -Xsglobal-ring | Overrides the compiler's choice of optimal global memory interconnect topology (based on various design characteristics) and forces a ring topology. |
-force-single-store-ring | -Xsforce-single-store-ring | Narrows the interconnect to save area while limiting write-only throughput to one bank's worth. |
-num-reorder | -Xsnum-reorder=<N> | Narrows the interconnect to save area while reducing read-only throughput, where N is the number of banks' worth of read bandwidth you want. |
-ffp-reassociate | Fast math is enabled by default. See the fp-model, fp topic in the Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference for additional information. | Relaxes the order of arithmetic floating-point operations using a balanced-tree hardware implementation. |
-ffp-contract=fast | Fast math is enabled by default. See the fp-model, fp topic in the Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference for additional information. | Removes intermediary floating-point rounding operations and conversions whenever possible and carries additional bits to maintain precision. |
-no-hardware-kernel-invocation-queue | -Xsno-hardware-kernel-invocation-queue | Reduces kernel area use by removing kernel invocation queues. This can introduce delays between back-to-back executions of the same kernel. |
-hyper-optimized-handshaking | -Xshyper-optimized-handshaking | Modifies the internal handshaking protocol used by the design. |
-auto-pipeline | -Xsauto-pipeline | Pipelines loops in non-task (NDRange in OpenCL or parallel_for in SYCL) kernels. |
Not supported | -Xsdisable-auto-loop-fusion | Disables automatic loop fusion when compiling your design. |
Not supported | -Xsenable-unequal-tc-fusion | Fuses adjacent loops with different trip counts into a single loop without affecting either loop's functionality. |
-const-cache-bytes=<N> | Not supported | Configures the constant memory cache size (rounded up to the closest power of 2). The default constant cache size is 16 kB. |
Not supported | -Xsrounding=<rounding_type> | Modifies the rounding mode of floating-point operations in your design. |
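
When migrating a build script, the table can be applied mechanically. The following is a minimal Python sketch of that idea, not an official tool: the helper name `translate_flag` and the value `300MHz` are hypothetical and only for illustration, and the mapping contains nothing beyond the rows of the table above.

```python
# Mapping copied from the table above: OpenCL flag name -> SYCL flag name.
# None means the table lists "Not supported" in the SYCL column.
OPENCL_TO_SYCL = {
    "-clock": "-Xsclock",
    "-no-interleaving": "-Xsno-interleaving",
    "-global-ring": "-Xsglobal-ring",
    "-force-single-store-ring": "-Xsforce-single-store-ring",
    "-num-reorder": "-Xsnum-reorder",
    "-no-hardware-kernel-invocation-queue": "-Xsno-hardware-kernel-invocation-queue",
    "-hyper-optimized-handshaking": "-Xshyper-optimized-handshaking",
    "-auto-pipeline": "-Xsauto-pipeline",
    "-const-cache-bytes": None,
}

# Floating-point flags that need no SYCL counterpart because, per the table,
# fast math is enabled by default.
FAST_MATH_DEFAULT = {"-ffp-reassociate", "-ffp-contract"}


def translate_flag(opencl_flag):
    """Translate one OpenCL optimization flag to its SYCL form (hypothetical helper).

    Returns the SYCL flag string, or None when the flag can simply be dropped.
    Raises ValueError if the table lists no SYCL equivalent, and KeyError if
    the flag does not appear in the table at all.
    """
    # Split "-name=value" into its parts; sep and value are empty for bare flags.
    name, sep, value = opencl_flag.partition("=")
    if name in FAST_MATH_DEFAULT:
        return None
    if name not in OPENCL_TO_SYCL:
        raise KeyError(f"{name} is not listed in the table")
    sycl_name = OPENCL_TO_SYCL[name]
    if sycl_name is None:
        raise ValueError(f"{name} has no SYCL equivalent")
    return sycl_name + sep + value


if __name__ == "__main__":
    # The flag values below are illustrative placeholders.
    for flag in ["-clock=300MHz", "-global-ring", "-ffp-reassociate"]:
        print(flag, "->", translate_flag(flag))
```

Raising an error for flags without a SYCL equivalent, rather than dropping them silently, makes it visible during migration that a setting such as the constant cache size needs a manual decision.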