Visible to Intel only — GUID: GUID-F588C690-C385-492F-9D00-78229546F7DA
Intel® FPGA Emulation Platform for OpenCL™ Getting Started Guide
This guide provides quick steps to install the technical preview of Intel® FPGA Emulation Platform for OpenCL™ and to compile and run OpenCL kernels on the emulator.
About the Intel® FPGA Emulation Platform for OpenCL™
The Intel® FPGA Emulation Platform for OpenCL™ technical preview includes the runtime and compiler, which run on Intel® Core™ and Intel® Xeon® processors. It can compile and run programs written with Intel® OpenCL™ FPGA extensions (for example, the FPGA 'channels' extension).
The emulator aims to provide:
- rapid compilation time (seconds)
- source code portability between emulator and FPGA
- reasonable performance (on average, benchmarks run 5x-10x slower than on FPGA hardware)
This technical preview of the emulator does not provide full functional equivalence with an FPGA device. It is provided for evaluation purposes only, without any warranties.
System Requirements
Supported OS:
- Ubuntu* 16.04 (64-bit)
- RedHat* 7.x or CentOS* 7.x (64-bit)
Supported Hardware:
- Intel® Core™ CPU 5th generation (formerly known as Intel® microarchitecture code-named Broadwell) or higher
- Intel® Xeon® CPU E5 v5 (formerly known as Intel® microarchitecture code-named Broadwell) or higher
Installing Intel® FPGA Emulation Platform for OpenCL™ Technical Preview
To configure the current session to use the OpenCL™ standalone binaries, do the following:
- Unpack the provided binaries to any working directory.
- Create a new ICD file that contains the path to the emulator library:
> echo /path/to/binaries/libintelocl_emu.so >> /etc/OpenCL/vendors/intel_fpga_fast_emu.icd
- Set the INSTALLDIR variable in the setupvars.sh script to the path where the binaries were unpacked.
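The ICD registration step can be sketched as a small shell function. This is an illustration under assumptions, not part of the distributed package: the function name register_emu_icd and its optional second argument (an alternative vendors directory, handy for trying the step without root access) are hypothetical. Like the echo command above, it appends to the ICD file, but it skips the write when the path is already listed so that repeated runs do not add duplicate lines.

```shell
# Register the emulator library with the OpenCL ICD loader.
# Usage: register_emu_icd /path/to/binaries [vendors_dir]
# (register_emu_icd and the vendors_dir argument are hypothetical helpers.)
register_emu_icd() {
    install_dir=$1
    vendors_dir=${2:-/etc/OpenCL/vendors}
    lib="$install_dir/libintelocl_emu.so"
    icd="$vendors_dir/intel_fpga_fast_emu.icd"

    # Writing to /etc/OpenCL/vendors normally requires root privileges.
    mkdir -p "$vendors_dir" || return 1

    # Append the library path only if no line of the file matches it exactly.
    if ! grep -qxF "$lib" "$icd" 2>/dev/null; then
        printf '%s\n' "$lib" >> "$icd"
    fi
}
```

For a system-wide installation the function would be called as root with only the first argument, for example register_emu_icd /path/to/binaries.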
If the installation succeeded, the following OpenCL™ platform is available:
Platform [#1] :
Profile : FULL_PROFILE
Version : OpenCL 2.0 LINUX
Name : Intel(R) FPGA Emulation Platform for OpenCL(TM) (preview)
Vendor : Intel(R) Corporation
Devices : 1
Device [#1] :
Type : accelerator
Profile : FULL_PROFILE
Version : OpenCL 2.0 (Build 5)
Name : Intel(R) FPGA Emulation Device (preview)
Vendor : Intel(R) Corporation
C version : OpenCL C 2.0
Driver version : 1.2.0.5
Getting Started with Intel® FPGA Emulation Platform for OpenCL™ Technical Preview
The emulator provides a separate OpenCL™ platform with one OpenCL™ CPU device. It supports Intel® FPGA OpenCL™ extensions.
OpenCL programs written for an FPGA device can be compiled and executed on this device using the standard OpenCL API (including clCreateProgramWithBinary(); see the Offline Compilation section).
Several sets of environment variables affect emulator execution.
- Emulator-specific variables:
  - OCL_FPGA_EMU - must be set to enable FPGA-style channels. This variable is mandatory for using FPGA-specific extensions.
  - OCL_TBB_NUM_WORKERS ([1..]) - maximum number of threads that TBB can use.
  - VOLCANO_CPU_ARCH (core-avx2, skx) - forces the SIMD instruction set used for OpenCL kernel compilation; skx corresponds to AVX-512 support.
  - VOLCANO_CLANG_OPTIONS - internal variable that passes options to the OpenCL compiler. For example:
    - -fopenmp -fintel-openmp -fopenmp-tbb - enables OpenMP* support
    - -ffast-math - forces fast math built-ins
    - -DINTEL_OCL_FPGA_CPU_EMU - adds the corresponding define to the OpenCL kernel
  - VOLCANO_LLVM_OPTIONS (-vector-library=SVML) - internal variable that forces use of the short vector math library (SVML). Must be set if OpenMP support is enabled.
- OpenCL-related environment variables (see the Intel® OpenCL™ CPU RT documentation for details):
  - CL_CONFIG_USE_VECTORIZER (True, False) - controls the NDRange vectorizer. Does not affect OpenMP pragma vectorization of single work-item kernels. Set to False to speed up kernel compilation.
  - CL_CONFIG_CPU_FORCE_LOCAL_MEM_SIZE (e.g., 256KB) - amount of available OpenCL local memory.
- OpenMP environment variables (see the OpenMP documentation for details):
  - KMP_AFFINITY - affinity settings for OpenMP threads. For example, "norespect,physical,20"
  - OMP_NUM_THREADS ([1..]) - number of available OpenMP threads
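As an illustration, the variables listed above might be exported in a shell session as follows. The values are assumptions chosen for the example: the guide does not state which value OCL_FPGA_EMU expects (1 is used here), and the thread counts of 4 are arbitrary. The last group shows the stated dependency between the OpenMP compiler flags and the SVML option.

```shell
# Mandatory for FPGA-specific extensions such as channels.
# The guide does not state a required value; 1 is an assumption.
export OCL_FPGA_EMU=1

# Cap the TBB worker pool (4 is an illustrative value).
export OCL_TBB_NUM_WORKERS=4

# Disable the NDRange vectorizer to shorten kernel compilation.
export CL_CONFIG_USE_VECTORIZER=False

# Local memory size, using the 256KB example value from the list.
export CL_CONFIG_CPU_FORCE_LOCAL_MEM_SIZE=256KB

# Optional OpenMP support: the SVML option must accompany the OpenMP flags.
export VOLCANO_CLANG_OPTIONS="-fopenmp -fintel-openmp -fopenmp-tbb"
export VOLCANO_LLVM_OPTIONS="-vector-library=SVML"
export OMP_NUM_THREADS=4
```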
An optimization guide for using OpenMP is available in the directory with the binaries (Optimization_guide.pdf).
Offline Compilation
The emulator supports compiling OpenCL™ kernels into binaries (similar to the .aocx files used for an FPGA device), which can be passed to clCreateProgramWithBinary().
Use the offline compiler ('ioc64' tool) from Intel® SDK for OpenCL™ Applications to compile kernel binaries for the emulator from OpenCL C source code. This tool is distributed as part of Intel® Code Builder for OpenCL™ API:
> ioc64 -bo='-cl-std=CL2.0' -device=fpga_fast_emu -input=source.cl -ir=kernel_binary.elf
The output file name is arbitrary. It can use the .aocx extension so that a host program can use the same file names for both the FPGA device and the emulator.
Kernel binaries produced by the 'ioc64' tool are not compatible with binaries compiled for an FPGA device, and vice versa.
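The compile step can be wrapped in a small shell function, sketched below under stated assumptions. The names emu_compile and DRY_RUN are hypothetical, the default .aocx output name follows the note above about reusing FPGA file names, and the device option is written in ioc64's -option=value form as -device=fpga_fast_emu. With DRY_RUN=1 the function only prints the command line, which allows checking it on a machine without ioc64.

```shell
# Compile one OpenCL C source file for the emulator with ioc64.
# Usage: emu_compile source.cl [output]
# (emu_compile and DRY_RUN are hypothetical names for this sketch.)
emu_compile() {
    src=$1
    # Default output: the source name with .cl replaced by .aocx.
    out=${2:-${src%.cl}.aocx}

    # Build the argument list once so the printed and executed forms match.
    set -- ioc64 "-bo=-cl-std=CL2.0" -device=fpga_fast_emu "-input=$src" "-ir=$out"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Calling emu_compile source.cl would then produce source.aocx, which a host program can load through clCreateProgramWithBinary() under the same name it uses for the FPGA binary.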
Execution
The environment variables described in the Getting Started section affect emulator behavior.
The Bash script (setupvars.sh) distributed with the binaries simplifies environment setup. Uncomment or modify the values of the required variables in the script, then run the following command:
> . /path/to/binaries/setupvars.sh
After that, all applications run in the current console use the environment variables set in the script.
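When the settings should apply to a single application rather than to the whole console, the script can be sourced inside a subshell. The following is a minimal sketch; run_with_emu_env is a hypothetical helper name, and ./host_app in the usage note stands for any host program.

```shell
# Run one command with the emulator environment, leaving the calling
# shell's environment untouched.
# Usage: run_with_emu_env /path/to/binaries/setupvars.sh command [args...]
# (run_with_emu_env is a hypothetical helper.)
run_with_emu_env() {
    (
        setup_script=$1
        shift
        # Source the script in the subshell; stop if it cannot be read.
        . "$setup_script" || exit 1
        # Replace the subshell with the requested command.
        exec "$@"
    )
}
```

For example, run_with_emu_env /path/to/binaries/setupvars.sh ./host_app runs the host program with the variables from the script, and they are discarded when it exits.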
Generating FPGA static reports
To generate the FPGA static reports on each build, perform the following steps:
- Right click the session in the Code Builder Session Explorer window and select Session Options.
- Go to the Build Artifacts tab and check the Static Reports option.
After each build, the static report is generated and listed in the session tree as report.html under Build Artifacts.
The static report menu is divided into three sections:
- Session info - Shows your kernel sources and general info about your session: name, target board/device, aoc version, aoc command etc.
- Summary reports - Contains a summary list of your kernels, an estimated resource usage summary and compiler warnings.
- FPGA analysis - Provides four analysis reports:
  - FPGA Loops - Analyzes all the loops in your code and their unroll status. Use it to check whether each loop was "fully unrolled", "partially unrolled", or "pipelined", and to maximize throughput.
  - Area Analysis (System/Source) - Analyzes the area usage of your OpenCL system. The data is separated into three lists:
    - Global - Contains your resources: channels, global variables, etc.
    - Partitions - Contains data about the static partition.
    - Kernels - Shows the list of your kernels. For each kernel you can see its private resources, area by system, and area by source.
    The Source section shows an approximation of how each source code line affects area usage.
    The System section shows the closest approximation to the hardware implemented in the FPGA (the kernel is divided into logic blocks).
    The Area Analysis data gives insight into the generated hardware and offers suggestions on how to resolve potential inefficiencies.
  - System Viewer - Contains a stall point graph showing load and store information between kernels and different memories, the channels connecting kernels, loops, and an abstracted netlist of your OpenCL system. It lets you verify memory replication and identify any load and store instructions that are "stall"-able.
  - Kernel Memory Viewer - Shows how the offline compiler interprets the data connections across the memory system of your kernel. The data helps you identify data movement bottlenecks in your kernel design and find memory access patterns that can cause undesired arbitration in the load-store units (LSUs), which can reduce the throughput of your kernel.