4.2. Host Application Description
This host application is simpler than a production host application. For clarity, all API calls occur in a single source file, hls_afu/sw/src/hls_afu_host.c. In a production design, you are more likely to place these API calls in libraries, as the DMA AFU design does. The headings in this section match the headings in hls_afu_host.c.
Slave Name | Address Range |
---|---|
Device feature header slave (afu_id_avmm_slave_0) | 0x0000 to 0x003F |
HLS component (fpVectorReduce_ac_int_internal_0 or fpVectorReduce_float_internal_0) | 0x0040 to 0x007F |
Preamble/Header Files
The first section of the host code includes the necessary libraries and defines several constant address offsets. The design derives the CSR constants for the HLS component from the constants in fpVectorReduce_float_csr.h, which the HLS compiler emits. Because the HLS component's slave interface shares a memory space with the AFU ID Avalon-MM slave, the HLS CSR constants include a base offset (0x0040, the start of the HLS component's range in the address map) so that every register in the AFU ID slave and the HLS component has a unique address.
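The base-offset idea can be sketched as follows. This is a minimal illustration, not the contents of hls_afu_host.c: the base address and window size come from the address map above, but the `EXAMPLE_CSR_*` names and values are hypothetical placeholders for the offsets that fpVectorReduce_float_csr.h actually defines.

```c
#include <assert.h>
#include <stdint.h>

/* Start and size of the HLS component's window, from the address map
 * (0x0040 to 0x007F). */
#define HLS_BASE_ADDR   0x0040u
#define HLS_SPAN_BYTES  0x0040u

/* Hypothetical component-relative CSR offsets. In the real design these
 * come from fpVectorReduce_float_csr.h; these values are placeholders. */
#define EXAMPLE_CSR_START_OFFSET      0x08u
#define EXAMPLE_CSR_INTERRUPT_ENABLE  0x10u
#define EXAMPLE_CSR_ARG_INPUT_OFFSET  0x20u

/* Convert a component-relative CSR offset into an absolute MMIO offset
 * that cannot collide with the device feature header (0x0000-0x003F). */
static inline uint64_t hls_csr_addr(uint64_t csr_offset)
{
    assert(csr_offset < HLS_SPAN_BYTES); /* stay inside the HLS window */
    return HLS_BASE_ADDR + csr_offset;
}
```

Keeping the component-relative offsets separate from the base means the generated header can be used unchanged, and only one constant moves if the component is remapped.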
Discover/Grab FPGA Resources
This block of code is boilerplate. The host queries the FPGA hardware for available accelerators and, if it finds the accelerator that the host application requires, attempts to open and take control of the FPGA device. In this design, the host also exercises the AF registers that the Acceleration Stack requires. The HLS component does not implement these registers; they are in the AFU device feature header Avalon-MM slave (afu_id_avmm_slave_0).
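One way to picture the AFU ID check is below. This sketch assumes the usual CCI-P device feature header layout (AFU_ID_L at byte offset 0x08, AFU_ID_H at 0x10) and assumes the first 16 hex digits of the GUID string map to AFU_ID_H. A real host would read these registers with OPAE's fpgaReadMMIO64(); here a plain array stands in for MMIO, and the GUID used in any test is made up.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed byte offsets of the AFU ID registers in the device feature header. */
#define DFH_AFU_ID_L 0x08u
#define DFH_AFU_ID_H 0x10u

/* Simulated 64-bit MMIO read; stands in for fpgaReadMMIO64() in this sketch. */
static uint64_t sim_read64(const uint64_t *regs, uint64_t offset)
{
    return regs[offset / 8];
}

/* Parse "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" into high and low 64-bit halves.
 * Hyphens are ignored; anything other than exactly 32 hex digits fails. */
static bool parse_guid(const char *s, uint64_t *hi, uint64_t *lo)
{
    uint64_t h = 0, l = 0;
    int n = 0;
    for (; *s; ++s) {
        if (*s == '-')
            continue;
        if (!isxdigit((unsigned char)*s) || n >= 32)
            return false;
        int c = tolower((unsigned char)*s);
        uint64_t v = (uint64_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
        if (n < 16)
            h = (h << 4) | v;   /* first 16 digits: high half */
        else
            l = (l << 4) | v;   /* last 16 digits: low half */
        ++n;
    }
    if (n != 32)
        return false;
    *hi = h;
    *lo = l;
    return true;
}

/* True if the AFU ID registers in the header match the expected GUID. */
static bool afu_id_matches(const uint64_t *regs, const char *guid)
{
    uint64_t hi, lo;
    if (!parse_guid(guid, &hi, &lo))
        return false;
    return sim_read64(regs, DFH_AFU_ID_H) == hi &&
           sim_read64(regs, DFH_AFU_ID_L) == lo;
}
```

In practice the OPAE enumeration step usually filters by GUID before the device is opened; reading the ID back from the header afterwards is a cheap confirmation that the expected AF is loaded.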
Setup and Populate Host-Side Memory
This block of code configures a contiguous host-side memory buffer that the AF can access. Before you run an Acceleration Stack host application, configure the host to use 2 MB hugepages with the following command (you do not need this command if you run in ASE):
# sudo sh -c "echo 20 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"
This command reserves 20 hugepages of 2 MB each, which allows the host application to allocate pinned, contiguous 2 MB buffers in its memory.
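A small helper can confirm that enough hugepages are reserved before the host allocates its buffers. This is a sketch, not part of hls_afu_host.c: it assumes one 2 MB hugepage per 2 MB of buffer, and the helper names are hypothetical. The sysfs path is the one used in the command above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define HUGEPAGE_2M ((size_t)2 * 1024 * 1024)

/* Number of 2 MB hugepages needed to back one buffer of `bytes` bytes. */
static size_t hugepages_needed(size_t bytes)
{
    return (bytes + HUGEPAGE_2M - 1) / HUGEPAGE_2M;
}

/* Read the reserved hugepage count, e.g. from
 * /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages.
 * Returns -1 if the file cannot be opened or parsed. */
static long read_nr_hugepages(const char *path)
{
    FILE *f = fopen(path, "r");
    long n = -1;
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &n) != 1)
        n = -1;
    fclose(f);
    return n;
}
```

With the command above, `read_nr_hugepages()` should report 20, which covers 40 MB of buffers under this one-page-per-2-MB assumption.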
The fpgaPrepareBuffer() function allocates the host-side buffer that the host shares with the AF and returns the buffer's virtual address through a pointer that the caller supplies. The buffer is guaranteed to be 64-byte (512-bit) aligned, which makes AF accesses efficient. fpgaGetIOAddress() returns the I/O address that the AF uses to access the same buffer. The host can populate the buffer as it would any other array.
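The alignment and padding properties can be illustrated without OPAE. In this sketch, C11 `aligned_alloc()` stands in for fpgaPrepareBuffer(), and the buffer is padded to whole 64-byte lines and zero-filled. The padding policy and the `alloc_input_vector()` helper are assumptions for illustration. In the real flow the AF never sees this virtual pointer; it uses the I/O address from fpgaGetIOAddress().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LINE_BYTES 64u  /* one 512-bit AF transfer */

/* Round a byte count up to a whole number of 64-byte lines. */
static size_t round_up_to_line(size_t bytes)
{
    return (bytes + LINE_BYTES - 1) / LINE_BYTES * LINE_BYTES;
}

/* Stand-in for fpgaPrepareBuffer(): allocate a 64-byte-aligned buffer for
 * n floats, padded to whole lines, then fill it like an ordinary array.
 * The caller releases it with free(). */
static float *alloc_input_vector(size_t n)
{
    size_t bytes = round_up_to_line(n * sizeof(float));
    if (bytes == 0)
        bytes = LINE_BYTES;
    float *buf = aligned_alloc(LINE_BYTES, bytes); /* size is a multiple of 64 */
    if (!buf)
        return NULL;
    for (size_t i = 0; i < bytes / sizeof(float); ++i)
        buf[i] = (i < n) ? (float)i : 0.0f;        /* zero the padding */
    assert(((uintptr_t)buf % LINE_BYTES) == 0);
    return buf;
}
```

Zeroing the padding is a choice made here so the trailing partial line holds known values when the AF reads it.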
Setup Interrupts
This design uses an interrupt framework so that the AF can notify the host when it finishes processing. In this design, the HLS component generates the interrupt, so the host must write to the HLS component's slave memory space to enable it. The host first reads the CSR_INTERRUPT_ENABLE register to check whether the interrupt is already enabled. For more details about interrupts, refer to the Hello Interrupt AFU example included with the Intel Acceleration Stack.
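The check-then-enable step is a read-modify-write on one CSR. The sketch below simulates the register file with an array; real host code would use fpgaReadMMIO64() and fpgaWriteMMIO64(). The register offset (HLS base 0x40 plus a placeholder component offset) and the enable bit position are hypothetical, not the values from the generated header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical absolute offset of the interrupt-enable CSR and its bit. */
#define EXAMPLE_CSR_INTERRUPT_ENABLE (0x40u + 0x10u)
#define EXAMPLE_IRQ_DONE_BIT         (1ull << 0)

/* Simulated 64-bit register file, indexed by byte offset / 8. */
typedef struct { uint64_t regs[16]; } sim_mmio;

static uint64_t mmio_read(const sim_mmio *m, uint64_t off) { return m->regs[off / 8]; }
static void mmio_write(sim_mmio *m, uint64_t off, uint64_t v) { m->regs[off / 8] = v; }

/* Enable the completion interrupt unless it is already enabled.
 * Returns 1 if a write was issued, 0 if the bit was already set. */
static int ensure_irq_enabled(sim_mmio *m)
{
    uint64_t en = mmio_read(m, EXAMPLE_CSR_INTERRUPT_ENABLE);
    if (en & EXAMPLE_IRQ_DONE_BIT)
        return 0;
    /* Preserve any other bits in the register while setting ours. */
    mmio_write(m, EXAMPLE_CSR_INTERRUPT_ENABLE, en | EXAMPLE_IRQ_DONE_BIT);
    return 1;
}
```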
Start AF and Wait for Result
To start the AF, the host writes the component's arguments into the HLS component's slave space (input data starting address, output data starting address, and data size), then writes a 1 to the START bit in the same space. The host then uses the poll API to wait for the AF to finish.
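The start-and-wait sequence can be sketched as follows. The register writes go to a simulated register file, and all `EX_CSR_*` offsets are hypothetical placeholders. For the wait, the sketch uses POSIX poll() on a file descriptor. In an OPAE host that descriptor would come from fpgaGetOSObjectFromEventHandle() after fpgaRegisterEvent(); here a pipe stands in for it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical absolute CSR offsets inside the HLS window (0x40-0x7F). */
#define EX_CSR_START         0x48u
#define EX_CSR_ARG_IN_ADDR   0x60u
#define EX_CSR_ARG_OUT_ADDR  0x68u
#define EX_CSR_ARG_SIZE      0x70u

/* Simulated register file; stands in for fpgaWriteMMIO64(). */
typedef struct { uint64_t regs[16]; } sim_mmio;

static void mmio_write(sim_mmio *m, uint64_t off, uint64_t v) { m->regs[off / 8] = v; }

/* Write all arguments first, then set START, so the component never
 * begins with stale arguments. */
static void start_af(sim_mmio *m, uint64_t in_io, uint64_t out_io, uint64_t size)
{
    mmio_write(m, EX_CSR_ARG_IN_ADDR, in_io);
    mmio_write(m, EX_CSR_ARG_OUT_ADDR, out_io);
    mmio_write(m, EX_CSR_ARG_SIZE, size);
    mmio_write(m, EX_CSR_START, 1);
}

/* Wait until `fd` is readable or `timeout_ms` elapses.
 * Returns 1 on an event, 0 on timeout, -1 on error. */
static int wait_for_completion(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

Distinguishing a timeout (0) from an event (1) is what lets the result check tell a missing interrupt apart from a wrong answer.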
Check to Make Sure that the Calculation Was Correct
The host checks that the interrupt arrived and did not time out, and verifies that the output memory contains the expected values. This design also prints some debug data from the end of the output memory space to illustrate that the AF can only perform 512-bit reads and writes. If you pass a vector whose length is not a multiple of 512 bits (64 bytes), the design overwrites the data that follows the output vector, up to the next 64-byte boundary.
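The overwrite effect follows from the 64-byte write granularity. This sketch fills memory with a canary byte, simulates the AF with a memset() that writes whole 64-byte lines, and counts the canary bytes clobbered past the end of the output. The helper names and the 0xA5 canary are illustrative choices, not taken from hls_afu_host.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_BYTES 64u  /* the AF writes whole 512-bit lines */

/* Bytes actually written by the AF for an output of `bytes` bytes. */
static size_t bytes_written_by_af(size_t bytes)
{
    return (bytes + LINE_BYTES - 1) / LINE_BYTES * LINE_BYTES;
}

/* Fill `mem` with a canary, simulate the AF writing `out_bytes` of output
 * as whole lines, and count bytes past the output that were overwritten. */
static size_t clobbered_bytes(unsigned char *mem, size_t mem_len, size_t out_bytes)
{
    size_t written = bytes_written_by_af(out_bytes);
    assert(written <= mem_len);
    memset(mem, 0xA5, mem_len);   /* canary pattern */
    memset(mem, 0x00, written);   /* simulated AF: whole lines only */
    size_t clobbered = 0;
    for (size_t i = out_bytes; i < mem_len; ++i)
        if (mem[i] != 0xA5)
            ++clobbered;
    return clobbered;
}
```

For example, 17 floats (68 bytes) occupy two lines, so the final write spills 60 bytes past the vector; allocating the output buffer with room for whole lines keeps that spill inside memory the host owns.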
Cleanup
Finally, the host application disposes of the resources that it allocated during its execution.
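Cleanup is usually done in the reverse order of acquisition, which a goto-based unwind expresses compactly in C. In this sketch the three steps are hypothetical stand-ins; in an OPAE host they might correspond to calls such as fpgaOpen(), fpgaPrepareBuffer(), and fpgaRegisterEvent(), released with fpgaUnregisterEvent()/fpgaDestroyEventHandle(), fpgaReleaseBuffer(), and fpgaClose()/fpgaDestroyToken(). That mapping is an assumption, not a description of hls_afu_host.c.

```c
#include <assert.h>
#include <stddef.h>

/* Records the order in which the (hypothetical) resources are released. */
typedef struct { int order[3]; size_t count; } release_log;

static void release_resource(release_log *log, int id)
{
    assert(log->count < 3);
    log->order[log->count++] = id;
}

/* Acquire resources 0, 1, 2 in order; `fail_at` simulates a failure at
 * that step (-1 for none). Whatever was acquired is released newest first. */
static int run_host(int fail_at, release_log *log)
{
    int rc = -1;
    if (fail_at == 0) goto out;        /* step 0, e.g. open the device */
    if (fail_at == 1) goto release_0;  /* step 1, e.g. prepare the shared buffer */
    if (fail_at == 2) goto release_1;  /* step 2, e.g. register the interrupt */
    rc = 0;                            /* ... start the AF and check results ... */
    release_resource(log, 2);
release_1:
    release_resource(log, 1);
release_0:
    release_resource(log, 0);
out:
    return rc;
}
```

The same unwind path serves both the success case and every error case, so no resource is released twice or leaked.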