Intel® FPGA Academic Program
FPGA Community Research
Learn about discoveries in programmable solutions by Intel engineers and research partners.
Beyond Peak Performance: Comparing the Real Performance of AI-Optimized FPGAs and GPUs
Learn about the first performance evaluation of the AI-optimized Intel® Stratix® 10 NX FPGA. This research compares the Intel Stratix 10 NX FPGA with current AI-optimized GPUs, the NVIDIA* T4 and V100, on a large suite of real-time deep learning inference workloads.
PyGA: A Python*-to-FPGA Compiler Prototype
Read about a proof-of-concept Python*-to-FPGA compiler based on the Numba* Just-In-Time (JIT) compiler for Python and the Intel® FPGA SDK for OpenCL™ software technology. It lets Python programs use an FPGA card seamlessly as an accelerator.
Activation Function Architectures for FPGAs
Examine how the quality of activation function implementations, in both area and latency, affects recurrent neural network performance.
Compiler and FPGA Overlay for Neural Network Inference Acceleration
See how tailoring an overlay to a specific application domain can maintain its full programmability without the performance overhead traditionally associated with overlays.
High Density and Performance Multiplication for FPGA
Get an introduction to multiplier regularization, which restructures common multiplier algorithms into smaller, more efficient architectures.
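The summary does not describe the regularization method itself, so the sketch below only illustrates the general idea of rebuilding a wide multiplier from smaller ones. It uses one level of Karatsuba decomposition, a common multiplier algorithm chosen here as an assumption; the function name `split_multiply` and the split width `n` are hypothetical and not taken from the paper.

```python
def split_multiply(a: int, b: int, n: int) -> int:
    """Multiply two non-negative integers of up to 2n bits using three
    smaller multiplies (one level of Karatsuba) instead of one wide multiply.

    Illustrative only: a generic decomposition, not the paper's method.
    """
    mask = (1 << n) - 1
    # Split each operand into an upper and a lower n-bit half.
    a_hi, a_lo = a >> n, a & mask
    b_hi, b_lo = b >> n, b & mask
    # Two n x n partial products.
    p_hi = a_hi * b_hi
    p_lo = a_lo * b_lo
    # One (n+1) x (n+1) product yields both cross terms at once:
    # (a_hi + a_lo)(b_hi + b_lo) - p_hi - p_lo = a_hi*b_lo + a_lo*b_hi
    p_mid = (a_hi + a_lo) * (b_hi + b_lo) - p_hi - p_lo
    # Recombine with shifts and adds only.
    return (p_hi << (2 * n)) + (p_mid << n) + p_lo


print(split_multiply(0xABCD, 0x1234, 8) == 0xABCD * 0x1234)  # True
```

In hardware terms, the trade is one fewer multiplier in exchange for extra adders around the middle product.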
Harness Numerical Flexibility for Deep Learning on FPGAs
See how using a block floating point (FP) implementation that shares the exponent across many numbers can significantly improve FPGA performance without affecting accuracy.
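As a rough illustration of the shared-exponent idea, the sketch below quantizes a block of values to integer mantissas plus one exponent taken from the largest-magnitude element. A dot product then needs only integer multiply-accumulates and a single exponent add. The mantissa width, rounding mode, and function names are assumptions for illustration, not the format used in the paper.

```python
import math


def bfp_quantize(values, mantissa_bits=8):
    """Quantize a block of floats to block floating point.

    Returns (mantissas, shared_exp): signed integer mantissas that all share
    the exponent of the block's largest-magnitude element.
    """
    max_mag = max(abs(v) for v in values)
    if max_mag == 0.0:
        return [0] * len(values), 0
    # frexp gives max_mag == m * 2**e with 0.5 <= m < 1.
    _, shared_exp = math.frexp(max_mag)
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round to nearest (Python's round: ties to even), then clamp to the signed range.
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return mantissas, shared_exp


def bfp_dequantize(mantissas, shared_exp, mantissa_bits=8):
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


def bfp_dot(a, b, mantissa_bits=8):
    """Dot product in BFP: integer MACs plus one exponent adjustment."""
    ma, ea = bfp_quantize(a, mantissa_bits)
    mb, eb = bfp_quantize(b, mantissa_bits)
    acc = sum(x * y for x, y in zip(ma, mb))  # integer multiply-accumulate only
    return acc * 2.0 ** (ea + eb - 2 * (mantissa_bits - 1))


print(bfp_dot([1.0, 0.5], [0.5, 0.25]))  # 0.625
```

Small elements in a block lose precision relative to the largest one, as `0.001` does below, which is the accuracy trade-off a block FP format has to manage.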
A Large-Scale Consensus-Based Clustering Algorithm for High-Performance FPGAs
Based on a new concept of consensus building at a large scale, this parallel clustering algorithm is designed specifically for designs with millions of elements.
Deterministic Latency Image Acquisition & Processing System Based on FPGAs for Automated Driving Systems
Learn how the flexible input/output structure of FPGAs enables implementation of a deterministic latency acquisition and processing system.
FP Tangent Implementation for FPGAs
This architecture implements an FP tangent function optimized for FPGAs containing hard floating point (HFP) digital signal processing (DSP) blocks.
QRD for Parallel Arithmetic Structures
Read about the algorithm and architecture of a new organization of the QR decomposition (QRD), optimized for the parallel arithmetic structures found in current FPGAs.
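For reference, the sketch below computes a QRD with the textbook modified Gram-Schmidt method. It is not the new organization described in the paper; it only shows the dot products and vector updates that such an architecture maps onto parallel arithmetic. The function name `qr_mgs` and the list-of-rows matrix layout are assumptions.

```python
import math


def qr_mgs(A):
    """QR decomposition of an m x n matrix (m >= n, full column rank) by
    modified Gram-Schmidt. A is a list of rows; returns (Q, R) as lists of rows."""
    m, n = len(A), len(A[0])
    # V[j] holds column j of A; it is orthogonalized in place.
    V = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    R = [[0.0] * n for _ in range(n)]
    Q_cols = []
    for j in range(n):
        v = V[j]
        norm = math.sqrt(sum(x * x for x in v))
        R[j][j] = norm
        q = [x / norm for x in v]
        Q_cols.append(q)
        # For a fixed j, each k iteration is independent of the others,
        # so these dot products and updates can run in parallel.
        for k in range(j + 1, n):
            r = sum(qi * vi for qi, vi in zip(q, V[k]))
            R[j][k] = r
            V[k] = [vi - r * qi for vi, qi in zip(V[k], q)]
    Q = [[Q_cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R


Q, R = qr_mgs([[3.0, 0.0], [4.0, 5.0]])
print(Q)  # approximately [[0.6, -0.8], [0.8, 0.6]]
print(R)  # approximately [[5.0, 4.0], [0.0, 3.0]]
```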
Single Precision Logarithm & Exponential Architectures for FPGAs Enabled with Hard FPs
Explore a novel method for implementing FP elementary functions using the new single-precision FP addition and multiplication features of the Intel® Arria® 10 FPGA and Intel Stratix 10 FPGA.
Hardware Implementation of Evolvable Block-Based Neural Networks
Learn about an efficient FPGA hardware implementation of an evolvable block-based neural network (BbNN) that uses a novel, cost-efficient, sigmoid-like activation function.
Efficient FP Polynomial Evaluation on FPGAs
This technique uses the Horner scheme to evaluate polynomials and removes the majority of alignment shifters present in FP adders by building a fused evaluation operator. The result is a reduction in circuit latency and logic consumption.
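The Horner scheme rewrites c0 + c1·x + … + cn·xⁿ as c0 + x·(c1 + x·(… + x·cn)), so evaluation becomes a chain of n multiply-add steps. The sketch below shows only that evaluation order; the fused operator and the removal of alignment shifters are hardware details that this software model does not capture. The function name `horner` is hypothetical.

```python
def horner(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... + coeffs[n]*x**n by Horner's scheme.

    Each step is one multiply followed by one add (acc = acc * x + c),
    the repeated operation a fused evaluation operator would implement.
    """
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


print(horner([1, 2, 3], 2))  # 1 + 2*2 + 3*2**2 = 17.0
```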