1.3. Performance Results
We compared the performance of this implementation on an FPGA with the optimized k-means implementation running on a CPU. The same data set was used for both the FPGA and CPU runs.
The FPGA used for performance comparison was an Intel® Arria® 10 GX FPGA Development Kit. The FPGA was programmed with Intel® FPGA SDK for OpenCL™ Version 17.1 Update 1. During testing, the FPGA had an fMAX of 320 MHz.
The CPU used for performance comparison was an Intel® Xeon® E5-2680 (24 cores, no hyperthreading).
Data Size (bytes) | FPGA time, initialization method 1 (ms) | FPGA time, initialization method 2 (ms) | CPU time (ms) |
---|---|---|---|
512 | 0.028 | 0.016 | 0.065 |
1024 | 0.042 | 0.032 | 0.573 |
2048 | 0.051 | 0.037 | 0.627 |
4096 | 0.089 | 0.039 | 0.804 |
8192 | 0.105 | 0.044 | 0.919 |
In this experiment, the number of clusters is set to 10.
Each data point has two floating-point features. Data sets of different sizes (512 to 8192) are used to compare the performance of the FPGA and the CPU.
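To make the comparison easier to read, the measured times can be turned into speedup ratios (CPU time divided by FPGA time). The following is a minimal sketch in Python. The times are copied from the table above; the names `results` and `speedups` are illustrative and not part of the design example.

```python
# Times in milliseconds, copied from the results table:
# data size -> (FPGA method 1, FPGA method 2, CPU)
results = {
    512: (0.028, 0.016, 0.065),
    1024: (0.042, 0.032, 0.573),
    2048: (0.051, 0.037, 0.627),
    4096: (0.089, 0.039, 0.804),
    8192: (0.105, 0.044, 0.919),
}


def speedups(fpga_method1_ms, fpga_method2_ms, cpu_ms):
    """Return the CPU/FPGA time ratio for each initialization method."""
    return cpu_ms / fpga_method1_ms, cpu_ms / fpga_method2_ms


for size, (m1, m2, cpu) in results.items():
    s1, s2 = speedups(m1, m2, cpu)
    print(f"{size:>5}: method 1 {s1:5.1f}x, method 2 {s2:5.1f}x")
```

A ratio greater than 1 means the FPGA run finished sooner than the CPU run for that data size.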
For the FPGA runs, we tried two initialization methods. In the first method, we used the first k data points as the initial centroids of the clusters. In the second method, we chose the initial centroids randomly. With randomly chosen initial centroids, the algorithm required fewer iterations and therefore ran faster.
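The two initialization methods can be sketched in a few lines of host-side Python. This is an illustration only, not the OpenCL kernel code of the design example. It assumes that "randomly" means sampling k distinct points from the input data, which the text above does not specify, and the function names are hypothetical. The small `kmeans_iterations` helper runs plain Lloyd iterations so that the iteration counts of the two methods can be compared on a given data set.

```python
import random


def init_first_k(points, k):
    """Method 1: use the first k data points as the initial centroids."""
    return list(points[:k])


def init_random(points, k, seed=None):
    """Method 2 (assumed): sample k distinct data points at random."""
    rng = random.Random(seed)
    return rng.sample(list(points), k)


def kmeans_iterations(points, initial_centroids, max_iter=100):
    """Run Lloyd's algorithm on 2-feature points and return the number of
    iterations performed until the cluster assignments stop changing."""
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(
                range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        if new_assignment == assignment:
            return iteration
        assignment = new_assignment
        # Move each centroid to the mean of its members; keep empty clusters as-is.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return max_iter


# Usage: 512 synthetic points with 2 floating-point features, 10 clusters.
rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(512)]
print("first-k iterations:", kmeans_iterations(data, init_first_k(data, 10)))
print("random iterations: ", kmeans_iterations(data, init_random(data, 10, seed=1)))
```

The iteration counts depend on the data and on the random seed, so this sketch will not necessarily reproduce the ordering reported above for every input.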