Optimising Hardware Accelerated Neural Networks with Quantisation and a Knowledge Distillation Evolutionary Algorithm
Résumé
This paper compares the latency, accuracy, training time and hardware costs of neural networks compressed with our new multi-objective evolutionary algorithm called NEMOKD, and with quantisation. We evaluate NEMOKD on Intel’s Movidius Myriad X VPU processor, and quantisation on Xilinx’s programmable Z7020 FPGA hardware. Evolving models with NEMOKD increases inference accuracy by up to 82% at the cost of 38% increased latency, with throughputperformance of 100–590 image frames-per-second (FPS). Quantisation identifies a sweet spot of 3 bit precision in the trade-off between latency, hardware requirements, training time and accuracy. Parallelising FPGA implementations of 2 and 3 bit quantised neural networks increases throughput from 6 k FPS to 373 k FPS, a 62x speedup.
Fichier principal
Stewart-2021-Optimising Hardware Accelerated Neural Networks.pdf (1.32 Mo)
Télécharger le fichier
Origine : Fichiers éditeurs autorisés sur une archive ouverte