
Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks

Abstract: Convolutional Neural Networks (CNNs) are state-of-the-art in numerous computer vision tasks such as object classification and detection. However, the large number of parameters they contain leads to a high computational complexity and strongly limits their usability in budget-constrained devices such as embedded devices. In this paper, we propose a combination of a new pruning technique and a quantization scheme that effectively reduces the complexity and memory usage of convolutional layers of CNNs, and replaces the complex convolutional operation by a low-cost multiplexer. We perform experiments on the CIFAR10, CIFAR100 and SVHN datasets and show that the proposed method achieves almost state-of-the-art accuracy, while drastically reducing the computational and memory footprints. We also propose an efficient hardware architecture to accelerate CNN operations. The proposed hardware architecture is pipelined and accommodates multiple layers working at the same time to speed up the inference process.
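
As a rough illustration of the kind of operations the abstract describes, the sketch below combines simple magnitude-based pruning with uniform low-bit quantization of a convolution kernel in Python/NumPy. It is a hypothetical example, not the paper's guided pruning scheme nor its multiplexer-based hardware implementation; the function prune_and_quantize and all parameter choices are assumptions made purely for illustration.

import numpy as np

def prune_and_quantize(weights, prune_ratio=0.5, num_bits=2):
    # Illustrative only: magnitude pruning, NOT the paper's guided pruning.
    # Zero out the smallest-magnitude weights.
    flat_abs = np.sort(np.abs(weights).ravel())
    k = int(prune_ratio * flat_abs.size)
    threshold = flat_abs[k] if k < flat_abs.size else flat_abs[-1]
    mask = np.abs(weights) >= threshold

    # Uniformly quantize the surviving weights to 2**num_bits levels.
    pruned = weights * mask
    max_abs = float(np.abs(pruned).max()) or 1.0
    step = max_abs / (2 ** num_bits - 1)
    quantized = np.round(pruned / step) * step
    return quantized * mask, mask

# Toy 3x3 convolution kernel: 32 output channels, 16 input channels.
kernel = np.random.randn(32, 16, 3, 3).astype(np.float32)
q_kernel, mask = prune_and_quantize(kernel, prune_ratio=0.75, num_bits=2)
print("non-zero weights kept:", int(mask.sum()), "out of", kernel.size)

In such a scheme, the pruning mask removes most multiplications while the few remaining low-bit weights can be selected with cheap hardware; the paper's actual method and architecture are described in the full text.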
Document type: Preprints, Working Papers, ...

Cited literature: 24 references

https://hal.archives-ouvertes.fr/hal-01965304
Contributor: Ghouthi Boukli Hacene
Submitted on: Tuesday, December 25, 2018 - 10:58:11 PM
Last modification on: Wednesday, July 21, 2021 - 7:42:01 AM
Long-term archiving on: Tuesday, March 26, 2019 - 2:45:09 PM

Files

hardware_quake.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01965304, version 1
  • ARXIV : 1812.11337

Citation

Ghouthi Boukli Hacene, Vincent Gripon, Matthieu Arzel, Nicolas Farrugia, Yoshua Bengio. Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks. 2018. ⟨hal-01965304⟩

Metrics

Record views: 531
File downloads: 226