Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015. ,
DOI : 10.1038/nature14539
ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision, vol.115, issue.3, pp.211-252, 2015. ,
DOI : 10.1007/s11263-015-0816-y
URL : http://dspace.mit.edu/bitstream/1721.1/104944/1/11263_2015_Article_816.pdf
Fully convolutional networks for semantic segmentation, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3431-3440, 2015. ,
DOI : 10.1109/CVPR.2015.7298965
URL : http://arxiv.org/pdf/1411.4038
Towards end-to-end speech recognition with deep convolutional neural networks. arXiv preprint ,
DOI : 10.21437/interspeech.2016-1446
URL : http://arxiv.org/pdf/1701.02720
Very deep convolutional networks for large-scale image recognition. arXiv preprint, pp.1-14, 2014. ,
Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?, Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '17, pp.5-14, 2017. ,
DOI : 10.1109/MICRO.2014.58
Going Deeper with Embedded FPGA Platform for Convolutional Neural Network, Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '16, pp.26-35, 2016. ,
DOI : 10.1109/92.784091
Intel® Stratix® 10 Variable Precision DSP Blocks User Guide, 2017. ,
Deep Learning on FPGAs: Past, Present, and Future. arXiv e-print, 2016. ,
Efficient Processing of Deep Neural Networks: A Tutorial and Survey, Proceedings of the IEEE, pp.2295-2329 ,
Gradient-based learning applied to document recognition, Proceedings of the IEEE, pp.2278-2324, 1998. ,
DOI : 10.1109/5.726791
URL : http://www.cs.berkeley.edu/~daf/appsem/Handwriting/papers/00726791.pdf
ImageNet Classification with Deep Convolutional Neural Networks, Advances in Neural Information Processing Systems -NIPS'12, pp.1-9, 2012. ,
DOI : 10.1145/3065386
URL : http://dl.acm.org/ft_gateway.cfm?id=3065386&type=pdf
Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of physiology, vol.160, issue.1, pp.106-154, 1962. ,
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Proceedings of the International Conference on Machine Learning -ICML '15, pp.448-456, 2015. ,
Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. arXiv e-print, 2016. ,
Going deeper with convolutions, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. ,
DOI : 10.1109/CVPR.2015.7298594
URL : http://arxiv.org/pdf/1409.4842
Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778 ,
DOI : 10.1109/CVPR.2016.90
Minimizing Computation in Convolutional Neural Networks, Proceedings of the International Conference on Artificial Neural Networks -ICANN '14, pp.281-290, 2014. ,
DOI : 10.1007/978-3-319-11179-7_36
Parameterized convolution filtering in a field programmable gate array, Proceedings of the International Workshop on Field Programmable Logic and Applications on More FPGAs, pp.274-280, 1994. ,
1.1 Computing's energy problem (and what we can do about it), 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp.10-14 ,
DOI : 10.1109/ISSCC.2014.6757323
URL : https://hal.archives-ouvertes.fr/insu-01488608
GPU-Based Deep Learning Inference: A Performance and Power Analysis, 2015. ,
cuDNN: Efficient Primitives for Deep Learning, 2014. ,
DeepCL: OpenCL library to train deep convolutional neural networks, 2017. ,
Caffe: Convolutional Architecture for Fast Feature Embedding, Proceedings of the ACM International Conference on Multimedia, 2014. ,
TensorFlow: A System for Large-Scale Machine Learning, Proceedings of the USENIX Symposium on Operating Systems Design and Implementation -OSDI '16, pp.265-284, 2016. ,
Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks, Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '16, pp.16-25, 2016. ,
DOI : 10.1145/2664666.2664670
An OpenCL(TM) Deep Learning Accelerator on Arria 10, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays -FPGA '17, pp.55-64, 2017. ,
Caffeinated FPGAs: FPGA framework For Convolutional Neural Networks, 2016 International Conference on Field-Programmable Technology (FPT), 2016. ,
DOI : 10.1109/FPT.2016.7929549
Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System, Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '17, pp.35-44, 2017. ,
DOI : 10.1145/2847263.2847276
Design of an Energy-Efficient Accelerator for Training of Convolutional Neural Networks using Frequency-Domain Computation, Proceedings of the Annual Conference on Design Automation -DAC '17, 2017. ,
FpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs, Proceedings of the IEEE Annual International Symposium on Field-Programmable Custom Computing Machines -FCCM '16, pp.40-47, 2016. ,
From high-level deep neural models to FPGAs, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp.1-12, 2016. ,
DOI : 10.1109/MICRO.2016.7783720
A high performance FPGA-based accelerator for large-scale convolutional neural networks, Proceedings of the International Conference on Field Programmable Logic and Applications -FPL '16, pp.1-9 ,
On How to Design Dataflow FPGA-Based Accelerators for Convolutional Neural Networks, Proceedings of the IEEE Computer Society Annual Symposium on VLSI -ISVLSI '17, pp.639-644, 2017. ,
Tactics to Directly Map CNN Graphs on Embedded FPGAs, IEEE Embedded Systems Letters, vol.9, issue.4, pp.1-4, 2017. ,
DOI : 10.1109/LES.2017.2743247
URL : https://hal.archives-ouvertes.fr/hal-01626462
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '15, pp.161-170, 2015. ,
DOI : 10.1145/1498765.1498785
Design space exploration of FPGA-based Deep Convolutional Neural Networks, 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), pp.575-580 ,
DOI : 10.1109/ASPDAC.2016.7428073
Curbing the Roofline: a Scalable and Flexible Architecture for CNNs on FPGA, Proceedings of the ACM International Conference on Computing Frontiers -CF '16, pp.376-383, 2016. ,
PLACID, ACM Transactions on Multimedia Computing, Communications, and Applications, pp.1-62 ,
DOI : 10.1145/2684746.2689060
Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs, Proceedings of the 54th Annual Design Automation Conference 2017, DAC '17, pp.1-6, 2017. ,
DOI : 10.1145/1815961.1815993
An FPGA Realization of a Deep Convolutional Neural Network Using a Threshold Neuron Pruning, Proceedings of the International Symposium on Applied Reconfigurable Computing -ARC'16, pp.268-280, 2017. ,
DOI : 10.1145/2684746.2689060
Deep Learning with Limited Numerical Precision, Proceedings of the International Conference on Machine Learning -ICML '15, pp.1737-1746, 2015. ,
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients, 2016. ,
Training deep neural networks with low precision multiplications. arXiv e-print, 2014. ,
Hardware-oriented Approximation of Convolutional Neural Networks, arXiv preprint, p.8, 2016. ,
BinaryConnect: Training Deep Neural Networks with binary weights during propagations, Advances in Neural Information Processing Systems - NIPS'15, pp.3123-3131, 2015. ,
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks, Proceedings of the European Conference on Computer Vision -ECCV'16, pp.525-542, 2016. ,
DOI : 10.1103/PhysRevLett.115.128101
URL : http://arxiv.org/pdf/1603.05279
FINN, Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '17, pp.65-74, 2017. ,
DOI : 10.1145/1498765.1498785
URL : http://arxiv.org/pdf/1612.07119
YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights, 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp.2016-236, 2016. ,
DOI : 10.1109/ISVLSI.2016.111
Saliency detection by multi-context deep learning, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1265-1274, 2015. ,
DOI : 10.1109/CVPR.2015.7298731
A New Stochastic Computing Multiplier with Application to Deep Convolutional Neural Networks, Proceedings of the 54th Annual Design Automation Conference 2017, DAC '17, pp.1-6, 2017. ,
DOI : 10.1109/ASPDAC.2017.7858405
Energy-Efficient Hybrid Stochastic-Binary Neural Networks for Near-Sensor Computing, Proceedings of the Conference on Design, Automation and Test in Europe -DATE '17, 2017. ,
Dynamic Energy-accuracy Trade-off Using Stochastic Computing in Deep Neural Networks, Proceedings of the Annual Conference on Design Automation -DAC '16, pp.1-124, 2016. ,
Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network, Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '17, pp.25-34, 2017. ,
DOI : 10.1201/EBK1439811924
Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs, Proceedings of the International Work-Conference on Artificial Neural Networks -IWANN '17, pp.271-282, 2017. ,
ClCaffe: OpenCL accelerated Caffe for convolutional neural networks, Proceedings of the IEEE International Parallel and Distributed Processing Symposium -IPDPS '16, pp.50-57, 2016. ,
DOI : 10.1109/ipdpsw.2016.182
The Intel® FPGA SDK for Open Computing Language (OpenCL), 2016. ,
Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster, Proceedings of the International Symposium on Low Power Electronics and Design -ISLPED '16, pp.326-331, 2016. ,
Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks, Proceedings of the International Conference on Computer-Aided Design -ICCAD '16, pp.1-8, 2016. ,
Numerical solution of linear equations with Toeplitz and Vector Toeplitz matrices, Numerische Mathematik, vol.13, issue.10, pp.404-424 ,
Arithmetic complexity of computations, SIAM, vol.33, 1980. ,
DOI : 10.1137/1.9781611970364
Fast Algorithms for Convolutional Neural Networks. arXiv e-print, 2015. ,
DOI : 10.1109/cvpr.2016.435
URL : http://arxiv.org/pdf/1509.09308
Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pp.101-108, 2017. ,
DOI : 10.1109/FCCM.2017.64
The scientist and engineer's guide to digital signal processing. California Technical Pub, 1997. ,
A Massively Parallel Coprocessor for Convolutional Neural Networks, 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, pp.53-60, 2009. ,
DOI : 10.1109/ASAP.2009.25
CNP: An FPGA-based processor for Convolutional Networks, 2009 International Conference on Field Programmable Logic and Applications, pp.1689-1699, 2009. ,
DOI : 10.1109/FPL.2009.5272559
A dynamically configurable coprocessor for convolutional neural networks, ACM SIGARCH Computer Architecture News, vol.38, issue.3, pp.247-257 ,
DOI : 10.1145/1816038.1815993
NeuFlow: A runtime reconfigurable dataflow processor for vision, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '11, pp.109-116 ,
A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.696-701 ,
DOI : 10.1109/CVPRW.2014.106
Efficient FPGA acceleration of Convolutional Neural Networks using logical-3D compute array, Proceedings of the Conference on Design, Automation and Test in Europe -DATE '16, 2016. ,
Fused-layer CNN accelerators, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016. ,
DOI : 10.1109/MICRO.2016.7783725
Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays -FPGA '17, pp.45-54, 2017. ,
Loop tiling for reconfigurable accelerators, Proceedings of the International Conference on Field Programmable Logic and Applications -FPL '01, pp.398-408, 2001. ,
DOI : 10.1007/3-540-44687-7_41
Roofline, Communications of the ACM, vol.52, issue.4, p.65, 2009. ,
DOI : 10.1145/1498765.1498785
End-to-end scalable FPGA accelerator for deep residual networks, 2017 IEEE International Symposium on Circuits and Systems (ISCAS), pp.1-4 ,
DOI : 10.1109/ISCAS.2017.8050344
An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks, 2017 27th International Conference on Field Programmable Logic and Applications (FPL), pp.1-8 ,
DOI : 10.23919/FPL.2017.8056824
Throughput-Optimized FPGA Accelerator for Deep Convolutional Neural Networks, ACM Transactions on Reconfigurable Technology and Systems, vol.10, issue.3, pp.1-23, 2017. ,
DOI : 10.1145/2684746.2689060
A Preliminary Architecture for a Basic Data-Flow Processor, Proceedings of the International Symposium on Computer Architecture -ISCA '75, pp.126-132, 1975. ,
Low power design methodology for signal processing systems using lightweight dataflow techniques, Proceedings of the Conference on Design and Architectures for Signal and Image Processing -DASIP '16, pp.82-89, 2016. ,
A lightweight dataflow approach for design and implementation of SDR systems, Proceedings of the Wireless Innovation Conference and Product Exposition, pp.640-645, 2010. ,
Synchronous data flow, Proceedings of the IEEE, 1987. ,
Latency-Driven Design for FPGA-based Convolutional Neural Networks, Proceedings of the International Conference on Field Programmable Logic and Applications -FPL '17, 2017. ,
A Survey of Techniques for Approximate Computing, ACM Computing Surveys, vol.48, issue.4, pp.1-33 ,
DOI : 10.1109/DAC.2014.6881426
Fixed point optimization of deep convolutional neural networks for object recognition, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015. ,
DOI : 10.1109/ICASSP.2015.7178146
Fixed Point Quantization of Deep Convolutional Networks, Proceedings of the International Conference on Machine Learning -ICML '16, pp.2849-2858, 2016. ,
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. arXiv e-print, 2016. ,
Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks, Journal of Computer Science and Technology, vol.115, issue.3, pp.667-682, 2017. ,
DOI : 10.1007/978-94-010-0201-1_1
Quantized Convolutional Neural Networks for Mobile Devices, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4820-4828, 2016. ,
DOI : 10.1109/CVPR.2016.521
Hardware Complexity of Modular Multiplication and Exponentiation, IEEE Transactions on Computers, vol.56, issue.10, pp.1308-1319, 2007. ,
DOI : 10.1109/TC.2007.1084
Dynamically scaled fixed point arithmetic, Proceedings of the IEEE Pacific Rim Conference on Communications, Computers and Signal Processing Conference, pp.315-318, 1991. ,
DOI : 10.1109/pacrim.1991.160742
FixCaffe: Training CNN with Low Precision Arithmetic Operations by Fixed Point Caffe, Proceedings of the International Workshop on Advanced Parallel Processing Technologies -APPT '17, pp.38-50 ,
DOI : 10.1007/978-3-319-67952-5_4
A fully connected layer elimination for a binarized convolutional neural network on an FPGA, 2017 27th International Conference on Field Programmable Logic and Applications (FPL), pp.1-4 ,
DOI : 10.23919/FPL.2017.8056771
Binarized neural networks, Advances in Neural Information Processing Systems -NIPS'16, pp.4107-4115 ,
Trained Ternary Quantization, Proceedings of the International Conference on Learning Representations -ICLR'17, 2017. ,
Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs, Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '17, 2017. ,
DOI : 10.1145/2897937.2898003
Scalable High-Performance Architecture for Convolutional Ternary Neural Networks on FPGA, Proceedings of the International Conference on Field Programmable Logic and Applications -FPL '17, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01701116
Fast and Accurate Computation using Stochastic Circuits, Proceedings of the Conference on Design, Automation and Test in Europe -DATE '14, 2014. ,
DOI : 10.7873/date.2014.089
Sparse Convolutional Neural Networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '15, pp.806-814, 2015. ,
Learning both Weights and Connections for Efficient Neural Network, Advances in Neural Information Processing Systems -NIPS'15, pp.1135-1143, 2015. ,
Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition -CVPR '17, 2017. ,
DOI : 10.1109/cvpr.2017.643
URL : http://arxiv.org/pdf/1611.05128
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, Proceedings of the International Conference on Learning Representations -ICLR'16, pp.1-13, 2016. ,
Learning Separable Filters, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.37, issue.1, pp.94-106, 2015. ,
DOI : 10.1109/TPAMI.2014.2343229
A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-blas on FPGAs, Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays -FPGA '14, pp.161-170, 2014. ,