
The Energy and Carbon Footprint of Training End-to-End Speech Recognizers

Abstract : Deep learning contributes to reaching higher levels of artificial intelligence. Its pervasive adoption, however, has raised growing concerns about the environmental impact of this technology. In particular, the energy consumed at training and inference time by modern neural networks is far from negligible and will increase further with the deployment of ever larger models. This work investigates, for the first time, the carbon cost of end-to-end automatic speech recognition (ASR). First, it quantifies the amount of CO2 emitted while training state-of-the-art (SOTA) ASR systems on a university-scale cluster. Then, it shows that a tiny performance improvement comes at an extremely high carbon cost. For instance, the conducted experiments reveal that a SOTA Transformer emits 50% of the total CO2 released during its training solely to achieve a final decrease of 0.3 in word error rate. With this study, we hope to raise awareness of this crucial topic, and we provide guidelines, insights, and estimates that enable researchers to better assess the environmental impact of training speech technologies.
Document type :
Preprints, Working Papers, ...
Contributor : Titouan Parcollet
Submitted on : Tuesday, April 6, 2021 - 9:12:51 AM
Last modification on : Thursday, April 8, 2021 - 9:41:49 AM




  • HAL Id : hal-03190119, version 1



Titouan Parcollet, Mirco Ravanelli. The Energy and Carbon Footprint of Training End-to-End Speech Recognizers. 2021. ⟨hal-03190119⟩


