Conference paper, Year: 2017

An Intrinsic Difference Between Vanilla RNNs and GRU Models

Abstract

In order to perform well in practice, Recurrent Neural Networks (RNNs) require computationally heavy architectures, such as the Gated Recurrent Unit (GRU) or Long Short-Term Memory (LSTM). Indeed, the original Vanilla model fails to capture medium- and long-term sequential dependencies. The aim of this paper is to show that gradient training issues, which motivated the introduction of LSTM and GRU models, are not sufficient to explain the failure of the simplest RNN. Using the example of Reber's grammar, we propose an experimental measure of both Vanilla and GRU models, which suggests an intrinsic difference in their dynamics. A better mathematical understanding of this difference could lead to more efficient models without compromising performance.
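For reference, the two state updates contrasted in the abstract differ only in their gating terms. The sketch below is an illustration based on the standard textbook definitions of the Vanilla RNN and GRU cells (one common GRU gating convention), not code taken from the paper; the weight names and dimensions are placeholders.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def vanilla_rnn_step(x, h_prev, Wx, Wh, b):
    # Vanilla RNN: the entire previous state is rewritten through a single tanh.
    return np.tanh(Wx @ x + Wh @ h_prev + b)

def gru_step(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    # Update gate: how much of the new candidate replaces the previous state.
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)
    # Reset gate: how much of the previous state feeds the candidate.
    r = sigmoid(Wr @ x + Ur @ h_prev + br)
    # Candidate state computed from the reset-gated previous state.
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)
    # Convex combination: the state can be carried over almost unchanged,
    # which is what allows long-range dependencies to persist.
    return (1.0 - z) * h_prev + z * h_tilde

# Example: one step with a 3-dimensional input and a 4-dimensional state.
rng = np.random.default_rng(0)
x, h = rng.standard_normal(3), np.zeros(4)
Wx, Wh, b = rng.standard_normal((4, 3)), rng.standard_normal((4, 4)), np.zeros(4)
print(vanilla_rnn_step(x, h, Wx, Wh, b))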
No file deposited

Dates and versions

hal-01522887, version 1 (15-05-2017)

Identifiers

  • HAL Id: hal-01522887, version 1

Cite

Tristan Stérin, Nicolas Farrugia, Vincent Gripon. An Intrinsic Difference Between Vanilla RNNs and GRU Models. COGNITIVE 2017: Ninth International Conference on Advanced Cognitive Technologies and Applications, Feb 2017, Athens, Greece. pp. 76-81. ⟨hal-01522887⟩