Conference paper, Year: 2017

An Intrinsic Difference Between Vanilla RNNs and GRU Models

Abstract

In order to perform well in practice, Recurrent Neural Networks (RNNs) require computationally heavy architectures, such as the Gated Recurrent Unit (GRU) or Long Short-Term Memory (LSTM). Indeed, the original Vanilla model fails to capture medium- and long-term sequential dependencies. The aim of this paper is to show that gradient training issues, which motivated the introduction of LSTM and GRU models, are not sufficient to explain the failure of the simplest RNN. Using the example of Reber's grammar, we propose an experimental measure of both Vanilla and GRU models, which suggests an intrinsic difference in their dynamics. A better mathematical understanding of this difference could lead to more efficient models without compromising performance.
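For reference, the two state updates contrasted in the abstract differ only in their gating terms. The sketch below is an illustration based on the standard textbook definitions of the Vanilla RNN and GRU cells (one common GRU gating convention), not code taken from the paper; the weight names and dimensions are placeholders.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def vanilla_rnn_step(x, h_prev, Wx, Wh, b):
    # Vanilla RNN: the entire previous state is rewritten through a single tanh.
    return np.tanh(Wx @ x + Wh @ h_prev + b)

def gru_step(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    # Update gate: how much of the new candidate replaces the previous state.
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)
    # Reset gate: how much of the previous state feeds the candidate.
    r = sigmoid(Wr @ x + Ur @ h_prev + br)
    # Candidate state computed from the reset-gated previous state.
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)
    # Convex combination: the state can be carried over almost unchanged,
    # which is what allows long-range dependencies to persist.
    return (1.0 - z) * h_prev + z * h_tilde

# Example: one step with a 3-dimensional input and a 4-dimensional state.
rng = np.random.default_rng(0)
x, h = rng.standard_normal(3), np.zeros(4)
Wx, Wh, b = rng.standard_normal((4, 3)), rng.standard_normal((4, 4)), np.zeros(4)
print(vanilla_rnn_step(x, h, Wx, Wh, b))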
No file deposited

Dates and versions

hal-01522887, version 1 (15-05-2017)

Identifiers

  • HAL Id: hal-01522887, version 1

Cite

Tristan Stérin, Nicolas Farrugia, Vincent Gripon. An Intrinsic Difference Between Vanilla RNNs and GRU Models. COGNITIVE 2017: Ninth International Conference on Advanced Cognitive Technologies and Applications, Feb 2017, Athens, Greece. pp. 76-81. ⟨hal-01522887⟩