Archive ouverte HAL
Conference paper, Year: 2021

How Many Layers and Why? An Analysis of the Model Depth in Transformers

Benoît Crabbé
  • Role: Author
  • PersonId: 1129720

Abstract

In this study, we investigate the role of multiple layers in deep transformer models. We design a variant of ALBERT that dynamically adapts the number of layers applied to each token of the input. The key specificity of ALBERT is that weights are tied across layers; the stack of encoder layers therefore iteratively repeats the application of the same transformation function to the input. We interpret this repeated application as an iterative process in which the contextualized token representations are progressively refined. We analyze this process at the token level during pretraining, fine-tuning, and inference. We show that tokens do not all require the same number of iterations and that tokens that are difficult or crucial for the task undergo more iterations.
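To illustrate the mechanism sketched in the abstract, below is a minimal PyTorch sketch of iterative refinement with a single weight-tied encoder layer and a per-token halting rule. It is not the authors' implementation: the class name IterativeSharedEncoder, the norm-based convergence test, and the eps threshold are illustrative assumptions, and the paper's actual halting criterion and training procedure may differ.

```python
# Minimal sketch (assumption-laden, not the paper's code): one encoder layer
# whose weights are reused at every depth step, applied until each token's
# representation stops changing by more than a hypothetical threshold `eps`.
import torch
import torch.nn as nn

class IterativeSharedEncoder(nn.Module):
    def __init__(self, d_model=768, n_heads=12, max_iters=12, eps=1e-2):
        super().__init__()
        # A single layer shared across depth (ALBERT-style weight tying).
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.max_iters = max_iters
        self.eps = eps  # illustrative convergence threshold

    def forward(self, x):
        # x: (batch, seq_len, d_model) token embeddings
        halted = torch.zeros(x.shape[:2], dtype=torch.bool, device=x.device)
        n_iters = torch.zeros(x.shape[:2], dtype=torch.long, device=x.device)
        for _ in range(self.max_iters):
            new_x = self.layer(x)
            # Per-token change between two consecutive refinement steps.
            delta = (new_x - x).norm(dim=-1)
            # Only tokens that have not yet halted keep their updated state.
            keep_updating = (~halted).unsqueeze(-1)
            x = torch.where(keep_updating, new_x, x)
            n_iters += (~halted).long()
            halted |= delta < self.eps
            if halted.all():
                break
        # Refined representations and the number of iterations spent per token.
        return x, n_iters

# Example usage: two sequences of 16 token embeddings.
encoder = IterativeSharedEncoder()
tokens = torch.randn(2, 16, 768)
refined, per_token_depth = encoder(tokens)
```

In this sketch, per-token depth is simply the number of refinement steps a token undergoes before its representation stabilizes, which mirrors the abstract's observation that some tokens need more iterations than others.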
Main file
2021.acl-srw.23.pdf (945.25 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-03601412, version 1 (08-03-2022)

Identifiers

  • HAL Id: hal-03601412, version 1

Cite

Antoine Simoulin, Benoît Crabbé. How Many Layers and Why? An Analysis of the Model Depth in Transformers. Association for Computational Linguistics (Student Research Workshop), 2021, Bangkok, Thailand. ⟨hal-03601412⟩
