Archive ouverte HAL
Conference paper, Year: 2021

How Many Layers and Why? An Analysis of the Model Depth in Transformers

Benoît Crabbé
  • Role: Author
  • PersonId: 1129720

Abstract

In this study, we investigate the role of multiple layers in deep transformer models. We design a variant of ALBERT that dynamically adapts the number of layers applied to each token of the input. The key specificity of ALBERT is that weights are tied across layers; the stack of encoder layers therefore iteratively repeats the application of the same transformation function to the input. We interpret this repeated application as an iterative process in which the contextualized token representations are progressively refined. We analyze this process at the token level during pretraining, fine-tuning, and inference. We show that tokens do not all require the same number of iterations and that tokens that are difficult or crucial for the task undergo more iterations.
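To illustrate the mechanism sketched in the abstract, below is a minimal PyTorch sketch of iterative refinement with a single weight-tied encoder layer and a per-token halting rule. It is not the authors' implementation: the class name IterativeSharedEncoder, the norm-based convergence test, and the eps threshold are illustrative assumptions, and the paper's actual halting criterion and training procedure may differ.

```python
# Minimal sketch (assumption-laden, not the paper's code): one encoder layer
# whose weights are reused at every depth step, applied until each token's
# representation stops changing by more than a hypothetical threshold `eps`.
import torch
import torch.nn as nn

class IterativeSharedEncoder(nn.Module):
    def __init__(self, d_model=768, n_heads=12, max_iters=12, eps=1e-2):
        super().__init__()
        # A single layer shared across depth (ALBERT-style weight tying).
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.max_iters = max_iters
        self.eps = eps  # illustrative convergence threshold

    def forward(self, x):
        # x: (batch, seq_len, d_model) token embeddings
        halted = torch.zeros(x.shape[:2], dtype=torch.bool, device=x.device)
        n_iters = torch.zeros(x.shape[:2], dtype=torch.long, device=x.device)
        for _ in range(self.max_iters):
            new_x = self.layer(x)
            # Per-token change between two consecutive refinement steps.
            delta = (new_x - x).norm(dim=-1)
            # Only tokens that have not yet halted keep their updated state.
            keep_updating = (~halted).unsqueeze(-1)
            x = torch.where(keep_updating, new_x, x)
            n_iters += (~halted).long()
            halted |= delta < self.eps
            if halted.all():
                break
        # Refined representations and the number of iterations spent per token.
        return x, n_iters

# Example usage: two sequences of 16 token embeddings.
encoder = IterativeSharedEncoder()
tokens = torch.randn(2, 16, 768)
refined, per_token_depth = encoder(tokens)
```

In this sketch, per-token depth is simply the number of refinement steps a token undergoes before its representation stabilizes, which mirrors the abstract's observation that some tokens need more iterations than others.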
Main file
2021.acl-srw.23.pdf (945.25 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-03601412, version 1 (08-03-2022)

Identifiers

  • HAL Id: hal-03601412, version 1

Cite

Antoine Simoulin, Benoît Crabbé. How Many Layers and Why? An Analysis of the Model Depth in Transformers. Association for Computational Linguistics (Student Research Workshop), 2021, Bangkok, Thailand. ⟨hal-03601412⟩
