Conference paper (2022)

BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model

Abstract

Several recent studies have tested the use of transformer language model representations to infer prosodic features for text-to-speech synthesis (TTS). While these studies have explored prosody in general, in this work we look specifically at the prediction of contrastive focus on personal pronouns. This is a particularly challenging task, as it often requires semantic, discursive and/or pragmatic knowledge to predict correctly. We collect a corpus of utterances containing contrastive focus and evaluate, on these samples, the accuracy of a BERT model fine-tuned to predict quantized acoustic prominence features. We also investigate how past utterances can provide relevant information for this prediction. Furthermore, we evaluate the controllability of pronoun prominence in a TTS model conditioned on acoustic prominence features.
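The abstract itself contains no code, but the prediction step it describes can be pictured as token classification over discrete prominence levels. The following is a minimal sketch, not the authors' implementation: it fine-tunes a BERT model with the Hugging Face transformers library, where the number of bins, the example sentence, and the word-level labels are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): fine-tune BERT as a token
# classifier over quantized prominence bins, per the paper's description.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

NUM_BINS = 4  # assumption: prominence quantized into 4 discrete levels

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_BINS
)

# Hypothetical training pair: one prominence label per word, with the
# highest bin on a contrastively focused pronoun ("SHE ate the cake").
words = ["she", "ate", "the", "cake"]
word_labels = [3, 1, 0, 1]

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# Align word-level labels to subword tokens; -100 masks special tokens
# out of the loss.
labels = [
    -100 if wid is None else word_labels[wid]
    for wid in enc.word_ids(batch_index=0)
]
enc["labels"] = torch.tensor([labels])

# One gradient step: cross-entropy over prominence bins at each token.
out = model(**enc)
out.loss.backward()
```

At inference time, an argmax over the bin logits would yield a per-token prominence level, which could then serve as the conditioning signal for the prominence-controlled TTS model the abstract mentions.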
Main file: BERT__can_HE_predict_contrastive_focus_.pdf (1.48 MB)
Origin: files produced by the author(s)

Dates and versions

hal-03791472, version 1 (29-09-2022)

Identifiers

Cite

Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber. BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model. Interspeech 2022 - 23rd Annual Conference of the International Speech Communication Association, Sep 2022, Incheon, South Korea. pp.3383-3387, ⟨10.21437/Interspeech.2022-10116⟩. ⟨hal-03791472⟩

