RL extraction of syntax-based chunks for sentence compression

Hoa T Le; Christophe Cerisara; Claire Gardent

Conference paper, Year: 2019


Hoa T Le

Role: Author
PersonId: 1079721

Natural Language Processing : representations, inference and semantics

Christophe Cerisara

Role: Author
PersonId: 2353
IdHAL: christophe-cerisara
IdRef: 102700168

Natural Language Processing : representations, inference and semantics

Claire Gardent

Role: Author
PersonId: 3949
IdHAL: claire-gardent
ORCID: 0000-0002-3805-6662
IdRef: 034104593

Natural Language Processing : representations, inference and semantics

Abstract

Sentence compression involves selecting key information present in the input and rewriting this information into a short, coherent text. While dependency parses have often been used for this purpose, we propose to exploit such syntactic information within a modern reinforcement learning-based extraction model. Furthermore, compared to other approaches that incorporate syntactic features into deep learning models, we design a model that has better explainability properties and is flexible enough to support various shallow syntactic parsing modules. More specifically, we linearize the syntactic tree into overlapping text segments, which are then selected with reinforcement learning and regenerated into a compressed form. Hence, despite relying on extractive components, our model is also able to handle abstractive summarization. We explore different ways of selecting subtrees from the dependency structure of the input sentence and compare the results of various models on the Gigaword corpus.
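To make the idea of linearizing a syntactic tree into overlapping text segments more concrete, here is a minimal sketch. It is an illustration only and not the authors' implementation: the function name `subtree_chunks`, the head-index encoding of the dependency tree, the `min_len` filter, and the toy sentence are all assumptions. Each chunk is simply the word span covered by one dependency subtree, so a chunk rooted at a node overlaps with the chunks of its descendants. The reinforcement-learning selection and the regeneration step are not shown.

```python
from collections import defaultdict


def subtree_chunks(tokens, heads, min_len=2):
    """Return one text segment per dependency subtree (hypothetical helper).

    tokens: list of words in sentence order.
    heads:  heads[i] is the index of token i's head, or -1 for the root.
    min_len: subtrees with fewer tokens than this are skipped.
    """
    # Build a head -> children map from the head-index encoding.
    children = defaultdict(list)
    for child, head in enumerate(heads):
        if head >= 0:
            children[head].append(child)

    def span(node):
        # Collect the node and all of its descendants.
        indices = [node]
        for child in children[node]:
            indices.extend(span(child))
        return indices

    chunks = []
    for node in range(len(tokens)):
        indices = sorted(span(node))  # restore sentence order
        if len(indices) >= min_len:
            chunks.append(" ".join(tokens[i] for i in indices))
    return chunks


# Toy example with an assumed dependency analysis.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
heads = [1, 2, -1, 5, 5, 2]
chunks = subtree_chunks(tokens, heads)
print(chunks)
# ['the cat', 'the cat sat on the mat', 'on the mat']
```

In this toy setting the chunk rooted at "sat" contains the two smaller chunks, which is the sense in which the segments overlap; a selection model would then choose which segments to keep.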

Domains

Document and Text Processing; Computation and Language [cs.CL]

Main file

icann19.pdf (322.64 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-02323821

Submitted on: Tuesday, October 20, 2020, 17:57:40

Last modified on: Monday, September 11, 2023, 17:41:18

Dates and versions

hal-02323821, version 1 (21-10-2019)

hal-02323821, version 2 (20-10-2020)

Identifiers

HAL Id: hal-02323821, version 2

Cite

Hoa T Le, Christophe Cerisara, Claire Gardent. RL extraction of syntax-based chunks for sentence compression. ICANN 2019, Sep 2019, Munich, Germany. pp. 337-347. ⟨hal-02323821v2⟩


Collections

CNRS INRIA UNIV-LORRAINE LORIA LORIA-NLPKD LUE-UL IMPACT-OLKI ANR

135 views

188 downloads
