BERT-Proof Syntactic Structures: Investigating Errors in Discontinuous Constituency Parsing

Abstract: The combined use of neural scoring systems and BERT fine-tuning has led to very high results in many natural language processing (NLP) tasks. These high results raise two important questions about the contribution and the limitations of pretrained language models: (i) what are the remaining errors in the best-performing systems? (ii) what are the types of test examples where pretrained language models help the most? In this paper, we investigate both questions for the task of English discontinuous constituency parsing on the Penn Treebank, for which recent models obtain an F1 score close to 95. To do so, we propose two methods for automatically analysing the errors of a discontinuous parser. First, we annotate and release a test suite focused on the syntactic phenomena responsible for discontinuities in the Penn Treebank, enabling us to obtain a per-phenomenon evaluation of a parser's output. Second, we extend the Berkeley Parser Analyser, a tool that classifies parsing errors according to predefined structural patterns, to discontinuous trees. We apply both methods to characterize the errors of a state-of-the-art transition-based discontinuous parser, and to provide an overview of the contribution of BERT to this task.
Document type: Conference papers

https://hal.archives-ouvertes.fr/hal-03339847
Contributor: Maximin Coavoux
Submitted on: Friday, September 10, 2021 - 12:42:12 PM
Last modification on: Tuesday, October 19, 2021 - 11:19:14 AM
Long-term archiving on: Saturday, December 11, 2021 - 6:05:36 PM

File

2021.findings-acl.288.pdf
Publisher files allowed on an open archive

Citation

Maximin Coavoux. BERT-Proof Syntactic Structures: Investigating Errors in Discontinuous Constituency Parsing. Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), Association for Computational Linguistics, Aug 2021, Online, France. pp.3259-3272, ⟨10.18653/v1/2021.findings-acl.288⟩. ⟨hal-03339847⟩
