Conference papers

Evaluating Hierarchical Clustering Methods for Corpora with Chronological Order

Abstract : Hierarchical clustering is traditionally represented as a dendrogram: a rooted tree whose leaves are documents, where the length of the path between two leaves represents the stylistic/linguistic distance between the corresponding documents. Clusters correspond to branching nodes: the shorter the distance between two nodes, the more stylistic and linguistic features they are expected to share. We ask to what extent the resulting dendrogram is consistent with the chronological order of writing, since such a measure of consistency would give us a way to evaluate the result of the clustering. More precisely, the question we want to answer is: can the branching nodes of the dendrogram be re-ordered so that its leaves follow the chronological order as closely as possible, while preserving the structure of the dendrogram?
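To make the re-ordering question concrete, here is a minimal sketch under stated assumptions, not the authors' method. It assumes a binary dendrogram written as nested tuples, leaves labelled by an integer date such as a year, and "as closely as possible" measured by the number of leaf pairs that appear out of chronological order. The names `Tree` and `reorder` are hypothetical. Under this criterion, the number of out-of-order pairs between the two subtrees of a node depends only on which leaves each subtree contains, so each node can be flipped or kept independently.

```python
from typing import List, Tuple, Union

# Assumption: a leaf is an int date (e.g. a year); an internal node is a pair.
Tree = Union[int, Tuple["Tree", "Tree"]]

def reorder(tree: Tree) -> Tuple[List[int], int]:
    """Return (leaf order, number of out-of-order pairs) after choosing,
    at every branching node, the child order with fewer out-of-order pairs."""
    if isinstance(tree, int):
        return [tree], 0
    left, right = tree
    l_leaves, l_inv = reorder(left)
    r_leaves, r_inv = reorder(right)
    # Out-of-order pairs across the two subtrees if the left child comes first.
    lr = sum(1 for a in l_leaves for b in r_leaves if a > b)
    # Out-of-order pairs across the two subtrees if the right child comes first.
    rl = sum(1 for a in l_leaves for b in r_leaves if b > a)
    if lr <= rl:
        return l_leaves + r_leaves, l_inv + r_inv + lr
    return r_leaves + l_leaves, l_inv + r_inv + rl

# A dendrogram fully consistent with chronology: zero out-of-order pairs remain.
print(reorder(((1850, 1830), (1870, (1890, 1880)))))
# → ([1830, 1850, 1870, 1880, 1890], 0)

# A dendrogram that cannot be made fully chronological by flipping nodes.
print(reorder(((1830, 1870), (1850, 1890))))
# → ([1830, 1870, 1850, 1890], 1)
```

In this sketch, the count of remaining out-of-order pairs plays the role of a consistency score: zero means the tree structure is compatible with the chronological order, and larger values mean more disagreement. Other criteria, or dendrograms with non-binary nodes, would need a different procedure.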
Contributor : Philippe Gambette
Submitted on : Sunday, September 12, 2021 - 9:04:00 PM
Last modification on : Friday, September 16, 2022 - 1:51:34 PM
Long-term archiving on : Monday, December 13, 2021 - 6:53:50 PM


  • HAL Id : hal-03341803, version 1


Philippe Gambette, Olga Seminck, Dominique Legallois, Thierry Poibeau. Evaluating Hierarchical Clustering Methods for Corpora with Chronological Order. EADH2021: Interdisciplinary Perspectives on Data. Second International Conference of the European Association for Digital Humanities, EADH, Sep 2021, Krasnoyarsk, Russia. ⟨hal-03341803⟩


