Conference papers

Evaluation of text clustering methods and their dataspace embeddings: an exploration

Abstract : Fair evaluation of text clustering methods requires clarifying the relations between 1) pre-processing, which results in raw term occurrence vectors, 2) data transformation, and 3) the method in the strict sense. We attempted to empirically compare a dozen well-known methods and variants in a protocol crossing three contrasting open-access corpora with a few tens of transformed dataspaces. We compared the resulting clusterings to their supposed "ground-truth" classes by means of four commonly used indices. The results both confirm well-established implicit combinations and show good performance from unexpected combinations, mostly in spectral or kernel dataspaces. The rich material resulting from these roughly 450 runs includes a wealth of intriguing facts, which call for further research on the specificities of text corpora in relation to methods and dataspaces.
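The crossing of dataspaces and methods described in the abstract can be sketched as a small grid experiment. What follows is a minimal illustration, not the authors' code: the toy corpus, the two dataspaces (raw counts and L2-normalised TF-IDF), the deterministic k-means variant, and the adjusted Rand index are all assumptions chosen for brevity, since the abstract does not name the specific transformations, methods, or indices used.

```python
from collections import Counter
from itertools import product
from math import comb, log, sqrt

# Hypothetical toy corpus and "ground-truth" classes (not from the paper).
DOCS = ["cat dog cat pet", "dog pet pet cat",
        "stock market stock trade", "trade market bank stock"]
TRUTH = [0, 0, 1, 1]

def term_counts(docs):
    """Step 1: pre-processing into raw term occurrence vectors."""
    vocab = sorted({w for d in docs for w in d.split()})
    rows = []
    for d in docs:
        counts = Counter(d.split())
        rows.append([counts[w] for w in vocab])
    return rows

def raw(X):
    """Step 2, dataspace A: leave the counts untransformed."""
    return [[float(v) for v in row] for row in X]

def tfidf_l2(X):
    """Step 2, dataspace B: smoothed TF-IDF followed by L2 normalisation."""
    n = len(X)
    df = [sum(1 for row in X if row[j] > 0) for j in range(len(X[0]))]
    idf = [log((1 + n) / (1 + d)) + 1.0 for d in df]
    out = []
    for row in X:
        w = [c * i for c, i in zip(row, idf)]
        norm = sqrt(sum(v * v for v in w)) or 1.0
        out.append([v / norm for v in w])
    return out

def dist2(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kmeans(X, k, iters=10):
    """Step 3: k-means with deterministic farthest-first seeding."""
    centers = [X[0]]
    while len(centers) < k:
        centers.append(max(X, key=lambda x: min(dist2(x, c) for c in centers)))
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(x, centers[c])) for x in X]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def adjusted_rand(a, b):
    """Adjusted Rand index between two labelings of the same items."""
    sum_ij = sum(comb(n, 2) for n in Counter(zip(a, b)).values())
    sum_a = sum(comb(n, 2) for n in Counter(a).values())
    sum_b = sum(comb(n, 2) for n in Counter(b).values())
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

DATASPACES = {"raw": raw, "tfidf_l2": tfidf_l2}
METHODS = {"kmeans": kmeans}  # further methods would be registered here

def run_grid(docs, truth, k):
    """Cross every dataspace with every method and score against the truth."""
    counts = term_counts(docs)
    scores = {}
    for (sname, space), (mname, method) in product(DATASPACES.items(), METHODS.items()):
        labels = method(space(counts), k)
        scores[(sname, mname)] = adjusted_rand(truth, labels)
    return scores

if __name__ == "__main__":
    for combo, score in run_grid(DOCS, TRUTH, k=2).items():
        print(combo, round(score, 3))
```

Registering transformations and methods in separate dictionaries keeps the three stages (pre-processing, transformation, method) independent, so that each score can be attributed to a specific dataspace and method pair.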
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-02116493
Contributor : Martine Cadot
Submitted on : Thursday, January 30, 2020 - 12:49:08 AM
Last modification on : Tuesday, October 27, 2020 - 2:34:30 PM

Identifiers

  • HAL Id : hal-02116493, version 4

Citation

Alain Lelu, Martine Cadot. Evaluation of text clustering methods and their dataspace embeddings: an exploration. IFCS 2019 - 16th Conference of the International Federation of Classification Societies, Aug 2019, Thessaloniki, Greece. ⟨hal-02116493v4⟩
