Conference papers

Unsupervised Tree Extraction in Embedding Spaces for Taxonomy Induction

Abstract: Exposing the latent structure (graph, tree, etc.) of data is a major challenge in dealing with the web of data. Today's embedding techniques incorporate any data source (noisy graphs, item similarities, plain text) into continuous vector spaces that are typically used as input to classifiers. In this work, we address the opposite task: finding structures (taxonomies) from embedded data. We provide an original unsupervised methodology for taxonomy induction by directly searching for graph structures that preserve pairwise distances between items. Unlike state-of-the-art (SOTA) methods, our approach does not require training classifiers; it is also more versatile, as it can be applied to any embedding (e.g., word embeddings or similarity embeddings such as space-time local embedding). On standard benchmarks and metrics, our approach yields SOTA performance. As another contribution, we propose better evaluation metrics for taxonomy induction, leveraging graph kernel similarities and edit distance, and show that the structures of our predicted taxonomies are significantly closer to the ground truth than those of SOTA solutions.
Contributor: Vincent Claveau
Submitted on: Tuesday, December 21, 2021 - 2:08:58 PM
Last modification on: Friday, August 5, 2022 - 2:54:52 PM
Long-term archiving on: Tuesday, March 22, 2022 - 6:35:39 PM





François Torregrossa, Robin Allesiardo, Vincent Claveau, Guillaume Gravier. Unsupervised Tree Extraction in Embedding Spaces for Taxonomy Induction. WI-IAT 2021 - 20th IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Dec 2021, Melbourne, Australia. pp.1-8, ⟨10.1145/3486622.3493941⟩. ⟨hal-03494697⟩


