UJM at INEX 2009 XML Mining Track
Abstract
This paper reports on our experiments for the INEX 2009 XML Mining track, which consists of developing categorization methods for multi-labeled XML documents. We represent XML documents as vectors of indexed terms. The purpose of our experiments is twofold. First, we aim to compare strategies that reduce the index size using an improved feature selection criterion, CCD. Second, we compare a thresholding strategy that we proposed (MCut) with the common RCut and PCut strategies. The index size was reduced in such a way that the results were worse than expected. However, we obtained good improvements with the MCut thresholding strategy.
Domains
Document and Text Processing
Origin: Files produced by the author(s)