Abstract: This paper describes a representation of XML documents designed for classifying them. Document classification depends on the document representation technique: the more relevant the representation, the more relevant the classification. We propose a representation model that exploits both the structure and the content of a document. Our approach is based on the vector space model: a document is represented by a vector of weighted features, where each feature is a (tag, term) pair. We extend tf*idf to compute a feature's weight according to the term's structural level in the document. An SVM is used as the learning algorithm. Experiments on the Reuters collection show that our proposal improves classification performance compared with the standard classification model based on term vectors.
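The (tag, term) representation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the depth discount used here (each occurrence counts 1/depth, with the root at depth 1) is purely an assumption for illustration, as are the function names `structural_tf` and `tfidf_vectors` and the two toy documents.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def structural_tf(xml_text):
    """Depth-discounted term frequency for each (tag, term) feature.

    Assumption (not from the paper): an occurrence at depth d counts 1/d,
    so terms nearer the root weigh more. Tail text is ignored for brevity.
    """
    root = ET.fromstring(xml_text)
    weights = Counter()
    stack = [(root, 1)]  # (element, depth); the root element has depth 1
    while stack:
        node, depth = stack.pop()
        for term in (node.text or "").lower().split():
            weights[(node.tag, term)] += 1.0 / depth
        for child in node:
            stack.append((child, depth + 1))
    return weights


def tfidf_vectors(docs):
    """Turn XML strings into sparse {(tag, term): weight} vectors."""
    tfs = [structural_tf(d) for d in docs]
    df = Counter()  # number of documents containing each feature
    for tf in tfs:
        df.update(tf.keys())
    n = len(docs)
    return [{f: w * math.log(n / df[f]) for f, w in tf.items()} for tf in tfs]


# Hypothetical toy documents, not taken from the Reuters collection.
docs = [
    "<doc><title>wheat prices</title><body><p>wheat exports rise</p></body></doc>",
    "<doc><title>oil prices</title><body><p>oil exports fall</p></body></doc>",
]
vectors = tfidf_vectors(docs)
```

In this sketch the same term yields distinct features depending on its tag, so `("title", "wheat")` and `("p", "wheat")` are weighted separately, and a feature shared by every document gets an idf of zero. The resulting sparse vectors could then be passed to any SVM implementation for training.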
https://hal.archives-ouvertes.fr/hal-00585914
Contributor: Import Ws Irstea
Submitted on: Thursday, April 14, 2011 - 11:00:46 AM
Last modification on: Monday, June 27, 2022 - 11:32:50 AM
Long-term archiving on: Friday, July 15, 2011 - 2:40:06 AM
Samaneh Chagheri, Catherine Roussey, Sylvie Calabretto, Cyril Dumoulin. XML Document Classification using SVM. SFC'2010 (Société Francophone de Classification), Jun 2010, Saint Denis de la Réunion, France. pp.71-74. ⟨hal-00585914⟩