
Real-time unsupervised classification of web documents

Abstract : This paper addresses the problem of clustering dynamic collections of web documents. We present an iterative algorithm based on fine-grained keyword extraction (simple words, compound words, and proper nouns). Each new document inserted into the collection is assigned either to an existing class containing documents on the same topic or to a new class. After each step, classes are refined using statistical techniques when necessary. The implementation of this algorithm has been successfully integrated into an application used for Information Intelligence.
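
The assignment step described in the abstract (attach each incoming document to an existing class or open a new one) can be illustrated with a minimal single-pass sketch. This is not the authors' algorithm: the abstract does not specify the similarity measure, the decision rule, or the statistical refinement, so the cosine similarity, the `threshold` value, and the names `IncrementalClusterer` and `add` are all assumptions made for illustration. Keyword extraction is assumed to have been done already, and the refinement step is omitted.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two keyword-frequency Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class IncrementalClusterer:
    """Hypothetical single-pass clusterer; not the method from the paper."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed cut-off, not taken from the paper
        self.centroids = []         # one summed keyword Counter per class
        self.members = []           # document ids per class

    def add(self, doc_id, keywords):
        """Assign a document to the most similar class, or open a new one."""
        vec = Counter(keywords)
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            # Close enough to an existing class: merge keywords into its centroid.
            self.centroids[best].update(vec)
            self.members[best].append(doc_id)
            return best
        # No sufficiently similar class: the document starts a new one.
        self.centroids.append(vec)
        self.members.append([doc_id])
        return len(self.centroids) - 1


clusterer = IncrementalClusterer(threshold=0.3)
clusterer.add("d1", ["football", "world cup", "Zidane"])
clusterer.add("d2", ["football", "Zidane", "transfer"])
clusterer.add("d3", ["election", "parliament"])
print(clusterer.members)
```

Documents are processed one at a time and never revisited, which is what makes a scheme of this kind usable on a collection that keeps growing; a refinement pass such as the one the abstract mentions would be needed to correct early assignments.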
Document type :
Conference papers
Contributor : Anthony Sigogne
Submitted on : Friday, August 3, 2012 - 6:40:39 PM
Last modification on : Thursday, September 29, 2022 - 2:21:15 PM
Long-term archiving on : Monday, November 5, 2012 - 11:00:16 AM





Anthony Sigogne, Mathieu Constant. Real-time unsupervised classification of web documents. 4th International Multiconference on Computer Science and Information Technology (IMCSIT'09), Oct 2009, Mragowo, Poland. pp.281-286, ⟨10.1109/IMCSIT.2009.5352714⟩. ⟨hal-00722749⟩


