HMM-based passage models for document classification and ranking

Abstract : We present an application of Hidden Markov Models to supervised document classification and ranking. We consider a family of models that take into account the fact that relevant documents may contain irrelevant passages; the originality of the model is that it does not explicitly segment documents but instead considers all possible segmentations in its final score. The model generalizes the multinomial Naive Bayes model and is derived from a more general model for different access tasks. It is evaluated on the REUTERS test collection and compared to the multinomial Naive Bayes model. It is shown to be more robust with respect to training set size and to improve performance for both ranking and classification, especially for classes with few training examples.
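
The idea of scoring a document over all possible segmentations, rather than one explicit segmentation, can be illustrated with the HMM forward algorithm. The sketch below is not the authors' implementation: it assumes a hypothetical two-state HMM (a "rel" state for relevant passages and an "irr" state for irrelevant ones), each emitting words from a unigram distribution, and all names and probabilities are invented for illustration. Summing over every state sequence is what accounts for every way of splitting the document into relevant and irrelevant passages.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def passage_log_likelihood(words, start, trans, emit):
    """Forward algorithm: log P(words), summed over all state sequences.

    Assumes `words` is non-empty and every word has an entry in `emit`.
    """
    states = list(start)
    alpha = {s: safe_log(start[s]) + safe_log(emit[s][words[0]]) for s in states}
    for w in words[1:]:
        alpha = {
            s: log_sum_exp([alpha[r] + safe_log(trans[r][s]) for r in states])
            + safe_log(emit[s][w])
            for s in states
        }
    return log_sum_exp(list(alpha.values()))


def naive_bayes_log_likelihood(words, unigram):
    """Multinomial Naive Bayes log-likelihood of a word sequence."""
    return sum(math.log(unigram[w]) for w in words)


# Hypothetical toy parameters (illustrative values only).
emit = {
    "rel": {"hmm": 0.5, "model": 0.4, "the": 0.1},
    "irr": {"hmm": 0.1, "model": 0.1, "the": 0.8},
}
start = {"rel": 0.5, "irr": 0.5}
trans = {
    "rel": {"rel": 0.9, "irr": 0.1},
    "irr": {"rel": 0.1, "irr": 0.9},
}
doc = ["the", "hmm", "model"]

score = passage_log_likelihood(doc, start, trans, emit)

# Degenerate case: always start and stay in "rel". Only one state
# sequence has non-zero probability, so the score reduces to the
# multinomial Naive Bayes likelihood under the "rel" unigram model.
nb_start = {"rel": 1.0, "irr": 0.0}
nb_trans = {
    "rel": {"rel": 1.0, "irr": 0.0},
    "irr": {"rel": 0.0, "irr": 1.0},
}
nb_like = passage_log_likelihood(doc, nb_start, nb_trans, emit)
```

In this sketch, forcing the chain to remain in the "rel" state recovers the multinomial Naive Bayes score, which is one simple way to see how a passage model of this kind can contain Naive Bayes as a special case. For classification, one such model per class could be compared by score; that usage is likewise an assumption of the sketch.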
Contributor : Ludovic Denoyer
Submitted on : Tuesday, August 30, 2016 - 10:17:36 AM
Last modification on : Thursday, March 21, 2019 - 2:19:21 PM


  • HAL Id : hal-01357599, version 1


Ludovic Denoyer, Hugo Zaragoza, Patrick Gallinari. HMM-based passage models for document classification and ranking. ECIR'01 - 23rd European Colloquium on Information Retrieval Research, Apr 2001, Darmstadt, Germany. pp.126-135. ⟨hal-01357599⟩


