Mining user-generated comments

Julien Subercaze 1 Christophe Gravier 1 Frederique Laforest 1
1 Laboratoire Hubert Curien / Eris
LHC - Laboratoire Hubert Curien [Saint Etienne]
Abstract : Social-media websites, such as newspapers, blogs, and forums, are the main places where user-generated comments are produced and exchanged. These comments are viable sources for opinion mining, descriptive annotations, and information extraction. User-generated comments are formatted using an HTML template and are therefore entwined with the other information in the HTML document. Their unsupervised extraction is thus a taxing issue, even more so when considering the extraction of nested answers by different users. This paper presents a novel technique (CommentsMiner) for the unsupervised extraction of user comments. Our approach uses both the theoretical framework of frequent subtree mining and data extraction techniques. We demonstrate that the comment mining task can be modelled as a constrained closed induced subtree mining problem followed by a learning-to-rank problem. Our experimental evaluations show that CommentsMiner solves the plain comments and nested comments extraction problems for 84% of a representative and accessible dataset, while outperforming existing baseline techniques.
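
The intuition behind treating comment extraction as a subtree mining problem can be illustrated with a small sketch. It is not the CommentsMiner algorithm: it only counts identical complete subtrees in the DOM, a much simpler stand-in for constrained closed induced subtree mining, and it omits the learning-to-rank step entirely. The names Node, TreeBuilder, signature, most_repeated_subtree, and SAMPLE_PAGE are hypothetical and chosen for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a bare tag tree (no text, no attributes) from HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating sloppy markup.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def signature(node, counts):
    """Return a string encoding the subtree shape; count non-leaf shapes."""
    inner = ",".join(signature(child, counts) for child in node.children)
    sig = f"{node.tag}({inner})"
    if node.children:
        counts[sig] += 1
    return sig


def most_repeated_subtree(html):
    """Return (shape, occurrences) of the most repeated shape, or None."""
    builder = TreeBuilder()
    builder.feed(html)
    counts = Counter()
    signature(builder.root, counts)
    repeated = {s: n for s, n in counts.items() if n > 1}
    if not repeated:
        return None
    # Prefer the most frequent shape, then the larger one.
    return max(repeated.items(), key=lambda kv: (kv[1], len(kv[0])))


# Assumed toy page: three comments rendered from the same template.
SAMPLE_PAGE = """
<html><body><h1>Article</h1>
<div><span>alice</span><p>First!</p></div>
<div><span>bob</span><p>Nice post.</p></div>
<div><span>carol</span><p>Thanks.</p></div>
</body></html>
"""

if __name__ == "__main__":
    print(most_repeated_subtree(SAMPLE_PAGE))
```

On the toy page, the repeated shape corresponds to the comment template. A comment that contains nested replies has a different complete-subtree signature, so this exact-match shortcut would miss it; induced subtree mining does not require every descendant to match, which makes it better suited to the nested case described in the abstract.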
Contributor : Julien Subercaze
Submitted on : Friday, August 28, 2015 - 11:21:58 AM
Last modification on : Thursday, July 26, 2018 - 1:10:15 AM
Long-term archiving on : Sunday, November 29, 2015 - 10:14:41 AM




  • HAL Id : hal-01187082, version 1


Julien Subercaze, Christophe Gravier, Frederique Laforest. Mining user-generated comments. IEEE/WIC/ACM International Conference on Web Intelligence, Dec 2015, Singapour, Singapore. ⟨hal-01187082⟩


