Mining user-generated comments - HAL open archive
Conference paper, Year: 2015

Mining user-generated comments

Julien Subercaze
Christophe Gravier
Frederique Laforest

Abstract

Social-media websites, such as newspapers, blogs, and forums, are the main places where user-generated comments are produced and exchanged. These comments are viable sources for opinion mining, descriptive annotations, and information extraction. User-generated comments are formatted using an HTML template and are therefore entwined with the other information in the HTML document. Their unsupervised extraction is thus a taxing issue, even more so when considering the extraction of nested answers by different users. This paper presents a novel technique (CommentsMiner) for unsupervised extraction of user comments. Our approach uses both the theoretical framework of frequent subtree mining and data extraction techniques. We demonstrate that the comment mining task can be modelled as a constrained closed induced subtree mining problem followed by a learning-to-rank problem. Our experimental evaluations show that CommentsMiner solves the plain comments and nested comments extraction problems for 84% of a representative and accessible dataset, while outperforming existing baseline techniques.
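
The abstract frames comment extraction as a frequent subtree mining problem followed by ranking. As a rough illustration of why repeated DOM structure is a useful signal, the sketch below counts how often each complete subtree shape occurs in a page and reports the shapes that repeat. It is not the CommentsMiner algorithm: the paper mines constrained closed induced subtrees and ranks candidates with a learned model, whereas this sketch only counts exact, complete subtree shapes and uses no ranking. The names `TreeBuilder`, `signature`, `repeated_shapes`, the thresholds, and the sample page are all assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree; each node is (tag, [children])."""

    def __init__(self):
        super().__init__()
        self.root = ("root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, if any; tolerate sloppy HTML.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def signature(node, counts):
    """Returns a string describing the subtree shape and counts it."""
    tag, children = node
    sig = tag + "(" + ",".join(signature(c, counts) for c in children) + ")"
    counts[sig] += 1
    return sig


def repeated_shapes(html, min_support=2, min_nodes=3):
    """Lists (shape, occurrences) for shapes repeated at least min_support times."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    counts = Counter()
    signature(builder.root, counts)
    # Each node contributes exactly one "(" to its signature.
    result = [(sig, n) for sig, n in counts.items()
              if n >= min_support and sig.count("(") >= min_nodes]
    result.sort(key=lambda item: (-item[1], -len(item[0])))
    return result


# Hypothetical page: one navigation block and three similarly shaped comments.
page = """
<html><body>
<div class="nav"><a>Home</a></div>
<ul>
  <li><span>alice</span><p>First!</p></li>
  <li><span>bob</span><p>Nice post.</p></li>
  <li><span>carol</span><p>Thanks.</p></li>
</ul>
</body></html>
"""
shapes = repeated_shapes(page)
print(shapes)
```

On this sample, the only shape that repeats with at least three nodes is the `li` element holding a `span` and a `p`, which is the comment template. Real pages contain other repeated structures (menus, link lists), which is why a ranking step is needed to pick the comment candidates among the frequent patterns.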
Main file
wi2015.pdf (580.76 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01187082, version 1 (28-08-2015)

Identifiers

  • HAL Id: hal-01187082, version 1

Cite

Julien Subercaze, Christophe Gravier, Frederique Laforest. Mining user-generated comments. IEEE/WIC/ACM International Conference on Web Intelligence, Dec 2015, Singapore, Singapore. ⟨hal-01187082⟩
165 views
478 downloads