Using language models to improve opinion detection

Faiza Belbachir; Mohand Boughanem

doi:10.1016/j.ipm.2018.07.001

Journal article, Information Processing and Management, Year: 2018


Faiza Belbachir

Role: Author

Institut Polytechnique des Sciences Avancées

Mohand Boughanem

Role: Author
PersonId: 1188423
IdHAL: mohand-boughanem
ORCID: 0000-0001-7004-0807
IdRef: 069002916

Recherche d’Information et Synthèse d’Information

Université Toulouse III - Paul Sabatier

Abstract

Opinion mining is one of the most important research tasks in the information retrieval community. With the huge volume of opinionated data available on the Web, approaches must be developed to differentiate opinion from fact. In this paper, we present a lexicon-based approach for opinion retrieval. Generally, opinion retrieval consists of two stages: relevance to the query and opinion detection. In our work, we focus on the second stage, which consists of detecting opinionated documents. We compare the document to be analyzed with opinionated sources that contain subjective information. We hypothesize that a document with a strong similarity to opinionated sources is more likely to be opinionated itself. Typical lexicon-based approaches select and preprocess their opinion sources according to their test collection, then calculate the opinion score based on the frequency of subjective terms in the document. In our work, we use different open opinion collections without any specific treatment and consider them as a reference collection. We then use language models to determine opinion scores. The document under analysis and the reference collection are represented by different language models (i.e., Dirichlet, Jelinek-Mercer and two-stage models). These language models are generally used in information retrieval to represent the relationship between documents and queries. However, in our study, we modify these language models to represent opinionated documents. We carry out several experiments using Text REtrieval Conference (TREC) Blogs 06 as our analysis collection and Internet Movie Database (IMDB), Multi-Perspective Question Answering (MPQA) and CHESLY as our reference collection. To improve opinion detection, we study the impact of using different language models to represent the document and reference collection alongside different combinations of opinion and retrieval scores. We then use these results to deduce the best opinion detection models. Using the best models, our approach improves on the best baseline of TREC Blog (baseline4) by 30%.
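The abstract does not state the scoring formulas, so the following is only a minimal sketch of the general idea: a document is scored by how well a smoothed unigram language model of an opinionated reference collection explains its terms. It assumes the textbook forms of Jelinek-Mercer and Dirichlet smoothing, not the authors' modified models, and it leaves out the two-stage model. All function names, the toy texts, and the parameter values `lam` and `mu` are illustrative assumptions.

```python
import math
from collections import Counter


def jelinek_mercer(term, ref_counts, ref_len, bg_counts, bg_len, lam=0.5):
    """P(term | reference), linearly interpolated with the background model."""
    p_ref = ref_counts[term] / ref_len
    p_bg = bg_counts[term] / bg_len
    return (1.0 - lam) * p_ref + lam * p_bg


def dirichlet(term, ref_counts, ref_len, bg_counts, bg_len, mu=2000.0):
    """P(term | reference) with Dirichlet-prior smoothing (pseudo-count mu)."""
    p_bg = bg_counts[term] / bg_len
    return (ref_counts[term] + mu * p_bg) / (ref_len + mu)


def opinion_score(doc_tokens, ref_counts, bg_counts, smooth):
    """Average log-probability of the document's tokens under the smoothed
    reference model; higher means closer to the opinionated sources."""
    ref_len = sum(ref_counts.values())
    bg_len = sum(bg_counts.values())
    log_p = sum(
        math.log(smooth(t, ref_counts, ref_len, bg_counts, bg_len))
        for t in doc_tokens
    )
    return log_p / len(doc_tokens)


# Toy data, invented for illustration only.
reference = "i love this film it is great i hate this awful movie it is terrible".split()
opinionated = "i love this great movie".split()
factual = "the meeting starts at noon".split()

ref_counts = Counter(reference)
# The background covers every token we score, so no probability is zero.
bg_counts = Counter(reference + opinionated + factual)

for name, smooth in [("Jelinek-Mercer", jelinek_mercer), ("Dirichlet", dirichlet)]:
    s_op = opinion_score(opinionated, ref_counts, bg_counts, smooth)
    s_fa = opinion_score(factual, ref_counts, bg_counts, smooth)
    print(f"{name}: opinionated={s_op:.3f} factual={s_fa:.3f}")
```

In this toy setting the document that shares vocabulary with the reference text receives the higher score under both smoothing methods. How such an opinion score would be combined with a retrieval score is not specified in the abstract and is not sketched here.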

Keywords

Information retrieval; Opinion detection; Blog; Language model

Domains

Computation and Language [cs.CL]

Main file

belbachir_22484.pdf (509.75 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-02279437

Submitted on: Thursday, September 5, 2019, 11:50:55

Last modified on: Monday, November 20, 2023, 11:44:21

Long-term archiving on: Thursday, February 6, 2020, 09:39:13

Dates and versions

hal-02279437, version 1 (05-09-2019)

Identifiers

HAL Id: hal-02279437, version 1
DOI: 10.1016/j.ipm.2018.07.001
OATAO: 22484

Cite

Faiza Belbachir, Mohand Boughanem. Using language models to improve opinion detection. Information Processing and Management, 2018, 54 (6), pp.958-968. ⟨10.1016/j.ipm.2018.07.001⟩. ⟨hal-02279437⟩


Collections

UNIV-TLSE2 CNRS SMS UT1-CAPITOLE IRIT IRIT-IRIS IRIT-GD TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP

55 views

204 downloads
