Web Content Classification with Topic and Sentiment Analysis - HAL open archive
Conference paper Year: 2014

Web Content Classification with Topic and Sentiment Analysis

Shuhua Liu
  • Role: Author
  • PersonId : 957461
Thomas Forss
  • Role: Author
Kaj-Mikael Bjork
  • Role: Author

Abstract

Automatic classification of web content has been studied extensively, using different learning methods and tools, investigating different datasets to serve different purposes. Most of the studies have made use of content and structural features of web pages. In this study we present a new approach for automatically classifying web pages into pre-defined topic categories. We apply text summarization and sentiment analysis techniques to extract topic and sentiment indicators of web pages. We then build classifiers based on the extracted topic and sentiment features. Our results offer valuable insights and inputs to the development of web detection systems.
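The pipeline described in the abstract (extract topic and sentiment indicators from each page, then classify on those features) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy sentiment lexicon, the stopword list, the use of the most frequent content words as a stand-in for summarization-based topic extraction, the nearest-centroid classifier, and the category labels `hate` and `cooking` are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicons; a real system would use a full sentiment
# lexicon or a trained sentiment model.
POSITIVE = {"good", "great", "love", "happy", "safe"}
NEGATIVE = {"hate", "kill", "bad", "attack", "awful"}
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "we", "i",
             "this", "for", "all", "with", "on", "are"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def extract_features(text, top_k=5):
    """Build a sparse feature dict of topic and sentiment indicators."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    counts = Counter(tokens)
    # Topic indicators: the top_k most frequent content words
    # (an assumed stand-in for summarization-based topic extraction).
    features = {f"topic:{w}": float(c) for w, c in counts.most_common(top_k)}
    # Sentiment indicators: share of tokens found in each lexicon.
    n = max(len(tokens), 1)
    features["sent:pos"] = sum(counts[w] for w in POSITIVE) / n
    features["sent:neg"] = sum(counts[w] for w in NEGATIVE) / n
    return features


def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labelled_pages):
    """Average the feature vectors of each category into one centroid."""
    sums, sizes = {}, Counter()
    for text, label in labelled_pages:
        acc = sums.setdefault(label, Counter())
        for key, value in extract_features(text).items():
            acc[key] += value
        sizes[label] += 1
    return {label: {k: v / sizes[label] for k, v in acc.items()}
            for label, acc in sums.items()}


def classify(text, centroids):
    """Assign the category whose centroid is most similar to the page."""
    feats = extract_features(text)
    return max(centroids, key=lambda label: cosine(feats, centroids[label]))


# Hypothetical labelled pages, invented for this example.
TRAIN = [
    ("We hate them and will attack, hate and violence everywhere", "hate"),
    ("They are awful, attack them, we hate this group", "hate"),
    ("Great recipes for pasta, love cooking good food", "cooking"),
    ("Cooking pasta with good sauce is a happy evening", "cooking"),
]
centroids = train_centroids(TRAIN)
print(classify("I love cooking pasta", centroids))  # prints: cooking
```

Keeping the topic and sentiment features in one sparse dictionary makes it easy to compare classifiers that use only topic features, only sentiment features, or both, by filtering keys on their `topic:` or `sent:` prefix.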
Main file
TDE_Berlin_2014_final_version.pdf (243.48 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01005881 , version 1 (13-06-2014)

Identifiers

  • HAL Id : hal-01005881 , version 1

Cite

Shuhua Liu, Thomas Forss, Kaj-Mikael Bjork. Web Content Classification with Topic and Sentiment Analysis. Terminology and Knowledge Engineering 2014, Jun 2014, Berlin, Germany. 9 p. ⟨hal-01005881⟩

Collections

TKE2014
125 views
1211 downloads