Web Content Classification with Topic and Sentiment Analysis
Abstract
Automatic classification of web content has been studied extensively, with a range of learning methods and tools applied to different datasets for different purposes. Most of these studies rely on the content and structural features of web pages. In this study, we present a new approach for automatically classifying web pages into predefined topic categories. We apply text summarization and sentiment analysis techniques to extract topic and sentiment indicators from web pages, and then build classifiers on the extracted topic and sentiment features. Our results offer insights and inputs for the development of web detection systems.
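The pipeline described in the abstract (extract topic and sentiment indicators, then classify on them) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: frequent-term extraction stands in for text summarization, a tiny hand-made word list stands in for a sentiment lexicon, a nearest-centroid rule stands in for the classifiers, and all function names and category labels are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy resources; the study's actual lexicons are not given here.
POSITIVE = {"good", "great", "love", "win", "happy"}
NEGATIVE = {"bad", "hate", "kill", "attack", "fear"}
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is",
             "we", "our", "you", "this", "for", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def topic_terms(text, k=3):
    """Stand-in for summarization: the k most frequent non-stopword terms."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


def sentiment_score(text):
    """Lexicon polarity in [-1, 1]; 0.0 when no lexicon word occurs."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def features(text):
    """Combine topic indicators and the sentiment score into one sparse vector."""
    feats = {f"topic={word}": 1.0 for word in topic_terms(text)}
    feats["sentiment"] = sentiment_score(text)
    return feats


def train_centroids(labeled_pages):
    """Average the feature vectors of each category (nearest-centroid training)."""
    sums, counts = {}, Counter()
    for text, label in labeled_pages:
        counts[label] += 1
        acc = sums.setdefault(label, Counter())
        for name, value in features(text).items():
            acc[name] += value
    return {label: {name: total / counts[label] for name, total in acc.items()}
            for label, acc in sums.items()}


def classify(text, centroids):
    """Return the category whose centroid is closest in squared Euclidean distance."""
    feats = features(text)

    def distance(centroid):
        keys = set(feats) | set(centroid)
        return sum((feats.get(k, 0.0) - centroid.get(k, 0.0)) ** 2 for k in keys)

    return min(centroids, key=lambda label: distance(centroids[label]))
```

In this sketch, a call such as `classify(page_text, train_centroids(labeled_pages))` assigns a page to one of the categories seen in training. A real system would likely replace each stand-in with a stronger component, for example a proper summarizer and a full sentiment lexicon.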
Domains
Computation and Language [cs.CL]
Origin: Files produced by the author(s)