Online refresh strategies for content based feed aggregation

Roxana Horincar (1), Bernd Amann (1), Thierry Artières (2)
(1) BD - Bases de Données, LIP6 - Laboratoire d'Informatique de Paris 6
(2) MLIA - Machine Learning and Information Access, LIP6 - Laboratoire d'Informatique de Paris 6
Abstract: With the rapid growth of data sources, services, and devices connected to the Internet, web content available online is becoming increasingly diverse and dynamic. To disseminate evolving and short-lived information efficiently, many web applications publish new items as RSS and Atom documents, which are then collected and transformed by RSS aggregators such as Feedly or Yahoo! News. This article addresses large-scale aggregation of highly dynamic information sources, focusing on the design of optimal refresh strategies for large collections of RSS feed documents. First, we introduce two quality measures specific to RSS aggregation, which reflect the information completeness and the average freshness of the result feeds. Then, we propose a best-effort feed refresh strategy that achieves the highest aggregation quality among existing policies using the same average number of refreshes. This strategy relies on online change estimation models developed after an in-depth analysis of the temporal publication characteristics of a representative collection of real-world RSS feeds. The presented methods have been implemented and tested on synthetic and real-world RSS feed data sets.
Document type: Journal article
World Wide Web, Springer Verlag, 2015, 18 (4), pp.913-947. 〈10.1007/s11280-014-0288-y〉

https://hal.archives-ouvertes.fr/hal-01185362
Contributor: Lip6 Publications
Submitted on: Thursday, 20 August 2015 - 10:45:01
Last modified on: Monday, 17 December 2018 - 01:21:28

Citation

Roxana Horincar, Bernd Amann, Thierry Artières. Online refresh strategies for content based feed aggregation. World Wide Web, Springer Verlag, 2015, 18 (4), pp.913-947. 〈10.1007/s11280-014-0288-y〉. 〈hal-01185362〉
