Hyphe, a Curation-Oriented Approach to Web Crawling for the Social Sciences

Abstract: The web is a field of investigation for the social sciences, and platform-based studies have long proven their relevance. However, the generic web is rarely studied in its own right, even though it contains crucial aspects of the embodiment of social actors: personal blogs, institutional websites, hobby-specific media, and so on. We found that some sociologists regard existing web crawlers as “black boxes” unsuitable for research, even though they are willing to study the broad web. In this paper we present Hyphe, a crawler developed with and for social scientists that takes an innovative “curation-oriented” approach. We describe the problems of using web-mining techniques in social science research and how to overcome them with specific features such as step-by-step corpus building and a memory structure that lets researchers dynamically redefine the granularity of their “web entities”.
Keywords: crawler, web mining
Document type:
Conference paper
International AAAI Conference on Web and Social Media, May 2016, Köln, Germany. Association for the Advancement of Artificial Intelligence, 2016


https://hal.archives-ouvertes.fr/hal-01293078
Contributor: Spire Sciences Po Institutional Repository
Submitted on: Thursday, 24 March 2016 - 11:18:25
Last modified on: Friday, 25 March 2016 - 01:06:00
Document(s) archived on: Saturday, 25 June 2016 - 14:10:37

File

jacomy-all-hyphe-icwsm-2016.pd...
Files produced by the author(s)


Citation

Mathieu Jacomy, Paul Girard, Benjamin Ooghe, Tommaso Venturini. Hyphe, a Curation-Oriented Approach to Web Crawling for the Social Sciences. International AAAI Conference on Web and Social Media, May 2016, Köln, Germany. Association for the Advancement of Artificial Intelligence, 2016. <hal-01293078v1>


Metrics

Record views: 348

Document downloads: 250