Conference proceedings, Year: 2016

Hyphe, a Curation-Oriented Approach to Web Crawling for the Social Sciences

Mathieu Jacomy
Paul Girard
Benjamin Ooghe
Tommaso Venturini

Abstract

The web is a field of investigation for the social sciences, and platform-based studies have long proven their relevance. However, the generic web is rarely studied in its own right, although it contains crucial aspects of the embodiment of social actors: personal blogs, institutional websites, hobby-specific media… We realized that some sociologists see existing web crawlers as “black boxes” unsuitable for research, even though they are willing to study the broad web. In this paper we present Hyphe, a crawler developed with and for social scientists, with an innovative “curation-oriented” approach. We describe the problems of using web-mining techniques in social science research and how to overcome them through specific features such as step-by-step corpus building and a memory structure that allows researchers to dynamically redefine the granularity of their “web entities”.
Main file
jacomy-all-hyphe-icwsm-2016.pdf (540.39 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01293078, version 1 (24-03-2016)
hal-01293078, version 2 (28-05-2021)

License

Attribution

Cite

Mathieu Jacomy, Paul Girard, Benjamin Ooghe, Tommaso Venturini. Hyphe, a Curation-Oriented Approach to Web Crawling for the Social Sciences. Association for the Advancement of Artificial Intelligence, 2016. ⟨hal-01293078v2⟩
839 views
487 downloads
