Conference papers

Unleash the Potential of your Website! 180,000 webpages from the French Natural History Museum marked up with Bioschemas/Schema.org biodiversity types

Abstract: The challenge of finding, retrieving and making sense of biodiversity data is being tackled by many different approaches. Projects like the Global Biodiversity Information Facility (GBIF) or Encyclopedia of Life (EoL) adopt an integrative approach where they republish, in a uniform manner, records aggregated from multiple data sources. With this centralized, siloed approach, such projects stand as powerful one-stop shops, but tend to reduce the visibility of other data sources that are not (yet) aggregated. At the other end of the spectrum, the Web of Data promotes the building of a global, distributed knowledge graph consisting of datasets, such as Wikidata or DBpedia, published by independent institutions according to the Linked Open Data principles (Heath and Bizer 2011).
Beyond these "sophisticated" infrastructures, websites remain the most common way of publishing and sharing scientific data at low cost. Thanks to web search engines, everyone can discover webpages. Yet the summaries provided in result lists are often too uninformative to decide whether a webpage is relevant to a given research interest, and integrating data published by a wealth of websites is hardly possible.
One way around this issue is to annotate websites with structured, semantic metadata using a vocabulary such as Schema.org (Guha et al. 2015). Webpages typically embed Schema.org annotations in the form of markup data (written in the RDFa or JSON-LD formats), which search engines harvest and exploit to improve ranking and provide more informative summaries. Bioschemas is a community effort working to extend Schema.org to support markup for Life Sciences websites (Michel and The Bioschemas Community 2018, Garcia et al. 2017). Bioschemas primarily re-uses existing terms from Schema.org, occasionally re-uses terms from third-party vocabularies, and when necessary proposes new terms to be endorsed by Schema.org.
As of today, Bioschemas's biodiversity group has proposed the Taxon type to support the annotation of any webpage denoting taxa, the TaxonName type to support more specifically the annotation of taxonomic name registries, and guidelines describing how to leverage existing vocabularies such as Darwin Core terms. To proceed further, the biodiversity community must now demonstrate its interest in having these terms endorsed by Schema.org: (1) through a critical mass of live markup deployments, and (2) through the development of applications capable of exploiting this markup data.
Therefore, as a first step, the French National Museum of Natural History has marked up its natural heritage inventory website: over 180,000 webpages describing the species inventoried in French territories have been annotated with the Taxon and TaxonName types in the form of JSON-LD scripts (see example scripts). As an example, one can check the source of the Delphinus delphis page.
In this presentation, by demonstrating that marking up existing webpages can be very inexpensive, we wish to encourage the biodiversity community to adopt this practice, engage in the discussion about biodiversity-related markup, and possibly propose new terms related, e.g., to traits or collections. We believe that generalizing the use of such markup across the many websites reporting checklists, museum collections, occurrences, life traits, etc. will be a major step towards the widespread adoption of the FAIR (Findable, Accessible, Interoperable, Reusable) principles (Wilkinson et al. 2016), will dramatically improve information discovery using search engines, and will be a key accelerator for the development of novel, web-scale biodiversity data integration scenarios.
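To make the markup practice described above more concrete, here is a minimal Python sketch of both sides of the workflow: producing a JSON-LD script for a taxon page, and harvesting such scripts from a page's HTML as a consuming application might. It is an illustration under stated assumptions, not the Museum's actual implementation. The page URL is a hypothetical placeholder, the helper names (`build_taxon_jsonld`, `to_script_tag`, `JsonLdExtractor`) are invented for this example, and the property names `taxonRank` and `parentTaxon` are assumed to match the Taxon proposal; they should be checked against the current Bioschemas profile before use.

```python
import json
from html.parser import HTMLParser


def build_taxon_jsonld(page_url, scientific_name, rank, parent_name):
    """Describe a taxon page as a dict using the (assumed) Taxon properties."""
    return {
        "@context": "https://schema.org/",
        "@type": "Taxon",
        "@id": page_url,
        "name": scientific_name,
        "taxonRank": rank,  # assumed property name
        "parentTaxon": {"@type": "Taxon", "name": parent_name},  # assumed
    }


def to_script_tag(data):
    """Serialize the dict as a JSON-LD <script> element to embed in a page."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every JSON-LD script found in a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hypothetical URL; the real inventory pages use their own identifiers.
markup = build_taxon_jsonld(
    "https://example.org/taxon/delphinus-delphis",
    "Delphinus delphis",
    "species",
    "Delphinus",
)
page = f"<html><head>{to_script_tag(markup)}</head><body></body></html>"

extractor = JsonLdExtractor()
extractor.feed(page)
print(json.dumps(extractor.documents, indent=2))
```

Because the annotation is a self-contained script element, it can be added to an existing page template without touching the visible content, which is one reason such markup can be inexpensive to deploy.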

https://hal.archives-ouvertes.fr/hal-02976710
Contributor: Franck Michel
Submitted on: Friday, October 23, 2020 - 3:24:04 PM
Last modification on: Monday, February 22, 2021 - 4:55:18 PM

Citation

Franck Michel, Olivier Gargominy, Benjamin Ledentec, Bioschemas Community. Unleash the Potential of your Website! 180,000 webpages from the French Natural History Museum marked up with Bioschemas/Schema.org biodiversity types. TDWG 2020 annual conference, Oct 2020, Virtual, France. ⟨10.3897/biss.4.59046⟩. ⟨hal-02976710⟩
