Conference papers

GOOFRE version 2

Etienne Brunet 1 Laurent Vanni 1
1 BCL, équipe Logométrie : corpus, traitements, modèles
BCL - Bases, Corpus, Langage (UMR 7320 - UNS / CNRS)
Abstract : The amount of data in Google Books has doubled over the last two years and now exceeds 500 billion words. The data have been reprocessed, including a re-examination of the scanned images that yields more accurate text recognition. In addition, for the first time, the texts have been disambiguated and lemmatised. Finally, the Culturomics website provides tools that make the data easier to access. It therefore seemed worthwhile to develop new expertise and to create a new database, complete with all the necessary statistical tools, available online or locally, for exploiting such large corpora.

https://hal.archives-ouvertes.fr/hal-01196595
Contributor : Laurent Vanni
Submitted on : Wednesday, September 16, 2015 - 9:57:56 AM
Last modification on : Tuesday, May 26, 2020 - 6:50:57 PM
Document(s) archived on : Monday, December 28, 2015 - 11:43:08 PM

File

jadt2014-paper-62.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01196595, version 1

Citation

Etienne Brunet, Laurent Vanni. GOOFRE version 2. JADT 2014, Jun 2014, Paris, France. pp.106-119. ⟨hal-01196595⟩

Metrics

Record views

400

File downloads

292