Modeling Noun-Phrases Dynamics in Specialized Text Collections

Abstract : Biology has entered a new era, with new information-processing frameworks and high-throughput experiments. This has led to a high publication rate and to the emergence of large, accessible databases in English, making it possible to build text collections for any specialized domain. Systematic analysis of language properties helps in processing such text data, and it benefits from a description of the underlying distributions. In this article, because scientific publications are time-stamped, we first analyse the distribution profiles of noun phrases (i.e. “content-words”) over time. This time-dependency analysis, which takes into account the sequential occurrence of features, reveals specific behaviour. Single content-word distributions appear to be linearly shaped. We also observed that associations of content-words are distributed differently over time, namely as a mixed beta distribution.
Document type : Journal article
Contributor : Archive Ouverte Prodinra
Submitted on : Friday, March 1, 2019 - 8:07:11 PM
Last modification on : Saturday, March 2, 2019 - 9:19:53 AM




Nicolas Turenne. Modeling Noun-Phrases Dynamics in Specialized Text Collections. Journal of Quantitative Linguistics, Taylor & Francis (Routledge), 2010, 17 (3), pp.212-228. ⟨10.1080/09296174.2010.485447⟩. ⟨hal-02054488⟩


