Text Mining Approaches for Semantic Similarity Exploration and Metadata Enrichment of Scientific Digital Libraries

Abstract : For scientists and researchers, it is critical that knowledge remain accessible for re-use and further development. Moreover, the way we store and manage scientific articles and their metadata in digital libraries determines how many relevant articles we can discover and access, depending on what is actually meant by a search query. Yet, are we able to explore all semantically relevant scientific documents with existing keyword-based information retrieval systems? This is the primary question addressed in this thesis. Hence, the main purpose of our work is to broaden the knowledge spectrum of researchers working in an interdisciplinary domain when they use the information retrieval systems of multidisciplinary digital libraries. However, a problem arises when such researchers use community-dependent search keywords while other research communities use different scientific names for the relevant concepts. Towards proposing a solution to this semantic exploration task in multidisciplinary digital libraries, we applied several text mining approaches. First, we studied the semantic representation of words, sentences, paragraphs and documents for better semantic similarity estimation. In addition, we utilized the semantic information of words in lexical databases and knowledge graphs in order to enhance our semantic approach. Furthermore, the thesis presents a couple of use-case implementations of our proposed model. Finally, several experimental evaluations were conducted to validate the efficiency of the proposed approach. The results of the hybrid approach, based on the short text semantic representation and the word semantic information extracted from lexical databases, were very encouraging.
We believe that our newly proposed approaches based on text mining techniques achieved, in practice, the expected results in addressing the limitations of semantic exploration in the classical information retrieval systems of digital libraries. An advantage of our approach is that it is applicable to large-scale multidisciplinary digital libraries. Specifically, we use information found in the metadata of such libraries to enrich that metadata with additional semantic tags. As a consequence, the enriched metadata enables researchers to retrieve semantically relevant documents that would otherwise have remained unexplored. We think that our study and proposed approaches will provide practical solutions to knowledge access and contribute to the research communities and fields of text mining and data management in digital libraries.

Cited literature [186 references]

Contributor : Hussein T. Al-Natsheh
Submitted on : Tuesday, March 12, 2019 - 3:16:44 PM
Last modification on : Wednesday, April 3, 2019 - 1:12:51 AM
Long-term archiving on : Thursday, June 13, 2019 - 3:23:25 PM


Files produced by the author(s)


  • HAL Id : tel-02065269, version 1



Hussein Al-Natsheh. Text Mining Approaches for Semantic Similarity Exploration and Metadata Enrichment of Scientific Digital Libraries. Information Retrieval [cs.IR]. Lyon 2, 2019. English. ⟨tel-02065269⟩


