Journal articles

Automatic summarization of scientific publications using a feature selection approach

Abstract : Feature Maximization is a feature selection method that deals efficiently with textual data: it makes it possible to design systems that are language-agnostic, parameter-free, and require no additional corpora to function. We propose to evaluate its use in text summarization, in particular in cases where documents are structured. We first experiment with this approach in a single-document summarization context. We evaluate it on the DUC AQUAINT corpus and show that, despite the unstructured nature of the corpus, our system performs above the baseline and produces encouraging results. We also observe that the produced summaries seem robust to redundancy. Next, we evaluate our method in the more appropriate context of the SciSumm challenge, which is dedicated to the summarization of research publications. These publications are structured in sections, and our class-based approach is thus relevant. More specifically, we focus on the task that aims to summarize papers using those that refer to them. We consider and evaluate several systems using our approach, each dealing with specific bags of words. Furthermore, in these systems, we also evaluate cosine and graph-based distances for sentence weighting and comparison. We show that our Feature Maximization based approach performs very well in the SciSumm 2016 context for the considered task, providing better results than those known so far and obtaining high recall. We thus demonstrate the flexibility and the relevance of Feature Maximization in this context.
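To make the sentence comparison step mentioned in the abstract more concrete, the following is a minimal sketch of cosine similarity between bag-of-words sentence vectors. It is a generic illustration and not the authors' implementation: the whitespace tokenizer and the function names `bag_of_words`, `cosine_similarity`, and `rank_sentences` are assumptions introduced here, and no Feature Maximization weighting is applied.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Term counts from a naive lowercase, whitespace tokenizer (assumption)."""
    return Counter(sentence.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_sentences(sentences, reference):
    """Sort candidate sentences by decreasing similarity to a reference text."""
    ref = bag_of_words(reference)
    scored = [(cosine_similarity(bag_of_words(s), ref), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In a citation-based setting such as the one described above, `reference` could hypothetically be a sentence from a citing paper and `sentences` the sentences of the cited paper, so that the top-ranked entries are candidates for the summary.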
Contributor : Nicolas Dugue
Submitted on : Thursday, April 13, 2017 - 7:12:49 PM
Last modification on : Wednesday, November 3, 2021 - 4:50:26 AM



Hazem Al Saied, Nicolas Dugué, Jean-Charles Lamirel. Automatic summarization of scientific publications using a feature selection approach. International Journal on Digital Libraries, Springer Verlag, 2018, 19 (2-3), pp. 203-215. ⟨10.1007/s00799-017-0214-x⟩. ⟨hal-01508130⟩


