Automatic summarization of scientific publications using a feature selection approach

Abstract: Feature Maximization is a feature selection method that handles textual data efficiently: it makes it possible to design systems that are language-agnostic, parameter-free, and require no additional corpora to function. We propose to evaluate its use in text summarization, in particular in cases where documents are structured. We first test this approach in a single-document summarization setting. We evaluate it on the DUC AQUAINT corpus and show that, despite the unstructured nature of the corpus, our system outperforms the baseline and produces encouraging results. We also observe that the produced summaries seem robust to redundancy. Next, we evaluate our method in the more appropriate context of the SciSumm challenge, which is dedicated to summarizing research publications. These publications are structured in sections, so our class-based approach is relevant. We focus more specifically on the task of summarizing papers using the papers that refer to them. We consider and evaluate several systems based on our approach, each dealing with specific bags of words. In these systems, we also evaluate cosine and graph-based distances for sentence weighting and comparison. We show that our Feature Maximization based approach performs very well in the SciSumm 2016 context for the considered task, outperforming the best results known so far and obtaining high recall. We thus demonstrate the flexibility and the relevance of Feature Maximization in this context.

https://hal.archives-ouvertes.fr/hal-01508130
Contributor: Nicolas Dugue
Submitted on: Thursday, 13 April 2017 - 19:12:49
Last modified on: Thursday, 6 September 2018 - 01:13:26

Citation

Hazem Al Saied, Nicolas Dugué, Jean-Charles Lamirel. Automatic summarization of scientific publications using a feature selection approach. International Journal on Digital Libraries, Springer Verlag, 2018, 19 (2-3), pp. 203-215. 〈https://link.springer.com/article/10.1007/s00799-017-0214-x〉. 〈10.1007/s00799-017-0214-x〉. 〈hal-01508130〉
