Using the Structure of Documents to Improve the Discovery of Unexpected Information
Abstract
This paper examines how the structure of documents can be taken into account when discovering unexpected information in textual databases. Building on earlier work that designed measures for evaluating the unexpectedness of documents and integrated them into the UnexpectedMiner system, we improve the system by taking into account the structure of the documents it processes. Each part of a document is weighted by a coefficient whose value is determined by optimization techniques. The system then uses these coefficients in its unexpectedness measures to decide whether a document contains unexpected information. We evaluate the efficiency of the new system, and the experiments show the improvements brought by using the structure of the documents.