Seek&Hide. Anonymising a French SMS corpus using natural language processing techniques.
Abstract
This article presents the system Seek&Hide, a text message processing tool developed for the sud4science LR (http://www.sud4science.org/) project. It performs the anonymisation/de-identification of a corpus. At present, it has been used to anonymise the sud4science LR corpus of French text messages collected during the project. This is done in two phases. In the first phase, it automatically processes over 70% of the corpus. The rest of the corpus is processed in the second phase, aided by an expert annotator via a web interface specifically designed to simplify the task.