Graded lexicons: new resources for educational purposes and much more

Abstract: Computational tools and resources can play an important role in vocabulary acquisition. The literature distinguishes explicit from implicit learning according to the attention users pay to the words (Ma & Kelly, 2006): exercises specifically focused on vocabulary, versus activities in which lexical acquisition occurs as a side effect of repeated exposure to words, as in reading. Fostered by the extensive use of mobile devices, recent iCALL applications and platforms offer a large variety of learning games that open up challenging possibilities (see Cornillie et al., 2012) compared with more traditional exercises, which emphasize repetition. Such educational tools are built on modern pedagogical criteria and offer, among other things, hyperlinks to electronic dictionaries or concordancers. The information a student can find in such resources relates to word forms (morphology), word meanings (semantics, usage) and word patterns (syntax, collocations). Going further, these electronic resources may even offer information about the origins of a word, particular usages (constructions), typically related words (semantically or thematically related words, word families), etc., all in multimedia form. However, very few resources provide information about the complexity of a word, either for learning or for comprehension (showing, for example, that 'monster' is a simpler term than its hyponyms 'phoenix' or 'behemoth', or that 'to walk' is easier than its synonyms 'to stroll' or 'to ramble'). Yet the idea of using frequency counts as a proxy for word difficulty is not new: frequency word lists were built in the past, for instance "The teacher's word book" for English (Thorndike, 1921) or the "Dictionnaire fondamental de la langue française" for French (Gougenheim, 1958).
However, the notion of 'graded lexicon' is still not widely disseminated, although a few resources exist with words classified across difficulty levels, such as the English Profile Wordlists (Capel, 2010), Manulex (Lété et al., 2004), or FLELex (François et al., 2014). In this presentation, we will shed light on this issue and introduce a graded resource of French synonyms, ReSyf (Gala et al., 2013). Unlike the above-mentioned lexicons, the methodology used to build ReSyf is not only corpus-based: it also includes a predictive model based on lexical and psycholinguistic features related to lexical complexity (Gala et al., 2014), which makes it possible to assign a grade level to any word. Although it comes from the NLP community and is aimed at text simplification, this graded lexicon can also help learners of French to acquire vocabulary and, more generally, to improve their language acquisition. On the one hand, the lexicon itself can be used for explicit learning of French vocabulary, guided by the different grades of the synonyms of a word. On the other hand, it can be used to carry out word substitution within an automatic text simplification system that helps learners and children with reading impairments to get through a text and rediscover the pleasure of reading (as they can better understand what they read), thus entering a virtuous circle in which reading and decoding skills are trained through reading practice.
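
The word-substitution use of a graded lexicon can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy synonym set reuses the English 'walk' / 'stroll' / 'ramble' example above, the grade values are invented (1 = easiest), and the names `TOY_GRADES`, `simplest_synonym` and `simplify` are hypothetical. It does not reproduce the ReSyf data format or its predictive model, and it ignores inflection, capitalization and word sense.

```python
import re

# Hypothetical graded synonym set: the grades are invented for illustration
# (1 = easiest) and are NOT taken from ReSyf or any other real resource.
TOY_GRADES = {"walk": 1, "stroll": 2, "ramble": 3}

# Every member of the set points to the same graded synonym set.
TOY_LEXICON = {word: TOY_GRADES for word in TOY_GRADES}


def simplest_synonym(word, lexicon):
    """Return the lowest-graded synonym of `word`, or `word` itself if unknown."""
    synonyms = lexicon.get(word.lower())
    if not synonyms:
        return word
    return min(synonyms, key=synonyms.get)


def simplify(text, lexicon):
    """Replace each known word in `text` by its easiest synonym.

    Only exact (case-insensitive) matches of lexicon entries are replaced;
    inflected forms such as 'strolled' are left untouched.
    """
    return re.sub(
        r"[A-Za-z]+",
        lambda match: simplest_synonym(match.group(0), lexicon),
        text,
    )


print(simplify("They stroll home", TOY_LEXICON))
```

A realistic system would additionally need lemmatization, re-inflection of the substituted word, and word sense disambiguation, so that a synonym is only proposed when it fits the context.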
