A. Abeillé, L. Clément, and F. Toussenel, Building a Treebank for French, Treebanks, pp.165-187, 2003.
DOI : 10.1007/978-94-010-0201-1_10

G. F. Arcodia and B. Basciano, On the Productivity of the Chinese Suffixes -r, -huà and -tou, Taiwan Journal of Linguistics, vol.10, pp.89-118, 2012.

H. Baayen, Quantitative aspects of morphological productivity, Yearbook of morphology 1991, pp.109-149, 1992.
DOI : 10.1007/978-94-011-2516-1_8

R. H. Baayen, Word frequency distributions, 2001.
DOI : 10.1007/978-94-010-0844-0

N. Bernstein-Ratner, The phonology of parent-child speech, Children's language, pp.159-174, 1987.

M. R. Brent, Speech segmentation and word discovery: a computational perspective, Trends in Cognitive Sciences, vol.3, issue.8, pp.294-301, 1999.
DOI : 10.1016/S1364-6613(99)01350-9

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.2020

J. Bresnan, The mental representation of grammatical relations, 1982.

J. Bresnan, Is syntactic knowledge probabilistic? Experiments with the English dative alternation. Roots: Linguistics in search of its evidential base, pp.75-96, 2007.

J. Bresnan, A. Cueni, T. Nikitina, and R. H. Baayen, Predicting the dative alternation. Cognitive foundations of interpretation, pp.69-94, 2007.

E. Brill, Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging, Computational linguistics, vol.21, issue.4, pp.543-565, 1995.

Y. R. Chao, A grammar of spoken Chinese, 1968.

Y. R. Chao and L. S. Yang, Concise Dictionary of Spoken Chinese, 1962.

K. Chen, C. Huang, L. Chang, and H. Hsu, Sinica corpus: Design methodology for balanced corpora, Proceedings of the 11th Pacific Asia Conference on Language, Information and Computation (PACLIC 11), pp.167-176, 1996.

P. Chen, Modern Chinese: history and sociolinguistics, 1999.
DOI : 10.1017/CBO9781139164375

N. Chomsky, Remarks on nominalization, p.314, 1970.
DOI : 10.1515/9783110814231.11

C. Christodoulopoulos, S. Goldwater, and M. Steedman, Two Decades of Unsupervised POS induction: How far have we come?, Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp.575-584, 2010.

G. Chrupała, Hierarchical clustering of word class distributions, Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure, pp.100-104, 2012.

K. Church, How many multiword expressions do people know?, ACM Transactions on Speech and Language Processing, vol.10, issue.2, pp.4-12, 2013.
DOI : 10.1145/2483691.2483693

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.207.5328

B. Courtois, M. Garrigues, G. Gross, M. Gross, R. Jung et al., Dictionnaire électronique des noms composés DELAC: les composants NA et NN, p.55, 1997.

J. X. Dai, Chinese morphology and its interface with the syntax, 1992.

J. DeFrancis, The Chinese language: Fact and fantasy, 1984.

Y. Desalle, S. Hsieh, B. Gaume, and H. Cheung, Towards an automatic measurement of verbal lexicon acquisition: the case for a young children-vs-adults categorization in French and Mandarin, Proceedings of 24th Pacific Asia Conference on Language, Information and Computation: Workshop on Model and Measurement of Meaning (M3), 2010.

Z. Dong, Q. Dong, and C. Hao, Word Segmentation Needs Change - From a Linguist's View, Proceedings of the First CIPS-SIGHAN Joint Conference on Chinese Language Processing, pp.1-7, 2010.

J. Drillon, Traité de la ponctuation française, 1991.

S. Duanmu, Wordhood in Chinese, pp.135-196, 1998.
DOI : 10.1017/S095267570000097X

T. Emerson, The second international Chinese word segmentation bakeoff, Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing, 2005.

J. Fan, Xing-ming zuhe jian 'de' zi de yufa zuoyong [The grammatical function of de in adjective-noun constructions], 1958.

H. Feng, K. Chen, X. Deng, and W. Zheng, Accessor Variety Criteria for Chinese Word Extraction, Computational Linguistics, vol.30, issue.1, pp.75-93, 2004.
DOI : 10.1162/089120104773633394

URL : http://doi.org/10.1162/089120104773633394

A. Fourtassi, B. Börschinger, M. Johnson, and E. Dupoux, Why is English so easy to segment?, Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL), pp.1-10, 2013.

M. C. Frank, S. Goldwater, T. L. Griffiths, and J. B. Tenenbaum, Modeling human performance in statistical word segmentation, Cognition, vol.117, issue.2, pp.107-125, 2010.
DOI : 10.1016/j.cognition.2010.07.005

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.124.6889

J. Gao, M. Li, A. Wu, and C. Huang, Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach, Computational Linguistics, vol.31, issue.4, pp.531-574, 2005.

S. Goldwater, Nonparametric Bayesian Models of Lexical Acquisition, 2006.

S. Goldwater, T. L. Griffiths, and M. Johnson, A Bayesian framework for word segmentation: Exploring the effects of context, Cognition, vol.112, issue.1, pp.21-54, 2009.
DOI : 10.1016/j.cognition.2009.03.008

S. Green, M. de Marneffe, J. Bauer, and C. D. Manning, Multiword expression identification with tree substitution grammars: A parsing tour de force with French, Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp.725-735, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01111383

G. Gross, Les expressions figées en français: noms composés et autres locutions, 1996.

M. Gross, Méthodes en syntaxe. Hermann, 1975.

M. Gross, Lexicon-grammar, Proceedings of the 11th conference on Computational linguistics, pp.1-6, 1986.
DOI : 10.3115/991365.991367

URL : https://hal.archives-ouvertes.fr/hal-00621600

R. Harris, Rethinking writing, 2005.

Z. S. Harris, From Phoneme to Morpheme, Language, vol.31, issue.2, pp.190-222, 1955.
DOI : 10.2307/411036

Z. S. Harris, Morpheme Boundaries within Words: Report on a Computer Test, 1967.
DOI : 10.1007/978-94-017-6059-1_3

D. Hewlett and P. Cohen, Fully unsupervised word segmentation with BVE and MDL, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers, pp.540-545, 2011.

D. Hewlett and P. R. Cohen, Bootstrap Voting Experts, IJCAI, pp.1071-1076, 2009.

C. Hsu, Lexical Gaps and Lexicalization: Implications for Word Segmentation Systems for Chinese NLP, Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation, 2012.

C. Huang and H. Zhao, 中文分词十年回顾 (Chinese word segmentation: A decade review), Journal of Chinese Information Processing, vol.21, issue.3, pp.8-20, 2007.

C. Huang, K. Chen, and L. Chang, Segmentation standard for Chinese natural language processing, Proceedings of the 16th conference on Computational linguistics, pp.1045-1048, 1996.
DOI : 10.3115/993268.993362

URL : http://acl.ldc.upenn.edu/C/C96/C96-2184.pdf

C. J. Huang, Phrase structure, lexical integrity, and Chinese compounds, Journal of the Chinese Language Teachers Association, vol.19, issue.2, pp.53-78, 1984.

U. Iûnn, New Manifestation of the Taiwanese vernacular literature - Introduction to Digital Archive for Written Taiwanese, National Museum of Taiwanese Literature. NMTL, 2007.

U. Iûnn, J. Tai, K. Lau, K. Chen, and C. Kao, Modeling Taiwanese POS tagging Using Statistical Methods and Mandarin Training Data, International Journal of Computational Linguistics and Chinese Language Processing, vol.14, issue.3, pp.237-256, 2009.

R. Jackendoff, The architecture of the language faculty, 1997.

O. Jespersen, The Philosophy of Grammar, 1924.

G. Jin and X. Chen, The fourth international Chinese language processing bakeoff: Chinese word segmentation, named entity recognition and Chinese pos tagging, Sixth SIGHAN Workshop on Chinese Language Processing, p.69, 2008.

Z. Jin, A Study On Unsupervised Segmentation Of Text Using Contextual Complexity, 2007.

Z. Jin and K. Tanaka-Ishii, Unsupervised segmentation of Chinese text by use of branching entropy, Proceedings of the COLING/ACL on Main conference poster sessions, pp.428-435, 2006.
DOI : 10.3115/1273073.1273129

M. Johnson and K. Demuth, Unsupervised phonemic Chinese word segmentation using Adaptor Grammars, Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), Coling 2010 Organizing Committee, pp.528-536, 2010.
DOI : 10.3115/1626324.1626328

M. Johnson and S. Goldwater, Improving nonparameteric Bayesian inference, Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics on, NAACL '09, pp.317-325, 2009.
DOI : 10.3115/1620754.1620800

M. Johnson, T. L. Griffiths, and S. Goldwater, Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models, Advances in neural information processing systems 19, 2007.

S. Kahane, Les unités minimales de la syntaxe et de la sémantique : le cas du français, Congrès Mondial de Linguistique Française 2008, 2008.
DOI : 10.1051/cmlf08106

R. M. Kaplan and J. Bresnan, Lexical-functional grammar: A formal system for grammatical representation. Formal Issues in Lexical-Functional Grammar, pp.29-130, 1982.

E. Kaske, The Politics of Language in Chinese Education, 1895-1919, Sinica Leidensia, vol.82, 2008.

A. Kempe, Experiments in unsupervised entropy-based corpus segmentation, Workshop of EACL in Computational Natural Language Learning, pp.7-13, 1999.

P. Kratochvíl, Modern standard Chinese, Lingua, vol.17, issue.1-2, 1967.
DOI : 10.1016/0024-3841(66)90007-6

G. Levow, The third international Chinese language processing bakeoff: Word segmentation and named entity recognition, Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, 2006.

F. Liu, A clitic analysis of locative particles, Journal of Chinese Linguistics, vol.26, issue.1, pp.48-70, 1998.

F. Liu and C. Oakden, Disyllabic bound forms in Modern Mandarin Chinese: an analysis of yihou and yihou, Journal of Chinese Linguistics, 2013.

S. Lü, Hanyu yufa fenxi wenti [Issues in analysis of Chinese grammar], 1979.

Z. Lu, Hanyu de goucifa [Chinese morphology], 1964.

P. Magistry, Productivité morphologique : Étude sur le chinois mandarin, 2008.

P. Magistry, L. Prévot, H. Cheung, C. Shiao, Y. Desalle et al., Using Extra-Linguistic Material for Mandarin-French Verbal Constructions Comparison, Proceedings of the 23rd Pacific Asia Conference on Language, Information, and Computation (PACLIC), pp.335-344, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00992102

P. Magistry and B. Sagot, Can MDL Improve Unsupervised Chinese Word Segmentation?, Sixth International Joint Conference on Natural Language Processing: 7th SIGHan Workshop, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00876389

A. Martinet, Éléments de linguistique générale, 1960.

I. A. Mel'čuk and A. Polguère, A formal lexicon in the Meaning-Text Theory (or how to do lexica with words), Computational Linguistics, vol.13, issue.3-4, pp.261-275, 1987.

I. Mel'čuk, Dependency syntax: theory and practice, 1988.

I. Mel'čuk, Cours de morphologie générale: Significations morphologiques, 1994.

P. H. Miller, Clitics and constituents in phrase structure grammar, 1992.

D. Mochihashi, T. Yamada, and N. Ueda, Bayesian unsupervised word segmentation with nested Pitman-Yor language modeling, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, ACL-IJCNLP '09, pp.100-108, 2009.
DOI : 10.3115/1687878.1687894

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.164.3945

E. Navarro, Métrologie des graphes de terrain, application à la construction de ressources lexicales et à la recherche d'information, 2013.

É. V. Nguyen, Unité lexicale et morphologie en chinois mandarin: vers l'élaboration d'un dictionnaire explicatif et combinatoire du chinois, 2006.

J. Norman, Chinese, 1988.

J. L. Packard, The morphology of Chinese, 2000.
DOI : 10.1017/CBO9780511486821

M. Paris, Some aspects of the syntax and semantics of the "lian...ye/dou" construction in Mandarin, pp.47-70, 1979.

G. Patin, Extraction interactive et non supervisée de lexique en chinois contemporain appliquée à la constitution de ressources linguistiques dans un domaine spécialisé, 2013.

L. Pearl, S. Goldwater, and M. Steyvers, How ideal are we? Incorporating human limitations into Bayesian models of word segmentation, Proceedings of the 34th annual Boston University Conference on Child Language Development, pp.315-326, 2010.

J. Rissanen, Modeling by shortest data description, Automatica, vol.14, issue.5, pp.465-471, 1978.
DOI : 10.1016/0005-1098(78)90005-5

J. R. Saffran, E. L. Newport, and R. N. Aslin, Word Segmentation: The Role of Distributional Cues, Journal of Memory and Language, vol.35, issue.4, pp.606-621, 1996.
DOI : 10.1006/jmla.1996.0032

I. A. Sag, T. Baldwin, F. Bond, A. Copestake, and D. Flickinger, Multiword Expressions: A Pain in the Neck for NLP, Computational Linguistics and Intelligent Text Processing, pp.1-15, 2002.
DOI : 10.1007/3-540-45715-1_1

L. Sagart, L'emploi des phonétiques dans l'écriture chinoise, Ecriture chinoise/Données, usages et représentations, Collection des Cahiers de Linguistique Asie Orientale, pp.35-53, 2006.

B. Sagot and P. Boullier, From raw corpus to word lattices: robust preparsing processing with SXPipe, Archives of Control Sciences, vol.15, issue.4, pp.653-662, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00521228

F. D. Saussure, C. Bally, A. Sechehaye, and A. Riedlinger, Cours de linguistique générale: publié par Charles Bally et Albert Sechehaye avec la collaboration de Albert Riedlinger, Librairie Payot & Cie, 1916.

J. Sinclair, Collins COBUILD, Collins Birmingham University International Language Database: English language dictionary, 1987.

A. V. Sosa and J. MacFarlane, Evidence for frequency-based constituents in the mental lexicon: collocations involving the word of, Brain and Language, vol.83, issue.2, pp.227-236, 2002.
DOI : 10.1016/S0093-934X(02)00032-9

R. Sproat and T. Emerson, The first international Chinese word segmentation Bakeoff, Proceedings of the second SIGHAN workshop on Chinese language processing, pp.133-143, 2003.
DOI : 10.3115/1119250.1119269

URL : http://acl.ldc.upenn.edu/acl2003/sighan/pdf/Sproat.pdf

R. Sproat, W. Gale, C. Shih, and N. Chang, A stochastic finite-state word-segmentation algorithm for Chinese, Computational linguistics, vol.22, issue.3, pp.377-404, 1996.

R. Sproat and C. Shih, A Statistical Method for Finding Word Boundaries in Chinese Text, Computer Processing of Chinese & Oriental Languages, vol.4, issue.4, 1990.

C. Y. Suen, Computational studies of the most frequent Chinese words and sounds, World Scientific Singapore, 1986.
DOI : 10.1142/0219

J. Sun and Y. Lepage, Can Word Segmentation be Considered Harmful for Statistical Machine Translation Tasks between Japanese and Chinese?, Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation, 2012.

K. Tanaka-Ishii, Entropy as an Indicator of Context Boundaries: An Experiment Using a Web Search Engine, IJCNLP, pp.93-105, 2005.
DOI : 10.1007/11562214_9

L. Tesnière, Éléments de syntaxe structurale, 1959.

J. Thuilier, Contraintes préférentielles et ordre des mots en français, 2012.

B. T'sou, H. Lin, G. Liu, T. Chan, J. Hu et al., A synchronous Chinese language corpus from different speech communities: Construction and applications, Computational Linguistics and Chinese Language Processing, pp.91-104, 1997.

H. Wang, J. Zhu, S. Tang, and X. Fan, A New Unsupervised Approach to Word Segmentation, Computational Linguistics, vol.37, issue.3, pp.421-454, 2011.

A. Wu, Customizable segmentation of morphologically derived words in Chinese, International Journal of Computational Linguistics and Chinese Language Processing, vol.8, issue.1, pp.1-27, 2003.

F. Xia, The segmentation guidelines for the Penn Chinese Treebank (3.0), 2000.

H. Yu, S. Duan, B. Zhu, and X. Sun, The Basic Processing of Contemporary Chinese Corpus at Peking University SPECIFICATION, Journal of Chinese Information Processing, p.7, 2002.

Y. Zhang and S. Clark, Chinese segmentation with a word-based perceptron algorithm, Annual Meeting of the Association for Computational Linguistics, p.840, 2007.

Y. Zhang and S. Clark, A fast decoder for joint word segmentation and POS-tagging using a single discriminative model, Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp.843-852, 2010.

Y. Zhang and S. Clark, Syntactic Processing Using the Generalized Perceptron and Beam Search, Computational Linguistics, vol.37, issue.1, pp.105-151, 2011.

H. Zhao and C. Kit, An empirical comparison of goodness measures for unsupervised Chinese word segmentation with a unified framework, The Third International Joint Conference on Natural Language Processing (IJCNLP2008), 2008.

H. Zhao and Q. Liu, The CIPS-SIGHAN CLP 2010 Chinese word segmentation bakeoff, Proceedings of the First CIPS-SIGHAN Joint Conference on Chinese Language Processing, pp.199-209, 2010.

V. Zhikov, H. Takamura, and M. Okumura, An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL, Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp.832-842, 2010.
DOI : 10.1527/tjsai.28.347

G. K. Zipf, Human Behaviour and the Principle of Least-Effort, 1949.

A. M. Zwicky and G. K. Pullum, Cliticization vs. inflection: English n't, Language, pp.502-513, 1983.