B. Alex, M. Nissim, and C. Grover, The Impact of Annotation on the Performance of Protein Tagging in Biomedical Text, Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC), pp.595-600, 2006.

B. Alex, C. Grover, R. Shen, and M. Kabadjov, Agile Corpus Annotation in Practice: An Overview of Manual and Automatic Annotation of CVs, Proceedings of the Fourth Linguistic Annotation Workshop (LAW), pp.29-37, 2010.

E. Alphonse, S. Aubin, P. Bessières, G. Bisson, T. Hamon et al., Event-based Information Extraction for the Biomedical Domain: the CADERIGE Project, Proceedings of the JNLPBA COLING 2004 Workshop, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00098040

R. Artstein and M. Poesio, Inter-Coder Agreement for Computational Linguistics, Computational Linguistics, vol.34, issue.4, pp.555-596, 2008.
DOI : 10.1162/coli.07-034-R2
URL : http://doi.org/10.1162/coli.07-034-r2

E. M. Bennett, R. Alpert, and A. C. Goldstein, Communications Through Limited Response Questioning, Public Opinion Quarterly, vol.18, issue.3, pp.303-308, 1954.
DOI : 10.1086/266520
URL : http://poq.oxfordjournals.org/cgi/content/short/18/3/303

J. Carletta, Assessing Agreement on Classification Tasks: the Kappa Statistic, Computational Linguistics, vol.22, pp.249-254, 1996.

J. Cohen, A Coefficient of Agreement for Nominal Scales, Educational and Psychological Measurement, vol.20, issue.1, pp.37-46, 1960.
DOI : 10.1177/001316446002000104

J. Cohen, Weighted Kappa: Nominal Scale Agreement with Provision for Scaled Disagreement or Partial Credit, Psychological Bulletin, vol.70, issue.4, pp.213-220, 1968.
DOI : 10.1037/h0026256

B. Di Eugenio and M. Glass, The Kappa Statistic: A Second Look, Computational Linguistics, vol.30, issue.1, pp.95-101, 2004.

R. H. Finn, A Note on Estimating the Reliability of Categorical Data, Educational and Psychological Measurement, vol.30, issue.1, pp.71-76, 1970.
DOI : 10.1177/001316447003000106

O. Galibert, L. Quintard, S. Rosset, P. Zweigenbaum, C. Nédellec et al., Named and Specific Entity Detection in Varied Data: the Quaero Named Entity Baseline Evaluation, Proceedings of the Seventh International Conference on Language Resources and Evaluation, 2010.

G. Hripcsak and D. F. Heitjan, Measuring agreement in medical informatics reliability studies, Journal of Biomedical Informatics, vol.35, issue.2, pp.99-110, 2002.
DOI : 10.1016/S1532-0464(02)00500-2

G. Hripcsak and A. S. Rothschild, Agreement, the F-Measure, and Reliability in Information Retrieval, Journal of the American Medical Informatics Association, vol.12, issue.3, pp.296-298, 2005.
DOI : 10.1197/jamia.M1733

K. Krippendorff, Content Analysis: An Introduction to Its Methodology, chapter 12, Sage, 1980.

K. Krippendorff, Content Analysis: An Introduction to Its Methodology, second edition, chapter 11, 2004.

M. Laignelet and F. Rioult, Repérer automatiquement les segments obsolescents à l'aide d'indices sémantiques et discursifs, Proceedings of the Traitement Automatique des Langues Naturelles, 2009.

J. Makhoul, F. Kubala, R. Schwartz, and R. Weischedel, Performance measures for information extraction, Proceedings of DARPA Broadcast News Workshop, pp.249-252, 1999.

R. Passonneau, Measuring Agreement on Set-Valued Items (MASI) for Semantic and Pragmatic Annotation, Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006.

D. Reidsma and J. Carletta, Reliability Measurement without Limits, Computational Linguistics, vol.34, issue.3, pp.319-326, 2008.
DOI : 10.1162/coli.2008.34.3.319
URL : http://doi.org/10.1162/coli.2008.34.3.319

W. A. Scott, Reliability of Content Analysis: The Case of Nominal Scale Coding, Public Opinion Quarterly, vol.19, issue.3, pp.321-325, 1955.

S. Siegel and N. Castellan, Nonparametric Statistics for the Behavioral Sciences, 1988.