R. Artstein and M. Poesio, Inter-Coder Agreement for Computational Linguistics, Computational Linguistics, vol.34, issue.4, pp.555-596, 2008.

URL : https://doi.org/10.1162/coli.07-034-r2

T. Baguley, Serious Stats: A guide to advanced statistics for the behavioral sciences, 2012.
DOI : 10.1007/978-0-230-36355-7

G. Bailly, T. Pietrzak, J. Deber, and D. J. Wigdor, Métamorphe, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '13, pp.563-572, 2013.
DOI : 10.1145/2470654.2470734

A. Bousseau, T. Tsandilas, L. Oehlberg, and W. E. Mackay, How Novices Sketch and Prototype Hand-Fabricated Objects, Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI '16, 2016.

URL : https://hal.archives-ouvertes.fr/hal-01272187

Fallacies of Agreement: A Critical Review of Consensus Assessment Methods for Gesture Elicitation, ACM Transactions on Computer-Human Interaction, vol.25, Article 18, p.47, 2018.

R. L. Brennan and D. J. Prediger, Coefficient kappa: Some uses, misuses, and alternatives, Educational and Psychological Measurement, vol.41, issue.3, pp.687-699, 1981.

J. Carpenter and J. Bithell, Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians, Statistics in Medicine, vol.19, issue.9, pp.1141-1164, 2000.

E. Chan, T. Seyed, W. Stuerzlinger, X. Yang, and F. Maurer, User Elicitation on Single-hand Microgestures, Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI '16, 2016.
DOI : 10.1145/2858036.2858589

H. C. Kraemer, V. S. Periyakoil, and A. Noda, Kappa coefficients in medical research, Statistics in Medicine, vol.21, issue.14, pp.2109-2129, 2002.

D. V. Cicchetti and A. R. Feinstein, High agreement but low kappa: II. Resolving the paradoxes, Journal of Clinical Epidemiology, vol.43, issue.6, pp.551-558, 1990.

W. G. Cochran, The Comparison of Percentages in Matched Samples, Biometrika, vol.37, issue.3/4, pp.256-266, 1950.

A. Cockburn, C. Gutwin, and S. Greenberg, A predictive model of menu performance, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '07, pp.627-636, 2007.
DOI : 10.1145/1240624.1240723

J. Cohen, A Coefficient of Agreement for Nominal Scales, Educational and Psychological Measurement, vol.20, issue.1, pp.37-46, 1960.
DOI : 10.1177/001316446002000104

J. Culbertson, P. Smolensky, and G. Legendre, Learning biases predict a word order universal, Cognition, vol.122, issue.3, pp.306-329, 2012.
DOI : 10.1016/j.cognition.2011.10.017

A. Deep-Soboslay, M. Akil, C. E. Martin, L. B. Bigelow, M. M. Herman et al., Reliability of psychiatric diagnosis in postmortem research, Biological Psychiatry, vol.57, issue.1, pp.96-101, 2005.
DOI : 10.1016/j.biopsych.2004.10.016

P. Dragicevic, Fair Statistical Communication in HCI, Modern Statistical Methods for HCI, pp.291-330, 2016.
DOI : 10.1007/978-3-319-26633-6_13

URL : https://hal.archives-ouvertes.fr/hal-01377894

B. Efron, Bootstrap Methods: Another Look at the Jackknife, The Annals of Statistics, vol.7, issue.1, pp.1-26, 1979.

D. Ellerman, History of the Logical Entropy Formula, Online, 2010.

URL : http://www.ellerman.org/history-of-the-logical-entropy-formula

A. R. Feinstein and D. V. Cicchetti, High agreement but low kappa: I. The problems of two paradoxes, Journal of Clinical Epidemiology, vol.43, issue.6, pp.543-549, 1990.

L. Findlater, B. Lee, and J. O. Wobbrock, Beyond QWERTY: Augmenting Touch Screen Keyboards with Multitouch Gestures for Non-alphanumeric Input, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '12, 2012.

R. Fisher, Statistical methods for research workers, 1954.

J. L. Fleiss, Measuring nominal scale agreement among many raters, Psychological Bulletin, vol.76, issue.5, pp.378-382, 1971.

A. Garrett and K. Johnson, Phonetic bias in sound change, Origins of Sound Change: Approaches to Phonologization, pp.51-97, 2012.

B. Gleeson, K. MacLean, A. Haddadi, E. Croft, and J. Alcazar, Gestures for industry: Intuitive human-robot communication from human observation, 2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp.349-356, 2013.
DOI : 10.1109/HRI.2013.6483609

URL : http://www.cs.ubc.ca/labs/spin/publications/spin/Gleeson-HRI13-GestureAutomation-preprint.pdf

M. D. Good, J. A. Whiteside, D. R. Wixon, and S. J. Jones, Building a user-derived interface, Communications of the ACM, vol.27, issue.10, pp.1032-1043, 1984.
DOI : 10.1145/358274.358284

D. Grijincu, M. A. Nacenta, and P. O. Kristensson, User-defined Interface Gestures, Proceedings of the Ninth ACM International Conference on Interactive Tabletops and Surfaces, ITS '14, pp.25-34, 2014.

K. L. Gwet, Variance Estimation of Nominal-Scale Inter-Rater Reliability with Random Selection of Raters, Psychometrika, vol.73, issue.3, pp.407-430, 2008.

K. L. Gwet, Handbook of Inter-Rater Reliability, 4th Edition: The Definitive Guide to Measuring the Extent of Agreement Among Raters, Advanced Analytics, LLC, 2014.

J. Hailpern, K. Karahalios, J. Halle, L. Dethorne, and M. Coletto, A3, ACM Transactions on Accessible Computing, vol.2, issue.2, 2009.
DOI : 10.1145/1530064.1530066

A. F. Hayes and K. Krippendorff, Answering the call for a standard reliability measure for coding data, Communication Methods and Measures, vol.1, issue.1, pp.77-89, 2007.

T. Hesterberg, D. Moore, S. Monaghan, A. Clipson, and R. Epstein, Bootstrap methods and permutation tests, Introduction to the Practice of Statistics. W. H. Freeman and Company, 2005.

K. Hornbaek, S. S. Sander, J. A. Bargas-Avila, and J. G. Simonsen, Is once enough?, Proceedings of the 32nd annual ACM conference on Human factors in computing systems, CHI '14, pp.3523-3532, 2014.
DOI : 10.1145/2556288.2557004

T. Hothorn, K. Hornik, M. A. van de Wiel, and A. Zeileis, Implementing a Class of Permutation Tests: The coin Package, Journal of Statistical Software, vol.28, issue.8, pp.1-23, 2008.

D. Jurafsky, E. Shriberg, and D. Biasca, Switchboard SWBD-DAMSL shallow-discourse-function annotation coders manual, Institute of Cognitive Science Technical Report, 1997.

M. Kaptein and J. Robertson, Rethinking statistical analysis methods for CHI, Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems, CHI '12, pp.1105-1114, 2012.
DOI : 10.1145/2207676.2208557

M. Kay, S. Haroz, S. Guha, and P. Dragicevic, Special Interest Group on Transparent Statistics in HCI, Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA '16, 2016.

URL : https://hal.archives-ouvertes.fr/hal-01405018

V. Kostakos, The big hole in HCI research, interactions, vol.22, issue.2, pp.48-51, 2015.

K. Krippendorff, Reliability in Content Analysis, Human Communication Research, vol.30, issue.3, pp.411-433, 2004.

K. Krippendorff, Agreement and Information in the Reliability of Coding, Communication Methods and Measures, vol.5, issue.2, pp.93-112, 2011.

K. Krippendorff, Content analysis: An introduction to its methodology, Sage, 2013.

B. Lahey, A. Girouard, W. Burleson, and R. Vertegaal, PaperPhone, Proceedings of the 2011 annual conference on Human factors in computing systems, CHI '11, pp.1303-1312, 2011.
DOI : 10.1145/1978942.1979136

S. Lee, S. Kim, E. Jin, B. Choi, X. Kim et al., How users manipulate deformable displays as input devices, Proceedings of the 28th international conference on Human factors in computing systems, CHI '10, pp.1647-1656, 2010.
DOI : 10.1145/1753326.1753572

K. M. MacQueen, E. McLellan, K. Kay, and B. Milstein, Codebook development for team-based qualitative analysis, Cultural Anthropology Methods, vol.10, issue.2, pp.31-36, 1998.

B. Mandelbrot, Information Theory and Psycholinguistics: A Theory of Word Frequencies, 1967.

E. M. Markman, The whole-object, taxonomic, and mutual exclusivity assumptions as initial constraints on word meanings, 1991.
DOI : 10.1017/CBO9780511983689.004

M. Micire, M. Desai, A. Courtemanche, K. M. Tsui et al., Analysis of natural gestures for controlling robot teams on multi-touch tabletop surfaces, Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces, ITS '09, pp.41-48, 2009.
DOI : 10.1145/1731903.1731912

M. R. Morris, Web on the Wall: Insights from a Multimodal Interaction Elicitation Study, Proceedings of the 2012 ACM International Conference on Interactive Tabletops and Surfaces, ITS '12, pp.95-104, 2012.

M. R. Morris, A. Danielescu, S. Drucker, D. Fisher, B. Lee, m. c. schraefel, and J. O. Wobbrock, Reducing Legacy Bias in Gesture Elicitation Studies, interactions, vol.21, issue.3, pp.40-45, 2014.

M. Morris, J. O. Wobbrock, and A. D. Wilson, Understanding Users' Preferences for Surface Gestures, Proceedings of Graphics Interface 2010 (GI '10). Canadian Information Processing Society, pp.261-268, 2010.

M. E. J. Newman, Power laws, Pareto distributions and Zipf's law, Contemporary Physics, vol.46, issue.5, pp.323-351, 2005.

M. Nielsen, M. Störring, T. B. Moeslund, and E. Granum, A Procedure for Developing Intuitive and Ergonomic Gesture Interfaces for HCI, pp.409-420, 2004.
DOI : 10.1007/978-3-540-24598-8_38

D. L. O'Connell and A. J. Dobson, General Observer-Agreement Measures on Individual Subjects and Groups of Subjects, Biometrics, vol.40, issue.4, pp.973-983, 1984.

U. Oh and L. Findlater, The challenges and potential of end-user gesture customization, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '13, pp.1129-1138, 2013.
DOI : 10.1145/2470654.2466145

K. J. O'Malley, K. F. Cook, M. D. Price, K. R. Wildes et al., Measuring Diagnoses: ICD Code Accuracy, Health Services Research, vol.40, issue.5, 2005.

S. T. Piantadosi, Zipf's word frequency law in natural language: A critical review and future directions, Psychonomic Bulletin & Review, vol.21, issue.5, pp.1112-1130, 2014.

T. Piumsomboon, A. Clark, M. Billinghurst, and A. Cockburn, User-Defined Gestures for Augmented Reality, INTERACT 2013: 14th IFIP TC13 Conference on Human-Computer Interaction, pp.282-299, 2013.

K. L. Posner, P. D. Sampson, R. A. Caplan, R. J. Ward, and F. W. Cheney, Measuring interrater reliability among multiple raters: An example of methods for nominal data, Statistics in Medicine, vol.9, issue.9, pp.1103-1115, 1990.
DOI : 10.1002/sim.4780090917

M. H. Quenouille, Problems in Plane Sampling, The Annals of Mathematical Statistics, vol.20, issue.3, pp.355-375, 1949.
DOI : 10.1214/aoms/1177729989

J. Rico and S. Brewster, Usable gestures for mobile interfaces, Proceedings of the 28th international conference on Human factors in computing systems, CHI '10, pp.887-896, 2010.
DOI : 10.1145/1753326.1753458

J. Ruiz and D. Vogel, Soft-Constraints to Reduce Legacy and Performance Bias to Elicit Whole-body Gestures with Low Arm Fatigue, Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI '15, pp.3347-3350, 2015.

W. A. Scott, Reliability of content analysis: The case of nominal scale coding, Public Opinion Quarterly, vol.19, issue.3, pp.321-325, 1955.

E. H. Simpson, Measurement of Diversity, Nature, vol.163, p.688, 1949.

R. L. Spitzer and J. L. Fleiss, A Re-analysis of the Reliability of Psychiatric Diagnosis, The British Journal of Psychiatry, vol.125, issue.587, pp.341-347, 1974.

G. M. Troiano, E. W. Pedersen, and K. Hornbaek, User-defined gestures for elastic, deformable displays, Proceedings of the 2014 International Working Conference on Advanced Visual Interfaces, AVI '14, pp.1-8, 2014.
DOI : 10.1145/2598153.2598184

T. Tsandilas and P. Dragicevic, Accounting for Chance Agreement in Gesture Elicitation Studies, Research Report LRI-CNRS, no.1584, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01267288

J. S. Uebersax, A design-independent method for measuring the reliability of psychiatric diagnosis, Journal of Psychiatric Research, vol.17, issue.4, pp.335-342, 1982.
DOI : 10.1016/0022-3956(82)90039-5

J. S. Uebersax, Statistical Methods for Diagnostic Agreement, Online (accessed 2017), 2015.

S. Vanbelle and A. Albert, Agreement between Two Independent Groups of Raters, Psychometrika, vol.74, issue.3, pp.477-491, 2009.

R. Vatavu and J. O. Wobbrock, Formalizing Agreement Analysis for Elicitation Studies, Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI '15, pp.1325-1334, 2015.

R. Vatavu and J. O. Wobbrock, Between-Subjects Elicitation Studies, Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI '16, pp.3390-3402, 2016.

J. Wagner, S. Huot, and W. Mackay, BiTouch and BiPad, Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems, CHI '12, pp.2317-2326, 2012.
DOI : 10.1145/2207676.2208391

URL : https://hal.archives-ouvertes.fr/hal-00663972

M. Weigel, V. Mehta, and J. Steimle, More than touch, Proceedings of the 32nd annual ACM conference on Human factors in computing systems, CHI '14, pp.179-188, 2014.
DOI : 10.1145/2556288.2557239

M. Wilson, W. Mackay, E. Chi, M. Bernstein, and J. Nichols, RepliCHI SIG, Proceedings of the 2012 ACM annual conference extended abstracts on Human Factors in Computing Systems Extended Abstracts, CHI EA '12, 2012.
DOI : 10.1145/2212776.2212419

J. O. Wobbrock, H. H. Aung, B. Rothrock, and B. A. Myers, Maximizing the guessability of symbolic input, CHI '05 extended abstracts on Human factors in computing systems, CHI '05, pp.1869-1872, 2005.
DOI : 10.1145/1056808.1057043

J. O. Wobbrock, M. R. Morris, and A. D. Wilson, User-defined gestures for surface computing, Proceedings of the 27th international conference on Human factors in computing systems, CHI '09, pp.1083-1092, 2009.
DOI : 10.1145/1518701.1518866

M. Wood, Bootstrapped Confidence Intervals as an Approach to Statistical Inference, Organizational Research Methods, vol.8, issue.4, pp.454-470, 2005.

G. K. Zipf, Human Behaviour and the Principle of Least Effort, 1949.