R. Artstein and M. Poesio, Inter-Coder Agreement for Computational Linguistics, Computational Linguistics, vol.34, pp.555-596, 2008.

T. Baguley, Serious Stats: A guide to advanced statistics for the behavioral sciences, 2012.

G. Bailly, T. Pietrzak, J. Deber, and D. J. Wigdor, Métamorphe: Augmenting Hotkey Usage with Actuated Keys, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13), pp.563-572, 2013.

A. Bousseau, T. Tsandilas, L. Oehlberg, and W. E. Mackay, How Novices Sketch and Prototype Hand-Fabricated Objects, Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI '16), pp.397-408, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01272187

R. L. Brennan and D. J. Prediger, Coefficient kappa: Some uses, misuses, and alternatives, Educational and Psychological Measurement, vol.41, pp.687-699, 1981.

J. Carpenter and J. Bithell, Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians, Statistics in Medicine, vol.19, pp.1141-1164, 2000.

E. Chan, T. Seyed, W. Stuerzlinger, X. Yang, and F. Maurer, User Elicitation on Singlehand Microgestures, Conference on Human Factors in Computing Systems (CHI), 2016.

H. C. Kraemer, V. S. Periyakoil, and A. Noda, Kappa coefficients in medical research, Statistics in Medicine, vol.21, pp.2109-2129, 2002.

D. V. Cicchetti and A. R. Feinstein, High agreement but low kappa: II. Resolving the paradoxes, Journal of Clinical Epidemiology, vol.43, pp.551-558, 1990.

W. G. Cochran, The Comparison of Percentages in Matched Samples, Biometrika, vol.37, pp.256-266, 1950.

A. Cockburn, C. Gutwin, and S. Greenberg, A Predictive Model of Menu Performance, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '07), pp.627-636, 2007.

J. Cohen, A Coefficient of Agreement for Nominal Scales, Educational and Psychological Measurement, vol.20, p.37, 1960.

J. Culbertson, P. Smolensky, and G. Legendre, Learning biases predict a word order universal, Cognition, vol.122, pp.306-329, 2012.

A. Deep-Soboslay, M. Akil, C. E. Martin, L. B. Bigelow, M. M. Herman et al., Reliability of psychiatric diagnosis in postmortem research, Biological Psychiatry, vol.57, pp.96-101, 2005.

P. Dragicevic, Fair Statistical Communication in HCI, pp.291-330, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01377894

B. Efron, Bootstrap Methods: Another Look at the Jackknife, The Annals of Statistics, vol.7, pp.1-26, 1979.

D. Ellerman, History of the Logical Entropy Formula, Online, 2010.

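A. R. Feinstein and D. V. Cicchetti, High agreement but low kappa: I. The problems of two paradoxes, Journal of Clinical Epidemiology, vol.43, pp.543-549, 1990.

L. Findlater, B. Lee, and J. O. Wobbrock, Beyond QWERTY: Augmenting Touch Screen Keyboards with Multitouch Gestures for Non-alphanumeric Input, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '12), pp.2679-2682, 2012.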

R. A. Fisher, Statistical methods for research workers, 1954.

J. L. Fleiss, Measuring nominal scale agreement among many raters, Psychological Bulletin, vol.76, pp.378-382, 1971.

A. Garrett and K. Johnson, Phonetic bias in sound change, Origins of sound change: Approaches to phonologization, pp.51-97, 2012.

B. Gleeson, K. Maclean, A. Haddadi, E. Croft, and J. Alcazar, Gestures for Industry: Intuitive Human-robot Communication from Human Observation, Proceedings of the 8th ACM/IEEE International Conference on Human-robot Interaction (HRI '13), pp.349-356, 2013.

M. D. Good, J. A. Whiteside, D. R. Wixon, and S. J. Jones, Building a User-derived Interface, Commun. ACM, vol.27, pp.1032-1043, 1984.

D. Grijincu, M. A. Nacenta, and P. O. Kristensson, User-defined interface gestures: dataset and analysis, Proceedings of the Ninth ACM International Conference on Interactive Tabletops and Surfaces, pp.25-34, 2014.

K. L. Gwet, Variance Estimation of Nominal-Scale Inter-Rater Reliability with Random Selection of Raters, Psychometrika, vol.73, p.407, 2008.

K. L. Gwet, Handbook of Inter-Rater Reliability, 4th Edition: The Definitive Guide to Measuring The Extent of Agreement Among Raters, Advanced Analytics, 2014.

J. Hailpern, K. Karahalios, J. Halle, L. Dethorne, and M. Coletto, A3: HCI coding guideline for research using video annotation to assess behavior of nonverbal subjects with computer-based intervention, ACM Transactions on Accessible Computing (TACCESS), vol.2, 2009.

A. F. Hayes and K. Krippendorff, Answering the call for a standard reliability measure for coding data, Communication Methods and Measures, vol.1, pp.77-89, 2007.

T. Tsandilas, Fallacies of Agreement: A Critical Review of Consensus Assessment Methods for Gesture Elicitation, ACM Transactions on Computer-Human Interaction, vol.25, issue.3, Article 18, 2018.

T. Hesterberg, D. Moore, S. Monaghan, A. Clipson, and R. Epstein, Bootstrap methods and permutation tests. In Introduction to the Practice of Statistics, 2005.

K. Hornbaek, S. S. Sander, J. A. Bargas-avila, and J. G. Simonsen, Is Once Enough?: On the Extent and Content of Replications in Human-computer Interaction, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '14), pp.3523-3532, 2014.

T. Hothorn, K. Hornik, M. A. van de Wiel, and A. Zeileis, Implementing a Class of Permutation Tests: The coin Package, Journal of Statistical Software, vol.28, 2008.

D. Jurafsky, E. Shriberg, and D. Biasca, Switchboard SWBD-DAMSL shallow-discourse-function annotation coders manual, 1997.

M. Kaptein and J. Robertson, Rethinking Statistical Analysis Methods for CHI, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '12), pp.1105-1114, 2012.

M. Kay, S. Haroz, S. Guha, and P. Dragicevic, Special Interest Group on Transparent Statistics in HCI, Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA '16), pp.1081-1084, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01405018

V. Kostakos, The big hole in HCI research, Interactions, vol.22, pp.48-51, 2015.

K. Krippendorff, Reliability in Content Analysis: Some Common Misconceptions and Recommendations, Human Communication Research, vol.30, pp.411-433, 2004.

K. Krippendorff, Agreement and Information in the Reliability of Coding, Communication Methods and Measures, vol.5, pp.93-112, 2011.

K. Krippendorff, Content analysis: An introduction to its methodology, 2013.

B. Lahey, A. Girouard, W. Burleson, and R. Vertegaal, PaperPhone: Understanding the Use of Bend Gestures in Mobile Devices with Flexible Electronic Paper Displays, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '11), pp.1303-1312, 2011.

S. Lee, S. Kim, B. Jin, E. Choi, B. Kim et al., How Users Manipulate Deformable Displays As Input Devices, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '10), pp.1647-1656, 2010.

K. M. MacQueen, E. McLellan, K. Kay, and B. Milstein, Codebook development for team-based qualitative analysis, Cultural Anthropology Methods, vol.10, pp.31-36, 1998.

B. Mandelbrot, Information Theory and Psycholinguistics: A Theory of Word Frequencies, 1967.

E. M. Markman, The whole-object, taxonomic, and mutual exclusivity assumptions as initial constraints on word meanings, 1991.

M. Micire, M. Desai, A. Courtemanche, K. M. Tsui, and H. A. Yanco, Analysis of natural gestures for controlling robot teams on multi-touch tabletop surfaces, Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces, pp.41-48, 2009.

M. R. Morris, Web on the Wall: Insights from a Multimodal Interaction Elicitation Study, Proceedings of the 2012 ACM International Conference on Interactive Tabletops and Surfaces (ITS '12), pp.95-104, 2012.

M. R. Morris, A. Danielescu, S. Drucker, D. Fisher, B. Lee et al., Reducing Legacy Bias in Gesture Elicitation Studies, Interactions, vol.21, pp.40-45, 2014.

M. Morris, J. O. Wobbrock, and A. D. Wilson, Understanding Users' Preferences for Surface Gestures, Proceedings of Graphics Interface 2010 (GI '10), pp.261-268, 2010.

M. E. J. Newman, Power laws, Pareto distributions and Zipf's law, Contemporary Physics, vol.46, pp.323-351, 2005.

M. Nielsen, M. Störring, T. B. Moeslund, and E. Granum, A Procedure for Developing Intuitive and Ergonomic Gesture Interfaces for HCI, pp.409-420, 2004.

D. L. O'Connell and A. J. Dobson, General Observer-Agreement Measures on Individual Subjects and Groups of Subjects, Biometrics, vol.40, pp.973-983, 1984.

U. Oh and L. Findlater, The Challenges and Potential of End-user Gesture Customization, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13), pp.1129-1138, 2013.

K. J. O'Malley, K. F. Cook, M. D. Price, K. R. Wildes, J. F. Hurdle et al., Measuring Diagnoses: ICD Code Accuracy, Health Services Research, vol.40, pp.1620-1639, 2005.

S. T. Piantadosi, Zipf's word frequency law in natural language: A critical review and future directions, Psychonomic Bulletin & Review, vol.21, pp.1112-1130, 2014.

T. Piumsomboon, A. Clark, M. Billinghurst, and A. Cockburn, User-Defined Gestures for Augmented Reality, INTERACT 2013: 14th IFIP TC13 Conference on Human-Computer Interaction, pp.282-299, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01501749

K. L. Posner, P. D. Sampson, R. A. Caplan, R. J. Ward, and F. W. Cheney, Measuring interrater reliability among multiple raters: an example of methods for nominal data, Statistics in Medicine, vol.11, pp.1103-1118, 1990.

M. H. Quenouille, Problems in Plane Sampling, The Annals of Mathematical Statistics, vol.20, issue.3, pp.355-375, 1949.

J. Rico and S. Brewster, Usable Gestures for Mobile Interfaces: Evaluating Social Acceptability, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '10), pp.887-896, 2010.

J. Ruiz and D. Vogel, Soft-Constraints to Reduce Legacy and Performance Bias to Elicit Whole-body Gestures with Low Arm Fatigue, Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15), pp.3347-3350, 2015.

W. A. Scott, Reliability of content analysis: The case of nominal scale coding, Public Opinion Quarterly, 1955.

E. H. Simpson, Measurement of Diversity, Nature, vol.163, 1949.

R. L. Spitzer and J. L. Fleiss, A Re-analysis of the Reliability of Psychiatric Diagnosis, The British Journal of Psychiatry, vol.125, pp.341-347, 1974.

G. M. Troiano, E. W. Pedersen, and K. Hornbæk, User-defined Gestures for Elastic, Deformable Displays, Proceedings of the 2014 International Working Conference on Advanced Visual Interfaces (AVI '14), 2014.

T. Tsandilas and P. Dragicevic, Accounting for Chance Agreement in Gesture Elicitation Studies, vol.5, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01267288

J. S. Uebersax, A design-independent method for measuring the reliability of psychiatric diagnosis, Journal of Psychiatric Research, vol.17, pp.335-342, 1982.

J. S. Uebersax, Statistical Methods for Diagnostic Agreement, pp.2017-2025, 2015.

S. Vanbelle and A. Albert, Agreement between Two Independent Groups of Raters, Psychometrika, vol.74, pp.477-491, 2009.

R.-D. Vatavu and J. O. Wobbrock, Formalizing Agreement Analysis for Elicitation Studies: New Measures, Significance Test, and Toolkit, Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15), pp.1325-1334, 2015.

R.-D. Vatavu and J. O. Wobbrock, Between-Subjects Elicitation Studies: Formalization and Tool Support, Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI '16), pp.3390-3402, 2016.

J. Wagner, S. Huot, and W. Mackay, BiTouch and BiPad: Designing Bimanual Interaction for Hand-held Tablets, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '12), pp.2317-2326, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00663972

M. Weigel, V. Mehta, and J. Steimle, More Than Touch: Understanding How People Use Skin As an Input Surface for Mobile Computing, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '14), pp.179-188, 2014.

M. Wilson, W. Mackay, E. Chi, M. Bernstein, and J. Nichols, RepliCHI SIG: From a Panel to a New Submission Venue for Replication, CHI '12 Extended Abstracts on Human Factors in Computing Systems (CHI EA '12), pp.1185-1188, 2012.

J. O. Wobbrock, H. H. Aung, B. Rothrock, and B. A. Myers, Maximizing the Guessability of Symbolic Input, CHI '05 Extended Abstracts on Human Factors in Computing Systems (CHI EA '05), pp.1869-1872, 2005.

J. O. Wobbrock, M. R. Morris, and A. D. Wilson, User-defined Gestures for Surface Computing, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '09), pp.1083-1092, 2009.

M. Wood, Bootstrapped confidence intervals as an approach to statistical inference, Organizational Research Methods, vol.8, pp.454-470, 2005.

G. K. Zipf, Human Behaviour and the Principle of Least Effort, 1949.