G. Bailly, T. Pietrzak, J. Deber, and D. J. Wigdor, Métamorphe: Augmenting hotkey usage with actuated keys, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '13, pp.563-572, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00822359

R. L. Brennan and D. J. Prediger, Coefficient Kappa: Some Uses, Misuses, and Alternatives, Educational and Psychological Measurement, vol.41, issue.3, pp.687-699, 1981.
DOI : 10.1177/001316448104100307

J. Cohen, A Coefficient of Agreement for Nominal Scales, Educational and Psychological Measurement, vol.20, issue.1, pp.37-46, 1960.
DOI : 10.1177/001316446002000104

G. Cumming and S. Finch, Inference by Eye: Confidence Intervals and How to Read Pictures of Data, American Psychologist, vol.60, issue.2, pp.170-180, 2005.
DOI : 10.1037/0003-066X.60.2.170

P. Dragicevic, HCI Statistics without p-values, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01162238

D. Ellerman, History of the logical entropy formula. Online, 2010.

J. Fleiss, Measuring nominal scale agreement among many raters, Psychological Bulletin, vol.76, issue.5, pp.378-382, 1971.
DOI : 10.1037/h0031619

D. Grijincu, M. A. Nacenta, and P. Kristensson, User-defined Interface Gestures, Proceedings of the Ninth ACM International Conference on Interactive Tabletops and Surfaces, ITS '14, pp.25-34, 2014.
DOI : 10.1145/2669485.2669511

K. Gwet, Handbook of Inter-Rater Reliability, 4th Edition: The Definitive Guide to Measuring the Extent of Agreement Among Raters, 2014.

J. Hailpern, K. Karahalios, J. Halle, L. Dethorne, and M. Coletto, A3, ACM Transactions on Accessible Computing, vol.2, issue.2, 2009.
DOI : 10.1145/1530064.1530066

A. F. Hayes and K. Krippendorff, Answering the Call for a Standard Reliability Measure for Coding Data, Communication Methods and Measures, vol.1, issue.1, pp.77-89, 2007.
DOI : 10.1080/19312450709336664

K. Krippendorff, Reliability in Content Analysis, Human Communication Research, vol.30, issue.3, pp.411-433, 2004.
DOI : 10.1111/j.1468-2958.2004.tb00738.x

K. Krippendorff, Content analysis: An introduction to its methodology, 2013.

S. Lieberson, Measuring Population Diversity, American Sociological Review, vol.34, issue.6, pp.850-862, 1969.
DOI : 10.2307/2095977

M. R. Morris, Web on the wall, Proceedings of the 2012 ACM international conference on Interactive tabletops and surfaces, ITS '12, pp.95-104, 2012.
DOI : 10.1145/2396636.2396651

D. L. O'Connell and A. J. Dobson, General Observer-Agreement Measures on Individual Subjects and Groups of Subjects, Biometrics, vol.40, issue.4, pp.973-983, 1984.
DOI : 10.2307/2531148

W. A. Scott, Reliability of Content Analysis: The Case of Nominal Scale Coding, Public Opinion Quarterly, vol.19, issue.3, pp.321-325, 1955.
DOI : 10.1086/266577

R. Vatavu and J. Wobbrock, Formalizing Agreement Analysis for Elicitation Studies, Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI '15, pp.1325-1334, 2015.
DOI : 10.1145/2702123.2702223

J. Wagner, S. Huot, and W. Mackay, BiTouch and BiPad, Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems, CHI '12, pp.2317-2326, 2012.
DOI : 10.1145/2207676.2208391
URL : https://hal.archives-ouvertes.fr/hal-00663972

J. O. Wobbrock, H. H. Aung, B. Rothrock, and B. A. Myers, Maximizing the guessability of symbolic input, CHI '05 extended abstracts on Human factors in computing systems, CHI '05, pp.1869-1872, 2005.
DOI : 10.1145/1056808.1057043

J. O. Wobbrock, M. R. Morris, and A. D. Wilson, User-defined gestures for surface computing, Proceedings of the 27th international conference on Human factors in computing systems, CHI '09, pp.1083-1092, 2009.
DOI : 10.1145/1518701.1518866
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.4192

M. Wood, Bootstrapped Confidence Intervals as an Approach to Statistical Inference, Organizational Research Methods, vol.8, issue.4, pp.454-470, 2005.
DOI : 10.1177/1094428105280059