J. Attenberg, S. Pandey, and T. Suel, Modeling and predicting user behavior in sponsored search, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '09, pp.1067-1076, 2009.
DOI : 10.1145/1557019.1557135

V. Beaudouin and J. Denis, Observer et évaluer les usages de Gallica : réflexion épistémologique et stratégique, 2014.

D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research, vol.3, pp.993-1022, 2003.

P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, Enriching word vectors with subword information, arXiv preprint, 2016.

K. P. Burnham and D. R. Anderson, Multimodel inference: understanding AIC and BIC in model selection, Sociological Methods & Research, pp.261-304, 2004.

D. Ceccarelli, S. Gordea, C. Lucchese, F. M. Nardini, and G. Tolomei, Improving Europeana Search Experience Using Query Logs, International Conference on Theory and Practice of Digital Libraries, pp.384-395, 2011.

E. Charniak, Introduction to Artificial Intelligence, Pearson Education India, 1985.

R. Cooley, B. Mobasher, and J. Srivastava, Data Preparation for Mining World Wide Web Browsing Patterns, Knowledge and Information Systems, vol.1, issue.1, pp.5-32, 1999.

R. Das and I. Turkoglu, Creating meaningful data from web logs for improving the impressiveness of a website by using path analysis method, Expert Systems with Applications, vol.36, issue.3, pp.6635-6644, 2009.
DOI : 10.1016/j.eswa.2008.08.067

R. F. Dell, P. E. Román, J. D. Velásquez, et al., Web user session reconstruction using integer programming, Proceedings of the ACM International Conference on Web Intelligence and Intelligent Agent Technology, pp.385-388, 2008.

G. N. Demir, M. Goksedef, and A. Ş. Etaner-Uyar, Effects of session representation models on the performance of web recommender systems, Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering Workshop, ICDEW '07, pp.931-936, 2007.

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), pp.1-38, 1977.

E. Ferrara, P. De Meo, G. Fiumara, and R. Baumgartner, Web data extraction, applications and techniques: a survey, Knowledge-Based Systems, pp.301-323, 2014.
DOI : 10.1016/j.knosys.2014.07.007
URL : http://arxiv.org/pdf/1207.0246

G. Conseil, Evaluation de l'usage et de la satisfaction de la bibliothèque numérique Gallica et perspectives d'évolution, 2012.

G. Godet, Guide d'interopérabilité OAI-PMH pour un référencement des documents numériques dans Gallica, 2015.

Y. Goldberg and O. Levy, word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method, arXiv preprint arXiv:1402, 2014.

Google, How a session is defined in Analytics - Analytics Help, 2016.

Ş. Gündüz and M. Özsu, A web page prediction model based on click-stream tree representation of user behavior, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp.535-540, 2003.

G. Heinrich, Parameter estimation for text analysis, 2004.

Incapsula, 2003.

A. K. Jain, M. N. Murty, and P. J. Flynn, Data clustering: a review, ACM Computing Surveys (CSUR), vol.31, issue.3, pp.264-323, 1999.

D. Kelly and J. Teevan, Implicit feedback for inferring user preference, ACM SIGIR Forum, vol.37, issue.2, pp.18-28, 2003.
DOI : 10.1145/959258.959260

J. A. Kunze, Towards electronic persistence using ARK identifiers, ARK motivation and overview, 2003.

Q. V. Le and T. Mikolov, Distributed representations of sentences and documents, ICML, pp.1188-1196, 2014.

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, pp.3111-3119, 2013.

T. Minka and J. Lafferty, Expectation-propagation for the generative aspect model, Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence, pp.352-359, 2002.

B. Mobasher, H. Dai, T. Luo, and M. Nakagawa, Effective personalization based on association rule discovery from web usage data, Proceedings of the third international workshop on Web information and data management, WIDM '01, pp.9-15, 2001.
DOI : 10.1145/502932.502935

B. Mobasher, H. Dai, T. Luo, and M. Nakagawa, Discovery and evaluation of aggregate usage profiles for web personalization, Data Mining and Knowledge Discovery, vol.6, issue.1, pp.61-82, 2002.
DOI : 10.1023/A:1013232803866

E. L. Morgan, An Introduction to the Search/Retrieve URL Service (SRU), 2004.

F. Qiu and Y. Cui, An analysis of user behavior in online video streaming, Proceedings of the International Workshop on Very-large-scale Multimedia Corpus, Mining and Retrieval, VLS-MCMR '10, pp.49-54, 2010.

L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, in Readings in Speech Recognition, pp.267-296, 1990.

N. Sadagopan and J. Li, Characterizing typical and atypical user sessions in clickstreams, Proceedings of the 17th international conference on World Wide Web, WWW '08, pp.885-894, 2008.
DOI : 10.1145/1367497.1367617

S. S. C. Silva, R. M. P. Silva, R. C. G. Pinto, and R. M. Salles, Botnets: a survey, Computer Networks, vol.57, issue.2, pp.378-403, 2013.

M. Steyvers and T. Griffiths, Probabilistic topic models, Handbook of Latent Semantic Analysis, pp.424-440.

P. Tan and V. Kumar, Discovery of Web Robot Sessions Based on Their Navigational Patterns, Data Mining and Knowledge Discovery, vol.6, issue.1, pp.9-35, 2002.
DOI : 10.1007/978-3-662-07952-2_9

The cluster shown in Figure C.1 contains the largest number of users. It is characterized by sessions that usually begin with document consultation and include short search phases. The following clusters (Figures C.2