G. Amati and C. J. van Rijsbergen, Probabilistic models of information retrieval based on measuring the divergence from randomness, ACM Transactions on Information Systems, 2002.

K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, When Is "Nearest Neighbor" Meaningful?, Proceedings of the Int. Conf. on Database Theory, pp.217-235, 1999.
DOI : 10.1007/3-540-49257-7_15
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.1422

H. Christensen, B. Kolluru, Y. Gotoh, and S. Renals, Maximum entropy segmentation of broadcast news, Proceedings of the 30th IEEE ICASSP, 2005.

V. Claveau and S. Lefèvre, Topic Segmentation of TV-streams by mathematical morphology and vectorization, Proceedings of InterSpeech, pp.1105-1108, 2011.

V. Claveau, R. Tavenard, and L. Amsaleg, Vectorisation des processus d'appariement document-requête, 7e conférence en recherche d'informations et applications, CORIA'10, pp.313-324, 2010.

M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, Proc. of the 20th ACM Symposium on Computational Geometry, 2004.

A. R. Ebadat, V. Claveau, and P. Sébillot, Using shallow linguistic features for relation extraction in bio-medical texts, Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles, pp.125-130, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00644070

C. Grouin and D. Forest, Actes de l'atelier Défi Fouille de Textes (DeFT'11), 2011.

C. Guinaudeau, G. Gravier, and P. Sébillot, Improving ASR-based topic segmentation of TV programs with confidence measures and semantic relations, Proc. Annual Intl. Speech Communication Association Conference (Interspeech), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00555804

S. Huet, G. Gravier, and P. Sébillot, Morpho-syntactic post-processing with n-best lists for improved French automatic speech recognition, Computer Speech and Language, vol.24, issue.4, pp.663-684, 2010.

M.-Y. Kan, J. L. Klavans, and K. R. McKeown, Linear segmentation and segment significance, Proceedings of the 6th International Workshop of Very Large Corpora, 1998.

H. Lejsek, F. Asmundsson, B. Jónsson, and L. Amsaleg, NV-tree: An efficient disk-based index for approximate search in very large high-dimensional collections, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.99, issue.1, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00794359

F. Moreau and P. Sébillot, Contributions des techniques du traitement automatique des langues à la recherche d'information, 2005.

L. Pevzner and M. Hearst, A critique and improvement of an evaluation metric for text segmentation, 2002.

J. M. Ponte and W. B. Croft, A language modeling approach to information retrieval, Proc. of SIGIR, 1998.

G. Salton, A. Wong, and C. S. Yang, A vector space model for automatic indexing, Comm. of the ACM, vol.18, issue.11, 1975.

L. Sitbon and P. Bellot, Évaluation de méthodes de segmentation thématique linéaire non supervisées après adaptation au français, Actes de la conférence Traitement automatique des langues, 2004.

K. Spärck Jones, A statistical interpretation of term specificity and its application in retrieval, Journal of Documentation, vol.28, issue.1, 1972.

K. Spärck Jones, S. G. Walker, and S. E. Robertson, Probabilistic model of information retrieval: Development and comparative experiments, Information Processing and Management, vol.36, issue.6, 2000.

B. Stein, Principles of hash-based text retrieval, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '07, 2007.
DOI : 10.1145/1277741.1277832

M. Utiyama and H. Isahara, A statistical model for domain-independent text segmentation, Proceedings of the 9th conference of the ACL, 2001.