L. Bahl, P. Brown, P. de Souza, and R. Mercer, A tree-based statistical language model for natural language speech recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.37, issue.7, pp.1001-1008, 1989.
DOI : 10.1109/29.32278

R. Casey and E. Lecolinet, A survey of methods and strategies in character segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.18, issue.7, pp.690-706, 1996.
DOI : 10.1109/34.506792

C. Chang and C. Lin, LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and Technology, vol.2, issue.3, 2011.
DOI : 10.1145/1961189.1961199

H. Chang, S. Sull, and S. Lee, Efficient video indexing scheme for content-based retrieval, IEEE Transactions on Circuits and Systems for Video Technology, vol.9, issue.8, pp.1269-1279, 1999.
DOI : 10.1109/76.809161

Y. Chang, W. Zeng, I. Kamel, and R. Alonso, Integrated image and speech analysis for content-based video indexing, IEEE International Conference on Multimedia Computing and Systems, pp.306-313, 2002.

D. Chen, J. Odobez, and H. Bourlard, Text detection and recognition in images and video frames, Pattern Recognition, vol.37, issue.3, pp.595-608, 2004.
DOI : 10.1016/j.patcog.2003.06.001

T. Chen, D. Ghosh, and S. Ranganath, Video-text extraction and recognition, IEEE Region 10 Conference, pp.319-322, 2005.

M. Delakis and C. Garcia, Text detection with convolutional neural networks, International Conference on Computer Vision Theory and Applications, pp.290-294, 2008.

C. Dorai, H. Aradhye, and J. Shim, End-to-end video text recognition for multimedia content analysis, IEEE International Conference on Multimedia and Expo, pp.601-604, 2001.
DOI : 10.1109/icme.2001.1237761

C. Garcia and M. Delakis, Convolutional face finder: a neural architecture for fast and robust face detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.26, issue.11, pp.1408-1423, 2004.
DOI : 10.1109/TPAMI.2004.97

X. Hua, P. Yin, and H. Zhang, Efficient video text recognition using multiple frame integration, International Conference on Image Processing, pp.397-400, 2002.

S. Kopf, T. Haenselmann, and W. Effelsberg, Robust character recognition in low-resolution images and videos, 2005.

Y. LeCun and Y. Bengio, Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks, pp.255-258, 1995.

S. Lee, D. Lee, and H. Park, A new methodology for gray-scale character segmentation and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.18, issue.10, pp.1045-1050, 1996.

R. Lienhart and F. Stuber, Automatic text recognition in digital videos, Image and Video Processing, pp.2666-2675, 1996.
DOI : 10.1117/12.234741
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.7592

F. Manerba, J. Benois-Pineau, R. Leonardi, and B. Mansencal, Multiple Moving Object Detection for Fast Video Content Description in Compressed Domain, EURASIP Journal on Advances in Signal Processing, vol.2008, issue.1, 2008.
DOI : 10.1016/j.patrec.2006.12.009
URL : https://hal.archives-ouvertes.fr/hal-00308053

Z. Saidane and C. Garcia, Automatic scene text recognition using a convolutional neural network, International Workshop on Camera-Based Document Analysis and Recognition, pp.100-106, 2007.

P. Y. Simard, D. Steinkraus, and J. C. Platt, Best practices for convolutional neural networks applied to visual document analysis, Seventh International Conference on Document Analysis and Recognition, pp.958-963, 2003.
DOI : 10.1109/ICDAR.2003.1227801

C. Snoek and M. Worring, Multimedia event-based video indexing using time intervals, IEEE Transactions on Multimedia, vol.7, issue.4, pp.638-647, 2005.
DOI : 10.1109/TMM.2005.850966
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.196.5607

T. Som, D. Can, and M. Saraclar, HMM-based sliding video text recognition for Turkish broadcast news, 2009 24th International Symposium on Computer and Information Sciences, pp.475-479, 2009.
DOI : 10.1109/ISCIS.2009.5291877

A. Stolcke, SRILM - an extensible language modeling toolkit, International Conference on Spoken Language Processing, pp.901-904, 2002.

R. Yager, Connectives and quantifiers in fuzzy sets, Fuzzy Sets and Systems, pp.39-75, 1991.

J. Yi, Y. Peng, and J. Xiao, Using Multiple Frame Integration for the Text Recognition of Video, 2009 10th International Conference on Document Analysis and Recognition, pp.71-75, 2009.
DOI : 10.1109/ICDAR.2009.58

D. Zhang and S. Chang, A Bayesian framework for fusing multiple word knowledge models in videotext recognition, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.528-533, 2003.