A. Barbaresi, Crawling microblogging services to gather language-classified URLs. Workflow and case study, Proceedings of the 51st Annual Meeting of the ACL, Student Research Workshop, pp.9-15, 2013.
URL : https://hal.archives-ouvertes.fr/halshs-00840861

A. Barbaresi, Ad hoc and general-purpose corpus construction from web sources, 2015.
URL : https://hal.archives-ouvertes.fr/tel-01167309

A. Barbaresi, Collection, Description, and Visualization of the German Reddit Corpus, 2nd Workshop on Natural Language Processing for Computer-Mediated Communication, GSCL conference, pp.7-11, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01207311

Z. Cheng, J. Caverlee, and K. Lee, You are where you tweet, Proceedings of the 19th ACM international conference on Information and knowledge management, CIKM '10, pp.759-768, 2010.
DOI : 10.1145/1871437.1871535

J. Ebner, Duden: Österreichisches Deutsch, 2008.

B. Gonçalves, N. Perra, and A. Vespignani, Modeling Users' Activity on Twitter Networks: Validation of Dunbar's Number, PLoS ONE, vol.6, issue.8, p.e22656, 2011.
DOI : 10.1371/journal.pone.0022656

A. Jaech and M. Ostendorf, What Your Username Says About You, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015.
DOI : 10.18653/v1/D15-1240

B. Krishnamurthy, P. Gill, and M. Arlitt, A few chirps about twitter, Proceedings of the first workshop on Online social networks, WOSN '08, pp.19-24, 2008.
DOI : 10.1145/1397735.1397741

J. Kulshrestha, F. Kooti, A. Nikravesh, and K. P. Gummadi, Geographic Dissection of the Twitter Network, Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (ICWSM), pp.202-209, 2012.

S. Kumar, F. Morstatter, and H. Liu, Twitter Data Analytics, 2014.
DOI : 10.1007/978-1-4614-9372-3

K. Leetaru, S. Wang, G. Cao, A. Padmanabhan, and E. Shook, Mapping the global Twitter heartbeat: The geography of Twitter, First Monday, vol.18, issue.5, 2013.
DOI : 10.5210/fm.v18i5.4366

N. Ljubešić, D. Fišer, and T. Erjavec, TweetCaT: a Tool for Building Twitter Corpora of Smaller Languages, Proceedings of LREC, pp.2279-2283, 2014.

M. Lui and T. Baldwin, Accurate Language Identification of Twitter Messages, Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM), pp.17-25, 2014.
DOI : 10.3115/v1/W14-1303

R. McCreadie, I. Soboroff, J. Lin, C. Macdonald, I. Ounis et al., On building a reusable Twitter corpus, Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, SIGIR '12, pp.1113-1114, 2012.
DOI : 10.1145/2348283.2348495

F. Morstatter, J. Pfeffer, H. Liu, and K. M. Carley, Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose, Proceedings of ICWSM, 2013.

C. Olston and M. Najork, Web Crawling, Foundations and Trends® in Information Retrieval, vol.4, issue.3, pp.175-246, 2010.
DOI : 10.1561/1500000017

A. Ruiz Tinoco, Twitter como Corpus para Estudios de Geolingüística del Español, Sophia Linguistica: working papers in linguistics, pp.147-163, 2013.

T. Scheffler, J. Gontrum, M. Wegel, and S. Wendler, Mapping German Tweets to Geographic Regions, Workshop Proceedings of the 12th KONVENS conference, 2014.

T. Scheffler, A German Twitter Snapshot, Proceedings of LREC, pp.2284-2289, 2014.

B. Stone, Twitter Blog: Location, location, location. https://web.archive.org/web, 2009.

M. B. Zafar, P. Bhattacharya, N. Ganguly, K. P. Gummadi, and S. Ghosh, Sampling Content from Online Social Networks, ACM Transactions on the Web, vol.9, issue.3, p.12, 2015.
DOI : 10.1145/2743023