D. Chakrabarti, R. Kumar, and K. Punera, A graph-theoretic approach to webpage segmentation, Proceedings of the 17th International Conference on World Wide Web, WWW '08, pp. 377-386, 2008.

S. Das, P. Vijayaraghavan, and M. Mathew, Eliminating noisy information in web pages using featured DOM tree, International Journal of Applied Information Systems, vol. 2, no. 2, pp. 27-34, 2012.

I. Endrédy and A. Novák, More effective boilerplate removal - the GoldMiner algorithm, Polibits, vol. 48, pp. 79-83, 2013.

A. Ferraresi, E. Zanchetta, M. Baroni, and S. Bernardini, Introducing and evaluating ukWaC, a very large web-derived corpus of English, Proceedings of the 4th Web as Corpus Workshop, 2008.

C. Kohlschütter, P. Fankhauser, and W. Nejdl, Boilerplate detection using shallow text features, Proceedings of the Third ACM International Conference on Web Search and Data Mining, WSDM '10, pp. 441-450, 2010.

J. Pasternack and D. Roth, Extracting article text from the web with maximum subsequence segmentation, Proceedings of the 18th International Conference on World Wide Web, WWW '09, pp. 971-980, 2009.

J. Ratcliff and D. Metzener, Pattern matching: the gestalt approach, Dr. Dobb's Journal, vol. 13, no. 47, pp. 59-51, 1988.

M. Spousta, M. Marek, and P. Pecina, Victor: the Web-Page Cleaning Tool, Proceedings of the 4th Web as Corpus Workshop, 2008.

K. Vieira, A. S. da Silva, N. Pinto, E. S. de Moura, J. M. B. Cavalcanti, and J. Freire, A fast and robust method for web page template detection and removal, Proceedings of the 15th ACM International Conference on Information and Knowledge Management, CIKM '06, pp. 258-267, 2006.