A graph-theoretic approach to webpage segmentation, Proceedings of the 17th international conference on World Wide Web, WWW '08, pp.377-386, 2008.
Eliminating noisy information in web pages using featured DOM tree, International Journal of Applied Information Systems, vol.2, issue.2, pp.27-34, 2012.
More effective boilerplate removal - the GoldMiner algorithm, Polibits, vol.48, pp.79-83, 2013.
Introducing and evaluating ukWaC, a very large web-derived corpus of English, Proceedings of the 4th Web as Corpus Workshop, 2008.
Boilerplate detection using shallow text features, Proceedings of the third ACM international conference on Web search and data mining, WSDM '10, pp.441-450, 2010.
Extracting article text from the web with maximum subsequence segmentation, WWW, pp.971-980, 2009.
Pattern matching: The gestalt approach, Dr. Dobb's Journal, vol.13, issue.47, pp.59-51, 1988.
Victor: the Web-Page Cleaning Tool, Proceedings of the 4th Web as Corpus Workshop, 2008.
A fast and robust method for web page template detection and removal, ACM international conference on Information and knowledge management, CIKM '06, pp.258-267, 2006.