Journal articles

Understanding and customizing stopword lists for enhanced patent mapping

Abstract: While the use of patent mapping tools is growing, the 'black-box' systems involved generally do not let the user intervene beyond the preliminary retrieval of documents, with one exception: the stopword list, i.e. the list of 'noise' words to be ignored, which can be modified at will and dramatically affects the final output and analysis. This paper draws on information science and computer science to clarify the origin and purpose of stopword lists and how they fit into the mapping algorithm. It further stresses the need for stopword lists that depend on the document corpus analyzed. The analyst is thus invited to add and remove stopwords or even, to avoid inherent biases, to use algorithms that automatically create ad hoc stopword lists.
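To make the idea of a corpus-dependent stopword list concrete, here is a minimal Python sketch. It assumes one simple heuristic, document frequency: a term that occurs in at least a given fraction of the documents is treated as noise for that corpus. This is an illustration only and not necessarily the approach described in the paper; the function names, the `min_doc_fraction` parameter and the sample texts are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def corpus_stopwords(documents, min_doc_fraction=0.8):
    """Return the terms that occur in at least `min_doc_fraction` of the documents.

    Assumption: document frequency is used as the noise criterion; other
    criteria (e.g. entropy or tf-idf based) are equally possible.
    """
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document, however often it repeats.
        doc_freq.update(set(tokenize(doc)))
    threshold = min_doc_fraction * len(documents)
    return {term for term, df in doc_freq.items() if df >= threshold}


def remove_stopwords(text, stopwords):
    """Tokenize the text and drop every token found in the stopword set."""
    return [token for token in tokenize(text) if token not in stopwords]


# Hypothetical mini-corpus standing in for patent titles.
docs = [
    "A method for coating a substrate with a polymer film",
    "A device for measuring the thickness of a polymer film",
    "A method for welding a metal substrate",
    "A device for cooling a metal housing",
]

stopwords = corpus_stopwords(docs, min_doc_fraction=0.75)
filtered = remove_stopwords(docs[2], stopwords)
```

With this sample corpus and threshold, only "a" and "for" reach the cutoff, so domain terms such as "polymer" or "substrate" survive. Lowering `min_doc_fraction` makes the list more aggressive, which is the kind of trade-off the analyst would need to check against the resulting map.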

https://hal.archives-ouvertes.fr/hal-01247971
Contributor: Antoine Blanchard
Submitted on: Wednesday, December 23, 2015 - 11:19:28 AM
Last modification on: Monday, May 31, 2021 - 1:17:33 PM
Long-term archiving on: Thursday, March 24, 2016 - 12:10:57 PM

File

StopwordList_preprint.pdf
Files produced by the author(s)

Citation

Antoine Blanchard. Understanding and customizing stopword lists for enhanced patent mapping. World Patent Information, Elsevier, 2007, 29 (4), pp.308. ⟨10.1016/j.wpi.2007.02.002⟩. ⟨hal-01247971⟩
