Comparison of two topological approaches for dealing with noisy labeling

Abstract : This paper focuses on detecting likely mislabeled instances in a learning dataset. Two solutions are considered, both based on the same framework of topological graphs. The first is a statistical approach based on the Cut Edges Weighted (CEW) statistic in the neighborhood graph. The second is a Relaxation Technique (RT) that optimizes a local criterion in the neighborhood graph. Evaluation with ROC curves shows good results: almost 90% of the mislabeled instances are retrieved at a cost of less than 20% false positives. Removing the samples detected as mislabeled by our approaches generally improves the performance of classical machine learning algorithms.
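To make the cut-edge idea concrete, here is a minimal sketch, not the authors' CEW test. It assumes a k-nearest-neighbor graph stands in for the neighborhood graph, an inverse-distance edge weight, and a fixed threshold in place of a statistical test. The names `knn_neighbors`, `cut_edge_ratio`, and `flag_suspects`, and the parameters `k` and `threshold`, are hypothetical and chosen only for illustration.

```python
import math

def knn_neighbors(points, k):
    """For each point, return its k nearest neighbors as (index, distance) pairs.
    Assumption: a k-NN graph is used here as a simple neighborhood graph."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append([(j, d) for d, j in dists[:k]])
    return neighbors

def cut_edge_ratio(points, labels, k=3):
    """Weighted share of each point's edges that are cut edges,
    i.e. edges joining it to a neighbor with a different label."""
    ratios = []
    for i, nbrs in enumerate(knn_neighbors(points, k)):
        # Assumption: closer neighbors get larger weights, w = 1 / (1 + d).
        weights = [1.0 / (1.0 + d) for _, d in nbrs]
        cut = sum(w for (j, _), w in zip(nbrs, weights) if labels[j] != labels[i])
        ratios.append(cut / sum(weights))
    return ratios

def flag_suspects(points, labels, k=3, threshold=0.5):
    """Indices whose weighted cut-edge ratio exceeds the (hypothetical) threshold."""
    return [i for i, r in enumerate(cut_edge_ratio(points, labels, k)) if r > threshold]

# Two well-separated clusters; index 3 sits in the first cluster but carries label "b".
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = ["a", "a", "a", "b", "b", "b", "b", "b"]
suspects = flag_suspects(points, labels)
```

On this toy data only index 3 is flagged, since all of its neighbors carry the other label. A statistical approach such as the one described in the abstract would compare the observed statistic with what is expected under a null hypothesis rather than use a fixed cutoff.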
Document type :
Journal articles

https://hal.archives-ouvertes.fr/hal-01524431
Contributor : Fabien Rico
Submitted on : Thursday, May 18, 2017 - 10:34:41 AM
Last modification on : Wednesday, October 31, 2018 - 12:24:08 PM


Citation

Fabien Rico, Fabrice Muhlenbach, Djamel Zighed, Stéphane Lallich. Comparison of two topological approaches for dealing with noisy labeling. Neurocomputing, Elsevier, 2015, 160, pp.3 - 17. ⟨10.1016/j.neucom.2014.10.087⟩. ⟨hal-01524431⟩
