Tree-based Cost-Sensitive Methods for Fraud Detection in Imbalanced Data

Abstract: Bank fraud detection is a difficult classification problem in which frauds are far less numerous than genuine transactions. In this paper, we present cost-sensitive tree-based learning strategies applied to this setting of highly imbalanced data. We first propose a cost-sensitive splitting criterion for decision trees that takes into account the cost of each transaction, and we extend it with a decision rule for classification with tree ensembles. We then propose a new cost-sensitive loss for gradient boosting. Both methods have been shown to be particularly relevant in the context of imbalanced data. Experiments on a proprietary dataset of bank fraud detection in retail transactions show that our cost-sensitive algorithms increase the retailer's benefits by 1.43% compared to non-cost-sensitive ones, and that the gradient boosting approach outperforms all its competitors.
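
To make the idea of accounting for the cost of each transaction concrete, the sketch below shows a generic expected-cost decision rule. It is not the splitting criterion or the boosting loss proposed in the paper, whose definitions are not given in this record. The function name `expected_cost_decision`, the `review_cost` parameter, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def expected_cost_decision(p_fraud, amount, review_cost=1.0):
    """Flag a transaction when the expected loss of accepting it
    (fraud probability times amount) exceeds a fixed cost of blocking it.

    Generic cost-sensitive rule used for illustration only; the cost
    model and the value of review_cost are assumptions.
    """
    p_fraud = np.asarray(p_fraud, dtype=float)
    amount = np.asarray(amount, dtype=float)
    expected_loss_if_accepted = p_fraud * amount
    return expected_loss_if_accepted > review_cost

# Hypothetical fraud probabilities from any classifier, with transaction amounts.
p = [0.02, 0.02, 0.60]
amounts = [10.0, 500.0, 1.0]
flags = expected_cost_decision(p, amounts, review_cost=1.0)
print(flags.tolist())
```

With these made-up numbers, the low-probability but high-amount transaction is flagged while the high-probability but tiny one is not, which is the kind of behavior a plain 0.5 probability threshold cannot express.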

https://hal.archives-ouvertes.fr/hal-01895967
Contributor: Guillaume Metzler
Submitted on: Friday, November 9, 2018 - 9:54:17 AM
Last modification on: Friday, September 13, 2019 - 9:49:21 AM
Long-term archiving on: Sunday, February 10, 2019 - 1:15:05 PM

File

CSTree.pdf
Files produced by the author(s)

Citation

Guillaume Metzler, Xavier Badiche, Brahim Belkasmi, Elisa Fromont, Amaury Habrard, et al. Tree-based Cost-Sensitive Methods for Fraud Detection in Imbalanced Data. IDA 2018 - 17th International Symposium on Intelligent Data Analysis, Oct 2018, 's-Hertogenbosch, Netherlands. pp. 213-224, ⟨10.1007/978-3-030-01768-2_18⟩. ⟨hal-01895967⟩
