A comparison of different off-centered entropies to deal with class imbalance for decision trees

Abstract : In data mining, large differences in prior class probabilities, known as the class imbalance problem, have been reported to hinder the performance of classifiers such as decision trees. Dealing with imbalanced and cost-sensitive data has been recognized as one of the 10 most challenging problems in data mining research. In decision tree learning, many measures are based on the concept of Shannon's entropy. A major characteristic of these entropies is that they take their maximal value when the distribution of the modalities of the class variable is uniform. To deal with the class imbalance problem, we proposed an off-centered entropy which takes its maximal value for a distribution fixed by the user. This distribution can be the a priori distribution of the class variable modalities, or a distribution that takes the costs of misclassification into account. Other authors have proposed an asymmetric entropy. In this paper we present the concepts behind the three entropies and compare their effectiveness on 20 imbalanced data sets. All our experiments are based on the C4.5 decision tree algorithm, in which only the entropy function is modified. The results are promising and show the value of off-centered entropies for dealing with the class imbalance problem.
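The abstract's key idea is that Shannon's entropy peaks at the uniform distribution, whereas an off-centered entropy peaks at a distribution chosen by the user. A minimal sketch of this idea for the two-class case is below, using a piecewise-linear remapping of the class frequency before applying Shannon's entropy; the exact remapping shown is an assumption of this sketch, not necessarily the formula used in the paper.

```python
import math

def shannon_entropy(p):
    """Binary Shannon entropy H(p), in bits; maximal (= 1) at p = 0.5."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def off_centered_entropy(p, theta):
    """Off-centered entropy: remap the class frequency p piecewise-linearly
    so that the maximum is reached at p = theta (the user-fixed distribution)
    instead of p = 0.5, then apply Shannon's entropy.

    This remapping is one natural construction and is an assumption of this
    sketch, not the authors' exact definition.
    """
    if p <= theta:
        pi = p / (2 * theta)                       # [0, theta] -> [0, 0.5]
    else:
        pi = (p + 1 - 2 * theta) / (2 * (1 - theta))  # [theta, 1] -> [0.5, 1]
    return shannon_entropy(pi)
```

For example, with a minority-class prior of 10% (`theta = 0.1`), `off_centered_entropy(0.1, 0.1)` attains the maximal value 1, while the plain Shannon entropy at p = 0.1 is far below its maximum; a splitting criterion built on the off-centered version therefore treats the a priori imbalanced distribution, rather than the 50/50 one, as the "most uncertain" state.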
Document type :
Conference papers

Contributor : Bibliothèque Télécom Bretagne
Submitted on : Monday, May 6, 2019 - 10:29:51 AM
Last modification on : Wednesday, May 15, 2019 - 7:36:35 AM




Philippe Lenca, Stéphane Lallich, Thanh Nghi Do, Nguyen-Khang Pham. A comparison of different off-centered entropies to deal with class imbalance for decision trees. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2008), May 20-23, 2008, Osaka, Japan. pp. 634-643, ⟨10.1007/978-3-540-68125-0_59⟩. ⟨hal-02120728⟩


