Conference papers

Training Set Class Distribution Analysis for Deep Learning Model - Application to Cancer Detection

Abstract : Deep learning models, specifically convolutional neural networks (CNNs), have been used successfully in many tasks, including medical image classification. The effectiveness of a CNN depends on the availability of a large training data set, which is generally costly to obtain for new applications or new cases. However, there is little concrete guidance on how to create a training set. In this research, we analyze the impact of different class distributions in the training data on a CNN model. We consider the task of cancer detection from histopathological images for cancer diagnosis and derive some useful hypotheses about the distribution of classes in the training data. We found that using all the training data leads to the best recall-precision trade-off, whereas training with a reduced number of examples from some classes makes it possible to steer the model toward a desired accuracy on a given class.
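As an illustration of what "training with a reduced number of examples from some classes" can mean in practice, the following is a minimal sketch of building a training subset with a modified class distribution. It is not the authors' protocol: the function name `subsample_classes`, the class names `cancer` and `non_cancer`, the example counts, and the 25% keep fraction are all hypothetical choices made for this example.

```python
import random
from collections import Counter


def subsample_classes(labels, keep_fraction, seed=0):
    """Return sorted indices of a training subset with a modified class mix.

    labels: list of class labels, one per training example.
    keep_fraction: dict mapping a class label to the fraction (0..1) of its
        examples to keep; classes not listed are kept in full.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible

    # Group example indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Randomly keep the requested share of each class.
    kept = []
    for label, indices in by_class.items():
        fraction = keep_fraction.get(label, 1.0)
        n_keep = round(len(indices) * fraction)
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


# Hypothetical imbalanced label list (assumption, not data from the paper).
labels = ["cancer"] * 200 + ["non_cancer"] * 800

# Keep every "cancer" example but only a quarter of the "non_cancer" ones.
subset = subsample_classes(labels, {"non_cancer": 0.25})
print(Counter(labels[i] for i in subset))
```

The returned indices can be used to select the corresponding images before training, so that the same model can be trained on several class distributions and its per-class precision and recall compared.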

Cited literature : 24 references
Contributor : Open Archive Toulouse Archive Ouverte (oatao)
Submitted on : Tuesday, July 7, 2020 - 9:55:59 AM
Last modification on : Wednesday, June 9, 2021 - 10:00:35 AM
Long-term archiving on : Friday, November 27, 2020 - 12:22:02 PM




  • HAL Id : hal-02891748, version 1
  • OATAO : 26163


Ismat Ara Reshma, Margot Gaspard, Camille Franchet, Pierre Brousset, Emmanuel Faure, et al. Training Set Class Distribution Analysis for Deep Learning Model - Application to Cancer Detection. 1st International Conference on Advances in Signal Processing and Artificial Intelligence (ASPAI 2019), Mar 2019, Barcelona, Spain. pp.123-127. ⟨hal-02891748⟩


