DyClee-C: a clustering algorithm for qualitative data based diagnosis
Abstract
Sensors are multiplying on machines, networks and living things. Reasoning about this huge amount of data and extracting knowledge from it is among today's challenges. In data-based diagnostic applications, large amounts of data are often available, but a key issue is that the data remain unlabelled because labelling would require too much time and imply prohibitive costs. The different situations, e.g. normal or faulty, must hence be learned from the data. Clustering methods, also referred to as unsupervised classification methods, can then be used to create groups of samples according to some similarity criterion. The different groups can then presumably be associated with different situations. Numerous algorithms have been developed in recent years for clustering numeric data, but these methods are not applicable to qualitative (categorical) data. However, in many application domains, qualitative features are key to properly describing the different situations. This paper presents DyClee-C, an extension of the numeric-feature-based DyClee algorithm to qualitative data. DyClee-C is applied to two data sets: a soybean data set, to diagnose soybean plant diseases, and a breast cancer data set, to assess the current diagnosis in terms of recurrence events and predict possible relapse.
Domains
Engineering Sciences [physics]
Origin: Files produced by the author(s)