Parallel and Distributed Processing for Unsupervised Patient Phenotype Representation

Abstract: The value of data-driven healthcare lies in the possibility of detecting new patterns for inpatient care, treatment, prevention, and understanding of disease, and of predicting the duration of hospitalization, its cost, or whether death is likely to occur during the hospital stay. Modeling a precise patient phenotype representation from clinical data is challenging because the high-dimensional, noisy, and incomplete data must be projected into a new low-dimensional space. Likewise, fitting unsupervised learning models to growing clinical data raises many issues of algorithmic complexity, such as time to model convergence and memory capacity. This paper presents the DiagnoseNET framework, which automates patient phenotype extraction and applies the extracted phenotypes to predict different medical targets. It provides three high-level features: a full-workflow orchestration into stage pipelining for mining clinical data and using unsupervised feature representations to initialize supervised models; a data resource management for training parallel and distributed deep neural networks. As a case study, we used a clinical dataset from admission and hospital services to build a general-purpose inpatient phenotype representation for use with different medical targets; the first target is to classify the main purpose of inpatient care. The research focuses on managing the data according to its dimensions, the model complexity, the number of workers selected, and the memory capacity, for training unsupervised stacked denoising auto-encoders on a Jetson TX2 mini-cluster. Mapping tasks so that they fit the available computational resources is therefore a key factor in minimizing the number of epochs needed for the model to converge, reducing execution time, and maximizing energy efficiency.
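The data-management idea in the abstract (sizing the work by data dimensions, number of workers, and memory capacity) can be illustrated with a minimal sketch. This is not the DiagnoseNET API: the function names `max_batch_size`, `shard_records`, and `batches` are hypothetical, and the memory estimate assumes a dense feature matrix and ignores the memory taken by the model itself.

```python
def max_batch_size(n_features, bytes_per_value, memory_budget_bytes):
    """Largest record count whose dense feature matrix fits the budget.

    Assumption: only the input batch is counted, not model parameters.
    """
    bytes_per_record = n_features * bytes_per_value
    return max(1, memory_budget_bytes // bytes_per_record)


def shard_records(n_records, n_workers):
    """Split record indices into contiguous, near-equal shards, one per worker."""
    base, extra = divmod(n_records, n_workers)
    shards, start = [], 0
    for worker in range(n_workers):
        # The first `extra` workers each take one additional record.
        size = base + (1 if worker < extra else 0)
        shards.append(range(start, start + size))
        start += size
    return shards


def batches(shard, batch_size):
    """Yield mini-batches (as index ranges) from one worker's shard."""
    for i in range(shard.start, shard.stop, batch_size):
        yield range(i, min(i + batch_size, shard.stop))


if __name__ == "__main__":
    # Hypothetical figures: 10 records, 3 workers, 1000 float32 features,
    # and a 1 MB budget for the input batch on each worker.
    size = max_batch_size(n_features=1000, bytes_per_value=4,
                          memory_budget_bytes=1_000_000)
    for worker, shard in enumerate(shard_records(10, 3)):
        print(worker, [list(b) for b in batches(shard, size)])
```

Each worker would then train on its own shard in mini-batches no larger than the memory budget allows; how gradients or parameters are exchanged between workers is outside the scope of this sketch.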
Document type:
Conference paper
LATIN AMERICA HIGH PERFORMANCE COMPUTING CONFERENCE, Sep 2018, Bucaramanga, Colombia. 〈http://www.ccarla.org〉

https://hal.archives-ouvertes.fr/hal-01885364
Contributor: Michel Riveill
Submitted on: Monday, October 1, 2018 - 18:05:04
Last modified on: Monday, November 5, 2018 - 15:52:10

File

main.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01885364, version 1

Citation

John Anderson Garcia Henao, Frédéric Precioso, Pascal Staccini, Michel Riveill. Parallel and Distributed Processing for Unsupervised Patient Phenotype Representation. LATIN AMERICA HIGH PERFORMANCE COMPUTING CONFERENCE, Sep 2018, Bucaramanga, Colombia. 〈http://www.ccarla.org〉. 〈hal-01885364〉
