Representation Learning of Compositional Data

Marta Avalos Fernandez; Richard Nock; Cheng Soon Ong; Julien Rouar; Ke Sun

Conference paper. Year: 2018

Marta Avalos Fernandez

Role: Author
PersonId: 742122
IdHAL: mavalosf
ORCID: 0000-0002-5471-2615
IdRef: 153689293

Statistics In System biology and Translational Medicine

Bordeaux population health

Richard Nock

Role: Author
PersonId: 1040010

Data61 [Canberra]

The University of Sydney

Australian National University

Cheng Soon Ong

Role: Author
PersonId: 1040011

Data61 [Canberra]

Australian National University

Julien Rouar

Role: Author
PersonId: 1040012

Statistics In System biology and Translational Medicine

Bordeaux population health

Ke Sun

Role: Author
PersonId: 1040013

Data61 [Canberra]

Abstract

We consider the problem of learning a low-dimensional representation for compositional data. Compositional data consist of collections of nonnegative values that sum to a constant. Because the parts of such a collection are statistically dependent, many standard tools cannot be applied directly; instead, compositional data must first be transformed before analysis. Focusing on principal component analysis (PCA), we propose an approach that learns low-dimensional representations directly from the original data. Our approach combines the benefits of the log-ratio transformation from compositional data analysis with those of exponential family PCA. A key tool in its derivation is a generalization of the scaled Bregman theorem, which relates the perspective transform of a Bregman divergence to the Bregman divergence of a perspective transform plus a remainder conformal divergence. Our approach includes a convenient surrogate (upper bound) loss for exponential family PCA that has an easy-to-optimize form. We also derive the corresponding form for nonlinear autoencoders. Experiments on simulated data and microbiome data show the promise of our method.
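For context, the abstract contrasts the proposed method with the usual pipeline in which compositions are first passed through a log-ratio transform and then analysed with ordinary PCA. The Python/NumPy sketch below illustrates only that transform-then-PCA baseline, not the authors' exponential family PCA surrogate. It assumes the centered log-ratio (clr) variant and a pseudocount of 0.5 to avoid log(0); the function names `closure`, `clr` and `clr_pca`, the pseudocount value, and the simulated counts are hypothetical choices made for illustration.

```python
import numpy as np

def closure(x):
    """Rescale each row of nonnegative parts so that it sums to 1 (the simplex)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def clr(x, pseudocount=0.5):
    """Centered log-ratio transform: log of each part minus the mean log of its row.
    The pseudocount (an assumed value) is added because log(0) is undefined."""
    x = closure(np.asarray(x, dtype=float) + pseudocount)
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def clr_pca(x, k=2):
    """Baseline: ordinary PCA (via SVD) on clr-transformed rows.
    Returns the k-dimensional scores and the k principal axes."""
    z = clr(x)
    z_centered = z - z.mean(axis=0)          # column-center before PCA
    u, s, vt = np.linalg.svd(z_centered, full_matrices=False)
    scores = u[:, :k] * s[:k]                # coordinates in the k-dim subspace
    return scores, vt[:k]

# Hypothetical example: 100 samples of 4-part count data (e.g. taxa counts).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=[5.0, 20.0, 1.0, 50.0], size=(100, 4))
scores, axes = clr_pca(counts, k=2)
print(scores.shape, axes.shape)  # (100, 2) (2, 4)
```

Each clr-transformed row sums to zero, so the principal axes lie in the subspace orthogonal to the all-ones vector; this is the constraint that makes compositional data awkward for standard PCA and that the paper's approach avoids by working on the original data.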

Keywords

Microbiome data; PCA; Bregman divergences; Compositional data; Relative abundance data

Subject areas

Machine Learning [stat.ML]; Methodology [stat.ME]; Computation [stat.CO]; Applications [stat.AP]; Machine Learning [cs.LG]; Public health and epidemiology

Main file

7902-representation-learning-of-compositional-data.pdf (757.33 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-01945508

Submitted on: Wednesday, 5 December 2018, 12:37:39

Last modified on: Wednesday, 15 March 2023, 08:50:07

Long-term archiving on: Wednesday, 6 March 2019, 13:54:02

Dates and versions

hal-01945508 , version 1 (05-12-2018)

Identifiers

HAL Id: hal-01945508, version 1

Cite

Marta Avalos Fernandez, Richard Nock, Cheng Soon Ong, Julien Rouar, Ke Sun. Representation Learning of Compositional Data. NIPS 2018 - Thirty-second Conference on Neural Information Processing Systems, Dec 2018, Montréal, Canada. ⟨hal-01945508⟩


Collections

INSERM, INRIA, INRIA2, U1219
