Fast estimation of posterior probabilities in change-point models through a constrained hidden Markov model

Abstract: The detection of change-points in heterogeneous sequences is a statistical challenge with applications across a wide variety of fields. In bioinformatics, a vast amount of methodology has been developed to identify an ideal set of change-points for detecting Copy Number Variation (CNV), and numerous efficient algorithms are currently available for finding the best segmentation of CNV data. However, relatively few approaches consider the important problem of assessing the uncertainty of the change-point locations. Because they have quadratic complexity, these approaches are typically intractable for large datasets of tens of thousands of points or more. In this paper, we assess uncertainty through a constrained hidden Markov model with a fixed number of segments, each a run of contiguous observations sharing the same distribution. Forward-backward algorithms for this model estimate the posterior probabilities of interest with linear complexity. The methods are implemented in the R package postCP, which uses the results of a given change-point detection algorithm to estimate the probability that each observation is a change-point. We present the results of the package on a sequence of Poisson-generated data and on a small, publicly available CNV data set (n=120). Due to its frequentist framework, postCP obtains less conservative confidence intervals than previously published Bayesian methods, with linear rather than quadratic complexity. On a second, high-resolution data set (n=14,241), the implementation ran in less than one second on a mid-range laptop computer.
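The constrained forward-backward recursion described in the abstract can be illustrated with a minimal Python sketch. It is not the postCP implementation or its API: the function name `changepoint_posteriors` is hypothetical, the data are assumed Poisson with segment means taken from a prior segmentation, and "stay" and "advance" transitions are assumed to carry equal weight so that they cancel in the posterior. The state can only remain in its segment or move to the next one, the chain is forced to start in the first segment and end in the last, and the cost is O(nK) for n observations and K segments.

```python
import numpy as np

def changepoint_posteriors(x, lam):
    """Posterior P(segment k ends at position t | x) for a constrained HMM.

    x   : observed counts, length n (assumed Poisson)
    lam : assumed Poisson mean of each of the K segments, in order
    Returns an array of shape (K-1, n-1); row k is the posterior
    distribution of the k-th change-point location.
    """
    x = np.asarray(x, dtype=float)
    lam = np.asarray(lam, dtype=float)
    n, K = len(x), len(lam)
    # Poisson log-likelihood up to the log(x!) term, which is the same
    # for every state and therefore cancels in the posterior.
    log_e = x[:, None] * np.log(lam)[None, :] - lam[None, :]

    # Forward pass: the chain must start in segment 0.
    log_f = np.full((n, K), -np.inf)
    log_f[0, 0] = log_e[0, 0]
    for t in range(1, n):
        stay = log_f[t - 1]
        advance = np.concatenate(([-np.inf], log_f[t - 1, :-1]))
        log_f[t] = log_e[t] + np.logaddexp(stay, advance)

    # Backward pass: the chain must end in segment K-1.
    log_b = np.full((n, K), -np.inf)
    log_b[n - 1, K - 1] = 0.0
    for t in range(n - 2, -1, -1):
        nxt = log_e[t + 1] + log_b[t + 1]
        advance = np.concatenate((nxt[1:], [-np.inf]))
        log_b[t] = np.logaddexp(nxt, advance)

    log_z = log_f[n - 1, K - 1]
    # P(S_t = k, S_{t+1} = k+1 | x) for t = 0..n-2 and k = 0..K-2.
    log_post = log_f[:-1, :-1] + log_e[1:, 1:] + log_b[1:, 1:] - log_z
    return np.exp(log_post).T

# Toy example: two segments with assumed means 2.0 and 10.5.
counts = [2, 3, 1, 2, 3, 2, 10, 12, 9, 11, 10, 12]
post = changepoint_posteriors(counts, [2.0, 10.5])
print(post.round(3))
```

Each row of the result sums to one, so it can be read directly as a distribution over where that change-point lies, from which a confidence interval for the location could be derived.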
Document type:
Conference paper
JOBIM: Journée Ouverte en Biologie, Informatique et Mathématiques, Jul 2012, Rennes, France. 2012

https://hal.archives-ouvertes.fr/hal-00712343
Contributor: Yves Rozenholc
Submitted on: Wednesday, 27 June 2012 - 00:33:53
Last modified on: Tuesday, 11 October 2016 - 11:58:23

Identifiers

  • HAL Id : hal-00712343, version 1

Citation

The Minh Luong, Yves Rozenholc, Gregory Nuel. Fast estimation of posterior probabilities in change-point models through a constrained hidden Markov model. JOBIM: Journée Ouverte en Biologie, Informatique et Mathématiques, Jul 2012, Rennes, France. 2012. <hal-00712343>
