Abstract: In survival analysis, heterogeneity between patients commonly results in differing survival response distributions. This heterogeneity can be controlled through known covariates (such as date of birth, age at diagnosis, gender, treatment, co-exposure, BMI, etc.) using regression-type models such as the Cox proportional hazards model, by performing stratified analyses, or by incorporating a random effect in a frailty model. Another type of heterogeneity arises when the incidence rate changes over calendar time in a cohort study; specific models such as age-period-cohort models have been extensively studied to take this kind of heterogeneity into account. While these models have proved most useful, it is nevertheless likely that unaccounted latent heterogeneity remains in the survival signal. This might be due, for example, to an unknown interaction between a treatment and some exposure, or to some unaccounted heterogeneity of the disease itself (for example an unknown cancer sub-type). For instance, age at diagnosis might be associated with a higher chance of receiving a new treatment, or BMI might be associated with a specific exposure.
In the present work, we suggest a new approach that treats survival heterogeneity as a breakpoint model in an ordered sequence of survival responses. The survival responses may be ordered according to any numerical covariate (ties are possible), such as age at diagnosis or BMI. The basic idea is that heterogeneity will be detected as soon as it is associated with the chosen covariate. In such a model, we have two objectives: first, we want to estimate the hazard rates and the proportional factors in each homogeneous region through a Cox model; second, we want to accurately estimate the number and location of the breakpoints. Recently, a constrained Hidden Markov Model (HMM) method was suggested in the context of breakpoint analysis (see Luong et al., 2013). This method makes it possible to perform a full change-point analysis in a segment-based model (one parameter per segment), providing linear EM estimates of the parameters and a full specification of the posterior distribution of the change points. In this talk, we adapt this method to the context of survival analysis, where estimation is performed through the EM algorithm, which updates the hazard rate estimates and the posterior distribution at each iteration.
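To make the segment-based EM idea concrete, the following is a minimal Python sketch and not the implementation presented in the talk. It rests on simplifying assumptions that are not part of the abstract: each segment has a single constant hazard rate (a piecewise-exponential model, used here in place of the Cox model), the number of segments is fixed in advance, and all segmentations are equally likely a priori. The function names (`log_emission`, `posterior_segments`, `em_piecewise_exponential`) are hypothetical.

```python
import numpy as np


def log_emission(times, events, rates):
    """Log-likelihood of each (possibly censored) response under each segment.

    Assumption: constant hazard per segment, so the contribution of one
    individual is events * log(rate) - rate * time. Returns shape (n, K).
    """
    return events[:, None] * np.log(rates)[None, :] - times[:, None] * rates[None, :]


def posterior_segments(log_e):
    """Forward-backward pass for a left-to-right constrained HMM.

    The chain starts in segment 0, ends in segment K-1, and at each step
    either stays or moves to the next segment. Transition probabilities are
    taken as constant, so they cancel and are omitted (uniform prior over
    segmentations). Returns P(individual i is in segment k | data).
    """
    n, K = log_e.shape
    fwd = np.full((n, K), -np.inf)
    bwd = np.full((n, K), -np.inf)
    fwd[0, 0] = log_e[0, 0]
    for i in range(1, n):
        moved = np.concatenate(([-np.inf], fwd[i - 1, :-1]))  # came from k-1
        fwd[i] = np.logaddexp(fwd[i - 1], moved) + log_e[i]
    bwd[n - 1, K - 1] = 0.0
    for i in range(n - 2, -1, -1):
        nxt = bwd[i + 1] + log_e[i + 1]
        moving = np.concatenate((nxt[1:], [-np.inf]))  # goes to k+1
        bwd[i] = np.logaddexp(nxt, moving)
    return np.exp(fwd + bwd - fwd[n - 1, K - 1])


def em_piecewise_exponential(times, events, n_segments, n_iter=50):
    """EM for per-segment hazard rates; data must already be sorted by the covariate."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    n = len(times)
    # Initial guess: segments of (almost) equal size.
    init = np.arange(n) * n_segments // n
    post = np.eye(n_segments)[init]
    for _ in range(n_iter):
        # M-step: weighted events divided by weighted time at risk.
        rates = (post * events[:, None]).sum(axis=0) / (post * times[:, None]).sum(axis=0)
        rates = np.maximum(rates, 1e-12)  # guard against log(0)
        # E-step: posterior segment membership given the current rates.
        post = posterior_segments(log_emission(times, events, rates))
    return rates, post
```

With the responses sorted by the chosen covariate, `em_piecewise_exponential(times, events, n_segments=3)` returns the estimated hazard rate of each segment together with the posterior probability that each individual belongs to each segment. In this sketch, the posterior probability of a breakpoint between two consecutive individuals can be read off as the drop in the cumulative membership probabilities between them.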
The method will be illustrated on the dataset of diabetic patients from the Steno Memorial Hospital in Copenhagen (dataset from Andersen et al., 1993), where the event times are ordered with respect to the calendar time of disease onset. In this dataset, the years of disease onset range from 1933 to 1972. Our method selects a two-breakpoint model, and survival functions and hazard ratios are estimated on each of the three segments. Our results clearly indicate a general medical improvement over time for Danish diabetic patients.