Parametric estimation of spectrum driven by an exogenous signal

In this paper, we introduce new parametric generative driven auto-regressive (DAR) models. DAR models provide a nonlinear and non-stationary spectral estimation of a signal, conditionally to another exogenous signal. We detail how inference can be done efficiently while guaranteeing model stability. We show how model comparison and hyper-parameter selection can be done using likelihood estimates. We also point out the limits of DAR models when the exogenous signal contains too high frequencies. Finally, we illustrate how DAR models can be applied on neuro-physiologic signals to characterize phase-amplitude coupling.


INTRODUCTION
Auto-regressive (AR) models are stochastic signal models that have proved their usefulness in many applications: from speech and audio applications, econometrics or even physiology and biological signal processing.One of their advantages is the existence of fast inference algorithms [1] and to provide a compact representation of the spectral content of a signal (e.g. using 10 to 16 parameters for coding 32 ms of speech).
Standard AR models are so-called stationary, meaning that the statistics of the signal are assumed to be stable over time.When working with such models, the spectrum is therefore not a function of time.In many applications, this modeling assumption is not adapted to describe the interesting dynamics of the physical system observed.This is for example the case in the field of econometrics [2] where time-varying or nonlinear AR models were first studied, but it is also the case for physiological signals as it will be illustrated below with a phenomena known as phase-amplitude coupling [3].The type of signal we propose to study here is illustrated in Fig. 1.
Related non-linear auto-regressive models Various nonlinear auto-regressive models have been proposed in audio signal processing or econometrics.Models with a conditional heteroskedasticity (ARCH [4], GARCH [5]) are extremely popular in econometrics, since they are able to model a varying amplitude for the entire spectrum.However, these models do not model variations in the spectrum itself.To do so, a simple way is to define several regimes: at each time, the instantaneous model is a simple linear AR model, yet the AR coefficients change depending on a non-linear function of the past of the signal.The first models based on this idea are the SETAR models [2], which estimate several AR models depending on the amplitude of the signal with respect to some thresholds.To get a smoother transition between regimes, SE-TAR models have inspired other models like EXPAR [6] or STAR [7], in which the AR coefficients change continuously depending on a non-linear function of the past of the signal.FAR models [8] are even more general, since they estimate the different regimes independently from each other.Motivations behind these models are shared with the DAR models presented in this paper.However, DAR models do not require to infer the driving behavior from the signal itself but rely on the prior knowledge of an exogenous driving signal.As we demonstrate below, this increases both speed and robustness of the estimation, while we guarantee the local stability of the models thanks to a parametrization using parcor coefficients.

Models definition
An auto-regressive (AR) model specifies that a signal y depends linearly on its own p past values, where p is the order of the model: where T is the length of the signal and ε is the innovation (or residual), modeled with a Gaussian white noise: ε(t) ∼ N (0, σ(t) 2 ).To extend this AR model to a non-stationary model, one can assume [9,10] that the AR coefficients a i are driven by a polynomial function of a given exogenous signal x, here called the driver: where x(t) j is x(t) to the power j.The choice of the polynomial basis is discussed in section 3.2.Note that A i is a vector, as the upper-case letter suggests, composed of the scalars a ij .We note Âi the estimated value, and A i the transpose.
As we parametrize the AR coefficients, the estimated models are not guaranteed to be stable.To examine stability, the standard technique is to consider the parcor coefficients k i , introduced in the fast Levinson-Durbin algorithm [1] that estimates the a i coefficients recursively from previous orders: the a i at order p (noted a (p) i in Eq. 3) can be computed from k p and the AR coefficients at previous order a (p−1) i : From this so-called lattice representation, we can define a second non-stationary parametrization, where the k i (t) are directly driven by the exogenous signal x, instead of the a i (t): This parametrization is used as an intermediate step during model estimation.The lattice representation brings a simple condition (necessary and sufficient) for stability: −1 < k i < 1 [11].To enforce this condition, we use the log-area ratios (LAR) γ i [12], as suggested in [13]: We use the LAR coefficients γ i (t) in order to force the k i (t) to be within ] − 1, 1[, which enforces the stability of the instantaneous model for all t.Our third and final non-stationary parametrization is: We call this model a driven auto-regressive (DAR) model.

Model estimation
For DAR models, as the innovation is assumed to be Gaussian white noise, the likelihood L is obtained via: To estimate a DAR model, we maximize the likelihood iteratively from order 1 to order p, by adding each time a lattice cell.First, we assume that the innovation variance is constant σ(t) 2 = σ 2 and equal to the signal's empirical variance.So for each lattice cell i from i = 1 to p, we estimate the DAR coefficients Γ i ∈ R m+1 by minimizing a least square criterion over the forward residual: where the forward and backward residuals at the output of the i-th lattice cell are given by: and where we initialize with ε + 0 (t) = ε − 0 (t) = y(t).We minimize J 1 (Γ i ) with a Newton-Raphson procedure since the gradient and Hessian can be computed easily.To start with a good initialization, we approximate the problem with a twostep algorithm.In the first step, we solve the linear problem argmin Ki J 1 (K i ) using ( 4), which leads to a normal equation: In the second step, we estimate Γ i with a regression over the trajectories γ i (t) and ki (t): After clipping ki (t) = K i X(t) inside [−1 + η, 1 − η] (e.g.η = 10 −6 ), we measure the quality of the approximation with a second least-squares criterion: Again this leads to a normal equation: These two steps are used to initialize the Newton-Raphson procedure, and largely speed-up the optimization.We iterate this over each lattice cell i: For i > 1, the residuals ε + i (t) and ε − i (t) are obtained with (9) using the ki (t) derived from γi (t) = Γ i X(t) using (5).We then obtain the DAR coefficients Γ i for every order i from 1 to p.

Innovation variance estimation
During inference of the DAR coefficients Γ i , we first assumed a fixed innovation variance σ 2 .We propose to model it as driven by x, as in [9]: The vector B ∈ R m+1 is estimated from the residual ε + p (t) by maximum likelihood also using a Newton-Raphson procedure.We then iterate between the estimation of the Γ i and of B. We observed in our experiments that two iterations are generally sufficient.

Power spectral density
After model estimation, we can compute the conditional power spectral density (PSD) of the model S y : For a given driver's value x 0 , we compute the AR coefficients a i (x 0 ) from the DAR coefficients Γ i using ( 6), ( 5) and ( 3), along with the innovation's standard deviation σ(x 0 ) using ( 12).Since AR models with time-varying coefficients are locally stationary [14], we can compute the PSD for this driver's value : where j 2 = −1 and a 0 (x 0 ) = 1.Fig. 2 shows an example of S y (x) for the observed range of value of the driver x (for robustness, we only use the value of x between the 5 and 95 percentiles of all x values) (the signal used in this estimation will be detailed in section 4).Using the fast Fourier transform (FFT) in ( 13), we estimate the PSD for a range of frequency f ∈ [0, f s /2].As model estimation is also very fast, we obtain the PSD very quickly: With (p, m) = (10, 2), on a signal with T = 10 6 time points, the entire DAR estimation lasts 0.22 s.

BIC selection
DAR models are probabilistic, thus we can define a likelihood function L (7), and perform model and hyper-parameter selection by maximizing it.To compare models with different number of hyper-parameters (i.e.degrees of freedom), we can also use the well known bayesian information criterion (BIC) [15], which reads BIC = −2 log(L) + d log(T ), where d is the number of degrees of freedom.In DAR models, we have d = (p + 1)(m + 1) .As model estimation is very fast, we can easily estimate on a full grid of hyper-parameters (p and m) and keep the model that leads to the lowest BIC.
Simulations Using simulated data, we tested the hyperparameters selection based on the BIC.We generated a driving signal x for T = 10 4 time points, by re-sampling a Gaussian white noise of length T f x /f s (where f s = 10 3 Hz is the In dashed black, we plot the PSD estimated with a Welch method over the entire signal.
Note that the PSD is mostly flat since the signal was globally whitened as preprocessing, cf. 4 .
sampling frequency).We then drew a set of LAR coefficients γ i (t) using random walks: starting from a real number Note that the LAR coefficients do not need to be in ] − 1, 1[ to make the system stable (cf.section 2.1).However, to avoid having very large coefficients during a long period of time, we rescaled the trajectory to be inside [−4, 4].To make the LAR coefficients dependent of the driver x, we projected them on the basis With the obtained DAR coefficients Γi , we computed the instantaneous LAR coefficients γi (t), and generated a signal y(t) by feeding the corresponding lattice filters defined by ki (t) with a Gaussian white noise.To focus on the spectral fluctuations, we used a constant innovation gain: σ(t) = σ.
We then fitted several DAR models on y, with p ∈ [1, 20] and m ∈ [0, 3], and compared the BIC of each fit.We tested if the BIC selects the correct hyper-parameters, and we tested it for different driver's frequency f x .The results are presented in Fig. 3 : Interestingly, the BIC selection of m is correct in most cases, provided that the driver's frequency is not too high (f x < 50 Hz).When the driver's frequency is too low, the BIC sometimes overestimate m since the time length is too short to see many driver's oscillations.The BIC selects p correctly in most cases at ±2 (not shown), for all driver's frequencies.

Comparison with STAR models
The choice of a polynomial basis in ( 6) and ( 12) is motivated by simplicity, and practice also shows that a low order (≤ 3) is enough as stated by a BIC selection.For the sake of comparison, we also estimated a multi-regime LSTAR [16]: We then estimated some DAR models on these signals, and we selected p and m that minimized the BIC.graphs show the proportion of each m selected.The hyper-parameter m is correctly estimated in most cases if the driver's frequency is not too high (f x < 50 Hz).
which is parametrized with sigmoid functions: and with F 0 (x(t)) = 1.Note that to fairly compare the models, we use the same exogenous signal x to drive the AR coefficients, when standard LSTAR models use the delayed signal y(t − d), which results in a worse fit (cf.Fig 4).We also extended them to model the innovation variance σ(t) 2 with sigmoids: The results are presented in Fig. 4, and show that the polynomial and the sigmoid parametrizations achieve comparable performances.However, the optimization of the LSTAR is much slower, since there is no good initialization procedure for the transitions parameters γ j and c j : the non-linear optimization is slow and may contain several local minima, and the cost of an initial grid-search is large, or even prohibitive as soon as m ≥ 2. On the contrary, the DAR models estimation is very fast and does not suffer from multiple minima.Fig. 5.The driver x is extracted from the raw signal z with a band-pass filter.The remaining high frequencies are whitened to form the modeled signal y.

APPLICATION TO NEUROSCIENCE
In neuroscience, phase-amplitude coupling (PAC) refers to the interaction between the phase of a slow neural oscillation and the amplitude of high frequencies within the same signal or at a distinct brain location.High frequencies driven by slow signal fluctuations have been observed in animal [17] and human [3] studies and have been reported as a key element for neural communication during complex cognitive processes [18,19].Yet, PAC is usually measured through empirical and somehow ad hoc metrics [3,20].To give a proper model to PAC, we applied DAR models on a human electro-corticogram (ECoG) channel from [3] (730 seconds at 333 Hz).
After decimation and removal of the electrical network noise, we extracted the driver x with a zero-phase bandpass filter (f x = 3.8 Hz, ∆f x = 1 Hz).We removed a wider bandpass signal around it (∆f = 4 Hz) to prevent spurious correlations between x and y, and we filled the gap with white noise filtered with the same filter.We then whitened y with a simple AR model, by applying the inverse AR filter to the signal.This whitening step is not necessary, yet it reduces the need of a high order p in DAR, which reduces both the computational cost and the variance of the model.After an exhaustive grid search with p ∈ [1, 50] and m ∈ [0, 4], the hyper-parameters (p, m) = (10, 2) are selected by BIC, hence revealing the phase-amplitude coupling present in the data.
Fig. 5 shows a portion of the signals, while Fig. 2 shows the PSD (in dB) depending on the driver's value, estimated through a DAR model.PAC can be identified in the difference of S y (x) as the driver x varies: the PSD has more power for negative driver's values than for positive driver's values, and PSD shapes are also different.

CONCLUSION
In this paper, we introduced driven auto-regressive (DAR) models, to provide a parametric spectrum estimation of a signal, conditionally to another exogenous driving signal.The chosen parametrization guarantees local stability, and allows fast inference and easy hyper-parameter selection.We also showed how DAR models can be used to characterize phaseamplitude coupling on neuro-physiologic signals.

Fig. 2 .
Fig. 2. Power spectral density (PSD) evaluated through a DAR model, for different driver's values x.In dashed black, we plot the PSD estimated with a Welch method over the entire signal.Note that the PSD is mostly flat since the signal was globally whitened as preprocessing, cf. 4 .

Fig. 3 .
Fig.3.Selection of hyper-parameter m in simulated data with respect to the driver's frequency.We simulated 100 signals from 100 DAR models with p = 10 and m = 0 (left), m = 1 (middle), m = 2 (right).We then estimated some DAR models on these signals, and we selected p and m that minimized the BIC.graphs show the proportion of each m selected.The hyper-parameter m is correctly estimated in most cases if the driver's frequency is not too high (f x < 50 Hz).