Denoising Smooth Signals Using a Bayesian Approach: Application to Altimetry

This paper presents a novel Bayesian strategy for the estimation of smooth signals corrupted by Gaussian noise. The method assumes a smooth evolution of a succession of continuous signals that can have a numerical or an analytical expression with respect to some parameters. The proposed Bayesian model takes into account the Gaussian properties of the noise and the smooth evolution of the successive signals. In addition, a gamma Markov random field prior is assigned to the signal energies and to the noise variances to account for their known properties. The resulting posterior distribution is maximized using a fast coordinate descent algorithm whose parameters are updated by analytical expressions. The proposed algorithm is tested on satellite altimetric data demonstrating good denoising results on both synthetic and real signals. In comparison with state-of-the-art algorithms, the proposed strategy provides a good compromise between denoising quality and necessary reduced computational cost. The proposed algorithm is also shown to improve the quality of the altimetric parameters when combined with a parameter estimation or a classification strategy.


Denoising Smooth Signals Using a Bayesian
Approach: Application to Altimetry I. INTRODUCTION I N MANY applications, the development of new sensor technologies allows for high-speed acquisition of a succession of signals, leading to a slight variation from one signal to the next. This is the case for satellite altimetric signals that can be described as a succession of continuous functions corrupted by noise [1], [2]. Indeed, when observing the ocean, the altimetric successive signals show a reduced variation due to the nature of ocean (see Fig. 1 that shows a succession of 800 signals acquired by the Jason-2 mission). This paper aims to exploit this correlation to denoise the observed altimetric signals. A satellite altimeter is a nadir-viewing radar that emits regular pulses and records the travel time, the magnitude, and the shape of each return signal after reflection on the oceanic surface. This reflected echo provides information about some physical parameters such as the range between the satellite and the observed scene (denoted by τ ), the significant wave height (denoted by μ), and the wind speed (related to the signal's amplitude P u ). The shape of the signals carries additional information related to the observed regions (such as coastal zone, iceberg, etc.), which promotes the development of different altimetric classification methods [3], [4].
Altimetric signals are corrupted by speckle noise, and many recent studies and missions have been focusing on improving the quality of these signals by reducing the effects of the noise. This goal can be achieved by improving the instrument technology [5], [6], the physical altimetric models [2], [7]- [9], or the signal processing algorithms [10], [11]. In this paper, we focus on signal processing approaches that can be divided into two categories, as shown in Fig. 2. The first class focuses on the altimetric parameters by providing realistic smooth estimates [10]- [12]. The second class is more general and consists in filtering the noisy signals [13], [14]. The filtered signals can then be processed by several classical techniques, which have different goals such as parameter estimation [7], [10], [11], signal classification [4], and object detection [15]. This study follows This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/ the second class and proposes a new filtering algorithm that operates on the observed signals. The resulting algorithm differs from the CD-BM algorithm proposed in [12], since it can be combined with many classical altimetric algorithms for different purposes, while CD-BM is specialized on the extraction of realistic altimetric parameters.
Filtering successive signals can be seen as a classical image denoising problem, which has been tackled using different approaches such as empirical, sparsity-based, and statistical methods, etc. Empirical methods include the empirical mode decomposition [16], [17] and other heuristic algorithms [18]. Sparsity-based methods account for the sparsity of the data under a given signal representation. This class includes optimization-based algorithms such as FISTA [19] and SALSA [20], and collaborative methods as the well-known BM3D [21], and its variant for speckle noise SBM3D [22]. Statistical methods are generally based on the noise statistics of the image. This class includes, as an example, the well-known PPB algorithm [23], Kalman filtering [24], particle filtering methods [25], [26], and Gaussian-process-based methods [27]. This last class is considered in this paper by proposing a new Bayesian algorithm especially designed to denoise altimetric signals under the requirements of a good denoising quality and a reduced computational cost. It should be noted that the provided list is far from being exhaustive, and we refer the interested reader to [18], [28], and [29] for more details regarding other methods.
The first contribution of this paper is a hierarchical Bayesian model to denoise a set of smooth altimetric signals corrupted by a speckle noise. As a result of the averaging on-board the altimetric satellite, the speckle noise is generally approximated by an independent and identically distributed Gaussian noise [7], [11]. The proposed Bayesian model generalizes this assumption by considering that each signal is corrupted by additive independent and nonidentically distributed Gaussian noise. A gamma Markov random field (gamma-MRF) prior [30] is also considered to account for the correlation between the noise variances to better approximate the speckle noise. The signal energies are also assigned a gamma-MRF prior to account for their continuity. Using the Bayes rule, the likelihood and the prior distributions lead to a posterior distribution that is used to estimate the noiseless signals and the noise parameters (as described in the next paragraph). Note that the proposed Bayesian hierarchy is generic in the sense that it does not assume a specific signal model. Indeed, the signal can be expressed by a numerical formula or by linear/nonlinear analytical function with respect to (w.r.t.) some parameters. Therefore, the proposed algorithm can be easily adapted to coastal altimetric signals [31], [32] or other altimetric technologies such as delay/Doppler altimetry [6], [33], [34]. It can also be generalized to hyperspectral images, which often present a high spectral correlation, as already exploited by many algorithms in the literature to denoise these images [35], [36].
The second contribution of this paper is the derivation of a denoising algorithm associated with the proposed hierarchical Bayesian model. The minimum mean square error (MMSE) and maximum a posteriori (MAP) estimators of the unknown signals/parameters cannot be easily computed from the obtained joint posterior. In this paper, the MAP estimator is evaluated by considering a coordinate descent algorithm (CDA) [37], [38] whose convergence to a stationary point is ensured. The proposed algorithm sequentially updates the estimated noiseless signals, noise variances, and other hyperparameters by analytical formulas leading to a reduced computational cost. The proposed Bayesian model and the estimation algorithm are validated using synthetic and real altimetric data acquired during the Jason-2 mission. The obtained results are very promising and show the potential of the proposed denoising strategy. This paper is organized as follows. Section II introduces the observation model and the considered altimetric signal. The proposed hierarchical Bayesian model and its estimation algorithm are introduced in Sections III and IV. Section V evaluates and compares the proposed technique with the state-of-the-art algorithms using simulated data with controlled ground truth and real Jason-2 altimetric data. Finally, conclusions and future work are reported in Section VI.

II. PROBLEM FORMULATION
Consider M successive signals S ∈ IR K ×M and let Y ∈ IR K ×M denote their noisy version. Let y :m ∈ IR K ×1 be the mth column of Y and y k : ∈ IR 1×M its kth row, representing the kth temporal gate for all signals. For notation simplicity, we denote y :m = y m , for m = 1, . . . , M, and y k : = y k , for k = 1, . . . , K (the same notation is used for s). The observation model is given by where ∼ means "is distributed according to," y m and s m are (K × 1) vectors representing the mth observed and noiseless signals, and e m is a centered Gaussian noise vector with a diagonal covariance matrix Σ = diag(σ 2 ), with σ 2 = (σ 2 1 , . . . , σ 2 K ) T a (K × 1) vector. The signals S might depend on some parameters (by a linear or nonlinear expression), which are denoted by the (1 × H) vector Θ m = [θ 1 (m), . . . , θ H (m)] containing the H parameters of the mth signal. Note, however, that the proposed method does not necessarily require a parametric expression for S and is valid provided that the signals satisfy some properties (as described in the following). In different applications such as oceanic altimetry [10], [11], the successive signals show a reduced variation mainly because of the correlation between the successive physical parameters Θ = (Θ T 1 , . . . , Θ T M ) T (see Fig. 1). This smooth variation can be highlighted by expressing the observed signals (1) as follows: where k ∈ {1, . . . , K} indexes the signal samples that are known as "temporal gates," I M denotes the (M × M ) identity matrix, and s k is a smooth (M × 1) vector representing the signal evolution at the kth gate (see Fig. 1(bottom) for examples). The proposed Bayesian method aims to filter the observed signals y k , k ∈ {1, . . . , K}, to retrieve the noiseless signals s k , k ∈ {1, . . . , K}. The next section introduces the hierarchical Bayesian model proposed to achieve this goal.

III. HIERARCHICAL BAYESIAN MODEL
This section introduces a hierarchical Bayesian model to denoise M successive signals. The Bayesian approach first requires the determination of the likelihood that is based on the statistical model associated with the observed data. Second, the known properties of the parameters of interest are modeled via suitable prior distributions. The Bayes theorem allows the likelihood and the priors to be combined to build the posterior distribution of the statistical model. More precisely, if f (X) denotes the prior distribution assigned to the parameter X, the Bayesian approach computes the posterior distribution of X using the Bayes rule where ∝ means "proportional to" and f (Y |X) is the likelihood of the observation vector Y . The parameter X is then estimated from the posterior distribution by computing its MMSE estimator (the mean of the distribution) or its MAP estimator (the maximum of the distribution). The following sections introduce the likelihood and the prior distributions considered in this paper. The unknown parameters of the proposed model include the (K × M ) matrix representing the noiseless signals S and the (K × 1) vector σ containing the noise variances associated with the M considered signals.

A. Likelihood
The observation model defined in (2) and the Gaussian properties of the noise sequence e k , k ∈ {1, . . . , K}, yield where || · || denotes the standard 2 norm such that ||x|| 2 = x T x. Assuming independence between the temporal samples of the observed signals leads to

B. Priors for the Observed Signal
As previously assumed, the successive observed signals evolve slowly leading to smooth vectors s k , for k ∈ 1, . . . , K [see Fig. 1(bottom)]. This property is satisfied by considering a conjugate Gaussian prior for s k ensuring smoothness as follows: where H is an (M × M ) matrix representing the squaredexponential covariance function given by , which introduces the correlation between the successive signals, ϑ controls the degree of correlation between the signals, and 2 k is a variance parameter that is gate dependent. From (6), it is clear that this variance is related to the energy of the signals at the kth gate (via the norm s T k H −1 s k ). Moreover, because of the continuity of the signal s m w.r.t. the temporal gates, the signal energies vary smoothly from one gate to another. Therefore, we expect 2 k to vary smoothly from one gate to another, which will be introduced by considering a specific prior for 2 k , as explained in Section III-D. As a result, the proposed model accounts for the individual signal continuity w.r.t. temporal gates and for the smooth evolution existing between successive signals. An interesting property of this prior is that the signals s k and s k , k = k , are conditionally independent; thus, they can be updated in parallel. Note that the coefficient ϑ depends on some technical parameters such as the speed of the satellite, and the number of emitted signals per second. Its value affects the algorithm behavior where a high value of ϑ leads to oversmoothing and a low value provides noisy signals. Therefore, it should be empirically adapted to the considered data. As a rule of thumb, ϑ should be increased for systems with higher frequency of signals or lower satellite speed. 1

C. Prior for the Noise Variance
The choice of a prior distribution is generally driven by two points: the available knowledge about the parameter of interest and the tractability of the resulting posterior distribution. Regarding the first point, and due to the speckle origins of the corrupting noise, we expect the noise variances σ 2 k , k ∈ {1, . . . , K}, to vary smoothly. The second point is often fulfilled by considering a conjugate distribution for the parameter of interest. In our case, σ has a conjugate inverse gamma distribution, which is the commonly used variance prior in many Bayesian algorithms [39], [40]. In this paper, we satisfy these two goals by introducing an auxiliary vector w (of size K × 1) and assigning a gamma-MRF prior for the couple (σ, w) given by (see [30] for more details regarding this prior) where Z(ζ) is a normalizing constant and ζ > 1 is a fixed coupling parameter that controls the amount of correlation enforced by the gamma-MRF. This prior ensures that each σ 2 k is connected to two neighboring elements of w and vice versa [see Fig. 3(a)]. An interesting property of this joint prior is that the conditional prior distributions of σ and w reduce to conjugate inverse gamma (IG) and gamma (G) distributions, respectively [30]: Equation (8) clearly shows that the variances σ 2 k and σ 2 k , for k = k , are conditionally independent and that the correlation is introduced via the auxiliary variables w. It also shows that the maximum of the conditional distributions of σ and w can be analytically evaluated (as detailed in Section IV). These interesting properties will no longer hold if one marginalizes over the latent variables w, which is not recommended. The coefficients associated with this prior are ζ and w 0 . The condition ζ > 1 ensures the existence of a mode for the gamma distribution (necessary for the considered algorithm). A high value of ζ enforces more correlation between the variances σ 2 k . In this paper, we used ζ = 1000, and the value of w 0 has been empirically adjusted using the formula

D. Hyperparameter Priors
As previously explained, the hyperparameters 2 k are closely related to the signal energies via the norm (s T k H −1 s k ).
Considering this property and the continuity of the signal suggest the presence of a correlation between the parameters 2 k . This correlation can be introduced by considering a gamma-MRF prior for ( , v) as follows [30]: where v are auxiliary variables and η > 1 is the coupling parameter. A schematic description of the variable correlations is shown in Fig. 3(b), which is similar to that presented in Section III-C. The conjugate conditional prior distributions for and v are given by The parameters (η, v 0 ) play the same role as (ζ, w 0 ) and have been fixed to η = 1000 and v 0 = w 0 in the rest of the paper.

E. Posterior Distributions
The parameters of interest associated with the proposed Bayesian model are X = (S, σ, w, , v). The joint posterior distribution of this Bayesian model can be computed as follows: where we have assumed a priori independence between the parameters. For simplicity, f (x|θ) has been denoted by f (x) when the parameter θ is a user-fixed parameter. The MMSE and MAP estimators associated with the posterior (11) are not easy to determine. In this paper, and akin to [41], we propose to evaluate the MAP estimator by using an optimization technique maximizing the posterior (11) w.r.t. the parameters of interest. At this point, it should be noted that the proposed algorithm [denoted as smooth signal estimation (SSE)] has many differences from the CD-BM algorithm proposed in [12]. The latter assigns a smooth evolution constraint on the altimetric parameters, while SSE operates on the observed signals by imposing smoothness on the signals and their energies. In addition, the SSE outputs are filtered signals that can be combined with classical altimetric algorithms to achieve classification or other tasks, while CD-BM only provides the estimated altimetric parameters.

IV. COORDINATE DESCENT ALGORITHM
This section describes the optimization algorithm maximizing the posterior (11) w.r.t. the noiseless signals, the noise Update σ (t) according to (20) 10: Update w (t) according to (22)  11: Update (t) according to (21) 12: Update v (t) according to (23) 13: Set conv = 1 if the convergence criteria are satisfied 14: t = t + 1 15: end while variances, and the hyperparameters. This provides the MAP estimator of the parameters of interest X. An equivalent problem is to minimize w.r.t. X, the negative log-posterior C(X) = −log[f (X|Y )] denoted as "cost function" and given by (after removing unnecessary constants) . Because of the large number of parameters in X = (S, σ, w, , v), we propose a CDA [37], [38] that sequentially updates the different parameters. More precisely, in each step, the posterior distribution is maximized w.r.t. one parameter, the others being fixed. This process is repeated until the algorithm has converged to a local minimum of the cost function C (S, σ, w, , v). Thus, the algorithm iteratively updates each parameter by maximizing its conditional distribution, as described in Algorithm 1. The next section describes the suboptimization procedures maximizing the cost function C(X) w.r.t. the noiseless signal S, the noise variance σ, and the hyperparameters (w, , v).
1) Updating the Parameters: The noiseless signal S can be updated by maximizing the conditional distribution associated with each independent s k , which is a Gaussian distribution given by where Therefore, the noiseless signal S can be updated using (14), which is the maximum of the Gaussian distribution (an overline is used to denote the mode associated with the updated parameters). Note that this solution corresponds to a least squares solution of the quadratic problem w.r.t. s k shown in (12). Note also that the matrix inversion in (14) should be computed at each descent step leading to a high computational cost. Thus, the proposed algorithm considers a useful modification to achieve this computation with less operations, as discussed in the Appendix. The conditional distributions of σ and (respectively, w and v) are inverse gamma distributions (respectively, gamma distributions) as follows: The mode of each distribution is uniquely attained and given by These modes are used to update the parameters σ, , w, and v, as shown in Algorithm 1.
2) Convergence and Stopping Criteria: The limit points of the sequence generated by the CDA are stationary point of (12) provided that the minimum of that function w.r.t. X along each coordinate is unique and that the function C is monotonically nonincreasing along each coordinate in the interval from x t i to x t+1 i (see [37,Proposition 2.7.1]). This is easily checked for all the parameters, since they have unimode conditional distributions (Gaussian, gamma, and inverse-gamma distributions), the Gaussian and gamma distributions are log-concave, and the inverse-gamma distribution is decreasing on each side of its maximum. The cost function is not convex; thus, the solution obtained might depend on the initial values that should be chosen carefully. In this paper, the parameters have been initialized as follows: σ (0) = s Algorithm 1 is an iterative algorithm that requires the definition of stopping criteria. In this paper, we have considered two criteria, and the algorithm is stopped if either of them is satisfied. The first criterion compares the new value of the cost function to the previous one and stops the algorithm if the relative error between these two values is smaller than a given threshold, i.e., where |.| denotes the absolute value and ξ is the threshold that has been fixed to ξ = 0.001. The second criterion is based on a maximum number of iterations T max = 100. The next sections study the behavior of the proposed algorithm when considering synthetic and real signals.

V. EXPERIMENTS
This section evaluates the performance of the proposed algorithm with synthetic and real data. It is divided into six parts whose objectives are: a) description of the parametric model used to generate the synthetic altimetric signals, b) introduction of the criteria used for the evaluation of the algorithm quality, c) choice of the number of signals to be processed jointly, d) and e) evaluation on synthetic data of the proposed algorithm when combined with parameter estimation or classification strategies, and f) validation on real Jason-2 altimetric data.

A. Conventional Altimetric Model
The altimetric model, in its simplified version, accounts for three parameters that are the amplitude P u , the epoch τ , and the significant wave height μ. The resulting mathematical nonlinear model for the altimetric signal is known as the "Brown model" and is given by [2], [7] where and where erf(t) = 2 √ π t 0 e −z 2 dz stands for the Gaussian error function, t is the time, τ s = 2τ c (respectively, τ ) is the epoch expressed in seconds (respectively, meters), c is the speed of light, and γ and σ 2 p are two known parameters (depending on the satellite and on the measurement instrument). The nonlinear model described in (28) is commonly used in the altimetric community mainly because of its simplicity [2], [7]. Note that the discrete altimetric signal is gathered in the vector s = (s 1 , . . . , s K ) T , where s k = s(kT ), T is the time resolution, and Θ m = [θ 1 (m), θ 2 (m), θ 3 (m)] = [μ(m), τ(m), P u (m)] is a (1 × 3) vector containing the three altimetric parameters μ, τ, P u for the mth signal.
The altimetric signals are corrupted by speckle noise that, thanks to the averaging that takes place on-board of the satellite, can be approximated by additive Gaussian noise, as shown in [42]- [44]. Thus, the altimetric model satisfies (2). The noise variances σ 2 k , k ∈ {1, . . . , K}, obtained after averaging on-board the satellite, are correlated due to the nature of the speckle noise (this correlation will be considered in the proposed Bayesian scheme). Note that this paper only considers oceanic observations, which generally show a smooth variation between successive signals.

B. Evaluation Criteria
For synthetic signals, the quality of the proposed algorithm can be evaluated by comparing the noiseless signals s m to the denoised signals s m using the reconstruction signal-to-noise ratio (RSNR) given by [45] RSNR = 10 log 10 Note that a high RSNR corresponds to a good denoising result. The benefit of the proposed algorithm can also be measured by using the filtered signals as an input to classical altimetric algorithms, as highlighted in Fig. 2. In this paper, we evaluate the enhancement obtained on the estimated parameters and the signal classification results. The latter is evaluated using the accuracy criterion given by where z m (respectively,ẑ m ) denotes the actual (respectively, estimated) label of the mth signal and δ denotes the Dirac delta function. The effect on the altimetric parameters is evaluated when considering the well-known gradient descent (GD)-based strategy that is commonly used by the altimetric community [7], [11]. For synthetic data, the true altimetric parameters are known, and the estimation quality is evaluated using the rootmean-square error (RMSE) and the standard deviations (STDs) of the estimator θ i as follows: for i ∈ {1, . . . , 3}, where θ i (n) (respectively, θ i (n) ) is the true (respectively, estimated) parameter for the nth signal and N is the number of simulated signals (N = N for synthetic signals). When considering real signals, the performance of the proposed algorithm is qualitatively evaluated by a visual comparison between the noisy signals/parameters and the denoised ones [7], [10], [11]. Quantitatively, a modified parameter STD is computed using (30), in which the averaged parameter value is approached by the mean of the estimated parameters along each N = 20 successive signals. This modified STD is called "STD at 20 Hz" [33], [34], [46], [47].

C. Choice of the Number of Signals to be Processed Jointly
This section studies the behavior of SSE when varying the number of the jointly denoised signals M . The SSE parameters have been empirically fixed to ϑ = 30, ζ = η = 1000 in the rest of the paper. Note, however, that they can be changed according to the recommendations provided in Section III. For this study, N = 5000 signals are generated according to the  Fig. 4). The obtained signals are corrupted by a multiplicative speckle noise generated according to a gamma distribution with shape and scale parameters equal to (L, 1/L), L = 90 being the number of looks. This leads to an SNR = 19.55 dB. The noisy signals are processed by the proposed algorithm using different set lengths, as shown in Table I. For example, for a length set M = 250, the algorithm is run 20 times to process the N = 5000 signals. Overall, these results show an ≈11 dB improvement in the processed data with an increasing RSNR w.r.t. M . By increasing the length M , higher computational cost [mainly due to the computation of the singular value decomposition (SVD) in (32)] is required, whereas small values of M lead to more iterations. The value M = 500 represents a good compromise, and we consider this value for the rest of the paper.

D. Parameter Estimation Results on Synthetic Data
This section compares the proposed SSE algorithm with some state-of-the-art algorithms when considering a parameter estimation task. We consider three filtering algorithms that are the SBM3D 2 algorithm [22], the speckle noise variant of PPB 3 [23], and the SVD filtering strategy that is commonly used by the altimetric community [13], [14] (with a threshold equal to 84%). We also consider the CD-BM algorithm specialized on parameter estimation [12]. Following [8], [46], and [47], the study is carried out using μ (with fixed τ = 14.5 m and P u = 130). For each value of μ ∈ [0.5, 8] m, 500 synthetic signals are generated using the Brown model with different noise realizations (with L = 90 looks) and processed using the considered algorithms. Table II reports the obtained RSNR of the considered algorithms for different values of μ. The best performance is obtained with the CD-BM, since it accounts for more physical information through the considered altimetric model. The PPB and SBM3D provide better results than SSE; however, this improvement is achieved at a price of a higher computational times, as will be seen in the next section (see Table IV). Note   also that the algorithms performance are relatively stable w.r.t. the variation of μ.
As previously explained, the results of the filtering algorithms can be used to improve classical tasks such as parameter estimation and signal classification. In this paragraph, we highlight the filtering benefit on the estimated physical altimetric parameters (μ, τ, P u ). The estimated parameters are compared when applying the CD-BM algorithm [12] and the classical GD estimation algorithm [7], [11] on filtered and noisy signals. Fig. 5 shows the obtained parameter RMSEs w.r.t. μ when considering the CD-BM algorithm, the GD algorithm (without filtering), and the filtered signals denoted by SVD + GD, PPB + GD, SBM3D + GD, and SSE + GD. As expected, the CD-BM provides the best performance, since it is designed to achieve the parameter estimation task. Apart from CD-BM, the proposed strategy presents the best performance with the lowest RMSEs. Indeed, at a typical μ of 2 m, the proposed SSE-GD improves the RM-SEs of the GD algorithm by a factor of 4, 6, and 3 w.r.t. the parameters μ, τ , and P u , respectively. This parameter improvement is also highlighted in Fig. 4, which shows the estimated parameters when considering the settings of the previous section (for clarity purpose, we only show 200 signal parameters). It is clear from this figure that SSE-GD (blue lines) provides smoother results that better approximate the actual parameters (black lines) than the other algorithms. These results highlight the interest of the proposed strategy in denoising the altimetric signals and improving the estimated altimetric parameters.

E. Classification Results on Synthetic Data
This section highlights the benefit of the SSE in improving the classification of altimetric data. As previously, SSE will be compared to SVD, PPB, and SBM3D algorithms. The altimetric classification algorithms are generally not available. Therefore, we have considered a simple support vector machine (SVM) strategy 4 to classify two kind of waveforms after the denoising step (see Fig. 2). The first class includes Brown waveforms (i.e., class 1 from the Pistach project [3]), while the second includes the sum of the Brown signal and a Gaussian peak in the trailing edge of the waveform (i.e., class 13 from the Pistach project), where examples are shown in Fig. 6. The SVM has been trained using 2000 noiseless echoes from each class with random parameters in the intervals μ ∈ [2,4] m, τ = 31 gates, P u ∈ [50,200], a g ∈ [10,200], p g = 60 gates, σ g ∈ [4,6] gates, where (a g , p g , σ g ) denote the amplitude, position, and square root of the Gaussian peak. The classification was achieved on 4000 signals that have been generated using the parameters μ ∈ [2,4] m, τ = 31, P u ∈ [50,200], a g ∈ [10,200], p g = 60, σ g ∈ [4,6] and corrupted by a multiplicative noise with L = 90 looks, as in the previous sections. Table III reports the obtained classification results with and without the application of the studied filtering algorithms. It can be seen that the filtering algorithms improve the classification results, and that SSE performs better than the other strategies.

F. Results on Jason-2 Real Data
This section is devoted to the validation of the proposed SSE denoising algorithm when applied to a parameter estimation task on the oceanic Jason-2 dataset. The data represent a 36-min time series and consist of 45 000 real signals that were extracted from pass 30 of cycle 35. The signals contain K = 104 temporal gates, where each gate corresponds to a distance of 46.8 cm. Fig. 7 shows a sequence of 800 Jason-2 signals before and after SSE filtering. Note first that this sequence shows a reduced variation in the altimetric signals, which justifies the use of the proposed strategy. Moreover, it clearly shows a reduction in the noise affecting the signals after the application of the SSE algorithm especially in the tail of the signal (i.e., the decreasing part), which was most affected by the speckle noise. Fig. 8 shows the parameters estimated w.r.t. time when considering the GD (in red), SVD + GD (in green), PPB + GD (in yellow), SBM3D + GD (in cyan), CD-BM (in black), and SSE + GD (in blue) algorithms. As observed for synthetic data in Section V-D, the CD-BM and SSE + GD provide a smooth parameter evolution, which is physically more consistent, while the other algorithms present high estimation noise. This result is quantitatively confirmed in Table IV, where the smallest STDs are obtained for CD-BM and SSE + GD. This STD reduction has initiated many altimetric missions [5], [6] and is of great importance for many practical applications related to oceanography such as bathymetry. Comparing SSE + GD to GD, Table IV highlights an STD improvement by a factor of 6 for μ, 4 for τ , and 5 for P u . To better appreciate these STD improvements, it should be noted that new delay/Doppler technology is expected to reduce STDs by a factor lower than 2, as reported in [48]- [50]. This table also shows a good agreement between the means of the estimated parameters for the different algorithms (except P u that is slightly reduced by SSE + GD, as shown in Fig. 8). Finally, Table IV also compares the computational costs of the considered algorithms when processing the 45 000 signals (the result is reported for the processing of one signal). As expected, the filtering algorithms require more computational times than GD. However, the proposed SSE algorithm shows a good compromise between the computational cost and the quality of the filtering. These results confirm the good performance of the proposed strategy for the fast denoising of smooth signals such as oceanic altimetric signals.

VI. CONCLUSION
This paper presented a new Bayesian strategy for the estimation of smooth signals corrupted by Gaussian noise. The successive continuous signals can have a numerical expression or be given by a linear/nonlinear function w.r.t. some parameters. A Bayesian model was proposed to take into account the Gaussian properties of the noise and the smooth properties of the signal evolutions. The resulting posterior distribution was maximized using a fast CDA. Experiments on synthetic and real data showed that the proposed algorithm provides good filtering results at a reduced computational cost. An advantage of the proposed algorithm is that it can be combined with the available altimetric algorithms to improve their results. To highlight this, SSE was evaluated by combining it with a commonly used parameter estimation strategy and a classification algorithm. The obtained results showed a clear improvement highlighting the benefit of the proposed algorithm. Future work will include the combination of the proposed algorithm with other classification and target detection altimetric algorithms. It is worth noting that the proposed strategy is fast and generic and thus could be applied when considering other altimetric technologies such as delay/Doppler altimetry [6], [33], [34]. It can also be applied to coastal altimetric signals that have recently been the subject of growing interest [31], [32]. Finally, generalizing the proposed approach for hyperspectral images, that show smooth spectral variations, is also an interesting issue that is currently under investigation.

A. Updating the Noiseless Signal s k
The conditional distribution of s k in (13) has been computed using the following property: with Σ = (Σ −1 1 + Σ −1 2 ) −1 . The (M × M ) matrix inversion in (15) should be computed at each update of the noiseless signals, which requires a high computational cost. To avoid this cost, we divide this matrix inversion into two parts. Significant computations can be moved outside the "while" loop in Algorithm 1, whereas simple vector multiplications are kept inside the loop. To achieve this, the SVD decomposition is first applied to H −1 as follows: where D(x i ) denotes a diagonal matrix with its ith diagonal element equal to x i , r i is the ith singular value of H −1 , and V is a unitary orthogonal matrix, i.e., V V T = I M . Straightforward computations lead to the following expression for the noiseless signal update: Note that the operation V T y k in (33) and the SVD decomposition (32) are only computed once outside the loop, while the remaining vector operations in (33) are achieved inside the loop. Over the past five years, he has authored or coauthored more than 100 peer-reviewed papers. His research interests include nonstationary signal analysis and classification, nonlinear and statistical signal processing, sparse representations, and machine learning. His particular research interests are applications to (wireless) sensor networks, biomedical signal and image processing, hyperspectral imagery, and nonlinear adaptive system identification.
Dr. Honeine is the corecipient (with C. Richard) of the 2009 Best Paper Award at the IEEE Workshop on Machine Learning for Signal Processing.