On an RLS-like LMS adaptive filter

A unified and generalized framework for a recursive least squares (RLS)-like least mean square (LMS) algorithm is proposed, which adopts the cost function of the RLS to minimize the mean square error. This paper aims to explore, in a systematic way, the corresponding ideas scattered and multiple-time re-invented in the literature, and introduces a unified approach in the same spirit as in [1], which relates LMS with the Kalman filter. The proposed alternative to the RLS is favored when the matrix inversion lemma is not useful, such as the case of multivariate or multichannel data where the input is not a vector. Furthermore, all the derivations are conducted in the quaternion domain and are hence generalizable to complex- and real-valued models. The resulting algorithm has a neat form and a similar complexity to the RLS. Through experiments, the method is shown to exhibit performance close to or even better than the RLS algorithm. Other aspects, such as the choice of descent directions and variable stepsizes, are also discussed to support the analysis.


I. INTRODUCTION
The least mean square (LMS) algorithm has long been a workhorse of adaptive filtering and has, since its conception in 1959 [2], found a number of successful applications.Based on the minimization of an instantaneous approximation to the mean square error, the LMS converges to Wiener-Hopf solution w op (i.e. the optimal solution), while its 'instantaneous' nature also makes it an algorithm of choice for non-stationary signal processing.Another popular way to reach the Wiener solution, w op , is by using the method of least squares, through the algorithm called the recursive least squares (RLS) [3].Owing to no approximations in the derivation, the RLS is capable of converging in much fewer iterations than the LMS, and achieve better tracking performance.It is nevertheless not without drawbacks, such as poorer ability to track nonstationary processes or processes with outliers.Moreover, it is less numerically stable and more computationally expensive, both due to the use of matrix inversion lemma [3].To date, there has been no clear consensus on whether to use the LMS or the RLS, and the answer depends on the applications at hand, while much research effort has been dedicated to improving both algorithms.For LMS, most of its variants have focused on varying the step size, in an attempt to accelerate the convergence and reduce steady-state misalignment [4], while for RLS, the main focus has been on effective implementation and numerical robustness [5].
Many methods were proposed in an attempt to create alternative online linear algorithms which inherit the virtues of both LMS and RLS, while mitigating some of their drawbacks.Albeit introduced under seemingly different concepts(e.g.[6], [7], [8]), these RLS-like methods would benefit from being undertaken within one unifying framework.This paper intends to provide a rigorous and systematic derivation of a unifying algorithm in this class, based on such a unified approach, so that it can be more accessible and more easily utilized.The other important contribution here is that the framework was derived in the quaternion domain which offers twofold advantages; first, since quaternions are generalization of real and complex numbers so that the derived algorithms will also hold true for both these number systems; second, the adoption rate of quaternions is increasing in a variety of applications, such as modeling of 3-D rotational data (computer graphics [9] and array signal processing [10], [11], [12]), color image processing [13], [14] and source separation [15].Hence, deriving the proposed framework in the quaternion domain offers more rigor and intuition in terms of physical interpretation, algebraic generality and algorithmic elegance.

A. Quaternions and GHR Calculus
Since there is no shortage of excellent tutorials on basic quaternion algebra [16], [17], we start from the generalized HR (GHR) calculus, the recent concept which enables our proposed method to have an elegant form in the quaternion domain which is consistent with its real-and complex-valued counterparts.For detailed information, we refer to [18], [19].
Consider a general case of functions f (q) : H M ×1 → H, where q = (q 1 , q 2 , ..., q M ) T ∈ H M ×1 .Then, the quaternion gradient and conjugate gradient are respectively given by [19] ∇ q f ∂f ∂q Similar to the complex domain, it was shown that the conjugate gradient ∇ q * f yields the steepest descent direction of the function f [20].This makes the conjugate GHR derivatives a natural choice for the optimization of our proposed algorithm.
Within the GHR calculus, the conjugate derivative is defined based on a generalized orthogonal basis {1, i µ , j µ , k µ } where µ is a rotating quaternion.In our work, a simplified case where µ = 1 suffices and yields a neater form of the HR conjugate derivative [20].The left and right derivative operators, which arise out of the non-commutative nature of quaternions, are respectively defined as follows where q = q a + iq b + jq c + kq d , q a , q b , q c , q d ∈ R and i 2 = j 2 = k 2 = −1.Observe that when f is real-valued, both left and right derivatives are identical, which is the case with our analysis, and from now on, we will therefore implicitly refer to the left derivative.
The key benefits of the GHR calculus are the novel product rule and chain rule (not present in the HR calculus).These two rules are integral in the derivation of the proposed framework.For the product rule, if the function f, g : H → H are realdifferentiable, then so too is their product, that is, where q g * = gq * g −1 is a quaternion rotation [18].For the chain rule, if g : S → H and f : T → H are real-differentiable at the respective interior points, q ∈ S ⊆ H and g(q) ∈ T ⊆ H, then the derivative of the composite function f (g(q)) is given by ∂f (g)

B. Adaptive Filters
For completeness, we next briefly summarize the LMS and RLS algorithms in their most general form -the widely linear quaternion model (for general non-circular data [21]).
Denoted by y n , x n ∈ H , n = 1, ..., N the desired (output) signal and input signal, respectively, the linear MSE estimator of y(n), denoted by ŷn , can then be expressed as [22] where ŵ is an estimate of the optimal solution w op , q n is defined as and T for a filter of order M .For a simple linear model (i.e.data is circular [21]), The goal of an online linear predictor is to minimize the mean square error (MSE) recursively, with the weight update expressed as [18] w In the case of the LMS algorithm, eq.(6a) becomes an instantaneous estimate, that is, which as a result gives the following update [23] w where e n is a priori error, defined as and α n ∈ R is an adaptive stepsize.For the normalized LMS, α n can be found via the minimum disturbance principle [3] as where > 0 is a regularizer which prevents numerical instability when q n 2 → 0.
On the other hand, the RLS filter iteratively solves the least square solution, ŵ, of the given set of training pairs y n , x n via the following deterministic cost function [25], where 0 < λ < 1 is a forgetting factor used to suppress the effect of early data which may be no longer relevant to the current estimate.The core idea of the proposed method is to derive an LMS algorithm which approximates MSE in a manner similar to eq. ( 12).We did not describe the whole RLS algorithm owing to page limits.

A. Derivation in Quaternions
Inspired by the RLS cost function in eq. ( 12), we reformulate the estimation of the MSE in eq.(6a) to Here, the factor 2 is present to normalize the expressions involving GHR derivatives [19] (which actually can be absorbed into the stepsize α n ).Then, we substitute eqs.( 3) and (6b) into eq.( 13) to yield where R{•} is a real-part operator with We now follow the optimization path of LMS with the MSE estimate as in eq. ( 12).Through eqs.( 1) and ( 2), the a priori gradient g n|n−1 is defined as Then, the expression of the quaternion weight update becomes where d n is the descent direction, while the stepsize α n ∈ H, unlike eq. ( 7), is non-commutative and has to post-multiply d n so that it will not lead to Sylvester's equation which has no closed-form solution [26].One convenient way to find α n is by using exact line search (equivalent to method of minimum disturbance in the NLMS case [27]), where ∂J n (w n )/∂α n is set to zero.Upon substituting into eq.( 7), we obtain where Observe that d n , h n|n is real-valued.Now, to reuse past calculations efficiently, a posteriori gradient g n|n is defined as and after some manipulation, we arrive at the following dual recursion

B. Choice of Descent Directions
Most works concerning the RLS-like LMS algorithm use the conjugate gradient (CG) direction because it is the fastest first-order solver of a quadratic-form equation; however, due to the time-varying statistics of R n , most proposed CG techniques fail to sustain the conjugacy property (see [6] and reference therein).To overcome this issue, some methods were introduced to relax the conjugacy constraint, like control Lyapunov functions [7] and a CG-like technique [8].All these methods can be considered under one unifying umbrella termed Markov conjugacy.Definition 1.A set of descent directions {d 1 , d 2 , ..., d n } is Markov conjugate if, at any iteration i, In the follow-up paper, we show this idea can easily generalize into a more sophisticated technique.With only one degree of freedom in eq. ( 24), the Markov conjugate direction is expressed as Pre-multiplying with (d n−1 ) H R n and substituting eq. ( 24) into eq.( 25) yields where and after some manipulation, we arrive at the another dual recursion An interesting simplification is the steepest descent scheme where β n = 0, ∀n so that d n in eq. ( 25) becomes and it can be shown that α n will become real-valued, i.e., This is very beneficial in that all the techniques of variable stepsize proposed in the real domain [28] are directly applicable in quaternions, i.e.
where n ∈ R is an adaptive regularizer [29].The unified routine of the RLS-like LMS algorithm, which we shall refer to as the m-NLMS i.e. the NLMS with memory, is summarized in Algorithm 1, where δ > 0 is used as a stopping criterion.Update q n by x n according to eqs. ( 4) and ( 5);

IV. NUMERICAL EXPERIMENTS
A set of numerical case studies was performed to benchmark the validity of the m-NLMS algorithm against the original LMS and RLS algorithms.
For rigor, the simulations were conducted within a widely linear quaternion framework which has found its potential in MIMO channel identification [30].Here, a widely linear quaternion moving average of order 3 -WLQMA(3) was used, where the input x n was drawn from 1000 samples of the same distribution N (0, 1) for each component of x n , with a moderate SNR of 40dB.Every experiment was conducted over 100 independent realizations and the m-NLMS used in the experiments, if not specified otherwise, was the steepest descent (SD) scheme.For a fair comparison, the stepsizes α n for the original LMS routines was calculated using the minimum disturbance principle as in eq. ( 11) in which the original LMS yields NLMS.For convenience, we shall omit the term 'quaternion', as our primary attention is paid to the algorithmic mechanism.The metric used to benchmark the algorithms was the normalized misalignment, η n , defined as In the first experiment, we tested how our new algorithm performed against the NLMS and RLS with different parameter settings but with a fixed n = 0.01 just to ensure numerical stability.As seen in Fig. 1, the m-NLMS converged faster and achieved lower steady-state misalignment than the standard NLMS.At λ = 0.99, the m-NLMS also converged faster than RLS while at λ = 0.95, the m-NLMS converged slightly slower but achieved better misalignment than RLS.Fig. 2 demonstrates the performance of the Markov-conjugate (MC) m-NLMS.At λ = 0.99, the MC scheme converged slightly faster than the SD scheme and achieved the same misalignment as the RLS while at λ = 0.95, it converged even faster than the RLS but achieved worse misalignment than the SD scheme.In other words, the m-NLMS, regardless of its descent schemes, behaves on par with the RLS.Fig. 3 shows the results of the second experiment which examined the variable-stepsize version of the m-NLMS.Three fixed-stepsize routines (RLS and m-NLMS) were benchmarked against three variable-stepsize ones (NLLS and m-NLMS with GSER scheme [28]).By inspection of m-NLMS at λ = 0.99 and 0.95 as well as comparison of NLMS between Fig. 3 and Fig. 1, it is clear that making n variable reduces misalignment but also convergence rate.A closer inspection of m-NLMS between λ = 0.99 and 0.95 reveals that the reduction in Fig. 4: Misalignment of RLS and m-NLMS algorithms with different value of with λ and variable stepsizes both with and w/o GNGD scheme, averaged over 100 trials, for the identification of a WLQMA(3) process, at an SNR of 40dB misalignment for the case λ = 0.95 is more significant than that for λ = 0.99.It is also feasible to make the variable stepsize more effective by tracking whether the learning process is in the transient or steady state.A technique like GNGD [29] could be incorporated to render the algorithm achieving even lower misalignment, without trading off convergence speed, as illustrated by a dashed line in Fig. 4. We did not provide further details on GNGD m-NLMS and experiments on realworld signals, like 3-D wind, which will be covered in the follow-up paper and supported with more analysis.Intuitively however, it would behave as consistently as in this paper.

V. CONCLUSION
In an effort to explore the space of algorithms spanned by the LMS and RLS, we have built on the work in [1], based on the Kalman filter, to provide a unified and generalized approach to a class of RLS-inspired LMS adaptive filters.This has resulted in the proposed m-NLMS framework which has been derived in the quaternion domain so that the resulting algorithm, which has the complexity comparable to the RLS, can be straightforwardly simplified to complexand real-valued cases.Two options of descent direction have also been derived rigorously, under the unifying notion of Markov conjugacy proposed in this paper.Since the m-NLMS is technically a variant of LMS, it is more numerically stable than RLS, as no matrix inversion is involved in the update.The experiments have demonstrated that the m-NLMS has been able to achieve performance which is more or less identical to the RLS, regarding both convergence rate and misalignment.Sharing the virtues of both LMS and RLS, the m-NLMS has been shown to admit further improvement through a variable stepsize.The experiments in the system identification have supported the analysis.