Ocean Data Assimilation using Sequential Methods based on the Kalman Filter

The main purpose of this chapter is to review the fundamentals of the Kalman Filter for ocean data assimilation and to expose the basic ingredients of practical assimilation algorithms developed for applied ocean research and operational forecasting, focusing mainly on high-resolution applications. Important implementation issues such as the reduction in dimensionality of the estimation problem, the simpliﬁcation of the schemes based on static error covariances, the formulation of low-rank ﬁlters, the problem of consistency veriﬁcation, and the concepts of adaptivity and incremental analysis updating will be addressed using scien-tiﬁc and operational examples. Finally, the discussion will conclude with a number of key questions related to the assimilation challenges of the next decade.


Introduction
Operational ocean prediction systems are being developed with a variety of objectives in mind, such as ocean current hindcasting and shortrange forecasting, estimation of the thermodynamic state of the ocean for seasonal and climate predictions, and production of retrospective analyses of the changing ocean through the merging of models and data.In addition to simply combining model estimates with observations, data assimilation provides a means to systematically compare theoretical : models with reality, leading to potential improvements in modelling and observing systems.It is therefore likely that applied ocean research will benefit strongly from operational progress, and vice versa.The question concerning the dominant energetic activity of the mesoscale ocean, its non-deterministic nature and the interactions with the large-scale circulation make the challenge unique, requiring sophisticated numerical models and assimilation methods that make the best use of sparse observations.To produce reliable forecasts, the models must be initialized with conditions that represent as accurately as possible the actual state of the ocean at eddy-resolving resolution.Due to the chaotic properties of ocean dynamics, the forecast range cannot be extended beyond the limit of predictability of the system and the model has to be reinitialized intermittently by correcting the forecast with the most recent observations.Fortunately, the arrival of satellite observations, in particular satellite altimetry, has provided the observational basis needed to respond appropriately to the "high-resolution challenge".In order to extract the best possible information from the new data, it is necessary to assess how reliable the model forecast and the observations are.Therefore, error estimates on the measurements and the model prediction are inherent in the assimilation process.
Data assimilation is traditionally formulated as a least-squares estimation problem.Among the various methodological approaches, the theory of optimal statistical estimation, and more specifically the Kalman filtering approach, is well suited to provide a solution to the Best Linear Unbiased Estimation.Since Kalman in 1960, sequential filtering methods have been thoroughly explored and applied to state estimation.An extended version of the Kalman Filter (KF) has been derived for nonlinear models, known as the Extended Kalman Filter (EKF) [Jazwinski, 1970;Gelb, 1974].In spite of a fairly simple theoretical framework, the question of its applicability in assimilating observations into highresolution, non-linear numerical models of the ocean circulation is far from trivial.As stated by Courtier [1997], the scientific di culty associated with data assimilation is in finding algorithms which simplify the search for an a ordable solution in terms of computer resources, while preserving some of the essential characteristics.A hierarchy of approximations to the Kalman filter has been defined to make the methodology suitable for solving large-dimension problems.These developments represent a substantial part of the research e ort devoted to oceanic and atmospheric data assimilation over the past 10 years.
Compared with variational approaches such as the 4D-VAR, statistical algorithms require less initial investment in terms of coding and are naturally designed to incorporate gradual developments.This is one of several reasons why most assimilation methods used today in operational forecasting systems are inspired by the statistical approach.In the context of GODAE (Global Ocean Data Assimilation Experiment), one objective will be to test and compare the bunch of algorithms implemented in operational systems, in order to better understand the importance of the various possible approximations made in each of these.
The objectives of this chapter are to review the fundamentals of sequential data assimilation for ocean state estimation and to expose the basic ingredients of practical assimilation algorithms developed for applied ocean research and operational systems, focusing mainly on highresolution applications.Section 2 is dedicated to the fundamentals of applied estimation methods leading to the KF equations.Numerical representations of the mathematical objects introduced by the theory will then be illustrated using oceanographic examples in Section 3. Section 4 will provide a brief description of traditional simplifications of the KF.In Section 5, we discuss various approaches to reduce the size of the estimation problem and, in Section 6, we derive the framework of low-rank Kalman filters.The important question of the verification of consistency will be addressed in Section 7, where the concept of adaptivity is also mentioned.Finally, a number of advanced implementation issues such as the transition to incremental/smoothing algorithms will be discussed in Section 8, before concluding the chapter.

Problem definition
In this section, we introduce the basic assimilation problem in the state space using the conventional notations proposed by Ide et al. [1997].The goal here is not to present a rigorous and comprehensive derivation of the Kalman filter, which can be found elsewhere in dedicated text books (e.g., Gelb [1974]), but rather to introduce a simplified framework that still contains the essential characteristics needed to illustrate the more advanced concepts and implementation issues discussed in the following sections.
To start, let us assume that some a priori knowledge about the state of the ocean is available at time t i , represented by vector x a i .A physical model is also available to describe the transition of the state vector from time t i to time t i+1 , which is represented by a numerical (matrix) operator M(t i , t i+1 ).A linear model will be considered at this stage.The dimension of the state space is noted as n.The state vector x contains the minimum set of independent variables needed to perfectly characterize the state of the system at any time.The model can be used to simulate the transition of the state vector up to time t i+1 , where x f i+1 is the vector describing the "forecast" state of the system.At instant t i+1 , another piece of useful information is available about the state, collected in the observation vector y i+1 of dimension p. From the two independent pieces of information x f i+1 and y i+1 , how can the true state of the system x t i+1 be best estimated at time t i+1 ?To answer this question, we need to know more about the precision of the di erent pieces of information.

Uncertainties and PDFs
The precision of the forecast x f i+1 can be quantified in terms of errors on the initial guess and numerical model errors.The di erence between the initial guess x a i and the true state at time t i is the error vector noted Of course, its value is unknown but we can make a number of assumptions about its statistical properties: we will assume that the estimation x a i is unbiased ( a i = 0, where the overbar represents the expected value), and its error a i is distributed as a gaussian, multivariate random variable.The corresponding probability density function (pdf) is a i N (0, P a i ) exp 1 2 (2) where P a i = a i a T i is the n × n error covariance matrix associated with x a i and T denotes the transpose.Error covariances are formally obtained by multiplying an error vector by its transpose and averaging over many realizations, leading to symmetric and positive definite matrices.Morrison [1988] contains excellent background information on multivariate statistical methods.Similarly, the model operator M(t i , t i+1 ) is imperfect and the simulation error is noted as follows: Again, the individual realization of this error is unknown (otherwise a perfect model operator could be run) but its statistical distribution is assumed to be gaussian and centered ( = 0): where Q = T is the n × n model error covariance matrix.We assume in addition that a i and are uncorrelated: a i T = 0.In general, these statistical assumptions are quite crude approximations of the actual error distributions (in particular, a bias in the model is very common), but they are very convenient in deriving a baseline of optimal estimation.A schematic description of the error diagram in the state space is illustrated in figure 1.With the definitions introduced above, the forecast error f i+1 can be broken down as : and the statistical properties of the forecast error can be determined easily if the model is linear.Indeed, Eqs. ( 5), ( 2) and (4) implies that the forecast state is unbiased ( f i+1 = M a i + = 0) and normally distributed, i.e.
with the forecast error covariance matrix given by: This equation is the first fundamental equation of the KF which can be interpreted as follows: the error on the initial state is transformed during the forecast step by the model dynamics (the error being amplified by unstable modes, while it is damped by stable modes) and by the model imperfections which increase the forecast error covariance.Understanding the actual benefits and the practical limitations of error covariance propagation through model dynamics has been a major research issue over recent years.Although simple in its algebraic form, this equation contains several major di culties that prevent an explicit computation in the context of realistic models, as discussed in the following sections.
In order to optimally combine the forecast with new data, the precision of the observations arising at time t i+1 must be quantified.The vector of observations is related to the true state as follows: where H is the observation operator which computes the equivalent of the observations from the model state.The observational errors o i+1 are assumed to be centered ( o i+1 = 0), uncorrelated with the forecast error and having a covariance matrix R = o i+1 o T i+1 .Observation errors measure the misfit between the data and the equivalent of the observations in the true state, i.e.Hx t i+1 .They include not only the errors of the observational system but also the errors associated with the operator H arising, for example, from the numerical interpolation of the data.Again, a gaussian pdf can be assumed for the statistical distribution of the observation errors to make the rest of the development easier and derive the KF equations.

Optimal analysis
The pdf given by ( 6) determines the a priori statistical distribution of the true state P (x t i+1 ), while (9) provides the probability of getting measurements y i+1 given the true state, i.e.P (y i+1 | x t i+1 ).It is then straightforward to deduce the a posteriori probability of the truth, given the observations, by using the Bayes formula: The state maximizing the posterior probability distribution is the maximum likelihood solution of this inverse problem.A comprehensive formulation of data assimilation and inverse methods using the bayesian approach has been proposed by van Leeuwen and Evensen [1996].In Eq. ( 10), the denominator is just a scaling factor (the integral of the numerator for all possible states) which can be ignored in determining the maximum of the posterior pdf.The gaussian distributions ( 6) and ( 9) imply that (11) and the optimal estimation of x t i+1 is the state vector maximizing (11) or, equivalently, minimizing As a result of error definitions, the optimal combination of the forecast and observed information corresponds to the minimum of the cost function This quadratic form contains two terms measuring the misfit with the data and the misfit with the forecast, weighted by the inverse of their respective error covariances.Using the calculus of variations, an implicit equation for the optimal state noted x a i+1 is obtained: which can be solved explicitly after some algebra, yielding The optimal state is obtained by correcting the forecast with a weighted measure of the misfit between the observations and the prior estimate (i.e. the innovation vector d i+1 = y i+1 Hx f i+1 ).The analysis is thus simply the result of the combination of two Gaussian probability density functions.The weight matrix of dimensions n × p is the so-called Kalman gain, which involves the forecast and observation error covariance matrices.It can be interpreted as a ratio between the error variance of the forecast and the total error variance (the sum of the forecast and the observation error variance) projected in the observation space: the larger the forecast errors, the larger the correction to the forecast.In the limit of perfect observations (R 0) of the whole state vector (H I), the Kalman gain matrix converges towards the identity and the optimal estimate becomes a perfect fit of the observations.By contrast, if the forecast is extremely accurate (P f 0) compared with the observations, the correction is negligible.Interesting similarities between this equation and scalar formulations of least squares problems can be found in Kalnay [2003].Equation ( 15) is the second fundamental equation of the Kalman filter.

The sequential assimilation cycle
The optimal state estimate (15) at time t i+1 can be used as the initial conditions for a new forecast up to time t i+2 when new observations become available, and the process can be repeated recursively.To sum up, the algorithm of an assimilation cycle contains two main steps: the forecast step for transitioning the model state and the associated error covariance between time t i and time t i+1 , and the analysis step for correcting the forecast using the data available at time t i+1 .We reproduce here the complete set of the KF equations extended to non-linear models M and observation operators H.
Starting from initial conditions x a i and P a i , the forecast step equations are: and where M is the tangent linear operator derived from M(t i , t i+1 ).Thus, a linearization of the model about the non-linear evolution between t i and t i+1 is performed to propagate the error covariance.The forecast step is followed by an analysis step in which y i+1 is used to correct x f i+1 : using the Kalman gain where H is the gradient of H computed about x f i+1 .It can be demonstrated that the matrix K i+1 corresponds to the minimization of the trace of the analysis error covariance on x a i+1 [Miller 1989], given by This allows us to write the gain also as provided that R is invertible.Equation ( 21) shows that the uncertainty in the forecast is reduced during the analysis according to the amount of additional information assimilated in the system.A sequential assimilation run is then conducted by repeating this forecast/analysis cycle in sequence.Since only data from the past influence the best estimate at a given time, the assimilation procedure belongs to a class of filtering methods.These contrast with smoothing methods (e.g.Fukumori [2001]) in which data from both the past and the future are used to estimate the optimal state of the system at a given time.The analysis error covariance reflects the competition in the Kalman filter between this accumulation of past information and the error growth due to instability mechanisms and model imperfections.Figure 2 conceptually illustrates the filtering process in sequential data assimilation.

3.
From theory to real ocean applications The Kalman Filter has been primarily developed in the context of ballistic applications, involving dynamical models of fairly low dimension.The more recent interest in the KF in Earth sciences (numerical weather prediction or oceanography) has raised new issues related to the huge number of degrees of freedom taken into account by the models, with consequences for the size of the discretized operators and the quantity of information to be manipulated.It is therefore important to examine at this stage the practical representation of the mathematical objects introduced in the previous section, with examples taken from high-resolution circulation systems developed from an operational perspective.

3.1
The state vector and model operator The state vector x is a discrete representation of the variables involved in the description of the system state.It characterizes information about the space variability of the physical or biological quantities, and about the multivariate relationships between the di erent dynamical variables.In a numerical model of the Primitive Equations (PE), x typically contains the 3D discretized temperature, salinity, zonal and meridional velocities, and the 2D sea-surface height or barotropic streamfunction field computed prognostically at every gridpoint of a finite-di erence mesh.The typical size of a PE state vector n is approximately given by N X × N Y × (N Z × 4 variables + 1 variable), where N X , N Y and N Z are the horizontal and vertical grid dimensions.The length n of the state vector is typically 10 6 to 10 8 in scientific or operational applications.A biological state vector would contain, for instance, the concentration distributions of nutrients, plankton, dissolved and particulate matter, etc. [Carmillet et al., 2001].
Let us consider, for example, the 1/12 model configuration of the North Atlantic ocean that has been developed using the HYbrid Coordinate Ocean Model (HYCOM) with a horizontal grid size of approximately 1400 x 1400, and 26 layers in the vertical direction.Figure 3 shows a snapshot extracted from a simulation, illustrating the space variability described by the multivariate HYCOM state vector.
Note that the vertical hybrid coordinate used in HYCOM is a combination of geometric and dynamic vertical coordinates that evolves dynamically with the state of the system itself [Bleck, 2002].Due to the occurrence of outcropping layers at the base of the mixed layer, the number of discrete variables may be di erent from one timestep to another, and the dimension of the state vector is therefore dynamicallydependent.This feature slightly complicates the handling of the state vector [Brankart et al., 2003;Birol et al., 2004], compared with more conventional models based on static vertical coordinates such as OPA.
The size problems arise mainly from equations ( 18), (20) and (21) which involve the manipulation of n × n matrices.With a state dimension of 10 8 , the storage of a full n × n matrix would require a memory of 10 5 gigabytes, representing about 1000 times the capacity of the largest computers available today.For this reason, it is necessary to by-pass the explicit representation of such matrices in the algorithms.For instance, the model M(t i , t j ) will never be available as an explicit operator (even for linear dynamics) but the numerical code of the ocean model will be used as a routine to compute the time evolution of the model state and forecast error components, as required by ( 17) and (18).

3.2
The observation vector The dimension p of the observation vector y depends on the capacity of the observation system and the frequency of data assimilation, but in general it is much smaller than the dimension of the state vector.The data sets available to control ocean circulation models in scientific or operational exercises can be categorized into measurements from space, which primarily reflect the surface signature of the ocean circulation and ocean-atmosphere interactions, and in situ measurements devoted to the monitoring of the ocean's interior.
Oceanic quantities measured from space include essentially Sea-Surface Temperature (SST), Sea-Level Anomalies (SLA) and ocean colour (which can be used to estimate the chlorophyll concentration in the upper ocean).Note that other important satellite data types will become available in the near future, such as surface salinity measurements from the SMOS (Soil Moisture and Ocean Salinity) mission and sea-ice observations from CRYOSAT.Unlike conventional measurements from field campaigns, satellite-based instruments are operated in routine over long periods of time and provide considerable information about the ocean's horizontal and temporal variability.Figure 4 illustrates a typical SST picture obtained from composite AVHRR (Advanced Very High Resolution Radiometer) images, and a network of altimetric ground tracks covered by two satellites flying simultaneously over a 7-day period.Along the altimeter tracks, the measurements are available every 7 km and the separation between two tracks of Topex-Poseidon is about 300 km at the Equator.These data can be assimilated "along track" as shown in figure 4, providing in principle the best possible utilization of the observed information if the assimilation scheme is optimally tuned.Otherwise, the raw measurements can be transformed into gridded products using sub-optimal interpolation methods before assimilation.
The surface nature of satellite data poses specific challenges with regard to data assimilation because the surface information has to be projected downward to reconstruct the 3D-content of the ocean signal.As illustrated in the next section, the extrapolation process is achieved through the use of 3D, multivariate error covariances in the assimilation scheme.These surface data remain insu cient, however, for describing aspects of sub-surface variability, and other data available in the form of vertical temperature and salinity profiles from hydrographic casts, moorings or expandable bathythermographs (XBT) and profiling floats from the ARGO international program provide valuable information about the vertical stratification.It is therefore essential to assimilate both data types in a consistent manner.
Two additional operators related to the data must be introduced: the observation error covariance matrix R and the observation operator H (or a non-linear H).The observation operator converts the forecast state into "first guesses" of the observations.This conversion is needed because the observed variables may not be located on the model grid points so that horizontal or vertical interpolations are necessary.In addition, relationships of varying complexity may exist between the observed quantity and the model variables.One example is the ocean colour, which is related to the phytoplankton concentration through a fairly complex, and sometimes approximate, relationship.Another example is the relationship between sea-level (measured by altimetry) and the prognostic variables of a rigid-lid ocean model [Pinardi et al., 1995].In order to avoid too complex observation operators in the algorithms, it is sometimes advisable to augment the state vector with diagnostic variables (such as the surface pressure of a rigid-lid model which can be diagnosed as a function of the prognostic variables) that can be linked to the observed variables more easily.The observation error accounted for by R has di erent possible sources.One source is the instrumental error which can often be considered as spatially uncorrelated.Another is the so-called "representativeness error", associated with variability described by the data at scales that cannot be faithfully represented by the model grid.A third source is the error associated with the mapping H between the model and the observation space.The spectrum of these errors is mainly concentrated on the short scales, and it is often a reasonable approximation to represent R by a diagonal matrix of error variances.Note, however, that the reduction in the dimensionality of the estimation problem introduced in Section 5 will possibly be a source of correlated observation error.

Error covariance matrices
As mentioned above, the assimilation sequence must be initialized with some initial guess for the state x 0 and the associated error covariance P 0 .The initialization of P critically determines the functioning of the filter beyond the first assimilation stages through the process of error dynamics [Ballabrera et al., 2001].In order to understand the role played by error covariance matrices, let us consider the idealized case of an analysis step with one single observation, 0 .The observed variable, noted , is assumed to be one of the discrete elements of the state vector for simplicity, so that y is a scalar ( p = 1) and H is a single-row vector of the form H = [0, ..., 0, 1, 0, ..., 0] The Kalman gain ( 20) is then a single-column vector which simplifies as where 2 is the error variance of the single observation, p is the error variance of the observed variable (the -diagonal element of P 0 ) and {P 0 } is the -column of the error covariance matrix.The correction to the initial guess is then proportional to {P 0 } weighted by the prior model-data misfit: Thus, any row or column of the error covariance matrix can be interpreted as a multivariate influence function associated with the observed state variable.This explains the crucial role of the error structures specified in P 0 and shows the importance of considering dynamically-balanced error covariance matrices.
The last operator to be prescribed is the model error covariance matrix Q.This includes all the errors associated with the various physical parameterizations necessary in the model (mixing, di usion, turbulent closure, hydrostatic approximation, etc.), the errors in the atmospheric forcings and more generally in the boundary conditions, and the errors due to the numerical discretization on the horizontal and vertical dimensions.Note that Q represents the model error statistics accumulated during an assimilation interval, and should not be confused with errors generated at every time step.Those errors are clearly distributed over a wide spectrum of space scales, and it would be extremely di cult to prescribe a full Q matrix.By considering the prior misfit between model simulations and observations, it is possible to derive some general properties of the model errors as manifested in the observation space.These di culties promote the adoption of simplified parameterizations for Q following, for instance, the approach proposed by Dee [1991], and to adjust those parameterizations by sensitivity experiments.Note that systematic biases in the models often make the statistical assumption (4) inappropriate: such biases can be detected by examining the statistics of the innovation sequence, and in principle they should be removed from the forecast to preserve the essential properties of optimal analysis.

Simplified schemes based on static background errors
In the KF algorithm, the forecast error covariance is updated before every analysis step by using the model dynamics.Equation ( 18) can be rewritten as showing that the model code has to be used to propagate the n columns of P a i (in addition to propagate the model state itself).Given the huge size of realistic ocean models, this leads to computing requirements that even the largest computers in the world will not meet in a foreseeable future.In addition, the many imperfections inherent in the representation of the error covariance matrices P a i and Q make the explicit computation of this equation questionable.
If the flow of assimilated observations is fairly regular over time, it makes sense to assume that the errors in an assimilation sequence tend to fluctuate around some average level after a couple of cycles.This asymptotic behaviour reflects a balance between the increase of uncertainty during the forecast step and the error reduction during the analysis step.The existence of such an asymptotic limit for P f i+1 provides justification for simply using a static error covariance, noted B for "background error" (a term usually adopted when the error statistics are not propagated from one assimilation cycle to the next), instead of explicitly computing the forecast error according to (18).Di erent techniques have been developed to prescribe the background error covariances in ocean and atmospheric assimilation schemes.These are discussed below.

Optimal Interpolation
Optimal Interpolation (OI) designates a wide range of statistical assimilation schemes in which the B matrix is pre-determined empirically.The main advantages of OI methods are their cost, their ease of use and the possibility of conducting sensitivity studies to test many di erent models of error covariances.Today, the vast majority of operational systems involved in GODAE (e.g., FOAM, MFS, HYCOM, MERCATOR) are based on OI schemes.In contradiction with its name however, OI is a sub-optimal assimilation process and it would actually be more correct to designate this class of methods as "statistical interpolation" [Daley, 1991].
Any covariance matrix can be normalized into a correlation matrix C, dividing each component by the product of the error standard deviations.Thus, B can still be written as where D is the diagonal matrix of the variances.OI schemes commonly adopt further simplifications for modelling the background error covariances.Assuming that the correlations are functions of the distance only, analytical functions with open parameters such as correlation scales have been used for a long time to build C (e.g.Thiebaux [1985]).A significant property of well-posed correlation functions is their compact support which reflects the negligible influence of observations at large distances.
It is also generally assumed that the correlations can be separated into a product of horizontal and vertical correlations.The concept of separability is related to the predominant role of stratification and the very di erent scales involved horizontally and vertically in the open ocean.However, di erent behaviours are expected near boundaries and coastal regions [Echevin et al., 2000].The SOFA (System for Ocean Forecasting and Analysis) scheme used in the first operational MERCATOR prototype performs OI in the horizontal direction to combine altimeter data with the model predictions [De Mey and Benkiran, 2002].It o ers the possibility of using di erent error covariance models, such as : where the parameter a determines the horizontal correlation scale [Gavart et al., 1999].Fine tuning of these correlation parameters can be achieved objectively via sensitivity studies or Monte-Carlo experiments [Auclair et al., 2003].In the MERCATOR system, the value of a is defined geographically, reflecting longer horizontal correlation scales in the tropics than at high latitudes.
It is worth noting that some applications of ( 27) allow for the error variances to be updated during the assimilation sequence using empirical prediction schemes, while still keeping the correlation matrix unchanged [Rienecker and Miller, 1991].

Asymptotic approximation
Other interesting approaches such as the time-asymptotic filter approximation have been proposed to determine a background error covariance matrix that is consistent with the error dynamics of the KF [Fu et al., 1993;Fukumori et al., 1993;Fukumori and Malanotte-Rizzoli, 1995].In equations ( 18) , ( 20) and ( 21) above, operators M, H, Q, and R have been assumed time-invariant for simplicity.Satellite altimeters, for instance, can be considered as a time-invariant observation system given that the mission tracks are repeated in an exact manner.By combining ( 18) and ( 21), the so-called Riccati equation is obtained: which can be iterated prior to the assimilation sequence.It can be shown that the solution of the Riccati equation converges towards a steady-state solution, say B f , provided the M, H and Q matrices have desirable properties.The details of these conditions are explained in Fukumori et al. [1993].In addition, a fast convergence can be obtained by using appropriate recursion methods.The B f solution results from a balance between three e ects: evolution due to model dynamics, increase due to model errors and decrease due to the new information from assimilated data.Interestingly, the utility of a stationary B f matrix in the Kalman gain was demonstrated even in the presence of evolving properties of the observation system [Fukumori, 1995].
The interest of steady-state filters has been illustrated in a number of oceanographic case studies, although additional approximations such as order reduction and model simplifications are generally needed to make the assimilation e ective with realistic models.

5.
Reduced-order Kalman filters The previous section has shown that OI-based assimilation schemes over-simplify the propagation of errors by neglecting dynamical principles and statistical information.An alternative way to make the KF tractable with large-size models has been explored with the concept of reduced order, which aims at decreasing the computational burden of the algorithm while preserving the essential characteristics of error dynamics.Experiences from atmospheric re-analyses indicate the importance of the "errors of the day" which can be dominated by short time scales, and which are ignored when the forecast error is approximated by a constant as with OI [Kalnay et al., 1997].Similar behaviour is observed in the ocean, especially at scales dominated by instability mechanisms such as the scale of eddies, western boundary currents, etc [Ballabrera et al., 2001].
Several arguments support the concept of order reduction.Firstly, the ocean can be considered as a driven/dissipative dynamical system governed by an "attractor" of finite dimension.The existence of a global attractor has been proved for the Navier-Stokes equations, with a dimension bounded by a function of the Reynolds number [Lions et al., 1997].Geostrophy, for instance, is one of the dominant properties of the ocean attractor which cannot be ignored during the assimilation process.Therefore, it makes sense to renounce making corrections in the directions where the errors eventually die away because of the attractive nature of the system.Secondly, the state space as defined in Section 3 is determined by the number of degrees of freedom implied by the model discretization, which is much larger than the actual number of dynamical features of interest in the system.Thirdly, less is probably known about the errors than about the dynamics, and the lack of statistical information will make a full KF superfluous anyway [Cane et al., 1996], especially in the ocean where the flow of observations accumulates at a fairly slow rate.
Reducing the dimensionality of the problem can be formulated in terms of state space or error space with quite di erent implications.A variety of approaches widely used in oceanography are discussed below.

5.1
Reducing the dimensionality of the state space The reduced state can be defined explicitly using a transformation operator T to convert the full model state x (of dimension n) into a reduced state w of dimension r < n : The statistical properties of x such as error covariance matrices can be easily transformed in the low-dimension space using (30).Dynamical equations can be derived for w using a pseudo-inverse of T and the KF can be entirely reformulated in the low-dimension space, with the condition that the null space associated with (30) be dynamically uncoupled from the reduced space [Fukumori and Malanotte-Rizzoli, 1995].By introducing this transformation, the r elements of w become the actual degrees of freedom of the estimation problem.Many possibilities exist for defining the transformation, such as a truncation of the model spectrum or a selection of multivariate modes of system variability [Cane et al., 1996].Another approach explored by Dee [1991] takes advantage of physical relationships between certain model variables.The reduction of the state space dimension can also be achieved simply by building the estimation vector from a selection of model state variables on a coarser grid [Fukumori, 1995] or with dynamical variables closely correlated with the observed signal.It is worth noting that, for sub-spaces which only preserve the scales larger than those of the observed signal, an extra term must be added to the observation error covariance matrix R to account for the truncated modes [Cane et al., 1996].This kind of representativeness error typically corresponds to spatially correlated signals, and in this case o -diagonal elements must be included in R.
In the open ocean, dynamical considerations can justify the omission of some of the PE variables in the state vector, such as the horizontal velocity components which are expected to adjust very quickly to the density properties.Di erent choices of estimation space, motivated also by the faith in the error statistics to be prescribed for the selected variables, are discussed by Brankart et al. [2003] for primitive equation circulation models.
The question of assimilating satellite altimetric data into ocean circulation models has been approached many times by reducing the statistical estimation problem to the surface variables.The downward penetration of assimilated information is then achieved empirically, using a variety of statistical (e.g.Hurlburt [1986]; Mellor and Ezer [1991]; De Mey and Benkiran [2002]) or physical (e.g.Cooper and Haines [1996]; Oschlies and Willebrand [1996]) extrapolation schemes on the vertical direction.These di erent vertical projection methods, as well as the dynamical adjustment process described by Brankart et al. [2003] in the context of HYCOM, can be considered as approximations of the pseudoinverse of T needed to convert the reduced state back to the full model space after statistical estimation.

Reducing the dimensionality of the error space
A number of other methods are based on approximations of the error with fewer degrees of freedom than the model itself.Error sub-spaces are built with the aim of selectively correcting the model state along the most representative directions of the forecast error.The concept fits well with statistical assimilation schemes because the analysis increment computed by (20) can only take place within the sub-space spanned by the forecast error covariance.The unknown are the coe cients of the correction projected along the multivariate error directions.The concept of Error Subspace Statistical Estimation (ESSE) introduced by Lermusiaux [1999] has been developed on the basis of this principle.Note that recent oceanographic studies have also started to explore the potential benefit of order reduction using variational assimilation methods [Robert et al., 2005].
The sub-space can be prescribed as a time-invariant set of error directions.The success of assimilation depends essentially on the capacity of the sub-space to capture the observed variability of the system.A num-ber of applications have been conducted successfully along these lines in the tropical oceans, where the dynamics is "slow" and nearly linear [Verron et al., 1999].Alternatively, the error sub-space can be allowed to evolve using deterministic or stochastic approaches, or a mixture of both.Evolving sub-spaces are in general more suitable to track the nonlinear evolution of "fast" energetic error modes, or to dynamically adjust the unbalanced components of the sub-space [Lermusiaux, 2001].Related concepts have been introduced in the context of atmospheric data assimilation by Cohn and Todling [1996], who proposed approximate schemes for error covariance propagation in the case of stable and unstable dynamics.

Low-rank error covariance matrix
By construction, a covariance matrix is symmetric and positive definite and can always be decomposed as P = N N T .The columns of N are formed by the orthonormalized eigenvectors n k of P, and is a diagonal matrix formed with the corresponding eigenvalues k .The inverse of P is then given by P 1 = N 1 N T .The pdf defined by ( 9) can be refomulated as follows: where k is the component of the error vector in the n k direction.
A reduced-order Kalman filter can be implemented by approximating the error covariance matrix P with only the "leading" columns of N associated with the r largest eigenvalues.The pdf defined by such a low-rank matrix is obtained by taking the limit of (31) when k 0 for k > r (assuming that the eigenvalues have been sorted in decreasing order).Equation (31) shows that the probability tends to zero if k = 0 and the error vectors are confined in a sub-space of dimension r.From a stochastic point of view, the leading vectors describe the principal axes of the probability ellipsoid oriented along the dominant directions of uncertainty; from an algebraic point of view, they define the basis of a sub-space where the error is expected to lie.

5.4
Specification of error sub-spaces using EOFs Several strategies can be adopted to determine the leading directions of the error sub-space.One of them is the Singular Value Decomposition (SVD) of a covariance matrix constructed with prescribed analytical functions.Other methods have been proposed in the literature which utilize singular, Lyapunov or breeding vectors of the transition matrix (e.g., Miller and Erhet [2002]).An approach based on EOFs was put forward by Cane et al. [1996] to elaborate a reduced state Kalman filter, and by Pham et al. [1998] in the context of the Singular Evolutive Extended Kalman (SEEK) filter.
A practical way to estimate a low-rank error covariance matrix is to perform an EOF analysis of state vectors generated by a prior model simulation.The EOF analysis provides a compact description of the spatial and temporal variability of the model in terms of orthogonal functions.Usually, most of the variance of the time sequence is described by the first few orthogonal functions whose patterns may then be linked to dynamical mechanisms [Emery and Thomson, 1998].In order to compute an EOF basis, a series of s model state vectors x(t i ) is extracted at regular intervals from a free model simulation, and the vectors form the columns of a "scatter matrix" X of dimensions n × s.The normalization factor 1 s 1 is introduced so that the unbiased estimation of the covariance matrix is given by the product XX T .The size s of the sample is always much smaller than the dimension n of state vectors involved in realistic ocean models.The EOFs correspond to the orthonormalized eigenvectors N of the n × n matrix XX T which has a rank necessarily smaller than or equal to s.The explicit calculation of the matrix XX T is not required to compute the eigenmodes.Indeed, the matrix X T X of much lower dimensions (s × s) has the same eigenvalues , and its eigenvectors V allow the computation of the "large" eigenmodes N by simple multiplication, N = XV.This provides a first practical method to calculate the EOFs.A second possible approach is to perform a singular value decomposition of the matrix X directly, using standard SVD algorithms [Kelly, 1988;Emery and Thomson, 1998].This provides a decomposition of the form: The EOFs determined by the two methods are identical, but the SVD has the advantage of greater computational stability.One should keep in mind that the EOF decomposition of multivariate state vectors also depends on the units adopted to represent the di erent physical quantities, and the choice of a particular metric may be more critical than computational stability in determining the structure and ordering of the dominant EOFs.The r dominant eigenvectors form the columns of an array noted N 0 with dimensions n × r , and the diagonal matrix of the associated eigenvalues is noted 0 .The covariance can be approximated by the matrix of rank r, where S 0 = N 0 0 contains dimensional quantities.Assimilation experiments can be initialized using S 0 S T 0 as a guess of the initial error covariance.
The minimum dimension of the error sub-space has to be determined by practical considerations.A condition of stability for a reduced-rank KF in idealized conditions (linear model and autonomous system) is shown to be that r should be larger than r , where r is the number of eigenvalues of M having an absolute value larger than or equal to 1 (Pham et al. [1998]; Carme et al., [2001]).Since this number cannot be systematically evaluated with large ocean models, more pragmatic criteria must be considered to define the truncation threshold.For example, the statistical representativity of the r retained EOFs can be measured by the percentage where the k are the eigenvalues.The choice of an acceptable threshold of explained variance provides a means to determine the minimum dimension of the EOF basis for practical assimilation problems.This number typically ranges from a few tens to a few hundreds.
The assumption underlying the procedure described above is that the model is unbiased (which is a necessary condition for the Kalman filter to yield optimal estimations) and su ciently "good" for its intrinsic variability to be statistically representative of real world variability.The hypothesis that a model provides an unbiased simulation implies the use of the mean vector x(t i ) as a first guess.Sometimes, these ideal conditions are not verified with a free model simulation and more complex strategies must be developed.For example, an assimilation sequence may be carried out in a first stage with a simplified method which does not require EOFs (such as an OI or nudging scheme) so as to bring the model and observations together; an EOF analysis of the resulting fields is then performed, and a new assimilation sequence can be obtained in a second stage with a reduced-rank KF.The process can be iterated several times to provide an improved EOF basis which combines the variabilities of the model dynamics and of the observations.

Examples of sub-spaces based on EOFs
The EOFs calculated with primitive-equation models are fully multivariate and three-dimensional, and cover the whole model domain: all the variables of the state vector (SSH, temperature, salinity, zonal and meridional velocities) are considered together in a dynamically consistent manner.The extrapolation of the data from observed to non-observed variables is performed along the directions represented by these EOFs which connect all grid points of the numerical domain.(Parent [2000]).
The physical nature of an EOF basis is discussed by Parent et al. [2003], who studied the variability of the circulation in the Tropical Pacific ocean during the period 1994-1998 using a primitive-equation model of the Equatorial basin between 20 N and 20 S and a SEEK filter to assimilate satellite altimeter data.They computed an EOF decomposition of the simulated variability over the 5-year integration period and selected the first 15 EOFs to build a reduced basis for assimilation.The first dominant EOF illustrates the well-known west-east seesaw, which is the most important feature of the El Niño and La Niña phases of the ENSO phenomenon.In 1997, the warm pool of the western basin migrated towards the eastern basin and produced a positive SLA, whereas during the La Niña phase (second part of 1998), the opposite movement was observed (figure 5).
An issue of practical interest is the estimation of small correlations associated with distant variables, which is a well-known di culty of finite sampling procedures such as that described above.As pointed out by Houtekamer and Mitchell [1998], the accurate estimation of small correlation coe cients would necessitate a very large number of independent model samples to guarantee the statistical convergence of the computations.In order to keep the number of samples within tractable limits and prevent the data from exerting a spurious influence at remote distances through large-scale signatures in the EOFs, a technique based on EOFs with compact support has been setup by Testut et al. [2003] in which regional sub-domains of adjustable size are considered to characterize the error sub-space.From a theoretical point of view this approximation can be justified by the argument that, when the ocean surface is divided into local regions of moderate size, the background error in such regions tends to lie in a sub-space of much smaller dimension than the full ocean state.Similar hypotheses have been put forward to develop local ensemble Kalman filters for atmospheric data assimilation [Ott et al., 2004].This technique has been developed more specifically for eddy-resolving ocean assimilation models in which the signal dominates in the small scales.Figure 6 illustrates the surface signature and vertical extension of the three dominant modes calculated with an eddy-permitting model in the Gulf Stream region.Theses modes reflect to some extent the anisotropic nature of the multivariate covariances associated with the local dynamics, with smaller scales represented by higher modes.
The local representation of the error sub-space has been shown to be particularly e ective for capturing the mesoscale features of the turbulent ocean.In the example described by Pendu et al. [2002], an assimilative system based on local EOFs was implemented in a 1/3 South Atlantic OPA model to perform hindcast experiments for the 1993-1996 period.The assimilated data are similar to those shown in figure 4, consisting of composite AVHRR observations of SST and altimetric measurements of the sea-level anomalies.The results show that the assimilation is able to successfully reproduce the Agulhas Rings present at that time in the real ocean.In addition to correcting the variables observed at the surface, the three-dimensional multivariate properties of the EOFs also permitted a correction of the non-observed variables in the ocean's interior.The beneficial impact of the assimilation was particularly impressive on the mean salinity in the Confluence region down to about 1500 meters depth.

Low-rank Kalman filters
Several low-rank filters based on static or evolving error sub-spaces have been developed over the past ten years, such as the SEEK filter introduced by Pham et al. [1998] and the Reduced-Rank-SQuare-RooT (RRSQRT) formulation explored by Verlaan and Heemink [1997].Many features of the Ensemble Kalman Filter (EnKF) put forward by Evensen [1994] can be discussed using a similar framework.
The Ensemble OI scheme [Evensen, 2003] and the SEEK filter with static error sub-space are two sub-optimal schemes which preserve a number of important properties of statistical estimation but require only a small fration of the computer resources needed by the model.In contrast to those simplified schemes, the EnKF or the SEEK filter with evolutive sub-space propagate the error statistics according to the model dynamics, but they need the simultaneous integration of model states perturbed along each direction of the error sub-space.
The consequences of using a low-rank error covariance matrix to compute the forecast and analysis steps of the KF are examined in the following sections.

Forecast error with a reduced basis
The reduced basis concept allows drastic simplifications to compute the evolution of the error statistics over the assimilation window.A hierarchy of algorithms of increasing sophistication and computer requirements has been proposed to compute the forecast step.Assuming that the analysis error covariance is represented as a low-rank matrix at time where S a i (of dimension n×r) defines the error sub-space associated with x a i , Eq. ( 18) becomes The computer load associated with Eq. ( 36) is primarily determined by the rank r of P a i which specifies the number of model integrations needed to evaluate the forecast error covariance matrix.As originally proposed by Pham et al. [1998], this algorithmic variant known as "Extended Evolutive Basis" requires the derivation of the tangent linear model M(t i , t i+1 ) to update the error directions, i.e. the r columns of S a i .The evolved sub-space S f i+1 reflects how the model dynamics a ects uncertainty during the forecast.
An alternative scheme to Eq. ( 36) can be used for the calculation of the time evolution of the reduced basis as follows : where {} j is the jth column of the matrix, and is an adjustable parameter that determines the size of the perturbations along each error direction.Equation ( 37) is a finite-di erence approximation of the linear error evolution if is small.However, the value of is usually taken to be of the order of 1 to simulate the non-linear evolution of model perturbations that have an amplitude comparable to the error covariances.This algorithm known as "Interpolated Evolutive Basis" has a two-fold benefit: firstly, it avoids the computation of the tangent linear model which numerically can be a delicate task; and secondly, it seems more robust with regard to the model non-linearities because the finite di erence takes into account the amplitude of the uncertainties, while the classic linearization does not.Due to the recursive character of the Kalman filter, P f i+1 should have the same rank as P a i in order to preserve the advantage of a lowdimension space for the subsequent assimilation cycles.The di culty arises from the fact that the rank of P f i+1 intimately depends on the structure of Q.A possible method is simply to project the error vector i and its associated covariance on the sub-space generated by the columns of S f i+1 .As it is often impossible to specify the model error perfectly, for the reasons discussed above, a simple parameterization of system noise can be introduced, assuming for instance that where is a scalar quantity called the "forgetting factor" (0 < < 1), by analogy with the approach used in automatic control algorithms.This kind of model error parameterization leads to a forecast error covariance matrix of the form : which is singular and has the same rank r as the previous analysis error covariance.The forgetting factor is one of the many possible options for accounting for some simple form of model error in the assimilation scheme.Other approaches may be implemented, however, such as using a perturbed model M (t i , t i+1 ) instead of the original model to dynamically update the error modes through Eq. ( 37).For instance, in most EnKF implementations, stochastic perturbations are introduced in the surface forcings to update each ensemble member, accounting in this way for the uncertainty in the atmospheric fluxes.
With respect to Eq. ( 36), an even more drastic simplification of the forecast step can be obtained by simply neglecting the dynamical transformation of the error directions during the assimilation period (t i , t i+1 ), leading to the "Fixed Basis" algorithm: where I is the identity matrix.As in the Ensemble OI scheme [Evensen, 2003], temporal persistence of the error sub-space basis is assumed in this variant.Static error sub-spaces have been successfully used in a variety of assimilation applications (Verron et al. [1999]; Gourdeau et al. [1999]; Parent et al. [2003]; Pendu et al. [2002]), being justified by two basic arguments.The first one is of practical interest: the cost of a Fixed Basis assimilation experiment is of the same order of magnitude as a model simulation, with only a few additional computations needed to perform the algebraic operations of the analysis step.This makes the algorithm extremely useful in evaluating the overall performance of the system, testing new parameterizations, examining the impact of di erent observation configurations, etc.The second argument constitutes a more objective justification in terms of statistical estimation: it is often impossible to properly characterize the statistical structure of the model error originating from imperfections in the model forcings, discretization schemes, space/time resolution, etc.As a result, the approximation made by assuming persistence of the error sub-space directions is often negligible compared to the mis-specification of the systematic errors.
When the amplitude of the model error increment Q dominates the difference between P a and MP a M T in the error propagation Eq. ( 18), it is a waste of CPU resources to explicitly compute the error propagation with the model dynamics, because the result will eventually be polluted by mis-specified model error estimates.In such situations, it is sensible to by-pass the error propagation equation and concentrate on identifying the dominant forecast error directions using, for instance, adaptivity mechanisms as described in Section 7.

Analysis step with a reduced basis
The linear variance-minimizing analysis ( 19) can be re-formulated nicely when the forecast error covariance used to compute the Kalman gain is of low rank.Using equations ( 20), (39) and the matrix equality (41) the expression for the Kalman gain, after some mathematical manipulations, can be transformed as follows : Equation ( 42) shows that the size of this inversion problem is determined by the error sub-space dimension, while the original Kalman gain (20) requires an inversion in the observation space.As the number of observations is usually much larger than the rank of the error sub-space, the inversion step becomes less expensive than the corresponding computation of the original Kalman gain.By combining equations ( 19) and ( 42), the correction of the forecast state can be written as a linear combination of the error modes with The vector c i+1 contains the r amplitudes of the error modes that need to be estimated at each analysis step.Finally, the scheme evaluates the analysis error as follows and updates the vectors of the reduced basis according to This shows that, if the rank of P f i+1 is r, then the rank of P a i+1 is equal to r also; therefore, recursivity in the forecast/analysis cycles is allowed.For better numerical conditioning, it is possible to re-orthonormalize the reduced basis by recomputing a SVD decomposition of the analysis error covariance matrix.
There is one additional consideration that should be mentioned concerning the practical computation of the Kalman gain in the reduced space (42).The robust estimation of small correlations associated with remote observations is a common di culty that can be avoided by computing EOFs with compact support, as mentioned above.Instead of computing local EOFs, a simplification of the analysis scheme can be designed by enforcing to zero the error covariances between distant variables which are believed to be uncorrelated in the real ocean [Houtekamer and Mitchell, 1998].
This simplification is implemented by assuming that distant observations have negligible influence on the analysis.The global system is split into sub-systems, and for each of these the traditional analysis is computed.Only data points located within individual regions, centered on a sub-domain of one or several grid points to be updated, actually contribute to the gain in (42).This approach can be understood as a tuning of the observation operator according to the sub-domain in question.Intuitively, this approximation makes sense because only data points located in the neighborhood of a model grid point should objectively have an impact on the analysis for that grid point.The size of the regions is determined in such a way that the distribution of the observations available on the model domain always provides at least a few data points within each region of influence (if there were no data available in the region of influence, no correction would apply).This algorithmic simplification also improves the analysis because it enables a larger part of the estimation space to be spanned over a particular subdomain (see, for instance, Brusdal et al. [2003]).
Other interesting procedures exist for addressing the locality issue.One of these is the partitioning of the large estimation problem into a series of separate smaller calculations using the partitioned Kalman filter and smoother proposed by Fukumori [2002].The objective of the partition is to make global eddy-resolving data assimilation problems computationally viable.In their example, the reduced state consists of perturbations of the barotropic mode and the first five baroclinic modes defined in eight overlapping cells covering the globe.

Stochastic vs. deterministic filters
In a low-rank filter like SEEK, the error directions are determined at the initialization step and their evolution can be anticipated in a deterministic way.As these filters do not require any further randomization, they are termed deterministic filters in contrast to the EnKF which can be considered as a stochastic filter because ramdomization has to be repeated at each assimilation cycle.
Evensen [2003] reviews the theoretical formulation and practical implementation of the EnKF.The EnKF was introduced to avoid the occurrence of instabilities found with the Extended KF due to the non-linear evolution of the probability density functions [Evensen 1994].However, the EnKF only solves half of the non-linearity problems because it still combines the model prediction and the data by using only the first two moments of the pdfs, assuming that the distributions are nearly Gaussian.The non-linear analysis equations would be more di cult to use in practical applications, as discussed by Evensen and van Leeuwen [2000].
The EnKF was initially proposed by Evensen [1994] based on the following general procedure: a sampling of the state space is achieved using Monte-Carlo methods to generate an ensemble of r model states x a,j i representing the spread of possible initial conditions at time t i around the mean x a,j i .Each member is then propagated individually as using the non-linear model with stochastic perturbations.The vectors j have a covariance matrix Q and are generally introduced through the perturbation of the atmospheric forcings.The new ensemble x f,j i+1 provides implicitly the forecast error covariance matrix The statistical analysis equation of the KF is then applied to each individual member, providing implicitly the analysis error covariance matrix In order to avoid the problem of systematic underestimation of the analysis covariance that occurs when the same data and the same gain are used in the set of analysis equations, an ensemble of perturbed observations y i+1 is considered in (49) instead of the original y i+1 [Houtekamer and Mitchell, 1998;Burgers et al., 1998].The stochastic nature of the EnKF filter arises as a consequence of using perturbed observations [Lawson and Hansen, 2004].Deterministic variants of the EnKF, which do not require perturbed observations, have been proposed recently involving square-root analysis schemes [e.g.Whitaker and Hamill, 2002].An interpretation of the error propagation scheme in the SEEK filter has been proposed by Ballabrera et al. [2001] in terms of ensemble model integrations.Indeed, Eq. ( 37) depicts the natural dispersion of an ensemble of di erent model trajectories initialized in the vicinity of the initial guess; it represents the amplification of unstable error modes (or the damping of stable modes) inherent in the system's dynamics.A more exhaustive exploration of the similarities and di erences between the SEEK and Ensemble Kalman Filter (EnKF) is discussed in Brusdal et al. [2003] and Nerger [2004].

Statistical consistency of assimilation schemes
A key issue concerning statistical assimilation systems relates to their capacity to produce reliable error statistics about the ocean state estimates, and to propagate those error statistics properly from one assimilation cycle to the next.In the linear KF, the specification of the system noise Q, the observation error R and the background error covariance at the initial time P 0 perfectly determines the subsequent evolution of the error statistics throughout the assimilation sequence.This is because the observations actually control the trajectory of the model state, but they have no impact on the evolution of the error statistics themselves (except through the observation network H).For a KF to yield optimal performances, it is necessary to provide the correct a priori description of these error covariance matrices.As only guesses of these quantities are available in real cases, it is necessary to verify the consistency of the underlying assumptions.
Comparisons with independent information provide, of course, the ideal way to assess the quality of the assimilation and to detect what has possibly gone wrong in the system.Testing the ability to produce a forecast from an analyzed state (i.e., using data that have yet to be assimilated) is also very helpful in checking whether the dynamical model and assimilation scheme are consistent with one another.The most common test is to compare the forecast to persistence [De Mey et al., 2002].Additionally, information acquired during system operation can be used to verify if the prior error statistics have been prescribed in a way which represents the actual errors in the model and the observations.The di erence between the observations y used in the assimilation system and the forecast or analysis fields (called innovations d and residuals r, respectively) provide two series of data that contain a wealth of important information about the consistency of the prior error estimates.It can be shown that examining diagnostics on the innovation vector is essentially equivalent to examining them on the residual vector, but in practice some features of the analysis may be easier to diagnose from one or the other vector [Talagrand, 1999].
In the section below, we will examine a number of useful criteria that can be diagnosed from the sequence of innovations and residuals to detect imperfections in the specification of the error statistics.Since such imperfections can be a source of drift in the assimilation system or, even more dramatically, can a ect the stability of the filter, we will introduce the concept of adaptivity, the aim of which is to enforce consistency between the error statistics predicted by the filter and the observed misfits.

Verification of statistical consistency
Simple diagnostics can be implemented quite easily in assimilation systems, providing interesting tools to monitor overall performance and detect anomalies in system operations.A first-order check can be made by computing the mean innovation, which is expected to vanish over a su ciently long assimilation sequence: A non-centered innovation sequence is the obvious indication of biases in the model and/or observations which in principle should be removed from the assimilation system.If the source of bias is in the model, d i only provides information about the projection of this bias in the observation space, and the di culty remains of inversing this information back to the model space.
Similarly, the expectation of the residual r i = y i Hx a i should be zero for an optimal system as well as the mean increment H(x a i x f i ) = d i r i [Talagrand, 1999].Figure 7 illustrates the mean assimilation increment of sea-surface height computed from a 10-year analysis sequence of the MERCATOR global prototype (at 2 × 2 horizontal resolution) assimilating altimeter data during the 1993-2003 period [Ferry et al., 2005].Over large portions of the ocean, the amplitude of the increment is indeed very small, but in some regions the assimilation system systematically corrects the predicted sea-surface pressure towards higher (e.g. in the southern oceans) or lower values (e.g. in the sub-polar gyre).Ferry et al. [2005] discuss how the detection of such biases can be used to improve the modelling or assimilation components of the MERCATOR system.By taking the covariance of the innovation and remembering the assumptions made in Section 2 about observation and forecast errors, we obtain if the error covariances are correctly specified.The comparison between the matrix d i d T i and the sum of the observation and forecast error covariances used by the assimilation scheme indicates whether the forecast misfits "seen" by the filter are compatible with the prior information.
As all these quantities are full covariance matrices in the data space, the comparison is often made by looking only at a few selected diagnostics such as the trace of the matrix or its diagonal elements.After simple manipulations, it can be shown using the residuals that This formula accounts for the fact that the same observation error a ects both the data and the analysis.It shows that, for an asymptotically perfect estimation system (with P a i 0), the residual error covariance converges towards the observation error.
Another more synthetic diagnostic can be implemented in both statistical and variational assimilation systems.If the covariances are correctly estimated, the scalar quantity behaves as a chi-squared variable with as many degrees of freedom as there are independent data (say, p) [Bennett, 1992].It can be regarded as a particular norm (the so-called Mahalanobis norm) of the innovation vector.Significance tests may therefore be used to accept or reject the covariance models.In the context of low-rank KFs based on the decomposition (39), the norm (54) can be written as and can be used to objectively evaluate the suitability of the error subspace S f i .Instead of testing the complete 2 p behaviour of J i , only the first and second statistical moments need to be examined.These should show that In particular, a too small (resp.large) value of J i is symptomatic of an overestimated (resp.underestimated) innovation amplitude, which may be the consequence of too large (resp.small) observation or forecast error covariances.

Adaptivity
In most applied assimilation systems, large deviations may be observed with respect to the theoretical criteria (51), ( 52), ( 53) or ( 56), reflecting some flaws in the prior statistical assumptions.Feedback mechanisms can be implemented on-line or o -line to adjust the prior error statistics in such a way as to restore consistency between the errors diagnosed by the filter and the innovation or residual information.Inherent in the family of techniques used for this purpose is the concept of adaptivity.A comparison between di erent sophisticated adaptive KFs is provided by Blanchet et al. [1997].In this section, we examine more closely di erent examples of adaptive methods that have been explored in the literature to identify and correct model biases, to tune the parameterization of model errors and to build the error sub-space of low-rank KFs.
Research e orts aimed at improving error covariance modelling in assimilation systems are of limited interest if biases are left in the models or the observations, i.e. if criterion ( 51) is not verified.The correction for biases in operational systems is expected to have a very strong impact on assimilation performances.In the context of sequential assimilation, Dee and Da Silva [1998] proposed a rigorous method for estimating the forecast bias and correcting the forecast prior to the analysis, assuming unbiased observations.The algorithm is designed to perform on-line and its implementation does not require substantial modifications to the assimilation system.The basic idea consists in running a simplified KFtype algorithm to estimate d i in addition to the KF for the state itself.Assuming temporal persistence of the bias, the extra cost of the method is equivalent to one additional computation of the statistical analysis step.
The question of model error parameterization can been addressed by means of adaptive methods too.In many practical studies, the model error covariance Q is probably the least well known statistical quantity impacting the forecast error.Mitchell and Houtekamer [2000] developed an adaptive EnKF using a maximum likelihood method to estimate the parameterization of model errors from the innovation sequence.The approach recently developed by Brankart et al. [2003] can also be viewed as an adaptive parameterization of the model error.By adjusting the coe cient of Eq. ( 38) according to the local innovation variance, the method is able to account for regional properties of the ocean dynamics.The estimation of the innovation variance is based on a weighted average of the latest innovations, using a weight that decreases exponentially with the age of the innovation.
This mechanism was explored in the context of hindcast experiments conducted for the 1993-1996 period.The SEEK filter was implemented in two di erent models : the 1/3 North Atlantic OPA model of the MERCATOR prototype system, and the Atlantic/Arctic MICOM model of the European DIADEM system.Sea-surface temperature from the NASA Pathfinder project and altimetric data from the Topex/Poseidon and ERS missions were assimilated in both systems every 10 days.Testut [2000] studied the distribution of the 10-day forecast error in the OPA system once the assimilation experiment had reached an asymptotic regime (figure 8  Another interesting application of the adaptivity concept is the update of the initial structure of the error sub-spaces.Using a low-rank KF with an idealized primitive-equation model of the Gulf Stream, the assimilation error dynamics was investigated by Ballabrera et al. [2001].They examined how the filter performances were a ected by imperfect specifications of the initial error statistics.They observed that the truncation error does not impact the control of the solution when at least one of the following three conditions is met: (i) the initial error is perfectly described by the reduced basis; (ii) the truncation error is dynamically uncoupled with the error components of the low-dimension space; (iii) the sub-space must contain all the components of the growing error during the forecast time period.As these three conditions are never perfectly verified in realistic assimilation systems, an adaptive procedure was developed in order to introduce some feedback between the data and the error sub-space used by the filter.The algorithm proposed by Brasseur et al. [1999] updates the error sub-space of the SEEK filter along the geostrophic attractor by extracting information left in the residual vector after each analysis step.This update of the reduced basis is performed in such a way as to attenuate the truncation error and to improve the projection of the next innovation onto the error sub-space: this leads to the concept of adaptive sub-space.An advantage of this variant of the SEEK filter is that it allows the evolution of the error subspace without incurring the cost of propagating the whole set of error directions dynamically.

Intermittent vs. time-continuous filtering
In the basic assimilation problem introduced in Section 1, two major simplifications were considered: (i) the observations were available at discrete time intervals, and (ii) the analysis was performed at the exact time of the measurements.In oceanographic and atmospheric applications, the situation is quite di erent since the flow of observations can be considered as almost continuous in time (for instance, the acquisition of altimeter data).Therefore, ocean data assimilation with intermittent sequential filters necessarily involves approximations.
In principle, sequential filters could perform an analysis step as often as a new piece of information arrives.Time-continuous formulations of the KF exist [Gelb 1974] and have been applied to analogic signal processing.However, their application to oceanographic or atmospheric models is inappropriate: for practical reasons, it would be impossible to incorporate the data at their exact time of acquisition.Experience shows that it is necessary to accumulate a certain number of observations between two successive analysis steps to correct the ocean state with su cient impact.Besides, operational assimilation systems must be scheduled on a regular temporal basis so as to avoid unnecessary algorithmic complications, to account for human intervention, delay in data delivery, etc.

8.1
Computing innovations using FGAT Typical lengths of assimilation cycles are 3 to 7 days for mesoscale ocean current predictions, and 10 to 30 days for initialization of the oceanic component of seasonal climate predictions.In spite of the fact that the ocean cannot be considered as static over these time scales, intermittent sequential filters incorporate at one single instant the set of observations collected during the whole assimilation interval.This is a major di erence with 4D-VAR algorithms, which can take full advantage of the temporal distribution of the data within an assimilation window.
Fairly simple solutions can be set up to alleviate these problems in the context of statistical filters.For example, the FGAT (First Guess at Appropriate Time) method initially introduced in meteorology can be used to evaluate the innovation vector more correctly: instead of computing the di erence between the time-distributed data set and the model forecast at time t i+1 as in Eq. ( 19), the innovation is evaluated "on the flight" by cumulating the di erences between each piece of observation and the corresponding element of the model forecast at the measurement time.This approach has also been used with 3D-VAR assimilation systems [Weaver et al., 2003] to benefit from the temporal dimension.Due to the fast propagation of equatorial waves, the FGAT feature may be particularly important in the tropical oceans.

Incremental Analysis Update
A direct consequence of intermittency is the discontinuity of the forecast/analysis estimates, which is recognized as a major drawback of sequential methods.Two related problems, -shocks to the model and data rejection -, arise with intermittent corrections.It is found that observations assimilated into models may introduce transient waves excited by the impulsive insertion.These waves are often the result of imperfections in the corrected state associated with physically unbalanced error covariances.In the example illustrated in figure 9, six vertical profiles of temperature and salinity measurements are assimilated into the HYCOM model using the SEEK filter.The profiles are inserted in distant regions (Labrador Basin, Irminger Sea, Gulf Stream, Azores and North Brazil currents and Caribbean Sea) in order to avoid mutual interference.The corrected state is then integrated using the model with realistic forcings for one month after analysis time. Figure 9 depicts the SSH increment after 3 days of simulation, showing the occurrence of spurious transients (particularly in the Gulf Stream region) which the model generates to dynamically adjust the new state.In order to incorporate analysis increments in a more gradual manner, a new algorithm based on Incremental Analysis Updates (IAU) has been proposed by Bloom et al. [1996], which combines aspects of intermittent and continuous assimilation schemes.Using the regular KF equations, the IAU algorithm first computes the analysis correction (x a i+1 x f i+1 ).This correction is then distributed (uniformly or not) over the assimilation window and inserted gradually into the model.The state obtained at the end of the assimilation window can be used as the initial conditions for the next assimilation cycle, leading to time-continuous filtered trajectories.The concept of IAU can also be implemented by computing corrections to the atmospheric forcing or by introducing dynamical relaxation terms in the prognostic equations, according to the analysis increments.The IAU temporal strategy can be complemented by the FGAT scheme which computes the innovation "on the flight".
A possible implementation of the IAU scheme in primitive equation models may include the following steps: (i) first-guess model integration and evaluation of the innovation vector using FGAT; (ii) computation of the analysis increment using the Kalman gain; (iii) second integration on (t i , t i+1 ) using primitive equations modified by the increment.The additional cost incurred by this process will be one model integration or less, depending on the details of the algorithmic implementation.Noting by T the increment of the temperature field, a possible modification of the PE equation for the temperature may be written as follows: where T/(t i+1 t i ) is a new term acting as a body force, in a similar way to the nudging assimilation technique [Verron and Holland, 1989;Blayo et al., 1994].The extra term is, however, di erent from newtonian relaxation since it does not explicitly involve the current value of T and has the advantage of being multivariate in general and weighted consistently with error statistics.A linear analysis of the IAU procedure shows that it has the properties of a low-pass temporal filter.Variants of the IAU procedure are currently being explored with the MERCATOR assimilation systems.

Conclusions
The Kalman filter provides a theoretical framework from which a hierarchy of algorithms of increasing sophistication and increasing computer requirements can be derived.These algorithms range from Optimal Interpolation schemes, which require only a few percent of the computer resources allocated to run the model, to non-linear Kalman filters such as the SEEK or the EnKF, which require the simultaneous integration of perturbed model trajectories in equal number to the dimension of the error sub-space.The research conducted in support of operational oceanography has shown that error statistics is at the heart of applied assimilation systems and remains a major challenge for ocean data assimilation.In this respect, the derivation of a hierarchy of simplified filters o ers a unique possibility to test di erent approaches for specifying and calibrating the background, systematic and observation error statistics.
A salient feature of statistical estimation methods is their multivariate nature : observations related to several di erent model variables (e.g., SST, SLA, SSS, in situ temperature and salinity data) are used to correct the whole model state in a consistent manner.This opens promising avenues for e ectively envisaging the simultaneous assimilation of in situ data with satellite data from di erent sensors.Error bars on state estimates can be computed by the algorithms on a rigorous basis, making the schemes useful in qualifying the reliability of di erent forecasts, or comparing the relevance of di erent observation systems.For practical implementations, the adjoint of the model code is not necessarily needed as with 4D-VAR, and the basic architecture of the algorithms is modular.The transition from one model version to another, or from one model code to another can be made smoothly without much recoding.
The large variety of sequential statistical methods developed in the context of scientific or operational applications is an indication that a given assimilation technique cannot be considered as a "plug-and-play" system, capable of solving universal problems.To design the best possible assimilation system, it is necessary to clearly define the purpose of data assimilation (forecast initialization, reanalysis, model error detection, etc.), the physical characteristics of the processes of interest, the sampling properties of the observation systems, and the limitations of the assimilation techniques.Moving towards hybrid sequential/variational methods is probably an astute way of taking the best from both approaches.Due to the complexity of models and algorithms, the success of an assimilation system also depends on the community of people working within a common framework.In the future, the sharing of generic assimilation tools between operational and research teams should accelerate the progress of the methods and provide feedback from more intensive utilizations.
A number of important issues for ocean data assimilation have yet to be fully resolved, such as the incorporation of inequality constraints in statistical estimation algorithms.Such constraints are needed, e.g. in the context of tracer data assimilation to maintain the positiveness of tracer concentrations, or for the proper handling of the physical properties of the water column such as static stability.Another important challenge will be to further develop assimilation systems for coupled physical-biological models, with the aim of demonstrating the capacity to estimate and forecast marine ecosystems and biogeochemical variables in the ocean on a routine basis.As a first step, the incorporation of biogeochemical models into the physical assimilative systems developed within GODAE will provide new "bio-metrics" to evaluate the representation of the physical processes of critical importance to biology.A further challenge will be to implement suitable methods to assimilate ocean colour data in the coupled assimilative systems.

Figure 1 .
Figure 1.Vector diagram of analysis and forecast errors in the state space.

Figure 2 .
Figure 2. Conceptual representation of filtering with sequential assimilation.

Figure 3 .
Figure 3. Physical representation of a portion of the HYCOM state vector in a North Atlantic model.The panel on the right shows a snapshot of the SSH field ; the panels on the left show a vertical section in the 3D temperature and salinity fields at 30 W, superimposed to the vertical grid oriented along the layer interfaces.The full state vector also includes the velocity components (not shown here).

Figure 4 .
Figure 4. Multi-satellite data set collected during the week August 19-26, 1993, illustrating the di erent sampling properties.On the left: altimeter measurements (in cm) obtained by merging the Topex/Poseidon and ERS ground tracks; on the right: composite AVHRR picture of the sea-surface temperature.

Figure 5 .
Figure 5. Sea-surface height component of the first EOF (in cm) of the interannual variability of the circulation in the Tropical Pacific(Parent [2000]).

Figure 6 .
Figure 6.Multivariate local EOFs of the mesoscale ocean variability in the Gulf Stream region simulated by an eddy-permitting model: surface signature of temperature and sea-surface height of the first three dominant modes, and associated vertical extensions of the temperature and salinity structures (reproduced from Testut [2000]).
).The largest forecast errors are found along the Gulf Stream path between Cape Hatteras and 40 N, where the standard deviation exceeds 20 cm in some places.Local maxima are also detected in the North Atlantic Current extension and along the Azores Current at 35 N. Using the MICOM system, Brankart et al. [2003] compared the estimated innovation variance with the sum of the observation and forecast error variance diagnosed by the filter in a zonal section across the Gulf Stream region at 30 N. The comparison indicates a fairly good agreement between the 2 estimates in most regions, reflecting a successful adaptive procedure.

Figure 8 .
Figure 8. Distribution of the 10-day forecast error diagnosed with the adaptive SEEK filter implemented in the OPA model (reproduced from Testut [2000]).

Figure 9 .
Figure 9. SSH increment (in m) obtained with the HYCOM model 3 days after assimilation of vertical T/S profiles.