Variational assimilation of ocean tomographic data: Twin experiments in a quasi-geostrophic model

The possibility of using tomography data as a constraint in variational data assimilation is explored by performing twin experiments. Realistic travel-time data are generated in a quasi-geostrophic model applied to the western Mediterranean Sea. After checking the robustness of the method with this dataset, a sensitivity study analyses the propagation of information by the model, and tests the effect of different parameters such as the starting point of the optimization or the length of the assimilation period. It appears that the variational method is well adapted to the nature of tomographic data, which simultaneously provide high time resolution and spatial integrals. The results clearly show the ability of this dataset to recover the temporal evolution of heat content in each layer and to improve various components of the circulation described by the model. Tomography must then be considered as an important source of information that may contribute, together with altimetry and local in situ measurements, to the description of the ocean state and its evolution.


INTRODUCTION
In the context of global ocean observing systems for studying climate, the oceanographic community is exploring different ways of building the optimal combination of in situ observations, complementary with satellite observations, under the constraint that these measurements be sustained on a long-term basis. Among the existing observational techniques, acoustic tomography represents a potentially valuable candidate, and a few pilot sites are already planned in a number of CLImate VARiability projects. Ocean acoustic tomography samples the ocean in depth and over large distances, during long time periods and with high temporal resolution. It directly provides a strong constraint on the heat content over vertical sections across ocean basins, which may be used to prevent the drift observed in models during long-term climate studies, or to remove biases due to undersampling of the mesoscale field. It is also important information to help discriminate between the steric and dynamic effects contained in altimetric data. Proposed in the late seventies (Munk and Wunsch 1979) as a remote-sensing measurement tool, acoustic tomography has since been applied in a large number of experiments, either stand-alone or merged into multi-instrumental arrays. A synthesis of the possibilities, based on the results of the most illustrative experiments, is presented in Dushaw et al. (2001).
Tomography data have mostly been used in diagnostic methods (or inversions). These methods rely on a combination of data, statistics and stationary conservation constraints. Such methods were applied to study local phenomena such as convection events (Send et al. 1995; Morawitz et al. 1996; Gaillard et al. 1997), as well as to estimate basin-scale temperature fields over the Pacific Ocean (Dushaw 1999) or over an inaccessible region such as the Arctic (Mikhalevsky et al. 1999). The range of scales resolved by the data is determined by the size of the cells defined by the sections of the observational array. Depending on the array geometry and scale, and on the availability of additional datasets, the above analyses were able to provide estimates of horizontally averaged heat content and vorticity, or low-resolution three-dimensional (3-D) fields of temperature and horizontal velocity.
Given the wide range of space and time-scales in the ocean, a full description of the circulation cannot be obtained based only on observations and statistics. Assimilation in a prognostic ocean model is necessary to relate the variables sampled at different points and times. This is how altimetric data are now commonly analysed (Fukumori 2000). In order to reproduce the full vertical structure of the ocean, these data need to be complemented by observations of the ocean interior. Assimilation of local in situ data is rapidly developing, and exploration of remote in situ data such as tomography is starting. Menemenlis et al. (1997) performed the first assimilation of real tomographic data to estimate the circulation of the Mediterranean Sea. They used the heat-content time series deduced from range-independent inversions of three tomography sections, in combination with altimetry.
In the present work we explore the possibility of directly assimilating the acoustic travel times, and study their impact on the model behaviour. By retaining features that cannot be resolved by a simple inversion, we expect to make better use of the information contained in the observations. A simplified ocean configuration is designed, having in mind future assimilation of the Thetis 2 experiment that took place in the western Mediterranean basin during the year 1994 (Send et al. 1997). For post-experiment analysis (reanalysis), the variational method, which uses the full time series of observations as a constraint on the model, seems appropriate. Twin experiments will provide the possibility of exact comparison between the solution proposed by the assimilation and the true field used to simulate the observations.
The information content of tomographic data has already been studied in the context of inversions. Malanotte-Rizzoli and Holland (1985) simulated two sections placed at different locations of a gyre in a three-layer quasi-geostrophic model. They performed a one-dimensional inversion to estimate the total heat content over the section and the mean pycnocline displacements. Gaillard (1992) studied the case of three-dimensional inversion with a mesoscale array. The data were simulated in a quasi-geostrophic spectral model with three baroclinic modes. In both studies, the acoustic data were simulated by ray tracing in the sections extracted from the full 3-D sound-speed field provided by the model. The first twin experiment based on variational assimilation of a dataset similar to tomography was done by Sheinbaum (1995). He considered the highly simplified situation of a barotropic model advecting temperature to demonstrate the possibility of assimilating integrals of temperature along horizontal lines. In this simple case, the integral observations have an efficiency similar to that of point observations. None of the above experiments covers the full domain of realistic tomographic data assimilation. Before going to the assimilation of the full Thetis 2 travel-time dataset, we feel the need for a twin experiment based on an ocean model representing the main features of the real ocean in terms of horizontal and vertical scales, and tomographic travel times close to real data in terms of sampling and geometry. We chose to work with a quasi-geostrophic model and its adjoint. Such a model does not contain thermodynamics, and is obviously not adapted for real-data assimilation. On the other hand, a quasi-geostrophic model provides the time and space variability that allows us to test the resolution of the tomographic array and to study how the information is propagated. Its simplicity makes numerous testing procedures and sensitivity studies easy.
The configuration is that of a rectangular basin, with the size and stratification of the western Mediterranean basin, forced by monthly winds. The acoustic data are obtained by the method of Gaillard (1992). The interface deviations are related to temperature variations and hence to sound speed; ray tracing is then performed in the sound-speed field. We add here the simplifying assumption that the ray paths, computed in the ocean at rest, remain unchanged.
The first goal of the twin experiments is to evaluate the feasibility of variational assimilation of tomographic data. Then, if convergence towards an acceptable solution is reached, we wish to define the efficiency of this integral dataset in controlling the model. In particular, we want to know which variables are influenced, how information is propagated in time, and which components of the circulation are well constrained. Knowledge of the information content of a particular dataset is necessary before performing any type of data merging, or in the design of future experiments. In section 2 we describe the model, the assimilation method and how we have simulated the acoustic data. Section 3 is a sensitivity study concerning various parameters of the assimilation. The results of a standard run are analysed in section 4.

ASSIMILATION METHOD
Data assimilation combines a prognostic model and observations in order to give the best estimate of the state of the system under study. The numerical model is a set of differential equations representing an approximation of physical laws, in which the unresolved processes are parametrized. It predicts the evolution of the variables of the system, as determined by the forcing terms, boundary and initial conditions, which are known with uncertainties that may sometimes be large. As a consequence, the modelled circulation develops biases which grow beyond acceptable levels in long-term simulations. Observations of quantities related to the model variables can be used to compensate for the model deficiencies. Observations have their own limitations: they sample the ocean in time and space at a resolution usually much coarser than the model grid or time step, and so cannot fully determine the model state. They contain measurement errors and subgrid processes, and thus their relation with the model variables is only approximate. For combining these two sources of information, inverse and assimilation methods rely on complementary statistical knowledge expressing the errors associated with the model and data, and on a priori information on the ocean state.
Data assimilation has been used in meteorology for several decades, and is now operationally employed for prediction. In oceanography, a growing number of datasets are now available and represent potential candidates for assimilation. Selecting between sequential and variational methods depends on the objectives of the analysis and on the type of data available. Sequential methods are more commonly used in real-time applications, since they only require past and present data. In the present study, the focus is on acoustic tomographic data, which are travel-time measurements of sound propagation through a large volume of ocean. These data integrate temperature and current effects over long distances and at different levels. They are available with a high time resolution. We work here in the context of post-analysis, which means that we process a time series of the order of several months, all at once, in order to provide the best estimate of the ocean circulation during that period. Since we are free of the real-time processing constraint, we select the variational approach, in which the model is applied as a strong constraint. We will explore the feasibility of the method and evaluate its performance with a tomographic dataset.
(a) Principles of the adjoint method

The variational method was introduced for assimilation in meteorological models (Le Dimet and Talagrand 1986; Talagrand and Courtier 1987) and later in oceanography (Thacker and Long 1988). A cost function is built, containing a data misfit term, representing the distance between the model and the data, and a penalty term, representing any knowledge we have on the field. The solution is defined by the minimum of the cost function with respect to variations of the control parameters, usually the initial conditions. In the unified notation described by Ide et al. (1997), the cost function is written

J(x) = \{H(x) - y^{\mathrm{obs}}\}^{\mathrm{T}} W^{-1} \{H(x) - y^{\mathrm{obs}}\} + R, \qquad (1)

where x is the state vector, representing the model variables at all time steps, and y^{obs} is a vector containing all observations available in the assimilation period. The data misfit term is the quadratic difference between observations and their equivalent in terms of model variables; H is the observation operator relating the model variables to the data, and W is the prescribed error covariance matrix on observations, which sets the acceptable amplitude of the misfit. It represents both the measurement errors and the representativity error due to the approximation in H. The R term contains a priori information on the solution. It is needed to prevent the problem becoming ill-posed.
In the case of under-determined problems, it ensures the uniqueness of the solution.
The cost function is minimized iteratively, the increment defining the new value of the control vector being deduced from the gradient of the cost function. If we introduce the tangent linear model M_i = \partial M_i/\partial x_0, the gradient of the cost function is approximated by

\nabla_{x_0} J = 2 \sum_i \mathbf{M}_i^{\mathrm{T}} H^{\mathrm{T}} W^{-1} \{H(x_i) - y_i^{\mathrm{obs}}\} + \nabla_{x_0} R. \qquad (2)

From this expression, we see that the adjoint model \mathbf{M}_i^{\mathrm{T}} is integrated backwards in time, forced by the misfit term. The direct model propagates forwards in time the information introduced by the new initial conditions.
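The structure of this computation (a forward sweep storing the states, then a backward adjoint sweep accumulating the misfit forcing) can be sketched on a toy linear model. The matrices, dimensions and descent step below are invented for demonstration and have no relation to the ocean model; only the order of operations mirrors the method described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, nsteps = 8, 3, 10                              # state size, obs per step, steps
M = np.eye(n) + 0.01 * rng.standard_normal((n, n))   # toy linear "model" step
H = 0.3 * rng.standard_normal((p, n))                # toy observation operator
Winv = np.eye(p)                                     # unit observation weights

def forward(x0):
    """Integrate the model forward, storing the state at every step."""
    xs = [x0]
    for _ in range(nsteps):
        xs.append(M @ xs[-1])
    return xs

x_true0 = rng.standard_normal(n)
obs = [H @ x for x in forward(x_true0)]              # perfect twin-experiment data

def cost_and_grad(x0):
    """Cost J and its gradient with respect to the initial conditions."""
    xs = forward(x0)
    misfits = [H @ x - y for x, y in zip(xs, obs)]
    J = sum(d @ Winv @ d for d in misfits)
    # Backward (adjoint) sweep: accumulates sum_i (M^T)^i H^T W^-1 d_i
    adj = np.zeros(n)
    for d in reversed(misfits):
        adj = M.T @ adj + H.T @ (Winv @ d)
    return J, 2.0 * adj

# steepest-descent iterations from a perturbed first guess
x0 = x_true0 + 0.5 * rng.standard_normal(n)
Js = []
for _ in range(300):
    J, g = cost_and_grad(x0)
    Js.append(J)
    x0 = x0 - 0.005 * g                              # small fixed step for stability
```

A quasi-Newton minimizer, as used in the paper, would replace the fixed-step descent; the adjoint sweep supplying the gradient is unchanged.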
(b) The prognostic model

The prognostic model was developed in Grenoble and is derived from Holland (1978). The adjoint code was developed by Luong (1995) for altimetry. The quasi-geostrophic nonlinear equations are written in terms of the stream function ψ; they give the time evolution of the vorticity q_k in each layer k:

\frac{\partial q_k}{\partial t} + J(\psi_k, q_k) = A \nabla^4 \psi_k + \delta_{k1} \frac{1}{\rho_0 H_1}\, \mathrm{curl}\, \boldsymbol{\tau} - \delta_{k3}\, \epsilon \nabla^2 \psi_k \quad \text{on } \Omega \times [0, T], \qquad (3)

where z is height, δ is the Kronecker symbol, H_k is the layer thickness, h_i is the level of the layer limit, τ is the wind stress, f_0 is the mean Coriolis parameter, ε is the bottom friction coefficient, Ω is the model domain and T is the time range of the model. Vorticity is defined as

q_k = \nabla^2 \psi_k + \beta y + \frac{f_0^2}{H_k} \left( \frac{\psi_{k-1} - \psi_k}{g'_{k-1/2}} - \frac{\psi_k - \psi_{k+1}}{g'_{k+1/2}} \right), \qquad (4)

where only the existing interface terms are retained in the top and bottom layers. The right-hand-side terms of Eq. (3) represent, respectively: lateral diffusion, wind forcing and vertical diffusion. The domain is a rectangular closed basin with a flat bottom centred on the western Mediterranean basin (Fig. 1). The parameters of the stratification are chosen using hydrological profiles of this region. The three layers represent the principal water masses: modified Atlantic water in the surface layer, Levantine intermediate water from the eastern basin in the second layer, and deep Mediterranean water in the deep layer (Millot 1987). The horizontal resolution is 5 × 5 km, and the first deformation radius is 12 km. The circulation is forced by the monthly climatological wind field of May (1982), which is given with a spatial resolution of 1° × 1°. The forcing terms are interpolated in time and space down to the model resolution and time step. Although highly simplified, this configuration reproduces a circulation with sufficiently realistic space- and time-scales. The circulation obtained after two years of spin-up is rather turbulent and dominated by the barotropic mode, especially in the lower layers. The barotropic mode represents 85% of the total kinetic energy. The upper layer reproduces the observed cyclonic circulation around the basin and the north Balearic front.
South of this, large-scale eddies are found with the correct scale of one hundred kilometres. The intermediate and deep layers show very similar circulation patterns, because a realistic circulation in the intermediate layer cannot be reproduced without a density forcing through the Strait of Gibraltar (west) and the Strait of Sicily (east) (Herbaut et al. 1996).

(c) Cost function

The solution is found by minimizing the cost function defined by Eq. (1), where the control vector is the stream-function initial conditions. The solution is searched iteratively from a starting point, the first guess. At each iteration, the full nonlinear model is run forward in time, while the adjoint is run backwards. The relationship between the observations and the state vector at all time steps appears in the cost function through the data misfit term. Observations are weighted by W. Additional information on the control vector is brought by the regularization function R. In our experiments, we assimilate the tomographic data related to temperature, an approach also called 'acoustic thermometry'. We detail here the construction of H and present the expression of the penalty term R.
(i) Simulated acoustic tomography data. Acoustic thermometry uses the travel-time anomaly, defined as the difference between the measured travel time and the travel time predicted in a reference ocean (Munk et al. 1995) with mean sound-speed profile C_0(z). The reference profile and a few corresponding ray paths are shown in Fig. 2. Variations of the sound speed modify the ray path itself. We neglect this nonlinear effect here and assume fixed ray paths. Under this 'frozen rays' approximation, the travel-time anomaly δτ is linearly related to the sound-speed anomaly δC along the ray path R_0:

\delta\tau = -\int_{R_0} \frac{\delta C(x, y, z)}{C_0^2(z)}\, \mathrm{d}l. \qquad (5)

The observation operator H relating δτ to the model variables is defined as in Gaillard (1992). In the quasi-geostrophic model, the sound-speed variations are only related to the displacement h_{k+1/2} of the interface between layers k and k+1:

\delta C_{k+1/2} = \left( \frac{\partial C_p}{\partial z} \right)_{k+1/2} h_{k+1/2}, \quad \text{with} \quad h_{k+1/2} = \frac{f_0}{g'_{k+1/2}} (\psi_{k+1} - \psi_k),

where ∂C_p/∂z is the vertical gradient of the potential sound speed, (∂C_p/∂z)_{k+1/2} is the average value of the high-resolution ∂C_p/∂z profile around interface k+1/2, and g' = g Δρ/ρ_0, where g is the acceleration due to gravity, ρ_0 is the mean density and Δρ is the density anomaly with respect to ρ_0. The integral over a particular ray path R_{0l} is replaced by a discrete sum over the m model grid cells encountered. The distance dl is computed on this fine grid and interpolated onto the coarser vertical model grid. To every grid point m corresponds a triplet (i, j, k + 1/2), locating the point in the x, y and z dimensions of the model:

\delta\tau_l = -\sum_m \left( \frac{\partial C_p}{\partial z} \right)_{k+1/2} h_{i,j,k+1/2}\, \frac{\mathrm{d}l_m}{C_0^2(z_m)}. \qquad (6)

The observational array used to simulate the observations is similar to the Thetis 2 array deployed in the western Mediterranean in 1994 (Fig. 1). Ray tracing is performed along 12 sections, and we obtain on average 20 ray paths per section. Transmissions are simulated twice a day.
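A minimal sketch of this discrete 'frozen rays' operator is given below. The ray geometry, reference sound speed and ∂C_p/∂z values are invented for illustration; the real operator uses the ray paths and grid of the Thetis 2 configuration.

```python
import numpy as np

C0 = 1510.0                      # reference sound speed (m/s), illustrative
dCdz = {1: 0.04, 2: 0.01}        # mean dCp/dz at interfaces 1+1/2 and 2+1/2, assumed

def travel_time_anomaly(ray_cells, h):
    """Discrete frozen-ray sum: ray_cells is a list of (i, j, interface, dl)
    crossed by the ray; h[interface] is the interface-displacement field (m)."""
    dtau = 0.0
    for i, j, k, dl in ray_cells:
        dC = dCdz[k] * h[k][i, j]        # sound-speed anomaly in the cell
        dtau += -dC * dl / C0**2         # linearized travel-time contribution
    return dtau

# toy displacement fields (warm anomaly depresses interface 1) and a short ray
h = {1: np.full((10, 10), 20.0), 2: np.zeros((10, 10))}
ray = [(2, 2, 1, 3000.0), (2, 3, 1, 3000.0), (3, 3, 2, 3000.0)]
dtau = travel_time_anomaly(ray, h)       # negative: warmer water, faster arrival
```

The operator is linear in the interface displacements, which is what makes the adjoint straightforward to derive.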
In the case of a real experiment, the acceptable misfit to observations specified in the covariance matrix W represents both the measurement error and the representativity error of H introduced by the various approximations. In the present study, the data are generated with the observation matrix H itself, and we do not add any synthetic noise. The data are thus perfect relative to the model, and the problem of noise effects is not addressed here. Nevertheless, the matrix W is used to non-dimensionalize the misfit term and to introduce a weighting relative to the regularization term. It is built as the identity matrix scaled by (5 ms)².
(ii) Regularization term. Since the number of independent data is smaller than the dimension of the control vector, the problem is highly underdetermined and many solutions are compatible with the data. An additional term R is necessary to avoid the problem being ill-posed by ensuring the uniqueness and stability of the solution.
It is introduced here as a smoothing function that gives one constraint per grid point on the stream function at initial time. This makes the problem formally over-constrained.
It may be written as

R = \sum_k \int_\Omega \left[ \alpha_k |\nabla \psi_k^0|^2 + \beta_k (\nabla^2 \psi_k^0)^2 + \gamma_k \left\{ \frac{f_0}{g'_{k+1/2}} (\psi_{k+1}^0 - \psi_k^0) \right\}^2 \right] \mathrm{d}\Omega.

Each term is non-dimensionalized by a scaling factor, α_k, β_k or γ_k, computed as the root-mean-square (r.m.s.) amplitude of the corresponding term in one of the model runs.
The first term, which acts through a Laplacian in the gradient of R, represents the kinetic energy; the second term acts through a bi-Laplacian and corresponds to the enstrophy. Both terms act as horizontal filters, with a length-scale depending on the order of differentiation. The Laplacian induces a correlation length of about 35 km, while the biharmonic is more scale selective. The last term corresponds to vortex stretching; it imposes a vertical correlation between the interface deviations of two successive layers. In summary, this regularization function maintains kinetic energy, enstrophy and vortex stretching at the initial time within acceptable limits. Small-scale perturbations are penalized.
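A possible numerical sketch of such a regularization function is given below. The grid, the toy stream functions and the unit scaling factors are placeholders; in the paper, the factors α_k, β_k and γ_k are set from the r.m.s. amplitudes of a model run, and the f_0 and g' values here are merely plausible orders of magnitude.

```python
import numpy as np

def laplacian(f, dx):
    """Interior 5-point Laplacian; edges left at zero (closed-basin sketch)."""
    out = np.zeros_like(f)
    out[1:-1, 1:-1] = (f[2:, 1:-1] + f[:-2, 1:-1] +
                       f[1:-1, 2:] + f[1:-1, :-2] - 4.0 * f[1:-1, 1:-1]) / dx**2
    return out

def regularization(psi, dx, f0=1e-4, gp=(0.02, 0.01),
                   alpha=1.0, beta=1.0, gamma=1.0):
    """psi: list of layer stream functions at initial time (the control vector)."""
    R = 0.0
    for k, p in enumerate(psi):
        gy, gx = np.gradient(p, dx)
        R += alpha * np.sum(gx**2 + gy**2)          # kinetic-energy penalty
        R += beta * np.sum(laplacian(p, dx)**2)     # enstrophy penalty
        if k < len(psi) - 1:                        # vortex-stretching penalty
            eta = (f0 / gp[k]) * (psi[k + 1] - p)   # interface deviation
            R += gamma * np.sum(eta**2)
    return R

# toy three-layer stream function on a 32 x 32 grid with 5 km spacing
x = np.linspace(0.0, 2.0 * np.pi, 32)
psi = [a * np.outer(np.sin(x), np.sin(x)) for a in (1e4, 5e3, 1e3)]
R0 = regularization(psi, 5e3)
```

Because each penalty is a quadratic form in the initial stream function, its contribution to the cost-function gradient is linear and cheap to evaluate alongside the adjoint sweep.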

SENSITIVITY STUDY
We start the study with a basic experiment, based on 'reasonable' assumptions, and analyse in that case the relative influence of the data misfit term, including the observation matrix, and the regularization terms. The directions of search, the rate at which we converge towards the solution and the choice of a particular solution strongly depend on the shape of the cost function in the control parameter space. In order to evaluate the basic experiment results obtained, we will modify two parameters that influence the cost function: the starting point of the minimization (the first guess) and the length of the assimilation period. Since twin experiments allow exact comparison of the solution with the true field, r.m.s. differences, computed over the model domain, are used as a global measure of distance between two fields.
(a) The basic experiment

After a three-year spin-up, the model has reached a stable annual cycle. The reference or true field, in which observations are simulated, is taken starting at the beginning of May of the fourth year. The first guess used to initialize the minimization is the field obtained in May, two years after the true field. Comparison of the two fields shows large differences: the northern cyclonic gyre is more concentrated in the first guess, and a strong dipole in the centre, present in the true field, does not exist in the first guess (see Fig. 4, experiment 0). We are here in an extreme case, where the first guess is far from the true field, which will permit testing the robustness of the method under difficult conditions.
The length of the assimilation window is 50 days, beginning in May. The model trajectory that best fits the observations is found by minimizing the cost function J with respect to the control vector, the initial conditions for the stream function:

J(\psi^0) = \sum_t \{H\psi_t - \delta\tau_t^{\mathrm{obs}}\}^{\mathrm{T}} W^{-1} \{H\psi_t - \delta\tau_t^{\mathrm{obs}}\} + R. \qquad (7)

The first term is the quadratic misfit to the travel-time anomalies δτ, normalized by their error. The second term is the regularization term described in section 2. The control-vector dimension equals the number of grid points times the number of layers (approximately 38 × 10³). At each assimilation step (every 12 hours), 360 observations are provided, but not all of them are independent from the model point of view, as will be seen in the observation-matrix study. Part of the data at time t + 1 can be predicted by the model from the data at time t. In the end, the total number of independent data is bounded by 30 × 10³, but the exact number is probably lower by at least one order of magnitude. Whatever the exact number of independent data, it is obviously smaller than the dimension of the control vector. The problem is clearly underdetermined, and the regularization term is needed to introduce complementary information on the unknowns.
(i) Analysis of the observation matrix. The misfit which appears in the cost function measures the distance between the data and their equivalent in terms of model variables, as given by Eq. (6). The observation matrix H represents the weighting of each observation with respect to the model variables. The study of this matrix, weighted by the observational error covariance matrix, provides some insight into the observability of the ocean with the considered tomographic data. The specificity of tomographic data, due to the integral nature of the measurements, is expressed in H. The measurements do not give direct access to the model variables, but only to linear combinations of elements of this vector, the interface deviations along the ray path. The singular value decomposition (SVD) of the weighted observation matrix,

W^{-1/2} H = U \Lambda V^{\mathrm{T}},

gives an idea of the relative importance of the data and of the information they bring to the system; Λ is the diagonal matrix of singular values, U is the matrix of singular vectors spanning the data space, and V the matrix of singular vectors spanning the model-variable space, which here consists of the interface deviations at each grid point of the sections.
The resolution matrix R defines how the true value of ψ_t is filtered by the observational array: ψ_t^obs = Rψ_t^true. It is built with the singular vectors associated with non-zero singular values: R = VV^T. The closer R is to the identity matrix, the closer ψ_t^obs is to ψ_t^true. The diagonal elements of the resolution matrix for each single pair are dominant, but they are much smaller than unity, meaning that the stream function or interface deviation cannot be resolved at individual points along the section by the tomographic measurements alone. The coefficients associated with the second-interface deviations are smaller than those for the first interface. When all sections are used simultaneously, the resolution improves at the section crossings. The vertical resolution provided by the ray paths is not exploited by the assimilation because it exceeds the vertical resolution of the model, which has only three layers. Tomographic measurements bring information about the first baroclinic mode, which is the dominant component of the interface deviations, but say nothing about the barotropic circulation.
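The resolution-matrix diagnostic can be reproduced on a toy observation matrix. The H below is a random integral-like operator, not the Thetis 2 geometry, so only the qualitative behaviour carries over: with fewer data than unknowns, the diagonal of R stays well below unity.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_pts = 20, 60                    # fewer integral data than grid points
H = np.abs(rng.standard_normal((n_obs, n_pts)))  # rows mimic positive path weights

U, s, Vt = np.linalg.svd(H, full_matrices=False)
keep = s > 1e-10 * s[0]                  # retain non-zero singular values
V = Vt[keep].T
R = V @ V.T                              # resolution matrix R = V V^T

diag = np.diag(R)                        # mean diagonal = rank / n_pts, well below 1
```

The trace of R equals the number of retained singular values, so the average diagonal element directly measures the fraction of the model-variable space that the array resolves.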
This analysis gives an idea of the information available at every measurement time. The exact information used by the assimilation, which combines data and model, could formally be accessed by computing the Hessian matrix of the cost function; since such a computation is too costly, the information provided by the model will instead be explored indirectly in section 3(c).
When data are provided for a single section, the estimated correction at the data introduction time is approximately constant and maximum along the section. Since tomographic data are integral measurements, a change in the field, equal to the mean value of the difference between the first guess and the true field is sufficient to fit the observations. When the full array is assimilated, it becomes necessary to introduce space variability in the correction, and this is how horizontal resolution is gained.
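This behaviour can be illustrated with a minimum-norm least-squares sketch on an invented one-dimensional section: a single ray average yields a uniform correction, while additional independent integrals introduce spatial variability. The field and ray weights below are arbitrary.

```python
import numpy as np

n = 10
true = np.linspace(0.0, 2.0, n)          # "true minus first guess" along the section

# one ray: the section average; the minimum-norm correction fitting it is uniform
H1 = np.ones((1, n)) / n
y1 = H1 @ true
corr1 = H1.T @ np.linalg.solve(H1 @ H1.T, y1)    # = H^T (H H^T)^-1 y

# two independent sub-section integrals: the correction becomes piecewise variable
H2 = np.zeros((2, n))
H2[0, :n // 2] = 2.0 / n
H2[1, n // 2:] = 2.0 / n
y2 = H2 @ true
corr2 = H2.T @ np.linalg.solve(H2 @ H2.T, y2)
```

With one constraint, every point of `corr1` equals the section mean; with two, `corr2` takes a different value on each half, which is the mechanism by which crossing sections gain horizontal resolution.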
(ii) Decrease of the cost function. The first diagnostic of the success of the minimization is the decrease of the cost function during the iterations. Figure 3 shows the evolution of the total cost function, the observation misfit and the regularization terms for the basic case. At the beginning of the iteration process, the misfit to observations is two orders of magnitude larger than the regularization term. In the first ten iterations, the misfit to observations decreases rapidly while the regularization term increases. This term is largely dominated by the bi-Laplacian term, indicating that, in order to match the observations, small-scale features are introduced. They are not present in the true field. It should be noted that these small scales appear at the initial time (in the control vector), but are rapidly attenuated by the model dynamics. As the minimization goes on, the regularization term starts to decrease slowly, showing that it succeeds in reducing the small-scale development. The regularization term should eventually decrease to the level it had at the beginning of the minimization, because the solution has the same properties as the first guess in terms of horizontal scales and vortex stretching. The data misfit keeps decreasing. After 100 iterations, the value of the cost function has been reduced by almost two orders of magnitude, and the two terms have comparable size. The misfit to observations has reached the a priori error level. Even if it does not represent a realistic criterion in twin experiments, we stop the optimization at this point, as the convergence becomes very slow and the result does not change significantly. This is shown by the evolution of the r.m.s. error in Fig. 3. The main improvements are made during the first 20 iterations.
(b) Influence of the choice of the first guess

By initializing the minimization with different first guesses, we would like to know, first, whether we are going in the right direction (as the cost function seems to indicate), and second, whether we obtain a better solution when starting closer to the true field. If the answer is positive, then we can conclude that the system is robust and that both the method and the dataset, when correctly initialized and/or complemented, can contribute efficiently to the estimation of the true field. The minimization method used, a quasi-Newton method, is based on the assumption that the function to minimize is nearly quadratic, so that the gradient given by the adjoint provides a good descent direction. Starting from a first guess very distant from the true field enhances the nonlinear effects; the adjoint technique, which relies on a linear approximation, is less efficient in such cases, and the risk of finding local minima increases.
We will compare three experiments in which we vary the distance between the first guess and the true field. The first case is the basic experiment (experiment 0); it represents the maximum distance studied (d). In the second and third experiments, the first guess is interpolated at intermediate distances between the basic first guess and the true field. Experiment 1 starts at distance 3d/4 and experiment 2 at distance d/2. Figure 4 shows a map of the elevation of the first interface at the initial time (the control variable), as specified by the first guess and the solution obtained at the end of the minimization. The field at initial time (Fig. 4) is well reconstructed in experiment 2. The main structures are now located correctly, although some amplitudes are still underestimated, and the tendency to create small-scale structures is strongly reduced. In order to measure the progress made over the whole time period, we also present the r.m.s. error for the first interface deviation as a function of time for the different experiments in Fig. 4. In each experiment, the solution is closer to the true field than the first guess over the entire assimilation period. Experiments 1 and 2 demonstrate that the solution improves when the starting point is closer to the true field. The r.m.s. distances are smaller over the whole assimilation interval.
The choice of the first guess is of primary importance with this iterative method based on a linear approximation. It is clear that, to obtain the best results from a given dataset, the first guess must not differ too strongly from the true field. This problem is probably enhanced by the integral nature of tomographic data, which cannot locate the structures accurately. This lack of spatial resolution can be compensated by direct measurements such as hydrographic sections, expendable bathythermographs or profiling floats. In a monitoring configuration, the first-guess problem occurs only once, at the beginning of the monitoring. It is then conceivable to design an intensive array, based on direct measurements, to complement a large-scale tomographic array in order to initialize the assimilation.
(c) Propagation of information by the model

In the variational formalism used here, the initial conditions are modified by the observations distributed over all times, through the data misfit term, in order to keep a dynamical coherence between these observations. The information attached to each piece of data is carried forward in time by the direct model, and backward by the adjoint. Considering that at a single time step we have many fewer observations than unknowns, it is tempting to use as many data as possible. Increasing the time period over which data are assimilated increases the number of data, and so should improve the solution. On the other hand, lengthening the period of assimilation increases the effect of nonlinearities, and the tangent linear approximation on which the adjoint relies may become invalid. We will examine here how the specific tomographic information is propagated in time and space by the model, and try to define the optimal period over which we can extend the assimilation.
(i) Optimizing the length of the assimilation period. Part of the information introduced at large scales on the baroclinic modes is transferred towards smaller scales and to the barotropic mode through mode coupling and nonlinear interactions. The nonlinear terms then play the double, and contradictory, role of contributing to the transfer of information and simultaneously imposing a strong limitation on the duration of the assimilation period.
The time limitation depends crucially on the validity of the tangent linear model. To determine this period of validity, we compute the time evolution of the difference between two states separated at the initial time by a small perturbation δψ0, and compare it to its linear approximation:

δψ(t) = ψ(t; ψ0 + δψ0) − ψ(t; ψ0) = Lt δψ0 + Nt(δψ0²),

where Lt is the tangent linear model and Nt(δψ0²) is some function of the nonlinear perturbation. We chose δψ0 as 10% of the difference between the first guess and the true field, which is approximately 1% of the amplitude of ψ0, and we consider that the error due to the nonlinear terms, Nt(δψ0²), becomes too important when its magnitude reaches that of the linear term. The barotropic mode starts to diverge after 40 days, followed by the first baroclinic mode 5 days later; after 50 days even the second baroclinic mode has grown above the acceptable level.
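The divergence test can be sketched as follows, with a toy quadratic model standing in for the QG dynamics (the model, its tangent linear, and all coefficients are hypothetical illustrations, not the paper's system):

```python
import numpy as np

def step(psi):
    # Toy nonlinear model step with a quadratic term (not the QG model).
    return psi + 0.2 * psi ** 2

def tl_step(dpsi, psi):
    # Tangent linear of `step`, linearized about the reference state psi.
    return dpsi + 0.4 * psi * dpsi

def validity_horizon(psi0, dpsi0, nmax=200):
    """First step at which the nonlinear residual reaches the size of the
    linear perturbation, i.e. the tangent linear approximation breaks down."""
    psi, psi_p, dpsi = psi0.copy(), psi0 + dpsi0, dpsi0.copy()
    for n in range(1, nmax + 1):
        dpsi = tl_step(dpsi, psi)       # linear evolution about the old state
        psi, psi_p = step(psi), step(psi_p)
        resid = (psi_p - psi) - dpsi    # nonlinear remainder Nt(dpsi**2)
        if np.linalg.norm(resid) >= np.linalg.norm(dpsi):
            return n
    return nmax
```

Consistent with the amplitude dependence discussed in the text, a larger initial perturbation shortens the horizon over which the linear approximation holds.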
The growth of the error due to nonlinear terms depends not only on the time of integration but also on the amplitude of the initial disturbance, so the above results cannot be simply transposed to the real assimilation case. Therefore, the length of the assimilation period can only be optimized by running the assimilation over different durations. The results of the basic experiment are compared to those of two additional experiments in which a shorter assimilation period (Ta = 30 days) and a longer one (Ta = 70 days) are tested. For all cases, we compare the r.m.s. error over the longer 70-day period. This means that in the 30-day and 50-day cases, the integration is continued up to 70 days, without assimilation. The results are analysed in terms of equivalent vertical modes of the model. Only the barotropic and first baroclinic modes are considered; the second baroclinic mode is not well estimated and makes a minor contribution to the total stream function. The r.m.s. errors, normalized by the r.m.s. initial error (first guess − true field), are shown in Fig. 5. In this representation, the normalized first-guess error is uniformly equal to one by definition. It is difficult to establish a clear criterion for the optimal assimilation length since the results depend on the particular situation. The run T70 gives the best results over the period 0-60 days for the barotropic mode, but degrades considerably in the last 10 days. Its baroclinic mode estimation is not as good as that of the two shorter runs during the first 30 days, but it is clearly the best from day 30 to day 70. Run T30 is more or less equivalent to run T50 over the first 30 days: the longer assimilation periods tend to slightly degrade the initial-time estimate, while improving the central period. The run T50 seems the best trade-off, confirming the study on the development of nonlinearities by the model.
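The normalization used in this comparison (and in Fig. 5) can be written out explicitly; a minimal sketch, with arbitrary array fields:

```python
import numpy as np

def normalized_rms_error(estimate, truth, first_guess):
    """R.m.s. error of an estimated field, normalized by the r.m.s.
    first-guess error, so that the first guess itself scores exactly 1."""
    err = np.sqrt(np.mean((estimate - truth) ** 2))
    err0 = np.sqrt(np.mean((first_guess - truth) ** 2))
    return err / err0
```

Values below one indicate an improvement on the first guess; the curves of Fig. 5 plot this quantity per vertical mode as a function of time.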
(ii) Reconstruction of the 4D circulation. The basic experiment is stopped after 100 iterations. At this stage, the cost function has decreased by almost two orders of magnitude, the data misfit term is below the a priori noise level and the decrease rate has become very low. Comparing the amplitudes of the different fields (first guess, result and true field) in the middle of the assimilation period gives an overview of the quality of the estimation. Figure 6 presents those fields for the two main equivalent modes at t = 30 days. The barotropic mode of the true field is dominated by a strong negative structure N1 at 350-500 km. In our solution, the first guess has been modified in such a way that the structure N1, initially at 200-500 km, is now placed approximately halfway to its correct position. Three secondary positive structures P1, P2 and P3 are recovered with varying quality. Structure P3, placed at the intersection of several tomographic sections, is now positioned correctly, but a ghost of the initial structure remains near the western border, away from the tomographic sections. Structure P2 is building up. Structure P1, mostly outside the array, is not correctly recovered but the field has evolved in the right direction. The baroclinic mode is dominated by a negative structure N1 and three secondary positive structures. In the solution of the basic experiment, N1 is building up but lacks intensity; here again the best results are obtained for P3. With three layers a second baroclinic mode exists but, given the very weak stratification between layers 2 and 3, this mode has negligible amplitude and almost no effect on the total field. Results for this mode are not discussed. In summary, the solution proposed by the basic run is closer to the true field than was the first guess, especially in the vicinity of the tomographic sections.
Our first goal is reached: the feasibility of assimilating tomographic data with a variational method is demonstrated in a realistic configuration. Although considerable progress has been made, the exact solution has not been recovered. We will now analyse which parameters have influenced the choice of the solution and how they could be modified in order to improve the result. We must also evaluate which components of the field are recovered, and which components are not controlled by the dataset. This can be useful for designing future experiments associating different data types.
We also conducted an experiment in which normally distributed random noise is added to the simulated data. The result, even if slightly worse than for the basic experiment, is not really different. The high vertical and temporal resolution of the observations reduces the effect of the noise through the use of redundant information. The issue of noise is secondary here, since our data do not in any case allow us to recover the true solution completely.

ANALYSIS OF THE 'BASIC SOLUTION'
As seen in the previous section, the first guess for the basic run is probably not close enough to the truth, and the dataset is not sufficient to determine the field completely. Nevertheless, we study the characteristics of this basic solution, expecting that its robust features will survive in other experiments. The tomographic data are integral measurements and provide almost direct information on the heat content. The model plays a minor role in the estimation of this quantity, at least along the data sections. We will quantify here the estimation of the heat content along different sections over the assimilation interval. Regarding the reconstruction of structures at scales smaller than the grid size of the tomographic array, the model is of major importance. The acoustic data provide only large-scale baroclinic information. We have already seen that the model dynamics are able to transfer this information to different horizontal scales and to the barotropic mode. We will determine which variables of the model are modified and how their time and space variability are recovered.
(a) Heat content estimate
A specificity of tomography measurements is their high temporal frequency. This type of large-scale integral information, related to heat content, cannot be obtained from any other measurement technique presently in use. If these data prove to be efficient constraints on the evolution of the heat content, they would provide a way of correcting the long-term drift of the models. In the present case, temperature is not a variable of the quasi-geostrophic model, but heat content variations can be related to interface displacements. Density variations are related to the vertical gradient of the stream function: ρ′ = −(ρ0 f0/g)(∂ψ/∂z). If those variations of density are essentially due to temperature variations, the heat content θk of layer k can be expressed as a linear combination of interface deviations:

θk = αk Σi (ηk,i − ηk+1,i),

where i is the horizontal index of the model boxes crossed by the ray path, ηk,i is the deviation of the interface at the top of layer k in box i, and αk is a layer-dependent constant. The travel-time anomaly δτ is similarly expressed as a linear combination of interface deviations along the considered ray path. The correct heat content is recovered to within 10 m r.m.s., and the estimate improves after ten days of assimilation. The tomographic constraint is able to drive the time evolution of the heat content in all three layers, even with a very poor first guess.
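Both linear combinations can be sketched in a few lines; this is an illustration only, with hypothetical constants (`alpha_k`, `weights`) standing in for the layer-dependent factor and the acoustic sensitivity of the ray to each box:

```python
import numpy as np

def layer_heat_content(eta_top, eta_bot, alpha_k=1.0):
    """Heat content anomaly of layer k along a section: a linear
    combination of the bounding interface deviations, summed over the
    model boxes i crossed by the ray path. alpha_k is a hypothetical
    layer-dependent constant."""
    return alpha_k * float(np.sum(eta_top - eta_bot))

def travel_time_anomaly(eta, weights):
    """Travel-time anomaly as a linear combination of interface
    deviations along the ray path; `weights` stand for the (hypothetical)
    acoustic sensitivity of the ray to each box."""
    return float(np.dot(weights, eta))
```

Because both quantities are linear in the same interface deviations, the travel-time data constrain the heat content almost directly, with little mediation by the model along the sections.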

(b) Internal scale transfer of information
(i) Barotropic mode estimation. Tomography data constrain almost directly the mean interface deviations along the sections. These deviations are dominated by the first baroclinic component of the stream function. We have seen in the sensitivity study that the model transfers information from the baroclinic mode to the barotropic mode (Fig. 5). What is most surprising is that, although this mode is not directly constrained, it becomes as well estimated as the baroclinic mode after only a few days. It must be recalled that in this simulation the barotropic mode strongly dominates the field; consequently, it is probably necessary to estimate the barotropic mode correctly in order to produce a baroclinic field that fits the data.
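The separation into barotropic and baroclinic modes used throughout this analysis can be illustrated with a small eigenvalue computation. The coupling matrix below is a hypothetical 3-layer example (equal layer thicknesses assumed, so the matrix is symmetric), not the stratification actually used in the model:

```python
import numpy as np

def vertical_modes(S):
    """Equivalent vertical modes as eigenvectors of a symmetric
    layer-coupling (stratification) matrix S; the eigenvalue closest to
    zero corresponds to the barotropic mode."""
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(np.abs(vals))   # barotropic (eigenvalue ~ 0) first
    return vals[order], vecs[:, order]

# Hypothetical 3-layer coupling matrix (rows sum to zero).
S = np.array([[-1.0, 1.0, 0.0],
              [1.0, -2.0, 1.0],
              [0.0, 1.0, -1.0]])
vals, modes = vertical_modes(S)
# modes[:, 0] is depth-independent (the barotropic mode);
# modes[:, 1] and modes[:, 2] are the two baroclinic modes.
```

The barotropic eigenvector has no vertical structure, which is why it is invisible to data that measure only vertical gradients of the stream function, and must instead be recovered through the model dynamics.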
(ii) Spatial spectra of error. The spatial resolution of tomography data is defined by the length of the sections when independent sections are used, or by the size of the cells formed by the section crossings when they form an array, as is the case here. This information, provided at large scales, can be transferred towards smaller scales by the model. Tanguay et al. (1995) have studied in detail the transfer across scales in a barotropic spectral model, where the data were provided at grid points. They noted that, when only the large-scale part of the data is assimilated, the small-scale component of the solution at the initial time diverges, until the nonlinearities act to transfer information from large to smaller scales. They concluded that, in this case, the assimilation interval must be longer than the characteristic period of the nonlinear processes, which itself depends on wavelength.
We analyse the stream function in the first layer and define the error as the difference from the true field. The wave-number spectrum of the error is computed for the first guess and for the estimated field. The spectrum for the initial conditions and the spectrum averaged over the assimilation period are shown in Fig. 8. Also shown, by shading, are the scales at which the solution has been improved by the assimilation of tomography data. At the initial time, only the scales between 140 and 200 km are better represented. These correspond to the scales at which the information is introduced. On average over the 50 days, the energy spectrum of the error is reduced at all scales above 40 km. The transfer of information towards smaller scales continues down to the deformation radius. Our data behave like the large-scale data of Tanguay et al. (1995), but additionally involve transfer across vertical modes. The tomography data, which provide information on the large-scale baroclinic component of the field, are able to constrain all scales and modes of the model when assimilated with a variational method.
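The diagnostic used here can be sketched in one dimension; a minimal illustration, with a hypothetical grid spacing `dx` and no claim to match the actual spectral estimator of the paper:

```python
import numpy as np

def error_spectrum(field, truth, dx=10.0):
    """One-dimensional wave-number spectrum of the error (field - truth).
    dx is the grid spacing in km (hypothetical value)."""
    err = field - truth
    spec = np.abs(np.fft.rfft(err)) ** 2 / err.size
    k = np.fft.rfftfreq(err.size, d=dx)    # wave number in cycles per km
    return k, spec

# Scales improved by the assimilation are those where the estimate's error
# spectrum falls below that of the first guess:
#   k, e_guess = error_spectrum(first_guess, truth)
#   _, e_est = error_spectrum(estimate, truth)
#   improved = e_est < e_guess
```

Comparing the two error spectra wavenumber by wavenumber is what produces the shaded bands of improved scales in Fig. 8.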

CONCLUSION
We have investigated the possibility of assimilating tomography data in a numerical model with a variational method. A twin-experiment approach has been chosen in order to evaluate the performance of the system and to assess the pertinence of this method and dataset in providing valuable information on the ocean state. Although we have worked in the simplified case of a quasi-geostrophic model, the main characteristics of a real ocean in terms of time variability, horizontal scales and vertical structure are represented.
Used as a stand-alone dataset, tomography is able to drive the model towards the correct solution, even when starting from a very different first guess. The method is robust and the tangent linear approximation remains valid over assimilation periods of 50 days. This period is clearly linked to the dynamics of the Mediterranean basin represented here. The benefit of the variational technique is obvious in the way information is transferred in both time directions, and across horizontal scales and vertical modes. The mean temperature over long cross-basin sections is perfectly controlled, and tomography can be thought of as a means of preventing model drift during assimilation over long time-scales. No other in situ dataset has this capacity of filtering the mesoscale without aliasing.
Several aspects of this preliminary study need to be complemented. Since the duration of the assimilation period is limited, a strategy has to be designed for analysing longer periods while conserving the heat content. Methods have been proposed, such as overlapping intervals (Luong et al. 1998) or successive assimilations with refined resolution at each step (Veersé and Thépaut 1998).
It must be kept in mind that interface deviations calculated from the stream function reflect the dynamics of the circulation and not the response to thermodynamic processes. Although a relation has been established between interface deviations and temperature, some conclusions of this study must be reviewed with a model that explicitly describes the thermodynamic processes and forcing observed by tomography.
Finally, the horizontal resolution of tomography data will always remain limited to large scales, and should be complemented by independent measurements to improve the determination of the solution. As seen in the spectrum of the error, the mesoscale band is not well constrained by this dataset and must be adjusted by both the model and the regularization term. Tomography is an interesting partner for two types of data: altimetry and profiling floats. Tomography will constrain the large-scale heat content, which may be biased by profilers that undersample the mesoscale field. It will also help discriminate between steric and dynamic effects in the altimetry signal.