Elastic full waveform inversion based on the homogenization method: theoretical framework and 2-D numerical illustrations

S U M M A R Y Seismic imaging is an efficient tool to investigate the Earth interior. Many of the different imaging techniques currently used, including the so-called full waveform inversion (FWI), are based on limited frequency band data. Such data are not sensitive to the true earth model, but to a smooth version of it. This smooth version can be related to the true model by the homogenization technique. Homogenization for wave propagation in deterministic media with no scale separation, such as geological media, has been recently developed. With such an asymptotic theory, it is possible to compute an effective medium valid for a given frequency band such that effective waveforms and true waveforms are the same up to a controlled error. In this work we make the link between limited frequency band inversion, mainly FWI, and homogenization. We establish the relation between a true model and an FWI result model. This relation is important for a proper interpretation of FWI images. We numerically illustrate, in the 2-D case, that an FWI result is at best the homogenized version of the true model. Moreover, it appears that the homogenized FWI model is quite independent of the FWI parametrization, as long as it has enough degrees of freedom. In particular, inverting for the full elastic tensor is, in each of our tests, always a good choice. We show how the homogenization can help to understand FWI behaviour and help to improve its robustness and convergence by efficiently constraining the solution space of the inverse problem.


I N T RO D U C T I O N
Since the late sixties, seismic data have been used to investigate the Earth interior and to image its mechanical properties, for academic and industrial purposes, at scales ranging from few metres to the global Earth. Modern seismic imaging methods are based on an inverse problem making it possible to retrieve information about the Earth mechanical properties from active (explosion or airgun) or passive (earthquakes, ambient noise) seismic records. Because of the tremendous computing power such a scheme would require, a full exploration of the inverse problem solution space based on the direct modelling of the seismogram full waveforms, as envisioned by Tarantola (2005), is still out of reach, despite the progress in the design of high performance computing devices, reaching now exascale performances.
Because of this intrinsic limitation, seismologists have developed seismic imaging and tomography techniques which are mainly based on reduced data, (i.e. secondary observables such as body waves arrival times, surface wave phase velocities, normal model eigenfrequencies ...) together with an appropriate approximate so-lution of the wave propagation modelling problem, that are quicker to solve than the full wave equation. These choices make it possible to assume that the inverse problem nonlinearity is weak enough so that a solution to the inverse problem can be computed through local optimization techniques. Nevertheless, mathematical regularization (non-data-driven constraints) are always necessary to make the solution of the inverse problem unique. In this framework, for global and regional scales, time arrival and phase velocity tomography methods have provided significant results and images of the deep Earth (e.g. Romanowicz 2003, for a review). At the exploration scale, migration imaging techniques from reflection data have been introduced to recover high-resolution structural information of the subsurface reflectivity. These high-resolution seismic imaging techniques rely on a prior knowledge of the smooth background wave velocity and, in their earlier version, on a linearized wave propagation operator, aiming at refocusing primary reflections only in depth (Claerbout 1985;Beylkin 1987;Beylkin & Burridge 1990).
During the last two decades, the simultaneous development of wide angle/azimuth broadband seismic acquisition systems and high performance computing devices have made possible the 1094 Y. Capdeville and L. Métivier successful application of seismic imaging techniques based on the full waveform, from the exploration to the global scale. Compared to the previous techniques (tomography, migration), full waveform inversion (FWI) aims at extracting the information from the entire waveform in a limited frequency band, through a local minimization of the difference between the observed and synthetic data. At the global and continental scales, FWI methods, often based on the Spectral Element Method for the forward modelling tool and gradient descent algorithms (steepest descent, nonlinear conjugate gradient) for the optimization, are making important progress (Tape et al. 2010;Lekić & Romanowicz 2011;Fichtner et al. 2013). At the exploration scale, the method is now routinely used in the industry, mainly for the 3-D reconstruction of the pressure wave velocity in the acoustic approximation (Plessix & Perkins 2010;Sirgue et al. 2010;Peter et al. 2011;Warner et al. 2013;Operto et al. 2015).
Based on these successful applications, two main research directions are currently identified. The first consists in a better interpretation of physics of wave propagation, going beyond the current common mechanical approximations (typically the acoustic approximation at the exploration scale and the V S approximation at the regional and global scales) and a better accounting for the amplitude of the seismic signal. This requires the use of much more expensive viscoelastic wave propagation modelling solvers, correctly accounting for the attenuation and the anisotropy of the subsurface. This also yields a challenging multiparameter inverse problem involving at least P-wave and S-wave velocities to be reconstructed. The second research direction consists in the design of a more automatized workflow, less relying on human expertise, enhancing the robustness of FWI procedures. Indeed, the FWI problem is intrinsically ill-posed as the misfit function might exhibit several local minima leading to non-informative geological models. Designing strategies to avoid these local minima is at the heart of many methodological studies. In practice, this requires a careful data analysis, and the design of ad hoc multilevel hierarchical schemes based on frequency band, time-windowing and offset selections (also known as layer stripping approach) (Bunks et al. 1995;Pratt 1999;Shipp & Singh 2002;Brossier et al. 2009). This is complemented through the design of kinematically accurate initial models through high-resolution tomography techniques (Billette & Lambaré 1998;Alerini et al. 2007) and the introduction of prior knowledge through the design of regularization strategies (Tikhonov, smoothing techniques) and prior model information.
This second line of investigation might be summarized as a search for an FWI formulation leading to a better posed inverse problem, potentially less prone to exhibit local minima. In this paper, we investigate numerically how the concept of homogenization and equivalent medium could help in this respect. Our leading idea is to account for the fact that seismic waves propagating within the subsurface interior are always frequency band-limited. For this reason any subsurface mechanical properties variations smaller than a fraction of the shortest propagated wavelength are 'seen' as smooth, often anisotropic, heterogeneities . The homogenization theory provides means to compute this equivalent anisotropic medium, the medium 'seen' by the wavefield.
Homogenization, effective media or upscaling techniques gather a wide range of methods able to compute effective properties and equations of a fine scale problem when large scale properties are needed. In the context of wave propagation, the idea is to remove the heterogeneities of scale much smaller than the minimum wavefield wavelength and to replace them by effective properties. For 'long' elastic waves propagating in stratified media, Backus (1962) gave explicit formula to upscale finely layered media. One important result of this work is to show that a finely layered isotropic medium becomes an anisotropic effective medium for long waves. For periodic media, an important class of methods, the two-scale homogenization methods, has been developed (e.g. Sanchez-Palencia 1980). To obtain the effective media, the effective equations and local correctors, two-scale homogenization methods require one to solve the so-called periodic-cell problem. This periodic-cell problem can be solved analytically only for the specific case of layered media, whereas a numerical method such as finite elements is necessary for more general media. For stochastic media, methods formally similar to the two-scale homogenization methods exist (e.g. Bensoussan et al. 1978;Blanc et al. 2007). Finally, solutions called 'numerical homogenization' (e.g. Weinan et al. 2007) also needs a scale separation.
Typical geological media present no spatial periodicity, no natural scale separation or any kind of spatial statistical invariance. This difficulty excludes all of the above mentioned homogenization techniques from the problem of upscaling geological media. To fill this gap, the non-periodic homogenization technique (Capdeville et al. 2010b;Guillot et al. 2010;Capdeville et al. 2010aCapdeville & Cance 2015) has recently been introduced. While the non-periodic homogenization technique is strongly inspired from the classical two-scale periodic homogenization, it has some major differences. One of them relies on the fact that the obtained effective properties are not spatially constant, they are just 'smoother' than the original medium. So far, this homogenization method has been mostly used to simplify seismic forward modelling. In this context, the non-periodic homogenization can be seen as a pre-processing step of the elastic model prior to being used in the wave equation solver (e.g. spectral elements, finite differences ...). By removing all the fine scales, the non-periodic homogenization drastically reduces meshing complexity associated with elastic discontinuities and makes the computation optimum (see e.g. . Note that for relatively weak elastic contrasts, an alternate solution based on a second order Born approximation exists (Jordan 2015).
In the inversion context, the homogenization method has been already used either to mix different scales (Afanasiev et al. 2015) or to constrain FWI gradient updates in the layered media case (Afanasiev et al. 2016). More generally, the homogenization concept raises a puzzling point: for a given frequency band-limited data set, at least two Earth subsurface models minimize similarly a waveform misfit function, the true and the homogenized models. Knowing that, for local optimization approaches, only one model can be found, we can already guess that the solution model is, at best, the homogenized model and not the true model. Indeed, homogenized models represent what is 'seen' by the wavefield. More technically, the homogenized models are 'simpler' in the sense that they need a finite number of parameters to describe them, which is not the case for the true models. This has been confirmed in a recent work, but only in the layered media case: it has been proposed that FWI can retrieve at best the homogenized medium and that this fact can be used to properly set up the inversion problem . Extending those results from the stratified media case to the general case is not trivial for similar reasons that extending the homogenization principle from the layered case to the general case is also difficult (Capdeville et al. 2010a).
In the present work, we pursue mainly two objectives. First, we extend the results of  to higher dimensions (2-D and 3-D). Second, we aim at presenting ideas on how homogenization concepts can be used to better constrain and better understand FWI. Beyond the objective of making and understanding the FWI better, we also aim at establishing the relation between the true model and the model resulting from an FWI. This relation is fundamental to set up a downscaling inverse problem. Its objective is to properly interpret the FWI results as a probability distribution over possible fine scale models (Bodin et al. 2015;Nawaz & Curtis 2017) where fine scales are here scales of interest for the interpretation below the FWI resolution limit.
In the following, we first define FWI and its solution space. We then introduce the homogenization concept and its consequences for the inverse problem. We then show a series of examples illustrating the exposed ideas before discussing our results and concluding our work.

Context
We consider an elastic body of boundary ∂ (see Fig. 1). We assume that this elastic body is fully characterized by its 'true' parameters, the density ρ t (x) and the fourth order elastic tensor c t (x) defined for all x in . We consider N s point sources, characterized by their location x t s and mechanism q t s (moment tensor or force vector) for s ∈ {1, ..., N s }. These sources are triggered independently in and recorded by N r , d = 2 or d = 3 component receivers, in 2-D or 3-D respectively, located in x r , r ∈ {1, ..., N r }.
Based on the recorded waveforms, an FWI aims at retrieving information about the elastic model and sources in . In the following, we define more precisely the model space M to which belong potential solutions to the inverse problem, the data set d and the forward problem linking potential models and synthetic data. Finally, we define the misfit function between the recorded data and synthetic data as well as the chosen optimization algorithm to solve the inverse problem.

The fine scale model space
The fine scale model space is the one to which the true model belongs. It contains all the unknown parameters required to model the data in the full frequency band. In our case, this corresponds to the density and elastic parameters in , together with the source parameters.
We first define E( ), the admissible elastic properties space. E( ) gathers all the physically possible density and elastic tensor distributions (ρ(x), c(x)) in such that (i) ρ(x) is positive and bounded; (ii) c(x) is a fourth order tensor, positive definite and bounded; (iii) c(x) follows the classical major and minor symmetries, leading to only six independent coefficients in 2-D and 21 in 3-D.
In general, elastic structures are multiscale: for a given scale, there are always heterogeneities at a smaller scale (see Fig. 1). Consequently, the number of parameters necessary to characterize in a deterministic way the spatial distribution of a general elastic structure is infinite. As a consequence, E is an infinite dimensional space. Note that, in the real world, there may be a finite limit to this dimension when reaching the atomic scale. But this limit is out of reach and is not considered here.
Assuming the sources are localized in space (point sources), we define the source parameter space as where x s is the source location and q s the source mechanism, which can either be a moment tensor M s (in this case we have p = d(d + 1)/2) or a force vector f s (respectively p = d).
The model space M is defined by It is also an infinite dimensional space by definition. A model m ∈ M, potentially a solution of the FWI problem, made of an elastic model and of the source parameters, can be written as The 'true' model m t is In the following, the model m restricted to the source number s is denoted by m| s :

The forward modelling equations
We assume that, for a given model m, of component (ρ, c), for a source number s among the N s sources, for any x ∈ and any time t ∈ [0, T], the displacement u(x, t; m| s ) is the solution of the elastic wave equation where s s (x, t) is the sth external source term, σ (x, t; m| s ) is the stress and (u) = 1 2 (∇u + (∇u) ) is the strain and the 'transpose' operator. Inelastic losses are ignored in this work for the sake of simplicity. They have nevertheless an important effect in practice (see, for instance, Kamei & Pratt 2013).
The wave equation is completed by the normal stress free boundary condition, for x ∈ ∂ , where n is the outward normal to ∂ , and zero initial conditions (the medium is considered to be at rest at t = 0). We consider sources s s (x, t) of the following form for moment tensor sources or for force vector sources, where g s (t) is the source time function.

The data set
The N s elastic sources are triggered independently in , generating elastic waves, for which the corresponding displacement d s,r (t) at receivers r is recorded for a duration T. The collected data set is In the following, we assume that the physics of the problem is known (i.e. the true model m t belongs to M) and that solving the forward problem defined in Section 2.3 accurately models the data within some limit corresponding to numerical noise.

FWI misfit function
We define the classical least-squares misfit function: where u(x r , t; m t | s ) is the solution of the wave eq. (6) computed for the model m restricted to a source s. Other misfit functions could be chosen (such as those based on phase or envelope measurements, cross-correlation approaches, deconvolution approaches, optimal transport approaches, etc.), but as long as they rely on finite frequency data, we expect the conclusions of the present study to remain unchanged. The FWI problem is formulated as the minimization problem It can be mathematically proven that, assuming perfect illumination (unlimited number of sources, unlimited number of receivers recording both displacement and traction on a closed surface, different from the free surface, and an infinite observation time) the true model m t can be uniquely recovered and, in this case,m = m t (Nachman 1988;Nakamura & Uhlmann 1994). However, in practice, because computing power is finite, (12) cannot be solved. In the FWI framework, the classical solution to this problem is to limit the data frequency band by introducing a maximum frequency f m to the recorded signal and to the source time function g s . This can simply be done using a low-pass filter F fm : The misfit function to minimize is then where u fm is computed with the low-pass filtered source time function g fm s = F fm (g s ). For most media, a bounded dispersion relation between the frequency f and the wavelength λ of the wavefield exists (there are a few exceptions, see Section 7). For such media, introducing a maximum frequency f m to the recorded signal warrants the existence of a minimum wavelength λ min . Using the assumption that waves only see a blurred version of scales smaller than λ min , and therefore that these scales can somehow be ignored in the reconstruction process, introducing a maximum frequency f m has an important consequence: it yields the possibility of using a finite dimensional space approximation M h of M. The superscript h is a number that characterizes the discretization made in the finite dimensional approximation and can be thought as a 'resolution' parameter. For example, if the elastic model is represented spatially by constant velocity blocks, h can be the size of these blocks. In general h is directly related to λ min but other information such as the illumination angles, offset ranges or data coverage can influence the choice of h.
In M h , models do not contain arbitrarily small scale heterogeneities and, knowing that the data need to be modelled only in a limited frequency band, the forward problem (eqs 6 and 7) can be solved in a bounded computing time. Finally, replacing the original minimization problem (12) bȳ makes the inverse problem solvable, M h being a finite dimensional space.
To solve (15), two classes of algorithms exist: local optimization techniques or global search approaches (Tarantola 2005). While Bayesian global search approaches would be ideal, yielding the possibility to access the full posterior density function, they are still too expensive, from a computational point of view, to be used for realistic scale FWI problem. Only a few limited experiments have been done in this direction so far (e.g. Käufl et al. 2013).
For practical applications, local optimization techniques represent the state-of-the art for solving the FWI problem, relying on the gradient direction and different levels of approximation for the inverse Hessian operator [identity, l-BFGS (Nocedal 1980;Byrd et al. 1995), Gauss-Newton (Pratt et al. 1998); see Métivier & Brossier (2016) for a review]. Provided that a linesearch algorithm or a trust-region strategy is used, these methods converge to the closest local minimum from the initial guess and hopefully to the global minimum if that initial guess is good. Nevertheless, if the global minimum is not unique, this is a problem. In practice, all the FWI techniques based on local optimization approaches implicitly assume a unique global minimum exists and assume that the initial guess lies within the 'basin of attraction' of the minimization. This is the reason why the uniqueness of the solution of the inverse problem is important. Even if, in M, minimizing E under perfect conditions warrants a unique solution, it is not the case for E fm : there may be an infinite number of fine scale models that can produce the same limited frequency data (there is a large null space). In M h , it can be different and the uniqueness can be recovered, but it is not granted.
Independently of the global search versus local optimization choice made to solve (15), many other choices may influence the inverse problem solutionm h , even for a perfect data coverage. The resolution parameter h itself and the way it is implemented have a strong effect. Depending on the authors and their chosen approach, a wide range of solutions for the spatial approximation is used, from spatially constant blocks to high degree polynomials (typically, the same polynomial approximation as the one used for the waveform forward modelling). Physical approximation choices made for the constitutive parameters may also strongly influence the inversion result. For instance, we might hope from fine scale prior information that the true model is isotropic, in this case a common choice in seismology is to only invert for V P and V S (or even only one of the two) instead of c ijkl .
If the solution to the inverse problemm h depends on technical choices to parametrize M h , we know that it cannot be the true model m t . Indeed, in general, m t does not belong to M h : the single fact that m t is multiscale while M h is not prevents it. This is expected: we know for instance, following the diffraction tomography theory (Devaney 1984), that FWI can only reach a resolution down to half the smallest propagated wavelength. Nevertheless, the relation betweenm h and m t is unknown and this is a serious issue: in particular it can be difficult to dissociate true high-resolution information and small scale artefacts coming from a combination of noise, lack of illumination, and discretization setups. The knowledge of a more formal relation betweenm h and m t would thus allow a better interpretation of the FWI resultm h .
The problem thatm h strongly depends on the particular choice of M h and that the relation betweenm h and m t is unknown is not often discussed in the literature. We feel that this discussion is missing, as understanding these issues would help to better apprehend the behaviour of the inversion schemes and therefore could open ways to improve them. To us, this issue is directly linked to the question of how scales relate to each other through the wave equation: it is a homogenization problem.

H O M O G E N I Z AT I O N C O N C E P T S
In this section, we summarize the results obtained by Capdeville et al. (2010a), Guillot et al. (2010) and Capdeville et al. (2010b) about homogenization of complex deterministic media with no scale separation for wave propagation problems. For limited frequency band data, seismic waves do not 'see' the true Earth model, but a blurred version of it. This blurred Earth model is called the effective or the homogenized Earth model. For a given signal maximum frequency f m , it can be computed thanks to the homogenization technique.
As mentioned earlier, in most media, the maximum frequency f m can be associated with a minimum wavelength λ min , and in the following, we assume that a minimum wavelength λ min exists. We moreover assume that a lower bound estimate of λ min can be found as V Smin /f m where V Smin is the slowest velocity in the medium. This is a strong assumption. However, to the best of our knowledge, natural media always satisfies it (while synthetic complex meta-material can break it: for instance, resonating media with Helmholtz resonator inclusions (Zhao et al. 2016)).
We introduce λ 0 , a constant separating scales considered as macroscopic (large scales) from the one considered as microscopic (fine scales). We then introduce a small parameter which measures the scale separation position with respect to λ min . For example, when homogenization is used as a preprocessing step to the forward modelling solver, in order to obtain a precise agreement between true and homogenized waveforms, ε 0 = 0.5 is often a good choice for most geological media and a few tens of wavelength of wave propagation (Capdeville et al. 2010b). Two-scale homogenization techniques are asymptotic methods that explicitly use two space variables: x for the macroscopic spatial variations and y = x ε 0 for microscopic spatial variations. For example, for a multiscale medium such as a geological medium, the density ρ depends on both macroscopic and microscopic scales and is written ρ x, x ε 0 . Similarly, the solution of the wave equation with a maximum frequency f m is written u fm x, x ε 0 , t . Although these properties are written as if they depend on two independent scales, it should be noted that these variables (and others to follow) only have a physical meaning once evaluated on the axis y = x ε 0 . A function κ that would not depend upon the microscopic scale would be written κ(x). For some quantity κ(x, x ε 0 ) depending on both scales, we may find an 'effective' version κ * (x) that only depends on the macroscopic scale. In the following, effective quantities are noted with a * . In general, the relation between two scales and effective quantities is complex and not just a simple linear smoothing. All the effective quantities actually depend on the actual value of ε 0 and λ min . In the following, however, we drop this explicit dependency on ε 0 and λ min to simplify the notations. An effective quantity κ * ,ε 0 ,λmin is simply denoted by κ * .
The main results of the homogenization theory are the following: (i) to the first order in ε 0 , the relation between the true and effective displacement is where u * is the effective displacement, χ is the first order corrector and (u * ) is the strain related to the effective displacement ). This first-order corrector χ is a third order tensor. It does not depend on time nor on sources, however, it depends on the fine scales x/ε 0 , ε 0 and λ min (see Appendix A).
(ii) The effective displacement u * is solution of the following effective wave equation where T(u * ) = c * : (u * ) · n * , ∂ * is the effective boundary, n * its outward normal and * is a Dirichlet to Neumann operator (DtN) for the effective boundary condition (Capdeville & Marigo 2008, 2013. It shall be noted that, to the leading order in ε 0 , the effective wave eqs (18) and (19) is also a wave equation, comparable to the original wave eqs (6) and (7), but with different elastic and density coefficients. This remains true to the first order in ε 0 , but the original Neumann boundary condition changes to become the DtN condition (19).
(iii) The effective density ρ * (x), elastic tensor c * (x), the boundary layer corrector * (x) as well as the corrector χ (x, x/ε 0 ) can be obtained thanks to the homogenization operator H: The operator H depends on ε 0 and λ min . As λ min is directly tied to f m , H is a frequency dependent operator. For a fixed ε 0 , the roughness of the effective medium ρ * , c * increases with f m . The nonlinear process behind the homogenization is summarized in Appendix A.
From eq. (17), two important conclusions can be drawn: to leading order in ε 0 , the displacement does not depend on the microscopic scale x ε 0 . To the first order, the displacement depends on the microscopic scale x ε 0 , but only through a 'site effect' χ, that is independent of time and sources and which needs to be known only at the receiver positions.

1098
Y. Capdeville and L. Métivier (i) Finally, the effective source mechanism q * s can also be required. For moment tensor sources, the moment tensor has to be corrected to the leading order in ε 0 as where G is the strain concentrator and is defined in Appendix A.
The order 1 correction for vector forces can be found in Capdeville et al. (2010a).
Effective models belong to a space that is more complex to define than the fine scale models: (i) the effective domain * itself can be different from the original domain : a fine scale topography can be replaced by an effective one (Capdeville & Marigo 2013); (ii) E * ( * ), the effective elastic properties admissible space gathers all the possible effective elastic tensor and density. The main difference with E is that it is a finite dimensional space and its dimensions is proportional to (f m /ε 0 ) d (see Appendix A); (iii) the source parameter space remains unchanged, unless first order correctors are introduced; (iv) a new space, C gathering the receiver correctors χ at the receiver locations is necessary. Its dimension is proportional to N r ; (v) a new space, G * gathering all the possible boundary correctors * is also required. Its dimension is also finite and proportional to (f m /ε 0 ) d − 1 ; (vi) finally, note that no new space is required for the source correction term (21). To the leading order, the corrected moment tensor is still a moment tensor and can be inverted as usual. This also implies that only the corrected moment tensor can be retrieved from the seismic data, not the true moment tensor (Burgos et al. 2016).
The effective model space is also a finite dimensional space. It is the image, or the projection, of M through the homogenization operator H: In general, M does not contain M * This can be due to the receivers and boundary correctors, but this might not be the only reason. Another important case where M * is not included in M is related to the situation where M is restricted to isotropic media. Because fine scale isotropic heterogeneities lead to anisotropic effective heterogeneities, M * contains anisotropic media, even for isotropic M: thus, M * might not be included in

M.
Finally, a model m * ∈ M * can be written as

T H E H O M O G E N I Z AT I O N F U L L WAV E F O R M I N V E R S I O N ( H F W I ) P RO B L E M
Based on the homogenization framework, we can define another misfit function: for any model m * ∈ M * , is the zero order effective displacement plus the first order corrector at x. The associated effective (or homogenized) minimization problem is Eqs (26) (17) Knowing that m t minimizes E and obviously also minimizes E fm , we deduce from the last equation that m * t = H(m t ) is a solution of the HFWI problem (28). The solution of the HFWI problem therefore belongs to M * by construction. This fact is not true for many M h choices for classical FWI approaches. For example, M h is often limited to isotropic media whereas the solution of the inverse problem is anisotropic due to upscaling effects, implying the solution does not belong to M h . This is an important result: the homogenized version of the true model is a solution of the HFWI problem. If we then assume that the solution of the HFWI problem is unique, this leads to establish the relation between the true model and the limited frequency band inverse problem solution Assuming the uniqueness of the solution of the HFWI problem is a strong statement. Nevertheless, this uniqueness can be reached, as it has been illustrated numerically in the layered media case . We show in this study that uniqueness of the solution is also possible in the case of 2-D spatially varying media, at least for low noise and good data coverage. If this assumption is not met, which can happen in many situations (bad coverage, bad quality data ...), we fall back to a classical local minimum problem. In such cases, similarly to most other local optimization methods, little can be done apart from using better data coverage, better quality data, relying on a better starting model, or introducing a priori information. Compared to more classical finite dimensional space approximation M h used to solve the FWI, using M * and solving the HFWI problem thus presents the following advantages: (i) similarly to M h , M * is a finite dimensional space (see Appendix A) rendering the HFWI problem solvable; (ii) the relation between the solution of the HFWI problem and the true model is known through (30). This point is very important for the interpretation of the inverted model (also called 'downscaling' step); (iii) the solution of the HFWI problem exists and belongs to M * .
In the next section, we present how to practically solve the HFWI problem.

The minimization scheme
We rely here on the least squares approach and local optimization algorithms, a pragmatic and well adapted choice for the HFWI problem. In this study, we implement the Gauss-Newton iterative scheme (see for instance Tarantola & Valette 1982).
Given m i , the inverted model at iteration i, we obtain the model at the iteration i + 1 as where d is the data vector for all sources, receivers and components, u * (m i ) is the effective synthetic seismograms computed in model m i also for all sources, receivers and components, F i is the partial derivative matrix (matrix of the Fréchet derivatives) The damping parameter λ i is used to stabilize the inversion of the approximate Hessian (F i ) · F i . We rely on the adjoint technique to numerically evaluate the waveform partial derivative F i (Tarantola 1984;Pratt et al. 1998). Practically, the full wavefields for each source and adjoint source are stored on disk and the partial derivatives are assembled afterward. For modest size 2-D tests, this solution can be efficiently implemented in terms of memory requirement and computational resources. The damping λ i is decreased while the iteration number increases until convergence. Whatever the technical choices made to solve (28), it implies describing the homogenized model space M * . The most direct way to do so would be to set up an explicitly parametrization of this space. We consider two cases: the layered media case, for which such an explicit parametrization already exists, and the general case, for which it is not yet available.

Parametrization: the layered media case
The layered media case is a special case in the sense that there exists an analytical solution to the cell problem (A3). VTI layered media can be described by five elastic parameters A(z), C(z), F(z), L(z), F(z) (e.g. Stoneley 1949) where z is the axis perpendicular to the layering. In such a case, the homogenization operator H consists of the three following steps (Backus 1962;Capdeville & Marigo 2007): (i) build the Backus parameter vector: (ii) upscale the Backus vector with a simple spatial low-pass filtering (see Appendix A; this low-pass filtering in space shall not be confused with the time domain low-pass filtering used previously to limit the data frequency band): (iii) build back the effective transversely isotropic coefficients: , p * 5 , p * 6 (z) .
Note that eq. (34) is equivalent to muting coefficients in the Fourier domain for spatial frequencies larger than k 0 . Ifp are the Fourier coefficients of p, where L z is the layered domain, then, eq. (34) corresponds to wherew k are the Fourier coefficient of the filtering wavelet hidden in F k 0 . In such a case, we can define a parametrization based on the Backus parameter in the Fourier coefficient domain. Forgetting about receiver and boundary correctors, we can build M * as which defines a finite dimensional explicit parametrization of M * . A similar parametrization has been used by  (but with Lagrange polynomial instead of Fourier polynomial) to numerically illustrate that an FWI model is indeed an homogenized model in the layered media case.

Parametrization: the general media case
For more general media, an explicit parametrization of the homogenized model space is more difficult to set up. This is due to the fact that the analytical solution to the cell problem (A3) used in the layered media case does not exist for higher dimension cases (2-D and 3-D heterogeneities). It could be nevertheless possible to design an explicit M * parametrization and this is a possibility we intend to study in the future. In this study, we rely on a simpler idea: instead of searching for a solution directly in M * , we rely on an approximate M * h and use the homogenization operator H to project the solution back to M * . For iterative schemes such as the Gauss-Newton algorithm used in the present work, the projection can be done at each iteration (see Fig. 2), or only at convergence, 1100 Y. Capdeville and L. Métivier as needed. If an increasing frequency band is used through the iterations (for frequency-grid continuation (Kolb et al. 1986;Bunks et al. 1995)), the projection needs to be done at least at each frequency band jump. This is a simple solution, which comes with the following drawback: depending on the practical choices made for M * h , the iterative inversion scheme may fail to converge if M * h restricts too much the update from progressing toward the solution once the new model is homogenized. Unfortunately, we are not able yet to provide a precise criteria on M * h to ensure the convergence. Indeed, there are an infinite number of possibilities for practically setting up an explicit parametrization for M * h . In the following examples, limiting ourselves to 2-D cases, we rely on a piece-wise polynomial basis for the spatial parametrization. More precisely, a square inversion sub-domain I ⊂ is chosen and divided into n × n non-overlapping elements: An example of inversion domain I and an associated inversion mesh for n = 10 is shown in Fig. 3(a). Over each element I n e , similar to what is done for spectral elements, elastic parameters and density are represented using a 2-D tensorial product polynomial approximation of degree N in each direction. This defines the parametrization P N n (I) (n × n elements of degree N × N) that we use for the numerical tests in the next section. We do not impose the continuity of the fields between elements, which implies that P N n (I) has n 2 × (N + 1) 2 degrees of freedom for each scalar.

S Y N T H E T I C I N V E R S I O N T E S T S
In this section, we present a series of examples illustrating the ideas presented above. More precisely, the objectives are to illustrate through numerical experiments: (i) the validity of eq. (30) in the presented case studies; (ii) the influence of the practical choice of M h for standard FWI; (iii) the influence of the practical choice of M * h for HFWI; (iv) that the HFWI problem can be successful where a classical FWI fails.
We run three synthetic inversions in the two heterogeneous models presented in Figs 3 and 4. In both 'true' elastic models, unknown heterogeneities are embedded in a known homogeneous isotropic background. The background density, P and S wave velocities are denoted by ρ 0 , V P0 and V S0 respectively. We use 16 sources and 16 receivers, regularly located around the heterogeneities.
The source mechanism is the same for all sources: it is an horizontal vector force. The source time function g s (t) is a tapered door function in the frequency domain (see Fig. 5) which has a maximum frequency f m . This source time function is the same for all the sources. We define the background minimum wavelength as In the following, all distances are measured as functions of λ min , the frequencies as functions of f m and the time as a function of 1/f m . For both settings, the domain size is 16 × 16 λ 2 min with absorbing boundary conditions around the domain. The inversion sub-domain I is a square of dimension 12 × 12 λ 2 min . The sources and receivers are located around I leading to an excellent data coverage. Nevertheless this coverage is still sparse as the distance between each receiver and each source is equal to 3.2 λ min . Note that the sources and receivers are located outside of I and that no free surface is  Table 1 for a complete description of the elastic material properties of those models.  Table 1 for a complete description of the elastic material properties of this model.  present in the inverted domain. The idea of such settings is to avoid inverting for the receiver corrector χ , the source parameters as well as the boundary corrector * and to focus on the reconstruction of ρ * and c * . For the first setting, ('circular inclusions', Fig. 3) two circular inclusions, one faster and one slower than the background medium are present. The faster inclusion is surrounded by a thin slow layer representing a damaged layer. The thickness of this layer is λ min /8. Both the sharpness of the circular inclusions and the thin damaged layer are fine scale and can only be recovered in an effective way by FWI. Two velocity contrasts are used for the inclusions in two different experiments, one labelled as 'weak contrast', the other one labelled as 'strong contrast' (see Table 1 and Fig. 3c).
For the second inversion setting, ('faulted layers', Fig. 4), a horizontally layered medium is split by a tiled fault. The layer thickness is also λ min /8. Both the layers and the tilted fault are fine scale which can, again, only be recovered in an effective way by FWI.
Choosing λ min = O(1cm), the 'circular inclusions' model could represent for instance a piece of concrete with an iron bar inclusion (the fast inclusion 'A') and two damaged areas (the slow inclusions 'B' and 'd'), one circular and one around the bar. Choosing λ min = O(10 m), the 'faulted layered' model could represent a more geological situation where the sediments represented by the layers are offset by a tilted fault. In both cases, imaging as best as possible the heterogeneities is the challenge.
For the first situation, the acquisition setting could be thought as 'realistic', however this is not the case for the second. Nevertheless, our objective here is to focus on the ideas developed in the previous sections without mixing with other important issues such as the data coverage. This will be the purpose of future works.
For each experiment, the data to be inverted are generated using a 2-D spectral element solver (Komatitsch & Vilotte 1998), meshing all the interfaces of the heterogeneities (see Figs 3b and 4b).
Perfectly Matched Layers (PMLs; Festa & Vilotte 2005) are used to absorb outgoing waves around the domain. No synthetic noise is added to the synthetic data. The spectral element mesh used to compute the approximate Hessian and the gradient in eq. (31) is a simple regular mesh based on the inversion parametrization (on the I n e elements). In these synthetic tests, we avoid an 'inverse crime' by virtue of the following: first, the fact that the spectral element meshes used to generate the data to be inverted (d fm ) and to model the data during the inversion process (u * ) are not the same implies that the discretization errors are different. Second, the true model cannot be accurately represented with the inversion parametrization, which prevents us from injecting prior information about the inclusion shapes into the inversion. Nevertheless, the PML error (remaining small reflections from the domain boundaries) are the same for the data generation and inversion, and this error might be used by the inversion scheme.
The following inversion tests often involve the full elastic tensor c ijkl and its six independent coefficients (in 2-D). To present the results, we choose to first project anisotropic c to the nearest isotropic tensor c iso following Browaeys & Chevrot (2004). Then, P-and S-wave velocities are defined as and the total anisotropy as (42)

Circular inclusion weak heterogeneity test
In this Section, we present a first test using the circular inclusion model with the 'weak' heterogeneities (see Fig. 3 and Table 1). The heterogeneities are chosen to be weak enough so that the Gauss-Newton scheme can converge without projecting to the homogenized model space at each iteration. In other word, the Gauss -Newton scheme is applied classically without using homogenization.
The fine scales of the model such as the layer 'd' and also the sharp velocity jumps between the background media and the inclusions cannot be recovered by the inversion. Indeed they are below the resolution limit, and, technically, they cannot be represented in a spatial parametrization without prior knowledge about their shape (i.e. without explicitly meshing the discontinuities). This situation is realistic: in general, the heterogeneity geometry is unknown.
We perform three different inversions each using a different model space M h parametrization: (i) M 1,iso 30 : P 1 30 (I) with ρ, V P and V S ; (ii) M 3,ani 10 : P 3 10 (I) with ρ and c; (iii) M 1,ani 20 : P 1 20 (I) with ρ and c; where the notation P N n (I) for the decomposition of the domain I in n × n elements of degree N × N is defined at the end of Section 5.3. These three parametrizations have roughly the same number of free parameters (unknowns), 30 2 × 2 2 × 3 = 10800 for M 1,iso 30 , 10 2 × 4 2 × 7 = 11200 for M 3,ani 10 and 20 2 × 2 2 × 7 = 11200 for M 1,ani 20 . The three inversions are carried out and the Gauss-Newton scheme is stopped when a 98 per cent reduction in the square-root of the misfit function is reached. In each of the three cases, the inversion converges in about 10 iterations.
The raw results of the inversions are presented in Figs 6 and 7. By raw results, we mean the inversion results before any homogenization. From these figures, it can be noted that the three inversions qualitatively retrieve the location and signs of the reference model heterogeneities. Nevertheless, quantitatively, the models are different from each other and all include a significant amount of noise. Indeed, from the V S maps (Fig. 6), the heterogeneous circles can be roughly located and the sign of the heterogeneity contrast (slow or fast) can be guessed. Even the slow layer around the inclusion 'A' seems to be reconstructed. Nevertheless, from the cross-sections (Fig. 7), it appears that the results are not accurate. The actual velocity values in each inclusion are not recovered. Moreover, the results strongly depend on the parametrization choice: the imprint of the inversion mesh is clearly visible and, for example, it is difficult to measure the value of the actual thickness and the V S value of the thin slow layer 'd'.
We next follow the main idea of this paper and project all the inversions result as well as the reference model to the homogenized space using the homogenization operator H. To this purpose, we need to give a value to the ε 0 parameter (see eq. 16). As mentioned earlier, when dealing with forward modelling problems, ε 0 = 0.5 is often a good choice. However, for this projection step any value can be chosen in practice. To focus on long wavelength results, we choose ε 0 = 2, to study results at the resolution limit, ε 0 = 0.5 is used while for intermediate results we choose ε 0 = 1. The homogenized reference model and those resulting from the three inversions are presented in Figs 8 and 9. From the 2-D maps obtained for ε 0 = 1, it appears clearly that the four models are the same once homogenized. This is quantitatively confirmed with the model cross-sections (Fig. 9) plotted for three different values of ε 0 (2, 1 and 0.5). For large ε 0 (low resolution), the models are all the same within a small error. For small ε 0 (high resolution), the models are still in good agreement, but with a larger dispersion. The fact that the error becomes larger with smaller ε 0 , for a fixed maximum frequency f m , is expected: when reaching the resolution limit ( λ min /2, which corresponds to ε 0 = 0.5), the wavefield sensitivity to the heterogeneities is lower, leading to a poor constraint on their amplitude. A good compromise between resolution and accuracy seems to be here ε 0 = 1.
It is interesting to note that the isotropic inversion (M 1,iso 30 ) is able to retrieve the expected anisotropy once homogenized (even if its amplitude is slightly too low). It implies that the inversion is able to find an isotropic model, that is not the true model, but that once homogenized is in good agreement with the true homogenized model.
Finally, it can also be noted that the density is retrieved relatively accurately (the density results seem to be more noisy, but this should be related to the fact that the density contrast is quite small compared to the velocity contrasts associated with V S and V P ).
From this first experiment, we can conclude that the elastic models resulting from FWI are definitively not unique: they depend on the particular choice of parametrization for M h . However, the homogenized versions of these models are actually all the same.

Faulted layers test
A case study similar to the previous one is presented here, with the faulted layers model described in Fig. 4. Only two parametrizations are tested: M 1,ani 20 and M 1,iso 30 . The inversions are carried out using the same settings as the one defined for the previous case study, with the same misfit reduction stopping criteria. The M 1,ani 20 inversion converges within 20 iterations, but the M 1,iso 30 inversion stops progressing after four iterations and only a 50 per cent misfit reduction is achieved. As it can be seen in the raw inversion results (Figs 10 and 11), it is not possible to find a model explaining the data in M 1,iso 30 , at least with a local optimization algorithm such as the Gauss-Newton scheme used here.
To test if the homogenized inversion principle can help in this case, we use the algorithm presented in Fig. 2: at each Gauss-Newton iteration, we project the obtained model to the homogenization space with the H operator using ε 0 = 1/3. We therefore use M 1,iso 30 for M * h . In the following, we refer to this inversion as the M 1,iso, * 30 parametrization inversion. Inserting this projection step, the inversion almost converges (a 96 per cent misfit reduction was obtained) after about 50 Gauss-Newton iterations. Of course, there is here a little interest in using the homogenized inversion compared to inverting in M 1,ani 20 (the inversion convergence is slow), but it shows that projecting to the homogenized space can help to recover a failing inversion. At this stage, we cannot conclude if this observation goes beyond what could be obtained with simple Tikhonov regularization (see e.g. Brenders & Pratt (2007)) and this will need to be investigated deeper in future works. This M 1,iso, * 30 inversion is the first presented example of the HFWI algorithm.
The raw results for the M 1,ani 20 and M 1,iso 30 inversions are the ones presented in Figs 10 and 11. There is no raw results for the M 1,iso, * 30 inversion because each iteration is homogenized. From these raw results, we see that, differently to the circular inclusions test, it is only possible to tell that an heterogeneity is present in the layered area, without any visible or qualitative detail on it. We can only tell that, for the M 1,ani 20 inversion, the layered area appears as a slow velocity area. It is not possible to guess anything about the layering and even less about the fault.
Following the main idea of this study, Figs 12 and 13 depict the homogenized true model, the homogenized M 1,ani 20 , M 1,iso 30 and M 1,iso, * 30 inverted models, for ε 0 = 1. Similarly to the previous test, the M 1,ani 20 inverted model is, once homogenized, in very good agreement with the homogenized true model. These plots also confirm that the M 1,iso 30 has failed to converge to a correct model. Nevertheless, when the homogenization is used at each Gauss-Newton iteration, the same inversion M 1,iso, * 30 result is, once homogenized with ε 0 = 1, in very good agreement with the homogenized true inversions, the position of the fault can also be guessed from the total anisotropy 2-D plot (Fig. 12,  bottom panels).
From this test, we conclude that, depending on the model we want to retrieve, the choice of M h can influence the convergence of a classical FWI. In particular, choosing an anisotropic parametrization is a good option, even if it seems apparently more difficult. This example illustrates the advantage of projecting to the homogenized space at each iteration of the inversion.
Finally, Fig. 14 depicts horizontal cross-sections in the ε 0 = 1 homogenized reference and M 1,ani 20 inverted models for the c 1112 elastic coefficient. In a vertically transverse isotropic (VTI) model, we have c 1112 = 0. Knowing that a horizontally layered model becomes a VTI model through homogenization and that the faulted layers test model is horizontally layered everywhere but near the fault, we expect c 1112 to be zero except near the fault. This is indeed what is observed in Fig. 14 from both the homogenized reference and inverted models. It implies that, provided that we have the prior information that there is potentially a fault in the medium, it would be possible to identify it from the inversion, even at this low resolution. Such an interpretation of the inverted results can be seen as a downscaling inverse problem and is discussed in Section 7.

Circular inclusion strong heterogeneity test
Finally, for the last test, we come back to the circular inclusion configuration already used in Section 6.1, with stronger velocity contrasts, especially for the slow regions (see Fig. 3 and Table 1). The effect of the heterogeneity on the waveforms is large and none of the parametrizations already used, with or without homogenization at each step, is able to converge and to avoid falling in a local minimum. To resolve this problem, we rely on the classical frequency-grid continuation solution (Kolb et al. 1986;Bunks et al. 1995). We first perform the inversion in a low frequency band [0, f m /2] and use the homogenized inversion results to start the inversion in the full frequency band [0, f m ]. Applying these two successive frequency band inversions, the simple Gauss-Newton scheme (without homogenization) systematically proposes a non-physical model after a few iterations and fails to converge. We therefore rely on the homogenized Gauss-Newton scheme and use M 0,ani, * 20 for the low-frequency band step and then M 1,ani, * 20 for the full-frequency band step. In this framework, the convergence rate is slower than for the first experiments (about 20 iterations for the first step and 40 for the second), however the misfit function is decreased down to the 99 per cent objective and the convergence is reached.
The final results of the inversion, homogenized with ε 0 = 1, are summarized in Fig. 15. Once again, a very good agreement between the homogenized true and the homogenized inverted models is obtained. The high amplitude of the effective anisotropy (up to 30 per cent), knowing the isotropy of the underlying thin scale model shall be noted. The different anisotropy pattern between heterogeneities with or without a slow layers around them could help to find their finer scale characteristics in a downscaling process.
From this last case study, we conclude that the homogenization can help FWI to successfully converge in this difficult case.

D I S C U S S I O N
One of the objectives of this paper is to extend the results obtained in the layered media case , to higher dimension cases (here 2-D), which illustrates the fact that the model obtained from FWI are, at best, an homogenized version of the true model. It establishes the relation between the inverted and true models for limited frequency band data. Even if this relation might not always be true, the tests performed here seem to confirm this assumption: in cases where the inversion converges without homogenizing at each Gauss-Newton step, the inverted raw models are clearly Figure 8. Weak contrast circular inclusion test reference model and three different inversion results, all homogenized with ε 0 = 1 (see Section 6.1). V S , V P , ρ and the 'total anisotropy' are presented (lines of panels from top to bottom, respectively ). Reference, M 1,ani 20 , M 3,ani 10 and M 1,iso 30 inverted models are plotted (left to right columns of panels, respectively). V S , V P and the 'total anisotropy' are obtained following eqs (40)-(42). parametrization dependent. Nevertheless, once homogenized, the different models are all the same and consistent with the homogenized true model. We interpret this observation in the following way: the differences between raw inversion results depend on fine scales in the parametrization that are not uniquely constrained by the data. In other words, these fine scales may be necessary to explain the data, but they are not unique and depend on the parametrization. For example, using an isotropic parametrization (such as M 1,iso 30 in Section 6.1), knowing that the effective medium behave in an anisotropic way, might not prevent the convergence of the inversion. But this strategy may find isotropic fine scales allowed by the parametrization which will try to emulate the correct anisotropy in the effective sense. Depending on the parametrization, these fine scales will be different, but they will all have the same effective effect. We conclude that FWI, indeed, finds the homogenized model and not the true model. The next related important point is to ensure that the assumption that the solution to the homogenized inverse problem is the homogenized true model (eq. 30) is true. Once again, for all the examples tested here, this relation seems to be satisfied. A corollary of this observation is that, in the homogenized space, the FWI inversion has a unique solution. This is true at least in the presented example and all the other tests we have done so far. This is definitely not a proof but, at least, it shows that it is a reasonable assumption. Cases breaking the uniqueness are more likely to be found for poor data coverage or noisy data. In such cases, the relation between the true and inverted models through homogenization is broken which is a serious issue for the interpretation stage. But this is not specific to HFWI.
This framework gives some new insight about the FWI process, even if the homogenization is not used. For example, many FWI inversion scheme makes use of spatial smoothing to regularize their gradient update, especially near the sources and receivers (Williamson et al. 2011;Trinh et al. 2017). Based on our results, knowing that, for low-velocity contrast, the homogenization operator H is close to a simple low-pass spatial filtering F , it makes sense to low pass-filter results from an FWI: for weakly contrasted models, this is equivalent to homogenizing the inverted model.  Another example where the presented work gives a better understanding about common practice is the frequency-grid continuation methods (Kolb et al. 1986;Bunks et al. 1995). Using lower frequencies reduces the homogenized model space dimensional (see eq. A9), and, if low enough frequencies are present in the data, a misfit function with a single global and local minimum can be reached.
Our work also raises questions about other practices: for instance, inverting for fine scales, such as sharp interfaces, based on limited frequency band data, appears to be counterproductive. Indeed, even if, for instance, the true model contains sharp interfaces, inverting for these interfaces makes the problems strongly nonlinear. More importantly, the imprint of these interfaces in the bandpassed seismic data is very weak, making the inverse problem poorly constrained. It thus appears more reasonable to invert for a smooth anisotropic media and leave the interpretation of the inverted model (such as the detection of sharp interfaces) to a second-step downscaling inversion.
The resolution of FWI or HFWI is directly linked to λ min and to the choice of ε 0 . In the forward modelling context, for a fixed λ min value, depending on the desired accuracy, the number of propagated wavelengths and on the model velocity contrast, a good choice for ε 0 lies between 0.5 and 0.25 (Capdeville et al. 2010b). We could thus deduce that the FWI resolution can reach λ min /2 to λ min /4 resolution. Nevertheless, the different tests performed here show that with ε 0 = 0.5, the noise on the results is significant, increasing the uncertainty on the reconstruction (leading to large error bars). It is therefore probably best to limit the results to larger ε 0 , such as ε 0 = 1, for which the error appears lower. This could be potentially improved by increasing the number of sources and receivers, but this observation testifies that the sensitivity of the waveforms to heterogeneities decrease strongly for ε 0 close to 0.5, and that the resolution limit should be set closer to 1 than 0.5. Note that these numbers actually depend on where the actual maximum frequency f m is placed: in Fig. 5, f m is really the maximum frequency, but its position could have been chosen at the end of the plateau ( 0.8 of the current f m ). This would give an apparent more optimistic value for the resolution limit of the inversion, but it would not change the final images. This should be kept in mind when discussing the FWI resolution limits as, due to possible difference in the definitions, the λ min /2 resolution of one can be the λ min /4 of another.
One important notion, both for the homogenization and for the inverse problem, is the minimum wavelength λ min . Here, we have assumed that λ min is roughly constant all over the domain. For more realistic situation, especially for geological media, λ min may vary dramatically over the inverted domain, especially with depth. Such a case is not a real issue but an adaptive homogenization, that already exists for layered media (Capdeville et al. 2013,appendix B), would be needed for 2-D and 3-D media. Media that do not present a minimum wavelength for some frequencies are more problematic and cannot be inverted with any FWI methods (FWI or HFWI). Media with Helmholtz resonator inclusions (Zhao et al. 2016), but also with fluid or void inclusion of size comparable or larger to the minimum wavelength are among those. Being able to apply FWI or HFWI schemes to such media would require to use very low frequencies only (such that a minimum wavelength exists anyway) or a drastically different approach.
Another difficulty related to λ min is that, in the presented algorithm and examples, it has been assumed to be known a priori. For realistic situations, this often will not be true. Indeed, if the starting model is far enough from the homogenized true model, the true λ min can be significantly different from the initial estimate and therefore the homogenization may be poorly tuned. In such a case, the λ min is part of the unknown which makes the whole process implicit. An iterative scheme is therefore necessary to find the correct λ min . This is probably not a great difficulty but it will need to be tested in a future work.
As it has been shown in the examples, the parametrization choice for M h or M h, * has an influence on the raw results, but not on the homogenized results, if the inversion has converged. It appears that, for relatively weak velocity contrast target models, this choice is not critical as long as there are enough degrees of freedom in the parametrization. For intermediate 1108 Y. Capdeville and L. Métivier Figure 12. Faulted layers test reference model and inversion results, all homogenized with ε 0 = 1 (see Section 6.2). V S , V P , ρ and the 'total anisotropy' are presented (lines of panels from top to bottom, respectively ). Reference, M 1,ani 20 , M 1,iso, * 30 and M 1,iso 30 inverted models are plotted (left to right columns of panels, respectively). V S , V P , and the 'total anisotropy' are obtained following eqs (40)-(42). velocity contrast target models, classical FWI still converge, but using a fully anisotropic parametrization (M N ,ani n ) is necessary. In any case, we found it is safer to use a fully anisotropic parametrization than a isotropic parametrization. Once the fully anisotropic tensor is inverted for, the spatial parametrization is not really critical as long as it provides between 3 to 4 degrees of freedom per wavelength in one direction. We nevertheless observe a slightly better stability of the inversion for low polynomial degree (e.g. using P 1 20 is more stable than P 3 10 for the same number of degree of freedom). An interesting consequence is that a rough parametrization, such as square blocks containing constant elastic properties, performs very well in terms of stability and convergence, better than a high order polynomial parametrization, as long as there are enough degrees of freedom per wavelength. In any case, once homogenized, the results correspond to the homogenized true model.
As we saw above, the model resulting from HFWI are homogenized models and cannot be true models. Such models are well suited for data prediction. This could be very useful in many situations, such as source localization and moment tensor inversions. But they are poorly suited to geological interpretation. Indeed, because true interfaces are blurred and apparent anisotropy is introduced, interpretation can be difficult and misleading (for an example, see . Note that this issue is not specific to HFWI: any seismic inversion based on limited frequency band data is subjected to this issue. At least, with HFWI, it is made clear that the results should be interpreted with care. To properly interpret the homogenized model obtained through HFWI, we propose a secondstep inverse problem: the downscaling or de-homogenizing inverse problem. The idea of such a new inverse problem is to find all the finer scale models both compatible with some a priori information and the HFWI model. Such an inversion solution is highly  non-unique and needs to be carried out with a global search approach. The idea has already been successfully tested in the layered model case (Bodin et al. 2015) and its development for higher dimension problems needs to be done. Combining HFWI and a downscaling inversion is an indirect way to address the full waveform inverse problem with a fully probabilistic approach. If the overall context of this work is seismology at all scales, our results can be extended to other domains, such as non-destructive testing at all scales with (almost) no modifications. Furthermore, although we have limited ourselves to the elastic case, the acoustic case is probably very similar and most of our conclusions would probably remain valid. Nevertheless, there are some important differences between the two cases related to the non-uniqueness of the inverse problem that would deserve a particular attention.

C O N C L U S I O N S A N D P E R S P E C T I V E S
We have confirmed numerically that an FWI based on limited frequency band data can retrieve, at best, the homogenized version of the true model. The relation between the true and inverted models is known through eq. (30). This is an important progress as, for classical FWI scheme, this relation is not known. This relation opens the door to a proper interpretation of the inverted model with a downscaling inverse problem.
To make the principle exposed here applicable to more realistic geological cases, many aspects still need to be explored. Because, in general, the source and receivers are in the inverted area, their associated correctors need to be inverted. Similarly, the free surface boundary corrector needs to be inverted. The fact that, in general, the minimum wavelength strongly varies over the domain also needs to be accounted for. The adaptive homogenization is potentially a good solution for such a problem. An explicit parametrization instead of the implicit one used here should be attempted. The general downscaling problem needs to be addressed and, finally, these works need to be extended to 3-D. All these aspects will be explored in future works.

A C K N O W L E D G E M E N T S
We are very grateful to Gerhard Pratt and Michael Afanasiev for their positive and useful reviews that greatly helped to improve the manuscript. We thank R. Brossier, J. Virieux, A. Fichtner, T. Bodin and the European COST action TIDES (ES1401) for fruitful discussions. This work was funded by the HIWAI ANR project (ANR-16-CE31-0022-01). The computations were done on the Centre de Calcul Intensif des Pays de la Loire (CCIPL) computers.

A P P E N D I X A : T H E H O M O G E N I Z AT I O N O P E R AT O R H
In this appendix, we summarize the practical steps necessary to build the homogenization operator H. The complete development and theory can be found in Capdeville et al. (2010b. To practically implement the user-defined λ 0 scale separation, we need to introduce a low-pass filter operator F k 0 , such that, for any quantity g(x), F k 0 (g)(x) does not contain any spatial variation smaller than λ 0 = 1/k 0 . This low-pass filter operator can be written as where * is the spatial convolution and w k 0 is the filter wavelet. Once λ 0 is chosen, it sets the value of ε 0 through (16). The different steps allowing us to build the upscaling operator H such that and to find the correctors are the following: (i) Step 1. We first solve the so-called cell problem to find the initial guess correctors χ lm s (x). It consists in solving the following elastostatic set of problems (3 in 2-D, 6 in 3-D) in : ∇ · c : χ lm s = F lm , F lm = −∇ · (c : (e l ⊗ e m )) , with periodic boundary conditions on ∂ , where the e i , i ∈ {1, ..., d}, are the Cartesian unit basis vectors.