Bessel smoothing filter for spectral-element mesh

5 Smoothing filters are extremely important tools in seismic imaging and inversion, such as for 6 travel-time tomography, migration and waveform inversion. For efficiency, and as they can be 7 used a number of times during inversion, it is important that these filters can easily incorporate 8 prior information on the geological structure of the investigated medium, through variable co9 herent lengths and orientation. In this study, we promote the use of the Bessel filter to achieve 10 these purposes. Instead of considering the direct application of the filter, we demonstrate that 11 we can rely on the equation associated with its inverse filter, which amounts to the solution of 12 an elliptic partial differential equation. This enhances the efficiency of the filter application, and 13 also its flexibility. We apply this strategy within a spectral-element-based elastic full waveform 14 inversion framework. Taking advantage of this formulation, we apply the Bessel filter by solv15 ing the associated partial differential equation directly on the spectral-element mesh through the 16 standard weak formulation. This avoids cumbersome projection operators between the spectral17 element mesh and a regular Cartesian grid, or expensive explicit windowed convolution on the 18 finite-element mesh, which is often used for applying smoothing operators. The associated lin19 ear system is solved efficiently through a parallel conjugate gradient algorithm, in which the 20 matrix vector product is factorized and highly optimized with vectorized computation. Signif21 icant scaling behavior is obtained when comparing this strategy with the explicit convolution 22 method. The theoretical numerical complexity of this approach increases linearly with the co23 herent length, whereas a sub-linear relationship is observed practically. Numerical illustrations 24 are provided here for schematic examples, and for a more realistic elastic full waveform inver25 sion gradient smoothing on the SEAM II benchmark model. These examples illustrate well the 26 efficiency and flexibility of the approach proposed. 27


I N T RO D U C T I O N
Full waveform inversion (FWI) offers the possibility to extract high-resolution quantitative multi-parameters of the subsurface from seismic data.The majority of these applications are carried out under finite-difference (FD) approximation, due to the numerical efficiency of this method and its ease of implementation.Standard formulations of this approach are, however, limited on regular grids, which require significant extra effort in terms of design and computational cost in the presence of surface topography or important geological interfaces (Robertsson 1996;Bohlen & Saenger 2006;Fuji et al. 2016;Huiskes et al. 2016).Finite-element (FE) methods have become popular for regional and global problems, especially spectral-element methods (SEMs), where complex geometry can be handled with accurate numerical calculation of wavefields (Komatitsch & Tromp 1999).In dealing with 3-D elastic FWI, we would like to develop a complete inversion numerical workflow using a spectral FE scheme while including an efficient smoothing filter implemented directly on the FE mesh.
In most geophysical applications, FWI is introduced as an iterative local optimization problem that attempts to minimize the least-squares residuals between the observed and the calculated data at the receiver location.This inverse problem is mathematically ill-posed, which thus leads to the non-uniqueness of the solution.For seismic imaging using either traveltimes or wavefields, the inversion often needs to be stabilized by applying regularization.This can be performed through model-driven regularization (Menke 1984;Tarantola 2005) or by preconditioning the gradient through filtering operators (Guitton et al. 2012).This data-driven strategy is sometimes called model preconditioning (Fomel & Claerbout 2003).These regularization techniques normally assume the particular properties or structure of the model, such as smoothness or 1490 P.T. Trinh et al. geological features, to guide the problem toward the desired solution.In 3-D elastic FWI, the computational cost remains one of the main challenges (Virieux & Operto 2009), therefore decimation of the data volume by reducing the number of computed shots, random subsampling strategies, and/or source-encoding becomes mandatory (Capdeville et al. 2005;Herrmann et al. 2009;Krebs et al. 2009;Warner et al. 2013;Castellanos et al. 2015).This implies a decrease in the data coverage and the presence of cross-talk and artefacts, which makes the inversion problem even more ill-posed.In this context, the role of regularization at each iteration becomes crucial and needs to be performed efficiently.
Dealing with complex structures or geological heterogeneity might also require adequate regularization strategies.These issues can be addressed by attenuation of the wave-number content in a particular direction, such as plane-wave destruction filters (Claerbout 1992;Fomel 2002), or by imposition of expected structures through directional Laplacian preconditioning in the model space (Hale 2007;Guitton et al. 2012).This latter highlights the importance of non-stationary filters in complex geology; for example, the Laplace filter can smooth or drastically reduce local planar events according to a local dip field.
These non-stationary preconditioning filters are expressed as convolution operators, such as Gaussian or Laplacian filters, through The vector m(x) is transformed into a new vector s(x) through the convolution operator C nD , where n is the dimension of the problem (n = 2 or n = 3).In the FD grid, the Gaussian filter (without local rotation) can be applied efficiently to any model or gradient vector, due to the tensorial property of the function.For long filters, recursive implementation can be used to improve the computing performance (Deriche 1992;Van Vliet et al. 1998).These approaches can be extended to FE methods by including a projection between the Cartesian mesh, where the parameters to be reconstructed are located, and the FE mesh.However, this can be limited by the accuracy of the back and forth projections.Filtering can also be applied as a windowed convolution of the filter and the vector (Tape et al. 2010;Peter et al. 2011), as for the SPECFEM open-source package where L is the coherent length associated with the filter: as the kernel of the filter is decaying, the integration can be limited over the finite domain expressed by the effective radius αL from the position x.This convolution approach can be relatively computer demanding, because for each input point to be filtered, contributions of other points of the medium are required in the surrounding volume, which leads to significant computer manipulation, especially for functions with long tails, such as the Laplace filter.
In this report, we highlight that the inverse of given operators C nD can be relatively sparse.Therefore, it is interesting to consider them to obtain the contribution of the smoothing operator (Wellington 2016).Instead of performing the convolution in eq. ( 1), we can consider solving the following eq.( 3), relying on the inverse operator which requires knowledge of the inverse operator C −1 n D .This leads to the definition of Bessel filters, which are defined by the modified Bessel functions (Abramowitz & Stegun 1972) and their sparse inverse operators.As the filter is defined through a non-homogeneous elliptic partial differential equation (PDE) with delta source function, its inverse operator can be expressed as a distribution function.The integral eq. ( 3) can then be efficiently solved using any FD or FE method.In 3-D, applying the Bessel filter twice provides an excellent approximation of the Laplace operator, with negligible mismatch at the origin.This approximation implies that the Bessel filter can be applied either once as a smoothing filter, or twice to reproduce the decay of the Laplace filter.
This paper is organized as follows.In Section 2, we first define the Bessel filter through a PDE with specific boundary conditions, and the approximation of the Laplace filter by Bessel functions in 2-D and 3-D geometries.In Section 3, the application of the Bessel sparse inverse operator on a vector is described by a system of PDEs.The weak formulation of these equations yields a sparse linear system, which is symmetric positive even for variable coherent lengths, dip and azimuth angles.In Section 4, several numerical illustrations and an example from the synthetic 3-D SEAM Phase II foothills model are provided of such smoothing processes.In Section 5, efficient numerical schemes in a high-performance computer environment are presented.We show how a parallel conjugate gradient (CG) iterative solver can be implemented in a matrix-free fashion.The convergence of this CG iterative solver is analysed with respect to several parameters.Significant scaling behaviour is also obtained when this strategy is compared with the explicit convolution method.Conclusions and perspectives are given in Section 6.We highlight that the choice of applying the Bessel filter once or twice depends on the specific application, and this decision will be case dependent.

M E T H O D O L O G Y
Before moving to the mathematical development, we would like to specify some definitions regarding forward and inverse filters.Let us consider the kernel C(x) as a 2-D or 3-D smoothing filter.The application of this filter to a function g(x) is defined by the convolution operator, which is denoted by the symbol '*', through Bessel filter for spectral-element mesh 1491 The inverse filter of C(x), namely C(x) −1 , is defined through the relation (5)

Definition of Bessel filters and their sparse inverse operators
Depending on the space dimension, we introduce the normalized Bessel filters where L x , L y and L z are coherent lengths in the x, y and z directions, and K ν is the modified Bessel function of the second kind, as presented in Appendix A. While the 2-D Bessel filter is the normalized modified Bessel function K ν (r) with ν = 0, the 3-D Bessel filter is the modified spherical Bessel function z −1/2 K ν + 1/2 (r) with ν = 0 (Abramowitz & Stegun 1972).When coherent lengths are uniform over space, these Bessel filters are unique solutions of the following PDEs with radiative boundary conditions B nD (x) → 0 as x → ∞, and the additional bounded constraint on the integral through 0 < B nD (x)dx < ∞.
The construction of these normalized Bessel filters is developed in Appendix B.
According to eqs ( 8) and ( 9) and the definition of the inverse filter in eq. ( 5), the associated sparse inverse of the Bessel filters can be defined as where δ n is defined as It is important to note that Bessel filters only depend on the radial part (Appendix B), which favours the transformation from Cartesian coordinates to polar coordinates (r, θ ) in 2-D, or to spherical coordinates (r, θ , φ) in 3-D Let us consider an original vector m in the model space that we want to smooth.The vector s will be the smoothed vector obtained by applying the Bessel filter B 3D (z, x, y).We can equivalently consider the application of the inverse Bessel filter, which leads to the choice between the solutions of the following two problems: Both systems are convolution over space.The first one embeds a relatively broad kernel shape, where the computation can be demanding.Therefore, we are interested in the second equation.According to the definition of the inverse operator B −1 3D (eq.10), this convolution can be translated into the following PDE over the domain in which the vector m appears on the right-hand side.Once discretized by any FD or FE method, this system leads to a sparse operator and can be solved efficiently, to obtain the smoothed vector s.Our approach is similar in some ways to the structure-oriented smoothing filter proposed by Williamson et al. (2011), which was derived from the diffusion equation as proposed by Fehmers & Höcker (2003).This smoothing process is controlled by a symmetric diffusion tensor, without knowing the kernel shape of the forward filter.8), which defines the 3-D Bessel filter.

Approximation of the Laplace filter by the Bessel operator
The convolution of two Bessel filters can provide an excellent approximation of the Laplace filter in 2-D and 3-D.We first illustrate this for the 3-D case, as this is our main interest.The double operator B −1 3D * B −1 3D is the inverse filter of the operator B 3D * B 3D , which is the solution of the following equation where the function f satisfies the following PDE Changing parameters from Cartesian coordinates to spherical coordinates, where and knowing that the expected solution, the filter B 3D * B 3D , only depends on the radial part, eq. ( 17) now becomes The normalized 3-D Laplace filter L 3D (z, x, y) is given by This is the solution of eq. ( 19), for r > 0, provided the residual term M 3D [L 3D (r)] can be ignored, as The residual term M 3D for the 3-D Laplace filter satisfies which implies that the residual term becomes smaller when the distance r increases.Fig. 1 presents an excellent match between the 3-D normalized Laplace filter defined in eq. ( 20) and the convolution of the two 3-D Bessel filters for L x = L y = L z = 20 m, obtained from solving the PDE defining the 3-D Bessel filter.The mismatch at zero origin comes from the residual terms M 3D [L 3D (r)], which are more important at small distances r.This analysis is also coherent with the 3-D inverse operator of the Laplace filter proposed by Tarantola (2005).
In 2-D, by applying the same workflow, we can find a similar approximation.However, we would like to highlight the flexibility of this approximation by introducing a scaling parameter a into the definition of the 2-D normalized Laplace filter This 2-D Laplace operator can be well approximated by the application of two 2-D Bessel filters, as shown in Fig. 2, except at the origin where there is a singularity.By using different values of the scaling factor a, we have different approximations of the Laplace filter, which mitigates the discrepancies for both the amplitude and the decay.

W E A K F O R M U L AT I O N I N F E M E T H O D S
Following the weak formulation, as usual for FE methods, we can discretize eq. ( 15) and deduce a sparse linear system that can be efficiently solved using an iterative linear solver.First, we start by introduction of the weak formulation in SEM for homogeneous coherent lengths Bessel filter for spectral-element mesh 1493 without dip and azimuth, which naturally leads to a symmetric linear system.We then introduce an efficient method to incorporate variable coherent lengths, and dip and azimuth angles into our linear system, while preserving the symmetry of the left-hand-side matrix.Although the following was developed for SEM, it can be extended to other FE formulations, and also for FD methods (Wellington 2016).

Weak formulation of the filter for homogeneous coherent lengths without dip and azimuth
We do not develop the weak formulation of eq. ( 15) in Cartesian space, but in the dimensionless coordinates in the domain ˜ , as defined by the following expressions As coherent lengths L z , L x and L y are homogeneous, the relationship of eq. ( 24) is an one-to-one projection from (z, x, y) to (z, x, ỹ), which implies that solving eq. ( 15) in dimensionless or Cartesian coordinates should provide identical results.In the dimensionless coordinates, eq. ( 15) can be written in its weak form as leading to the weak formulation where the test function v(z, x, ỹ) is chosen such that it satisfies the homogeneous Dirichlet boundary conditions, which implies that the boundary terms have vanished.
We introduce the notation of the division of two vectors (x 1 , x 2 , x 3 ) t and (y 1 , y 2 , y 3 ) t (where ' t ' stands for the transpose operator) such that B satisfies the relationship This matrix B is actually the transformation matrix when the coordinate transformation is performed from the system (x 1 , x 2 , x 3 ) to (y 1 , y 2 , y 3 ).
In SEM, the physical domain is decomposed into a set of non-overlapping hexahedral elements.Each element is further discretized by (N + 1) 3 Gauss-Lobatto-Legendre (GLL) points (ξ k 1 , η k 2 , ζ k 3 ) in the reference space, k 1 , k 2 , k 3 = 0, . . ., N. The mapping from the reference space (ξ , η, ζ ) to the Cartesian space (z, x, y) is expressed by the Jacobi matrix, according to the notation in eq. ( 27), J = (∂z, ∂x, ∂y) t (∂ξ, ∂η, ∂ζ ) t and J e = det(J).( 29) Similarly, the relationship between the reference space and the dimensionless coordinate is defined through In the reference space, the basis functions are defined as Lagrange polynomials over the GLL points.The integrals are then numerically approximated by GLL quadrature (more details on SEM are given in Appendix C).Due to these ingredients and the property of the Lagrange polynomials that have values at the GLL nodes of either 0 or 1, the weak formulation in eq. ( 26) can be written as in which the mass matrix M is diagonal, and the stiffness matrix K is symmetric.The impedance matrix A = M + K is then symmetric.For the mass matrix, its components are given by and, for the stiffness matrix, by 3 where We introduce here the geometric factors G ij associated with the projection between the dimensionless coordinates and the reference coordinates which simplifies the expression of the stiffness matrix to These expressions of eqs ( 32), ( 34) and ( 35) are used for the implementation of the matrices M and to compute the product of the matrix A with a given vector in the dimensionless coordinates.To do so, the evaluation of the geometric factors G ij is critical.Note that the linear system of eq. ( 31) is constructed and solved in the dimensionless coordinates and not in the Cartesian coordinates.The results are not affected by this coordinate transformation, due to the bijective correspondence between the two coordinates systems (eq.24).
When wave-propagation simulation is performed by SEM, all of the elements of the inverse Jacobi matrix J −1 and the volumetric Jacobian J e associated to this projection are available at no extra cost.It is of great interest to incorporate these ingredients into the construction of matrices M and geometric factors G ij .According to the definition of the relationship between dimensionless coordinates and Cartesian coordinates (eq.24), the determinant J e of the Jacobi matrix J can be estimated from the volumetric Jacobian J e of the Jacobi matrix J through The inverse matrix J−1 can be deduced from the elements of the inverse Jacobi matrix J −1 through the identity Bessel filter for spectral-element mesh 1495 Following this framework, the application of the Bessel filter to a model vector, through the solving of the linear system (eq.31), can be efficiently achieved in the dimensionless coordinate system, whereas the wave-propagation simulation is still performed in Cartesian coordinates.
The symmetric matrix A is referred to as the discrete Helmholtz operator, which has been shown to be positive definite (Deville et al. 2002).The CG method is thus the method of choice to iteratively solve the linear system.In these numerical experiments, the condition number of the matrix A ranges from 10 2 to 10 4 , which depends on the value of the coherent lengths.Therefore, the iterative solver converges rapidly toward the desired solution, as we will see with the numerical implementation.

Variable coherent lengths, dip and azimuth angles
In the previous section, the weak formulation of the Bessel filter in a dimensionless coordinate system was shown to naturally yield a symmetric stiffness matrix.When introducing variable parameters-coherent lengths, dip and azimuth angle-we want to preserve the symmetry of the matrix A, and therefore we develop the weak formulation in a rotated dimensionless coordinate system (ṽ, ũ, w).Before defining these coordinates, let us provide the definition of the azimuth θ and dip ϕ angles for a given vector.The azimuth is the horizontal angle measured from the North, and the dip is the angle that the vector has with the horizontal (Sheriff 2002).Their values range such that θ ∈ [−π , π ] and ϕ ∈ [−π/2, π/2].In 3-D space, a rotation R 3D with azimuth θ and dip ϕ transforms the Cartesian coordinates (z, x, y) into the rotated coordinates (v, u, w), where v is vertical direction, perpendicular to the bedding planes, and the two horizontal directions u and w define the plane of the geological structure.Considering the previous definitions, we have the following rotational operator Similar to the last section, we would like to develop the weak formulation of the PDEs associated with the application of the sparse inverse Bessel filter in the rotated dimensionless coordinates system (ṽ, ũ, w), defined by the following relationships in which L v , L u and L w are coherent lengths in the v, u and w directions, respectively.In this rotated dimensionless coordinate system, the Bessel filter is defined by the PDE The rotated dimensionless coordinates are mapped to the Cartesian coordinates through the relationship As coherent lengths, dip and azimuth (i.e. the parameters of the filter) are non-stationary, eq. ( 40) does not provide the same structure as eq.( 8) when expressed in the Cartesian coordinate system.The chain rules for derivative estimation introduce spatial derivative terms that are related to the parameter variations in the PDE, when moving from one to the other set of coordinates, through the relationship of eq. ( 41).However, if the filter parameters vary slowly in space, their spatial derivatives can be ignored, which leads to the following approximation Within this approximation, the Bessel filters defined in the rotated dimensionless coordinates and in the Cartesian coordinates are almost identical.In other words, performing the smoothing operation in the dimensionless system or the Cartesian system should provide approximately the same results.This argument has an important role in this implementation, because it allows the weak formulation to be developed in dimensionless coordinates, to maintain the symmetry of matrices K and A; this is a key point for numerical efficiency.It should be noted that this approximation will systematically introduce an error into the amplitude of the filtering operator when rapid variations of the filter parameters occur.The associated error analysis will be discussed in Section 4.1.
In the rotated dimensionless coordinate system, the differential relationship between the original vector m and the smoothed vector s that was obtained by filtering with the Bessel filter B 3D (ṽ, ũ, w) can be written as s(ṽ, ũ, w) − ∇ 2 ṽ, ũ, w s(ṽ, ũ, w) = m(ṽ, ũ, w). ( 43) Ignoring the derivatives related to variations in the filter parameters allows the same workflow to be applied as for homogeneous parameters.
Let us consider the mapping from reference space (ξ , η, ζ ) to the rotated dimensionless coordinates (ṽ, ũ, w), defined by the Jacobi matrix Jrot and the volumetric Jacobian J rot e J rot e = det( Jrot ). ( 45) The weak form of eq. ( 43) can again be discretized into the linear system of eq. ( 31), where the mass matrix M is diagonal and the stiffness matrix K remains symmetric through relations where k stands for the triple of indexes and where ( p 1 , p 2 , p 3 ) := (ṽ, ũ, w) and (r 1 , r 2 , r 3 ) := (ξ, η, ζ ).
The geometric factors G ij associated with the projection between the rotated dimensionless coordinates and the reference coordinates becomes The numerical implementation is identical to the case of the homogeneous parameters in eqs ( 32), ( 34) and ( 35), where the linear system of eq. ( 31) is again constructed and solved in the reduced coordinates system.However, we need new expressions of the volumetric Jacobian J rot e , and the elements of the inverse Jacobi matrix ( Jrot ) −1 .These quantities can be computed from elements in the mapping between the reference space and the Cartesian space, knowing that Jrot = (∂ ṽ, ∂ ũ, ∂ w) t (∂ξ, ∂η, ∂ζ Combining eq. ( 49) with the approximation of eq. ( 42), we have As det [R 3D (θ, ϕ)] = 1, the determinant of the Jacobi matrix J rot is The inverse of the Jacobi matrix J rot is The volumetric Jacobian J rot e and elements in the inverse Jacobian matrix Jrot can be computed from eqs ( 51) and ( 53), which completes the construction of matrix M and geometric factors G ij in the dimensionless coordinates.If the azimuth and dip are zero, and if the coherent lengths are homogeneous, eqs (51) and ( 53) are identical to eqs (36) and (37).Finally, we end up with the linear system As = Mm. (54)

N U M E R I C A L I L L U S T R AT I O N S
Various 3-D examples that illustrate the numerical efficiency of the workflow are proposed in this section.We will show that the decay of Laplace filters can be mimicked by using Bessel operators.However, it should be emphasized that there is no obligation to use a Laplace filter; for example, the software SPECFEM3D uses a 3-D Gaussian smoothing filter (Peter et al. 2011).We will also quantify the numerical approximation for variable coherent length, dip and azimuth.We first consider an estimation of the Bessel kernel by considering an original vector m as a delta function at the origin.Then, we will consider the impact of the filter when the input vector only contains random noise, without any predefined structures.Finally, we show the application of the filter to a gradient obtained from a subset of the synthetic 3-D SEAM Phase II foothills model (Oristaglio 2012).

Spike test
When the source term, that is, the original vector m 0 , is set as a delta function at location (0, 0, 0), the linear system should provide a solution that is identical to the Bessel filter (eq.6).When we solve the linear system of eq. ( 55) twice through the following sequential system we should obtain a response similar to the Laplace filter.Let us consider a model of size 1.5 km × 1.5 km× 1.5 km.A delta function is located at the centre of the model, as shown in Fig. 3.
In this example, the coherent lengths in the x and y directions are homogeneous L x = L y = 80 m and the dip and azimuth angle are set to zero, as there is no specific orientation.The coherent length L z is either constant or variable in the z direction, as indicated in Figs 3A(a 3B(b), this method can correctly follow the shape of the Laplace filter, which implies that ignoring the spatial derivatives of the coherent lengths is acceptable.This conclusion is further supported by the identical shapes of the zx cross-sections of the 3-D Laplace filter and the output of the linear system of eq. ( 55) in Figs 3B(c) and (d).When this variation becomes more important, as ∂ z L z = 0.2, this approximation cannot exactly reproduce the amplitude of the theoretical filter, but the influence of this approximation still appears acceptable for the smoothing effect of the model vector m that we consider.
The implementation of constant dip and azimuth is illustrated by the spike test in Fig. 4. The delta function is again located at the centre of the model.Homogeneous different coherent lengths are used in all directions, but the filters are highly anisotropic, with L v = 50 m, L u = 100 m, and L w = 200 m.Due to this design, the 3-D filter is a tri-axial ellipsoid with distinct semi-axis length, as indicated in the zx, xy, and zy cross-sections in Fig. 4(A).After applying the rotation with azimuth θ = 60 • and dip ϕ = 30 • , the comparison between Figs 4(B) and (A) shows that the ellipsoid is tilted 30 • from the x axis in the zx section, and it is also rotated 60 • from the y axis in the xy cross-section.
When variable dip and azimuth are introduced into the filter, similar amplitude errors as for the variable-coherent-lengths study in Fig. 3(C) are expected.Furthermore, the amplitude error introduced by the 3-D rotation can be mitigated by careful design of the coherent lengths, as the rotation has no impact in an isotropic filter.For FWI applications, which are generally limited at low frequency, the approximation of the slow variation of the filter parameters still appears acceptable for the smoothing effect.

Random noise tests
The filtering operator normally assumes the particular properties of the structures, which implies that the shape of the smoothed gradient/ model is driven by the imposed variation of coherent lengths, dip, and azimuth.This argument is illustrated in the following examples, when the input vector m only contains random noise, with no pre-defined structures.Fig. 5 focuses on homogeneous coherent lengths, dip, and azimuth filters on a model of size 1.5 km × 1.5 km × 1.5 km.The zx, xy, and zy cross-sections of the input vector are shown in Fig. 5(A), which only contains high frequency variations of random noise.Fig. 5(B) shows the smoothed vector obtained with coherent lengths L z = L v = 25 m, L x = L u = 100 m, and L y = L w = 25 m, without dip and azimuth.As the filter is strongly anisotropic, with the longest semi-axis length in the x direction, the zx and xy sections both contain features aligned in the x direction.The patterns in the zy section have no specific orientation as L z = L y , that is, the 3-D filter is isotropic in the zy plane.In Fig. 5(C), after applying a rotation with 45 • dip, the aligned features are tilted 45 • from the x axis in the zx section.Note that the apparent coherent length L x is now almost identical to L y , with no specific trend in the xy cross-section.Meanwhile, the apparent L z becomes greater than L y , which induces the alignment along the z direction in the zy cross-section.A similar interpretation can be applied to Fig. 5(D), which shows the application of a filter with azimuth 45 • to smooth the initial vector.Compared to Fig. 5(B), the features in the xy section are rotated 45 • from the y axis, and the patterns presented in the zx cross-section are lost due to the apparent coherent length in the x direction.The example shown in Fig. 6 is a longer gradient/ model with 3 km length in the x direction that is used to illustrate the implementation of variable coherent lengths, dip and azimuth.We use simple sine variation for these geological properties.Again, the input vector (Fig. 6A) contains random noise without any predefined structure.In Fig. 6

FWI gradient smoothing
This section illustrates the application of a non-stationary Laplace filter to a realistic gradient vector from FWI, as obtained from a subset of the 3-D SEAM Phase II foothills model (Oristaglio 2012).Surface acquisition is used with a line of 20 sources, with 350 m between adjacent sources.The receivers are located in the whole surface, with 12.5 m between receivers.A Ricker wavelet centred at 3 Hz is used as the source signal.The 2-D cross-section of the shear-wave velocity (V s ) model underneath the source line is shown in Fig. 7(A).
The topography variation is significant in this model, with maximal vertical elevation of 900 m.SEM is used for both forward and inversion problems.The initial V s model is shown in Fig. 7(B), which is a smoothed version of the true model.Compared with the true model in Fig. 7(A), this initial V s is overestimated at the near surface and underestimated at greater depths.The dip field is extracted from the true velocity model by manual picking, as illustrated in Fig. 7(C).Fig. 7(D) shows the first scaled gradient without any smoothing filter, which contains a significant acquisition footprint at the near surface.The horizontal oscillation of the features at greater depths might come from high wavenumber artefacts.
An anisotropic non-stationary Laplace filter with coherent lengths L z = 0.05λ s and L x = L y = 0.15λ s is applied, where λ s is the shear wavelength at each spatial position.The true dip field, as shown in Fig. 7(C), and the zero azimuth angle are used for 3-D rotation.It should be noted again here that the Laplace filter is efficiently approximated by application of two Bessel filters.The filters help to remove the near-surface artefacts due to the acquisition footprint, without degrading the deeper structures.The continuity of the features at greater depths is actually enhanced because the horizontal-oscillation artefacts are attenuated.In addition, the gradient correctly determines the update direction of the model: it reduces the V s at the near surface and increases it at greater depths (knowing that FWI updates the model following the negative gradient).
In summary, we have illustrated the property of the Bessel filter through various numerical examples in spike and random-noise tests.The approximation of the Laplace filter by Bessel operators is shown, which indicates prospective applications of this filter, either as a smoothing filter, or to efficiently mimic the decay of the Laplace filter.We also illustrate the robustness of this method for variable coherent lengths, dip and azimuth.The numerical example on a realistic gradient highlights the potential application of the filter for FWI.

N U M E R I C A L I M P L E M E N TAT I O N I N A PA R A L L E L S E M S C H E M E
In FWI, the gradient vector of the misfit function is computed from the correlation of the forward and backward propagation wavefields (as one per source) at each iteration, due to the adjoint-state method (Plessix 2006).However, the modelling mesh can sometimes be quite dense compared to the resolution that can be expected from the inversion, which leads to high wave-number noise in many applications.Consequently, the gradient/ model vector must be smoothed or regularized on this forward-modelling mesh.By doing so, the filtering operation (i.e. the Bessel operator here) has to be directly and efficiently implemented on the modelling mesh, which can be described by a domain decomposition for parallel computation.As the application of the Bessel filter is related to a PDE, this can proceed in a similar way as for the wave-propagation simulation.
Following an analogue of the framework to that designed for the SEM for wave simulation, the linear system associated with the Bessel filters is constructed and solved by a parallel CG iterative solver According to Saad (2003), only the matrix-vector product and the inner product of two vectors are required in the CG method.Thus, the most expensive operator is the product of the matrix A with a given vector.Each subdomain computes its part of the product in parallel, and the communications between subdomains during the CG iterations share the same strategies as that of wave simulation, which requires no extra Bessel filter for spectral-element mesh 1503 effort to manage the parallelism.This section presents this implementation and highlights the efficiency of this approach, compared with the standard 3-D convolution method.The convergence rate of the CG solver for several parameters are then analysed.

Linear system construction and matrix-vector product evaluation
The mass matrix M is diagonal, and the number of non-zero elements (NNZs) of this matrix is identical to the size of the input gradient/ model vector, It is then stored in the same form as the input vector.The dimensions of the matrices K and A are [SIZE(m)] 2 ; thus the full storage is not reasonable for realistic application.Assuming that the same order of interpolation N is used in each direction, and the inverse of Jacobi matrix Jrot is full of non-zero elements, the NNZs in the matrix A can be estimated from the size of the vector m, as For example, when 4th order interpolation is used (i.e. each hexahedral element is discretized by 5 × 5 × 5 GLL points) This estimation illustrates that the stiffness matrix is extremely sparse.Therefore, the non-zero elements of this matrix can either be stored by some efficient compressed storage technique or the matrix-vector product can be directly evaluated without storing the matrix.These two implementation approaches are discussed in this section.

Coordinate list storage of non-zero elements of matrix A
As the impedance matrix A is sparse, only the non-zero elements of this matrix need to be computed and stored.At each CG iteration, these non-zero elements are used to evaluate the product of matrix A with a given vector.The matrix A is computed from the stiffness matrix K.According to eq. ( 47), the element at row kth, column ĥth of matrix K can be computed through a triple loop over all of the degrees of freedom (q 1 , q 2 , q 3 ) inside the current element, which is computationally expensive.This component can be further developed as a sum of six terms, as shown in eq. ( C17) in Appendix C.This development allows the computation of the left-hand-side matrix A, while minimizing the loops over all of the degrees of freedom inside the current element.
The matrix A can thus be stored using the compressed storage techniques coordinate list (COO) format (Golub 1996), where a list of row and column indices and the associated values of the non-zero elements are stored.In addition, as the matrix A is symmetric, only the upper triangular part has to be stored; that is, the elements A ji where i ≥ j.We only store half of the diagonal terms 1 2 A j j .This helps to simplify the matrix-vector product in the CG algorithm.By applying this triangular storage strategy, and based on several tests carried out with this implementation on realistic size problems, the required memory for storage of matrix A is reduced by 49 per cent, and the computation time of the CG by 33 per cent, .

Matrix-free implementation of the matrix-vector product Au
Instead of building explicitly the impedance matrix A by storing its non-zero elements, its product with a given vector u can be directly estimated inside each CG iteration in a matrix-free fashion.As the product of a vector with the diagonal mass matrix M is trivial, the challenge is the stiffness matrix-vector product Ku.According to the development of the weak formulation of eq. ( 26) in Appendix C and the definition of the geometric factors G ij in eq. ( 48), this matrix-vector product can be written as ĥ K k ĥ u ĥ = or under the factorized version, as ĥ K k ĥ u ĥ = Eq.( 62) implies that the product stiffness matrix-vector can be factorized as where the operator D evaluates the spatial derivatives of the vector u in the reference space, G is the geometric matrix, and D w is equivalent to a weighted spatial derivatives operator.
The derivatives operators Du and the weighted spatial derivatives D w GDu can be efficiently estimated using the highly efficient algorithms developed by Deville et al. (2002), which benefit from the tensorial properties of SEM and the optimization of cache usage by manual loop unrolling.For each degree of freedom, the geometric matrix G is symmetric and only depends on the projection between the rotated dimensionless coordinates and the reference coordinates (eq.48).Therefore, only six vectors, as G 11 , G 12 , G 13 , G 22 , G 23 , G 33 , are stored in the memory, and they are computed outside the CG loops.Due to the factorization in eq. ( 62) and the vectorized computation for the stiffness/vector product, each CG iteration costs about 0.46 times a time-step of wave propagation for deformed elements.Compared with the explicit matrix-vector product with the COO format of the previous section, this matrix-free implementation decreases the computation cost by a factor of around 7, and reduces the memory requirement by >20-fold.These numbers have been estimated on the same highperformance computing architecture for the two implementations, on a model with 33 × 10 6 degrees of freedom, with decomposition on 64 subdomains (64 computing cores with infiniband interconnections).

Parallel conjugate gradient iterative solver
The parallel CG solver introduced in this section is similar to the standard algorithm, except that the matrix-vector and vector-vector operators are computed in parallel.As mentioned before, the matrices M and A are evaluated in parallel without assembly over the boundaries between subdomains.After each matrix-vector product, some point-to-point communications are required to assemble the values on the shared degrees of freedom.
The modulus of the relative numerical error is used for the convergence criteria, where s k is the smoothed model obtained at the kth iteration.In the next paragraphs, we study the properties of the linear system (eq.57), as well as the convergence rate based on the values of the error modulus η.

Linear system behaviour
First, we build a reference system with smoothed vector s ref and input m ref , which is computed as The linear system is then solved iteratively by the CG solver.At each iteration, the modulus of the relative model error γ is computed The numerical error η is defined in eq. ( 64), and it is compared to this model error γ to characterize the behaviour of the linear system of eq. ( 57).An example of this comparison for a model vector that contains 30 × 10 6 degrees of freedom is shown in Fig. 8.The filter is defined with homogeneous coherent lengths L z = L x = L y = 80 m, without dip and azimuth.It should be noted that the numerical and model error Bessel filter for spectral-element mesh 1505 modulus both decrease with the same speed, which implies that the linear system associated with the Bessel filter is well conditioned.The system converges relatively rapidly, and attains the numerical limit of 10 −6 after 110 iterations, using FORTRAN single-precision arithmetics.Beyond this limit, the relative model accuracy remains unchanged, so the stopping criteria can be set as η = 10 −6 .In FWI applications, to optimize the numerical cost, the CG solver could be stopped before the numerical limit is attained, because the model parameters will be repetitively modified.Evaluation of this tolerance η should be carefully studied, according to the objective of the application.

Convergence properties
After fixing the stopping criteria as η = 10 −6 , we investigate the dependence of the convergence rate of the CG solver on several parameters, including coherent length, size of model vector, and presence of dip and azimuth.As we are approximating the Laplace filter by the Bessel operators, the linear system associated with the Bessel filter is solved twice to obtain the smoothed vector.The number of iterations shown in Fig. 9 is then estimated by consecutively solving two linear systems.Obviously, the same conclusions should be draw when solving one linear system, which corresponds to the direct application of the Bessel filter.Fig. 9(A) indicates that for a given model the number of iterations required for solving two linear systems increases linearly with the values of the coherent length.This plot is obtained from a model of size 80 × 80 × 80 elements; that is, 30 × 10 6 degrees of freedom.Each point in the plot is obtained from filtering with the isotropic Bessel operator, where the coherent lengths are homogeneous.We carry out a similar study when the coherent lengths are fixed, and the dip and azimuth angles are introduced into the filter.These parameters have no influence on the computation time.Therefore, these data are not shown here.In Fig. 9(B), the filter shape is kept unchanged, with L z = L x = L y = 80 m, but the size of the model vector increases gradually from 20 × 20 × 20 elements to 100 × 100 × 100 elements.The corresponding number of degrees of freedom in each direction is indicated by the axis at the top of Fig. 9(B).For a pre-defined filter, the total number of iterations required for smoothing a model vector is independent of the model size.

Numerical performance versus explicit 3-D convolution filtering
As Bessel filters can be efficiently used to approximate the Laplace filter, the efficiency of this method needs to be evaluated with respect to the 3-D explicit convolution approach.We limit this study to an isotropic and homogeneous Laplace filter The filter theoretically has infinite tail, but it can be implemented as a truncated 3-D convolution of the filter and the model/ gradient vector.
We then define a sphere αL centred at the origin, with radius αL, such that αL L 3D (z, x, y)dzdxdy = 0.99, (69) which provides α ≈ 8.4 (Fig. 11).This condition is equivalent to 99 per cent energy of the filter.Fig. 10 compares the computation cost of our approach with the 3-D explicit convolution in a model with 1.77 × 10 6 degrees of freedom and where the coherent length increases from 30 to 80 m.The 3-D explicit convolution is the most expensive method, but the comparison is partially biased as our approaches are highly optimized.However, it should be noted that the numerical cost of the 3-D explicit convolution method is driven by the number of points within the effective sphere αL , thus with the complexity O(L 3 ).Whereas, for a given model size the computational complexity of the approximation of the Laplace filter by Bessel operators only depends on the number of CG iterations, which increases linearly with the  coherent length (∝ O(L 1 )), as shown in Fig. 9(A).In Fig. 10, the observed numerical complexity of the explicit convolution method versus the increase in the coherent length is O(L 2.6 ) due to the limited model size.We obtain a sublinear relationship between the numerical cost of our approximation and the coherent length (O(L 0.7 )), for the two implementation approaches mentioned in Section 5.1: COO-storage and matrix-free implementation.This is due to some calculations that are independent of the coherent length.This property is encouraging for the application of this approach to large problems.

C O N C L U S I O N A N D D I S C U S S I O N
The application of a filter on a given vector is usually described through convolution of the vector by a filter that is a linear operation.Smoothing will require filters with decaying tails.Thus, the convolution operator might be expensive if it is computed explicitly.We promote the idea of applying the filter through its inverse operator, which is expected to be a sparse operator.The smooth output is obtained as the solution of a new linear system, in which the sparse inverse of the filter has been constructed.We introduce the Bessel filters and their corresponding diffusion-like PDEs to be solved numerically in 2-D and 3-D geometries.We analytically explain why the iterative application of two Bessel filters provides an excellent approximation of the Laplace operator for smoothing the model, which offers the choice of either using Bessel filters as smoothing operators or to mimic the decay of the Laplace filter.This argument is well supported by the numerical illustration of spike tests, where the PDE defining the Bessel filter is solved with a delta source function.These PDEs can be discretized through a weak formulation that involves SEM, which leads to a symmetric linear system that can be efficiently solved through a CG method.There are no specific requirements for the mesh, which can be chosen in a suitable way for the application being considered.
We present a robust and efficient approach to incorporate variable coherent lengths, dip and azimuth into the Bessel filter, without losing the symmetry of the left-hand-side term of the linear system.To do so, we develop the weak formulation of the PDE associated with the sparse inverse filter in the rotated dimensionless coordinate system.The construction of this system naturally takes advantage of the available ingredients in the underlying mesh that are associated to the study problem.All of the information about the geological variations of the medium is preserved in the Jacobi matrix and its inverse, which does not break the symmetry of the stiffness matrix.The well-behaving performance of this approximation is illustrated by spike tests, which closely reproduce the shape of analytical Laplace filter when assuming slow variations of these geological properties.In more complex tests, when random noise is used as the input model, the output vector correctly follows the imposed shape that is described by the filter parameters.When the non-stationary filter is applied to a realistic FWI gradient, it effectively removes near-surface artefacts due to the acquisition footprint, and further enhances the continuity of the deeper structures.
We propose efficient implementation of the application of the Bessel filter, where the linear system is solved with a parallel CG following the same domain decomposition as the wave simulation.This parallel computation of the matrix-vector product at each CG iteration can be achieved either by using explicit building of the local matrix, or in a matrix-free fashion.For the matrix-free implementation, the stiffness matrix-vector product can be factorized and highly optimized, due to the vectorized computation and the optimizing cache usage developed by Deville et al. (2002).This strategy significantly reduces the memory requirement and computation time for the CG solver.After each matrix-vector or vector-vector product, the vectors obtained are spatially communicated to assemble the values on the subdomain boundaries.Consequently, any projection back and forth from the modelling mesh and the inversion mesh is avoided, as well as any explicit windowed convolution.
The sparse inverse linear system associated with the Bessel filter is well conditioned, due to the structure of the PDE.The CG solver applied to this system converges relatively rapidly.The convergence depends on many factors, but it increases linearly with the coherent length.Other factors, such as the number of elements in the medium and the presence of dip and azimuth angles, appear to have no influence on the number of iterations required for the system to converge.The approximation of the Laplace filter by the Bessel filter proposed here is also compared with the windowed explicit 3-D convolution method.The approach here is significantly faster, with a sublinear relationship between the numerical cost and the coherent length, which promises general applications to various large problems.
One question might be whether we should apply one Bessel filter with longer coherent length or two Bessel filters, to reproduce the decay of the Laplace filter.Choosing between Bessel and Laplace filters is case dependent, and this should be decided based on the energy distribution of the filters and the numerical cost.Knowing that the energy distribution of these filters are different (as shown in Fig. 11), they should result in different smoothing effects.To have the same energy as the Laplace filter at 90 per cent, 50 per cent or 10 per cent energy, the coherent length in the Bessel filter should be increased by a factor of 1.4, 1.6 or 2.1, respectively.These different scaling factors also imply that it is not possible to mimic the Laplace filter by simply modifying the coherent length in the Bessel filter.For the same coherent length, the contributions of the neighbors surrounding the origin are more important in the Bessel kernel, which is indicated by the 10 per cent energy level.It should also be noted that increasing the coherent length will eventually increase the numerical cost of the iterative solving of the linear system associated with the Bessel filter.
Only the numerical implementation for SEM is presented here, but the method can be extended to other FE techniques, and also to FD techniques (Wellington 2016).This study illustrates the potential applications for model preconditioning, but this approach of applying a filter (or a covariance matrix) can be considered for regularization and for prior-model strategies.As the filter has a robust shape, even with dip, azimuth and coherent length variations, it can be used to amplify expected geological features, while attenuating features that come from numerical artefacts in other areas.In this way, the null-space dimension is reduced, to converge to more meaningful models while honouring the data fit.

A P P E N D I X C : W E A K F O R M U L AT I O N D E V E L O P M E N T
This Appendix clarifies the definition of the mesh used in SEM, which consists of Gauss-Lobatto-Legendre (GLL) points, defined in the reference space.The base functions are Lagrange polynomials.GLL quadrature is used to approximate analytical integrals over collocation points.Let us recall the weak formulation of eq. ( 26 In SEM, the physical domain is decomposed into a set of non-overlapping hexahedral elements e .This implies that the dimensionless space ˜ is also represented by a set of elements ˜ e , which is related to the physical element e through the one-to-one mapping (eq.24).Each reduced coordinate element ˜ e can be mapped to the unitary reference space of the GLL points, where the cube [−1, 1] ⊗ [−1, 1] ⊗ [−1, 1] is discretized into (N + 1) × (N + 1) × (N + 1) GLL points (ξ k 1 , η k 2 , ζ k 3 ); k 1 , k 2 , k 3 = 0, . . ., N. These collocation points define (N + 1) × (N + 1) × (N + 1) base functions, which are triple products of Lagrange polynomials of degree N, over the element ˜ e vˆk(ξ, η, ζ ) = k 1 (ξ ) k 2 (η) k 3 (ζ ) k stands for the triple of indexes {k 1 , k 2 , k 3 }. (C2) Lagrange polynomials have the interesting property that their values at GLL nodes are either 0 or 1 where the volumetric Jacobian J e is the determinant of the Jacobian matrix J We introduce the following notations of the coordinates of the dimensionless physical space and the reference space w q 1 w q 2 w q 3 f (ξ q 1 , η q 2 , ζ q 3 ), (C12) where w q 1 , w q 2 , w q 3 are quadrature weights.As the test function vˆk is the triple product of Lagrange polynomials (eq.C2), which is 1 at node (ξ k 1 , η k 2 , ζ k 3 ), and 0 at other nodes (eq.C3), the weak formulation developed in eq. ( C11) can be transformed into a linear system Ms + Ks = Mm.(C13) The mass matrix M is diagonal with and the stiffness matrix K is symmetric and sparse w q 1 w q 2 w q 3 J e (ξ q 1 , η q 2 , ζ q 3 ) ⎡ ⎣ The property mentioned in eq. ( C3) of the test functions can be used again to simplified the spatial derivatives of the test functions as ∂vˆk ∂ξ (ξ q 1 , η q 2 , ζ q 3 ) = k 1 (ξ q 1 )δ k 2 ,q 2 δ k 3 ,q 3 , (C16) similar expressions can be written for ∂vˆk/∂η and ∂vˆk/∂ζ at (ξ q 1 , η q 2 , ζ q 3 ).Therefore, the element at row kth, column ĥth of the matrix K can be explicitly computed as w k 1 w q 2 w k 3 J e (ξ k 1 , η q 2 , ζ k 3 ) h 2 (η q 2 ) k 2 (η q 2 ) 3 d=1 w k 1 w k 2 w q 3 J e (ξ k 1 , η k 2 , ζ q 3 ) h 3 (ζ q 3 ) k 3 (ζ q 3 ) 3 d=1 )

Figure 1 .
Figure 1.Comparison between the behaviour of the 3-D normalized Laplace filter defined in eq.(20) and the convolution of two 3-D Bessel filters for L x = L y = L z = 20 m, obtained from the iterative solving of eq.(8), which defines the 3-D Bessel filter.

Figure 2 .
Figure 2. Comparison between the 2-D normalized Laplace filter defined in eq.(23) and the application of two 2-D Bessel filters for L x = L z = 20 m at different values of the scaling parameter a.The Bessel filters are obtained from the iterative solving of eq.(9), which defines the 2-D Bessel filter.
), B(a) and C(a).In Figs 3A(b), B(b) and C(b), the output of the linear system (blue dashed line) and the theoretical Laplace function (red line) in the z directions are superimposed.Fig. 3A(b) illustrates that the double application of the Bessel filter provides an excellent approximation of the Laplace filter for homogeneous coherent lengths, although still with the negligible singularity at the origin.Under the slow variation of the coherent lengths, ∂ z L z = 0.1 in Fig.

Figure 3 .
Figure 3.Comparison of the normalized Laplace filter with the application of two Bessel filters through the spike tests, with homogeneous coherent lengths in the x and y directions L x = L y = 80 m.Coherent length in the z direction: (A) Constant L z ; (B) L z varies from 50 m to 200 m; (C) L z varies from 50 to 350 m.In each panel, (a) shows the variation of L z in the z direction, and (b) compares the output of applying the 3-D Bessel filter twice (blue dashed line), with the theoretical Laplace filter (red line).The respective zx cross-section of these filters are shown in (c) and (d).
(B), the coherent lengths in the z and y directions are homogeneous L z = L y = 25 m, whereas L x varies as a sine function in the x direction, from 25 m to 85 m.The zx section of the smoothed vector (Fig. 6B, left) correctly follows the variation of the coherent length L x (Fig. 6B, right).Fig. 6(C) shows the zx and xy cross-sections of the smoothed vector with homogeneous coherent lengths L z = L v = 25 m, L x = L u = 250 m, and L y = L w = 25 m, and no dip and azimuth.An extremely long L u was intentionally designed so that the smoothed vector has a layered structure, in parallel with the x direction.Fig. 6(C) will be used as the reference to compare this with the variable dip and azimuth filters in the next examples.In Fig. 6(D), similar values of homogeneous

Figure 4 .
Figure 4. Spike test to illustrate the kernel of the 3-D Bessel filter with homogeneous coherent lengths L v = L z = 50 m, L x = L u = 100 m, and L y = L w = 200 m.(A) zx, xy and zy cross-sections of the 3-D Bessel filter without dip and spike.(B) The filter consists of a 3-D rotation with 60 • azimuth and 30 • dip.

Figure 5 .
Figure 5. Random noise test with a stationary filter to illustrate the implementation of homogeneous coherent lengths, dip and azimuth.(A) zx, xy and zy cross-sections of the input model.(B) Smoothed model obtained from applying an anisotropic filter, L v = 25 m, L u = 100 m, and L w = 25 m, without dip and azimuth.(C) Smoothed model when 45 • dip and 0 • azimuth are introduced into the filter.(D) Smoothed model when 0 • dip and 45 • azimuth are introduced into the filter.Model size, 1.5 km×1.5 km×1.5 km.

Figure 6 .
Figure 6.Random noise test with non-stationary filter to illustrate the implementation of variable coherent lengths, dip and azimuth.The size of the model is 1 km in the z and y directions, and 3 km in the x direction.(A) zx and xy cross-section of the input model.(B) Smoothed model (left) obtained from a filter with variable coherent length L x (right).(C) zx and xy cross-section of the smoothed model obtained from a highly anisotropic stationary filter.(D) Smoothed model (left) obtained from a filter with variable dip (right).(E) Smoothed model (left) obtained from a filter with variable azimuth (right).

Figure 7 .
Figure 7. Example of the filtering operator on the FWI gradient, from a subset of the 3-D SEAM Phase II foothills model.(A) True shear wave velocity model.(B) Initial shear wave velocity model.(C) Dip field.(D) Original scaled gradient without any smoothing.(E) Smoothed gradient, with an anisotropic non-stationary Laplace filter (approximated by the application of two Bessel filters) for the dip field as presented in (C).L z = 0.05λ s and L x = L y = 0.15λ s , where λ s is the local shear wavelength.

Figure 8 .
Figure 8. Relative numerical error modulus η and model error modulus γ for smoothing a model vector containing 80 × 80 × 80 elements; that is, 30 × 10 6 degrees of freedom.The coherent lengths are homogeneous L z = L x = L y = 80 m, without dip and azimuth.

Figure 9 .
Figure 9. (A) Dependence of the total number of iterations (obtained from consecutively solving two linear systems associated with the Bessel filter) on coherent length.The model contains 80 × 80 × 80 elements; that is, 30 × 10 6 degrees of freedom.(B) Dependence of the total number of iterations on the size of the problem.For both images, the stopping criterion is η = 10 −6 .

Figure 10 .
Figure 10.Comparison of the numerical performance of this method.The COO storage of non-zero elements in matrix A (black) and the matrix-free implementation of the product matrix-vector Au (red) are compared to the truncated explicit 3-D convolution of the Laplace filter (blue) for a model with 1.77 × 10 6 degrees of freedom, with decomposition for two subdomains.

Figure 11 .
Figure 11.Energy analysis for isotropic Laplace and Bessel filters within the sphere αL centred at the origin with the radius αL (L is the coherent length in the z, x and y directions).E 10 , E 50 and E 90 indicate the 10 per cent, 50 per cent and 90 per cent energy.
x, ỹ s ∇ z, x, ỹ v dzd xd ỹ = ˜ m v dzd xd ỹ, (C1)where v(z, x, ỹ) is a test function, m and s are the original and the smoothed vectors, respectively.
) An element of volume d zd xd ỹ is related to an element of volume dξ dηdζ in the reference cube by d zd xd ỹ = J e dξ dηdζ, (C4)