A dedicated multiparametric strategy for the fast construction of a cokriging metamodel

This paper deals with the building of a gradient-based metamodel using a dedicated strategy for solving structural assembly problems. This work is the first part of a two-level global optimization strategy. The general objective is to reduce computation costs; here, we focus on the costs associated with the generation of the metamodel. Our goal is achieved through the introduction of two main elements: what we call a "multiparametric strategy" based on the LATIN method, which reduces the computation costs when the parameters vary, and the use of a cokriging metamodel taking gradients into account. Several examples illustrate the efficiency of these two elements.


Introduction
Structural optimization involves two very different domains of expertise: numerical simulation and optimization. Of the two, simulation is the only one which generates high computation costs. The main difficulty in the context of optimization is that the optimizer often requires a large number of simulations to locate the optimum of an objective function. (In contact and friction problems, these simulations are nonlinear and, thus, even more costly.) Therefore, new methods for the resolution of complex optimization problems are needed. In this context, we propose to use a two-level model optimization.
Here, we focus on the first level. Thus, this paper deals with reducing the computational cost associated with the building of kriging metamodels. To this end, we introduce a gradient-based kriging metamodel and a MultiParametric Strategy.
In the first part of this paper, the context of two-level model optimization is reviewed. In the second part, a specific mechanical resolution process is introduced and its performance is discussed in detail. The third part concerns the kriging class of metamodels. Finally, the last part presents the application of the coupled multiparametric-metamodel approach to a mechanical test case.

Two-level model optimization
The proposed multilevel model strategy is based on the approach developed in [1]. In our case, we consider the two-level strategy illustrated in Figure 1. The first level consists in a metamodel defined using a limited amount of data. The zones where the optimum can be located are determined using, for example, a genetic algorithm. This information is transferred to the second level, where a precise search of the minimum is carried out. Therefore, the second level consists in an optimization process based on the full mechanical model. Two elements contribute to the computation cost: the construction of the metamodel from the responses of the full mechanical model, and the direct optimization based on the full mechanical model. These two elements correspond to the two levels of the strategy. Therefore, the strategy proposed here leads to a reduction in the computation cost of both phases.

(Figure 1: the two-level strategy, with optimization using the metamodel on the first level and direct optimization on the second level)
The first step of our two-level optimization is a classical approach that is often designated as surrogate-based optimization and, overall, our work fits in the context of parametric optimization. More details on the use of metamodels in the context of parametric optimization can be found in [2,3,4,5].
The scope of this paper is limited to a study of the cost associated with the generation of the metamodel.
Many types of metamodels can be used to find an approximate solution: polynomial regression [6], methods based on neural networks [7], radial basis functions [8], etc. We chose to use a particular category of metamodels called kriging approximations and, more precisely, a cokriging metamodel using derivatives. These approximations are presented in Section 4.

The multiparametric strategy
This section introduces a dedicated strategy, called the multiparametric strategy, for solving problems on structural assemblies. Thanks to a judicious reinitialization of the LATIN algorithm (described in Sect. 3.2), the multiparametric strategy significantly reduces the computation cost. The multiparametric strategy is presented in Sect. 3.3.1, and a 2D mechanical benchmark with two and four design variables is used to show its performance (see Sect. 3.3.3).

The context
We are considering assemblies of linear elastic structures under the assumption of small perturbations. The only nonlinearities are considered to be due to contact or friction between parts. The strategy we use to solve this type of problem, based on twenty years of development at LMT-Cachan, relies on an iterative algorithm introduced by P. Ladevèze [9,10].
In the context of assemblies, this strategy relies on three main points:
• the structure being studied is divided into substructures and interfaces. For an assembly problem, the natural choice is to consider the parts of the assembly as substructures and the contact zones between parts as interfaces. Moreover, a mixed domain decomposition is used (both forces and displacements are unknowns at the interfaces);
• a dedicated iterative algorithm is used to solve the mechanical problem;
• the operators of the method remain constant and depend neither on the loading nor on the parameters (friction coefficient, gap) of the interfaces.

The resolution process
The resolution strategy, known as the LATIN (LArge Time INcrement) approach [10], consists of alternated resolutions of two groups of equations. The first group contains the local equations Γ (which may be nonlinear) related to the interfaces, and the second group contains the linear equations A_d related to the substructures. The iterations between the two groups are handled through the use of search directions, which are parameters of the method. This resolution process, shown in Figure 2, leads to the solution defined as the intersection of spaces Γ and A_d. Friction and contact problems, along with their specific approaches and laws [11], are part of equation group Γ. The schematic linear representation of A_d is due to the linear behavior of the substructures. In the context of elastic assemblies, the boundary conditions (forces F_i and displacements W_i) at each interface i between two substructures are sufficient to define the solution: the internal behavior is the result of a classical elastic problem. Thus, an approximate solution is described entirely by s = {(W_i, F_i)} over all the interfaces. Readers can find additional details in [9,10] for the LATIN method and in [11,12] for its application to assemblies.
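To fix ideas, the alternation between the local stage (Γ) and the linear stage (A_d) along two search directions can be sketched on a deliberately tiny toy problem: a single elastic spring pressed against a rigid wall through a unilateral contact interface. This is only an illustrative sketch, not the substructured implementation discussed in the paper; the load F, stiffness k, gap and search-direction parameters kp and km below are arbitrary choices.

```python
def latin_solve(F=10.0, k=1.0, gap=2.0, kp=1.0, km=1.0, tol=1e-10):
    """Toy LATIN iteration for one spring against a rigid wall.

    A_d (linear stage):  f = F - k*w   (elastic equilibrium of the substructure)
    Gamma (local stage): unilateral contact with a gap:
                         either w = gap and f >= 0, or f = 0 and w <= gap.
    kp and km are the search-direction parameters of the method.
    """
    w, f = 0.0, 0.0                      # initial approximation s_0
    for it in range(1, 1000):
        # local stage: find (w_hat, f_hat) in Gamma along the search direction E+
        w_hat = w + f / kp               # separation trial (f_hat = 0)
        if w_hat <= gap:
            f_hat = 0.0
        else:                            # contact trial (w_hat = gap)
            w_hat, f_hat = gap, f + kp * (w - gap)
        # linear stage: find (w, f) in A_d along the search direction E-
        w_new = (F - f_hat + km * w_hat) / (k + km)
        f_new = F - k * w_new
        if abs(w_new - w) + abs(f_new - f) < tol:
            return (w_new, f_new), it
        w, f = w_new, f_new
    return (w, f), it

solution, iterations = latin_solve()
```

Here the intersection of Γ and A_d is the contact solution w = 2, f = 8, and the iterates approach it from the two spaces alternately, as in the schematic representation of Figure 2.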

A method for reducing computation costs
A parametric optimization involves many calculations, each carried out with a different set of values of the parameters of the problem (design variables such as friction coefficients, preloads, gaps, etc.). This leads to the resolution of many similar problems, in the sense that only the parameters of the problems vary. Therefore, it is essential to find a method to accelerate these calculations. The method we use, called the Multiparametric Strategy (MPS), was introduced in [13,14], then studied and applied to various sample problems in [15,16,17].

Principle of the method
The main idea of the multiparametric strategy is to take advantage of a feature of the LATIN algorithm: at each iteration, the solver yields an approximate solution over the whole loading path and at all points in the structure. Then, if a new calculation associated with other values of the parameters is requested, the algorithm can be reinitialized using a previously converged solution. Here, we assume that only the interface parameters differ, so that space Γ alone is affected by the change.
Each new converged solution enriches a database of possible restart solutions. Figure 3 illustrates the use of the multiparametric strategy to obtain an approximate solution starting from a previous calculation: the schematic representation shows that the number of iterations required to obtain the converged solution s1 is larger than the number required to obtain the other converged solutions s2 or s3. Moreover, in order to obtain the solution s3, the LATIN algorithm can be reinitialized with either of the two available converged solutions s1 or s2. We observe that the benefit depends on the choice of the starting point. Here, we will use only a "closest point" strategy: for each new requested solution, the LATIN algorithm is reinitialized with the converged solution associated with the set of design parameters closest to the one being considered. Some indications for choosing the best initialization strategy were given in [18].
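The restart idea can be sketched on the same kind of toy contact problem as before (a single spring pressed against a wall; purely illustrative values, not the paper's solver). A deliberately poor search-direction parameter h makes convergence slow, so the benefit of reinitializing from the closest converged solution in the database becomes visible in the iteration counts.

```python
def latin_solve(gap, s0=(0.0, 0.0), F=10.0, k=1.0, h=0.2, tol=1e-10):
    """Toy LATIN solver; `gap` plays the role of an interface design parameter.
    s0 is the starting approximation (a previously converged solution, if any)."""
    w, f = s0
    for it in range(1, 10000):
        w_hat = w + f / h                          # local stage: separation trial
        if w_hat > gap:                            # ... or contact trial
            w_hat, f_hat = gap, f + h * (w - gap)
        else:
            f_hat = 0.0
        w_new = (F - f_hat + h * w_hat) / (k + h)  # linear stage
        f_new = F - k * w_new
        if abs(w_new - w) + abs(f_new - f) < tol:
            return (w_new, f_new), it
        w, f = w_new, f_new
    return (w, f), it

database = {}                                      # converged solutions, keyed by parameter
sol1, it_cold = latin_solve(gap=2.0)               # first calculation: cold start
database[2.0] = sol1

# new parameter value: "closest point" restart from the database
new_gap = 2.1
closest = min(database, key=lambda g: abs(g - new_gap))
sol2, it_warm = latin_solve(gap=new_gap, s0=database[closest])
# it_warm < it_cold: the previous solution is a much better starting point
```

The closer the new parameter set is to an already-solved one, the smaller the initial error, and the larger the saving, which is exactly the mechanism exploited by the MPS.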
With this approach, the solution converges in fewer iterations of the algorithm and, therefore, in less time.

A Reduced Order Model: the Proper Generalized Decomposition (PGD)
Each converged solution used to reinitialize the LATIN algorithm can be considered as a reduced order model (ROM) of the solution in the sense that, since we deal here with elastic assemblies, the whole solution can be described by the boundary conditions alone (more details can be found in [13]). For problems with nonlinear material behaviour, for example, this idea must be extended by using a PGD basis as the ROM.
The Proper Generalized Decomposition was initially based on space-time separated representations, as part of the LATIN method, which was designed to deal with nonlinearities such as plasticity or viscoplasticity. Many developments have been achieved over the past three decades; readers can find details on the developments of the PGD approximations and their applications in [19,20]. Recently, the PGD was applied to parametric models, which initiated its use for parametric problems such as parametric optimization, inverse identification or real-time simulation. For such parametric problems, the classical time-space separated approximation of Equation 1 can be rewritten as Equation 2, where k denotes a parameter of the model; the PGD can also be used as a space-parameter decomposition, as in Equation 3:

u(x, t) ≈ Σ_{i=1}^{n} λ_i(x) Λ_i(t)   (1)

u(x, t, k) ≈ Σ_{i=1}^{n} λ_i(x) Λ_i(t) Γ_i(k)   (2)

u(x, k) ≈ Σ_{i=1}^{n} λ_i(x) Γ_i(k)   (3)
Such a decomposition makes it possible to significantly reduce the CPU time associated with computing the responses of the model for requested values of the time, space and/or parameters. Examples of the use of such parametric modelling can be found in [21,22,23,24,25]. The PGD was also applied in the context of the Multiparametric Strategy [26]. Finally, the PGD was also used to build response surface models [27]. Indeed, whether it is for optimization, for the building of a response surface or, more generally, for the building of a metamodel, values of an objective function obtained for different values of the parameters of the model are required. The PGD significantly reduces the computation cost associated with these required values.
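A minimal numerical sketch of a space-parameter decomposition of the form of Equation 3 can be obtained by a greedy rank-one enrichment with an alternating fixed point on each mode. The snapshot function u(x, k) = exp(-k x) and all grid sizes below are illustrative assumptions, not the paper's mechanical model.

```python
import numpy as np

def pgd_modes(U, n_modes=5, n_fp=50, tol=1e-12):
    """Greedy rank-one PGD enrichment of a space/parameter snapshot matrix U.
    Returns space modes lam[i] and parameter modes gam[i] such that
    U is approximated by sum_i outer(lam[i], gam[i])."""
    R = U.copy()
    lams, gams = [], []
    for _ in range(n_modes):
        gam = np.ones(U.shape[1])
        for _ in range(n_fp):                    # alternating fixed point on one mode
            lam = R @ gam / (gam @ gam)
            gam_new = R.T @ lam / (lam @ lam)
            if np.linalg.norm(gam_new - gam) < tol * np.linalg.norm(gam_new):
                gam = gam_new
                break
            gam = gam_new
        lams.append(lam)
        gams.append(gam)
        R = R - np.outer(lam, gam)               # deflate the converged mode
    return lams, gams

# illustrative separable-ish snapshots: u(x, k) = exp(-k x)
x = np.linspace(0.0, 1.0, 50)
k_vals = np.linspace(1.0, 5.0, 50)
U = np.exp(-np.outer(x, k_vals))
lams, gams = pgd_modes(U, n_modes=5)
U_pgd = sum(np.outer(l, g) for l, g in zip(lams, gams))
```

A handful of modes suffices here because the snapshot kernel is smooth; this is the mechanism that makes the separated representation cheap to evaluate for any requested (x, k).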

Illustration of the performance of the method
A two design variables example.
The example considered here is a quasi-static academic problem which was presented in [16]. Figure 4 shows the geometry of the problem, which consists of three square parts (h = 50 mm, Young's modulus E = 2·10⁵ MPa and Poisson's ratio ν = 0.3) which are in contact with friction. Each part was represented by a single substructure discretized into 24 × 24 bilinear quadrangles. The parametric study consisted in varying the friction coefficients µ1 and µ2 of the two contact interfaces. The loading consisted of two stages: first, a progressive vertical pressure P1 up to a maximum of 50 MPa applied at the top of substructure Ω1 and Ω3 (the preloading stage); then, a progressive horizontal load from 0 to 30 MPa applied to substructure Ω2. In this test case, variations of the friction coefficients µ1 and µ2 between 0 and 0.6 were considered, and the function studied was the reaction force on the rigid wall. Figure 4 shows the response surface of this function obtained with 18 × 18 values of the friction coefficients.
In order to illustrate the performance of the multiparametric strategy, the two-variable design space was sampled with a 4 × 4 regular grid. For each sample, the force and the gradients were calculated (using a classical finite difference method for the gradients). Table 1 summarizes the characteristics of these calculations. The gain obtained with our method (compared to a classical calculation without the multiparametric strategy) was estimated using the following expression:

Gain = (Number of calculations × CPU time of the first calculation) / (CPU time using the multiparametric strategy)   (4)

We considered that the time of each calculation without the multiparametric strategy was almost constant and equal to the time of the first calculation (8.2 s). The results presented in Table 1 show that the solver can reuse a previously converged solution to accelerate the resolution of similar problems. The most remarkable point is the gain obtained in the evaluation of the gradients. These were calculated using a finite difference method: each gradient required the value of the response at 3 points (the sample point plus 2 points obtained by very small variations of the parameters). The results show that, for the same number of evaluations, the cost was lower when the gradients were calculated along with the responses than when the responses alone were calculated: in our example, 16 evaluations took 42.3 s whereas 16 gradients (32 evaluations) took 58 s.
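The finite-difference evaluation used here can be sketched as follows. The function `f` stands in for any response function (a hypothetical analytical stand-in below, not the mechanical solver); under the multiparametric strategy, each perturbed call restarts from the converged solution at the sample point, which is why these extra calls are almost free.

```python
import numpy as np

def fd_gradient(f, x, h=1e-6):
    """Forward-difference gradient: 1 + n_d evaluations around x.
    With the MPS, each perturbed evaluation is warm-started from the
    converged solution at x, so it converges in very few iterations."""
    f0 = f(x)
    g = np.empty(len(x))
    for i in range(len(x)):
        xp = x.copy()
        xp[i] += h                 # very small variation of one parameter
        g[i] = (f(xp) - f0) / h
    return f0, g
```

For the two-variable example above, this is exactly 3 evaluations per sample point: the sample itself plus 2 perturbed points.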

A four design variables example.
In this paragraph, we propose a modified version of the previous two-variables example. By replacing substructure Ω1 with two new substructures, a four-variables problem is introduced. Figure 5 shows the geometry of the problem. The material characteristics and the loading are retained, and two new interfaces are introduced: a gap between substructures Ω1 and Ω4, and a new frictional interface between substructures Ω2 and Ω4. In this context, two new design variables are added to the two existing ones: we thus consider three friction coefficients µ1, µ2 and µ3 (between 0 and 0.6) and one gap j1 (between −28 µm and 48 µm). Although we use a first-order finite difference method to compute the gradients, and although n_d additional responses of the objective function (besides the first calculation) are needed to obtain the gradient at one sample point, the time required to evaluate the gradients is about the same as the time required to evaluate all the responses. For this example, we obtain a gain of around 4.5 for computing both the responses and the gradients. Thus, the MPS is particularly efficient in calculating gradients, thanks to its ability to reduce the computation time for sample points which are close together. This led to our decision to use a dedicated gradient-based metamodel.

The cokriging metamodel
This type of metamodel is similar to the kriging metamodel proposed by [28] and developed by [29,30].The cokriging strategy stems from multivariate geostatistics [31].In our case, the cokriging metamodel is built using not only the responses, but also their gradients [32,33,34].

Notations
In this section, we will use the following notations: x^(i) denotes a point in the design space D (x^(i) is one of the n_s sample points, while x^(0) is a non-sample point); x_i denotes the i-th coordinate of a point x; Y(x) and Ŷ(x) denote respectively the response of the analytical function (or the response of the mechanical model) and the approximate response given by the metamodel; finally, R(x^(i), x^(j)) is a correlation function.

Principle
The model can be viewed as the sum of two components: a linear model (which represents the trend of the data) and a departure from the linear model (which represents the fluctuations around the trend):

Y(x) = µ(x) + Z(x), with E[Z(x)] = 0,

where E and Cov denote the classical statistical expected value and covariance. The main idea is to consider a covariance relation between the discrete responses of the deterministic function. This covariance depends only on the distances between the samples. Thus, the departure is represented by a zero-mean, second-order stationary process. (The mean and the variance are constant, with a correlation depending on the distance.) Depending on the definition of the function µ, one can build different types of kriging or cokriging metamodels (simple kriging, where µ is the average of the responses at the sample points; ordinary kriging, where µ is an unknown constant; or universal kriging, where µ is a polynomial function).
Function Z has a zero expected value, and its covariance structure is a function of a generalized distance between the sample responses. The covariance structure can be written as:

Cov[Z(x^(i)), Z(x^(j))] = σ² R(x^(i), x^(j)).

In our case, the correlation function chosen is a Gaussian function or a Matérn function [35,36].
In the case of a cokriging metamodel, additional covariance relations involving the different variables are introduced [32]. In our case, the primary variables are the evaluations of the function being studied, and the secondary variables are the gradients:

Cov[Z(x^(i)), ∂Z(x^(j))/∂x_k] = σ² ∂R(x^(i), x^(j))/∂x_k^(j),
Cov[∂Z(x^(i))/∂x_k, ∂Z(x^(j))/∂x_l] = σ² ∂²R(x^(i), x^(j))/∂x_k^(i)∂x_l^(j).

With these relations, one can build the kriging or cokriging metamodel by considering a linear predictor at the non-sample point x^(0); for kriging,

Ŷ(x^(0)) = Σ_{i=1}^{n_s} λ_i Y(x^(i)),

extended with gradient terms for cokriging. Thus, the determination of the Best Linear Unbiased Predictor (BLUP) leads to the λ's which, for a kriging metamodel, minimize the Mean Square Error

E[(Ŷ(x^(0)) − Y(x^(0)))²]

subject to the unbiasedness condition

E[Ŷ(x^(0))] = E[Y(x^(0))].

The construction of a cokriging metamodel follows the same process.
In the universal kriging case, the best linear predictor of Y(x^(0)) can be written in matrix form [37] as:

Ŷ(x^(0)) = x₀ᵀ β̂ + c₀ᵀ C⁻¹ (Y_s − X β̂),   (13)

where x₀ is the vector of regression functions evaluated at x^(0), X is the matrix of regression functions at the sample points, c₀ is the vector of covariances between x^(0) and the sample points, and C is the covariance matrix of the sample responses Y_s. The regressor vector β̂ is the generalized least-squares approximation of β. Thus, the first part of Equation 13 is the generalized least-squares prediction at point x^(0). The second part can be viewed as a correction of the generalized least-squares response surface to obtain the interpolating kriging model.
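As a numerical sketch of the constant-trend special case (ordinary kriging), with a Gaussian correlation chosen for brevity instead of Matérn, and with an illustrative length scale and sample data:

```python
import numpy as np

def ok_fit(X, y, l=1.0):
    """Ordinary kriging in 1-D with Gaussian correlation R(x,x') = exp(-((x-x')/l)^2).
    Returns a predictor implementing  y_hat(x0) = mu + r0' R^-1 (y - mu*1)."""
    R = np.exp(-((X[:, None] - X[None, :]) / l) ** 2)
    Ri = np.linalg.inv(R + 1e-12 * np.eye(len(X)))   # tiny jitter for conditioning
    one = np.ones(len(X))
    mu = (one @ Ri @ y) / (one @ Ri @ one)           # GLS estimate of the constant trend
    w = Ri @ (y - mu * one)
    def predict(x0):
        r0 = np.exp(-((x0 - X) / l) ** 2)
        return mu + r0 @ w                           # trend + interpolating correction
    return predict

# samples of the 1-D test function used later in the paper
X = np.linspace(0.0, 10.0, 8)
y = np.exp(-X / 10.0) * np.cos(X) + X / 10.0
predict = ok_fit(X, y, l=1.0)
```

The predictor reproduces the sample responses exactly, which is the interpolation property provided by the correction term of Equation 13.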
The formulation of the cokriging metamodel is very similar to that of the kriging metamodel. For simplicity's sake, following the idea in [34], we propose to use only an ordinary cokriging metamodel:

Ŷ(x^(0)) = µ̂ + c₀ᵀ C⁻¹ (Y_sc − µ̂ F),

where µ̂ = (Fᵀ C⁻¹ Y_sc)/(Fᵀ C⁻¹ F), and where F is the vector whose components equal 1 for the response entries of Y_sc and 0 for the gradient entries. In the kriging case, vector Y_s contains only the responses of the function at the sample points, whereas in the cokriging case vector Y_sc contains both the responses and the gradients.
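A 1-D sketch of this gradient-enhanced ordinary cokriging, assuming a Gaussian correlation so that the cross-covariances are simply the kernel derivatives (all numerical values below are illustrative; the paper also uses a Matérn option):

```python
import numpy as np

def ock_fit(X, y, dy, theta=0.25, jitter=1e-10):
    """1-D ordinary cokriging using responses y and derivatives dy.
    Correlation r(d) = exp(-theta*d^2); its derivatives supply the
    value/gradient and gradient/gradient covariance blocks."""
    n = len(X)
    D = X[:, None] - X[None, :]
    K = np.exp(-theta * D**2)                         # Cov[Y(x_i), Y(x_j)]  / sigma^2
    Kvd = 2.0 * theta * D * K                         # Cov[Y(x_i), Y'(x_j)] / sigma^2
    Kdd = (2.0 * theta - 4.0 * theta**2 * D**2) * K   # Cov[Y'(x_i), Y'(x_j)]/ sigma^2
    C = np.block([[K, Kvd], [Kvd.T, Kdd]]) + jitter * np.eye(2 * n)
    Ysc = np.concatenate([y, dy])                     # responses stacked with gradients
    F = np.concatenate([np.ones(n), np.zeros(n)])     # constant trend acts on values only
    Ci = np.linalg.inv(C)
    mu = (F @ Ci @ Ysc) / (F @ Ci @ F)
    w = Ci @ (Ysc - mu * F)
    def predict(x0):
        d0 = x0 - X
        r0 = np.exp(-theta * d0**2)
        c0 = np.concatenate([r0, 2.0 * theta * d0 * r0])
        return mu + c0 @ w
    return predict

# the 1-D test function of the paper and its exact derivative
X = np.linspace(0.0, 10.0, 5)
y = np.exp(-X / 10.0) * np.cos(X) + X / 10.0
dy = -np.exp(-X / 10.0) * np.cos(X) / 10.0 - np.exp(-X / 10.0) * np.sin(X) + 0.1
predict = ock_fit(X, y, dy)
```

By construction, the model interpolates both the sample values and the sample slopes, which is the behavior illustrated later in Figure 7.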
With these formulations, the models can supply approximate responses of the actual function at all points of the design space. In our case, we consider the response of the function to be deterministic, and both models are interpolating. This type of metamodel has other advantages: for example, it provides statistical information (the expected value and the variance of the process). Due to the use of the unbiasedness condition, the expected value of Ŷ is given by the trend model µ, and the mean square error of Ŷ is [37]:

s²(x^(0)) = σ² − c₀ᵀ C⁻¹ c₀ + (x₀ − Xᵀ C⁻¹ c₀)ᵀ (Xᵀ C⁻¹ X)⁻¹ (x₀ − Xᵀ C⁻¹ c₀).

This relation also holds for simple or ordinary kriging or cokriging models, provided the appropriate forms of the vectors and matrices x₀, c₀, X and C are used.

Estimation of the parameters
The model's parameters (such as the characteristic correlation length scale l, the variance σ² of the random process Z, or the regression coefficients) can be determined by maximizing the likelihood [38]. In our case, we use this technique to determine l and σ², which amounts to minimizing the negative log-likelihood

ψ(l, σ²) = ½ [ n_s ln σ² + ln det R + (Y_s − Xβ̂)ᵀ R⁻¹ (Y_s − Xβ̂) / σ² ],

with R the correlation matrix associated with the length scale l. The variance can be determined analytically by differentiating the likelihood function:

σ̂² = (1/n_s) (Y_s − Xβ̂)ᵀ R⁻¹ (Y_s − Xβ̂).

One can then use an optimizer to determine the value of the correlation length scale. This method has some drawbacks [39,40,41]: in many cases with very few points, the log-likelihood is monotonic, and the correlation matrix is often affected by conditioning problems which make finding the minimum difficult. When such problems arise, one sets the parameters (in particular the correlation length scale) to fixed values.
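The concentrated likelihood above can be sketched numerically as follows (ordinary kriging with a constant trend and Gaussian correlation; the jitter and the grid of candidate length scales are illustrative choices). Substituting the analytical σ̂² back into the log-likelihood leaves a function of l alone, which can be minimized over a grid or with an optimizer.

```python
import numpy as np

def neg_log_likelihood(l, X, y):
    """Concentrated negative log-likelihood for 1-D ordinary kriging:
    0.5 * (n*log(sigma2_hat) + log det R), sigma2_hat eliminated analytically."""
    n = len(X)
    R = np.exp(-((X[:, None] - X[None, :]) / l) ** 2) + 1e-10 * np.eye(n)
    Ri = np.linalg.inv(R)
    one = np.ones(n)
    mu = (one @ Ri @ y) / (one @ Ri @ one)     # GLS trend estimate (beta_hat)
    r = y - mu * one
    sigma2 = (r @ Ri @ r) / n                  # analytical ML variance estimate
    sign, logdet = np.linalg.slogdet(R)
    return 0.5 * (n * np.log(sigma2) + logdet)

X = np.linspace(0.0, 10.0, 8)
y = np.exp(-X / 10.0) * np.cos(X) + X / 10.0
candidates = [0.5, 1.0, 2.0, 4.0]
best_l = min(candidates, key=lambda l: neg_log_likelihood(l, X, y))
```

With very few points, the profile can be flat or monotonic, which is the pathology mentioned above; fixing the length scale by hand is then the pragmatic fallback.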

Analytical applications
In this section, we apply kriging and cokriging to one- and two-dimensional analytical test functions. We will use the abbreviations OK and OCK to designate respectively Ordinary Kriging and Ordinary CoKriging.

The one-dimensional test function
First, we applied the two types of metamodels to an analytical function, chosen to be y(x) = exp(−x/10) cos(x) + x/10. We used 5 sample responses of the analytical function to build the OK metamodel, and an additional 5 sample derivatives to build the OCK metamodel. The correlation function was the Matérn function. The sample points were obtained using Latin Hypercube Sampling. Figure 7 illustrates the capability of the cokriging metamodel to interpolate not only the values of the responses, but also the sample derivatives. In this example, the quality of the cokriging metamodel was better. This statement remains true as long as one works with only a few points. We can also observe that, for a relatively smooth function, the kriging metamodel converges quickly toward a good approximation of the real function when the number of points becomes large enough.
The characteristics of the two metamodels are summarized in Table 2.
Based on the statistical information from the kriging and cokriging metamodels, one can derive confidence intervals. The two diagrams of Figure 8 show the 95% Confidence Intervals (CIs) obtained with Expression 18:

CI₉₅(x) = [ Ŷ(x) − 1.96 s(x), Ŷ(x) + 1.96 s(x) ].   (18)

In these types of metamodels, the size of the confidence envelopes is determined mainly by the distance between neighboring sample points.

The two-dimensional test function
The same two types of metamodels were used to approximate an analytical function of two variables. In order to illustrate the performance of the cokriging metamodel, we chose a very bumpy function, the six-hump camel back function:

y(x₁, x₂) = (4 − 2.1 x₁² + x₁⁴/3) x₁² + x₁x₂ + (−4 + 4 x₂²) x₂².

Figure 9 shows the response surface of this function. The two metamodels were constructed using 16 evaluations of the function for the kriging metamodel, plus an additional 16 evaluations of the gradients for the cokriging metamodel. In both cases, the correlation function was the Matérn function. The characteristics of these two metamodels are given in Table 3, and the resulting surfaces in Figures 10 and 11. For this 2D test function, the cokriging metamodel led to a relatively accurate approximation of the actual function using only a few sample points. By taking the gradients into account, we were able to develop more efficient approximate models. But for a problem involving the evaluation of mechanical responses, the computation cost for the determination of n_s responses and n_s gradients is, of course, higher than that required to obtain n_s responses alone. (In the former case, due to the use of finite differences to obtain the gradients, the construction of the metamodel requires 3n_s evaluations of the mechanical model.)
Now let us take another approach: the idea is to build metamodels using the same number of evaluations. In the following case, we used the same six-hump function, but the kriging metamodel was constructed from the responses at 27 sample points (Figure 12), while the cokriging metamodel was still constructed using the responses and the gradients at 9 sample points (Figure 13). Table 4 shows that the quality of the prediction achieved with the two metamodels was similar; a very important advantage of the cokriging metamodel, however, was that it revealed more of the zones where a minimum can be located than the kriging metamodel did. If the problem involved the calculation of mechanical responses, the computation cost associated with the 9 points and their gradients would, thanks to the multiparametric strategy and as shown in Table 1, be much lower than that associated with the 27 points.

The five-and ten-dimensional test functions
In this section, we consider two analytical functions defined for n_d variables:
• the Rosenbrock function, which is a smooth function:

y(x) = Σ_{i=1}^{n_d−1} [ 100 (x_{i+1} − x_i²)² + (1 − x_i)² ];

• the Schwefel function, which is a very irregular function:

y(x) = 418.9829 n_d − Σ_{i=1}^{n_d} x_i sin(√|x_i|).

Here, we only use these analytical functions with 5 or 10 variables (i.e. n_d = 5 or n_d = 10). Using Latin Hypercube Sampling, 10 sets of sample points (for n_d = 5) and 2 sets (for n_d = 10) are generated for each number of sample points. The number of sample points is reported on the horizontal axis: for n_s sample points, the kriging metamodel is built from n_s responses, and the cokriging metamodel is built from n_s responses and n_s gradients (each gradient comprising n_d components). The means of the criteria R² and Q₃ are calculated for each number of sample points: metamodels are built with the responses (and gradients) of each set of sample points; then the quality criteria are computed for each metamodel; finally, the means of the two criteria are computed for each number of sample points. Figures 14 and 15 show the means of the quality criteria versus the number of sample points for the kriging and cokriging metamodels built using responses and gradients provided by the Rosenbrock and Schwefel functions with 5 variables. Figures 16 and 17 show the same results obtained with the ten-dimensional functions. Moreover, if we consider the case of the five-dimensional Schwefel function with the same total number of evaluations (i.e., if we use n_s responses and 5 × n_s gradient components for the cokriging, we can use n_s + 5 × n_s responses for the kriging), the quality is always better with the cokriging metamodel as soon as n_s is greater than 30. Since we have shown that the MPS is particularly efficient in calculating gradients, coupling a cokriging metamodel with the MPS should lead to a very efficient strategy for building metamodels. This is what we illustrate in the next part.

Application to a mechanical problem
In this section, we present the construction of the two previous metamodels (kriging and cokriging) using the multiparametric strategy presented in the first part of this paper and the actual responses of a mechanical model. The test case being considered is the three-squares example presented on page 5. We carried out two studies: one with a fixed number of mechanical calls, and the other with a fixed quality of the metamodels.

Case of a fixed number of mechanical calls
Both metamodels were constructed using 15 mechanical evaluations. The kriging metamodel was determined using 15 values of the force, and the cokriging metamodel was determined with 5 values of the force and 5 gradients. The sample points were obtained through Latin Hypercube Sampling. Our calculations led to the two response surfaces of Figures 18 and 19, to be compared with the actual response surface of Figure 4. Tables 5 and 6 show the characteristics of the calculations and of the metamodels. In this case, the cokriging metamodel led to a much better approximation of the mechanical model than the kriging metamodel. Moreover, this result was obtained at a lower computation cost.

Case of a fixed metamodel quality
Another way to study the cost of the metamodels consists in constructing kriging and cokriging metamodels with the same quality. For this study, we constructed the metamodels using sample points obtained through full factorial sampling. We chose to build the kriging metamodel with 9 × 9 samples and the cokriging metamodel with 3 × 3 and 5 × 5 samples. The results are shown in Table 7.
5 × 5 full factorial sampling led to a cokriging metamodel with the same quality as the 9 × 9 kriging metamodel. As in the previous case with a fixed number of mechanical calls, the cost with the cokriging metamodel was lower. In the case of more complex problems and more parameters, one can expect even better gains.

Conclusion
In this paper, we associated a cokriging metamodel with a dedicated multiparametric strategy. These two elements enabled us to achieve a significant reduction in the computation cost of constructing the metamodel.
Our multiparametric strategy takes advantage of a property of the LATIN method: the result of a previous calculation constitutes an efficient starting point for a subsequent calculation. This property is particularly advantageous for the evaluation of gradients, which are obtained using finite differences (i.e. with very small variations of the parameters). Thus, our strategy lends itself naturally to the use of a gradient-based metamodel such as the cokriging model presented in this paper.
Moreover, since this metamodel is part of a multilevel model optimization strategy, besides reducing the computation cost associated with the construction of the cokriging metamodel, one can also reduce the cost of the second level, in which algorithms based on gradient descent are used. This will be the subject of an upcoming work.

Figure 2 :
Figure 2: Schematic representation of the LATIN process

Figure 3 :
Figure 3: The multiparametric strategy using the LATIN algorithm

Figure 4 :
Figure 4: The geometry of the problem and the response surface

Figure 5 :
Figure 5: The geometry of the problem of the four-variables example

Figure 6 :
Figure 6: CPU time and gain of the MPS associated with two different sets of sample points for the four-variables test case

Figure 7 :
Figure 7: The ordinary kriging and cokriging metamodels and their derivatives

Figure 8 :
Figure 8: The confidence envelopes for the kriging and cokriging metamodels

Figure 9 :
Figure 9: The actual function and its gradients (blue arrows)

Figure 12 :
Figure 12: The kriging surface obtained with 27 responses (27 evaluations of the function)

Figure 13 :
Figure 13: The cokriging surface obtained with the responses and gradients at 9 sample points

Figure 14 :
Figure 14: Quality of the kriging and cokriging metamodels applied to the five-dimensional Rosenbrock function

Figure 18 :
Figure 18: The kriging response surface of the three-squares test case (using 15 responses)

Figure 19 :
Figure 19: The cokriging response surface of the three-squares test case (using 5 responses and 5 gradients)

Table 2 :
Characteristics of the two previous metamodels. Criteria RAAE and RMAE stand for Relative Average Absolute Error and Relative Maximum Absolute Error. The criteria Q_i compare the actual responses and the responses of the metamodels at n_c points, with n_c ≫ n_s.
OCK: 0.9958, 3.027·10⁻², 5.330·10⁻¹, 1.391·10⁻², 0.1824, 2.027·10⁻⁴

Table 4 :
Characteristics of the two previous metamodels (Figures 12 and 13)

Table 7 :
The metamodels constructed using full factorial sampling