A Variational Discretization Concept in Control Constrained Optimization: The Linear-Quadratic Case

. A new discretization concept for optimal control problems with control constraints is introduced which utilizes for the discretization of the control variable the relation between adjoint state and control. Its key feature is not to discretize the space of admissible controls but to implicitly utilize the ﬁrst order optimality conditions and the discretization of the state and adjoint equations for the discretization of the control. For discrete controls obtained in this way an optimal error estimate is proved. The application to control of elliptic equations is discussed. Finally it is shown that the new concept is numerically implementable with only slight increase in program management. A numerical test conﬁrms the theoretical investigations.


Introduction
Consider the linear quadratic abstract optimal control problem; min (y,u)∈Y ×U J (y, u) s.t.y = Su and u ∈ U ad , (1) where U = U * denotes the Hilbert space of controls, Y Banach space of states, S : U → Y ⊆ U the linear, bounded control-to-state solution operator, and U ad ⊆ U the convex, closed set of admissible controls.For α > 0 further let where Z = Z * denotes a Hilbert space, z ∈ Z and Y → Z → Y * .Further let Ĵ (u) := J (Su, u).With this setting an equivalent formulation of problem (1) then is given by min u∈U ad Ĵ (u). ( Example 1.1.Z = L 2 ( ), S := (− ) −1 , Y = H 1 0 ( ), U = L 2 ( ) and U ad = {u ∈ U ; u a ≤ u ≤ u b a.e. in } fit into the abstract frame presented above, see also the subsequent sections.
The proof of the following theorem can be obtained by standard techniques and therefore is omitted.Theorem 1.2.Problem (2) admits a unique solution u * ∈ U ad , which satisfies the variational inequality There holds Ĵ (u) = αu + S * (Su − z), with S * denoting the adjoint of S.
In the present work a new discretization concept for the control problem ( 1) is introduced which is based on the discretization of the state space alone.It guarantees conformability of continuous and discrete admissible sets since these sets in fact coincide.Furthermore, in applications to control of partial differential equations with finite element discretizations the approach decouples the approximation of the active set from the nodes of the finite element grid.In the approach presented discrete controls are defined in terms of the discretized state and co-state variables, i.e. through the discrete optimality condition.

Main result:
Let S h denote a discretization of S which satisfies Assumption 2.3 below.Further let u * h ∈ U ad , u * h := argmin u∈U ad J (S h u, u).Then where u * denotes the solution of (1).Moreover, in practical applications, such as control of partial differential equations u * h is numerically computable with only slight increase in program management compared to the conventional methods.
In the literature the common concept for characterizing and computing approximate solutions to problem (1) is based on discretizations of the set of admissible controls U ad .Specifically when S denotes the solution operator of an elliptic partial differential equation and S h its finite element discretization on a finite element grid with gridsize h the approximation of controls often is also related to this mesh.Typical Ansätze are piecewise constant or piecewise linear, globally continuous controls, see Arada/Raymond/Tröltzsch [1].The authors for piecewise constant control approximations prove optimal error estimates of the form and also show convergence of the same order in L ∞ .A close inspection of their proof shows that the order of convergence can not be improved even in the presence of better L ∞ error estimates for the state and co-state variables.These observations extend to piecewise linear, globally continuous control approximations where the error of the control has the 2 ), see Rösch [12] for a proof under special assumptions on the continuous solutions, compare also Casas/Tröltzsch [4].These authors propagate the concept First discretize the control space, then discretize the state space and emphasize that state space approximation should be adapted to the needs of control space approximation.However, the discretization method in the presence of control constraints no longer is conform in the sense that projections of discrete controls onto the continuous admissible set need not be contained in the discrete admissible set of controls.Furthermore, relating the Ansatz for the controls to the finite element grid only allows to resolve the active set on this grid, which certainly is an additional drawback of the conventional approach.The discrete approach presented here circumvents these shortcomings.Moreover, it yields a discrete control u * h which also in the situations sketched above satisfies u * − u * h U = O(h 2 ) and, most important is computable with only slight increase in program management compared to the conventional methods.
The paper is organized as follows.In the following section the new discretization concept is introduced in the abstract setting.It is applied to linear quadratic control problems for elliptic equations in Section 3. Section 4 is devoted to numerical implementation of the new discretization concept and presents numerical results which confirm the theoretical considerations.

The discrete concept
In order to define discrete optimal controls let S h : U → Y h ⊂ Y ⊆ U be the linear, bounded control-to-discretized state solution operator, where Y h ⊂ Y is a finite dimensional subspace equipped with the norm of Y .

Note that Ĵh
Remark 2.2.In the case of linear quadratic control problems with control constraints (as is the case in the present work) the variational inequality in Definition 2.1 is the necessary and sufficient optimality condition of The definition of optimal controls in terms of the first order optimality condition allows greater flexibility.For example, a more general approximation concept would be given by where R h denotes an approximation of the adjoint operator S * .
The key idea of the discretization concept becomes more transparent in the case U ad = U .Then the discrete optimal control u * h necessarily satisfies where the right hand side lives in the space U but is a discrete object.In fact the discrete control u * h ∈ U is implicitly discretized in terms of the discretized adjoint operator and therefore is discrete.In the following section S h denotes the solution operator of a finite element discretization with linear elements of an elliptic equation.The control u * h in that case is a continuous, piecewise linear finite element function.In Section 4 it is shown that this concept in the presence of control constraints also allows an efficient numerical implementation.In order to prove the main theorem it is assumed that Assumption 2.3.S h , S * h satisfy Theorem 2.4.For h > 0 small enough the variational inequality (4) admits a unique solution u * h ∈ U ad , which satisfies Here, u * ∈ U ad denotes the unique solution of problem (1).
Proof: Existence of u * h follows from the fact that u * h = argmin u∈U ad J h (S h u, u).To argue uniqueness let u 1 = u 2 two distinguish solutions to (4).Utilizing u i as test control for u j , (i, j = 1, 2) and adding the corresponding inequalities one gets This implies and thus u 1 = u 2 .To prove the error estimate note that u * h ∈ U ad and therefore is admissible as test function for the variational inequality (3) for u * .Analogously to the proof of uniqueness one has Straightforward estimation yields Assumption 2.3 and α > 0 finally give From the considerations above one may also infer some kind of H −1 -estimate for the error in the controls in the case α = 0 .
Corollary 2.5.Let α = 0 and let the assumptions of Theorem 2.4 be satisfied.Assume further that U ad is also bounded and that Then problems (3) and (4) admit unique solutions u * and u * h , respectively which satisfy the estimate (7) Proof: Existence and uniqueness of solutions may be argued as in the case α > 0. To obtain the error estimate observe that S(u The first addend due to the assumptions is bounded by Ch The estimate for the second addend can be deduced from the proof of the previous theorem.

Application to control of elliptic equations
The approach presented in the previous section applies to a large class of control problems for partial differential equations [6,9].In the present section its application to control problems for linear elliptic partial differential equations is shown.
To begin with let n be a symmetric matrix which is uniformly positive definite on ¯ , i.e.
The solution operator S is the bounded inverse of a self-adjoint operator.For him standard existence and regularity theory [7,8] yields Proposition 3.1.For every f ∈ Y * (8) admits a unique solution y ∈ Y which depends continuously on the data f, i.e. with some positive constant C there holds see [3,8].
In order to formulate the control problem further let , where u a ≤ u b denote constants.Note that U ad is only weakly closed in U .Clearly, U ⊂ Y * so that the definition of S in problem (1) makes sense.
In order to apply the discrete concept introduced in Section 2 to the control problem min it remains to supply a discretization concept for the operator S.Here standard finite element discretizations of S with piecewise linear, globally continuous Ansatz functions on sequences {τ h } h of regular, quasi-uniform triangulations are considered, where h := max{diam(T ); T ∈ τ h }.One has in case (b) := ∪ T ∈τ h T for all h.In case (a) it is assumed that all boundary vertices of h . For the notions utilized see e.g.[5].Now set

action of S h is defined by
The discrete solution operator S h is selfadjoint.Moreover, for f ∈ Y * the function y h ∈ H 1 ( ) and y h 1 ≤ C f Y * .The following error estimates are well known and can be found for example in [5].
Further denote by y, y h the unique solutions of The discrete analogon to problem (9) reads and admits a unique solution u * h ∈ U ad , which satisfies the variational inequality (4).Recall that problem ( 9) also admits a unique solution u * ∈ U ad which satisfies (3).Since with the previous proposition the requirements of Theorem 2.4 are satisfied for the setting of this section one immediately gets ) denote the solutions to (9) and (11), respectively.Then, for h > 0 small enough there holds where C denotes a positive constant independent of h.
Remark 3.4.It should be noted that for piecewise constant control approximations only u * −u * h 0 = O(h) and for piecewise linear, continuous approximations only u * −u * h 0 = O(h 3/2 ) can be expected.The latter is observed in numerical experiments.The estimate of Theorem 3.3 therefore improves these results for the unique solution u * h of problem (11).
The key idea of the discretization concept becomes most transparent in the case U ad = U .In this case the unique discrete optimal control u * h of (4) necessarily satisfies Ĵ (u * h ) = 0, which can be rewritten in the form The right-hand side of this equation lives in the space U but is a discrete object.In fact the discrete control u * h ∈ U is implicitly discretized in terms of the discretized adjoint operator and therefore is discrete.Since in the present section S h denotes the discrete solution operator defined in (10) the control u * h is a continuous, piecewise linear finite element function.
Next consider the control constrained case U ad ⊂ U .Then it is well known that u * h can be characterized as where P [u a ,u b ] denotes the projection onto the admissible set U ad .This means that the control u * h is the projection of a finite element function onto the admissible set.And this control in fact is computable.For example the projected gradient method for every initial control u 0 ∈ U ad yields a sequence {u k } k ⊂ U ad , where with some stepsize ρ > 0 , which is a finite element function.It is proved in the next section that the number of connected components of its active set on each element of the triangulation is bounded from above by three.Moreover, this number can not increase with the iteration counter of the gradient method.Therefore u k is computable for every iteration index k with the same amount of computational overhead.More details are given in the next section.
With the help of the L 2 -estimate (12) it is also possible to provide error estimates in the L ∞ norm for difference of the solutions of the elliptic control problems ( 9) and (11).A further ingredient of the proof are the following discrete Sobolev embeddings, whose prove for example can be found in [14], see also [13] for the case n = 2. Proposition 3.5.Let τ h denote a quasi-uniform, regular triangulation of ⊂ R n (n = 1, 2, 3).Then for every piecewise linear, continuous finite element function v h ∈ H 1 0 ( ) there holds where C > 0 is a generic constant and |•| 1 denotes the H 1 semi-norm.
Theorem 3.6.Let z ∈ L 2 ( ) and let u * , u * h denote the solutions of problems ( 9) and (11), respectively.Then there holds To estimate the third and fourth addend utilize Proposition 3.5.For the third addend one gets in the case n = 2 Similarly for the fourth addend where Theorem 3.3 was utilized.The exposition for the cases n = 1, 3 is similar.This completes the proof.
Remark 3.7.To finalize the L ∞ error estimate for u * − u * h it remains to provide estimates for e 1 := (S * − S * h )Su * ∞ and e 2 := (S * − S * h )z ∞ .However, the approximation order for these terms is restricted by 2. In this sense estimate ( 14) is optimal.For example there holds maximum principle is satisfied for the finite element spaces, and see [5].

Numerical realization
In the present section the discretization concept is described in more detail.Firstly a characterization of the discrete solution of problem ( 9) and the related discrete active set is provided.Then two numerical algorithms are discussed which allow to implement the discretization strategy proposed in the present work.Finally a numerical example is presented which confirms the theoretical investigations.

Discrete solution and numerical algorithms
In order to start the exposition of this section recall that U ad is defined through constant box constraints with bounds u a < u b .Further recall that J (y, u) = 1 2 |y − z| 2 dx + α 2 |u| 2 dx, so that for given u ∈ U there holds Ĵ (u) = αu + μ, where μ = S * (Su − z).To anticipate discussion note, that the subsequent considerations are also valid for bounds which are linear on every simplex of the triangulation.They may also be extended to the case of smooth spatially varying bounds u a , u b , and also to more general cost functionals J (y, u) whose second second partial derivatives J uu satisfy certain monotonicity properties.

Theorem 4.1. Let u *
h ∈ U ad denote the unique solution to (P h ).Then there exists a partition

. , l(h)) is a polynomial either of degree zero or one. For l(h) there holds l(h) ≤ Cnt(h), with a positive constant C ≤ 3 and nt(h) denoting the number of simplexes in τ h . In particular, the vertices of the discrete active set associated to u *
h need not coincide with finite element nodes.Case n = 1: 0 a h ∩ T i is either empty or a point S a i ∈ T i .Every point S a i subdivides T i into two sub-intervals.Analogously 0 b h ∩ T i is either empty or a point

Proof: The solution u
The maximum number of sub-intervals of T i induced by 0 a h and 0 b h therefore is equal to three.Therefore, l(h) ≤ 3nt(h), i.e.C = 3. Case n = 2: 0 a h ∩ T i is either empty or a vertex of τ h or a line L a i ⊂ T i , analogously 0 b h ∩ T i is either empty or a vertex of τ h or a line L b i ⊂ T i .Since u a < u b the lines L a i and L b i do not intersect.Therefore, similar considerations as in the case n = 1 yield C = 3. Case n = 3: 0 a h ∩ T i is either empty or a vertex of τ h or an edge or a part of a plane L a i ⊂ T i , analogously 0 b h ∩ T i is either empty or a vertex or an edge of τ h or a part of a plane L b i ⊂ T i .Since u a < u b the surfaces L a i and L b i do not intersect.Therefore, similar considerations as in the case n = 2 yield C = 3.This completes the proof.
It follows now from the considerations above that commonly utilized convergent solution algorithms for problem (P h ) with discrete initial controls generate sequences of discrete iterates.In the following this is illustrated for the projected gradient method formulated next, and also for a special variant of the primal-dual active set method [2,10,11].
A close inspection of this algorithm shows that

. , l(h)) is a polynomial either of degree zero or
one.The number l(h) is independent of the iteration index k and satisfies the estimate with a positive constant C ≤ 3, where nt(h) again denotes the number of simplexes in τ h .
It is now easy to verify that with the settings of Corollary 4.3 the numerical overhead for Algorithm 4.2 compared to the standard approaches (a-priori discretization of U ad ) consists in managing varying grid points similar to S a i , S b i in the proof of Theorem 4.1.Of course these grid points depend on the iteration counter k of Algorithm 4.2 and may alter with every iteration of the algorithm, but there are at most two such points on every edge of a finite element simplex.
Next investigate the primal-dual active set strategy [2,10,11].Its algorithmical realization relies on the numerical treatment of the complementary system associated with problem (1).Both, the state equation y = Su and the control constraints u ∈ U ad are considered as hard constraints and are supplied with Lagrange multipliers.For the discrete optimal control problem (11) considered in the present work the complementarity system reads where λ = −λ a + λ b with λ a and λ b denoting the Lagrange multipliers associated with the box constraints, and σ > 0 is arbitrary.For the numerical solution of this complementarity system the primal-dual active set strategy works as follows.

Algorithm 4.4. Primal-dual active set strategy
To solve the system in step (4) (a), (b) of the algorithm the conjugate gradient method of Algorithm 4.5 may be utilized.Let H := α I d + S * h S h Algorithm 4.5.CG to solve (4) (1) Initialize (2) Do until convergence (a) t k := As already observed for the projected gradient method of Algorithm 4.2 the particular choice of a parameter, here namely σ = α allows the numerical implementation of this algorithm with only slightly increased numerical overhead when compared to the implementation of the conventional discretization approaches.Indeed, for σ = α in step 2. (a), (b) of the algorithm the active set in iteration step l can be identified by means of the finite element function −r − S * h S h u l−1 − αu a , so that on each simplex of the triangulation the active set in step l contains at most two connected components.These components again may be managed by varying grid points similar to S a i , S b i in the proof of Theorem 4.1.Again note that these grid points may alter with every iteration of Algorithm 4.4, but there are at most two such points on every edge of an finite element simplex.
Both algorithms presented perfectly mimic the decoupling of the discrete active set and the finite element grid.As the following numerical example illustrates the approximation of the active set with the method presented in the present work already is very accurate for coarse discretizations.Moreover, the boundary of the discrete active set seems to converge quadratically to that of the continuous active set.

Numerical example
Consider the following optimal control problem of tracking type; where the associated Lagrange multiplier of the equality constraint is given by μ The solution operator S is discretized with piecewise linear, continuous finite elements on an equidistant grid x i = ih with grid width h = 1 n+1 .The related discrete solution operator is denoted by S h .In all numerical computations presented ũ0 := 0, α = 0.1 and u b = 1 2 ( √ 2 − 1)/(2α), so that the boundary points ) of the continuous active set can never coincide with a point of the finite element grid.As numerical solution algorithm the projected gradient method of Algorithm 4.2 is utilized, where step (3) is replaced by ρ = 1 α .The iteration is stopped if the relative difference of two consecutive iterates and the distance of two consecutive active sets is smaller than 1.e−6.The method for this termination criterion converges after five iterations, where this number is independent of the finite element grid size.Recall that for this choice of ρ the projection step (4) consists of projecting on each element the finite element function −ρ S * h (S h u k − z) onto the admissible set.In figure 2 (left) a comparison is shown of the discrete control on a grid containing 5 points and the exact solution.As further reference to the performance of the generalized discretization concept also the numerical solution is shown which is obtained by the projected gradient method, where the projection in step ( 4) is replaced by is the finite element function with nodal values v i = max{u a , min{w(x i ), u b }}.To anticipate discussion note that for both discrete methods numerically quadratic convergence of the error in both the L 2 and L ∞ norm is observed, see the tables below.However, the overall size of the errors on the different refinement levels is much smaller for the generalized approach.The right picture in figure 2 shows a zoom on the controls at the right contact point p r for the numerical solutions obtained on a grid with 5 points.As one can see the generalized method approximates the exact control already pretty well.The figures illustrate that the numerical solution obtained by the generalized discrete concept already on this coarse grid provides a very good approximation of the continuous solution.Moreover, the active set is approximated very accurately as well, see also the tables below illustrating the convergence behavior of the methods.The numerical solution obtained with projection Q [u a ,u b ] only is capable of resolving the active set on the nodes of the finite element grid which results in linear convergence of the active set only.For the generalized approach numerically quadratic convergence is observed.
The experimental order of convergence for positive error functionals E(h) with h > 0 in this context is defined as follows: Given two grid sizes h 1 = h 2 , let EOC := ln E(h 1 ) − ln E(h 2 ) ln h 1 − ln h 2 .
It follows from this definition that if E(h) = O(h ξ ) (h → 0) then EOC ≈ ξ .The error functionals investigated in the present section are given by E 2 (h) := u − u h 0 , E ∞ (h) := u − u h ∞ and E a (h) := dist p r , p r h .
Table 1 shows the values of the error functionals for the generalized discretization approach for α = 0.1.The functional value and the values of the error functionals for the numerical method with projection Q [u a ,u b ] on the finest grid for u * replaced by its finite element interpolation are J = 2.39486559, E 2 = 6.278354e−08,E ∞ = 1.98161808e−07 and E a = 5.7761306e−05.In particular, compared to the generalized approach the error for E a is three orders of magnitude larger.Table 2 shows the experimental order of convergence for the generalized approach in the case α = 0.1.As one can see the error estimates ( 12) and ( 14) for the errors in the L 2 and L ∞ norms, respectively are numerically confirmed.Moreover, the discrete active set numerically converges quadratically to its continuous counterpart.Note that with projection Q [u a ,u b ] the numerically observed convergence of the L 2 and the L ∞ norm is also quadratic.However, the numerically observed rate of convergence of the discrete active set only is linear, compare the rightmost column in Table 2. Further note that with increasing α the experimental order of convergence for the L 2 and L ∞ norms of the method with projection Q [u a ,u b ] deteriorates, whereas the numerically observed convergence of the generalized approach is robust w.r.t.α.

compare figure 1 .
Now abbreviate ξ ah := − 1 α μ * h − u a , ξ b h := u b − 1 α μ * hand investigate the zero level sets 0 a h and 0 b h of ξ a h and ξ b h , respectively.

Figure 1 .
Figure 1.Solutions u * and u * h together with their active sets for h = 1/3 in the case n = 1 (left).Zoom of the same (right).The decoupling of discrete active set and finite element grid is obvious.

Algorithm 4 . 2 .
Projected gradient algorithm(1) ũ0 ∈ U , u 0 : a finite element function the determination of its active part follows the lines of the proof of the previous Theorem 4.1.This in turn proves Corollary 4.3.Let τ h denote a triangulation of and let u 0 ∈ U such that u 0 | T i for all T i ∈ τ h is either a constant or a linear function.Further set ρ = 1 α in step (3) of Algorithm 4.2.Then for every iterate u k of Algorithm 4.2 there exists a partition κ k

Figure 2 .
Figure 2. Comparison of optimal continuous control (black), control obtained from the generalized discretization concept (red) and by nodal projections (green) for 5 grid points (left), and zoom of the same at the right contact point (right).

Table 1 .
Values of cost and error functionals on different levels, α = 0.1.