Genetic Programming and Domain Knowledge: Beyond the Limitations of Grammar-Guided Machine Discovery

Application of Genetic Programming to the discovery of empirical laws is often impaired by the huge size of the search spaces involved. In physical applications, dimensional analysis is a powerful way to trim down the size of these spaces. This paper presents a way of enforcing dimensional constraints through formal grammars in the GP framework. As one major limitation of grammar-guided GP comes from the initialization procedure (how to find admissible and sufficiently diverse trees with a limited depth), an initialization procedure based on dynamic grammar pruning is proposed. The approach is validated on the problem of identifying a material's response to a mechanical test.


1 Introduction
This paper investigates the use of Genetic Programming [Koz92] for Machine Discovery (MD), the automatic discovery of empirical laws. In the classical Machine Learning framework introduced in the seminal work of Langley [LSB83], MD systems are based on inductive heuristics combined with a systematic exploration of the search space. This approach suffers from severe limitations on real-world problems, due to ill-conditioned data and huge search spaces.
Such limitations are avoided in Genetic Programming (GP) thanks to its stochastic search principle. The price to pay is that GP offers no direct way to incorporate expert knowledge, although the importance of knowledge in Evolutionary Computation is now recognized [Jan93]. In this paper, the emphasis is put on a particular, albeit rather general, kind of expertise: in all application domains, variables most often have physical dimensions that cannot be ignored; for example, a mass and a length cannot be added together. The restriction of the search space to dimensionally admissible laws has been tackled by [WM97] in a Machine Learning framework and by [KB99] using dedicated GP operators. On the other hand, an elegant and promising way to encode domain knowledge is through formal grammars [RCO98, Hor96]. One major difficulty with Grammar-Guided Genetic Programming (G3P) lies in the initialization step, whose importance cannot be overestimated [Dai99]: finding admissible trees within a maximum depth can be difficult enough to result in poorly diversified populations.

This paper investigates the use of grammars to restrict the GP search space to dimensionally admissible laws. The next section briefly presents context-free grammars and Sect. 3 discusses related work. Sect. 4 describes a class of grammars for dimensionally admissible expressions, Sect. 5 presents its use for generating the initial population, and finally, Sect. 6 reports on numerical experiments with G3P for the identification of phenomenological laws in materials science.

2 Context-Free Grammars
A Backus-Naur form (BNF) grammar describes the admissible constructs of a language through a 4-tuple {S, N, T, P}, where S denotes the start symbol, N the set of non-terminal symbols, T the set of terminal symbols, and P the production rules. Any expression is built up from the start symbol. Production rules specify how the non-terminal symbols, e.g. <expr>, should be rewritten into one of their derivations (e.g. (<oper> <expr> <expr>) or <var>) until the expression contains terminal symbols only.

Example:

S := <expr>;
<expr> := (<oper> <expr> <expr>) | <var>;
<oper> := + | *;
<var> := x | R;

The above grammar describes all polynomials of the variable x (R is interpreted as any real-valued constant); hence it is equivalent to the GP search space with the node set {+, *} and the terminal set {x, R}.

One advantage of grammars is that they allow fine-grained constraints to be imposed on the search space. Assume for instance that, in one particular application, the parent node of an additive node must be a multiplicative node, and vice versa. This is enforced via grammars by describing two non-terminals, <add-expr> and <mult-expr>, with the following production rules:

<add-expr> := (+ <mult-expr> <mult-expr>) | <var>;
<mult-expr> := (* <add-expr> <add-expr>) | <var>;

In canonical GP, satisfying this constraint would require either designing a specific initialization procedure and evolution operators, or filtering out every non-complying individual.
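The alternating <add-expr>/<mult-expr> rules can be made concrete with a small data structure. The following Python sketch rests on our own assumptions: non-terminals are written as "<...>" strings, a start rule S := <add-expr> is added for the example, and the names `GRAMMAR` and `derive` are hypothetical. It encodes the grammar as a dictionary and performs a leftmost derivation driven by an explicit list of rule choices. Every string it can produce satisfies the constraint by construction, which is precisely what canonical GP would have to enforce with dedicated operators or filtering.

```python
# Hypothetical encoding of the alternating add/mult grammar:
# each non-terminal maps to its list of derivations (lists of symbols).
GRAMMAR = {
    "<S>": [["<add-expr>"]],
    "<add-expr>": [["(", "+", "<mult-expr>", "<mult-expr>", ")"], ["<var>"]],
    "<mult-expr>": [["(", "*", "<add-expr>", "<add-expr>", ")"], ["<var>"]],
    "<var>": [["x"], ["R"]],
}


def is_nonterminal(symbol: str) -> bool:
    return symbol.startswith("<") and symbol.endswith(">")


def derive(start: str, choices: list[int]) -> str:
    """Leftmost derivation of `start`; choices[k] selects the derivation
    applied at the k-th rewriting step."""
    picks = iter(choices)
    stack = [start]
    out = []
    while stack:
        symbol = stack.pop()
        if is_nonterminal(symbol):
            k = next(picks, None)
            if k is None:
                raise ValueError("not enough choices to complete the derivation")
            # Push in reverse so that the leftmost symbol is rewritten first.
            stack.extend(reversed(GRAMMAR[symbol][k]))
        else:
            out.append(symbol)
    return " ".join(out)
```

For instance, `derive("<S>", [0, 0, 1, 0, 1, 1])` yields `( + x R )`; no sequence of choices can produce an addition directly below another addition.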

3 GP and Grammars: Previous Works
Canonical GP relies on the closure hypothesis of the search space [Koz92], which assumes that the return value of any subtree is a valid argument for any function. This ensures that simple crossover and mutation (respectively swapping subtrees and replacing an arbitrary subtree by a random one) produce admissible offspring. What is saved in procedural overhead is lost in expressiveness: neither syntactic nor semantic restrictions are accounted for, and prior knowledge can dictate nothing but the node set. This implies several limitations:
- The size of the search space is huge, even for problems of moderate difficulty [Whi95]: it is typically exponential with respect to the number of terminals and nodes, and to the maximum depth.
- The general shape of the trees is arbitrary.
- Variables are assumed to be dimensionless.
Consequently, the use of canonical GP with typed or dimensioned variables implies the useless generation of a vast majority of irrelevant trees [CY97]. Several authors have addressed this problem using various kinds of bias. A first kind of bias is provided by the expert through domain knowledge, whose importance is now generally admitted [Jan93]. In an MD context, prior knowledge might concern the shape of the solution. A significant improvement in the success rate of a GP application can be obtained by biasing the parse trees toward shapes that are a priori judged interesting. This can be enforced by syntactic constraints, whose beneficial effect has been illustrated by Whigham [Whi95] on the 6-multiplexer problem.
The use of syntactic constraints in genetic programming was suggested as a potential form of bias by Koza [Koz92] in 1992. More formally, Gruau [Gru96] has shown that syntactic constraints can be used to reduce the size of the search space by allowing only type-consistent parse trees. However, a major limitation of Gruau's approach is that no limit is put on the depth of the trees, which usually results in a severe growth in tree size.
A second kind of bias consists of constraining the types of the variables manipulated by the tree expression. These constraints might be related to the adequacy of the variables and operators (e.g. not taking the square root of a negative value), or to the physical dimensions of the variables. A first step toward dimensionally aware GP was proposed by Keijzer and Babovic [KB99]. The dimension of each expression is encoded by a label consisting of the vector of the exponents of the basic units. For example, a variable expressed in […] The requirements on the label of a subtree are determined by its parent and sibling nodes. This implies that the initialization procedure may have to construct a subtree with any label (compound unit). A DimTransform function is therefore defined, which produces a terminal of the required units. Since a terminal does not exist for every possible unit, DimTransform may introduce physically meaningless constructs, compromising the physical relevance of the final tree. An auxiliary fitness measure is therefore introduced in order to favor trees with few calls to DimTransform.
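To make the labelling scheme concrete, here is a minimal Python sketch of unit labels as exponent vectors over (mass, length, time). It assumes only the usual rules of dimensional analysis: multiplication adds exponents, division subtracts them, addition requires identical labels, and the argument of exp must be dimensionless. The function names are hypothetical, and this is not the code of [KB99].

```python
Unit = tuple[int, int, int]  # exponents of (mass, length, time)

DIMENSIONLESS: Unit = (0, 0, 0)
KILOGRAM: Unit = (1, 0, 0)
METRE: Unit = (0, 1, 0)
SECOND: Unit = (0, 0, 1)
NEWTON: Unit = (1, 1, -2)  # kg . m . s^-2


def mul_units(a: Unit, b: Unit) -> Unit:
    """Unit of a product: exponents add."""
    return tuple(x + y for x, y in zip(a, b))


def div_units(a: Unit, b: Unit) -> Unit:
    """Unit of a quotient: exponents subtract."""
    return tuple(x - y for x, y in zip(a, b))


def add_units(a: Unit, b: Unit) -> Unit:
    """Addition (or subtraction) is only defined between identical units."""
    if a != b:
        raise ValueError(f"cannot add quantities with units {a} and {b}")
    return a


def exp_units(a: Unit) -> Unit:
    """The argument of exp must be dimensionless, and so is the result."""
    if a != DIMENSIONLESS:
        raise ValueError(f"exp expects a dimensionless argument, got {a}")
    return DIMENSIONLESS


# Example: in F = A * u**2 * exp(P * t), A needs the unit of F / u**2.
A_UNIT = div_units(NEWTON, mul_units(METRE, METRE))  # (1, -1, -2)
```

With these rules, a subtree's label is fully determined by the labels of its children, which is why the requirements on a subtree can be derived from its parent and siblings.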
Type constraints are closely related to the strongly typed GP (STGP) proposed by Montana [Mon95] and extended by Haynes et al. [HSW96]. In STGP, a type label is associated with every terminal, argument, and return value. The initial population is created by restricting the random choices to terminals or functions having the appropriate type label. Crossover swaps a subtree with another subtree of the same type, and mutation replaces a subtree with a random subtree of the same type. STGP does not, however, address the problem of the dimensional consistency of expressions.
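Type-restricted crossover of this kind (the numerical experiments reported later apply the same restriction on root types) can be sketched as follows. Trees are hypothetical immutable `Node` records carrying a type label; `typed_crossover` picks a random crossover point in the first parent and a random point of the same type in the second, and returns the parents unchanged when no compatible point exists. This is an illustration under those assumptions, not the STGP or Horner implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    type: str             # type (or unit) label of the value returned
    label: str            # operator or terminal name
    children: tuple = ()  # subtrees, left to right


def points(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    for i, child in enumerate(tree.children):
        yield from points(child, path + (i,))


def replace(tree, path, new):
    """Return a copy of `tree` whose subtree at `path` is replaced by `new`."""
    if not path:
        return new
    kids = list(tree.children)
    kids[path[0]] = replace(kids[path[0]], path[1:], new)
    return Node(tree.type, tree.label, tuple(kids))


def typed_crossover(a, b, rng):
    """Swap a random subtree of `a` with a random same-type subtree of `b`."""
    path_a, sub_a = rng.choice(list(points(a)))
    matches = [(p, s) for p, s in points(b) if s.type == sub_a.type]
    if not matches:  # no compatible crossover point: keep the parents
        return a, b
    path_b, sub_b = rng.choice(matches)
    return replace(a, path_a, sub_b), replace(b, path_b, sub_a)
```

Because only subtrees of equal type are exchanged, both offspring remain type-consistent whenever the parents are.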
Formal grammars have been implemented in a GP system by Horner [Hor96], with crossover and mutation procedures similar to those of STGP. However, Horner's system was limited by the difficulty of initializing valid parse trees, as pointed out by Ryan [RCO98].

4 Dimensionalization through Formal Grammars

The new concepts presented in this paper are twofold. The first consists of using grammar rules to incorporate dimensionality constraints into a GP framework. Second, the limitations of Grammar-Guided GP are overcome by a new initialization procedure based on a dynamic pruning of the grammar, which generates only feasible trees of prescribed derivation depth.

This approach is illustrated by a problem of mechanical behavior law identification. The elementary units involved are mass, length and time. The characterization of any compound unit as an n-tuple giving its exponents with respect to the elementary units is borrowed from [KB99]. The allowed compound units are specified by the user. The present study is restricted to integer powers of the basic units in the range {-2, ..., 2}. This excludes operators that return fractional units (e.g., the square root). The domain of allowed units therefore contains 5^3 = 125 possible combinations. A non-terminal symbol is defined for each allowed compound unit, together with the corresponding derivation rules expressing all the admissible ways of resolving this symbol. Such a large number of combinations makes an automatic grammar generator necessary.

It might be objected that the size of this grammar makes it impractical for real-world applications. Indeed, its memory complexity is exponential with respect to the number of elementary units, but the GP kernel needs no extra housekeeping for unit management.
Therefore, the computational cost of this approach is no larger than that of other grammar-guided GP systems, and a standard GP engine can be used with no internal modification. For instance, the results presented in this paper use Horner's GP kernel as the basic engine [Hor96].
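A grammar generator of this kind can be sketched in a few lines of Python. The sketch assumes the three basic units (mass, length, time) with exponents in {-2, ..., 2}, the operators +, -, * and /, an exp operator and a real constant R restricted to dimensionless expressions, and two hypothetical variables, u (length) and t (time); the operator and variable sets actually used in the experiments are those of Tables 1 and 2 and may differ. Non-terminals are named after their exponents, e.g. <NT+1+1-2> for newtons.

```python
from itertools import product

EXPONENTS = range(-2, 3)                    # allowed powers: -2 .. 2
UNITS = list(product(EXPONENTS, repeat=3))  # (mass, length, time): 5**3 = 125
UNIT_SET = set(UNITS)
DIMENSIONLESS = (0, 0, 0)

# Hypothetical variables (illustration only): displacement u and time t.
VARIABLES = {"u": (0, 1, 0), "t": (0, 0, 1)}


def nt(unit):
    """Name of the non-terminal for a unit, e.g. (1, 1, -2) -> '<NT+1+1-2>'."""
    return "<NT" + "".join(f"{e:+d}" for e in unit) + ">"


def generate_grammar():
    """Map every non-terminal to all of its dimensionally coherent derivations."""
    grammar = {}
    for target in UNITS:
        # Addition and subtraction: both operands carry the target unit.
        rules = [["(", op, nt(target), nt(target), ")"] for op in ("+", "-")]
        for a in UNITS:
            # (* <a> <b>) has unit a + b, so b = target - a must be allowed.
            b = tuple(ti - ai for ti, ai in zip(target, a))
            if b in UNIT_SET:
                rules.append(["(", "*", nt(a), nt(b), ")"])
            # (/ <a> <b>) has unit a - b, so b = a - target must be allowed.
            b = tuple(ai - ti for ti, ai in zip(target, a))
            if b in UNIT_SET:
                rules.append(["(", "/", nt(a), nt(b), ")"])
        if target == DIMENSIONLESS:
            rules.append(["(", "exp", nt(DIMENSIONLESS), ")"])
            rules.append(["R"])  # real-valued constant
        rules += [[name] for name, unit in VARIABLES.items() if unit == target]
        grammar[nt(target)] = rules
    return grammar
```

Under these assumptions the generated grammar has 125 non-terminals, and <NT+1+1-2> alone has 98 derivations, none of which is a terminal. This illustrates both why a generator is needed and why random initialization rarely terminates early.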
The grammar generator builds up each production rule with all the dimensionally coherent derivations.

The initialization procedure has to build up trees based on the provided grammar. A major difficulty arises with the dimensioned grammar, since most derivation rules cannot be resolved directly into a terminal. The fraction of terminal derivations can be so small that a random process has almost no chance of selecting a terminal symbol. This implies, as noted by Ryan [RCO98], that the trees tend to be very deep. On the other hand, if the user specifies a maximum tree depth, the initialization proceeds by massively rejecting oversized trees. The problem is similar to what occurs in constrained optimization whenever the feasible region is very small.

Some mechanism for controlling the derivation depth must therefore be incorporated in the initialization procedure. The proposed approach restricts the initialization operator to the domain of dimensionally feasible trees of depth less than or equal to a prescribed value Dmax. During grammar generation, each non-terminal symbol <NT> is associated with an integer d(<NT>), giving the depth of the smallest tree needed to rewrite <NT> into terminal symbols. The depth associated with each terminal symbol (operators, variables and constants) is set to 1. The depth of each <NT>, initially set to infinity, is recursively computed from the depths of the symbols appearing in its derivations, keeping the smallest value over all derivations of <NT>.

During the tree-generation phase, depth labels are used to enforce the bound on tree size. Given a non-terminal node at depth D in a tree, and assuming a maximum tree depth Dmax, the remaining allowed depth Dmax - D is computed. The derivation is then drawn at random among the subset of derivations whose depth does not exceed Dmax - D. This way, the algorithm can never engage in a path that has no fully terminal solution within Dmax - D steps, and, as a result, all generated trees are feasible.
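The depth labelling and the pruned random derivation can be sketched as follows. This Python sketch is a minimal illustration under explicit assumptions: depths are measured on the derivation tree, with terminals at depth 1 and a non-terminal one level above its deepest child; d(<NT>) is obtained by iterating the resulting min-max relation from infinity down to a fixed point; and a hypothetical three-symbol grammar over length, speed and time stands in for the generated dimensional grammar. Note that <L> has no derivation leading directly to a terminal.

```python
import math
import random

# Hypothetical toy grammar: <L> length, <V> speed, <T> time.
GRAMMAR = {
    "<L>": [["(", "*", "<V>", "<T>", ")"], ["(", "+", "<L>", "<L>", ")"]],
    "<V>": [["(", "/", "<L>", "<T>", ")"], ["v"]],
    "<T>": [["(", "+", "<T>", "<T>", ")"], ["t"]],
}


def rule_depth(rule, depth):
    """Depth of the smallest derivation tree that starts with `rule`."""
    return 1 + max(depth.get(s, 1) for s in rule)  # terminals have depth 1


def min_depths(grammar):
    """d(<NT>): start at infinity and lower the labels until nothing changes."""
    depth = {nt: math.inf for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rules in grammar.items():
            best = min(rule_depth(r, depth) for r in rules)
            if best < depth[nt]:
                depth[nt], changed = best, True
    return depth


def generate(symbol, remaining, grammar, depth, rng):
    """Random derivation tree of depth <= remaining (terminals are strings)."""
    if symbol not in grammar:
        return symbol
    # Dynamic pruning: keep only derivations that can still be completed.
    feasible = [r for r in grammar[symbol] if rule_depth(r, depth) <= remaining]
    if not feasible:
        raise ValueError(f"{symbol} cannot be completed within depth {remaining}")
    rule = rng.choice(feasible)
    return (symbol, [generate(s, remaining - 1, grammar, depth, rng) for s in rule])


def tree_depth(node):
    return 1 if isinstance(node, str) else 1 + max(tree_depth(c) for c in node[1])
```

Here `min_depths(GRAMMAR)` gives d(<L>) = 3 and d(<V>) = d(<T>) = 2, so with a remaining depth of 3 the only tree that can be generated from <L> is the one for (* v t); with larger bounds the choice widens, but no draw is ever rejected.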

6 Numerical Experiments

The test case presented herein is a simplified real-world application, in which an algebraic law is sought to model experimental data corresponding to the constitutive law of a material during an indentation test. Figure 1 presents a schematic view of the experimental setup. A hard indenter of prescribed shape (usually conical or tetrahedral) is pressed against the surface of the material to be tested. The experimenter records the reaction force F as a function of time t and displacement u.

For simple constitutive laws, the analytical relations between force, displacement, and material properties are well known [Joh87]. For complex constitutive laws, finite element models allow one to simulate the material reaction force. However, such a simulation is rather expensive (3 hours on an HP350 workstation). For ill-known materials, only experimental data are available. This pinpoints the need for a simple analytical model in the two latter cases.

$$F = A\,u^{2}\,e^{P t} \qquad (2)$$

where A and P are unknown functions of the material properties. The available physical quantities and their associated units are given in Table 1. Due to the noisy nature of the examples, it is not expected that GP, or any other machine discovery algorithm, will find a solution that exactly fits the data.

Machine Discovery experiments have been conducted with the GP parameters given in Table 2. Crossover swaps two arbitrary subtrees from two parents, with the choice restricted to subtrees whose root nodes have the same type. Tree mutation consists of crossing over one individual with a random admissible tree. Point mutation replaces one terminal node by another terminal of the same type. This operator is analogous to a local improvement operator and, for the present problem, has been observed to be less destructive than tree mutation. Six grammars were devised; they are described as follows:

1. universal-non-dim: The most general case, with no a priori knowledge or dimensional constraints. This grammar is equivalent to canonical GP.
2. [AePt]-non-dim: This grammar enforces two constraints on the search space: the root operator is necessarily an exp operator; its first argument is an arbitrary expression that multiplies the exponential, and its second argument is an expression that is multiplied by the time t in the exponent.

3. [Au2ePt]-non-dim: The complete shape constraint [A u^2 exp(Pt)] is now enforced in a way similar to the previous case.
4. universal-dim: Dimensional constraints but no shape constraint. The solution is expressed in newtons (kg·m·s^-2, i.e. exponents +1, +1 and -2 for mass, length and time), so the start symbol is defined a priori as: S := <NT+1+1-2>;
5. [AePt]-dim: Dimensional constraints plus the partial shape constraint of the second grammar.
6. [Au2ePt]-dim: Dimensional constraints plus the complete shape constraint of the third grammar.

Figure 3 presents the size of the search space as a function of the allowed derivation depth, for the universal grammar (case 1) and the dimensionally constrained grammar (case 4). These curves show that in both cases the number of solutions grows exponentially, but that the search space can be reduced by several orders of magnitude by using dimensional constraints. The average best fitness over 20 independent runs and its standard deviation are given in Table 3, while the evolution of the average best fitness with respect to the number of evaluations is plotted in Fig. 4 for the non-dimensional grammars and in Fig. 5 for the dimensional grammars. Comparisons based on the number of evaluations are fair benchmarks, since no significant variation in total computation time has been observed between the grammars.
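Search-space sizes of the kind plotted in Fig. 3 can be obtained by counting derivation trees of bounded depth. The sketch below is a hypothetical illustration, not the computation behind Fig. 3: it counts derivations with a memoized recursion (terminals have depth 1 and each non-terminal adds one level), and compares an untyped grammar with a dimensioned one built from the same operators + and * and two assumed variables, u (length) and t (time), with (length, time) exponents restricted to {-1, 0, 1}.

```python
from functools import lru_cache
from itertools import product


def count_derivations(grammar, start, max_depth):
    """Number of derivation trees rooted at `start` with depth <= max_depth."""
    @lru_cache(maxsize=None)
    def count(symbol, d):
        if d < 1:
            return 0
        if symbol not in grammar:  # a terminal is a tree of depth 1
            return 1
        total = 0
        for rule in grammar[symbol]:
            ways = 1
            for s in rule:
                ways *= count(s, d - 1)  # children must fit one level lower
            total += ways
        return total
    return count(start, max_depth)


# Untyped grammar: any operator may combine any subexpressions.
UNTYPED = {"<E>": [["(", "+", "<E>", "<E>", ")"], ["(", "*", "<E>", "<E>", ")"],
                   ["u"], ["t"]]}

UNITS = list(product(range(-1, 2), repeat=2))  # (length, time) exponents
VARS = {"u": (1, 0), "t": (0, 1)}


def nt(unit):
    return "<NT" + "".join(f"{e:+d}" for e in unit) + ">"


def dimensioned_grammar():
    g = {}
    for target in UNITS:
        rules = [["(", "+", nt(target), nt(target), ")"]]
        for a in UNITS:
            b = tuple(x - y for x, y in zip(target, a))  # a * b has unit target
            if b in UNITS:
                rules.append(["(", "*", nt(a), nt(b), ")"])
        rules += [[v] for v, unit in VARS.items() if unit == target]
        g[nt(target)] = rules
    return g
```

Since every dimensioned derivation of a length maps to a distinct untyped derivation of the same depth, `count_derivations(dimensioned_grammar(), nt((1, 0)), d)` can never exceed `count_derivations(UNTYPED, "<E>", d)`, and in practice it stays far below it.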
Figures 4 and 5 call for two comments. First, giving the expected shape of the equation does not necessarily improve the results. It partially does so for the non-dimensional grammars, but this might be due to the fact that the shape constraint prevents the search from being trapped in the local optimum in which the universal grammar is always trapped, hence the null standard deviation observed in that case. This local optimum corresponds to the function $F = t^{2}\,e^{2e^{u}}$. For the dimensional grammars, shape constraints are detrimental to the quality of the results in both cases. Second, the dimensional constraints appear to be clearly beneficial, since the results obtained with the dimensional grammars always outperform those obtained with the untyped grammars, by an average of 6 standard deviations.

The innovations presented in this paper are twofold. First, a novel approach for managing dimensionality constraints by means of an automatically generated grammar has been presented. Second, the problem of designing an admissible yet sufficiently diversified initial population has been addressed through dynamic pruning of the grammar, depending on the maximum allowed tree depth and the current position in the tree. Until now, the initialization step had been a major limitation to the use of formal grammars for constraining a GP search space.
The main limitation of the presented approach is its dependence on a limited range of allowed units. Fractional units could be handled by using rational instead of integer exponents. This would allow a broader range of operators (square root, powers, ...), but would be equivalent to having twice as many basic units. Further research will be devoted to the simultaneous evolution of the grammar and the GP trees, in order to evolve grammars that facilitate the discovery of fitter individuals.