Induction of constraint logic programs

Inductive Logic Programming (ILP) is concerned with learning hypotheses from examples, where both examples and hypotheses are represented in the Logic Programming (LP) language. The application of ILP to problems involving numerical information has shown the need for basic numerical background knowledge (e.g. the relation "less than"). Our thesis is that one should rather choose Constraint Logic Programming (CLP) as the representation language of hypotheses, since CLP contains the extensions of LP developed in the past decade for handling numerical variables. This paper deals with learning constrained clauses from positive and negative examples expressed as constrained clauses. A first step, termed small induction, gives a computational characterization of the solution clauses, which is sufficient to classify further instances of the problem domain. A second step, termed exhaustive induction, explicitly constructs all solution clauses. The algorithms we use are presented in detail, their complexity is given, and they are compared with other prominent ILP approaches.


Introduction
Inductive Logic Programming (ILP) is concerned with supervised learning from examples, and it can be considered a subfield of Logic Programming (LP): it uses a subset of the definite clause language (e.g. as used in Prolog), sometimes extended with some form of negation, to represent both the examples and the hypotheses to be learned [14]. The application of ILP to problems involving numerical information, such as chemistry [7], has shown the need for handling basic numerical knowledge, e.g. the relation less than. This need has often been met by supplying the learner with some ad hoc declarative knowledge [23]. However, one cannot get rid of the inherent limitations of LP regarding numerical variables: functions are not interpreted, i.e. they act as functors in terms. The consequences are detailed in section 2.1. Other possibilities are to use built-in numerical procedures [17], or [...]
An equally important difference is that our approach is rooted in the Version Space framework [11]. More precisely, the set of solution clauses Th here consists of all hypotheses that are partially complete (covering at least one example) and consistent (admitting no exceptions) with respect to the examples [19]. This contrasts with other learners, which retain in Th only a few hypotheses that are optimal or quasi-optimal with regard to some numerical criterion, such as the quantity of information for FOIL or the Minimum Description Length for PROGOL.
This paper presents a 2-step approach. A computable characterization of Th is constructed in a first step, termed small induction; this characterization is sufficient for classification purposes. The explicit characterization of Th is obtained in a second step, termed exhaustive induction, which is much more computationally expensive than small induction. This 2-step approach allows one to check whether the predictive accuracy of the theory is worth undergoing the expensive process of explicit construction. Further, we show that exhaustive induction can be reformulated as an equivalent constraint solving problem; thereby, the burden of inductive search can be delegated to an external tool, purposely designed for combinatorial exploration of continuous domains or finite sets.
The rest of the paper is organized as follows. The next section briefly presents CLP. Then the induction setting is extended from LP to CLP: the notions of completeness and consistency of constrained clauses are defined. Section 4 is devoted to building constrained clauses consistent with a pair of examples. This is used in section 5 to characterize the set of solution clauses via small induction. Exhaustive induction is described in section 6, and section 7 is devoted to a complexity analysis of both algorithms. We conclude with a comparison with previous works and directions for future research.
This section describes the formalism of constraint logic programming, since it both subsumes logic programming [5] and handles clauses that would require additional background knowledge to be discovered in ILP.

The need for CLP
As said above, functions are not interpreted in LP; they are only treated as functors for Herbrand terms. It follows that an equation such as X - Y = 0 will never be true in an LP program: as the sign "-" is not interpreted, the two sides of the equation cannot be unified.
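To make this concrete, here is a minimal Python sketch of purely syntactic (Herbrand) unification. The term encoding is our own assumption: capitalized strings are variables, tuples are compound terms whose first element is the functor, and anything else is a constant. Under this encoding X - Y is just the tuple ('-', 'X', 'Y'), which can never be unified with the constant 0.

```python
def is_var(term):
    """Toy convention (an assumption of this sketch): variables are capitalized strings."""
    return isinstance(term, str) and term[:1].isupper()


def walk(term, subst):
    """Follow variable bindings until reaching an unbound variable or a non-variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst=None):
    """Syntactic unification without occurs check; returns a substitution dict or None."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):  # functor and arguments must unify pairwise
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None  # clash between distinct constants or functors


# '-' is an uninterpreted functor: X - Y never unifies with 0,
# not even when both operands are bound to the same number.
print(unify(('-', 'X', 'Y'), 0))               # None
print(unify(('-', 1, 1), 0))                   # None
print(unify(('-', 'X', 'Y'), ('-', 1, 'Z')))   # {'X': 1, 'Y': 'Z'}
```

Unification only succeeds against another term built with the same functor, which is exactly why numerical reasoning needs an interpreted domain.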
In practice, Prolog systems offer a limited form of interpreted functions, using the is programming construct. This construct evaluates a ground term built with numerical constants and arithmetic functors, and returns the corresponding numerical value. However, this evaluation only applies to ground terms: the goal Z is X - Y will not succeed unless both X and Y are instantiated with numerical values. Prolog systems also provide some predicates over numerical constants, e.g. =<, which suffer from the same limitations.
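The behaviour of is can be mimicked by a small evaluator. This Python sketch uses our own encoding (numbers are constants, strings are variables, tuples are (functor, left, right)) and is not how any Prolog system is implemented: evaluation succeeds on ground terms and raises an error as soon as a variable is met. In ISO Prolog, the non-ground goal likewise raises an instantiation error instead of binding X.

```python
import operator

# Arithmetic functors interpreted by this toy evaluator (an assumption of the sketch).
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}


class InstantiationError(Exception):
    """Raised when a variable is met, mirroring Prolog's behaviour for `is`."""


def evaluate(term):
    """Evaluate a ground arithmetic term encoded as nested (functor, left, right) tuples."""
    if isinstance(term, (int, float)):
        return term
    if isinstance(term, str):  # a variable: the term is not ground
        raise InstantiationError(f"{term} is unbound")
    functor, left, right = term
    return OPS[functor](evaluate(left), evaluate(right))


print(evaluate(('-', 5, 3)))    # 2, like `Z is 5 - 3`
try:
    evaluate(('-', 'X', 3))      # like `Z is X - 3` with X unbound
except InstantiationError as err:
    print("error:", err)         # error: X is unbound
```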
Thus, in order to handle numerical variables without extending unification, one must carefully design predicate definitions, and use the interpretation of functions whenever ground terms are found. Here is a clever example of such a definition, reported from [23]. The goal is to define the less_than predicate. The first thing is to handle the ground case. Then, in order to handle non-ground variables, one must explicitly introduce a way to bind the variables. The approach presented in [23] consists in introducing a predicate float that represents a finite set of numerical constants; the definition of the inequality predicate is then extended so that unbound arguments are bound to these constants. Such a clever intensional definition still depends on (and is limited by) an extensional definition of floating point constants.
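The following Python sketch mimics the idea of such a definition; it is not the Prolog code of [23]. Generators stand in for Prolog backtracking, and the contents of FLOATS, playing the role of the extensional float predicate, are invented for illustration. It exhibits the limitation: an unbound argument can only be bound to a listed constant, so a solution such as X = 0.75 for X < 1.0 is never produced.

```python
# Hypothetical extensional definition of `float`: a finite set of constants.
FLOATS = [0.0, 0.5, 1.0, 1.5, 2.0]


def less_than(x=None, y=None):
    """Enumerate (x, y) bindings with x < y; None stands for an unbound variable."""
    # Ground arguments are used as-is (the "ground case");
    # unbound ones are enumerated over FLOATS, like a call to float/1.
    xs = [x] if x is not None else FLOATS
    ys = [y] if y is not None else FLOATS
    for a in xs:
        for b in ys:
            if a < b:  # interpreted comparison, only ever applied to ground values
                yield (a, b)


print(list(less_than(1.0, 2.0)))   # [(1.0, 2.0)]
print(list(less_than(None, 1.0)))  # [(0.0, 1.0), (0.5, 1.0)]; 0.75 is never found
```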

Notations and definitions
The key idea of CLP stems from the observation that unification is an algorithm for solving equality constraints between Herbrand terms. Hence, new computation domains can be added to LP if adequate constraint solvers are provided. An alternative to special-purpose predicate definitions and extensional definitions of numerical domains precisely consists of developing an adequate constraint solver, which extends deduction through built-in interpretation of numerical constants and constructs. The CLP scheme thus generalizes the LP scheme, as equation solving is a special case of constraint solving.
This requires the introduction of an algebraic semantics. Of course, our aim is not to present here an exhaustive state of the art in CLP (see [24]), but rather to define the basic CLP notions with respect to the classical LP and ILP terminology [9,14].
Let $\mathcal{C} = \mathcal{C}_a \cup \mathcal{C}_c$ be a definite clause language without function symbols other than constants, where $\mathcal{C}_a$ (respectively $\mathcal{C}_c$) defines the set of uninterpreted (resp. interpreted) predicate symbols.
Definition 1. In the following, a constraint denotes a literal built on a predicate symbol in $\mathcal{C}_c$. An atom denotes a literal built on a predicate symbol in $\mathcal{C}_a$. A constrained logic program is a finite set of constrained clauses.
A constrained goal is a clause of the form $\leftarrow B_1, \ldots, B_m, c_1, \ldots, c_n$, where $B_1, \ldots, B_m$ are atoms and $c_1, \ldots, c_n$ are constraints.

Operational Semantics of CLP language
In LP, an answer to a query G with respect to a logic program P is a substitution σ (expressed as a set of equalities on variables of G) such that Gσ belongs to the least Herbrand model of P. An answer to a query G with respect to a CLP program P is no longer a substitution, but a set of consistent constraints such that all atoms in G have been resolved. We refer to [24] for a formal definition of the inference rule used in CLP, as this is beyond the scope of this paper.
The operational semantics of a CLP language can be defined either in terms of logical consequences, $P, \mathcal{T} \models (\forall)F$, or in an algebraic way, $P \models_{\mathcal{S}} (\forall)F$ [25] (see [5] for a detailed discussion), where P is a constraint logic program, $\mathcal{S}$ is a structure, $\mathcal{T}$ is the theory axiomatizing $\mathcal{S}$, and $(\forall)F$ denotes the universal closure of F. From now on, after [24], we only use the notation $\mathcal{D} \models$, which may be read as either the logical or the algebraic version of entailment. A constraint c is consistent with a set (i.e. conjunction) of constraints σ if $\mathcal{D} \models (\exists)(\sigma \wedge c)$.

Domains of computation
In practice, we require the type of any variable X to be set by a domain constraint (equivalent to a selector in the Annotated Predicate Calculus terminology [10]). This domain constraint gives the initial domain of instantiation $\Omega_X$ of the variable. We restrict ourselves to numerical, hierarchical and nominal variables, where $\Omega_X$ respectively is (an interval of) $\mathbb{N}$ or $\mathbb{R}$, a tree, or a (finite or infinite) set.
Domain constraints are of the form $(X \in dom(X))$, where $dom(X)$ denotes a subset of $\Omega_X$. The domain constraints considered throughout this paper are summarized in Table 1.

| Type of X | Initial domain $\Omega_X$ | Domain constraint $X \in dom(X)$ |
|---|---|---|
| numerical | (interval of) $\mathbb{R}$ or $\mathbb{N}$ | $dom(X)$ interval of $\mathbb{R}$ or $\mathbb{N}$ |
| hierarchical | tree | $dom(X)$ subtree of $\Omega_X$ |
| nominal | finite or infinite set | $dom(X)$ subset of $\Omega_X$ |
Table 1: Domains of computation and domain constraints
A binary constraint involves a pair of variables X and Y having the same domain of instantiation. The advantage of binary constraints is to allow for compact expressions: (X = Y) replaces page-long expressions of the form (X ∈ {red}) and (Y ∈ {red}) or (X ∈ {blue}) and (Y ∈ {blue}) or ... The binary constraints considered in this paper are summarized in Table 2.
| Type of X and Y | Binary constraints |
|---|---|
| numerical | linear inequality |
| nominal | equality and inequality |
Table 2: Binary constraints
Our constraint language is restricted to conjunctions of domain constraints and binary constraints as above. Two reasons explain our choice: this language is sufficient to deal with most real-world problems, and it is supported by complete constraint solvers [4].
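To illustrate the compactness argument for binary constraints, the Python sketch below expands (X = Y) over a finite nominal domain into the equivalent disjunction of conjunctions of domain constraints; the colour domain is only an example. The expansion has one disjunct per value of the domain, whereas (X = Y) stays a single constraint.

```python
def expand_equality(x, y, omega):
    """Rewrite (x = y) over a finite nominal domain omega as a disjunction (list)
    of conjunctions (pairs) of domain constraints (variable, {value})."""
    return [((x, {v}), (y, {v})) for v in sorted(omega)]


def satisfies(assignment, disjunction):
    """True if the assignment satisfies at least one conjunction of the disjunction."""
    return any(all(assignment[var] in dom for var, dom in conj) for conj in disjunction)


colours = {"red", "blue", "green"}
expanded = expand_equality("X", "Y", colours)
print(len(expanded))                                    # 3 disjuncts, one per colour
print(satisfies({"X": "red", "Y": "red"}, expanded))    # True
print(satisfies({"X": "red", "Y": "blue"}, expanded))   # False
```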

Induction setting in CLP
This section briefly recalls the basic induction setting and the Disjunctive Version Space approach.The key definitions of inductive learning, namely completeness and consistency, are then extended from LP to CLP.

Learning setting and Disjunctive Version Space
Let the positive and negative examples of the concept to be learned be expressed in the language of instances $\mathcal{C}_i$, and let $\mathcal{C}_h$ denote the language of hypotheses. Let two boolean relations of coverage and discrimination be defined on $\mathcal{C}_h \times \mathcal{C}_i$, respectively telling whether a given hypothesis covers or discriminates a given example. The basic solutions of inductive learning consist of hypotheses that are complete (cover the positive examples) and consistent (discriminate the negative examples).
The Version Space (VS) framework gives a nice theoretical characterization of the set of solutions [11]. Unfortunately, noisy examples and disjunctive target concepts cause VS to fail, which makes VS inapplicable to real-world problems. The Disjunctive Version Space (DiVS) algorithm overcomes these limitations by relaxing the completeness requirement [19]. More precisely, DiVS constructs the set Th of all hypotheses that are partially complete (cover at least one example) and consistent. This is done by repeatedly characterizing the set Th(E) of consistent hypotheses covering E, for each training example E.

From ILP to CLP
When the current training example E is a definite clause, we proposed to express E as Cθ, where C is the definite clause built from E by turning every occurrence of a term $t_i$ in E into a distinct variable $X_i$, and θ is the substitution $\{X_i / t_i\}$ [18]:
$$E = C\theta$$
This decomposition allows induction to independently explore the lattice of definite clauses generalizing C, and the lattice of substitutions or constraints over the variables in C that generalize θ: as a matter of fact, a substitution is a particular case of constraint (a set of equality constraints between Herbrand terms).
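A minimal Python sketch of the decomposition E = Cθ, assuming a clause is encoded as a list of (predicate, arguments) literals with the head first; this encoding and the identifiers m1, a1, a2 are invented for illustration. Every argument occurrence becomes a fresh variable X1, X2, ..., and θ records the binding of each fresh variable to the original term.

```python
def decompose(clause):
    """Split a definite clause E into (C, theta) with E = C.theta, where C has a
    distinct variable for every term occurrence and theta = {X_i: t_i}."""
    general, theta = [], {}
    for predicate, args in clause:
        new_args = []
        for term in args:
            var = f"X{len(theta) + 1}"   # fresh variable for this occurrence
            theta[var] = term
            new_args.append(var)
        general.append((predicate, tuple(new_args)))
    return general, theta


def substitute(clause, theta):
    """Apply a substitution to every argument of every literal."""
    return [(p, tuple(theta.get(a, a) for a in args)) for p, args in clause]


E = [("poisonous", ("m1",)),
     ("atm", ("m1", "a1", "carbon", 22)),
     ("atm", ("m1", "a2", "carbon", 26))]
C, theta = decompose(E)
print(C[1])                     # ('atm', ('X2', 'X3', 'X4', 'X5'))
print(theta["X6"])              # m1: the shared molecule now sits behind X1, X2 and X6
print(substitute(C, theta) == E)  # True
```

Since sharing between occurrences is moved entirely into θ, generalizing C and generalizing θ can be explored separately, as stated above.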
When training examples are described by constrained clauses, we must first deal with the fact that one constrained clause may admit several equivalent expressions.
Definition 5. Let g be a constrained clause. The canonical form of g is defined as Gγ, where:
• G is the definite clause built from g by deleting the constraints and turning every occurrence of a term $t_i$ in g into a distinct variable $X_i$;
• γ is the maximally specific conjunction of constraints entailed by the constraint part of g and the constraints $(X_i = t_i)$.
Example: Let g be a constrained clause describing some poisonous chemical molecules: [...] The canonical expression of g is Gγ, with [...]
In the remainder of this paper, "constrained clause" is intended as "constrained clause in canonical form".
Let $E = C\theta$ hereafter denote the constrained clause to generalize. The language of hypotheses $\mathcal{C}_h$ is that of constrained clauses Gγ, where G is a definite clause generalizing C in the sense of θ-subsumption [14], noted $C \preceq G$, and γ is a conjunction of constraints set on variables in C such that θ entails γ (Definition 4):
$$\mathcal{C}_h = \{ G\gamma \mid C \preceq G \text{ and } \theta \models \gamma \}$$
DiVS thus explores a bounded logical space with bottom C, and a bounded constraint space with bottom θ.

Completeness and Consistency in CLP
The generality order on constrained clauses is extended from the generalization order on logical clauses defined by θ-subsumption [14], and from the generalization order defined by constraint entailment [6]. Negative examples are also represented as constrained clauses. Indeed, there is no standard semantics for negation in Logic Programming, and even less so for CLP. We therefore explicitly introduce the negation of the target predicate tc, noted opp tc; negative examples are constrained clauses concluding to opp tc. For instance, if active is the target predicate, we introduce the opposite predicate symbol opp active (= inactive).
Then, for any constrained clause g, let opp g be defined as the constrained clause obtained from g by replacing the predicate in the head of g by the opposite target predicate:
$$opp\ g = opp\ head(g) \leftarrow body(g)$$
The consistency of a constrained clause is defined as follows.
Definition 7. Let Gγ and G'γ' be constrained clauses. Gγ is inconsistent with respect to G'γ' iff there exists a substitution σ on G such that Gσ is included in opp G' and γ is consistent with γ'σ. Such a substitution σ is termed a negative substitution on G derived from G'γ'. Gγ discriminates G'γ' if there exists no negative substitution σ on G derived from G'γ'.
Example: Let g and g' be two constrained clauses as follows:
g: poisonous(X) ← atm(X, Y, carbon, T), atm(X, U, carbon, W), (T > W - 2)
g': opp poisonous(X) ← atm(X, Y, Z, T), atm(X, U, Z, W), (T ≤ W)
Then g is inconsistent wrt g': with σ set to the identity substitution, one sees that a molecule involving two carbon atoms with the same valence (T = W) would be considered both poisonous according to g and non-poisonous according to g'.

Building discriminant constrained clauses
This section focuses on the elementary step of the Disjunctive Version Space, namely constructing the set D(E, F) of constrained clauses covering E and discriminating F (in the sense of Definition 7), where E and F are constrained clauses concluding to opposite target concepts. We assume in this section that E is consistent with respect to F.
Given the chosen hypothesis language, there exist two ways for a candidate hypothesis Gγ to discriminate F. The first one, examined in section 4.1, operates on the definite clause part of Gγ: Gγ discriminates F if G involves a predicate that does not occur in F. The second one, examined in sections 4.2 and 4.3, operates on the constraint part of Gγ: Gγ discriminates F if γ is inconsistent with the constraint part of F.

Discriminant predicates
Since C involves distinct variables only, any clause G subsuming C discriminates F iff it involves a predicate symbol that does not occur in F, termed a discriminant predicate. Predicate-based discrimination thereby amounts to boolean discrimination (presence/absence of a predicate symbol). More formally:
Proposition 1. Let Gpred(F) be the set of clauses $head(C) \leftarrow P_i(\ldots)$, for $P_i$ ranging over the set of discriminant predicate symbols. Then, a definite clause that subsumes C discriminates F iff it is subsumed by a clause in Gpred(F).
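The construction of Gpred(F) can be sketched in Python as follows. This assumes clauses encoded as (head, body) with body literals as (predicate, arguments) pairs; since the arguments of $P_i(\ldots)$ are elided in the text, the sketch keeps one clause per literal of C built on a discriminant predicate. The predicates ring and charge are hypothetical.

```python
def predicates(body):
    """Set of predicate symbols occurring in a clause body."""
    return {pred for pred, _ in body}


def gpred(c_clause, f_clause):
    """Return the clauses head(C) <- P(...) for every literal of C whose
    predicate symbol P does not occur in F (discriminant predicates)."""
    c_head, c_body = c_clause
    _, f_body = f_clause
    discriminant = predicates(c_body) - predicates(f_body)
    return [(c_head, [lit]) for lit in c_body if lit[0] in discriminant]


C = (("active", ("X1",)),
     [("atm", ("X2", "X3")), ("ring", ("X4", "X5")), ("charge", ("X6", "X7"))])
F = (("opp_active", ("Y1",)),
     [("atm", ("Y2", "Y3")), ("charge", ("Y4", "Y5"))])
print(gpred(C, F))   # [(('active', ('X1',)), [('ring', ('X4', 'X5'))])]
```

When C and F use the same predicates, as in the chemistry domain, the result is empty and discrimination must come from constraints.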
Gpred(F) thereby sets an upper bound on the set of definite clauses that subsume C and discriminate F. Note that this set can be empty: e.g. in the chemistry domain, all example molecules are described via the same predicates (atom and bond), regardless of their class (poisonous or non-poisonous).

Discriminant domain constraints
Let G be the generalization of C obtained by dropping all discriminant predicates. Without loss of generality, F can be described as opp Gρ, with ρ being the constraint part of F.
Hence, G is inconsistent with F; and since C (and hence G) involves distinct variables only, any negative substitution on G derived from F (Definition 7) is a permutation of variables in G. Let Σ denote the set of these negative substitutions. Note that constraints on G are trivially embedded onto constraints on C.
One is finally interested in the following constraints on C:
• Constraint θ, which is the constraint part of example E;
• Constraint ρ, which is the constraint part of example F;
• The set Σ of negative substitutions derived from F (recall that substitutions are particular cases of constraints).
Let us first concentrate on domain constraints, and assume in this subsection that our constraint language is restricted to domain constraints. A constraint γ is thus composed of a conjunction of domain constraints $(X_i \in dom_\gamma(X_i))$, for $X_i$ ranging over the variables in C. It is straightforward to show that the lattice of constraints on C is equivalent to the lattice $\mathcal{C}_{eq} = \mathcal{P}(\Omega_1) \times \mathcal{P}(\Omega_2) \times \cdots$, where $\Omega_i$ denotes the domain of instantiation of $X_i$, for $X_i$ ranging over the variables of C, and $\mathcal{P}(\Omega_i)$ denotes the power set of $\Omega_i$. An equivalent representation of γ is given by the vector of subsets $dom_\gamma(X_i)$. Building discriminant domain constraints is thus amenable to attribute-value discrimination: two constraints are inconsistent iff they correspond to non-overlapping elements in $\mathcal{C}_{eq}$.
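As a sketch of this attribute-value view, restricted for simplicity to nominal variables with finite domains (variable names and values are hypothetical), a conjunction of domain constraints can be stored as a mapping from each variable to its allowed subset. Two such constraints are inconsistent exactly when some variable receives disjoint subsets, since a conjunction of domain constraints is satisfiable iff every component is non-empty.

```python
def conjoin(gamma1, gamma2):
    """Conjunction of two vectors of domain constraints: componentwise intersection.
    A variable absent from a vector is unconstrained (its whole domain)."""
    result = dict(gamma1)
    for var, dom in gamma2.items():
        result[var] = result[var] & dom if var in result else set(dom)
    return result


def inconsistent(gamma1, gamma2):
    """True iff the two constraints are non-overlapping elements of C_eq,
    i.e. some variable ends up with an empty domain."""
    return any(not dom for dom in conjoin(gamma1, gamma2).values())


theta = {"Z": {"carbon"}, "V": {"carbon"}}
rho_sigma = {"Z": {"hydrogen"}, "V": {"carbon"}}
print(inconsistent(theta, rho_sigma))                     # True: Z discriminates
print(inconsistent(theta, {"V": {"carbon", "oxygen"}}))   # False
```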
Let us now characterize the constraints discriminating example F. By definition, Gγ discriminates F iff γ is inconsistent with ρσ for all σ in Σ.
Definition 8. An elementary discriminant constraint with respect to a negative substitution σ and a variable X is a domain constraint on X that is entailed by θ and inconsistent with ρσ. A maximally general elementary discriminant constraint wrt σ and X is called maximally discriminant.
In the considered domain constraint language (section 2.4), there exists at most one maximally discriminant constraint wrt a negative substitution σ and a variable X, noted $(X \in dom^*_\sigma(X))$:
- if X is a numerical variable, such a maximally discriminant constraint exists iff $dom_\theta(X)$ and $dom_\rho(X\sigma)$ are disjoint, in which case $dom^*_\sigma(X)$ is the largest interval including $dom_\theta(X)$ and excluding $dom_\rho(X\sigma)$;
- if X is a hierarchical variable, such a maximally discriminant constraint exists iff $dom_\theta(X)$ and $dom_\rho(X\sigma)$ are subtrees which are not comparable, in which case $dom^*_\sigma(X)$ is the most general subtree that includes $dom_\theta(X)$ and does not include $dom_\rho(X\sigma)$;
- if X is a nominal variable, such a maximally discriminant constraint exists iff $dom_\theta(X)$ and $dom_\rho(X\sigma)$ do not overlap, in which case $dom^*_\sigma(X)$ is the complement of $dom_\rho(X\sigma)$ in $\Omega_X$. For the sake of convenience, the domain constraint $(X \in dom^*_\sigma(X))$ is then noted $(X \notin dom_\rho(X\sigma))$.
If $dom^*_\sigma(X)$ exists, X is said to be σ-discriminant.
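The three cases above can be sketched in Python, under simplifying assumptions of ours: numerical domains are closed intervals (lo, hi) with $\Omega_X = \mathbb{R}$, so the result is an open half-line; a hierarchy is a child-to-parent dictionary in which a subtree is identified by its root; nominal domains are finite sets. Each function returns $dom^*_\sigma(X)$, or None when X is not σ-discriminant. The chemical hierarchy is invented.

```python
import math


def numeric_star(dom_theta, dom_rho):
    """Largest interval containing dom_theta and disjoint from dom_rho, as
    (lo, hi) with open bounds; None if the closed intervals overlap."""
    (t_lo, t_hi), (r_lo, r_hi) = dom_theta, dom_rho
    if t_hi < r_lo:               # theta below rho: X < r_lo
        return (-math.inf, r_lo)
    if r_hi < t_lo:               # theta above rho: X > r_hi
        return (r_hi, math.inf)
    return None


def ancestors(node, parent):
    """Path from node up to the root, node included."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def hierarchical_star(dom_theta, dom_rho, parent):
    """Root of the most general subtree containing dom_theta and disjoint
    from dom_rho; None if the two subtrees are comparable."""
    up_theta, up_rho = ancestors(dom_theta, parent), ancestors(dom_rho, parent)
    if dom_rho in up_theta or dom_theta in up_rho:
        return None                        # comparable subtrees
    best = dom_theta
    for node in up_theta:                  # climb while still disjoint from rho
        if node in up_rho:
            break
        best = node
    return best


def nominal_star(dom_theta, dom_rho, omega):
    """Complement of dom_rho in omega, provided dom_theta and dom_rho are disjoint."""
    if set(dom_theta) & set(dom_rho):
        return None
    return set(omega) - set(dom_rho)


parent = {"alkane": "organic", "alkene": "organic", "organic": "compound",
          "salt": "inorganic", "inorganic": "compound"}
print(numeric_star((0, 23), (30, 40)))               # (-inf, 30)
print(hierarchical_star("alkane", "salt", parent))   # organic
```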
By construction, a domain constraint on X that is entailed by θ and discriminates ρσ must entail $(X \in dom^*_\sigma(X))$. An upper bound on the domain constraints that discriminate ρσ is then given by the disjunction of the constraints $(X \in dom^*_\sigma(X))$, for X ranging over the σ-discriminant variables in C. More formally:
Proposition 3. Let var(C) be the set of variables in C, let σ be a substitution in Σ, and let $\gamma_\sigma$ be the disjunction of the constraints $(X_i \in dom^*_\sigma(X_i))$ for $X_i$ ranging over the σ-discriminant variables in var(C). Let γ be a conjunction of domain constraints on variables in C that is entailed by θ. Then γ is inconsistent with ρσ iff γ entails $\gamma_\sigma$.
Example: Let E and F be as follows:
E: poisonous(X) ← atm(X, Y, carbon, T), atm(X, U, carbon, W), T < 24, W ≥ 25
F: opp poisonous(X) ← atm(X, Y, hydrogen, 18), atm(X, U, carbon, W'), W' ≤ 21
The definite clause C built from E is given below; variables Z and V are nominal, with domain of instantiation {carbon, hydrogen, oxygen, ... }. Variables T and W are numerical, with domain of instantiation $\mathbb{N}$ (other variables are discarded as they do not convey discriminant information).
C: poisonous(X) ← atm(X', Y, Z, T), atm(X'', U, V, W)
There is no discriminant predicate (G = C); Σ includes four negative substitutions σ1, σ2, σ3 and σ4, which correspond to the four possible mappings of the two literals atm in C onto the two literals atm in F.
Table 3 shows a tabular representation of the constraints θ and $\rho\sigma_i$, where each cell of the matrix is a subdomain of the domain of instantiation of the variable.
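The four negative substitutions of this example can be enumerated mechanically. The Python sketch below assumes the (predicate, arguments) literal encoding; for each literal of C it lists the literals of F with the same predicate and arity, and turns every combination into a variable-to-term substitution. Mappings are not required to be injective, which is how we read "the four possible mappings" (2 × 2).

```python
from itertools import product


def negative_substitutions(c_body, f_body):
    """Enumerate the mappings of each literal of C onto a literal of F with the
    same predicate and arity, returned as variable -> term substitutions."""
    choices = [[f for f in f_body if f[0] == lit[0] and len(f[1]) == len(lit[1])]
               for lit in c_body]
    for image in product(*choices):       # one target literal per literal of C
        sigma = {}
        for (_, c_args), (_, f_args) in zip(c_body, image):
            sigma.update(zip(c_args, f_args))
        yield sigma


C_body = [("atm", ("X'", "Y", "Z", "T")), ("atm", ("X''", "U", "V", "W"))]
F_body = [("atm", ("X", "Y", "hydrogen", 18)), ("atm", ("X", "U", "carbon", "W'"))]
sigmas = list(negative_substitutions(C_body, F_body))
print(len(sigmas))                     # 4
print(sigmas[0]["Z"], sigmas[0]["V"])  # hydrogen hydrogen
```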

Discriminant binary constraints
We showed that building discriminant binary constraints is amenable to building discriminant domain constraints, via introducing auxiliary constrained variables, termed relational variables [21]. In the chosen constraint language, all binary constraints can be expressed as domain constraints on such auxiliary variables. Proposition 3 then generalizes as follows: [...]
As an example, the tabular representation of Table 3 is extended to binary constraints as well: [...] The disjunctive constraint $\gamma_{\sigma_4}$, entailed by θ and maximally general such that it is inconsistent with $\rho\sigma_4$, is given as: [...]
Last, one considers the conjunction of the constraints $\gamma_\sigma$ for σ ranging in Σ:
Proposition 5. Let G be a generalization of C inconsistent with respect to F, and let $\gamma_F$ be the conjunction of the constraints $\gamma_\sigma$ for σ ranging in Σ. Then Gγ discriminates F iff γ entails $\gamma_F$.
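The introduction of relational variables can be sketched as follows; the encoding is our assumption, since the construction of [21] is not reproduced here, and it only covers difference constraints. For a pair of numerical variables we introduce R = X - Y, so that an inequality such as T ≤ W - 2 becomes the domain constraint R ∈ (-∞, -2]; for a pair of nominal variables we introduce a boolean variable standing for (X = Y). One relational variable per pair also shows why the number of variables becomes quadratic.

```python
from itertools import combinations


def add_relational_variables(assignment, numerical, nominal):
    """Extend a ground assignment with hypothetical relational variables:
    'X-Y' (difference) for numerical pairs and 'X=Y' (boolean) for nominal pairs."""
    extended = dict(assignment)
    for x, y in combinations(numerical, 2):
        extended[f"{x}-{y}"] = assignment[x] - assignment[y]
    for x, y in combinations(nominal, 2):
        extended[f"{x}={y}"] = assignment[x] == assignment[y]
    return extended


theta = {"T": 22, "W": 26, "Z": "carbon", "V": "carbon"}
ext = add_relational_variables(theta, numerical=["T", "W"], nominal=["Z", "V"])
print(ext["T-W"], ext["Z=V"])   # -4 True
# The binary constraint T <= W - 2 is now the domain constraint (T-W) in (-inf, -2]:
print(ext["T-W"] <= -2)         # True
```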
Constraint $\gamma_F$ thus defines an upper bound on the constraints discriminating F, just as Gpred(F) is the upper bound on the set of definite clauses that generalize C and discriminate F. These are combined in the next section in order to characterize all consistent partially complete constrained clauses.
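For domain constraints, checking whether γ entails $\gamma_F$ reduces to set inclusions. The Python sketch below assumes finite nominal domains. γ is stored as a dict from each variable to its allowed subset, and each $\gamma_\sigma$ as a dict from its σ-discriminant variables to proper subsets $dom^*_\sigma(X)$; names and values are hypothetical. A non-empty conjunction of domain constraints (a box) is included in a union of single-variable constraints iff it is included in one of them. Hence γ entails $\gamma_\sigma$ iff $dom_\gamma(X) \subseteq dom^*_\sigma(X)$ for some σ-discriminant X, and γ entails $\gamma_F$ iff this holds for every σ.

```python
def entails_disjunction(gamma, gamma_sigma):
    """gamma (conjunction of domain constraints) entails the disjunction gamma_sigma
    iff the domain of some variable in gamma fits inside dom*_sigma of that variable."""
    return any(x in gamma and gamma[x] <= star for x, star in gamma_sigma.items())


def entails_gamma_f(gamma, gamma_f):
    """gamma_F is a conjunction (list) of disjunctions gamma_sigma."""
    return all(entails_disjunction(gamma, g_sigma) for g_sigma in gamma_f)


OMEGA = {"carbon", "hydrogen", "oxygen", "nitrogen"}
gamma_f = [
    {"Z": OMEGA - {"hydrogen"}},                            # gamma_sigma for one sigma
    {"Z": OMEGA - {"nitrogen"}, "V": OMEGA - {"oxygen"}},   # ... and for another
]
print(entails_gamma_f({"Z": {"carbon"}, "V": {"carbon"}}, gamma_f))              # True
print(entails_gamma_f({"Z": {"carbon", "hydrogen"}, "V": {"carbon"}}, gamma_f))  # False
```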

Small induction
Our goal here is to characterize the Disjunctive Version Space learned from positive and negative constrained clauses, and to use this characterization to classify further instances of the problem domain. In other words, the pairs $(Gpred(F_i), \gamma_{F_i})$ constitute a computational characterization of Th(E): they give means to check whether any given constrained clause belongs to Th(E).
The Disjunctive Version Space is finally constructed by iteratively characterizing Th(E), for E ranging over the training set.
However, looking for consistent hypotheses makes little sense when dealing with real-world, hence noisy, data. One is therefore more likely interested in hypotheses admitting a limited number of inconsistencies. Let $Th_\varepsilon(E)$ denote the set of hypotheses covering E and admitting at most ε inconsistencies. Then, we show that $Th_\varepsilon(E)$ can be characterized from the pairs $(Gpred(F_i), \gamma_{F_i})$ with no additional complexity [19]: a constrained clause Gγ covering E belongs to $Th_\varepsilon(E)$ iff it satisfies condition (1) above for all but at most ε counter-examples $F_i$ to E.
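Membership in $Th_\varepsilon(E)$ can be sketched as counting the counter-examples that a candidate fails to discriminate. In the Python sketch below the two tests are passed in as functions, since their implementation depends on the encoding. We assume, following Propositions 1 and 5, that Gγ discriminates $F_i$ iff G is subsumed by a clause of $Gpred(F_i)$ or γ entails $\gamma_{F_i}$. The parameter names and the toy data are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Pair = Tuple[object, object]   # (Gpred(F_i), gamma_F_i), left abstract here


def in_th_epsilon(
    candidate: object,
    pairs: Sequence[Pair],
    epsilon: int,
    subsumed_by_gpred: Callable[[object, object], bool],
    entails_gamma_f: Callable[[object, object], bool],
) -> bool:
    """A candidate G.gamma covering E belongs to Th_eps(E) iff it fails to
    discriminate at most `epsilon` counter-examples F_i."""
    failures = sum(
        1
        for gpred, gamma_f in pairs
        if not (subsumed_by_gpred(candidate, gpred) or entails_gamma_f(candidate, gamma_f))
    )
    return failures <= epsilon


def by_flag(_candidate, flag):
    """Toy test: each pair directly stores the outcome of the check."""
    return flag


pairs = [(True, False), (False, True), (False, False)]
print(in_th_epsilon("G.gamma", pairs, 0, by_flag, by_flag))  # False (one failure)
print(in_th_epsilon("G.gamma", pairs, 1, by_flag, by_flag))  # True
```

Because ε only enters the final comparison, the same pairs serve every consistency threshold.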
The advantage of this approach is to delay the choice of the consistency bias from induction to classification, at no additional cost [19]: induction constructs once and for all the pairs $(Gpred(F_i), \gamma_{F_i})$, or a tractable approximation of these [22], and the degree of consistency of the hypotheses used during classification can then be tuned without repeating induction.

Classification in Disjunctive Version Space
One major result of this approach is that the computational characterization of the Disjunctive Version Space is sufficient to classify any further instance of the problem domain.In other words, the explicit construction of Th(E), for E ranging over the training examples, gives no extra prediction power.
The Disjunctive Version Space includes hypotheses concluding to opposite target concepts, since both positive and negative examples are generalized. And, though these hypotheses are consistent with the training examples, they usually are inconsistent with one another. Classification therefore does not rely on standard logic, but rather on a nearest-neighbor-like approach. The instance I to classify is said to be a neighbor of a training example E if I is generalized by a hypothesis in Th(E); I is then classified in the class of the majority of its neighbors.
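The majority-vote rule can be sketched in Python as below. The parameter is_neighbor stands for the test "I is generalized by a hypothesis in Th(E)" and is a hypothetical hook. Returning None on ties or when there are no neighbors is our own choice; the text does not specify tie-breaking.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Optional, Tuple


def classify(instance, training: Iterable[Tuple[object, Hashable]],
             is_neighbor: Callable[[object, object], bool]) -> Optional[Hashable]:
    """Assign `instance` to the majority class among its neighbors, i.e. the
    training examples E such that some hypothesis in Th(E) generalizes it."""
    votes = Counter(label for example, label in training if is_neighbor(instance, example))
    if not votes:
        return None                       # no neighbor: no decision
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None                       # tie between classes (our choice)
    return ranked[0][0]


def near(i, e):
    """Toy stand-in for the neighborhood test, on numbers."""
    return abs(i - e) <= 2


training = [(1, "active"), (2, "active"), (3, "inactive"), (10, "inactive")]
print(classify(2, training, near))    # active (2 votes against 1)
print(classify(50, training, near))   # None
```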
One shows that I is generalized by a hypothesis in Th(E) iff it is generalized by a hypothesis in D(E, F) for every counter-example F. This can be checked from the computational characterization of D(E, F):
Proposition 7. Let I be an instance of the problem domain, formalized as a conjunction of constrained atoms. Then I is generalized by the body of a clause in D(E, F) iff there exist a generalization G of C and a constraint γ such that the body of Gγ generalizes I, and either G is subsumed by a clause in Gpred(F) or γ entails $\gamma_F$.
The important distinction compared to Proposition 6 is that γ is no longer required to be entailed by θ: Proposition 7 only requires considering the substitutions between C and the definite part of I.

A two-step induction
We thus propose a two-step induction scheme. During the first step, called small induction, all pairs of training examples (E, F) satisfying opposite target concepts are considered; for each such pair, we build the set of discriminant definite clauses Gpred(F) and the discriminant constraint $\gamma_F$ (a conjunction of disjunctions). As shown above, this is sufficient to address the classification of unseen examples, and to characterize the set of consistent partially complete constrained clauses.
During the second step, called exhaustive induction, all such consistent constrained clauses are explicitly built; it is shown in the next section that exhaustive induction can be achieved by constraint solving.
The advantage of this scheme is twofold. First, the burden of explicitly constructing the hypotheses can be delegated to constraint solvers, that is, algorithms external to induction and geared toward combinatorial search in discrete and continuous domains.
Second, small induction can be viewed as on-the-fly, lazy learning, whose complexity is much smaller than that of exhaustive induction (section 7): it constructs theories which are not understandable, yet operational for classifying examples. One may thus get some idea of the accuracy of a theory before undergoing the expensive process of making it explicit.
In this scheme, constraint solving is employed for several tasks (indicated with an asterisk). It is used to prune $\gamma_F$: a partial order, noted $<_E$, can be defined on the negative substitutions with respect to the positive substitution [20]. Minimal substitutions with respect to this partial order can be viewed as "near-misses": all substitutions but the minimal ones can soundly be pruned. This pruning was explicitly dealt with in previous works [18,20]. It turns out to be a special case of constraint entailment ($\sigma_i <_E \sigma_j$ is equivalent to $\gamma_{\sigma_i} \prec_c \gamma_{\sigma_j}$, where $\prec_c$ denotes the order induced by constraint entailment), and this pruning can therefore be achieved by a constraint solver.
It chiefly allows for building Gγ, by selecting specialization choices, checking whether the current solution Gγ is subsumed by a clause in $Gpred(F_i)$, and backtracking.
Last, it allows for testing whether Gγ is maximally general in Th(E).
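The first of these tasks, pruning $\gamma_F$, can be illustrated without a general constraint solver in the special case of nominal domain constraints. A conjunct $\gamma_{\sigma_j}$ entailed by another conjunct $\gamma_{\sigma_i}$ is redundant, so only the strongest disjunctions, corresponding to the minimal substitutions, need be kept. The Python sketch assumes each $\gamma_\sigma$ is stored as a dict from variable to $dom^*_\sigma(X)$, a proper non-empty subset of its domain. Under that assumption, one disjunction entails another iff each of its variables appears in the other with a superset. This illustrates the idea only; it is not the procedure of [18,20].

```python
def entails(d1, d2):
    """d1 |= d2 for disjunctions of single-variable domain constraints,
    assuming every domain is a proper, non-empty subset of its Omega."""
    return all(x in d2 and dom <= d2[x] for x, dom in d1.items())


def prune(gammas):
    """Keep the strongest gamma_sigma: drop any one entailed by another conjunct
    (among equivalent ones, the first occurrence is kept)."""
    kept = []
    for i, g in enumerate(gammas):
        redundant = any(
            j != i and entails(h, g) and (not entails(g, h) or j < i)
            for j, h in enumerate(gammas)
        )
        if not redundant:
            kept.append(g)
    return kept


g1 = {"Z": {"carbon", "oxygen"}}
g2 = {"Z": {"carbon", "oxygen", "nitrogen"}, "W": {"high"}}   # weaker than g1
g3 = {"V": {"carbon"}}                                         # unrelated
print(prune([g1, g2, g3]) == [g1, g3])   # True
```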

Complexity
Assume that the domain of instantiation of any variable can be explored at a bounded cost. Then, the complexity of building the maximally discriminant constraint $\gamma_\sigma$ that discriminates a negative substitution σ is linear in the number of initial and relational variables in C. In our constraint language, this complexity is quadratic in the number $\mathcal{X}$ of variables in C.
If $\mathcal{L}$ denotes an upper bound on the number of negative substitutions derived from a counter-example (the size of Σ), the complexity of building $\gamma_F$ is then $O(\mathcal{X}^2 \times \mathcal{L})$. The complexity of building Gpred(F) (section 4.1) is negligible compared to that of building $\gamma_F$ (it is linear in the number of predicate symbols in E, which is upper-bounded by $\mathcal{X}$).
Finally, the computational characterization of D(E, F) has complexity $O(\mathcal{X}^2 \times \mathcal{L})$.
Characterizing the Disjunctive Version Space Th requires all pairs $D(E_i, F_j)$ to be characterized; if N denotes the number of training examples, the computational characterization of Th has complexity $O(\mathcal{X}^2 \times \mathcal{L} \times N^2)$.
The complexity of classifying an unseen example I from Th (Proposition 7) is the size of the implicit characterization of Th times the number of substitutions derived from I, upper bounded by $\mathcal{L}$; the complexity of classification hence is $O(\mathcal{X}^2 \times \mathcal{L}^2 \times N^2)$. The complexity of the intensional characterization of Th, via algorithm ICP, is in $O(N \times \mathcal{X}^{2 \times \mathcal{L} \times N})$. Needless to say, the learning and classifying processes based on the computational characterization of Th are much more affordable than those based on the explicit characterization of Th.
The typical complexity of first-order logic appears through the factor $\mathcal{L}$: if M is an upper bound on the number of literals built on the same predicate symbol that occur in an example, and P is the number of predicate symbols, $\mathcal{L}$ is in $O(M^{M \times P})$.
For instance, in the mutagenesis problem [7], examples are molecules involving up to 40 atoms; $\mathcal{L}$ is then $40^{40}$. We therefore used a specifically devised heuristic to overcome this limitation. The exhaustive exploration of the set Σ of negative substitutions was replaced by a stochastic exploration: we only consider a limited number η of samples in Σ, extracted by a stochastic sampling mechanism [22]. An approximation of D(E, F) was therefore constructed in polynomial time ($O(\mathcal{X}^2 \times \eta \times N^2)$); to give an idea of the orders of magnitude, the number η of samples considered in Σ was limited to 300 (to be compared to $40^{40}$). This approach led to outstanding experimental results compared to the state of the art on the mutagenesis problem [23].
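The growth of $\mathcal{L}$ and the sampling remedy can both be illustrated in a few lines. The sampling mechanism of [22] is not described here. The Python sketch below therefore makes an assumption: for each literal of C it draws a target literal of F uniformly at random among those with the same predicate, and it removes duplicates, so fewer than η distinct substitutions may be returned. The 40-atom clauses are synthetic.

```python
import random


def sigma_bound(m, p):
    """Upper bound M^(M*P) on the number of negative substitutions |Sigma|."""
    return m ** (m * p)


def sample_negative_substitutions(c_body, f_body, eta, seed=0):
    """Draw up to `eta` random mappings of C's literals onto same-predicate
    literals of F, as variable -> term substitutions (duplicates removed)."""
    rng = random.Random(seed)                       # seeded for reproducibility
    choices = [[f for f in f_body if f[0] == lit[0] and len(f[1]) == len(lit[1])]
               for lit in c_body]
    if any(not options for options in choices):    # some literal has no image
        return []
    samples = {}
    for _ in range(eta):
        image = tuple(rng.randrange(len(options)) for options in choices)
        if image not in samples:
            sigma = {}
            for (_, c_args), idx, options in zip(c_body, image, choices):
                sigma.update(zip(c_args, options[idx][1]))
            samples[image] = sigma
    return list(samples.values())


# Mutagenesis-like setting: one predicate (P = 1) and M = 40 atom literals.
print(len(str(sigma_bound(40, 1))))   # 65 digits: 40**40 is about 1.2e64

C_body = [("atm", (f"A{i}", f"E{i}")) for i in range(40)]
F_body = [("atm", (f"b{j}", "carbon")) for j in range(40)]
sigmas = sample_negative_substitutions(C_body, F_body, eta=300)
print(len(sigmas))   # at most 300 substitutions examined instead of 40**40
```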

Discussion and Perspectives
This section first discusses our choice of a maximally discriminant induction, then situates this work with respect to some previous works devoted to generalization of constraints [16,12] or reformulation of ILP problems [8,26,27].

Generalization Choices
This work first extends the frame of induction to constraint logic programming; see [22] for an experimental demonstration of the potential of this language.
Note that this framework does not allow one to learn clauses that could not be learned by state-of-the-art learners supplied with ad hoc knowledge. Rather, it allows one to learn simple numerical relations without requiring additional knowledge.
A second aspect of this work concerns the tractable characterization of the Disjunctive Version Space of consistent partially complete hypotheses. In contrast, as mentioned earlier, the theories built by either PROGOL or FOIL include only a few elements of this set.
Like PROGOL, and unlike FOIL [17], ICP handles non-ground examples; but domain theory (that cannot be expressed as examples) can be considered only through saturation of the examples: ICP cannot use the domain knowledge to guide the exploration of the search space, as ML-Smart [1] or PROGOL do.

Generalization from constraints
As far as we know, the generalization from constraints has only been addressed so far by Page and Frisch [16] and Mizoguchi and Ohwada [12].
In [16], the goal is to generalize constrained atoms. Constrained atoms are handled as definite clauses whose antecedents express the constraints. Constrained generalizations of two atoms are built from the sorted generalizations defined on their arguments. In both [16] and our approach, generalization ultimately proceeds by building constraints. But different issues are addressed. In [16], the main difficulty arises from the possibly multiple generalizations of two terms, which does not occur in our restricted language (section 4.2). In contrast, the main difficulty here comes from the multiple structural matchings among examples (section 7), while such a matching uniquely follows from the unique atom considered in [16].
Another approach of the generalization of constrained clauses is presented by Mizoguchi and Ohwada [12].This work is nicely motivated by geometrical applications (avoiding the collision between objects and obstacles).The region of safe moves of an object can be 'naturally' described through a set of linear constraints; the goal consists in automatically acquiring such constraints from examples.
[12] first extend the definition of some typical induction operators (minimal gen eralization, absorption, lgg) to constrained clauses.Then, an ad hoc domain theory being given, examples are described by constrained atoms which are gen eralized through absorption and lgg, in the line of [15].
In what regards the roles respectively devoted to ILP and CLP, the essential differences can be summarized as follows: the induction of constrained clauses is done (a) by incorporating the structure of constraints into ILP, in (16]; (b) by extending the inverse resolution approach to CLP in [12]; and by interleaving ILP and CLP in our approach.

Reformulation
A strong motivation for reformulating ILP problems into simpler problems, e.g. in propositional form, is that propositional learners are good at dealing with numbers [8,2,26]. LINUS [8] achieves such a transformation under several assumptions, which together ensure that one first-order example is transformed into one attribute-value example; this transformation therefore does not address the case of multiple structural matchings among examples. LINUS nicely uses the domain theory to introduce new variables and enrich the attribute-value representation of the examples.
Another approach is that of Zucker and Ganascia [26,27], which focuses on restricting the set of predicates and substitutions relevant to a given level of induction. Simply put, moriological reformulations rely on a hierarchical description of the problem domain, where a morion of a given level can be decomposed into one or several morions of a lower level (e.g. the car morion involves the description of four tire morions). One may then restrict oneself to pattern matchings among examples that preserve the structure (front tires, back tires). Such restrictions drastically decrease the complexity of induction (and could benefit ICP too); but learning such restrictions automatically is still an open problem [26].
Note that [8] and [26] both map an induction problem into another, simpler induction problem. In contrast, the mapping presented here enables a shift of paradigm: an induction problem is transformed into a constraint program, which can in turn be solved by an external tool.

Perspectives
This work opens several perspectives for research. New variables (as in [8]) and new types of constraints could be considered. Ideally, language bias would be expressed via additional constraints (for instance, requiring the solution clauses to be connected).
Also, the user could supply an optimality function to guide the selection among admissible solutions. Selective discriminant induction could then be reformulated as a constrained optimization problem (finding the optimum of the objective function subject to the constraints).
Many further promising tracks are opened by current experimental validations of this scheme [22].

2 Constraint Logic Programming

Definition 2.
A constrained clause is a clause of the form $H \leftarrow B_1 \wedge \dots \wedge B_m \wedge c_1 \wedge \dots \wedge c_n$, where $H, B_1, \dots, B_m$ are atoms and $c_1, \dots, c_n$ are constraints. In the following, $c_1 \wedge \dots \wedge c_n$ is referred to as the constraint part of the constrained clause, and $H \leftarrow B_1 \wedge \dots \wedge B_m$ as its definite part.
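To make the split between definite part and constraint part concrete, here is a minimal sketch of a constrained clause as a data structure. The encoding is a hypothetical one chosen for illustration (atoms as predicate/argument tuples, constraints as opaque strings), and the predicate names `tc`, `atm` and `charge` are invented; a real CLP system would parse and solve the constraints rather than store text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical encoding for illustration: an atom is (predicate, argument names);
# constraints are kept as opaque strings rather than parsed terms.
Atom = Tuple[str, Tuple[str, ...]]


def fmt_atom(atom: Atom) -> str:
    name, args = atom
    return f"{name}({', '.join(args)})"


@dataclass
class ConstrainedClause:
    head: Atom                                             # H
    body: List[Atom] = field(default_factory=list)         # B1 ... Bm
    constraints: List[str] = field(default_factory=list)   # c1 ... cn

    def definite_part(self) -> str:
        """H <- B1 & ... & Bm"""
        return fmt_atom(self.head) + " <- " + " & ".join(fmt_atom(b) for b in self.body)

    def constraint_part(self) -> str:
        """c1 & ... & cn"""
        return " & ".join(self.constraints)

    def __str__(self) -> str:
        parts = [fmt_atom(b) for b in self.body] + self.constraints
        return fmt_atom(self.head) + " <- " + " & ".join(parts)


clause = ConstrainedClause(
    head=("tc", ("M",)),
    body=[("atm", ("M", "A")), ("charge", ("A", "W"))],
    constraints=["W > 21"],
)
print(clause)                    # tc(M) <- atm(M, A) & charge(A, W) & W > 21
print(clause.definite_part())    # tc(M) <- atm(M, A) & charge(A, W)
print(clause.constraint_part())  # W > 21
```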

Definition 4.
A constraint $c$ is consistent (or satisfiable) if there exists at least one instantiation of the variables of $c$ in $\mathcal{D}$ such that $c$ is true, noted $\mathcal{D} \models (\exists)\, c$.
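For the special case of conjunctions of unary domain constraints, consistency in the sense of Definition 4 reduces to interval intersection. The sketch below assumes closed real intervals and no constraints linking variables (both are simplifying assumptions, not the paper's general setting, where a constraint solver would be needed); the name `is_consistent` is hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Assumption of this sketch: every constraint is a unary domain constraint
# (X in [lo, hi]) over the reals, with closed bounds.
DomainConstraint = Tuple[str, float, float]


def is_consistent(conj: List[DomainConstraint]) -> bool:
    """Decide D |= (exists) c for a conjunction c of domain constraints."""
    bounds: Dict[str, Tuple[float, float]] = {}
    for var, lo, hi in conj:
        cur_lo, cur_hi = bounds.get(var, (-math.inf, math.inf))
        bounds[var] = (max(cur_lo, lo), min(cur_hi, hi))
    # Unary constraints on different variables do not interact, so a witness
    # exists iff each variable's intersected interval is non-empty.
    return all(lo <= hi for lo, hi in bounds.values())


print(is_consistent([("W", 0, 24), ("H", 18, 18)]))         # True
print(is_consistent([("W", 0, 21), ("W", 22, math.inf)]))   # False
```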
The elementary step of Disjunctive Version Space actually consists of constructing the set $D(E, F)$ of hypotheses covering $E$ and discriminating some other training example $F$: if $F_1, F_2, \dots, F_n$ denote the training examples not belonging to the same target concept as $E$, termed counter-examples to $E$, then by construction $Th(E) = D(E, F_1) \wedge \dots \wedge D(E, F_n)$.

Definition 6.
Let $G\gamma$ and $G'\gamma'$ be constrained clauses; $G\gamma$ generalizes $G'\gamma'$, noted $G'\gamma' \preceq_h G\gamma$, if there exists a substitution $\sigma$ on $G$ such that $G\sigma$ is included in $G'$ and $\gamma'$ entails $\gamma\sigma$:
$$G'\gamma' \preceq_h G\gamma \iff \exists \sigma \text{ such that } G\sigma \subseteq G' \text{ and } \gamma' \preceq_c \gamma\sigma$$
It follows from Definition 6 that any constrained clause $G\gamma$ in the search space $\mathcal{L}_h$ generalizes $E$ ($\sigma$ being set to the identity substitution on $C$). Positive examples are represented as constrained clauses concluding to the predicate to be learned, $tc$.
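The generality test of Definition 6 can be sketched for a restricted case: the substitution $\sigma$ is supplied rather than searched for (the search over $\sigma$, i.e. over structural matchings, is the hard part and is not attempted here), $\sigma$ only renames variables, and constraint parts are conjunctions of closed interval constraints. All names (`generalizes`, `entails`) and the example predicates are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Domains = Dict[str, Tuple[float, float]]   # conjunction of (X in [lo, hi]), closed bounds
FULL = (-math.inf, math.inf)


def entails(strong: Domains, weak: Domains) -> bool:
    """strong |= weak, for consistent conjunctions of interval constraints."""
    for var, (lo_w, hi_w) in weak.items():
        lo_s, hi_s = strong.get(var, FULL)
        if not (lo_w <= lo_s and hi_s <= hi_w):
            return False
    return True


def generalizes(G: List[Atom], gamma: Domains, G2: List[Atom], gamma2: Domains,
                sigma: Optional[Dict[str, str]] = None) -> bool:
    """Check G2 gamma2 <=_h G gamma for one *given* variable renaming sigma:
    G sigma must be included in G2, and gamma2 must entail gamma sigma."""
    sigma = sigma or {}

    def rename(v: str) -> str:
        return sigma.get(v, v)              # unmapped variables are left unchanged

    g_sigma = {(p, tuple(rename(v) for v in args)) for p, args in G}
    gamma_sigma: Domains = {}
    for v, (lo, hi) in gamma.items():       # intersect if sigma merges variables
        cur_lo, cur_hi = gamma_sigma.get(rename(v), FULL)
        gamma_sigma[rename(v)] = (max(cur_lo, lo), min(cur_hi, hi))
    return g_sigma <= set(G2) and entails(gamma2, gamma_sigma)


E_body = [("atm", ("M", "A")), ("charge", ("A", "W"))]
E_gamma = {"W": (22.0, 24.0)}
print(generalizes([("charge", ("A", "W"))], {"W": (21.0, math.inf)}, E_body, E_gamma))  # True
print(generalizes([("charge", ("B", "V"))], {"V": (0.0, 21.0)}, E_body, E_gamma,
                  {"B": "A", "V": "W"}))                                                # False
```

With $\sigma$ left empty, the check corresponds to the identity substitution used above to show that clauses of the search space generalize $E$.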
         carbon [0, 24)    carbon (25, ∞)
pσ1      hydrogen 18       carbon [0, 21]
pσ2      carbon [0, 21)    hydrogen 18
pσ3      hydrogen 18       hydrogen 18
pσ4      carbon [0, 21]    carbon (0, 21]

As an example, let us consider binary equality or inequality constraints $X = Y$ or $X \neq Y$. One associates to any pair of variables $X$ and $Y$ having the same domain of instantiation the relational variable $(X{=}Y)$, interpreted for any substitution $\sigma$ of $C$ as: $(X{=}Y).\sigma = \mathit{true}$ if $X.\sigma = Y.\sigma$; $(X{=}Y).\sigma = \mathit{false}$ if $X.\sigma$ and $Y.\sigma$ are distinct constants; and $(X{=}Y).\sigma$ is not bound otherwise. The equality constraint $(X = Y)$ (respectively the inequality constraint $(X \neq Y)$) is equivalent to the domain constraint on the relational variable $(X{=}Y)$ given as $((X{=}Y) = \mathit{true})$ (resp. $((X{=}Y) = \mathit{false})$). Binary arithmetic constraints can similarly be built as domain constraints on relational numerical variables: let $(X{-}Y)$ be the constrained variable interpreted as the difference of the numerical variables $X$ and $Y$; the domain constraint $((X{-}Y) \in [a, b])$ is equivalent to the binary constraint on $X$ and $Y$: $Y + a \le X \le Y + b$.
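The interpretation of relational variables can be sketched as functions of a substitution, here assumed to map variable names to constants (a hypothetical encoding). `None` stands for "not bound"; the helper names are invented for illustration. The last function shows that the domain constraint on $(X{-}Y)$ behaves exactly like the binary constraint $Y + a \le X \le Y + b$.

```python
from typing import Dict, Optional, Union

Value = Union[str, int, float]
Substitution = Dict[str, Value]   # hypothetical: variable name -> constant


def eq_var(sub: Substitution, x: str, y: str) -> Optional[bool]:
    """Relational variable (X=Y) under sub: True if X.sub = Y.sub, False if
    X.sub and Y.sub are distinct constants, None (not bound) otherwise."""
    if x in sub and y in sub:
        return sub[x] == sub[y]
    return None


def diff_var(sub: Substitution, x: str, y: str) -> Optional[float]:
    """Relational numerical variable (X-Y), bound only when X and Y both are."""
    if x in sub and y in sub:
        return sub[x] - sub[y]
    return None


def diff_in(sub: Substitution, x: str, y: str, a: float, b: float) -> Optional[bool]:
    """Domain constraint ((X-Y) in [a, b]), i.e. Y + a <= X <= Y + b."""
    d = diff_var(sub, x, y)
    return None if d is None else a <= d <= b


sub = {"X": 25, "Y": 18, "Z": 25}
print(eq_var(sub, "X", "Z"), eq_var(sub, "X", "Y"), eq_var(sub, "X", "U"))  # True False None
print(diff_in(sub, "X", "Y", 5, 10))                                        # True
```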

Proposition 4.
Let $var^*(C)$ be the set of initial and relational variables in $C$, let $\sigma$ be a negative substitution in $E$, and let $\gamma_\sigma$ denote the disjunction of constraints $(X \notin \mathrm{dom}_{p\sigma}(X))$ for $X$ ranging over the $\sigma$-discriminant variables in $var^*(C)$. Let $\gamma$ be a conjunction of domain constraints on variables in $var^*(C)$ that is entailed by $\theta$. Then $\gamma$ is inconsistent with $p\sigma$ iff $\gamma \preceq_c \gamma_\sigma$. Constraint $\gamma_\sigma$ hence is the upper bound on the set of constraints on $C$ that are entailed by $\theta$ and are inconsistent with $p\sigma$.
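Proposition 4 can be checked on small cases. The sketch below assumes closed interval domains and reads "σ-discriminant" as "the domain θ gives to $X$ does not meet $\mathrm{dom}_{p\sigma}(X)$"; that reading is an assumption of this illustration, as are all names and the numbers in `theta` and `p_sigma1`. It builds $\gamma_\sigma$ and compares the entailment test $\gamma \preceq_c \gamma_\sigma$ with a direct inconsistency test.

```python
import math
from typing import Dict, List, Tuple

Interval = Tuple[float, float]      # closed [lo, hi]: an assumption of this sketch
Domains = Dict[str, Interval]       # conjunction of (X in [lo, hi])
FULL: Interval = (-math.inf, math.inf)


def disjoint(i: Interval, j: Interval) -> bool:
    return max(i[0], j[0]) > min(i[1], j[1])


def gamma_sigma(theta: Domains, p_sigma: Domains) -> List[Tuple[str, Interval]]:
    """Disjunction of (X not in dom_{p.sigma}(X)), returned as [(X, dom), ...].
    Assumed reading of 'sigma-discriminant': theta's domain for X does not
    meet dom_{p.sigma}(X)."""
    return [(x, d) for x, d in p_sigma.items() if disjoint(theta.get(x, FULL), d)]


def entails_disjunction(gamma: Domains, disj: List[Tuple[str, Interval]]) -> bool:
    """gamma |= OR (X not in D_X); for a consistent conjunction of intervals this
    holds iff some disjunct already holds for gamma's domain of X."""
    return any(disjoint(gamma.get(x, FULL), d) for x, d in disj)


def inconsistent(gamma: Domains, p_sigma: Domains) -> bool:
    """Direct test: gamma and p.sigma give some variable non-meeting domains."""
    return any(disjoint(gamma.get(x, FULL), d) for x, d in p_sigma.items())


theta = {"W": (22.0, 24.0), "H": (18.0, 18.0)}      # hypothetical constraint of E
p_sigma1 = {"W": (0.0, 21.0), "H": (18.0, 18.0)}     # hypothetical negative substitution
g_s = gamma_sigma(theta, p_sigma1)                   # [("W", (0.0, 21.0))]: W outside [0, 21]
for gamma in ({"W": (21.5, 30.0)}, {"H": (18.0, 18.0)}, {"W": (10.0, 30.0)}):
    # each gamma is entailed by theta; the proposition says both tests agree
    print(entails_disjunction(gamma, g_s), inconsistent(gamma, p_sigma1))
```

The loop prints `True True`, then `False False` twice, illustrating the equivalence for these three constraints.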

Characterizing Th(E)
Let all notations be as in the previous section, and let $G\gamma$ be a constrained clause in the hypothesis language. By recollecting the results of sections 4.1 and 4.3, $G\gamma$ discriminates $F$ iff either $G$ is subsumed by a clause in $Gpred(F)$ or $\gamma$ entails $\gamma_F$:

Proposition 6. Let $D(E, F)$ be the set of constrained clauses that generalize $E$ and discriminate $F$, and let $G\gamma$ be a constrained clause generalizing $E$. Then $G\gamma$ belongs to $D(E, F)$ if and only if $G$ is subsumed by a clause in $Gpred(F)$, or $\gamma \preceq_c \gamma_F$.

The set $Th(E)$ of consistent constrained clauses covering $E$ can then be characterized from the sets of constrained clauses covering $E$ and discriminating $F$, for $F$ ranging over the counter-examples $F_1, \dots, F_n$ to $E$ (i.e. the training examples concluding to the concept opposite to that of $E$); by construction, $Th(E) = D(E, F_1) \wedge \dots \wedge D(E, F_n)$.
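Proposition 6 and the definition of $Th(E)$ translate into a membership test: a candidate $G\gamma$ must discriminate every counter-example, either structurally or through its constraint part. In this sketch the structural test ("$G$ is subsumed by a clause in $Gpred(F)$") is abstracted as a hypothetical callable, $G$ is a set of atom strings, and $\gamma_F$ is a disjunction of exclusions over closed intervals as in the previous sketch; none of these names come from the paper.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Interval = Tuple[float, float]               # closed bounds (assumption)
Domains = Dict[str, Interval]                # gamma: conjunction of (X in [lo, hi])
Disjunction = List[Tuple[str, Interval]]     # gamma_F: OR of (X not in D_X)
FULL: Interval = (-math.inf, math.inf)


@dataclass
class CounterExample:
    name: str
    # Hypothetical stand-in for "G is subsumed by a clause in Gpred(F)".
    structurally_discriminated: Callable[[FrozenSet[str]], bool]
    gamma_f: Disjunction


def entails(gamma: Domains, gamma_f: Disjunction) -> bool:
    """gamma <=_c gamma_F for a consistent conjunction of intervals."""
    for x, (lo, hi) in gamma_f:
        g_lo, g_hi = gamma.get(x, FULL)
        if max(g_lo, lo) > min(g_hi, hi):   # gamma keeps X outside D_X
            return True
    return False


def in_D(G: FrozenSet[str], gamma: Domains, F: CounterExample) -> bool:
    """G gamma in D(E, F) iff the structural test holds or gamma <=_c gamma_F."""
    return F.structurally_discriminated(G) or entails(gamma, F.gamma_f)


def in_Th(G: FrozenSet[str], gamma: Domains, counter_examples: List[CounterExample]) -> bool:
    """Th(E) = D(E, F1) & ... & D(E, Fn): G gamma must discriminate every F_i."""
    return all(in_D(G, gamma, F) for F in counter_examples)


F1 = CounterExample("F1", lambda G: False, [("W", (0.0, 21.0))])
F2 = CounterExample("F2", lambda G: "h(A)" in G, [("H", (18.0, 18.0))])
G = frozenset({"charge(A, W)"})
print(in_Th(G, {"W": (22.0, 30.0)}, [F1, F2]))                      # False
print(in_Th(G, {"W": (22.0, 30.0), "H": (19.0, 40.0)}, [F1, F2]))   # True
print(in_Th(G | {"h(A)"}, {"W": (22.0, 30.0)}, [F1, F2]))           # True
```

The first candidate fails only because nothing in it discriminates `F2`; tightening $\gamma$ on `H` or adding the structural atom repairs that.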

Table 2: Domains of computation and binary constraints

Table 3: Tabular representation of domain constraints
The (disjunctive) constraint $\gamma_{\sigma_1}$, entailed by $\theta$ and maximally general such that it is inconsistent with $p\sigma_1$, is given as follows (with $[W \in (21, \infty)]$ written $[W > 21]$):

Table 4: Domain constraints and binary constraints