Implementing Open Call-by-Value

The theory of the call-by-value λ-calculus relies on weak evaluation and closed terms, which are natural hypotheses in the study of programming languages. To model proof assistants, however, strong evaluation and open terms are required. Open call-by-value is the intermediate setting of weak evaluation with open terms, on top of which Grégoire and Leroy designed the abstract machine of Coq. This paper provides a theory of abstract machines for open call-by-value. The literature contains machines that are either simple but inefficient, as they have an exponential overhead, or efficient but heavy, as they rely on a labelling of environments and a technical optimization. We introduce a machine that is simple and efficient: it does not use labels and it implements open call-by-value within a bilinear overhead. Moreover, we provide a new fine understanding of how different optimizations impact the complexity of the overhead.


Introduction
The λ-calculus is the computational model behind functional programming languages and proof assistants. A charming feature is that its definition is based on just one macro-step computational rule, β-reduction, and does not rest on any notion of machine or automaton. Compilers and proof assistants, however, are concrete tools that have to implement the λ-calculus in some way, and here a problem clearly arises: there is a huge gap between the abstract mathematical setting of the calculus and the technical intricacies of an actual implementation. This is why the issue is studied via intermediate abstract machines, which are implementation schemes with micro-step operations and without too many concrete details.
Closed and Strong λ-Calculus. Functional programming languages are based on a simplified form of λ-calculus, which we like to call the closed λ-calculus, with two important restrictions. First, evaluation is weak, i.e. it does not evaluate function bodies. Second, terms are closed, that is, they have no free variables. The theory of the closed λ-calculus is much simpler than the general one.
Proof assistants based on the λ-calculus usually require the power of the full theory. Evaluation is then strong, i.e. unrestricted, and the distinction between open and closed terms no longer makes sense: when evaluation enters function bodies it has to deal with the issues of open terms, even if the initial term is closed. We refer to this setting as the strong λ-calculus.
Historically, the studies of the strong and closed λ-calculi have followed orthogonal approaches. Theoretical studies mostly dealt with the strong λ-calculus, and only since the seminal work of Abramsky and Ong [1] have theoreticians started to take the closed case seriously. Dually, practical studies mostly ignored strong evaluation, with the notable exception of Crégut [13] (1990) and some very recent works [19,6,3]. Strong evaluation is nonetheless essential in the implementation of proof assistants or higher-order logic programming, typically for type-checking with dependent types, as in the Edinburgh Logical Framework or the Calculus of Constructions, as well as for unification in simply typed frameworks like λ-prolog.
Open Call-by-Value. In a very recent work [8], we advocated the relevance of the open λ-calculus, a framework in between the closed and the strong ones, where evaluation is weak but terms may be open. Its key property is that the strong case can be described as the iteration of the open one into function bodies. The same cannot be done with the closed λ-calculus because, as already pointed out, entering function bodies requires dealing with (locally) open terms.
The open λ-calculus did not emerge before because most theoretical studies focus on the call-by-name strong λ-calculus, and in call-by-name the open/closed distinction does not play an important role. Such a distinction, instead, is delicate for call-by-value evaluation, where Plotkin's original operational semantics [22] is not adequate for open terms. This issue is discussed at length in [8], where four extensions of Plotkin's semantics to open terms are compared and shown to be equivalent. That paper then introduces the expression Open Call-by-Value (shortened Open CbV) to refer to them as a whole, as well as Closed CbV and Strong CbV to concisely refer to the closed and strong call-by-value λ-calculus.
The Fireball Calculus. The simplest presentation of Open CbV is the fireball calculus λ_fire, obtained from the CbV λ-calculus by generalizing values into fireballs. Dynamically, β-redexes are allowed to fire only when the argument is a fireball (fireball is a pun on fire-able). The fireball calculus was introduced without a name by Paolini and Ronchi Della Rocca [21,23], then rediscovered independently first by Leroy and Grégoire [20], and then by Accattoli and Sacerdoti Coen [2]. Notably, on closed terms, λ_fire coincides with Plotkin's (Closed) CbV λ-calculus.
Coq by Levels. In [20] (2002) Leroy and Grégoire used the fireball calculus to improve the implementation of the Coq proof assistant. In fact, Coq rests on Strong CbV, but Leroy and Grégoire designed an abstract machine for the fireball calculus (i.e. Open CbV) and then used it to evaluate Strong CbV by levels: the machine is first executed at top level (that is, outside all abstractions), and then re-launched recursively under abstractions. Their study is itself formalized in Coq, but it lacks an estimation of the efficiency of the machine.
In order to continue our story, some basic facts about cost models and abstract machines have to be recalled (see [4] for a gentle tutorial).
Interlude 1: Size Explosion. It is well known that λ-calculi suffer from a degeneracy called size explosion: there are families of terms whose size is linear in n, that evaluate in n β-steps, and whose result has size exponential in n. The problem is that the number of β-steps, the natural candidate as a time cost model, then seems not to be a reasonable cost model, because it does not even account for the time needed to write down the result of a computation: the macro-step character of β-reduction seems to forbid counting 1 for each step. This is a problem that affects all λ-calculi and all evaluation strategies.
Interlude 2: Reasonable Cost Models and Abstract Machines. Despite size explosion, surprisingly, the number of β-steps often is a reasonable cost model, so one can indeed count 1 for each β-step. There is no paradox: λ-calculi can be simulated in alternative formalisms employing some form of sharing, such as abstract machines. These settings manage compact representations of terms via micro-step operations and produce compact representations of the result, avoiding size explosion. Showing that a certain λ-calculus is reasonable is usually done by simulating it with a reasonable abstract machine, i.e. a machine implementable with overhead polynomial in the number of β-steps in the calculus and in the size of the initial term. The design of a reasonable abstract machine depends very much on the kind of λ-calculus to be implemented, as different calculi admit different forms of size explosion and/or require more sophisticated forms of sharing. For strategies in the closed λ-calculus it is enough to use the ordinary technology for abstract machines, as first shown by Blelloch and Greiner [12], then by Sands, Gustavsson, and Moran [24], and, with other techniques, by combining the results in Dal Lago and Martini's [15] and [14]. The case of the strong λ-calculus is subtler, and a more sophisticated form of sharing is necessary, as first shown by Accattoli and Dal Lago [7]. The topic of this paper is the study of reasonable machines for the intermediate case of Open CbV.
Fireballs are Reasonable. In [2] Accattoli and Sacerdoti Coen study Open CbV from the point of view of cost models. Their work provides three contributions:
1. Open Size Explosion: they show that Open CbV is subtler than Closed CbV by exhibiting a form of size explosion that is not possible in Closed CbV, making Open CbV closer to Strong CbV than to Closed CbV;
2. Fireballs are Reasonable: they show that the number of β-steps in the fireball calculus is nonetheless a reasonable cost model by exhibiting a reasonable abstract machine, called GLAMOUr, improving over Leroy and Grégoire's machine in [20] (see the conclusions for more on their machine);
3. And Even Efficient: they optimize the GLAMOUr into the Unchaining GLAMOUr, whose overhead is bilinear (i.e. linear in the number of β-steps and in the size of the initial term), which is the best possible overhead.
This Paper. Here we present two machines, the Easy GLAMOUr and the Fast GLAMOUr, which are proved to be correct implementations of Open CbV and to have a polynomial and a bilinear overhead, respectively. Their study refines the results of [2] along three axes:
1. Simpler Machines: both the GLAMOUr and the Unchaining GLAMOUr of [2] are sophisticated machines resting on a labeling of terms. The unchaining optimization of the second machine is also quite heavy. Both the Easy GLAMOUr and the Fast GLAMOUr, instead, do not need labels, and the Fast GLAMOUr is bilinear with no need of the unchaining optimization.
2. Simpler Analyses: the correctness and complexity analyses of the (Unchaining) GLAMOUr are developed in [2] via an informative but complex decomposition through explicit substitutions, by means of the distillation methodology [5]. Here, instead, we decode the Easy and Fast GLAMOUr directly to the fireball calculus, which turns out to be much simpler. Moreover, the complexity analysis of the Fast GLAMOUr, surprisingly, turns out to be straightforward.
3. Modular Decomposition of the Overhead: we provide a fine analysis of how different optimizations impact the complexity of the overhead of abstract machines for Open CbV. In particular, it turns out that one of the optimizations considered essential in [2], namely substituting abstractions on-demand, is not mandatory for reasonable machines: the Easy GLAMOUr does not implement it and yet it is reasonable. We show, however, that this is true only as long as one stays inside Open CbV, because the optimization is instead mandatory for Strong CbV (seen by Grégoire and Leroy as Open CbV by levels). To our knowledge, substituting abstractions on-demand is an optimization introduced in [7], and currently no proof assistant implements it. Said differently, our work shows that the technology currently in use in proof assistants is, at least theoretically, unreasonable.
Summing up, this paper does not improve the known bound on the overhead of abstract machines for Open CbV, as the one obtained in [2] is already optimal.Its contributions instead are a simplification and a finer understanding of the subtleties of implementing Open CbV: we introduce simpler abstract machines whose complexity analyses are elementary and carry a new modular view of how different optimizations impact on the complexity of the overhead.
In particular, while [2] shows that Open CbV is subtler than Closed CbV, here we show that Open CbV is simpler than Strong CbV, and that defining Strong CbV as iterated Open CbV, as done by Grégoire and Leroy in [20], may introduce an explosion of the overhead, if done naively.
A longer version of this paper is available on arXiv [9]. It contains two appendices, one with a glossary of rewriting theory and one with omitted proofs.

The Fireball Calculus λ_fire & Open Size Explosion
In this section we introduce the fireball calculus, the presentation of Open CbV we work with in this paper, and show the example of size explosion peculiar to the open setting. Alternative presentations of Open CbV can be found in [8].
The Fireball Calculus. The fireball calculus λ_fire is defined in Fig. 1. The idea is that the values of the call-by-value λ-calculus, given by abstractions and variables, are generalized to fireballs, by extending variables to more general inert terms. Actually, fireballs and inert terms are defined by mutual induction (in Fig. 1). For instance, λx.y is a fireball as an abstraction, while x, y(λx.x), xy, and (z(λx.x))(zz)(λy.(zy)) are fireballs as inert terms.
[Fig. 1. The Fireball Calculus λ_fire: terms t, u, s, r, fireballs, inert terms, and the rewriting rules.]
The main feature of inert terms is that they are open, normal, and that when plugged in a context they cannot create a redex, hence the name (they are not so-called neutral terms because they might have β-redexes under abstractions).In Grégoire and Leroy's presentation [20], inert terms are called accumulators and fireballs are simply called values.
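The mutual induction can be made concrete with a small Python sketch. It assumes the usual grammar in which an inert term is a variable applied to zero or more fireballs (i ::= x | i f) and a fireball is either an abstraction or an inert term; the class and function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def is_inert(t: Term) -> bool:
    """Inert terms: a variable applied to zero or more fireballs (i ::= x | i f)."""
    if isinstance(t, Var):
        return True
    return isinstance(t, App) and is_inert(t.fun) and is_fireball(t.arg)


def is_fireball(t: Term) -> bool:
    """Fireballs: abstractions or inert terms (f ::= λx.t | i)."""
    return isinstance(t, Lam) or is_inert(t)
```

On the examples of the text, λx.y is a fireball but not inert, while x, y(λx.x), xy and (z(λx.x))(zz)(λy.(zy)) are inert. A term such as x(λy.((λx.x)y)) is also inert even though it contains a β-redex under an abstraction, which is why inert terms are not neutral terms.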
Terms are always identified up to α-equivalence and the set of free variables of a term t is denoted by fv(t). We use t{x←u} for the term obtained by the capture-avoiding substitution of u for each free occurrence of x in t.
Evaluation is given by call-by-fireball β-reduction →_βf: the β-rule can fire, lighting up the argument, only when the argument is a fireball (fireball is a catchier version of fire-able term). We actually distinguish two sub-rules: one that lights up abstractions, noted →_βλ, and one that lights up inert terms, noted →_βi (see Fig. 1). Note that evaluation is weak (i.e. it does not reduce under abstractions).
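The two root rules can be sketched in Python as follows. This is an illustration with hypothetical names: terms are plain trees with string variable names, and substitution is the textbook capture-avoiding one, which here renames a clashing binder by appending primes (an arbitrary choice of ours).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def is_inert(t) -> bool:
    return isinstance(t, Var) or (
        isinstance(t, App) and is_inert(t.fun) and is_fireball(t.arg))


def is_fireball(t) -> bool:
    return isinstance(t, Lam) or is_inert(t)


def free_vars(t) -> frozenset:
    if isinstance(t, Var):
        return frozenset({t.name})
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.var}
    return free_vars(t.fun) | free_vars(t.arg)


def subst(t, x: str, u):
    """Capture-avoiding substitution t{x<-u}."""
    if isinstance(t, Var):
        return u if t.name == x else t
    if isinstance(t, App):
        return App(subst(t.fun, x, u), subst(t.arg, x, u))
    if t.var == x:  # x is rebound here: nothing to substitute below
        return t
    if t.var in free_vars(u):  # rename the binder so that u's variables are not captured
        avoid = free_vars(u) | free_vars(t.body) | {x}
        fresh = t.var
        while fresh in avoid:
            fresh += "'"
        return Lam(fresh, subst(subst(t.body, t.var, Var(fresh)), x, u))
    return Lam(t.var, subst(t.body, x, u))


def root_step(t) -> Optional[Tuple[str, object]]:
    """Fire a root beta_f-redex (λx.s)f, reporting which sub-rule was used."""
    if isinstance(t, App) and isinstance(t.fun, Lam):
        if isinstance(t.arg, Lam):
            return "beta_lambda", subst(t.fun.body, t.fun.var, t.arg)
        if is_inert(t.arg):
            return "beta_i", subst(t.fun.body, t.fun.var, t.arg)
    return None  # not a redex, or the argument is not (yet) a fireball
```

For instance, (λx.xx)(λy.y) fires by →_βλ, (λx.x)y fires by →_βi, and (λx.x)((λy.y)z) is not a root β_f-redex because its argument is not a fireball.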
Properties of the Calculus. A famous key property of Closed CbV (whose evaluation is exactly →_βλ) is harmony: given a closed term t, either it diverges or it evaluates to an abstraction, i.e. t is βλ-normal iff t is an abstraction. The fireball calculus satisfies an analogous property in the open setting, obtained by replacing abstractions with fireballs (Prop. 1.1). Moreover, the fireball calculus is a conservative extension of Closed CbV: on closed terms it collapses on Closed CbV (Prop. 1.2). No other presentation of Open CbV has these properties.
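Proposition 1 (Distinctive Properties of λ_fire). Let t be a term.
1. Open Harmony: t is βf-normal iff t is a fireball.
2. Conservative Extension: if t is closed, then t →_βf u iff t →_βλ u.
The rewriting rules of λ_fire also have many good operational properties, studied in [8] and summarized in the following proposition.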
Proposition 2 (Operational Properties of λ_fire, [8]). The reduction →_βf is strongly confluent, and all βf-normalizing derivations d (if any) from a term t have the same length |d|_βf, the same number |d|_βλ of βλ-steps, and the same number |d|_βi of βi-steps.
Right-to-Left Evaluation. As expected from a calculus, the evaluation rule →_βf of λ_fire is non-deterministic, because in the case of an application there is no fixed order in the evaluation of the left and right subterms. Abstract machines, however, implement deterministic strategies. We then fix a deterministic strategy, which fires βf-redexes from right to left and is the one implemented by the machines of the next sections. By Prop. 2, the choice of the strategy impacts neither the existence of a result, nor the result itself, nor the number of steps to reach it. It does impact, however, the design of the machine, which selects βf-redexes from right to left.
The right-to-left evaluation strategy →_rβf is defined by closing the root rules →_βλ and →_βi of Fig. 1 under right-to-left evaluation contexts.
In Ω there is an infinite sequence of duplications. In the size exploding family there is a sequence of n nested duplications. We define two families, the family {t_n}_{n∈N} of size exploding terms and the family {i_n}_{n∈N} of results of evaluating {t_n}_{n∈N}. We use |t| for the size of a term, i.e. the number of symbols needed to write it.
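The right-to-left strategy and the open form of size explosion can be observed with a small Python evaluator. This is a sketch under our own assumptions: the family below, t_0 := y and t_{n+1} := (λx.xx)t_n, is a hypothetical illustrative choice that is not necessarily the one of the paper, and size is measured as the number of constructors rather than symbols.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def is_inert(t):
    return isinstance(t, Var) or (
        isinstance(t, App) and is_inert(t.fun) and is_fireball(t.arg))


def is_fireball(t):
    return isinstance(t, Lam) or is_inert(t)


def free_vars(t):
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.var}
    return free_vars(t.fun) | free_vars(t.arg)


def subst(t, x, u):
    """Capture-avoiding substitution t{x<-u}."""
    if isinstance(t, Var):
        return u if t.name == x else t
    if isinstance(t, App):
        return App(subst(t.fun, x, u), subst(t.arg, x, u))
    if t.var == x:
        return t
    if t.var in free_vars(u):
        avoid = free_vars(u) | free_vars(t.body) | {x}
        fresh = t.var
        while fresh in avoid:
            fresh += "'"
        return Lam(fresh, subst(subst(t.body, t.var, Var(fresh)), x, u))
    return Lam(t.var, subst(t.body, x, u))


def step_rl(t):
    """One right-to-left beta_f step, or None if t is normal (weak: never under λ)."""
    if not isinstance(t, App):
        return None
    r = step_rl(t.arg)  # first the argument ...
    if r is not None:
        return App(t.fun, r)
    r = step_rl(t.fun)  # ... then the function position ...
    if r is not None:
        return App(r, t.arg)
    if isinstance(t.fun, Lam) and is_fireball(t.arg):  # ... then the root redex
        return subst(t.fun.body, t.fun.var, t.arg)
    return None


def evaluate(t, max_steps=10_000):
    """Normalize t with the right-to-left strategy, counting the steps."""
    steps = 0
    while (r := step_rl(t)) is not None:
        t, steps = r, steps + 1
        if steps > max_steps:
            raise RuntimeError("no normal form within max_steps")
    return t, steps


def size(t):
    """Number of constructors (variables, abstractions, applications)."""
    if isinstance(t, Var):
        return 1
    if isinstance(t, Lam):
        return 1 + size(t.body)
    return 1 + size(t.fun) + size(t.arg)


def exploding(n):
    """Hypothetical family: t_0 = y and t_{n+1} = (λx.xx) t_n."""
    t = Var("y")
    for _ in range(n):
        t = App(Lam("x", App(Var("x"), Var("x"))), t)
    return t
```

With this family, evaluate(exploding(n)) takes exactly n βi-steps, the input has size 5n + 1, and the result is an inert term of size 2^(n+1) - 1. Since only inert terms are duplicated, this explosion cannot happen in Closed CbV.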
Circumventing Open Size Explosion. Abstract machines implementing the substitution of inert terms, such as the one described by Grégoire and Leroy in [20], are unreasonable because, for the term t_n of the size exploding family, they compute the full result i_n. The machines of the next sections are reasonable because they avoid the substitution of inert terms, which is justified by the following lemma.
Lemma 6 (Inert Substitutions Can Be Avoided). Let t, u be terms and i be an inert term. Then, t →_βf u iff t{x←i} →_βf u{x←i}.
Lemma 6 states that the substitution of an inert term cannot create redexes, which is why it can be avoided. For general terms, only direction ⇒ holds, because substitution can create redexes, as in (xy){x←λz.z} = (λz.z)y. Direction ⇐, instead, is distinctive of inert terms, of which it justifies the name.
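Lemma 6 can be observed on small examples. The following Python sketch (hypothetical helper names; a check on a handful of terms, not a proof) verifies that substituting the inert term w(λv.v) for x never turns a βf-normal term into a reducible one or vice versa, whereas substituting the abstraction λz.z can create a redex. The checker has_redex looks for βf-redexes outside abstractions only, matching weak evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def is_inert(t):
    return isinstance(t, Var) or (
        isinstance(t, App) and is_inert(t.fun) and is_fireball(t.arg))


def is_fireball(t):
    return isinstance(t, Lam) or is_inert(t)


def free_vars(t):
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.var}
    return free_vars(t.fun) | free_vars(t.arg)


def subst(t, x, u):
    """Capture-avoiding substitution t{x<-u}."""
    if isinstance(t, Var):
        return u if t.name == x else t
    if isinstance(t, App):
        return App(subst(t.fun, x, u), subst(t.arg, x, u))
    if t.var == x:
        return t
    if t.var in free_vars(u):
        avoid = free_vars(u) | free_vars(t.body) | {x}
        fresh = t.var
        while fresh in avoid:
            fresh += "'"
        return Lam(fresh, subst(subst(t.body, t.var, Var(fresh)), x, u))
    return Lam(t.var, subst(t.body, x, u))


def has_redex(t):
    """Does t contain a beta_f-redex outside abstractions (weak evaluation)?"""
    if not isinstance(t, App):
        return False  # variables are normal, and abstractions are not entered
    if isinstance(t.fun, Lam) and is_fireball(t.arg):
        return True
    return has_redex(t.fun) or has_redex(t.arg)


x, y, z, w = (Var(n) for n in "xyzw")
inert = App(w, Lam("v", Var("v")))  # the inert term w(λv.v)
ident = Lam("z", z)
samples = [x, App(x, y), App(x, ident), App(ident, x), App(y, x),
           App(x, x), Lam("z", App(x, z)), App(ident, App(x, y))]
```

Every term in samples keeps its normal/reducible status under {x←w(λv.v)}, while xy is normal and (xy){x←λz.z} = (λz.z)y is not.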

Preliminaries on Abstract Machines, Implementations, and Complexity Analyses
- An abstract machine M is given by states, noted s, and transitions between them, noted ⇝_M;
- A state is given by the code under evaluation plus some data-structures;
- The code under evaluation, as well as the other pieces of code scattered in the data-structures, are λ-terms not considered modulo α-equivalence;
- Codes are over-lined, to stress the different treatment of α-equivalence;
- A code t is well-named if x may occur only in u (if at all) for every sub-code λx.u of t;
- A state s is initial if its code is well-named and its data-structures are empty;
- Therefore, there is a bijection ·° (up to α) between terms and initial states, called compilation, sending a term t to the initial state t° on a well-named code α-equivalent to t;
- An execution is a (potentially empty) sequence of transitions t₀° ⇝*_M s from an initial state obtained by compiling an (initial) term t₀;
- A state s is reachable if it can be obtained as the end state of an execution;
- A state s is final if it is reachable and no transitions apply to s;
- A machine comes with a map • from states to terms, called decoding, that on initial states is the inverse (up to α) of compilation, i.e. the decoding of t° is t for any term t;
- A machine M has a set of β-transitions, whose union is noted ⇝_β, that are meant to be mapped to β-redexes by the decoding, while the remaining overhead transitions, denoted by ⇝_o, are mapped to equalities;
- We use |ρ| for the length of an execution ρ, and |ρ|_β for the number of β-transitions in ρ.
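The well-naming condition and compilation can be sketched in Python as follows. This is an illustration under our own assumptions: since t° is fixed only up to α, compilation is realized here by renaming every binder to a fresh name (one possible choice among many), and all helper names are hypothetical. Well-namedness is checked literally: for every sub-code λx.u, all occurrences of the name x in the whole code must lie inside λx.u.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def subterms(t):
    """All sub-codes of t, t included."""
    yield t
    if isinstance(t, Lam):
        yield from subterms(t.body)
    elif isinstance(t, App):
        yield from subterms(t.fun)
        yield from subterms(t.arg)


def occurrences(t, x):
    """Occurrences of the name x in t, binding occurrences included."""
    if isinstance(t, Var):
        return int(t.name == x)
    if isinstance(t, Lam):
        return int(t.var == x) + occurrences(t.body, x)
    return occurrences(t.fun, x) + occurrences(t.arg, x)


def well_named(t):
    """For every sub-code λx.u of t, x occurs only inside λx.u (its binder or u)."""
    return all(occurrences(t, s.var) == occurrences(s, s.var)
               for s in subterms(t) if isinstance(s, Lam))


def names(t):
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return {t.var} | names(t.body)
    return names(t.fun) | names(t.arg)


def compile_code(t):
    """A well-named code alpha-equivalent to t: every binder gets a fresh name."""
    taken = names(t)

    def fresh(base):
        k = 0
        while f"{base}{k}" in taken:
            k += 1
        taken.add(f"{base}{k}")
        return f"{base}{k}"

    def go(s, ren):
        if isinstance(s, Var):
            return Var(ren.get(s.name, s.name))
        if isinstance(s, Lam):
            new = fresh(s.var)
            return Lam(new, go(s.body, {**ren, s.var: new}))
        return App(go(s.fun, ren), go(s.arg, ren))

    return go(t, {})


def de_bruijn(t, bound=()):
    """Nameless form, used to compare codes up to alpha-equivalence."""
    if isinstance(t, Var):
        return ("bound", bound.index(t.name)) if t.name in bound else ("free", t.name)
    if isinstance(t, Lam):
        return ("lam", de_bruijn(t.body, (t.var,) + bound))
    return ("app", de_bruijn(t.fun, bound), de_bruijn(t.arg, bound))
```

For instance, (λx.x)x and (λx.x)(λx.x) are not well-named, while compiling the second one yields (λx0.x0)(λx1.x1), which is.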
Implementations.For every machine one has to prove that it correctly implements the strategy in the λ-calculus it was conceived for.Our notion, tuned towards complexity analyses, requires a perfect match between the number of β-steps of the strategy and the number of β-transitions of the machine execution.

Definition 7 (Machine Implementation).
A machine M implements a strategy → on λ-terms via a decoding • when, given a λ-term t, the following holds: 1. Executions to Derivations: for any M-execution ρ : t
Theorem 9 (Sufficient Condition for Implementations). Let (M, →, •) be an implementation system. Then, M implements → via •.
The proof of Thm. 9 is a clean and abstract generalization of the concrete reasoning already at work in [5,2,3,4]. A machine is reasonable if its complexity is polynomial in |t₀| and |ρ|_β, and it is efficient if it is linear in both parameters. So, a strategy is reasonable (resp. efficient) if there is a reasonable (resp. efficient) machine implementing it. In Sect. 4-5 we study a reasonable machine implementing right-to-left evaluation →_rβf in λ_fire, thus showing that it is a reasonable strategy. In Sect. 6 we optimize the machine to make it efficient. By Prop. 2, this implies that every strategy in λ_fire is efficient.
Recipe for Complexity Analyses. For complexity analyses on a machine M, overhead transitions ⇝_o are further separated into two classes:
1. Substitution Transitions ⇝_s: they are in charge of the substitution process;
2. Commutative Transitions ⇝_c: they are in charge of searching for the next β or substitution redex to reduce.
Then, the estimation of the complexity of a machine is done in three steps:
1. Number of Transitions: bounding the length of the execution ρ, by bounding the number of overhead transitions. This part splits into two subparts: i. Substitution vs β: bounding the number |ρ|_s of substitution transitions in ρ using the number of β-transitions; ii. Commutative vs Substitution: bounding the number |ρ|_c of commutative transitions in ρ using the size of the input and |ρ|_s; the latter, by the previous point, induces a bound with respect to β-transitions.
[Fig. 2. Easy GLAMOUr machine: data-structures (stacks π, dumps D, global env. E, states s), unfolding t↓E, decoding (stacks are decoded to contexts in postfix notation for plugging, i.e. we write t π rather than π⟨t⟩), and transitions. In ⇝_s, (λy.u)^α is any well-named code α-equivalent to λy.u such that its bound names are fresh with respect to those in D, π and E₁[x←λy.u@ε]E₂.]
2. Cost of Single Transitions: bounding the cost of concretely implementing each transition, going below the level of the machine while making some (high-level) assumptions on how codes and data-structures are concretely represented. Commutative transitions are designed on purpose to have constant cost. Each substitution transition has a cost linear in the size of the initial term, thanks to an invariant (to be proved) ensuring that only subterms of the initial term are duplicated and substituted along an execution. Each β-transition has a cost either constant or linear in the input.
3. Complexity of the Overhead: obtaining the total bound by composing the first two points, that is, by taking the number of each kind of transition times the cost of implementing it, and summing over all kinds of transitions.
(Linear) Logical Reading. Let us mention that our partitioning of transitions into β, substitution, and commutative ones admits a proof-theoretical view, as machine transitions can be seen as cut-elimination steps [11,5]. Commutative transitions correspond to commutative cases, while β and substitution transitions are principal cases. Moreover, in linear logic the β-transition corresponds to the multiplicative case, while the substitution transition corresponds to the exponential one. See [5] for more details.

Easy GLAMOUr
In this section we introduce the Easy GLAMOUr, a simplified version of the GLAMOUr machine from [2]: unlike the latter, the Easy GLAMOUr does not need any labeling of codes to provide a reasonable implementation.
With respect to the literature on abstract machines for CbV, our machines are unusual in two respects. First, and more importantly, they use a single global environment instead of closures and local environments. Global environments are used in a minority of works [17,24,16,5,2,6,3] and induce simpler, more abstract machines where α-equivalence is pushed to the meta-level (in the operation t^α in ⇝_s in Fig. 2-3). This on-the-fly α-renaming is harmless with respect to complexity analyses; see also the discussions in [5,4]. Second, argument stacks contain pairs of a code and a stack, to implement some of the machine transitions in constant time.
Background. GLAMOUr stands for Useful (i.e. optimized to be reasonable) Open (reducing open terms) Global (using a single global environment) LAM, and LAM stands for Leroy Abstract Machine, an ordinary machine implementing right-to-left Closed CbV, defined in [5]. In [2] the study of the GLAMOUr was done according to the distillation approach of [5], i.e. by decoding the machine towards a λ-calculus with explicit substitutions. Here we do not follow the distillation approach: we decode directly to λ_fire, which is simpler.
Machine Components. The Easy GLAMOUr is defined in Fig. 2. A machine state s is a quadruple (D, t, π, E) given by:
- Code t: a term not considered up to α-equivalence, which is why it is over-lined;
- Argument Stack π: it contains the arguments of the current code. Note that stack items φ are pairs x@π and λx.u@ε. These pairs allow some of the transitions to be implemented in constant time. The pair x@π codes the term x π (defined in Fig. 2; the decoding is explained below) that would be obtained by putting x in the context obtained by decoding the argument stack π. The pair λx.u@ε is used to inject abstractions into pairs, so that items φ can be uniformly seen as pairs t@π of a code t and a stack π.
- Dump D: a second stack that, together with the argument stack π, is used to walk through the code and search for the next redex to reduce. The dump is extended with an entry t♦π every time evaluation enters the right subterm u of an application tu. The entry saves the left part t of the application and the current stack π, to restore them when the evaluation of the right subterm u is over. The dump D and the stack π decode to an evaluation context.
- Global Environment E: a list of explicit (i.e. delayed) substitutions storing the substitutions generated by the redexes encountered so far. It is used to implement micro-step evaluation (i.e. the substitution for one variable occurrence at a time). We write E(x) = ⊥ if in E there are no entries of the form [x←φ].
Transitions.In the Easy GLAMOUr there is one β-transition whereas overhead transitions are divided up into substitution and commutative transitions.
- β-Transition ⇝_β: it morally fires a →_rβf-redex, the one corresponding to (λx.t)φ, except that it puts a new delayed substitution [x←φ] in the environment instead of doing the meta-level substitution t{x←φ} of the argument in the body of the abstraction;
- Substitution Transition ⇝_s: it substitutes the variable occurrence under evaluation with a (properly α-renamed copy of a) code from the environment. It is a micro-step variant of meta-level substitution. It is invisible on λ_fire because the decoding produces the term obtained by meta-level substitution, and so the micro work done by ⇝_s cannot be observed at the coarser granularity of λ_fire.
- Commutative Transitions ⇝_c: they locate and expose the next redex according to the right-to-left strategy, by rearranging the data-structures. They are invisible on the calculus. The commutative rule ⇝_c1 forces evaluation to be right-to-left on applications: the machine processes first the right subterm u, saving the left subterm t on the dump together with its current stack π. The role of ⇝_c2 and ⇝_c3 is to backtrack to the entry on top of the dump. When the right subterm, i.e. the pair t@π of current code and stack, is finally in normal form, it is pushed on the stack and the machine backtracks.
O for Open: note the condition E(x) = ⊥ in ⇝_c3; that is how the Easy GLAMOUr handles open terms. U for Useful: note the condition E(x) = y@π in ⇝_c3; inert terms are never substituted, according to Lemma 6. Removing the useful side-condition, one recovers Grégoire and Leroy's machine [20]. Note that terms substituted by ⇝_s are always abstractions and never variables; this fact will play a role in Sect. 6.
Garbage Collection: it is here simply ignored or, more precisely, it is encapsulated at the meta-level, in the decoding function. It is well known that this is harmless for the study of time complexity.
Compiling, Decoding and Invariants. A term t is compiled to the machine initial state t° = (ε, t, ε, ε), whose code is a well-named term α-equivalent to t. Conversely, every machine state s decodes to a term (see the top right part of Fig. 2) having the shape C_s⟨t↓E⟩, where t↓E is a λ-term, obtained by applying to the code the meta-level substitution ↓E induced by the global environment E, and C_s is an evaluation context, obtained by decoding the stack π and the dump D and then applying ↓E. Note that, to improve readability, stacks are decoded to contexts in postfix notation for plugging, i.e. we write t π rather than π⟨t⟩, because π is a context that puts arguments in front of t.
Example 10. To give a glimpse of how the Easy GLAMOUr works, let us show how it implements the derivation t := (λz.z(yz))λx.x →²_rβf y λx.x of Ex. 4. Note that the initial state is the compilation of the term t, the final state decodes to the term y λx.x, and the two β-transitions in the execution correspond to the two →_rβf-steps in the derivation considered in Ex. 4.
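Example 10 can be replayed with a minimal Python sketch of the Easy GLAMOUr. The transitions below are our reconstruction from the prose descriptions of ⇝_c1, ⇝_c2, ⇝_c3, ⇝_β and ⇝_s given above, not a transcription of Fig. 2, and all names are hypothetical. A Python dict plays the role of the random-access global environment, and the decoding uses naive substitution, which we assume is safe because the machine keeps codes well-named.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


_fresh = count()


def rename(t, ren=None):
    """An alpha-equivalent copy of t with fresh bound names (the t^α of ⇝_s)."""
    ren = {} if ren is None else ren
    if isinstance(t, Var):
        return Var(ren.get(t.name, t.name))
    if isinstance(t, Lam):
        new = f"{t.var}_{next(_fresh)}"  # assumes no free variable is spelled like this
        return Lam(new, rename(t.body, {**ren, t.var: new}))
    return App(rename(t.fun, ren), rename(t.arg, ren))


def run_easy_glamour(t0):
    """Run from the initial state (ε, t0, ε, ε) until no transition applies.

    Stack items φ are pairs (code, stack); stacks are tuples, top first.
    The dump is a list of entries (t, π), top last. The global environment
    is a dict (random access) plus the order of its entries, for decoding.
    """
    dump, code, stack, env, order = [], rename(t0), (), {}, []
    counts = {"beta": 0, "subst": 0, "comm": 0}
    while True:
        if isinstance(code, App):  # c1: save t♦π on the dump, go right
            dump.append((code.fun, stack))
            code, stack = code.arg, ()
            counts["comm"] += 1
        elif isinstance(code, Lam) and stack:  # β: delay the substitution [x<-φ]
            env[code.var] = stack[0]
            order.append(code.var)
            code, stack = code.body, stack[1:]
            counts["beta"] += 1
        elif (isinstance(code, Var) and code.name in env
              and isinstance(env[code.name][0], Lam)):  # s: copy an abstraction
            code = rename(env[code.name][0])
            counts["subst"] += 1
        elif dump:  # c2 (λ, empty stack) or c3 (variable unbound or bound to y@π)
            fun, pi = dump.pop()
            code, stack = fun, ((code, stack),) + pi
            counts["comm"] += 1
        else:  # final state
            return (dump, code, stack, env, order), counts


def subst(t, x, u):
    """Naive t{x<-u}; no capture arises because the codes are kept well-named."""
    if isinstance(t, Var):
        return u if t.name == x else t
    if isinstance(t, Lam):
        return t if t.var == x else Lam(t.var, subst(t.body, x, u))
    return App(subst(t.fun, x, u), subst(t.arg, x, u))


def plug(t, stack):
    """Decode t π: apply t to the decoded stack items, in order."""
    for code, pi in stack:
        t = App(t, plug(code, pi))
    return t


def decode(state):
    dump, code, stack, env, order = state
    t = plug(code, stack)
    for fun, pi in reversed(dump):  # innermost dump entry first
        t = plug(App(fun, t), pi)
    for x in reversed(order):  # unfold E, newest entry first
        t = subst(t, x, plug(*env[x]))
    return t
```

On t := (λz.z(yz))λx.x this sketch performs 2 β-transitions, 2 substitution transitions and 6 commutative ones, and its final state decodes to y λx.x up to α. The final code is a variable bound to the inert pair y@(λx.x@ε), which ⇝_s refuses to substitute.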
The study of the Easy GLAMOUr machine relies on the following invariants.
Implementation Theorem. The invariants are used to prove the implementation theorem, by showing that the hypotheses of Thm. 9 hold, that is, that the Easy GLAMOUr, →_rβf and • form an implementation system.

Complexity Analysis of the Easy GLAMOUr
The analysis of the Easy GLAMOUr is done according to the recipe given at the end of Sect. 3. The result (see Thm. 17 below) is that the Easy GLAMOUr is linear in the number |ρ|_β of β-steps/transitions and quadratic in the size |t₀| of the initial term t₀, i.e. its overhead has complexity O((1 + |ρ|_β) · |t₀|²). The analysis relies on a quantitative invariant, the crucial subterm invariant, ensuring that ⇝_s duplicates only subterms of the initial term, so that the cost of duplications is connected to one of the two parameters for complexity analyses.
Lemma 13 (Subterm Invariant). Let ρ : t₀° ⇝* (D, t, π, E) be an Easy GLAMOUr execution. Every subterm λx.u of D, t, π, or E is a subterm of t₀.
Intuition About Complexity Bounds. The number |ρ|_s of substitution transitions ⇝_s depends on both parameters for complexity analyses, the number |ρ|_β of β-transitions and the size |t₀| of the initial term. Dependency on |ρ|_β is standard, and appears in every machine [12,24,5,2,6,3]: sometimes it is quadratic, here it is linear; in Sect. 6 we come back to this point. Dependency on |t₀| is also always present, but usually only for the cost of a single ⇝_s transition, since only subterms of t₀ are duplicated, as ensured by the subterm invariant. For the Easy GLAMOUr, instead, the number of ⇝_s transitions also depends, linearly, on |t₀|: this is a side-effect of dealing with open terms. Since both the cost and the number of ⇝_s transitions depend on |t₀|, the dependency is quadratic.
The following family of terms shows the dependency on |t₀| in isolation (i.e., with no dependency on |ρ|_β). Let r_n := λx.(...((y x)x)...)x, where x occurs n times, and consider the term u_n of equation (1). Forgetting about commutative transitions, the Easy GLAMOUr would evaluate u_n with one β-transition ⇝_β and n substitution transitions ⇝_s, each one duplicating r_n, whose size (as well as the size of the initial term u_n) is linear in n.
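The quadratic dependency can be measured with a stripped-down Python sketch of the Easy GLAMOUr that only counts transitions and copied code. Since equation (1) is not included in this section, the sketch uses a hypothetical stand-in of our own, u_n := r_n r_n, which is not necessarily the paper's u_n; the transitions are our reconstruction from the prose, and size counts constructors.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


_fresh = count()


def rename(t, ren=None):
    """An alpha-equivalent copy of t with fresh bound names."""
    ren = {} if ren is None else ren
    if isinstance(t, Var):
        return Var(ren.get(t.name, t.name))
    if isinstance(t, Lam):
        new = f"{t.var}_{next(_fresh)}"
        return Lam(new, rename(t.body, {**ren, t.var: new}))
    return App(rename(t.fun, ren), rename(t.arg, ren))


def size(t):
    if isinstance(t, Var):
        return 1
    if isinstance(t, Lam):
        return 1 + size(t.body)
    return 1 + size(t.fun) + size(t.arg)


def count_substitutions(t0):
    """Easy GLAMOUr sketch: return (#β, #s, total size of the code copied by s)."""
    dump, code, stack, env = [], rename(t0), (), {}
    beta = n_subst = copied = 0
    while True:
        if isinstance(code, App):  # c1
            dump.append((code.fun, stack))
            code, stack = code.arg, ()
        elif isinstance(code, Lam) and stack:  # β
            env[code.var] = stack[0]
            code, stack = code.body, stack[1:]
            beta += 1
        elif (isinstance(code, Var) and code.name in env
              and isinstance(env[code.name][0], Lam)):  # s, even with an empty stack
            code = rename(env[code.name][0])
            n_subst += 1
            copied += size(code)
        elif dump:  # c2 / c3
            fun, pi = dump.pop()
            code, stack = fun, ((code, stack),) + pi
        else:
            return beta, n_subst, copied


def r(n):
    """r_n = λx.(...((y x) x)...) x, with n occurrences of x."""
    body = Var("y")
    for _ in range(n):
        body = App(body, Var("x"))
    return Lam("x", body)


def u(n):
    """Hypothetical stand-in for u_n: the self-application r_n r_n."""
    return App(r(n), r(n))
```

For every n ≥ 1 this run uses a single β-transition and n substitution transitions, each copying r_n of size 2n + 2, so the copied code totals n(2n + 2) while |u_n| = 4n + 5 grows only linearly. None of these copies is ever applied, which is exactly the waste that substituting abstractions on-demand avoids.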
The number |ρ|_c of commutative transitions ⇝_c is, roughly, linear in the amount of code involved in the evaluation process. This amount is given by the initial code plus the code produced by duplications, which is bounded by the number of substitution transitions times the size of the initial term. The number of commutative transitions is then O((1 + |ρ|_β) · |t₀|²). Since each one has constant cost, this is also a bound on their cost.
Cost of Single Transitions. We need to make some hypotheses on how the Easy GLAMOUr is itself going to be implemented on RAM:
1. Variable (Occurrences) and Environment Entries: a variable is a memory location, a variable occurrence is a reference to it, and an environment entry [x←φ] is the fact that the location associated to x contains φ.
2. Random Access to Global Environments: the environment E can be accessed in O(1) (in ⇝_s) by just following the reference given by the variable occurrence under evaluation, with no need to access E sequentially, thus ignoring its list structure (used only to ease the definition of the decoding).
With these hypotheses it is clear that ⇝_β and the commutative transitions can be implemented in O(1). The substitution transition ⇝_s needs to copy a code from the environment (the renaming t^α) and can be implemented in O(|t₀|), as the subterm to copy is a subterm of t₀ by the subterm invariant (Lemma 13) and the environment can be accessed in O(1).
Summing Up.By putting together the bounds on the number of transitions with the cost of single transitions we obtain the overhead of the machine.
Theorem 17 (Easy GLAMOUr Overhead Bound). Let ρ : t₀° ⇝* s be an Easy GLAMOUr execution. Then ρ is implementable on RAM in O((1 + |ρ|_β) · |t₀|²), i.e. linear in the number of β-transitions (aka the length of the →_rβf-derivation from t₀ implemented by ρ) and quadratic in the size of the initial term t₀.
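The three steps of the recipe combine as follows for the Easy GLAMOUr. This is a sketch: it takes |ρ|_s = O((1 + |ρ|_β) · |t₀|), i.e. linear in both parameters as described in the intuition above, together with the bound on |ρ|_c and the costs of single transitions just stated.

```latex
\begin{align*}
\mathrm{cost}(\rho)
  &= |\rho|_\beta \cdot O(1) \;+\; |\rho|_s \cdot O(|t_0|) \;+\; |\rho|_c \cdot O(1)\\
  &= O(|\rho|_\beta) \;+\; O\big((1+|\rho|_\beta)\cdot|t_0|\big)\cdot O(|t_0|)
     \;+\; O\big((1+|\rho|_\beta)\cdot|t_0|^2\big)\\
  &= O\big((1+|\rho|_\beta)\cdot|t_0|^2\big).
\end{align*}
```

The quadratic factor comes from two sources at once: the number of substitution transitions and the cost of each of them both depend linearly on |t₀|.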

Fast GLAMOUr
In this section we optimize the Easy GLAMOUr, obtaining a machine, the Fast GLAMOUr, whose dependency on the size of the initial term is linear, instead of quadratic, providing a bilinear, and thus optimal, overhead (see Thm. 21 below and compare it with Thm. 17 on the Easy GLAMOUr). We invite the reader to go back to equation (1) in Sect. 5, where the quadratic dependency was explained. Note that in that example the substitutions of r_n do not create βf-redexes, and so they are useless. The Fast GLAMOUr avoids these useless substitutions and implements the example with no substitutions at all.
Optimization: Abstractions On-Demand. The difference between the Easy GLAMOUr and the machines in [2] is that, whenever the former encounters a variable occurrence x bound to an abstraction λy.t in the environment, it replaces x with λy.t, while the latter are more parsimonious. They implement an optimization that we call substituting abstractions on-demand: x is replaced by λy.t only if this is useful to obtain a β-redex, that is, only if the argument stack is non-empty. The Fast GLAMOUr, defined in Fig. 3, upgrades the Easy GLAMOUr with substitutions of abstractions on-demand: note the new side-condition for $\leadsto_{c_3}$ and the non-empty stack in $\leadsto_s$. The Slow GLAMOUr has been omitted for lack of space, because it is slow and involved, as it requires the labeling mechanism of the (Unchaining) GLAMOUr developed in [2]. It is somewhat surprising that the Fast GLAMOUr presented here has the best overhead and is also the easiest to analyze.
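The on-demand side condition can be isolated in a toy Python model. This is a sketch only: it models neither the dump nor the transitions of Fig. 3, just the decision taken when a variable occurrence bound to some value is met with a given argument stack. The names `easy_policy`, `fast_policy`, and `copies_made` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Lam:
    """An abstraction bound in the environment; its body is irrelevant here."""
    var: str
    body: str


@dataclass(frozen=True)
class Inert:
    """Placeholder for a bound value that is not an abstraction."""
    text: str


Encounter = Tuple[object, List[str]]  # (value bound to x, current argument stack)
Policy = Callable[[object, List[str]], bool]


def easy_policy(phi: object, stack: List[str]) -> bool:
    # Easy GLAMOUr: replace x whenever it is bound to an abstraction.
    return isinstance(phi, Lam)


def fast_policy(phi: object, stack: List[str]) -> bool:
    # Fast GLAMOUr: replace x only if a beta-redex follows, i.e. the
    # argument stack is non-empty (abstractions on-demand).
    return isinstance(phi, Lam) and len(stack) > 0


def copies_made(trace: List[Encounter], policy: Policy) -> int:
    """Number of abstraction copies a policy performs along a trace of encounters."""
    return sum(1 for phi, stack in trace if policy(phi, stack))


if __name__ == "__main__":
    I = Lam("z", "z")
    trace = [(I, [])] * 4 + [(I, ["a"]), (Inert("y"), ["a"])]
    print(copies_made(trace, easy_policy), copies_made(trace, fast_policy))
```

On this trace, where the identity is met four times with an empty stack and once with an argument, the easy policy copies it five times and the fast policy only once.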
Abstractions On-Demand: Open CbV is simpler than Strong CbV. We explained that Grégoire and Leroy's machine for Coq as described in [20] is unreasonable. Its actual implementation, on the contrary, does not substitute non-variable inert terms, so it is reasonable for Open CbV. None of the versions, however, substitutes abstractions on-demand (nor, to our knowledge, does any other implementation), despite the fact that this is a necessary optimization in order to have a reasonable implementation of Strong CbV, as we now show. Consider the size exploding family of Prop. 22, from [4], obtained by applying $s_n$ to the identity I := λx.x. The evaluation of $s_n I$ produces $2^n$ non-applied copies of I (in $r_n$), so a strong evaluator not substituting abstractions on-demand must have an exponential overhead. Note that evaluation here is weak, while the $2^n$ copies of I are substituted under abstraction, where weak evaluation never enters: this is why machines for Closed and Open CbV can be reasonable without substituting abstractions on-demand.
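The statement of Prop. 22 is not part of this excerpt, so the Python sketch below uses a hypothetical family of our own that exhibits the same phenomenon (not necessarily the family of [4]): $s_1 := \lambda x.\lambda y.(yxx)$, $s_{n+1} := \lambda x.(s_n(\lambda y.(yxx)))$, $r_0 := I$, $r_{n+1} := \lambda y.(y\,r_n\,r_n)$. A toy weak, right-to-left call-by-value evaluator for closed terms, encoded as tuples, then checks that $s_n I$ reaches $r_n$ in n β-steps while $|s_n I|$ grows linearly and $r_n$ contains $2^n$ copies of I.

```python
from typing import Tuple

Term = Tuple  # ("var", x) | ("lam", x, body) | ("app", t, u), closed terms only


def V(x: str) -> Term:
    return ("var", x)


def L(x: str, b: Term) -> Term:
    return ("lam", x, b)


def A(t: Term, u: Term) -> Term:
    return ("app", t, u)


I = L("x", V("x"))


def yvv(v: Term) -> Term:
    """λy.(y v v): duplicates v under an abstraction."""
    return L("y", A(A(V("y"), v), v))


def s(n: int) -> Term:
    """Hypothetical family: s_1 = λx.λy.(y x x), s_{n+1} = λx.(s_n (λy.(y x x)))."""
    if n == 1:
        return L("x", yvv(V("x")))
    return L("x", A(s(n - 1), yvv(V("x"))))


def r(n: int) -> Term:
    """Expected results: r_0 = I, r_{n+1} = λy.(y r_n r_n)."""
    return I if n == 0 else yvv(r(n - 1))


def subst(t: Term, x: str, v: Term) -> Term:
    """t{x<-v} for a closed value v, so no variable capture can occur."""
    if t[0] == "var":
        return v if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else L(t[1], subst(t[2], x, v))  # stop if x is shadowed
    return A(subst(t[1], x, v), subst(t[2], x, v))


def evaluate(t: Term) -> Tuple[Term, int]:
    """Weak right-to-left CbV evaluation: returns (value, number of beta steps)."""
    if t[0] == "lam":
        return t, 0  # weak: never evaluate under an abstraction
    if t[0] == "var":
        raise ValueError("open term: this toy handles closed terms only")
    arg, k1 = evaluate(t[2])  # argument first
    fun, k2 = evaluate(t[1])  # a closed weak value is an abstraction
    res, k3 = evaluate(subst(fun[2], fun[1], arg))
    return res, k1 + k2 + k3 + 1


def size(t: Term) -> int:
    """Size of the term as a tree (shared subterms are counted once per occurrence)."""
    if t[0] == "var":
        return 1
    if t[0] == "lam":
        return 1 + size(t[2])
    return 1 + size(t[1]) + size(t[2])


def copies_of_I(t: Term) -> int:
    if t == I:
        return 1
    if t[0] == "var":
        return 0
    if t[0] == "lam":
        return copies_of_I(t[2])
    return copies_of_I(t[1]) + copies_of_I(t[2])


if __name__ == "__main__":
    for n in (1, 5, 10):
        value, steps = evaluate(A(s(n), I))
        print(n, steps, size(A(s(n), I)), copies_of_I(value))
```

Python shares the two tuples for $r_n$ inside $r_{n+1}$, so memory stays linear here; `size` and `copies_of_I` count the unfolded tree, which is what an evaluator that physically copies every non-applied abstraction would have to build.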
The Danger of Iterating Open CbV Naively. The size exploding example in Prop. 22 also shows that iterating reasonable machines for Open CbV is subtle, as it may induce unreasonable machines for Strong CbV if done naively. Evaluating Strong CbV by iterating the Easy GLAMOUr (which does not substitute abstractions on-demand) indeed induces an exponential overhead, while iterating the Fast GLAMOUr provides an efficient implementation.
Parameters for Complexity Analyses. By the derivations-to-executions part of the implementation (Point 2 in Def. 7), given a derivation $d : t_0 \to^n u$ there is a shortest execution $\rho : t_0^\bullet \leadsto^*_M s$ such that $\underline{s} = u$. Determining the complexity of a machine M amounts to bounding the complexity of a concrete implementation of ρ on a RAM model, as a function of two fundamental parameters:
1. Input: the size $|t_0|$ of the initial term $t_0$ of the derivation d;
2. β-Steps/Transitions: the length $n = |d|$ of the derivation d, which coincides with the number $|\rho|_\beta$ of β-transitions in ρ by the β-matching requirement for implementations (Point 3 in Def. 7).
by right contexts, a special kind of evaluation contexts defined by $R ::= \langle\cdot\rangle \mid tR \mid Rf$. The next lemma ensures our definition is correct.

Lemma 3 (Properties of $\to_{r\beta_f}$). Let t be a term.
1. Completeness: t has a $\to_{\beta_f}$-redex iff t has a $\to_{r\beta_f}$-redex.
2. Determinism: t has at most one $\to_{r\beta_f}$-redex.

Example 4. Let $t := (\lambda z.z(yz))\lambda x.x$. Then $t \to_{r\beta_f} (\lambda x.x)(y\,\lambda x.x) \to_{r\beta_f} y\,\lambda x.x$, where the final term $y\,\lambda x.x$ is a fireball (and $\beta_f$-normal).

Size Explosion. Fireballs are delicate: they easily explode. The simplest instance of open size explosion (not existing in Closed CbV) is a variation over the famous looping term Ω

2. Derivations to Executions: for every →-derivation $d : t \to^* u$ there exists an M-execution $\rho : t^\bullet \leadsto^*_M s$ such that $\underline{s} = u$.
3. β-Matching: in both previous points the number $|\rho|_\beta$ of β-transitions in ρ is exactly the length |d| of the derivation d, i.e. $|d| = |\rho|_\beta$.

Sufficient Condition for Implementations. The proofs of implementation theorems tend to always follow the same structure, based on a few abstract properties collected here into the notion of implementation system.

Definition 8 (Implementation System). A machine M, a strategy →, and a decoding $\underline{\cdot}$ form an implementation system if the following conditions hold:
1. β-Projection: $s \leadsto_\beta s'$ implies $\underline{s} \to \underline{s'}$;
2. Overhead Transparency: $s \leadsto_o s'$ implies $\underline{s} = \underline{s'}$;
3. Overhead Transitions Terminate: $\leadsto_o$ terminates;
4. Determinism: both M and → are deterministic;
5. Progress: M final states decode to →-normal terms.
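The strategy $\to_{r\beta_f}$ of Lemma 3 and Example 4 can be prototyped directly from the grammar of right contexts. This is a minimal Python sketch, assuming a tuple encoding of open terms and the usual Open CbV grammar of fireballs (inert terms i ::= x | i f, fireballs f ::= λx.t | i), which is not restated in this excerpt; all function names are hypothetical. Substitution is capture-avoiding because terms may be open.

```python
from typing import List, Optional, Tuple

Term = Tuple  # ("var", x) | ("lam", x, body) | ("app", t, u)


def var(x: str) -> Term:
    return ("var", x)


def lam(x: str, body: Term) -> Term:
    return ("lam", x, body)


def app(t: Term, u: Term) -> Term:
    return ("app", t, u)


def is_inert(t: Term) -> bool:
    # assumed grammar: i ::= x | i f
    if t[0] == "var":
        return True
    return t[0] == "app" and is_inert(t[1]) and is_fireball(t[2])


def is_fireball(t: Term) -> bool:
    # assumed grammar: f ::= λx.t | i
    return t[0] == "lam" or is_inert(t)


def free_vars(t: Term) -> set:
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])


def subst(t: Term, x: str, v: Term) -> Term:
    """Capture-avoiding substitution t{x<-v}."""
    if t[0] == "var":
        return v if t[1] == x else t
    if t[0] == "app":
        return app(subst(t[1], x, v), subst(t[2], x, v))
    y, body = t[1], t[2]
    if y == x:
        return t  # x is shadowed by the binder
    if y in free_vars(v):
        avoid = free_vars(v) | free_vars(body) | {x}
        z = y
        while z in avoid:
            z += "'"
        body = subst(body, y, var(z))  # rename the binder before going under it
        y = z
    return lam(y, subst(body, x, v))


def step(t: Term) -> Optional[Term]:
    """One ->_{r beta_f} step following R ::= <.> | t R | R f, or None."""
    if t[0] != "app":
        return None  # weak evaluation: variables and abstractions do not reduce
    fun, arg = t[1], t[2]
    reduced = step(arg)  # context t R: the argument goes first
    if reduced is not None:
        return app(fun, reduced)
    if not is_fireball(arg):
        return None  # defensive check: R f requires a fireball argument
    reduced = step(fun)  # context R f
    if reduced is not None:
        return app(reduced, arg)
    if fun[0] == "lam":  # root beta_f-redex (λx.s) f
        return subst(fun[2], fun[1], arg)
    return None


def derivation(t: Term, limit: int = 1000) -> List[Term]:
    """The ->_{r beta_f} derivation from t, up to `limit` steps."""
    trace = [t]
    while len(trace) <= limit:
        nxt = step(trace[-1])
        if nxt is None:
            return trace
        trace.append(nxt)
    raise RuntimeError("step limit reached (the term may diverge)")


if __name__ == "__main__":
    I = lam("x", var("x"))
    t = app(lam("z", app(var("z"), app(var("y"), var("z")))), I)  # Example 4
    for term in derivation(t):
        print(term)
```

Replaying Example 4 yields the three terms $t$, $(\lambda x.x)(y\,\lambda x.x)$ and $y\,\lambda x.x$, the last of which `is_fireball` accepts.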