Varieties of Cost Functions

Regular cost functions were introduced as a quantitative generalisation of regular languages, retaining many of their equivalent characterisations and decidability properties. For instance, stabilisation monoids play the same role for cost functions as monoids do for regular languages. The purpose of this article is to further extend this algebraic approach by generalising two results on regular languages to cost functions: Eilenberg's varieties theorem and profinite equational characterisations of lattices of regular languages. This opens interesting new perspectives, but the specificities of cost functions introduce difficulties that prevent these generalisations from being straightforward. In contrast, although syntactic algebras can be defined for formal power series over a commutative ring, no such notion is known for series over semirings and in particular over the tropical semiring.


Introduction
Quantitative extensions of regular languages have been studied for over 50 years. Most of them rely on the early work of Schützenberger [25,26,27], who extended Kleene's theorem to formal power series over a semiring. A very nice presentation of this theory can be found in the book of Berstel and Reutenauer [5]. In this setting, weighted automata play the role of automata and weighted logic was introduced as an attempt to generalise Büchi's characterisation of regular languages in monadic second order logic. See the handbook [12] for an overview and further references. However, this theory also suffers from some weaknesses. For instance, the equality problem for rational series with multiplicities in the tropical semiring is undecidable [15], a major difference with the equality problem for regular languages, which is decidable. To overcome this problem and other related questions, Colcombet introduced the notion of regular cost functions [9], another quantitative generalisation of regular languages. Cost functions are formally defined as equivalence classes of power series with coefficients in the semiring N ∪ {∞}. This equivalence does not retain the exact values of the coefficients of the series but measures boundedness in some precise way. Thus cost functions are less general than power series, but are still more general than languages, which can be viewed as cost functions associated with their characteristic functions.

Related work
Toruńczyk [30] also established a link between cost functions and profinite words, using a different approach. More precisely, Toruńczyk identifies a regular cost function with the set of profinite words that are limits of infinite sequences of words over which the function is bounded.
It is also interesting to compare these results to similar results on formal power series. Syntactic algebras of formal power series over a commutative ring were introduced by Reutenauer [22,23], but no such notion is known for semirings. Reutenauer also extended Eilenberg's varieties theorem to power series over a commutative field. However, as shown in [24], equational theory only works for power series over finite fields.
Finally, let us mention two new promising approaches to recognisability, using respectively categories [1,2] and monads [7,8]. For the time being, these two approaches do not seem to apply to cost functions, but we hope our paper will serve as a test bench for future developments of this new point of view.

Regular Cost Functions and Stabilisation Monoids
In this section, we introduce the notions of cost functions and of stabilisation monoids. For a more complete and detailed presentation, the reader is referred to [10]. Let A be a finite alphabet and let F be the set of all functions from A * to N ∪ {∞}. Colcombet [9] introduced the following equivalence relation on F: two elements f and g of F are equivalent (denoted f ≈ g) if, for each subset S of A * , f is bounded on S if and only if g is bounded on S. A cost function is a ≈-class. In practice, cost functions are always represented by one of their representatives in F.

The equivalence relation ≈ behaves well with respect to the operations min and max, defined in the usual way. Indeed for all f, g, h ∈ F, if f ≈ g, then min(f, h) ≈ min (g, h) and max(f, h) ≈ max(g, h) [9]. It follows that the minimum and the maximum of two cost functions are well-defined notions.
Given a word u, let |u| denote the length of u and |u|_a the number of occurrences of the letter a in u. Let us define three functions f, g and h from A* to N ∪ {∞} by setting f(u) = |u|, g(u) = |u|_a and h(u) = 2|u|_a. Then g is equivalent to h and they represent the same cost function, whereas g is not equivalent to f. Indeed g is bounded on b* and f is not since for all n, g(b^n) = 0 and f(b^n) = n.
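The example above can be explored numerically. The following is a minimal sketch, assuming the two-letter alphabet {a, b}; the helper name sup_on and the sample sizes are hypothetical choices made for illustration. A finite sample can only illustrate boundedness, not prove it. The inequality g ≤ h ≤ 2g checked at the end is the reason why g ≈ h: any bound N for g on a set yields the bound 2N for h, and any bound for h is also a bound for g.

```python
import itertools

def f(u):
    """f(u) = |u|, the length of u."""
    return len(u)

def g(u):
    """g(u) = |u|_a, the number of occurrences of the letter a."""
    return u.count("a")

def h(u):
    """h(u) = 2|u|_a."""
    return 2 * u.count("a")

def sup_on(func, words):
    """Largest value of func on a finite sample of words (hypothetical helper)."""
    return max(func(w) for w in words)

N = 50  # sample size, an arbitrary choice
b_star_sample = ["b" * n for n in range(N)]  # the words b^0, ..., b^(N-1)

g_bound = sup_on(g, b_star_sample)  # stays at 0 whatever N is
f_bound = sup_on(f, b_star_sample)  # grows with N

# All words of length < 7 over {a, b}: check g <= h <= 2g pointwise.
short_words = ["".join(p) for k in range(7)
               for p in itertools.product("ab", repeat=k)]
sandwich = all(g(u) <= h(u) <= 2 * g(u) for u in short_words)

print(g_bound, f_bound, sandwich)
```

Increasing N leaves g_bound unchanged while f_bound grows, which mirrors the fact that g is bounded on b* and f is not.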
The characteristic function of a language L on A* is the function χ_L : A* → N ∪ {∞} defined by χ_L(u) = 0 if u ∈ L and ∞ otherwise. The crucial observation that χ_L ≈ χ_L' if and only if L = L' allows one to identify a language with the cost function defined by its characteristic function.
Stabilisation monoids were introduced in [9] in order to extend the classical notion of monoids recognising a language to the setting of cost functions. Recall that an ordered monoid is a set equipped with an associative binary product, a neutral element and an order ≤ compatible with the product, i.e., the conditions x_1 ≤ x_2 and y_1 ≤ y_2 imply x_1 y_1 ≤ x_2 y_2. We let E(M) denote the set of idempotents of a monoid M.
Following [9], we define a stabilisation monoid as an ordered monoid M together with a stabilisation operator ♯ : E(M) → E(M) satisfying the axioms given in [9].
Given two stabilisation monoids M and N, a morphism ϕ from M to N is a monoid morphism which is order-preserving and ♯-preserving: if e ∈ E(M), then ϕ(e♯) = ϕ(e)♯.
Just like finite (ordered) monoids recognise regular languages, finite stabilisation monoids recognise regular cost functions. However, the formal definition of recognition is more involved for cost functions than for languages and relies on the notion of factorisation trees. Let M be a stabilisation monoid and let h : A → M be a function, called the labelling map.
Definition 2.2. Let w = a_1 a_2 · · · a_k be a word of A* where each a_i is a letter. An h-factorisation tree of threshold n for w is a finite tree labelled by the elements of M and such that:
(T_1) the tree has exactly k leaves, labelled by h(a_1), . . . , h(a_k), respectively,
(T_2) each binary node is labelled by the product of its left child's label by its right child's label,
(T_3) if a node has arity > 2, then all its children are labelled by the same idempotent e. If the arity of the node is ≤ n, then the node is labelled by e, otherwise it is labelled by e♯.
A result from [9,29] guarantees the existence of trees of bounded height to evaluate input words. More precisely, for each labelling function h : A → M, there is a positive integer K (= 3|M|) such that for all words w and for all integers n ≥ 3, there is an h-factorisation tree of threshold n for w with height at most K.
We can now give the formal definition of a regular cost function recognised by a finite stabilisation monoid. Recall that a subset D of a partially ordered set is a downset if the conditions t ∈ D and s ≤ t imply s ∈ D. Then S_1 is a stabilisation monoid that can make a distinction between products with no s (that are 1), products containing "few" s (that are s) and products containing "a lot of" s (that are 0 = s♯). The cost function recognised by (S_1, h, I) is the equivalence class of the function u → |u|_a.
For instance the tree from Example 2.3 of height 4 and threshold 5 has root labelled by s♯ = 0 because it is a witness that there are more than 5 occurrences of a in the input word. Conversely, such a factorisation tree of threshold n and height k would have its root labelled by 1 if the input word contains no a, and labelled by s if there are at most n^k occurrences of a. Because k is a fixed constant, these trees can be used to recognise the cost function u → |u|_a, since it is equivalent to u → (|u|_a)^k.
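The evaluation of a factorisation tree can be sketched in a few lines. The code below is an illustration under stated assumptions, not the construction of [9]: S_1 is taken to be the three-element monoid {1, s, 0} in which 1 is neutral, 0 is absorbing and ss = s, with 1♯ = 1 and s♯ = 0♯ = 0; the labelling map sends a to s and every other letter to 1; trees are encoded as nested Python lists whose leaves are letters. The encoding and the function name label are hypothetical.

```python
ONE, S, ZERO = "1", "s", "0"

def mul(x, y):
    """Product in S_1 (assumed): 1 is neutral, 0 is absorbing, s*s = s."""
    if x == ZERO or y == ZERO:
        return ZERO
    if x == ONE:
        return y
    if y == ONE:
        return x
    return S

# Stabilisation on idempotents (every element of S_1 is idempotent).
SHARP = {ONE: ONE, S: ZERO, ZERO: ZERO}

def h(letter):
    """Labelling map (assumed): a -> s, any other letter -> 1."""
    return S if letter == "a" else ONE

def label(tree, n):
    """Root label of a factorisation tree of threshold n.

    A leaf is a one-letter string; an inner node is a list of subtrees.
    """
    if isinstance(tree, str):
        return h(tree)
    labels = [label(child, n) for child in tree]
    if len(labels) < 2:
        raise ValueError("inner nodes need at least two children")
    if len(labels) == 2:
        # binary node: product of the two children
        return mul(labels[0], labels[1])
    # node of arity > 2: children must share one idempotent label e
    e = labels[0]
    if any(x != e for x in labels) or mul(e, e) != e:
        raise ValueError("children must carry the same idempotent")
    return e if len(labels) <= n else SHARP[e]

flat = ["a", "a", "a", "a"]             # four occurrences of a, height 1
nested = [["a", "a", "a"] for _ in range(3)]  # nine occurrences, height 2
print(label(flat, 3), label(flat, 5), label(nested, 3))
```

With threshold 3, the flat tree over four a's evaluates to 0 = s♯, while the nested tree of height 2 accommodates 9 = 3^2 occurrences and still evaluates to s, in line with the n^k bound discussed above.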
Regular cost functions can also be recognised by generalised forms of nondeterministic finite automata, regular expressions or monadic second-order logical formulas. See [11] for a complete introduction. Moreover, every regular cost function f has a unique syntactic stabilisation monoid M , in the sense that:

Stabilisation Algebras
The goal of the present work is to study algebraic properties of stabilisation monoids and cost functions. In particular, we would like to define regular cost functions as particular subsets of a free stabilisation monoid. However, since in a stabilisation monoid, the ♯-operator is only defined on idempotents, the notion of a free stabilisation monoid cannot be defined directly and requires the introduction of a new algebraic structure, in which idempotents are directly defined in the signature of the algebra: stabilisation algebras. Given a countable set of variables X, let T(X) be the free term algebra of signature {·, ω, ♯, 1} over X. An identity over T(X) is an equation of the form s ≤ t, where s and t are terms of T(X).
A finite stabilisation monoid M satisfies the identity s ≤ t if the equation holds for any instantiation of the variables by elements of M, where 1 is interpreted as the neutral element of M, ω is interpreted as the idempotent power in M, and ♯ is replaced with ω♯ (to guarantee that ♯ is only applied to idempotents). A finite stabilisation monoid M satisfies the identity s = t if it satisfies the identities s ≤ t and t ≤ s.
We can now define the structure of a stabilisation algebra in the following way.

Definition 3.1.
A stabilisation algebra is an ordered algebra M with signature 1, ≤, ·, ω, ♯ satisfying the following axioms:
(A_1) all identities that are satisfied by all finite stabilisation monoids,
(A_2) a description of the behaviour of ω on idempotent elements,
(A_3) the three properties expressing that the order is compatible with the operations ·, ω, ♯: x_1 ≤ x_2 and y_1 ≤ y_2 imply x_1 y_1 ≤ x_2 y_2, and x ≤ y implies x^ω ≤ y^ω and x♯ ≤ y♯.
In particular, (A_1) implies that a stabilisation algebra is a monoid with neutral element 1. A morphism between two stabilisation algebras is a monoid morphism which is order-preserving, ω-preserving and ♯-preserving. Let M and N be two stabilisation algebras. Then Recall that in a finite monoid, every element x has a unique idempotent power, denoted x^ω. This fact allows one to identify finite stabilisation monoids and finite stabilisation algebras. This follows from a general result on ordered algebras mentioned without proof in [6].
A recent result [18] states that the equivalence of two ♯-free terms of T(A) is decidable. Actually, the result is more general and also covers the case of ω−1 powers. However, deciding the equivalence of arbitrary terms in T(A) seems to still be an open problem.
The following theorem shows that F (A) is a free object, by making explicit the corresponding universal property.

Recognisability
We now define the notion of recognisable downsets in the free stabilisation algebra. We will later see how a regular cost function can be identified with a recognisable downset. This will allow us to generalise the classical notions of syntactic congruence and syntactic monoid. We identify terms t ∈ T (A) and their class t ∈ F (A) for more readability.
Let I be a downset of F (A), let M be a stabilisation algebra and let h : is said to be recognisable if it is recognised by some morphism onto a finite stabilisation algebra.

Syntactic congruence and syntactic stabilisation algebra
In other words, a context is a term T (A) with possible occurrences of the free variable x. Given a context C on A and Given a downset I of F (A) and two elements t and s of F (A), we write that s ∼ I t if for The analog of this property in the framework of stabilisation monoids is given in [16].

Regular Cost Functions Versus Recognisable Downsets
We have seen that regular cost functions are recognised by finite stabilisation monoids and that recognisable downsets are recognised by finite stabilisation algebras. Now, Proposition 3.2 shows that finite stabilisation algebras correspond exactly to finite stabilisation monoids. These results indicate that regular cost functions and recognisable downsets are closely related.
One can make this relation a bijection as follows. Let f be a regular cost function and let M be its syntactic stabilisation monoid. Let also (h, I) be the unique pair (where h : A → M is a labelling function and I is a downset of M ) such that f is recognised by (M, h, I).

Varieties
We now generalise the notion of varieties of regular languages and some proofs from [13,19]. Varieties of downsets generalise positive varieties of languages [19], as there is no complementation for downsets.
Example 5.1. A recognisable downset I is aperiodic if for all t ∈ F(A), the relation t^ω ∼_I t^ω t holds. It is not too difficult to show that aperiodic downsets form a variety of recognisable downsets.
We now define varieties of stabilisation algebras.

Definition 5.2. A variety of finite stabilisation algebras is a class of finite stabilisation algebras closed under taking stabilisation subalgebras, quotients and finite products.
Notice that this notion is often called pseudovarieties in the literature, as opposed to Birkhoff varieties which are also closed under arbitrary products.

Let V be a variety generated by a set S of finite stabilisation algebras, and M be a finite stabilisation algebra. Then M ∈ V if and only if M divides a finite product of elements of S.
Given a variety V of finite stabilisation algebras, let 𝒱(A) denote the set of recognisable downsets over A whose syntactic stabilisation algebra belongs to V. The correspondence V → 𝒱 associates with each variety of finite stabilisation algebras a class of recognisable downsets.
Thus, each variety of recognisable downsets 𝒱 is associated to the variety of finite stabilisation algebras V generated by the syntactic stabilisation algebras of downsets in 𝒱. This defines a correspondence 𝒱 → V. The analog of the ordered version of Eilenberg's theorem can now be stated as follows:
Theorem 5.5. The correspondences V → 𝒱 and 𝒱 → V define mutually inverse bijective correspondences between varieties of finite stabilisation algebras and varieties of recognisable downsets.

Profinite Stabilisation Algebra
The free profinite monoid on A, denoted Â*, can be defined as the completion of A* for the profinite metric. See [4,20,21] for more information on this space. We now prove the existence of free profinite stabilisation algebras. Taking the construction of free profinite monoids as a model, we define it as the completion F̂(A) of F(A) for an appropriate metric.
(1) d is an ultrametric distance.
(2) The operations on F(A) are uniformly continuous and thus extend by continuity to F̂(A).
(3) The resulting stabilisation algebra F̂(A) is compact.
The idempotent power. If M is a finite monoid, then for any m ∈ M and n ≥ |M|, we have m^ω = m^{n!} (where ω is the idempotent power). Since finite stabilisation algebras are in particular monoids where ω is the idempotent power, we obtain that for any u ∈ F(A) and n > 0, d(u^{n!}, u^ω) ≤ 2^{−n}. Therefore, for any element u ∈ F(A), the sequence (u^{n!})_{n∈N} converges in F̂(A) to u^ω.
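The identity m^ω = m^{n!} for n ≥ |M| can be checked on a concrete finite monoid. The sketch below uses the multiplicative monoid of integers modulo 12, chosen purely for illustration and not taken from the text; the helper name idempotent_power is hypothetical.

```python
import math

MOD = 12  # the monoid is ({0, ..., 11}, multiplication mod 12), so |M| = 12

def mul(x, y):
    return (x * y) % MOD

def idempotent_power(x):
    """Unique idempotent among the powers x, x^2, x^3, ... of x."""
    p = x
    while mul(p, p) != p:
        p = mul(p, x)
    return p

# Compare x^(n!) with x^omega for every element and for n = |M|, |M|+1, |M|+2.
agree = all(
    pow(m, math.factorial(n), MOD) == idempotent_power(m)
    for m in range(MOD)
    for n in range(MOD, MOD + 3)
)
print(agree)
```

The loop in idempotent_power terminates because, in a finite monoid, some power of each element is idempotent.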
Notice however that this morphism does not coincide with the interpretation of regular cost functions as subsets of Â* as done in [30].
The profinite metric can be relativised to any variety of stabilisation algebras to obtain the so-called pro-V metric. For s, t ∈ F(A) and V a variety of stabilisation algebras, define We now define the pro-V stabilisation algebra F̂_V(A) as the completion of F(A)/∼_V with respect to d_V. As before, we can show that F̂_V(A) is compact and can be equipped with a structure of stabilisation algebra. The following result now follows from general results on profinite algebras.

Duality, Equations and Identities
Stone duality tells us that every bounded distributive lattice L has an associated compact Hausdorff space, called its dual space. The dual space of the Boolean algebra of all regular languages of A * [3] is the free profinite monoid on A.
A similar result holds for the lattice of regular cost functions, which, by Proposition 4.3, is isomorphic to the lattice of recognisable downsets under union and intersection.

Equations of Lattices
It is shown in [14] that any lattice of regular languages can be defined by a set of equations of the form u → v, where u and v are profinite words. This result can also be extended to recognisable downsets.
Let u, v ∈ F̂(A). We say that a recognisable downset I of F(A) satisfies the equation u → v if u ∈ Ī implies v ∈ Ī, where Ī denotes the topological closure of I.
A set L of recognisable downsets is defined by a set E of equations if the following property holds: a recognisable downset belongs to L if and only if it satisfies all the equations of E. We can now state our second main result.

Theorem 7.2. A set of recognisable downsets of F (A) is a lattice of recognisable downsets if and only if it is defined by a set of equations of the form u → v.
The case of lattices of languages closed under quotients was also considered in [14]. The corresponding notion for lattices of downsets is to be closed under contexts.

Identities of Varieties
Condition (V 2 ) of the definition of a variety allows one to use identities instead of equations.
Let B be an alphabet and let u and v be two elements of F̂(B). We say that a recognisable downset I of F(A) satisfies the profinite identity u ≤ v if, for each morphism γ : We use the term identity because, in this case, each letter of B can be replaced (through the morphism γ) by any element of F(A).
In practice, it is more convenient to use the following characterisation. Let I be a recognisable downset and let M be its syntactic stabilisation algebra. Then For instance, the variety of regular cost functions defined by x^ω = x^{ω+1} and x♯ = x^ω contains only the characteristic functions of star-free languages [28].

Aperiodic Cost Functions
The variety of aperiodic cost functions is defined by the identity x^ω = x^{ω+1}. It contains recognisable downsets that are not languages, like u → |u|_a. This variety has a nice connection with the logics CFO and CLTL, first introduced in [16,17] as a generalisation to cost functions of the logics FO and LTL on words. Indeed, the results of [16,17] can be reformulated as follows:

Theorem 8.2. The variety of aperiodic cost functions coincides with the variety of CFO-definable cost functions and with the variety of CLTL-definable cost functions.
Note that given a finite stabilisation algebra M, one can effectively test whether it satisfies equations like x^ω = x^{ω+1} or x♯ = x^ω: it suffices to check that the equation holds for each x in M. It follows that one can effectively decide whether a regular cost function is CFO-definable (respectively CLTL-definable).
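Such a test amounts to a loop over the elements of M. Here is a minimal sketch, assuming that a finite stabilisation algebra is given by its list of elements, a product function and a table for ♯ on idempotents; this encoding and all function names are hypothetical. As prescribed for identities, ♯ is applied to x^ω rather than to x. The two sample structures are assumptions made for illustration: S_1 as the monoid {1, s, 0} with ss = s and s♯ = 0, and Z/2Z with trivial stabilisation.

```python
def idempotent_power(x, mul):
    """x^omega: the unique idempotent among the powers of x."""
    p = x
    while mul(p, p) != p:
        p = mul(p, x)
    return p

def satisfies_aperiodicity(elements, mul):
    """Check x^omega = x^(omega+1) for each x."""
    return all(
        mul(idempotent_power(x, mul), x) == idempotent_power(x, mul)
        for x in elements
    )

def satisfies_trivial_sharp(elements, mul, sharp):
    """Check x# = x^omega for each x, reading # as (x^omega)#."""
    return all(
        sharp[idempotent_power(x, mul)] == idempotent_power(x, mul)
        for x in elements
    )

# S_1 (assumed presentation): 1 neutral, 0 absorbing, s*s = s, s# = 0.
def mul_s1(x, y):
    if "0" in (x, y):
        return "0"
    if x == "1":
        return y
    if y == "1":
        return x
    return "s"

S1 = ["1", "s", "0"]
SHARP_S1 = {"1": "1", "s": "0", "0": "0"}

# Z/2Z under addition, with # the identity on its only idempotent 0.
def mul_z2(x, y):
    return (x + y) % 2

Z2 = [0, 1]
SHARP_Z2 = {0: 0}

print(satisfies_aperiodicity(S1, mul_s1),
      satisfies_trivial_sharp(S1, mul_s1, SHARP_S1),
      satisfies_aperiodicity(Z2, mul_z2))
```

On these inputs S_1 passes the aperiodicity test but fails x♯ = x^ω, consistent with u → |u|_a being aperiodic without being a language, whereas Z/2Z fails aperiodicity.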

Temporal Cost Functions
Another interesting example is the class of temporal cost functions, first introduced in [11]. These functions allow one to count the number of occurrences of consecutive events. Many equivalent characterizations of these functions are known. In [11], the algebraic characterization is expressed in terms of the interplay between Green relations and stabilisation in the syntactic monoid, but it can be formulated in terms of equations as follows:
Proof. Let M be the syntactic stabilisation monoid of a regular cost function f. An idempotent e is called stable if e♯ = e. The algebraic characterization from [11] states that f is temporal if and only if an idempotent J-below a stable idempotent different from 1 is itself stable. Recall that the J-order is defined by e ≤_J s if there exist x, y ∈ M such that e = xsy. To show that our set of equations is equivalent to this characterization, it suffices to observe that an element is a stable idempotent if and only if it is of the form s♯ for some s. This means that the characterization from [11] specifies that the idempotents of the form (x s♯ z)^ω, with s♯ ≠ 1, are stable. Using Corollary 3.5, one can now lift these properties to F(A), yielding the equations of the statement.

Commutative Cost Functions
The description of the variety of languages corresponding to commutative monoids is one of the first known examples of Eilenberg's correspondence between varieties of languages and varieties of monoids [13]. We prove below a similar result for cost functions.
Let us say that a finite stabilisation algebra M is commutative if for all x, y ∈ M, we have xy = yx. We will say that M is ♯-commutative if it is commutative and for all x, y ∈ M, x♯ y♯ = (xy)♯. A cost function is called commutative (resp., ♯-commutative) if its syntactic stabilisation algebra is commutative (resp., ♯-commutative). A stabilisation algebra is said to be monogenic if it can be generated by a single element. We will use freely the following useful lemma.
The stabilisation monoid S_1 defined in Example 2.3 has three idempotent elements 0 < a < 1 such that 0♯ = a♯ = 0. It is also the syntactic stabilisation algebra of the function f = u → |u|_a. Let U_1^+ denote the stabilisation monoid with two idempotent elements 0 ≤ 1 such that 0♯ = 0.
Proposition 8.6. Let J_1^+ be the variety of finite stabilisation algebras defined by the equations x ≤ 1, x^2 = x, xy = yx and x♯ y♯ = (xy)♯. Then the corresponding variety of cost functions is generated by the functions u → |u|_a for all letters a.
Proof. Since S_1 is the syntactic stabilisation algebra of the function u → |u|_a, it is equivalent to prove that the variety of finite stabilisation algebras V generated by S_1 is equal to J_1^+. Since S_1 satisfies all the equations of J_1^+, the relation V ⊆ J_1^+ holds. To prove the opposite inclusion, consider a finite stabilisation algebra M of J_1^+. By Lemma 8.5, M divides the product of its monogenic stabilisation subalgebras. But if m ∈ M, the stabilisation algebra generated by m is {1, m, m♯}: indeed, the equations m^2 = m, (m♯)^2 = m♯ and the properties of a stabilisation monoid imply that m m♯ = m♯ m = m♯. Thus, this stabilisation algebra is either {1}, U_1^+ or S_1. Since U_1^+ is a quotient of S_1, M actually divides a product of copies of S_1, and therefore M ∈ V. Thus V = J_1^+.
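Two steps of this proof, namely that S_1 satisfies the equations of J_1^+ and that U_1^+ is a quotient of S_1, are finite checks. The sketch below carries them out under stated assumptions: S_1 is encoded as {0 < a < 1} with 1 neutral, 0 absorbing, aa = a, 0♯ = a♯ = 0 and 1♯ = 1 (the last value is assumed, as it is not given in the text), and U_1^+ is encoded as the restriction of this structure to {0, 1}. The map PHI and all other names are hypothetical.

```python
ONE, A, ZERO = "1", "a", "0"
S1 = [ONE, A, ZERO]
RANK = {ZERO: 0, A: 1, ONE: 2}   # the order 0 < a < 1

def leq(x, y):
    return RANK[x] <= RANK[y]

def mul(x, y):
    """Product (assumed): 1 neutral, 0 absorbing, a*a = a."""
    if x == ZERO or y == ZERO:
        return ZERO
    if x == ONE:
        return y
    if y == ONE:
        return x
    return A

SHARP = {ONE: ONE, A: ZERO, ZERO: ZERO}  # 1# = 1 is an assumption

# Equations of J_1^+; every element is idempotent, so x^omega = x.
in_J1 = (
    all(leq(x, ONE) for x in S1)                          # x <= 1
    and all(mul(x, x) == x for x in S1)                   # x^2 = x
    and all(mul(x, y) == mul(y, x) for x in S1 for y in S1)  # xy = yx
    and all(mul(SHARP[x], SHARP[y]) == SHARP[mul(x, y)]
            for x in S1 for y in S1)                      # x# y# = (xy)#
)

# Candidate quotient map onto U_1^+ = {0, 1}: collapse a onto 0.
PHI = {ONE: ONE, A: ZERO, ZERO: ZERO}
U_SHARP = {ONE: ONE, ZERO: ZERO}

is_quotient = (
    set(PHI.values()) == {ONE, ZERO}                      # surjective
    and all(PHI[mul(x, y)] == mul(PHI[x], PHI[y])
            for x in S1 for y in S1)                      # product-preserving
    and all(leq(PHI[x], PHI[y])
            for x in S1 for y in S1 if leq(x, y))         # order-preserving
    and all(PHI[SHARP[x]] == U_SHARP[PHI[x]] for x in S1)  # #-preserving
)
print(in_J1, is_quotient)
```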
As stated earlier, if we add the equation x♯ = x^ω, we obtain the positive variety of regular languages corresponding to the variety of ordered monoids generated by the ordered monoid U_1^+ (see [19]).
Proposition 8.7. Let Acom be the variety of finite stabilisation algebras defined by the equations x^ω = x^{ω+1}, xy = yx and x♯ y♯ = (xy)♯. Then the corresponding variety of cost functions is generated by the functions u → |u|_a and χ_{L_{a,k}}, where L_{a,k} = {u | |u|_a = k} for each k ≥ 0 and each letter a.
Proposition 8.8. Let Com be the variety of finite stabilisation algebras defined by the equations xy = yx and x♯ y♯ = (xy)♯. Then the corresponding variety of cost functions is generated by the functions u → |u|_a, χ_{L_{a,k}} and χ_{L_{a,k,n}}, where L_{a,k,n} = {u | |u|_a ≡ k mod n}.

Conclusion
We provide a new representation of regular cost functions as downsets of a free stabilisation algebra, an ordered algebraic structure. This new representation allows us to extend Eilenberg's variety theory, in its ordered version: varieties of regular cost functions correspond to varieties of finite stabilisation algebras and are characterised by profinite identities. Furthermore, we also extend the duality approach of [14] to this new setting, leading to profinite equational descriptions of lattices of regular cost functions. Finally, we give several examples of equational characterisations of classes of cost functions related to logic. We also investigate the extensions of commutative languages to regular cost functions. We uncover the role of a new identity, x♯ y♯ = (xy)♯, in the study of these extensions. These results confirm the pertinence and the usefulness of the theory of regular cost functions as a well-behaved quantitative generalisation of regular languages. They also open new perspectives for the study of cost functions.
For instance, it would be interesting to extend other known characterisations of varieties of languages to the setting of cost functions. An emblematic example would be Simon's characterisation of piecewise testable languages.