Polynomial Time Approximation Schemes for Clustering in Low Highway Dimension Graphs

We study clustering problems such as k -Median, k -Means, and Facility Location in graphs of low highway dimension, which is a graph parameter modeling transportation networks. It was previously shown that approximation schemes for these problems exist, which either run in quasi-polynomial time (assuming constant highway dimension) [Feldmann et al. SICOMP 2018] or run in FPT time (parameterized by the number of clusters k , the highway dimension


Introduction
Clustering is a standard optimization task that seeks a "good" partition of a metric space, such that two points that are "close" should be in the same part. A good clustering of a dataset allows to retrieve and exploit data, and is therefore a common routine in data analysis. The underlying data can come from various sources and represent many different objects. In particular, it is often interesting to cluster geographic data. In that case, the metric space can be given by a transportation network, which can be modeled by graphs with low highway dimension.
In this article, we study some popular clustering objectives, namely Facility Location, k-Median, and k-Means, in graphs with constant highway dimension. The two latter problems seek to find a set S of k points called centers in a metric (V, dist) that minimizes the function v∈V (min f ∈S dist(v, f )) p , with p = 1 for k-Median and p = 2 for k-Means. The objective for Facility Location is slightly different: each point f of the metric space has an opening cost w f , and the goal is to find a set S that minimizes f ∈S w f + v∈V min f ∈S dist(v, f ). These problems are APX-hard in general metric spaces [6].
To bypass the hardness of approximation known for these problems, researchers have considered low dimensional input, such as Euclidean spaces of fixed dimension, metrics with bounded doubling dimension, or with bounded genus. Many algorithmic tools were developed for that purpose: in their seminal work, Arora et al. [5] gave the first polynomial time approximation scheme (PTAS) for k-Median in R 2 , which generalizes to a quasi-polynomial time approximation scheme (QPTAS) in R d for fixed d. This result was generalized by Talwar [23], who gave a QPTAS for metrics with bounded doubling dimension, and more recently by Cohen-Addad et al. [12], who gave a near-linear time approximation scheme.
In this work we focus on transportation networks, for which it can be argued that metric spaces with bounded doubling dimension are not a suitable model: for instance, hub-and-spoke networks seen in air traffic networks do not have low doubling dimension. Therefore we study graphs with constant highway dimension, which formalize structural properties of such networks. The following definition is taken from Feldmann et al. [17]. Here the ball β v (r) of radius r around v ∈ V is the set of all vertices at distance at most r from v. Definition 1. The highway dimension of a graph G is the smallest integer h such that, for some universal constant c > 4, for every r ∈ R + , and every ball β v (cr) of radius cr, there are at most h vertices in β v (cr) hitting all shortest paths of length more than r that lie in β v (cr).
For this class of graphs, the only known approximation algorithms for clustering that compute (1 + ε)-approximations for any ε > 0 either run in quasi-polynomial time, i.e., QPTASs [17], or with runtime f (h, k, ε) · n for some exponential function f , i.e., parameterized approximation schemes [8,10]. Thus an open problem is to identify polynomial-time approximation schemes (PTASs) for clustering in graphs of constant highway dimension.

Our results
Our main result is a PTAS for clustering problems on graphs of constant highway dimension. For convenience, we define slightly more general problems than those stated above. The k-Clustering q problem is defined as follows. An instance I consists of a metric (V, dist), a set of facilities (or centers) F ⊆ V , and a demand function χ : V → N 0 . The goal is to find a set S ⊆ F with |S| ≤ k minimizing v∈V χ(v) · min f ∈S dist(v, f ) q . We call all vertices v ∈ V with χ(v) > 0 the clients of I. k-Median and k-Means are special cases of k-Clustering q , where q = 1 and q = 2.
The input to the Facility Location q problem is the same as for k-Clustering q , but additionally each facility f ∈ F has an opening cost w f ∈ R + . The goal is to find a set S ⊆ F minimizing f ∈S w f + v∈V χ(v) · min f ∈S dist(v, f ) q . Facility Location is a special case of Facility Location q , where q = 1.
Our main theorem is the following, where X = max v∈V χ I (v) is the largest demand (note that for k-Median, k-Means, or Facility Location we typically have X = 1).
Theorem 2. For any ε > 0, a (1+ε)-approximation for k-Clustering q and Facility Location q can be computed in (nX) (hq/ε) O(q) time on graphs of highway dimension h.
In particular, this algorithm is much faster than the quasi-polynomial time approximation scheme of Feldmann et al. [17] for k-Median or Facility Location. The runtime of our algorithm also significantly improves over the exponential dependence on k in the approximation schemes of Becker et al. [8] and Braverman et al. [10] for k-Median.
It has so far been open whether these clustering problems are NP-hard on graphs of constant highway dimension. We complement our main theorem by showing that they are NP-hard even for the smallest possible highway dimension. This answers an open problem given in [17]. Here the uniform Facility Location q problem has unit opening costs for all facilities.
Theorem 3. The k-Clustering q and uniform Facility Location q problems are NP-hard on graphs of highway dimension 1.

Related work
On clustering problems. The problems we focus on in this article are known to be APX-hard, even in Euclidean spaces (see e.g. [6]). In general metric spaces, the current best polynomial-time algorithm for Facility Location achieves a 1.488-approximation [22], while the best approximation factor is 2.67 for k-Median ( [11]) and 6.357 for k-Means [4].
When restricting the class of graphs, a near-linear time approximation scheme for doubling metrics was developed in [12]; we will discuss the close relations between our work and this one in Section 1.3. Local search techniques also yield a PTAS in minor-free graphs or with bounded doubling dimension [13,18], and a Θ(q)-approximation for the k-Clustering q problem in general metric spaces [20].
Another technique for dealing with clustering problems is to compute coresets, a compressed representation of the input. An ε-coreset is a weighted set of points such that for every set of centers, the cost for the original set of points is within a (1 + ε)-factor of the cost for the coreset. Braverman et al. [10] recently proved that graphs with highway dimension h admit coreset of size O((k + h) O(1/ε) ). This enables to compute a (1 + ε)-approximation by enumerating all possible solutions of the coreset. However, this coreset does not have small highway dimension, 1 and thus cannot be used to boost our algorithms.
On highway dimension. The highway dimension was originally defined by Abraham et al. [3], who specifically chose balls of radius 4r in the Definition 1. Since the original definition in [3], several other definitions have been proposed. In particular, Feldmann et al. [17] proved that when choosing a radius cr in Definition 1 for any constant c strictly larger than 4, it is possible to exploit the structure of graphs with constant highway dimension in order to obtain a QPTAS for problems such as TSP, Facility Location, and Steiner Tree. As Abraham et al. [3] point out, the choice of the constant is somewhat arbitrary, and we use the above definition so that we may exploit the structural insights of [17] for our algorithm. These structural properties were also leveraged by Becker et al. [8] who gave a PTAS for the Bounded-Capacity Vehicle Routing problem, and a parameterized approximation scheme for the k-Center problem (which is essentially k-Clustering q with q = ∞) and k-Median. In the lower bound side, Disser et al. [15] showed that Steiner Tree and TSP are weakly NP-hard even when the highway dimension is 1, i.e., each of them is NP-hard but an FPTAS exists.
It is worth mentioning that further definitions of the highway dimension exist (for a detailed discussion see Appendix A and [9,17]). In particular, for a more general definition of the highway dimension than the one of Definition 1, Feldmann [16] gave a parameterized 3/2-approximation algorithm with runtime 2 O(kh log h) n O(1) for k-Center.

Our techniques
To obtain Theorem 2, we rely on the framework recently developed by Cohen-Addad et al. [12] for doubling metrics. More precisely, they show that the split-tree decomposition of Talwar [23] has some interesting properties, and exploit them to design their algorithm. Our main contribution is to provide a decomposition with similar properties in graphs with constant highway dimension. This is done relying on some structural properties of such graphs presented by Feldmann et al. [17]. We start by giving the outline of the algorithm from [12], and then explain how to carry the results over to the highway dimension setting.
On doubling metrics. The starting point of many approximation algorithms for doubling metrics is a decomposition of the metric, as presented in the following lemma. 2 A hierarchical decomposition D of a metric (V, dist) is a set of partitions B 0 , B 1 , . . . , B λ , where B i refines B i+1 , i.e., every part B ∈ B i is contained in some part of B i+1 . Moreover, in B 0 every part contains a singleton vertex, while B λ contains only one part, namely V . For a point v ∈ V and a radius r > 0, we say that the ball β v (r) is cut at level i if i is the largest integer for which the ball β v (r) is not contained in a single part of B i . For any subset W ⊆ V we define λ(W ) = log 2 diam(W ) .
Lemma 4 (Reformulation of [7,23] as found in [12]). For any metric (V, dist) of doubling dimension d and any ρ > 0, there exists a polynomial-time computable randomized hierarchical decomposition D = {B 0 , . . . , B λ(V ) } such that: 1. Scaling probability: for any v ∈ V , radius r, and level i, we have Portal set: every part B ∈ B i where B i ∈ D comes with a set of portals P B ⊆ B that is (a) concise: the size of the portal set is bounded by We sketch briefly the standard use of this decomposition. For clustering problems, one can show that there exists a portal-respecting solution with near-optimal cost (see Talwar [23]). In this structured solution, each client connects to a facility via a portal-respecting path that enters and leaves any part B of D only through a node of the portal set P B . Those portals therefore act as separators of the metric. A standard dynamic program approach can then compute the best portal respecting solution.
To ensure that there is a portal-respecting solution with near-optimal cost, one uses the preciseness property of the portal set: the distortion of connecting a client c with a facility f through portals instead of directly is bounded as follows. Let i be the level at which D cuts c and f , meaning that i is the maximum integer for which c and f lie in different parts of B i . At every level j ≤ i, the distortion incurred by using portals is ρ2 j . Hence the total distortion is j≤i ρ2 j = ρ2 i+1 . Now, property (1) of the decomposition ensures that c and f are cut at level i with probability O(dist(c, f )/2 i ). Hence combining those two bounds over all levels ensures that, in expectation, the distortion between c and f is O(dist(c, f ) · ρλ(V )). Since λ(V ) = O(log n), choosing ρ = ε/ log n gives a distortion of O(ε dist(c, f )). Summing over all clients proves that there exists a near-optimal portal-respecting solution.
The issue with this approach is that the number of needed portals is O(log d n), and the dynamic program has a runtime that is exponential in this number. Thus the time complexity is quasipolynomial. The novelty of [12] is to show how to reduce the number of portals to a constant. The idea is to reduce the number of levels on which a client can be cut from its facility.
For this, they present a processing step of the instance, that helps deal with clients cut from their facility at a high level. Roughly speaking, their algorithm computes a constant factor approximation L, and a client c is called badly-cut if D cuts it from its closest facility of L at a level larger than log(dist(c, L)/ε). Every badly-cut client is moved to its closest facility of L. Moreover, every client at distance less than ε dist(c, L) of its closest facility of L can be moved to it as well. It is then shown that this new instance I D has small distortion, which essentially means that any solution to I D can be converted to a solution of the original instance I while only losing a (1 + ε)-factor in quality. In this instance I D , all clients are cut from their closest facility of L at some level between log(ε dist(c, L)) and log(dist(c, L)/ε). Using this property, it can be shown that c and its closest center in the optimal solution are also cut at a level in that range. As there are only O(log(1/ε)) levels in this range, by the previous argument, the number of portals is a constant. (See Section 2 for formal definitions and lemmas.) On highway dimension. The above arguments for doubling metrics hold thanks to Lemma 4. In this work, we show how to construct a similar decomposition for low highway dimension: Lemma 5. Given a shortest-path metric (V, dist) of a graph with highway dimension h, a subset W ⊆ V , and ρ > 0, there exists a polynomial-time computable randomized hierarchical decomposition D = {B 0 , . . . , B λ(W ) } of W such that: 1. Scaling probability: for any v ∈ V , radius r, and level i, we have Our construction relies on the town decomposition from [17], which has the following properties: for a graph with highway dimension h and a given ρ > 0, every part T of the decomposition (called a town) has a set X T of hubs with doubling dimension O(h log 1/ρ), such that for any two vertices u and v in different child towns of T , there is a vertex x ∈ X T such that dist(u, x) + dist(x, v) ≤ (1 + 2ρ) · dist(u, v) -see Theorem 8 for more details.
This hub set X T is similar to the portal set of Lemma 4, but has some fundamental differences: the first one is that the decomposition is deterministic, and so it may happen that a client and its facility are cut at a very high level -something that happens only with tiny probability in the doubling setting thanks to the scaling probability. Another main difference is that the size of X T might be unbounded. As a consequence, it cannot be directly used as a portal set in a dynamic program. To deal with this, we combine the town decomposition with a hierarchical decomposition of each set X T according to Lemma 4, to build an interface as stated in Lemma 5. A further notable difference to portals is that the preciseness property of the resulting interface is weaker. In particular, while there is a portal close to each vertex of a part, the hubs can be far from some vertices as long as they lie close to the shortest path to other vertices, which however can be far (due to Lemma 9). As a consequence no analog of near-optimal portal-respecting paths exist. Instead, when connecting a client c with a facility f we need to use the interface point of I B provided by the preciseness property of Lemma 5 close to the shortest path between c and f , where B contains both c and f . This shifts the perspective from externally connecting vertices of a part to vertices outside a part, as done for portals, to internally connecting vertices of parts, as done here.
As a consequence, we develop a dynamic program, which follows more or less standard techniques as for instance given in [5,21], but needs to handle the weaker preciseness property of the interface. The main idea is to guess the distances from interface points to facilities while recursing on the decomposition D of Lemma 5. Due to the shifted perspective towards internally connecting vertices of parts, the runtime of the dynamic program depends exponentially on the total number of levels. However, it can be shown that it suffices to compute a solution on a carefully chosen subset W of the metric for which only a logarithmic number of levels of the decomposition need to be considered, and thus the runtime is polynomial.

Outline
After defining the concepts we use, and stating various structural lemmas in Section 2, we show how to incorporate our decomposition into the framework of [12]. The proof of Lemma 5 is then presented in Section 3. The formal algorithm is deferred to Section 4. We conclude the main body of this paper with the hardness proof of Theorem 3 in Section 5.

Preliminaries
On doubling metrics. The doubling dimension of a metric is the smallest integer d such that for any r > 0 and v ∈ V , the ball β v (2r) of radius 2r around v can be covered by at most 2 d balls of half the radius r. A doubling metric is a metric space where the doubling dimension is bounded. In those spaces, one can show the existence of small nets: Definition 6. A δ-net of a metric (V, dist) is a subset of nodes N ⊆ V with the property that every node in V is at distance at most δ from a net point of N , and each pair of net points of N are at distance more than δ. Lemma 7 ([19]). Let (V, dist) be a metric space with doubling dimension d. If its diameter is D, and N is a δ-net of V , then |N | ≤ 2 d· log 2 (D/δ) . Moreover, any subset W ⊆ V has doubling dimension at most 2d.
On highway dimension. We note that for simplicity we will set c = 8 in Definition 1 throughout this paper, even if all claimed results are also true for other values of c. When we refer to a metric as having highway dimension h, we mean that it is the shortest-path metric of a graph of highway dimension h. The main result we will use about highway dimension is existence the of the following decomposition: . Given a shortest-path metric (V, dist) of highway dimension h, and ρ > 0, there exists a polynomial-time computable deterministic hierarchical decomposition T , called the town decomposition, such that every part T ∈ T , called a town, has a set of hubs 3 X T ⊆ T with the following properties: a. doubling: the doubling dimension of X T is d = O(log(h log(1/ρ))), and b. precise: for any two vertices u and v in different child parts of T , there is a vertex x ∈ X T such that dist(u, The town decomposition behaves differently from those in Lemmas 4 and 5 in several ways. The main properties we will need here are the following. On how to incorporate our decomposition into the framework of [12]. Assume we are given an instance I of k-Clustering q or Facility Location q on some metric (V, dist), together with a hierarchical decomposition D of the metric with the properties listed in Lemma 5. We start by defining the badly cut clients. In the following, we fix an optimal solution OPT and an approximate solution L, and we define τ (ε, q, σ) = log 2 (σ(q + 1) q /ε q+1 ).
Definition 10 (badly cut [12]). Let (V, dist) be a metric of an instance I of k-Clustering q or Facility Location q , D be a hierarchical decomposition of the metric with scaling probability factor σ, and ε > 0. If L v is the distance from v to the closest facility of an approximate solution L to I, then a client c is badly cut w.r.t. D if the ball β c (3L c /ε) is cut as some level i greater than Similarly, if OPT v is the distance from v to the closest facility of the optimum solution OPT of I, then a facility f ∈ L is badly cut w.r.t. D if β f (3OPT f ) is cut at some level i greater than log 2 (3OPT f ) + τ (ε, q, σ).
Given an instance I of k-Clustering q or Facility Location q and a decomposition D of the metric, a new instance I D is computed to get rid of badly cut clients. The instance I D is built from I by moving clients that are badly cut w.r.t. D to their closest facility in L. 4 For any client c of I D we denote byc the original position of this client in I, i.e., ifc is a badly cut client of I then c = L(c) and otherwise c =c. The set F of potential centers in unchanged, and thus any solution of I is a solution of I D , and vice versa. Note that I D does not contain any badly cut client w.r.t. D, and that the definition of I D depends on the randomness of D.
To describe the properties we obtain for the new instance, given a solution S to any instance I 0 of k-Clustering q or Facility Location q , we define cost I 0 (S) = v∈V χ I 0 (v) · dist(v, S) q to be the cost incurred by only the distances to the facilities. Given some ε > 0 and the computed instance I D from I, we define If B D denotes the set of badly cut facilities (w.r.t D) of the solution L to I from which instance I D is constructed, we say that I D has small distortion w.r.t. I if ν I D ≤ ε cost I (L), and there exists a witness solutionŜ ⊆ F that contains B D and for which cost I D (Ŝ) ≤ (1 + O(ε)) cost I (OPT) + O(ε) cost I (L). Moreover, in the case of Facility Location q , Based on these definitions, we now state the main tool we use from [12], and which exploits the scaling probability of our decomposition in Lemma 5 to obtain the required structure. Lemma 11 ([12]). Let (V, dist) be a metric, and D be a randomized hierarchical decomposition of (V, dist) with scaling probability factor σ. Let I be an instance of k-Clustering q or Facility Location q on (V, dist), with optimum solution OPT and approximate solution L. For any (sufficiently small) ε > 0, with probability at least 1 − ε (over D), the instance I D constructed from I and L as descibed above has small distortion with a witness solutionŜ. Furthermore, every client c of I D is cut by D from its closest facility inŜ at level at most log 2 (3Lc/ε + 4OPTc) + τ (ε, q, σ), wherec is the original position of c in I.
As a consequence of Lemma 11, a dynamic program can compute a solution recursively on the parts of D in polynomial time, as sketched in Section 1.3 and detailed in Section 4.

Decomposing the graph
This section is dedicated to the proof of Figure 1: A town T and its child towns (black circles). The hubs (crosses) are decomposed by X T (indicated by different colours). Parts B ∈ B i+1 (red dashed) are decomposed into parts on level i (pink dashed). Parts of B i−1 can lie in different towns (e.g., the child town of T with subtowns in grey). Lemma 5. The general idea to construct D is as follows. For doubling metrics, to decompose a part at level i, it is enough to pick a random diameter δ ∈ [2 i−2 , 2 i−1 ) and divide the part into child parts of diameter δ. This is not doable in the highway dimension setting: if one wishes to decompose a town T , it cannot divide any of the child towns, since it is not possible to use the approximate core hubs of T to approximate paths inside one of the child towns. The big picture of our decomposition is therefore as follow. To decompose a town at level i, we group randomly (as in the doubling decomposition) the "small" child towns, and put every "big" child town in its own subpart. As we will see, this turns out to be enough.
In order to decompose a town T , we need the following definitions. For each child town T of T we identify the connecting hub x ∈ X T , which is some fixed closest hub of X T to T , breaking ties arbitrarily. Moreover, given a hierarchical decomposition X T = {U 0 , . . . , U λ(X T ) } of X T , we define for every i the connecting i-cluster of a child town T of T to be the set U ∈ U on level = min{i, λ(X T )} containing the connecting hub of T . We then follow the steps below, after choosing µ from the interval (0, 1] uniformly at random (cf. Fig. 1): 1. Given a town T ∈ T , we apply Lemma 4 to find a randomized hierarchical decomposition Using X T , we define a randomized partial decomposition of T ∩ W as follows. For any i and U ∈ U min{i,λ(X T )} , let the set A U i ⊆ T ∩ W be the union of all T ∩ W where T is a child town of T with the following two properties: (a) U is the connecting i-cluster of T , and i contains all towns somewhat close to U , and with small diameter due to Lemma 9. We let A T i be the set containing each non-empty A U i .
3. Now, the hierarchical decomposition D = {B 0 , . . . , B λ(W ) } of W can be constructed inductively as follows. At the highest level λ(W ) of D, W is partitioned in a single set: B λ(W ) = {W }. Now, to decompose a part B ∈ B i+1 at level i + 1, we do the following. Let T ∈ T be the inclusion-wise minimal town for which B ⊆ T . The "small" subtowns of T lying inside B are grouped according to step (2) (note that dist(T , V \ T ) also bounds the diameter of T by Lemma 9), and the other ones form individual subparts. More formally, the set B i contains every part A ∈ A T i for which A ⊆ B, and also every set T ∩ W , where T is a child town of T for which T ∩ W ⊆ B and T ∩ W was not covered by the previously added parts of i . To prove that the constructed decomposition D has the desired properties -i.e. that it is indeed a hierarchical decomposition, with parts of bounded diameter and small scaling probability factor -we begin with some auxiliary lemmas, of which the first one bounds the distance of a town to its connecting hub.
Proof. Let T be the closest sibling town to T , and let u ∈ T and v ∈ T be the vertices defining Since the connecting hub x of T is at least as close to T as y, the claim follows.
Based on the above lemma, we next prove a key property that the diameter of any part of B i ∈ D is bounded.
Lemma 13. If ρ ≤ 1/2, then the diameter of any part of B i ∈ D is less than 2 i+4 .
For any level i < λ(W ), a set in B i is equal to a set A ∈ A T i for some town T ∈ T or it is equal to some set T ∩ W for a child town T of T . In the former case, the set A is equal to a set A U i for some cluster U ∈ U where = min{i, λ(X T )} and U ∈ X T . The set A U i contains the union of sets T ∩ W for child towns T of T , for which their connecting hubs lie in U and dist(T , V \ T ) ≤ µ2 i ≤ 2 i , as µ ≤ 1. Thus from Lemma 12 we get dist(U, T ) ≤ (1 + 2ρ)2 i , and by Lemma 9 we have diam(T ) < dist(T , V \ T ) ≤ 2 i . The cluster U has diameter less than 2 i+1 by Lemma 4, since it is part of the hierarchical decomposition X T and lies on level ≤ i. Let u and v be the vertices of We may reach v from u by first crossing the child town T that u lies in, then passing over to U , then crossing U , after which we pass over to the child town T containing v, and finally crossing this child town as well to reach v. Hence, assuming that ρ ≤ 1/2 the diameter of A U i is bounded by Now consider the other case, when a set B ∈ B i on level i < λ(W ) is equal to some set T ∩ W for a child town T of a town T . For such a child town T there is no enforced upper bound on the distance to other child towns as before, and thus it is necessary to be more careful to bound the diameter of the part. Starting with B = B i , let B i ⊆ B i+1 ⊆ . . . ⊆ B j be the longest chain of parts of increasing levels that are of the same type as B. More concretely, for every ∈ {i, i + 1, . . . , j} we have B ∈ B and B is equal to some set T ∩ W for a child town T of the inclusion-wise minimal town T containing B +1 . Note that in particular j < λ(W ). As we chose the longest such chain, on the next level j + 1 there is no such set containing B j , which means that the set B j+1 ∈ B j+1 for which B j ⊆ B j+1 is either equal to a set A ∈ A T j+1 j+1 for some town T j+1 , or j + 1 = λ(W ). In either case, from above we get diam(B j+1 ) ≤ 2 j+4 .
Note that for any ∈ {i, i + 1, . . . , j − 1} the town T is a descendant town of T +1 , since B +1 is contained in T +1 and T is a child town of the inclusion-wise minimal town T containing B +1 . By Theorem 8 and Lemma 9 we thus get diam( The town T j is the inclusion-wise minimal town containing B j+1 , while at the same time the child town T j of T j contains B j . As B j ⊆ B j+1 , this means that B j+1 both contains vertices inside and outside of T j , and so dist( , and putting all these inequalities together we obtain Using Lemma 13 it is not hard to prove the correctness of D, which we turn to next. Proof. To show that each B i−1 is a partition of W , we prove that for a part B ∈ B i included in town T , part B is partitioned into unions of sets T ∩ W for child towns T of T . Indeed, either B = T ∩ W , and properties of the town decomposition ensures that B is partitioned into unions of sets T ∩ W , or B ∈ A T i . By construction of A T i , part B is therefore a union set sets T ∩ W for child towns T of T . Thus in either case, the claim holds. Now, step (3) of the construction decomposes B into groups of child towns restricted to W . Hence B is indeed partitioned by B i−1 , and therefore B i−1 is a partition of W . That concludes the lemma.
We now turn to proving the properties of Lemma 5, starting with the scaling probability. Proof. To prove the claim, we need to prove that for any v ∈ W , radius r, and level i, the probability that D cuts the ball β v (r) at level i is at most is fully contained in a part at level i + 1: let T ∈ T be the inclusion-wise minimal town containing that part. There are two cases to consider: either β v (r) is cut by "small" parts, i.e. there exist two distinct parts A, A ∈ A T i such that v ∈ A and u ∈ A for some u ∈ W ∩β v (r), or not. We start with the latter case, when β v (r) is not cut by small parts. If D cuts the ball at level i, there are distinct parts B, B ∈ B i such that v ∈ B and u ∈ B for some u ∈ W ∩ β v (r). Assume w.l.o.g. that B / ∈ A T i (which is possible to assume since β v (r) is not cut by small parts). By construction of the decomposition, there must be a child town T of T , for which and hence µ ≤ r/2 i . The decomposition D can therefore only cut β v (r) on level i if µ < r/2 i . Since µ is chosen uniformly at random from the interval (0, 1], the probability is less than r/2 i . We now turn to the other case, when β v (r) is cut by two small parts A 1 and A 2 . The town T must have two child towns T 1 and T 2 for which v ∈ T 1 ∩ W ⊆ A 1 and u ∈ T 2 ∩ W ⊆ A 2 . Let x 1 and x 2 be the connecting hubs of T 1 and T 2 . The decomposition D cuts v and u on level i if and only if X T cuts x 1 and x 2 on level = min{i, λ(X T )}. Indeed, let U 1 and U 2 be the connecting i-clusters of T 1 and T 2 : then Thus D cuts v and u on level i if and only if U 1 = U 2 , i.e., if and only if X T cuts x 1 and x 2 on level = min{i, λ(X T )}.
To compute the probability that x 1 and x 2 are cut, it is necessary to bound the distance between them.
By Lemma 12 the distance between T j and its connecting hub x j ∈ X T is thus at most (1 + 2ρ)r. Also, by Lemma We can reformulate the above as follows: if D cuts the ball β v (r) at level i, and β v (r) is cut by some "small" parts A 1 and A 2 , then X T cuts the ball β x 1 ((5 + 4ρ)r) on level i, where x 1 is the hub defined for v above. We know that the probability of the latter event is at most 2 O(d) (5 + 4ρ)r/2 i by Lemma 4, where d = O(log(h log(1/ρ))) is the doubling dimension of X T by Theorem 8. Hence the probability that D cuts the ball β v (r) at level i is bounded by (h log(1/ρ)) O(1) · r/2 i . Taking a union bound over the two considered cases proves the claim.
To prove the remaining property of Lemma 5 for D, for each B ∈ B i we need to choose an interface I B from the whole vertex set V . For this we use a carefully chosen net (see Definition 6) of the hubs of the inclusion-wise minimal town T containing B, as formalized in the following lemma.
The interface I B has the conciseness and preciseness properties of Lemma 5 for ρ ≤ 1/2.
Proof. We first prove that I B is precise. Consider two vertices u, v ∈ B that are cut at level i − 1 by D. This means there are two distinct parts B , B ∈ B i−1 on this level such that v ∈ B and u ∈ B . By definition, both B and B are unions of sets T ∩ W where T is a child town of the inclusion-wise minimal town T containing B. Also B ∩ B = ∅ by Lemma 14. This means that T has two child towns T 1 and T 2 for which v ∈ T 1 ∩ W ⊆ B and u ∈ T 2 ∩ W ⊆ B . By Theorem 8, there is an approximate core hub x ∈ X T such that dist(u, Theorem 8 says that X T has doubling dimension O(log(h log(1/ρ))), and as Y B ⊆ X T the same asymptotic bound holds for the doubling dimension d of Y B by Lemma 7. Therefore we get |I B | ≤ (h log(1/ρ)) O(log(1/ρ)) = (h/ρ) O(1) , which concludes the proof.

The algorithm
Let an instance I of the k-Clustering q or Facility Location q problem on a shortest-path metric (V, dist) of a graph G with highway dimension h, and maximum demand X = max v∈V χ I (v) be given. The algorithm performs the following steps: 1. compute a town decomposition T together with the hubs for each town as given by Theorem 8. 2. compute a hierarchical decomposition D according to Lemma 5, while simultaneously converting I into a coarse instance w.r.t. D, meaning that there is a subset W ⊆ V for which • the clients and facilities of I are contained in W , i.e., F ∪ {v ∈ V | χ I (v) > 0} ⊆ W , and • every part of D on level at most ξ(W ) = λ(W ) − 2 log 2 (nX/ε) has at most one facility, i.e., |B ∩ F | ≤ 1 for every B ∈ B ξ(W ) . 3. compute the instance I D of small distortion as given by Lemma 11. 4. run a dynamic program on I D as given in Section 4.2, to compute an optimum rounded interface-respecting solution (see Section 4.1 for a formal definition), and output it as a solution to the input instance. In a nutshell, the coarseness of the instances guarantees that only a logarithmic number of levels need to be considered by the dynamic program. This step loses a (1 + ε)-factor in the solution quality. The dynamic program is only able to compute highly structured solutions, which are captured by the notion of rounding and interface-respecting. Due to this, another (1 + ε)-factor in the solution quality is lost. In Section 4.1 we prove that the output of the dynamic program is a near-optimal solution to the input instance (proving Theorem 2), and we also detail step (4) of the algorithm. Then in Section 4.2 we describe the details of the dynamic program.

Approximating the distances
One caveat of the dynamic program is that the runtime is only polynomial if the the recursion depth is logarithmic. However when computing our decomposition on the whole metric (V, dist), the number of levels is λ(V ) + 1 = log 2 diam(V ) + 1, which can be linear in the input size. Standard techniques can be used to reduce the number of levels to O(log(n/ε)) when aiming for a (1 + ε)-approximation by preprocessing the input metric. However, for graphs of bounded highway dimension these general techniques change the hub sets and we would have to be careful to maintain the properties we need, as given by Theorem 8. Therefore we adapt the standard techniques to our setting via the notion of coarse instances.
The following lemma shows that we can reduce any instance to a set of coarse ones, for which, as we will see, our dynamic program only needs to consider the highest 2 log 2 (nX/ε) levels.
Lemma 17. Let I be an instance of k-Clustering q or Facility Location q on a graph G of highway dimension h. There are polynomial-time computable instances I 1 , . . . , I b and respective hierarchical decompositions D 1 , . . . , D b with the properties given in Lemma 5 for any ρ ≤ 1/2, such that for each i ∈ {1, . . . , b} the instance I i is also defined on G and is coarse w.r.t. D i . Furthermore, if an α-approximation can be computed for each of the instances I 1 , . . . , I b in polynomial time, then for any ε > 0 a (1 + O(ε))α-approximation can be computed for I in polynomial time.
Proof. Let us first describe the construction of the instances I 1 , . . . , I b . We begin by computing a constant approximation L to the given instance I of k-Clustering q or Facility Location q , using a γ-approximation algorithm as given in [20] where γ ∈ O(1). Let Λ be the value of the objective function of the approximate solution L, i.e., Λ = cost I (L) if I is an instance of k-Clustering q and Λ = cost I (L)+ f ∈L w f in case of Facility Location q . Let Γ be the objective function value of an optimum solution OPT to I. For every client c of I (for which χ I (c) > 0), we have dist(c, OPT) ≤ Γ 1/q ≤ Λ 1/q . Hence if we consider the subgraph of G spanned by all edges of length at most Λ 1/q , then the closest facility of OPT to c lies in the same connected component of the subgraph as c.
Ideally, we would want each of these components to define an instance I i . However, such a component might not have bounded highway dimension and we would thus not be able to compute a hierarchical decomposition using Lemma 5 for it. Instead we use the same input graph G = (V, E), but restrict the client and facility sets to a component. More formally, let W 1 , . . . , W b ⊆ V be the vertex sets of the connected components of the subgraph of G spanned by all edges of length at most Λ 1/q . Note that diam(W i ) ≤ nΛ 1/q . For each i ∈ {1, . . . , b} we define an instance I i on G with χ I i (v) = χ I (v) for every v ∈ W i and χ I i (v) = 0 otherwise. Initially, the facility set F i of I i is F ∩W i , where F is the facility set of I. We still need to coarsen this set F i though, which we do next.
At this point we compute a hierarchical decomposition D i of W i for each I i using G according to Lemma 5, i.e., the interface sets are from V ⊇ W i . To make I i coarse w.r.t. D i , consider a part B ∈ B ξ(W i ) of D i containing facilities from F i . In case of Facility Location q , let f ∈ F i ∩ B be a facility of minimum weight w f among those in F i ∩ B, and in case of k-Clustering q , fix an arbitrary f ∈ F i ∩ B. We call f the representative facility of I i of the facilities in F i ∩ B, and remove all facilities other than f in F i ∩ B from the set F i . We repeat this for every part of B ξ(W ) . Note that W i contains all facilities and clients of I i , i.e., To prove the second part of the lemma, consider the optimum solution OPT to I. We define a solution S * i to each I i , which for each facility in OPT contains the representative facility of I i . Since in case of Facility Location q the representative facility is the one of minimum opening cost in the respective part in B ξ(W i ) of D i and the facility sets of different instances are disjoint, which means that if I is an instance of k-Clustering q then each I i should be an instance of k i -Clustering q where k i = |S * i |. To bound the connection costs, first note that as λ(W i ) = log 2 (diam(W i )) , diam(W i ) ≤ nΛ 1/q , and 1/q < 1 + 1/q ≤ 2 for q ≥ 1 we have ξ(W i ) ≤ λ(W i ) − 2 log 2 (nX/ε) < log 2 (nΛ 1/q ) + 1 + log 2 ε 1+1/q n 1+1/q X 1/q = log 2 ε 1+1/q (Λ/(nX)) 1/q + 1. Now consider any client c of I i and its closest facilityf ∈ OPT in the optimum solution to I, for which we know that c,f ∈ W i . Let f * ∈ S * i be the representative facility off , which lies in the same part B ∈ B ξ(W i ) asf . By Lemma 13 the diameter of B is less than 2 To bound dist(c, f * ) q we need the following fact taken from [14]. Proposition 18 ([14]). Given x, y, q ≥ 0, and 0 < ε < 1/2 we have (x + y) q ≤ (1 + ε) q x q + (1 + 1/ε) q y q .
For constant q ≥ 1 we have (1+ε) q = 1+O(ε) and (1+1/ε) q = O(1/ε q ) as ε tends to zero. Thus the bound of Proposition 18 can be stated as (x+y) q ≤ (1+O(ε))x q +O(1/ε q )y q if q ≥ 1, and we get Using the definition of For Facility Location q , applying an α-approximation algorithm to each instance I i gives respective solutions S i for which As Λ is the objective function value of a γ-approximation to OPT where γ is constant, this means that by taking b i=1 S i as a solution to I we obtain a (1 + O(ε))α-approximation as required. For k-Clustering q we need to do more work, since we do not know the number of facilities k i to be opened in each instance I i . First, for every k ∈ {0, . . . , k} we compute an αapproximation S i (k ) to k -Clustering q on each instance I i , i.e., S i (k ) ⊆ F i and |S i (k )| = k , and define A i (k ) = cost I i (S i (k )) to be its objective function value. Now let A ≤i (k ) be of the minimum value of i j=1 A j (k j ) over all k 1 , . . . , k i where i j=1 k j = k . To compute A ≤i (k ) in polynomial time, we use the following simple recursion. For i = 1 we clearly have A ≤1 (k ) = A 1 (k ), and for i > 1 we have A ≤i (k ) = min{A ≤i−1 (k − k i ) + A i (k i ) | 0 ≤ k i ≤ k }. Note that it takes O(bk 2 ) time to compute all values A ≤i (k ). Finally, for the input instance I we output the union b i=1 S i (k i ) of solutions that obtain the value A ≤b (k). By definition of A ≤b (k) this is a feasibly solution with k facilities, and we have

Lemma 17 implies that if there is a PTAS for coarse instances, we also have a PTAS in general.
Hence from now on we assume that the given instance I is coarse w.r.t. a hierarchical decomposition D of some subset W of the vertices of the input graph G, where D has bounded scaling probability factor and concise and precise interface sets in G according to Lemma 5 (for some value ρ > 0 specified later) The next step of the algorithm is to compute a new instance I D with small distortion as given by Lemma 11. Recall that I D is obtained from I by moving badly cut clients to facilities of L. In particular, the instance I D is also coarse w.r.t. D, which means that we may run our dynamic program on I D .
The dynamic program exploits the interface sets of D by computing a near-optimum "interfacerespecting" solution to I D , i.e., a solution where clients are connected to facilities through interface points. Moreover, for the dynamic program to run in polynomial time it can only estimate the distances between interface points and facilities to a certain precision. In general, we denote by x i = min{(35 + δ)ρ2 i | δ ∈ N and ρδ2 i ≥ x} the value of x rounded to the next multiple of ρ2 i and shifted by 35ρ2 i . We then define the rounded interface-respecting distance dist (v, u) from a vertex v to another vertex u as follows. If v and u are not cut at any level, i.e., v = u, then dist (v, u) = 0. Otherwise, if i ≥ 1 is the level of D such that there is a part B ∈ B i with v, u ∈ B, and D cuts v and u at level i − 1, we let Note that dist (·, ·) does not necessarily fulfill the triangle inequality, and is also not symmetric. We therefore need the bounds of the following lemma.
Proof. Let B ∈ B i be the part on level i containing both v and u. By Lemma 5 there is an interface The second part is obvious if j = i from the definition of dist (v, u). If j ≥ i+1, we use the above bound on dist (v, u) together with the additive shift of the rounding and the triangle inequality of dist(·, ·) to obtain For any non-empty set S of facilities, we define dist (v, S) = min f ∈S {dist (v, S)}, and for empty sets we let dist (v, ∅) = ∞. Analogous to cost I 0 (S), for a solution S to some instance I 0 we also define cost I 0 (S) using dist (·, ·) as We show the following lemma, which translates between cost I D and cost I , and is implied by the preciseness of the interface sets and the fact that I D has small distortion. Recall that the set of facilities is the same in I and I D , i.e., a solution to one of these instances is also a solution to the other. Lemma 20. Let I be an instance of k-Clustering q or Facility Location q with optimum solution OPT and approximate solution L. Let I D be an instance of small distortion for some 0 < ε < 1/2, computed from L and a hierarchical decomposition D with precise interface sets for ρ ≤ ε q+4+1/q 280σ(q+1) q according to Lemma 5. Proof. To show the first inequality, we consider the rounded connection costs of clients to their closest facility inŜ via some interface point. That is, let c be a client of I D and let f ∈Ŝ be its closest facility (according to dist(·, ·)). If c = f , there is a level i ≥ 1 for which D cuts c and f at level i − 1. By Lemma 19 we have dist (c, f ) ≤ dist(c, f ) + 70 · ρ2 i . Also, by Lemma 11 we know that i − 1 ≤ log 2 (3Lc/ε + 4OPTc) + τ (ε, q, σ), where Lc and OPTc are the respective minimum distances from the original positionc of c to L and OPT in I. Hence using the definitions of τ (ε, q, σ) = log 2 (σ(q + 1) q /ε q+1 ) and ρ ≤ ε q+4+1/q 280σ(q+1) q we get If c = f we have dist (c, f ) = 0 = dist(c, f ), and so the above inequality holds trivially.
To bound dist (c, f ) q , we use the bound of Proposition 18, which can be stated as (x + y) q ≤ (1 + O(ε))x q + O(1/ε q )y q if q ≥ 1. Applying this twice to the bound on dist (c, f ) above, we get To bound cost I D (Ŝ) using this inequality we define Lṽ = OPTṽ = 0 for any non-client v of I D , i.e., whenever χ I D (v) = 0, so that applying the definition of χ I D we obtain The next lemma states the properties of the dynamic program that for any coarse instance I 0 computes an optimal rounded interface-respecting solution, which formally is a subset OPT of facilities that minimizes cost I 0 (OPT ) with |OPT | ≤ k for k-Clustering q , while for Facility Location q it minimizes cost I 0 (OPT ) + f ∈OPT w f . This step of the algorithm exploits the conciseness of the interface sets and the coarseness of the instance to bound the runtime. We prove the following lemma in Section 4.2.
Lemma 21. Let I 0 be an instance of k-Clustering q or Facility Location q that for some ε > 0 is coarse w.r.t. a hierarchical decomposition D with concise interface sets for some 1/2 ≥ ρ > 0 according to Lemma 5. An optimum rounded interface-respecting solution for I 0 can be computed in (nX/ε) (h/ρ) O(1) time.
We are now ready to put together the above lemmas to prove Theorem 2.
Proof Theorem 2. Given an instance of k-Clustering q or Facility Location q we first apply Lemma 17 to make the instance coarse. Lemma 17 also supplies a hierarchical decomposition D with the properties given in Lemma 5. We use this together with a constant approximation L of the coarse instance I to compute a new instance I D with small distortion via Lemma 11. On this instance we apply Lemma 21 to compute an optimum rounded interface-respecting solution OPT in (nX/ε) (h/ρ) O(1) time. Since the facility sets of I and I D are the same, we may output OPT for I, which can then be converted into a solution of the non-coarse input instance using Lemma 17 while only losing a (1 + O(ε))-factor in the objective function. Hence it suffices to show that OPT is a (1 + O(ε))-approximation to I.
, also in this case OPT is a (1+O(ε))-approximation to OPT.
Next we bound the runtime. According to Lemma 20 we need to set ρ ≤ ε q+4+1/q 280σ(q+1) q , while according to Lemma 5 the the scaling probability factor is σ = (h log(1/ρ)) O (1) . Note that the bound on ρ depends on σ and vice versa, which means that we need to be careful when determining a value for ρ respecting the bound from Lemma 20. In particular, substituting σ in the bound and rearranging we get ρ log O(1) (1/ρ) ≤ ε q+4+1/q 280(q+1) q h O(1) . Note that for any constant c and any value x > 0, setting ρ = ) ≤ x, where the last inequality holds if 2 log c (1/x) ≤ 1/x. Since there exists some constant positive upper bound on x for which the latter inequality is true, and ε q+4+1/q 280(q+1) q h O(1) tends to zero, we can find a value such that the inequality of Lemma 20 is fulfilled. The runtime of the dynamic program then is All other steps of the algorithm run in polynomial time, and so the claimed runtime follows.

The dynamic program (proof of Lemma 21)
We describe the algorithm for k-Clustering q , and only mention in the end how to modify the algorithm to compute a solution for Facility Location q .
The solution is computed by a dynamic program recursing on the decomposition D. Let W be the vertex set that D decomposes, and which contains all clients and facilities of the coarse instance I. Roughly speaking, the table of the dynamic program will have an entry for every part B ∈ B i of D on all levels i ≥ ξ(W ), for which it will estimate the distance from each interface point on all higher levels j ≥ i + 1 to the closest facility of the optimum solution. That is, ifB ∈ B j is a higher-level part for which B ⊆B, then the distances from all interface points IB to facilities of the solution inB will be estimated.
Here the estimation happens in two ways. First off, the distances to facilities outside of B have to be guessed. That is, there is an external distance function d + j that assigns a distance to each interface point of IB, anticipating the distance from such a point to the closest facility ofB, if this facility lies outside of B. In order to verify whether the guess was correct, each entry for a part B on level i also provides an internal distance function d − j , which stores the distance from each interface point of IB on level j ≥ i + 1 to the closest facility, if the facility is guessed to lie inside of B.
The other way in which distances are estimated concerns the preciseness with which they are stored. The distance functions d + j and d − j will only take rounded values x j where 0 < x ≤ 2 j+5 , or ∞ if no facility at the appropriate distance exists. In particular, if the facility of the solution inB that is closest to p ∈ IB lies outside of B then d − j (p) = ∞, and if it lies inside of B then d + j (p) = ∞. If there is no facility of the solution inB then both distance functions d + j and d − j are set to ∞ for all p ∈ IB. Note that this means that at least one of d + j (p) and d − j (p) is always set to ∞. Note also that the finite values in the domains of the distance functions admit to store the rounded distance to any facility inB on level j, since the diameter ofB is at most 2 j+4 by Lemma 13, and the distance from any p ∈ IB toB is at most (1 + 2ρ) diam(B) by Lemma 16, i.e., for any f ∈B ∩ F we have dist(p, f ) ≤ (1 + 2ρ)2 j+4 ≤ 2 j+5 using ρ ≤ 1/2.
Additionally, each entry comes with an integer k ∈ {0, . . . , k}, which is a guess on the number of facilities that the optimum solution contains in B.
In an entry T [B, k , j=i+1 ] we store the rounded interface-respecting cost of connecting the clients of B to facilities that adhere to the distance functions. More concretely, let S ⊆ F ∩ B be any subset of facilities in B. We say that S is compatible with an entry T [B, k , j=i+1 ] if |S| = k , and for any j ≥ i + 1 the values of the distance functions for every interface point p ∈ I j B are set to either • d − j (p) = dist(p, S) j and d + j (p) = ∞, or • d + j (p) ≤ dist(p, S) j and d − j (p) = ∞. Recall that dist(v, ∅) = ∞, and so the empty set S = ∅ is compatible with an entry j=i+1 ] if k = 0, and the values of all internal distance functions are set to ∞. Over all sets j=i+1 ] for B ∈ B i , the entry should store the minimum value of C B (S), which is defined as If there is no compatible set S ⊆ F ∩ B for the entry, then On the highest level i = λ(W ), there are no distance functions to adhere to on levels j ≥ i + 1, and thus any set S ⊆ W of facilities is compatible with the entry for B = W and k = |S|. Furthermore, cost I 0 (S) is equal to C W (S), since W contains all clients and facilities of the coarse instance I 0 . In particular, the entry of T for which k = k and B = W , will contain the objective function value of the optimum rounded interface-respecting solution to I 0 . Hence if we can compute the table T we can also output the optimum rounded interface-respecting solution via this entry.
Computing the table. We begin with a part B ∈ B ξ(W ) on the lowest considered level ξ(W ), for which we know that B contains at most one facility, as I 0 is coarse. If B contains no facility, then only S = ∅ can be compatible with the entry T [B, k , j=i+1 ] and computing the value of the entry is straightforward given the definition of C B (S), where all incompatible entries are set to ∞. If B contains one facility f , then any compatible set S is either empty or only contains f . We can thus check whether either of the two options is compatible with the entry T [B, k , j=i+1 ] by checking if k is set to 0 or 1, respectively, and checking that all values of the internal distance function are set correctly. Thereafter we can again use the definition of C B (S) to compute the values for both possible sets S and store them in the respective compatible entries. All incompatible entries are set to ∞.
Now fix a part B ∈ B i that lies on a level i > ξ(W ). We show how to compute all entries j=i+1 ] for all values k and distance functions. By induction we have already computed the correct values of all entries of T for parts B ∈ B i−1 where B ⊆ B. We order these parts arbitrarily, so that B 1 , . . . , B b are the parts of B i−1 contained in B. We then define an auxiliary tablê T that is similar to the table T , but should compute the best compatible facility set in the union B ≤ = h=1 B h of the first subparts of B. Accordingly,T has an entry for each union of parts B ≤ , each k ∈ {0, . . . , k}, and distance functions d Here, naturally, I i B = I B , i.e., the entry also takes the interface set of B into account.
Analogous to before, a set S ⊆ F ∩ B ≤ of facilities in the union is compatible with an entrŷ j=i ] if |S| = k , and for any j ≥ i the values of the distance functions for every interface point p ∈ I j B are set to either j=i ], the entry should store the minimum value ofĈ ≤ (S), which is defined aŝ j=i ] = ∞. To compute T using the auxiliary tableT , note that since j=i ] for some internal distance function d − i on level i. Furthermore, if d + i (p) = ∞ for all p ∈ I i , then C B (S) =Ĉ ≤b (S) for such a set S. Therefore we can easily compute the entry j=i+1 ] fromT by setting Computing the auxiliary table. Also computing an entry ofT for B ≤1 is easy using the entries of T for B 1 , since B 1 = B ≤1 and so (taking the index shift of i into account) we havê To compute entries ofT for some B ≤ where ≥ 2, we combine entries of table T for B with entries of tableT for B ≤ −1 . However we will only combine entries with distance functions that imply compatible solutions. More concretely, we say that distance functions j=i for B ≤ −1 are consistent if for every level j ≥ i and p ∈ I j B we have one of The algorithm now considers all sets of consistent distance functions to compute an entrŷ j=i are consistent (1) We now prove the correctness using two lemmas. The following lemma implies that if we only consider consistent distance functions to compute entries recursively, then the entries will store values for compatible solutions.
j=i for B ≤ −1 be consistent distance functions, and let S 1 = B ∩ F and S 2 = B ≤ −1 ∩ F be facility sets. If S 1 is compatible Proof. To prove compatibility of S with the entryT [B ≤ , |S|, j=i ], it suffices to show that the distance functions are set correctly. Fix a level j ≥ i and an interface point p ∈ I j B . There are three cases to consider, according to the definition of consistency of the distance functions. In the first case, all three internal distance functions are set to ∞, and all external distance functions are set to the same value. In particular, since S 1 and S 2 are compatible with their respective entries, In the second case, β − j (p) = δ + j (p) = ∞ and so β + j (p) ≤ dist(p, S 2 ) j , since S 2 is compatible with its entry, and δ − j (p) = dist(p, S 1 ) j , since S 1 is compatible with its entry. Since we also have β + j (p) = δ − j (p) we get dist(p, S 1 ) j ≤ dist(p, S 2 ) j , and hence dist(p, S) j = dist(p, S 1 ) j . Consistency furthermore implies d − j (p) = δ − j (p) = dist(p, S) j and d + j (p) = ∞. The third case is analogous to the second, and therefore S is compatible with its entry.
For the second part, we consider the contributions of vertices to the termsĈ ≤ (S), C B (S 1 ), andĈ ≤ −1 (S 2 ), and show that they are the same forĈ ≤ (S) and for C B (S 1 ) +Ĉ ≤ −1 (S 2 ). For this we first fix a vertex v ∈ B ≤ −1 , and in the following distinguish the cases where its contribution tô C ≤ −1 (S 2 ) andĈ ≤ (S) is due to a facility or an interface point.
The first case is that dist (v, S 2 ) ≤ min j≥i, p∈I j B {dist(v, p) + β + j (p)}, i.e., the contribution of v toĈ −1 (S 2 ) is given by a facility of S 2 . Note that the consistency of the distance functions always implies that β + j (p) = d + j (p) or d + j (p) = ∞ for any level j ≥ i and interface point p ∈ I j , and so min j≥i, p∈I j {dist(v, p)+d + j (p)}, i.e., the contribution of v toĈ (S) is also given by a facility of S in this case. Thus to show that the contribution of v toĈ −1 (S 2 ) andĈ (S) is the same, we need to show that dist (v, Thus the following proves the claim, using that the contribution of v toĈ (S) is given by a facility of S.
. The latter inequality implies that the value of dist (v, S) is obtained for some facility f ∈ S 1 ⊆ B . In particular, v ∈ B ≤ −1 and f ∈ B are cut at level i − 1, and so there is an interface point p ∈ I i B such that dist (v, S) = dist(v, p) + dist(p, f ) i , and f is the closest facility to p in S, i.e, dist(p, S) i = dist(p, f ) i . Using the former of the assumed , and so we can conclude that dist(p, f ) i < β + i (p). Using the inequality of the premise of the claim, we also get dist(v, p) In the latter case we would have d + i (p) ≤ dist(p, S) i = dist(p, f ) i < β + i (p), which however cannot happen if the distance functions are consistent. Thus compatibility of S implies d + i (p) = ∞ and d − i (p) = dist(p, f ) i . In particular, we can conclude that d − i (p) has a finite value (as f exists) and β + i (p) differs from d − i (p). This can only mean that the third of the consistency properties applies to p at level i, and so β − i (p) = d − i (p) = dist(p, f ) i . In particular, also β − i (p) has a finite value, and using the compatibility of S 2 with entrŷ j=i ], we can conclude that there exists a facility f ∈ which is a contradiction to dist (v, S) < dist (v, S 2 ).
The next case we consider is that min j≥i, p∈I j B {dist(v, p)+d + j (p)} < dist (v, S), i.e., the contribution of v toĈ (S) is given by an interface point. As observed before, we have min j≥i, p∈I j S 2 ), i.e. in this case the contribution of v toĈ −1 (S 2 ) is also given by an interface point. Note that it also implies dist (v, S) > min j≥i, p∈I j B {dist(v, p)+β + j (p)}, and thus the following claim shows that the contribution of v toĈ (S) andĈ −1 (S 2 ) is the same. {dist(v, p) +d + j (p)}. As observed before, the consistency of the distance functions always implies β + j (p) = d + j (p) or d + j (p) = ∞, and thus we must have min j≥i, p∈I j Let j ≥ i and p ∈ I j B be the level and interface point for which the minimum of the former term of this inequality is obtained. The inequality then implies β + j (p) < d + j (p) for this particular point p and level j, which can only be the case if β + j (p) < ∞ and d + j (p) = ∞. The values of β + j (p) and d + j (p) can only differ if the second of the consistency properties applies to p at level j, and so β + . Let j ≤ i be the level for which v ∈ B ≤ −1 and f ∈ B ≤ are cut at level j − 1 by D. By Lemma 19 we have dist (v, f ) ≤ dist(v, p)+ dist(p, f ) j , since j ≤ j and the part B ∈ B i containing v and f is itself contained in some partB ∈ B j with v, f ∈B and p ∈ IB. But then, However the last term is equal to min j≥i, p∈I j B {dist(v, p) + β + j (p)}, which gives a contradiction to our premise dist (v, S) > min j≥i, p∈I j So far we considered the case when the contribution of v toĈ −1 (S 2 ) is given by a facility, or when the contribution of v toĈ (S) is given by an interface point. Thus the last case we consider is when the contribution of v toĈ −1 (S 2 ) is given by an interface point, and the contribution of v toĈ (S) is given by a facility, i.e., min j≥i, p∈I j {dist(v, p) + d + j (p)}, which however contradicts our assumption to the contrary, i.e., that the contribution of v toĈ (S) is given by a facility. Hence we must instead have dist (v, S) ≤ min j≥i, p∈I j B {dist(v, p) + β + j (p)}. According to Claim 23, our assumption that dist (v, S) ≤ min j≥i, p∈I j In the former case, together with our assumption that the contribution of v toĈ −1 (S 2 ) is given by an interface point, we would get dist (v, S) > min j≥i, p∈I j B {dist(v, p) + β + j (p)}, for which we saw above that this leads to a contradiction via Claim 24. Hence we are left with the other implication of Claim 23, i.e., dist (v, S) ≥ min j≥i, p∈I j B {dist(v, p) + β + j (p)}. This together with our conclusion from above, i.e., dist (v, S) ≤ min j≥i, p∈I j B {dist(v, p) + β + j (p)}, means that the contribution of v toĈ (S) and C −1 (S 2 ) is the same.
By analogous arguments, the contribution of any v ∈ B to C B (S 1 ) is the same as its contribution toĈ ≤ (S). Since B and B ≤ −1 partition the set B ≤ , this means thatĈ ≤ (S) = C B (S 1 ) +Ĉ ≤ −1 (S 2 ), as required.
The next lemma implies that the compatible facility set minimizingĈ ≤ (S) is considered as a solution when recursing over consistent distance functions.
j=i ], and let S 1 = S ∩ B and S 2 = S ∩ B ≤ −1 . Then there exist distance functions (δ + j , δ − j ) j=i are consistent, and Proof. Consider any interface point p ∈ I j B on some level j ≥ i. Since S is compatible with and obtain the first case of the consistency properties. Observe that this also implies the compatibility property of S 1 and S 2 for p. Now assume that d + j (p) = ∞ and dist(p, gives the second case of the consistency properties. Again, this also implies the compatibility property of S 1 and S 2 for p. The remaining case when d + j (p) = ∞ and dist(p, S 1 ) j > dist(p, S 2 ) j is analogous. Here we may set δ + this gives the third case of the consistency properties, and also implies the compatibility property of S 1 and S 2 for p, which concludes the proof.
To argue that the algorithm sets the value ofT [B ≤ , k , j=i ] correctly via (1), consider a set S ⊆ B ≤ that is compatible with this entry and minimizesĈ ≤ (S). By induction, Lemma 25 j=i ] ≤ C B (S 1 ) +Ĉ ≤ −1 (S 2 ). By Lemma 22 only compatible sets are stored in an entry by induction, and j=i ] =Ĉ ≤ (S), as required.
Bounding the runtime. To bound the size of the tables T andT , note that since there are λ(W ) − ξ(W ) + 1 ≤ 2 log 2 (nX/ε) + 2 considered levels i, and each level B i of D is a partition of W where |W | ≤ n, there are at most O(n log(nX/ε)) parts B considered by T in total. The other tableT considers the same number of parts, since a set B ≤ can be uniquely mapped to the part B . The number of possible values for k is k + 1 = O(n). The domain { x j | 0 < x ≤ 2 j+5 } ∪ {∞} of a distance function for level j has at most 2 j+5 /(ρ2 j ) + 1 = O(1/ρ) values, since x j rounds a value to a multiple of ρ2 j . The conciseness of the interface sets means that |I j B | ≤ (h/ρ) O(1) according to Lemma 5. Hence there are at most O(1/ρ) (h/ρ) O(1) = 2 (h/ρ) O(1) possible distance functions. Since each entry of the table stores two distance functions for each of at most 2 log 2 (nX/ε) + 2 levels, the total number of entries of T andT is at most O(n log(nX/ε)) · n · (2 (h/ρ) O(1) ) O(log(nX/ε)) = (nX/ε) (h/ρ) O(1) .
The Facility Location q problem. To compute an optimum rounded interface-respecting solution to Facility Location q , the tables T andT can ignore the number of open facilities k , i.e., they have respective entries j=i ]. Accordingly, compatibility of facility sets with entries is defined as before, but ignoring the sizes of the sets. The value stored in each entry now also takes the opening costs of facilities into account. That is, for any set of facilities S ⊆ F ∩ B in a part B we define j=i+1 ] stores the minimum value of C B (S) over all sets S compatible with the entry, or ∞ if no such set exists. For S ⊆ F ∩ B ≤ in a union of subparts B ≤ we definê j=i ] stores the minimum value ofĈ ≤ (S) over all sets S compatible with the entry, or ∞ if no such set exists.
The entries of the tables can be computed in the same manner as before, but ingoring the sets sizes. In particular, the most involved recursion becomeŝ j=i are consistent .
Note that if S 1 = B ∩F and S 2 = B ≤ −1 ∩F then these two sets are disjoint, and so f ∈S w f = f ∈S 1 w f + f ∈S 2 w f for the union S = S 1 ∪S 2 . Hence when provingĈ ≤ (S) = C B (S 1 )+Ĉ ≤ −1 (S 2 ) for Lemma 22, we can ignore the facility opening costs, and the proof remains the same as before. All other arguments carry over, and thus an optimum rounded interface-respecting solution for an instance of Facility Location q can also be computed in (nX/ε) (h/ρ) O(1) time.

Hardness for graphs of highway dimension 1
For both k-Clustering q and Facility Location q we present the same reduction from the NPhard satisfiability problem (SAT), in which a boolean formula ϕ in conjunctive normal form is given, and a satisfying assignment of its variables needs to be found.
For a given SAT formula ϕ with k variables and clauses we construct a graph G ϕ as follows. For each variable x we introduce a path P x = (t x , u x , f x ) with two edges of length 1 each. The two endpoints t x and f x are facilities of F and the additional vertex u x is a client, i.e., χ(u x ) = 1.
For each clause C i , where i ∈ [ ], we introduce a vertex v i and add the edge v i t x for each variable x such that C i contains x as a positive literal, and we add the edge v i f x for each x for which C i contains x as a negative literal. Every edge incident to v i has length (11c) i for the constant c > 4 due to Definition 1, and v i is also a client, i.e., χ(v i ) = 1. In case of Facility Location q , every facility f ∈ F has cost w f = 1, i.e., we construct an instance of the uniform version of the problem.
Lemma 26. The constructed graph G ϕ has highway dimension 1.
Proof. Fix a scale r > 0 and let i = log 11c (r/5) + 1 . Note that β w (cr) cannot contain any edge incident to a vertex v j for j ≥ i + 1, since the length of every such edge is (11c) j ≥ 11cr/5 > 2cr and the diameter of β w (cr) is at most 2cr. Thus if β w (cr) contains a vertex v j for j ≥ i + 1, then β w (cr) contains only v j , and there is nothing to prove. Note also that any path in β w (cr) that does not use v i has length at most 2 + i−1 j=1 (2(11c) j + 2), since any such path can contain at most two edges incident to a vertex v j and the paths P x of length 2 are connected only through edges incident to vertices v j . The length of such a path is thus strictly shorter than where the first inequality holds since i ≥ 1 and c > 4. Hence the only paths that need to be hit by hubs on scale r are those passing through v i , which can clearly be done using only one hub, namely v i .
To finish the reduction for k-Clustering q , we claim that there is a satisfying assignment for ϕ if and only if there is a solution for G ϕ with cost at most k + i=1 (11c) iq . If there is a satisfying assignment for ϕ we open each facility t x for variables x that are set to true, and we open each facility f x for variables x that are set to false. This opens exactly k facilities and the cost of the solution is k+ i=1 (11c) iq , since each of the k vertices u x is assigned to either t x or f x at distance 1, and vertex v i is assigned to a vertex t x or f x at distance (11c) i that corresponds to a literal of C i that is true.
Conversely, assume there is a solution to k-Clustering q of cost at most k + i=1 (11c) iq in G ϕ . Note that the minimum distance from any u x to a facility is 1, while the minimum distance from any v i to a facility is (11c) i . Thus any solution must have cost at least k + i=1 (11c) iq , so that the assumed solution must open a facility at minimum distance for each client of G ϕ . In particular, for each variable x, at least one of the facilities t x and f x is opened by the solution. Moreover, as only k facilities can be opened and there are k variables, exactly one of t x and f x is opened for each x. Thus the k-Clustering q solution in G ϕ can be interpreted as an assignment for ϕ, where we set a variable x to true if t x is opened, and we set it to false if f x is opened. Since also for each v i the solution opens a facility at minimum distance, there must be a variable in C i that is set so that its literal in C i is true, i.e., the assignment satisfies ϕ. Thus due to the above lemma bounding the highway dimension of G ϕ , we obtain the Theorem 3 for k-Clustering q .
For Facility Location q we claim that there is a satisfying assignment for ϕ if and only if there is a solution for G ϕ of cost at most 2k + i=1 (11c) iq . In fact the arguments are exactly the same as for k-Clustering q above: if there is a satisfying assignment then a solution for Facility Location q of cost 2k + i=1 (11c) iq exists, by opening the k facilities corresponding to the assignment of cost 1 each. Conversely, any solution has cost at least k + i=1 (11c) iq due to the edge lengths, and at least k facilities need to be opened, one for each variable gadget. This gives a minimum cost of 2k + i=1 (11c) iq , and any such solution corresponds to a satisfying assignment of ϕ. This proves Theorem 3 for uniform Facility Location q .

A On definitions of Highway Dimension
We discuss here the different definitions of highway dimension that were successively proposed. In particular, in a follow-up paper to [3], Abraham et al. [1] define a version of the highway dimension, which implies that the graphs also have bounded doubling dimension. Hence for this definition, the algorithm of Cohen-Addad et al. [12] already is a very efficient approximation scheme in metrics of low highway dimension. Definition 1 on the other hand implies metrics of large doubling dimension as noted by Abraham et al. [3]: a star has highway dimension 1 (by using the center vertex to hit all paths), but its doubling dimension is unbounded. While it may be reasonable to assume that road networks have low doubling dimension (which are the main concern in the works of Abraham et al. [1,2,3]), there are metrics modeling transportation networks, for which it can be argued that the doubling dimension is large, while the highway dimension should be small, and thus rather adhere to Definition 1: in networks arising from public transportation, longer connections are serviced by larger and and sparser stations (such as train stations and airports). More concretely, the so-called hub-and-spoke networks that can typically be seen in air traffic networks is much closer to a star-like network and is unlikely to have bounded doubling dimension, while still having small highway dimension. Thus in these examples it is reasonable to assume that the doubling dimension is a lot larger than the highway dimension.