Oracle complexity classes and local measurements on physical Hamiltonians

The canonical problem for the class Quantum Merlin-Arthur (QMA) is that of estimating ground state energies of local Hamiltonians. Perhaps surprisingly, [Ambainis, CCC 2014] showed that the related, but arguably more natural, problem of simulating local measurements on ground states of local Hamiltonians (APX-SIM) is likely harder than QMA. Indeed, [Ambainis, CCC 2014] showed that APX-SIM is P^QMA[log]-complete, for P^QMA[log] the class of languages decidable by a P machine making a logarithmic number of adaptive queries to a QMA oracle. In this work, we show that APX-SIM is P^QMA[log]-complete even when restricted to more physical Hamiltonians, obtaining as intermediate steps a variety of related complexity-theoretic results. We first give a sequence of results which together yield P^QMA[log]-hardness for APX-SIM on well-motivated Hamiltonians: (1) We show that for NP, StoqMA, and QMA oracles, a logarithmic number of adaptive queries is equivalent to polynomially many parallel queries. These equalities simplify the proofs of our subsequent results. (2) Next, we show that the hardness of APX-SIM is preserved under Hamiltonian simulations (a la [Cubitt, Montanaro, Piddock, 2017]). As a byproduct, we obtain a full complexity classification of APX-SIM, showing it is complete for P, P^||NP, P^||StoqMA, or P^||QMA depending on the Hamiltonians employed. (3) Leveraging the above, we show that APX-SIM is P^QMA[log]-complete for any family of Hamiltonians which can efficiently simulate spatially sparse Hamiltonians, including physically motivated models such as the 2D Heisenberg model. Our second focus considers 1D systems: We show that APX-SIM remains P^QMA[log]-complete even for local Hamiltonians on a 1D line of 8-dimensional qudits. This uses a number of ideas from above, along with replacing the "query Hamiltonian" of [Ambainis, CCC 2014] with a new "sifter" construction.


Introduction
The study of the low-energy states of quantum many-body systems is of fundamental physical interest. Of central focus has been the problem of estimating the ground state energy of a k-local Hamiltonian, unlikely). Just how much more difficult than QMA is P^QMA[log]? Intuitively, the answer is "slightly more difficult". Formally, QMA ⊆ P^QMA[log] ⊆ PP [GY18] (where QMA ⊆ A_0PP ⊆ PP was known [KW00,Vya03,MW05] prior to [GY18]; note the latter containment is strict unless the Polynomial-Time Hierarchy collapses [Vya03]).
From a computer science perspective, there is an interesting relationship between APX-SIM and classical constraint satisfaction problems (CSPs). The QMA-complete problem k-LH is a quantum analogue of the NP-complete problem MAX-k-SAT, in that the energy of a state is minimized by simultaneously satisfying as many of the k-local terms as possible. Classically, one might be asked whether the solution to a MAX-k-SAT instance satisfies some easily verifiable property, such as whether the solution has even Hamming weight; such a problem is P^NP[log]-complete (see, e.g., [Wag88] for a survey). APX-SIM is a quantum analogue of these problems, in which we ask whether an optimal solution (the ground state) satisfies some property (expectation bounds for a specified measurement), and APX-SIM is analogously P^QMA[log]-complete.
High level direction in this work. That APX-SIM is such a natural problem arguably demands that we study its hardness in natural settings. In this regard, the original P^QMA[log]-completeness result [Amb14] was for simulating O(log n)-local observables and O(log n)-local Hamiltonians, where n is the number of qubits the Hamiltonian acts on. From a physical perspective, one wishes to reduce the necessary complexity, such as to O(1)-local observables and Hamiltonians. Hardness under this restriction was subsequently achieved [GY18], for single-qubit observables and 5-local Hamiltonians, by combining the "query Hamiltonian" construction of Ambainis [Amb14] with the circuit-to-Hamiltonian construction of Kitaev [KSV02]. Even arbitrary O(1)-local Hamiltonians, however, may be considered rather artificial in contrast to naturally occurring systems. Ideally, one wishes to make statements along the lines of "simulating measurements on a physical model such as the quantum Heisenberg model on a 2D lattice is harder than QMA", or "simulating measurements on a 1D local Hamiltonian is harder than QMA". This is what we achieve in the current paper. Interestingly, to attain this goal, we first take a complexity theoretic turn into the world of parallel versus adaptive oracle queries.

Parallel versus adaptive queries
A natural question for oracle complexity classes is how the power of the class changes as access to the oracle is varied. In the early 1990s, it was shown [BH91,Hem89,Bei91] that a polynomial number of parallel (non-adaptive) queries to an NP oracle is equivalent in power to a logarithmic number of adaptive queries. Formally, letting P^||NP be the class of languages decidable by a P machine with access to polynomially many parallel queries to an NP oracle, it holds that P^||NP = P^NP[log].
The direction P^C[log] ⊆ P^||C was in fact shown by [Bei91] for all classes C. Briefly, a P machine making a logarithmic number of adaptive queries to a C oracle can make only polynomially many different queries in total, each of which can be computed beforehand in polynomial time by simulating the machine's action given each possible sequence of query answers. The P^||C machine can then simply ask all such queries in parallel. To show the reverse direction, that P^||NP ⊆ P^NP[log], one first performs binary search to determine the total number of YES queries. One then asks whether there exist at least that number of (provably) YES queries such that setting the corresponding query answers to YES causes the original P machine to accept.
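The first direction above (adaptive to parallel) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `next_query` interface, the toy binary-search machine, and the `secret` oracle are hypothetical names introduced here, not part of [Bei91].

```python
from typing import Callable, Hashable, Optional, Set, Tuple

Answers = Tuple[bool, ...]
# Hypothetical interface: maps the answers seen so far to the next query,
# or to None once the machine needs no further queries.
NextQuery = Callable[[Answers], Optional[Hashable]]


def all_possible_queries(next_query: NextQuery, rounds: int) -> Set[Hashable]:
    """Walk the machine's decision tree over every possible answer sequence."""
    queries: Set[Hashable] = set()
    frontier = [()]
    for _ in range(rounds):
        next_frontier = []
        for answers in frontier:
            query = next_query(answers)
            if query is None:
                continue
            queries.add(query)
            next_frontier += [answers + (False,), answers + (True,)]
        frontier = next_frontier
    return queries


def run(next_query: NextQuery, rounds: int, answer: Callable) -> Answers:
    """Run the machine, obtaining each answer from the callable `answer`."""
    answers: Answers = ()
    for _ in range(rounds):
        query = next_query(answers)
        if query is None:
            break
        answers += (bool(answer(query)),)
    return answers


def next_query(answers: Answers) -> Optional[int]:
    """Toy adaptive machine: binary search for a hidden value in [0, 8)."""
    lo, hi = 0, 8
    for ans in answers:
        mid = (lo + hi) // 2
        if ans:
            lo = mid
        else:
            hi = mid
    return None if hi - lo <= 1 else (lo + hi) // 2


secret = 5
oracle = lambda q: secret >= q                 # toy oracle: "is secret >= q?"
queries = all_possible_queries(next_query, 3)  # at most 2**3 - 1 queries
table = {q: oracle(q) for q in queries}        # all queries asked in parallel
adaptive = run(next_query, 3, oracle)
parallel = run(next_query, 3, table.__getitem__)
```

With k adaptive rounds the tree has at most 2^k − 1 nodes, so k = O(log n) gives polynomially many parallel queries, and replaying the machine against the precomputed table reproduces the adaptive run.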
We begin by considering an analogue of this question for P^QMA[log] versus P^||QMA (defined as P^||NP but with a QMA oracle). The direction P^QMA[log] ⊆ P^||QMA proceeds as described above, but, in contrast, the classical technique for showing the reverse direction does not appear to carry over to the quantum setting, specifically to the setting of promise problems. As explored in [GY18], oracles corresponding to classes of promise problems like QMA may receive queries which violate their promise (such as an instance of k-LH with the ground state energy within the promise gap). By definition [Gol06], in such cases the oracle can respond arbitrarily, even changing its answer given repeated queries. Because of the possibility of invalid queries by the P^||QMA machine, the technique of binary search fails. To show P^||QMA ⊆ P^QMA[log], we take a different approach by instead showing a hardness result. Specifically, we use a modification of the P^QMA[log]-hardness construction of [Amb14], for which we require the locality improvements by [GY18], to show that APX-SIM is P^||QMA-hard. Combining with the known fact that APX-SIM ∈ P^QMA[log] [Amb14] then yields the desired containment.
This approach has two benefits:
• First, the use of parallel, rather than adaptive, queries simplifies the "query Hamiltonian" construction of [Amb14] significantly, which we later exploit to prove hardness results about physical Hamiltonians (Theorem 1.6) and 1D Hamiltonians (Theorem 1.10). This can also give a simpler proof of Ambainis's original claim that APX-SIM is P^QMA[log]-hard. Indeed, we generalize this idea to give the statement: and APX-SIM is P^C[log]-complete when restricted to k-local Hamiltonians and observables from F.
(The reason for the form of the expression H_cl + Σ_{i=1}^m |1⟩⟨1|_i ⊗ H_i in Theorem 1.2 will become clear as we introduce the Hamiltonian constructions we use. In short, the expression suffices to encode our construction while still belonging to several interesting families F.) Applying that k-LH is NP-complete, StoqMA-complete, and QMA-complete when restricted to the families of classical, stoquastic, and arbitrary k-local Hamiltonians, respectively [CM16], Theorem 1.2 yields:
Corollary 1.3. P^NP[log] = P^||NP, P^StoqMA[log] = P^||StoqMA, and P^QMA[log] = P^||QMA.
• Second, we base our reduction on the Cook-Levin theorem [Coo72,Lev73], as opposed to Kitaev's circuit-to-Hamiltonian construction [KSV02] as in [GY18]. This allows us to obtain a constant promise gap for the threshold values of the observable A (i.e. b − a ≥ Ω(1), as opposed to b − a ≥ 1/poly), even when ‖A‖ = O(1) (see footnote 2). Further, because the core of this construction is already spatially sparse, it additionally eases proving hardness results about physical Hamiltonians (Theorem 1.6).

Footnote 2: The constant gap is only for the input thresholds a, b for the expectation value of the observable A. The required "low-energy gap" defined by the parameter δ continues to potentially scale as inverse polynomial, i.e. δ ≥ 1/poly, and we note that the spectral gap of the Hamiltonian H may be arbitrarily small in our constructions unless otherwise noted. Because the improved gap corresponds only to the observable, it is unclear how to apply this result to resolve questions concerning Hamiltonians with improved promise gaps, e.g. the Quantum PCP Conjecture. (As a general note, it is worth stressing here that the Quantum PCP conjecture deals with constant promise gaps, not constant spectral gaps of the Hamiltonian.)

The complexity of APX-SIM for physically motivated Hamiltonians
With the simplifications that moving to parallel queries affords us (i.e. working with P^||QMA versus P^QMA[log]), we proceed to study P^||QMA-hardness for physically motivated Hamiltonians. This requires a shift of focus to simulations, in the sense of [CMP18], i.e. analog Hamiltonian simulations.
Recall that Kitaev originally proved QMA-hardness of k-LH for 5-local Hamiltonians [KSV02]; this was brought down to 2-local Hamiltonians via perturbation theory techniques [KR03,KKR06]. Since then, there has been a large body of work (e.g. [OT08,BDL11,CM16,BH17,PM17,PM18]) showing complexity theoretic hardness results for ever simpler systems, much of which uses perturbative gadgets to construct Hamiltonians which have approximately the same ground state energy as a Hamiltonian of an apparently more complicated form. Here, we wish to enable a similarly large number of results for the problem APX-SIM by using the same perturbative gadget constructions and ideas of analogue simulation.
In [CMP18], the authors define a strong notion of simulation which approximately preserves almost all the important properties of a Hamiltonian, including the properties important for the problem k-LH, and they observe that the perturbative gadget constructions used in the k-LH problem literature are examples of this definition of simulation. They go on to show that there exist simple families of Hamiltonians (such as the 2-qubit Heisenberg interaction) which are universal Hamiltonians, in the sense that they can simulate all O(1)-local Hamiltonians efficiently.
How do simulations affect the complexity of APX-SIM? Ideally, we would like to show that efficient simulations lead to reductions between classes of Hamiltonians for the problem APX-SIM. However, this is apparently difficult, as the definition of APX-SIM is not robust to small perturbations in the eigenvalues of the system. We instead consider a closely related, seemingly easier problem which we call ∀-APX-SIM.
Definition 1.4 (∀-APX-SIM(H, A, k, ℓ, a, b, δ)). Given a k-local Hamiltonian H, an ℓ-local observable A, and real numbers a, b, and δ satisfying b − a ≥ n^{−c} and δ ≥ n^{−c′}, for n the number of qubits H acts on and c, c′ > 0 some constants, decide:
• If for all |ψ⟩ satisfying ⟨ψ|H|ψ⟩ ≤ λ(H) + δ, it holds that ⟨ψ|A|ψ⟩ ≤ a, then output YES.
• If for all |ψ⟩ satisfying ⟨ψ|H|ψ⟩ ≤ λ(H) + δ, it holds that ⟨ψ|A|ψ⟩ ≥ b, then output NO.
Above, we have a stronger promise in the YES case than in APX-SIM: namely, all low-energy states |ψ⟩ are promised to satisfy ⟨ψ|A|ψ⟩ ≤ a, as opposed to just a single ground state. Thus, ∀-APX-SIM is easier than APX-SIM, in that ∀-APX-SIM reduces to APX-SIM. (The reduction is trivial, in that a valid instance of ∀-APX-SIM is already a valid instance of APX-SIM, with no need for modification.) We conclude that ∀-APX-SIM is contained in P^QMA[log]. Furthermore, the proof of Theorem 1.2 is actually sufficient to show that ∀-APX-SIM is P^||C-complete (when restricted to the corresponding family of Hamiltonians for arbitrary class C).
Our second result, Lemma 4.2 in Section 4, is to prove that efficient simulations correspond to reductions between instances of ∀-APX-SIM. As a byproduct, we combine this result with Theorem 1.2

Via Theorem 1.6, we now obtain many corollaries from the long line of research using perturbative gadgets to prove QMA-completeness of restricted Hamiltonians; for brevity, here we list a select few such corollaries. We note that the locality of the observable input to APX-SIM may increase after simulation, but only by a constant factor which can be easily calculated based on the simulation used. For example, using the perturbative gadgets constructed in [PM17], the following is an immediate corollary of Theorem 1.6:

Corollary 1.7. The problem APX-SIM is P^QMA[log]-complete even when the observable A is 4-local and the Hamiltonian H is restricted to be of the form

H = Σ_{(j,k)∈E} a_{(j,k)} (α X_j X_k + β Y_j Y_k + γ Z_j Z_k),

where E is the set of edges of a 2D square lattice, a_{(j,k)} ∈ R, and at least two of α, β, γ are non-zero. The case α = β = γ corresponds to XX + YY + ZZ, which is known as the Heisenberg interaction.
However, the locality of A does not always blow up, as shown by the following corollary of Theorem 1.6 and [SV09]: Corollary 1.8. The problem APX-SIM is P^QMA[log]-complete even when the observable A is 1-local and the Hamiltonian H is restricted to be of the form: E is the set of edges of a 2D square lattice, and B_j is a single qubit operator (that may depend on j).
Finally, we remark that recent work on the simulation power of families of qudit Hamiltonians [PM18] can be used to show the following corollary: Corollary 1.9. Let |ψ⟩ be an entangled two-qudit state. Then, the problem APX-SIM is P^QMA[log]-complete even when the Hamiltonian H is restricted to be of the form

H = Σ_{j,k} α_{j,k} |ψ⟩⟨ψ|_{j,k},

where α_{j,k} ∈ R and |ψ⟩⟨ψ|_{j,k} denotes the projector onto |ψ⟩ on qudits j and k.
Each of these corollaries follows as the corresponding references show that the described families of Hamiltonians can efficiently simulate all spatially sparse Hamiltonians.

The complexity of APX-SIM on the line
We finally move to our last result, which characterizes the complexity of APX-SIM on the line. Historically, it was known that the NP-complete problem MAX-2-SAT on a line is efficiently solvable via dynamic programming or divide-and-conquer (even for large, but constant, dimension). It hence came as a surprise when [AGIK09] showed that 2-LH on a line is still QMA-complete. This result was for local dimension 13 ([AGIK09] actually claimed a result for 12-dimensional qudits; [HNN13] later identified an error in their construction, and gave a fix requiring an additional dimension). [Nag08] improved this to hardness for 12-dimensional qudits by leveraging the parity of the position of qudits (similarly, [Nag08] claimed a result for 11-dimensional particles, but suffered from the same error as [AGIK09]). Most recently, [HNN13] showed QMA-completeness for qudits of dimension 8 by allowing some of the clock transitions to be ambiguous (a similar idea was used in [KKR06] to show QMA-completeness of 2-LH). The complexity of k-LH on a 1D line remains open for local dimension 2 ≤ d ≤ 7.
Returning to the setting of APX-SIM, it is clear that the classical analogue of APX-SIM on a 1D line of bits is also in P; given any 2-local Boolean formula φ : {0,1}^n → {0,1}, we simply compute an optimal solution x to φ (which, recall, can be done in 1D as referenced above), and subsequently evaluate any desired efficiently computable function on x (i.e. a "measurement" on a subset of the bits). This raises the question: is APX-SIM on a line still P^QMA[log]-complete? Or does its complexity in the 1D setting drop to, say, QMA? Our final result shows the former.
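The classical 1D procedure just described can be sketched as a standard dynamic program over a chain of bits. This is a minimal illustration, assuming nearest-neighbour constraints given as a hypothetical list `edge_costs`, where `edge_costs[i][(u, v)]` is the number of violated clauses on edge (i, i+1) when the two bits take values u and v; the "measurement" is taken to be the Hamming-weight parity. When several assignments are optimal the sketch returns just one of them, and the measured property may depend on that choice.

```python
from typing import Dict, List, Tuple

EdgeCost = Dict[Tuple[int, int], float]


def optimal_assignment(edge_costs: List[EdgeCost]) -> Tuple[float, List[int]]:
    """Minimize the sum of nearest-neighbour costs on a line of bits."""
    best = {0: 0.0, 1: 0.0}   # best[v]: min cost of a prefix ending in bit v
    back = []                 # back[i][v]: optimal value of bit i given bit i+1 = v
    for cost in edge_costs:
        new_best, choice = {}, {}
        for v in (0, 1):
            u = min((0, 1), key=lambda u: best[u] + cost[(u, v)])
            new_best[v] = best[u] + cost[(u, v)]
            choice[v] = u
        back.append(choice)
        best = new_best
    v = min((0, 1), key=lambda v: best[v])
    total = best[v]
    x = [v]
    for choice in reversed(back):   # trace the optimal path backwards
        v = choice[v]
        x.append(v)
    x.reverse()
    return total, x


# Toy instance on 4 bits: edge 0 prefers unequal bits, edge 1 prefers equal
# bits, and edge 2 prefers the pattern (0, 1).
edge_costs = [
    {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
    {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 1},
]
total, x = optimal_assignment(edge_costs)
parity = sum(x) % 2   # the classical "measurement" on the optimal solution
```

The running time is linear in the chain length for constant local dimension, which is why the classical 1D analogue sits in P.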
Theorem 1.10. APX-SIM is P^QMA[log]-complete even when restricted to Hamiltonians on a 1D line of 8-dimensional qudits and single-qudit observables.
Thus, even in severely restricted geometries like the 1D line, simulating a measurement on a single qudit of the ground space remains harder than QMA.
Proof techniques for Theorem 1.10. We employ a combination of new and known ideas. We wish to simulate the idea from [GY18] that instead of having the P machine make m queries to a QMA oracle, it receives the answers to the queries as a "proof" y ∈ {0,1}^m which it accesses whenever it needs a particular query answer. In [GY18], Ambainis's query Hamiltonian [Amb14] was then used to ensure y was correctly initialized. However, it is not clear how to use Ambainis's query Hamiltonian (or variants of it) while maintaining a 1D layout. We hence take a different approach.
Instead of receiving the query answers, the P machine now has access to m QMA verifiers V_i corresponding to the m queries, and for each of them receives a quantum proof |ψ_i⟩ in some proof register R_i. The P machine then treats the (probabilistic) output of each V_i as the "correct" answer to query i. If query i is a NO instance of a QMA problem, this works well: no proof can cause V_i to accept with high probability. However, if query i is a YES instance, a cheating prover may nevertheless submit a "bad" proof to verifier V_i, since flipping the output bit of V_i may cause the P machine to flip its final output bit. To prevent this, and thus ensure the P machine receives all correct answers with high probability, we use a delicate application of 1-local energy penalties, which we call "sifters", to the outputs of the V_i; just enough to penalize bad proofs for YES cases, but not enough to cause genuine NO cases to incur large energy penalties. Here, we again utilize our result that P^QMA[log] = P^||QMA (Corollary 1.3), and choose to begin with a P^||QMA instance; this allows us to apply identical, independent sifters to the output of each verifier V_i, significantly easing the subsequent analysis and transition to 1D. We next plug this construction, where the P circuit has many sub-circuits V_i, into the 1D 8-dimensional circuit-to-Hamiltonian construction of [HNN13]. Similarly to [GY18], we apply a corollary of the Projection Lemma of [KKR06,GY18] (Corollary 6.2) to argue that any low energy state must be close to a history state |ψ⟩. Combining with our sifter Hamiltonian terms, we show in Lemma 6.4 that for |ψ⟩ to remain in the low-energy space, it must encode V_i outputting approximately the right query answer for any query i.
To then conclude that all query responses are jointly correct with high probability, and thus that the low-energy space encodes the correct final output to the P^||QMA computation, we apply a known quantum non-commutative generalization of the union bound. In fact, our argument immediately shows hardness for both APX-SIM and ∀-APX-SIM. The full proof is given in Sections 6.1, 6.2, and 6.2.1.

Open questions and organization
Our results bring previous P^QMA[log]-hardness results for a remarkably natural problem, Approximate Simulation (APX-SIM), closer to the types of problems studied in the physics literature, where typically observables are O(1)-local, allowed interactions are physically motivated, and the geometry of the interaction graph is constrained. There are many questions which remain open, of which we list a few here: (1) The coupling strengths for local Hamiltonian terms in Corollaries 1.7, 1.8, and 1.9 are typically non-constant, as these corollaries follow from the use of existing perturbation theory gadgets; can these coupling constants be made O(1)? Note this question is also open for the complexity classification of k-LH itself [CM16,PM17]. (2) What is the complexity of P^QMA[log]? It is known that P^QMA[log] ⊆ PP [GY18]; can a tighter characterization be obtained? (3) Can similar hardness results for APX-SIM be shown for translationally invariant 1D systems? For reference, it is known that k-LH is QMA_exp-complete for 1D translationally invariant systems when the local dimension is roughly 40 [GI13,BCO17]. (QMA_exp is roughly the quantum analogue of NEXP, in which the proof and verification circuit are exponentially large in the input size. The use of this class is necessary in [GI13,BCO17], as the only input parameter for 1D translationally invariant systems is the length of the chain.) If a similar hardness result holds for APX-SIM, presumably it would show P^QMA_exp[log]-hardness for 1D translationally invariant systems.

Organization. We introduce notation and definitions in Section 2. We prove that APX-SIM is contained in P^C[log] in Section 3.1 for classes C and corresponding restrictions, and that ∀-APX-SIM is P^||C-hard in Section 3.2, thereby proving Theorem 1.2. In Section 4, we introduce a special case of the definition of simulation from [CMP18] and show that simulations correspond to reductions of the problem ∀-APX-SIM, yielding Theorem 1.5; proofs with regard to the general definition are in Appendix A. In Section 5, we give a spatially sparse construction with which ∀-APX-SIM is P^||QMA-hard, thus proving Theorem 1.6. Finally, in Section 6, we study hardness on a 1D line and prove Theorem 1.10.

Preliminaries
Notation. Let λ(H) denote the smallest eigenvalue of Hermitian operator H. For a matrix A, ‖A‖_∞ := max{ ‖A|v⟩‖_2 : ‖|v⟩‖_2 = 1 } is the operator norm or spectral norm of A, and ‖A‖_tr := Tr √(A†A) the trace norm. Throughout this paper, we will assume generally that both H = Σ_{i=1}^m H_i and observable A = Σ_{i=1}^m A_i are local Hamiltonians whose local terms H_i and A_i act non-trivially on at most O(log n) out of n qubits. We also assume m, ‖H_i‖_∞, ‖A_i‖_∞ ∈ O(poly n) for all i ∈ {1, …, m}, for n the number of qubits in the system. For a subspace S, S^⊥ denotes the orthogonal complement of S. We denote the restriction of an operator H to subspace S as H|_S. The null space of H is denoted Null(H).
Definitions. P^QMA[log], defined in [Amb14], is the set of decision problems decidable by a polynomial-time deterministic Turing machine with the ability to query an oracle for a QMA-complete problem O(log n) times, where n is the size of the input. For a class C of languages or promise problems, the class P^C[log] is similarly defined, except with an oracle for a C-complete problem. P^||C is the class of problems decidable by a polynomial-time deterministic Turing machine given access to an oracle for a C-complete problem, with the restriction that all (up to O(n^c) for c ∈ Θ(1)) queries to the oracle must be made in one time step, i.e. in parallel. Such queries are labeled non-adaptive, as opposed to the adaptive queries allowed to a P^C[log] machine. We note that P^||NP has in the past been denoted ≤^p_tt(NP), in reference to polynomial-time truth-table reductions (e.g. [BH91]).
In this article, for P^QMA[log] we assume oracle queries made by the P machine are to an oracle for the QMA-complete [KSV02] k-local Hamiltonian problem (k-LH), defined as follows: Given a k-local Hamiltonian H and inverse polynomial-separated thresholds a, b ∈ R, decide whether λ(H) ≤ a (YES-instance) or λ(H) ≥ b (NO-instance) [KKR06]. We shall say an oracle query is valid (invalid) if it satisfies (violates) the promise gap of the QMA-complete problem the oracle answers. (An invalid query hence satisfies λ(H) ∈ (a, b).) For any invalid query, the oracle can accept or reject arbitrarily. A correct query string y ∈ {0,1}^m encodes a sequence of correct answers to all of the m queries made by the P machine, and an incorrect query string is one which contains at least one incorrect query answer. Note that for an invalid query, any answer is considered "correct", yielding the possible existence of multiple correct query strings. Nevertheless, the P machine is required to output the same final answer (accept or reject) regardless of how such invalid queries are answered [Gol06]. The above definitions extend analogously when the class QMA is replaced with another class C, with a designated C-complete problem Π_C playing the role of k-LH. (In this paper, the complexity classes C we consider have complete problems.)
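To make the notions of valid and invalid queries and of correct query strings concrete, the following toy sketch enumerates all correct query strings for a list of k-LH instances. It assumes, purely for illustration, that the ground state energy λ(H) of each query is known and supplied as a number; the triples `(lam, a, b)` and the function names are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

Instance = Tuple[float, float, float]   # (lambda(H), a, b) with a < b


def correct_answers(lam: float, a: float, b: float) -> Set[int]:
    """Answers counted as correct for one k-LH query (1 = YES, 0 = NO)."""
    if lam <= a:
        return {1}      # valid YES-instance
    if lam >= b:
        return {0}      # valid NO-instance
    return {0, 1}       # invalid query: any answer counts as correct


def correct_query_strings(instances: List[Instance]) -> List[Tuple[int, ...]]:
    """All strings y in {0,1}^m whose every bit is a correct answer."""
    options = [sorted(correct_answers(*inst)) for inst in instances]
    return list(product(*options))


# One YES-instance, one NO-instance, and one query inside the promise gap.
instances = [(0.1, 0.3, 0.6), (0.9, 0.3, 0.6), (0.45, 0.3, 0.6)]
strings = correct_query_strings(instances)
```

Here the third query is invalid, so two correct query strings exist, and the P machine must give the same final answer on both.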

Parallel versus adaptive queries
We begin by showing Theorem 1.2, i.e. that P^C[log] = P^||C for appropriate complexity classes C. Section 3.1 shows containment of the corresponding APX-SIM problem in P^C[log] (and thus in P^||C). Section 3.2 then shows P^||C-hardness (and thus P^C[log]-hardness) of APX-SIM. Theorem 1.2 is restated and proven in Section 3.3.

Containment in P^C[log]
We begin by modifying the containment proof of [Amb14] to show containment of APX-SIM in classes P^C[log] for C beyond just C = QMA.
Lemma 3.1. Let H be a k-local Hamiltonian acting on n qudits, and let A be an observable on the same system of n qudits. If k-LH for αH + βA is contained in complexity class C for any 0 ≤ α, β ≤ poly(n) and for all k ≥ 1, then APX-SIM(H, A, k, ℓ, a, b, δ) ∈ P^C[log] for all ℓ ≤ O(log n) and b − a, δ ≥ 1/poly(n).
Proof. We need to show the existence of a poly(n) time classical algorithm to decide APX-SIM while making at most O(log n) queries to an oracle for C. As with the proof in [Amb14], the idea is to use O(log n) oracle queries to determine the ground space energy λ(H) of H by binary search, and then use one final query to determine the answer. In [Amb14] the final query is a QMA query; here we show how this final query can be performed differently so that only an oracle for C is required.
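The binary search step can be sketched as follows. This is one standard way to search with a promise oracle, not necessarily the exact procedure of [Amb14]: each query uses thresholds at the one-third and two-thirds points of the current interval, so that either answer to an invalid query still keeps λ inside the interval. The oracle below is a toy stand-in that knows λ and answers randomly inside the promise gap; all names are hypothetical.

```python
import random
from typing import Callable, Tuple


def make_promise_oracle(lam: float, rng: random.Random) -> Callable[[float, float], bool]:
    """Toy k-LH oracle: correct on valid queries, arbitrary on invalid ones."""
    def oracle(a: float, b: float) -> bool:
        if lam <= a:
            return True                 # valid YES-instance
        if lam >= b:
            return False                # valid NO-instance
        return rng.random() < 0.5       # promise violated: arbitrary answer
    return oracle


def locate_ground_energy(oracle: Callable[[float, float], bool],
                         lo: float, hi: float, eps: float) -> Tuple[float, float, int]:
    """Shrink [lo, hi], which must contain lambda, to width at most eps."""
    queries = 0
    while hi - lo > eps:
        third = (hi - lo) / 3
        a, b = lo + third, hi - third
        if oracle(a, b):
            hi = b      # a YES answer rules out lambda >= b
        else:
            lo = a      # a NO answer rules out lambda <= a
        queries += 1
    return lo, hi, queries


rng = random.Random(0)
lam = 3.14159
lo, hi, used = locate_ground_energy(make_promise_oracle(lam, rng), 0.0, 10.0, 1e-3)
```

Each query shrinks the interval by a factor of 2/3 and has promise gap at least eps/3, so an inverse-polynomial eps on a polynomially bounded range needs O(log n) queries with inverse-polynomial gaps.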
First calculate a lower bound µ for λ(A), the lowest eigenvalue of A. If A acts only on O(1) qudits, then λ(A) can be calculated via brute force (up to, say, inverse exponential additive error) in O(1) time. If A acts on many qudits, then λ(A) can alternatively be approximated to within inverse polynomial additive error by binary search (as in [Amb14]) by querying the C oracle O(log ‖A‖_∞) = O(log n) times. Note that without loss of generality, we may assume 0 ≤ b − µ ≤ q(n) for some efficiently computable polynomial q. The lower bound holds since if b < µ ≤ λ(A), we conclude our APX-SIM instance is a NO instance, and we reject. For the upper bound, it holds that µ ≤ ‖A‖_∞, and we may assume b ≤ ‖A‖_∞, as otherwise our APX-SIM instance is either a YES or invalid instance, and in both cases we can accept. By assumption, ‖A‖_∞ ≤ q(n) for an appropriate polynomial q which can be computed efficiently by applying the triangle inequality to the local terms of A; note ‖A‖_∞ may hence be replaced by q in the bounds above.
Perform binary search with the oracle for C (an example of how to perform binary search with an oracle for a promise problem is given in [Amb14]). This requires O(log 1/ε) = O(log n) queries to the oracle for C. Next perform one final query to the C oracle to solve k-LH with Hamiltonian H′ with thresholds a′ and b′, where and accept if and only if this final query accepts. Observe this is an allowed query for the C oracle because H′ is of the form required in the statement of the lemma (recall b − µ ≥ 0), and also and the algorithm accepts as required. Now suppose the input is a NO instance. We will show that ⟨ψ|H′|ψ⟩ ≥ b′ for any |ψ⟩ and so the algorithm rejects as required. First, if |ψ⟩ is low-energy with ⟨ψ|H|ψ⟩ ≤ λ(H) + δ, then it also satisfies ⟨ψ|A|ψ⟩ ≥ b, and so

An additional application of Lemma 3.1 is that it allows us to prove that the APX-SIM problem is easy for certain families of Hamiltonians for which k-LH is known to be easy. For example, the work on ferromagnetic Hamiltonians in [BG17] implies the following corollary:

Corollary 3.2. Consider the family of Hamiltonians F of the form:

Then, APX-SIM for Hamiltonians and observables chosen from
Proof. In [BG17], it was shown that for Hamiltonians in F, there exists an FPRAS (fully polynomial randomized approximation scheme) to calculate the partition function of H up to multiplicative error. In particular, it is noted that this gives a corresponding approximation to the ground state energy with additive error. Therefore, there exists a randomized algorithm that runs in polynomial time and which, with high probability, gives an approximation to the ground state energy of H up to inverse-polynomial additive error. This algorithm shows containment of k-LH restricted to the family F in BPP.
We now wish to consider APX-SIM by applying Lemma 3.1, but first need to check that H′ = αH + βA is in the family for all α, β ≥ 0. It is clear that H′ can be written in the form of Equation (1), but not whether it satisfies the required bounds on its coefficients. Following the notation of Equation (1), for an operator F in the family F, let c_{i,j}(F) be the coefficient of Y_i Y_j and let b_{i,j}(F) be the coefficient of −X_i X_j. Then, a simple application of the triangle inequality shows that Therefore, by Lemma 3.1, APX-SIM is contained in P^BPP[log] for H and A in F. Finally, we note that P^BPP[log] = BPP, since clearly BPP ⊆ P^BPP[log], and P^BPP[log] ⊆ P^BPP ⊆ BPP^BPP = BPP since BPP is low for itself.

Hardness for P^||C
We next modify the proof that APX-SIM is P^QMA[log]-hard to obtain the following lemma. Our modifications include simplifying the "query Hamiltonian" of [Amb14] and improving the construction of [GY18] by using the Cook-Levin theorem, as opposed to Kitaev's circuit-to-Hamiltonian construction. The latter has a nice consequence: in contrast to the QMA-completeness results for k-LH, where the promise gap is inverse polynomial, for APX-SIM we are able to show that the promise gap b − a sufficient for P^||C-completeness scales as Ω(1). To show this, we require two tools. In the next two subsections, we first show how to simplify the query Hamiltonian of [Amb14] in the context of parallel queries, which is used to enforce correct query answers, and then discuss how to employ the Cook-Levin reduction, which enforces a correct simulation of the circuit given those query answers.

Simplifying Ambainis' query Hamiltonian
First, we give a simplified version of the "query Hamiltonian" introduced by Ambainis [Amb14], which will be useful in the following lemmas. We note that [GY18] reduced the locality of the construction of [Amb14] by applying the unary encoding trick of Kitaev [KSV02], but due to the simplified structure of parallel queries, here we do not require this unary encoding to achieve O(1)-locality for our Hamiltonian. However, [GY18] also reduced the locality of the observable from O(log n)-local to a single qubit, by deferring the job of simulating the circuit away from the observable and to the Hamiltonian from [KSV02], and this improvement is now crucial, as otherwise a polynomial number of queries would demand an O(poly n)-local observable.
Given some P^||C computation U for an appropriate class C, let (H_{Y_i}, a_i, b_i) be the instance of (without loss of generality) 2-LH corresponding to the i-th query made by U. Then, our "query Hamiltonian" is

H = Σ_{i=1}^m M_i, with M_i := ((a_i + b_i)/2) |0⟩⟨0|_{X_i} ⊗ I_{Y_i} + |1⟩⟨1|_{X_i} ⊗ H_{Y_i},

where the single-qubit register X_i is intended to encode the answer to query i and Y_i encodes the ground state of H_{Y_i}. Since each query is 2-local, H is 3-local. Notably, because U makes all of its queries in parallel, we are able to weight each of the m terms equally, unlike in [Amb14,GY18] which studied adaptive queries. This significantly eases our later analysis.
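The energy bookkeeping behind the query Hamiltonian can be checked on a toy example. The sketch assumes each term has the form M_i = ((a_i + b_i)/2)|0⟩⟨0|_{X_i} ⊗ I + |1⟩⟨1|_{X_i} ⊗ H_{Y_i}, which is what the eigenvalue discussion in the proof that follows indicates, with the registers Y_i disjoint. Under that assumption, the smallest eigenvalue in the block labelled by a query string x is a sum of per-query contributions, so only λ(H_{Y_i}) is needed. The numerical instances and helper names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Query = Tuple[float, float, float]   # (lambda(H_{Y_i}), a_i, b_i)


def block_min_energy(x: Tuple[int, ...], queries: List[Query]) -> float:
    """Smallest eigenvalue of H on the block with answer register X = x."""
    return sum(lam if bit else (a + b) / 2
               for bit, (lam, a, b) in zip(x, queries))


def is_correct(x: Tuple[int, ...], queries: List[Query]) -> bool:
    """True if x answers every valid query correctly (1 = YES, 0 = NO)."""
    for bit, (lam, a, b) in zip(x, queries):
        if lam <= a and bit != 1:
            return False
        if lam >= b and bit != 0:
            return False
    return True


# A YES-instance, a NO-instance, and an invalid query (lambda in the gap).
queries = [(0.125, 0.25, 0.75), (1.0, 0.25, 0.75), (0.5, 0.25, 0.75)]
strings = list(product((0, 1), repeat=len(queries)))

ground = min(strings, key=lambda x: block_min_energy(x, queries))
lam_min = block_min_energy(ground, queries)
eps = min((b - a) / 2 for _, a, b in queries)

# Extra energy paid by the best incorrect query string.
penalty = min(block_min_energy(y, queries) - lam_min
              for y in strings if not is_correct(y, queries))
```

In this example the lowest-energy block is labelled by a correct query string, and every incorrect string costs at least (b_i − a_i)/2 more, with equal weights on all m terms.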
The following lemma is analogous to Lemma 3.1 of [GY18], but with an improved spectral gap. The proof is similar to theirs, but is significantly simplified due to our use of parallel queries.
Then, there exists a correct query string x ∈ {0,1}^m such that the ground state of H lies in ℋ_{x_1⋯x_m}. Moreover, if λ is the minimum eigenvalue of H restricted to this space, then for any incorrect query string y_1⋯y_m, any state in

Proof. We proceed by contradiction. Let x ∈ {0,1}^m (y ∈ {0,1}^m) denote a correct (incorrect) query string which has lowest energy among all correct (incorrect) query strings against H. (Note that x and y are well-defined, though they may not be unique; in this latter case, any such x and y will suffice for our proof.) For any z ∈ {0,1}^m, define λ_z as the smallest eigenvalue in ℋ_z.
Since y is an incorrect query string, there exists at least one i ∈ {1, . . . , m} such that y_i is the wrong answer to a valid query H_{Y_i}. If query i is a YES-instance, the smallest eigenvalue of M_i corresponds to setting X_i to (the correct query answer) |1⟩, and is at most a_i. On the other hand, the space with X_i set to |0⟩ has all eigenvalues equaling (a_i + b_i)/2. A similar argument shows that in the NO-case, the |0⟩-space has eigenvalues equaling (a_i + b_i)/2, and the |1⟩-space has eigenvalues at least b_i. We conclude that flipping query bit i to the correct query answer (the complement of y_i) allows us to "save" an energy penalty of at least (b_i − a_i)/2 against M_i, and since all other terms act invariantly on X_i, the same saving holds against H. Let y′ denote y with bit i flipped, so that λ_{y′} ≤ λ_y − (b_i − a_i)/2. If y′ is also an incorrect query string, we have λ_{y′} < λ_y, a contradiction due to the minimality of y. Conversely, if y′ is a correct query string, then λ_y ≥ λ_{y′} + (b_i − a_i)/2 ≥ λ_x + (b_i − a_i)/2 ≥ λ + ε, where the second inequality uses the minimality of x.

Cook-Levin construction

We now show how to model the Cook-Levin construction as a Hamiltonian in our setting. For this, we consider the P machine to be given as a circuit of classical reversible gates U = U_m . . . U_1, in which one gate occurs at each time step. The evolution of the circuit is encoded into a 2D grid of qubits, where the t-th row of qubits corresponds to the state of the system at time step t; the output of the circuit is copied to a dedicated output bit in the final timestep. The overall Hamiltonian is diagonal in the computational basis with a groundspace of states corresponding to the correct evolution of the P machine.

Let I_t be the set of qubits on which U_t acts non-trivially. If a qubit i ∉ I_t (i.e. it is not acted on by the circuit at time step t), then there is an interaction |01⟩⟨01| + |10⟩⟨10| on qubits (i, t) and (i, t + 1), to penalize states which encode a change on qubit i. To encode a classical reversible gate U_t : x → U_t(x) acting at time t, we define an interaction h_{U_t} = I − Σ_x |x⟩⟨x|_t ⊗ |U_t(x)⟩⟨U_t(x)|_{t+1}, acting non-trivially only on qubits (i, t′) for i ∈ I_t and t′ equal to t or t + 1. See Figure 1 for a pictorial representation of this Hamiltonian. Then

$$H_{\rm prop} = \sum_{t=1}^{m} \Big( h_{U_t} + \sum_{i \notin I_t} \big(|01\rangle\langle 01| + |10\rangle\langle 10|\big)_{(i,t),(i,t+1)} \Big) \tag{3}$$

is positive semi-definite and has ground space spanned by states of the form:

$$|w(x)\rangle = \bigotimes_{t=1}^{m+1} |U_{t-1} \cdots U_1\, x\rangle_t ,$$

where the t-th tensor factor is the t-th row of the grid (so that the first row holds x itself). Typically, there is an additional term H_in consisting of 1-local |1⟩⟨1| terms on all qubits in the first (t = 1) row. Then the Hamiltonian H_prop + H_in has (1) unique ground state |w(0^n)⟩ encoding the action of the circuit on the 0^n string, (2) ground state energy 0, and (3) spectral gap at least 1, since the Hamiltonian is a sum of commuting projectors (all diagonal in the computational basis). We will later show how we adapt H_in to our query answer register.
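As a sanity check of the properties claimed for H_prop + H_in, the following sketch brute-forces the spectrum for a toy circuit. The two-gate circuit (a CNOT followed by a NOT), the zero-based row indexing, and all names are assumptions made for illustration only; because the Hamiltonian is diagonal, the energy of a basis state is just the number of violated terms.

```python
from itertools import product


def cnot(bits):
    control, target = bits
    return (control, target ^ control)


def not_gate(bits):
    (b,) = bits
    return (1 - b,)


# Toy circuit: a list of (I_t, U_t) pairs on N = 2 bits (an illustrative choice).
CIRCUIT = [((0, 1), cnot), ((1,), not_gate)]
N = 2


def energy(rows, circuit=CIRCUIT, with_init=True):
    """Energy of a computational-basis grid state under H_prop (+ H_in)."""
    e = 0
    for t, (support, gate) in enumerate(circuit):
        before, after = rows[t], rows[t + 1]
        # h_{U_t}: penalty 1 unless row t+1 equals U_t applied to row t on I_t.
        if tuple(after[i] for i in support) != gate(tuple(before[i] for i in support)):
            e += 1
        # |01><01| + |10><10| on every qubit outside I_t.
        e += sum(before[i] != after[i] for i in range(len(before)) if i not in support)
    if with_init:
        e += sum(rows[0])  # H_in: |1><1| on every qubit of the first row
    return e


def history(x, circuit=CIRCUIT):
    """The grid w(x): row t holds the circuit state after the first t gates."""
    rows = [tuple(x)]
    for support, gate in circuit:
        new = list(rows[-1])
        for i, v in zip(support, gate(tuple(rows[-1][i] for i in support))):
            new[i] = v
        rows.append(tuple(new))
    return tuple(rows)


def spectrum(n=N, circuit=CIRCUIT):
    """Map every basis grid state to its energy under H_prop + H_in."""
    T = len(circuit) + 1
    result = {}
    for flat in product((0, 1), repeat=n * T):
        rows = tuple(tuple(flat[t * n:(t + 1) * n]) for t in range(T))
        result[rows] = energy(rows, circuit)
    return result
```

In this toy case the only zero-energy grid is `history((0, 0))`, and every other grid has integer energy at least 1, matching properties (1) to (3).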

Proof of hardness
We are almost ready to prove the main result of this section, Lemma 3.3. Before doing so, we require a final technical lemma.
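Lemma 3.5. Let H be a Hamiltonian and ρ a density matrix satisfying Tr(Hρ) ≤ λ(H) + δ. Let P be the projector onto the space of eigenvectors of H with energy less than λ(H) + δ′. Then,

$$\frac{1}{2}\,\|\rho - \rho'\|_1 \le \sqrt{\frac{\delta}{\delta'}}, \qquad \text{where } \rho' := \frac{P\rho P}{\mathrm{Tr}(P\rho)}.$$

Proof. First, bound the trace distance by the fidelity in the usual way (using one of the Fuchs-van de Graaf inequalities [FvdG99]):

$$\frac{1}{2}\|\rho-\rho'\|_1 \le \sqrt{1-F(\rho,\rho')^2} = \sqrt{1-\Big(\mathrm{Tr}\sqrt{\sqrt{\rho}\,\rho'\sqrt{\rho}}\Big)^2} = \sqrt{1-\frac{\big(\mathrm{Tr}\sqrt{\sqrt{\rho}P\rho P\sqrt{\rho}}\big)^2}{\mathrm{Tr}(P\rho)}} = \sqrt{1-\frac{\big(\mathrm{Tr}(\sqrt{\rho}P\sqrt{\rho})\big)^2}{\mathrm{Tr}(P\rho)}} = \sqrt{1-\mathrm{Tr}(P\rho)}, \tag{4}$$

where the third equality follows since (√ρ P √ρ)^2 = √ρ P ρ P √ρ and since the latter is positive semidefinite. Now, it remains to bound Tr(Pρ). We note that H has eigenvalues at least λ(H) + δ′ on the space annihilated by P and eigenvalues at least λ(H) everywhere else, and so H ⪰ (λ(H) + δ′)(I − P) + λ(H)P = (λ(H) + δ′)I − δ′P. Therefore, using the bound on Tr(Hρ), we have

$$\lambda(H)+\delta \;\ge\; \mathrm{Tr}(H\rho) \;\ge\; \lambda(H)+\delta' - \delta'\,\mathrm{Tr}(P\rho), \qquad\text{i.e.}\qquad \mathrm{Tr}(P\rho) \ge 1-\frac{\delta}{\delta'}.$$

Substituting this back into Equation (4) proves the result.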
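The two quantities in this argument can be checked numerically in the simplest setting, where H and ρ are both diagonal (an assumption made here only so that plain Python suffices; it is a special case, not a proof). In that case ρ is a probability vector over energy levels, ρ′ is the same vector conditioned on the low-energy levels, and the trace distance reduces to total variation distance. The function name and return format are illustrative.

```python
from math import sqrt


def lemma_3_5_diagonal(energies, probs, delta_prime):
    """Compare trace distance with sqrt(delta/delta') for diagonal H and rho.

    energies: eigenvalues of H; probs: diagonal of rho in the same basis.
    Returns (trace_distance, bound, weight, delta), where weight = Tr(P rho)
    and delta is defined by Tr(H rho) = lambda(H) + delta.
    """
    lam = min(energies)
    delta = sum(p * e for p, e in zip(probs, energies)) - lam
    low = [e < lam + delta_prime for e in energies]           # support of P
    weight = sum(p for p, is_low in zip(probs, low) if is_low)
    if weight == 0:
        raise ValueError("rho has no weight on the low-energy space")
    conditioned = [p / weight if is_low else 0.0 for p, is_low in zip(probs, low)]
    trace_distance = 0.5 * sum(abs(p - q) for p, q in zip(probs, conditioned))
    return trace_distance, sqrt(delta / delta_prime), weight, delta
```

With energies (0, 0.2, 1, 2), probabilities (0.9, 0.05, 0.03, 0.02) and δ′ = 1, one finds δ = 0.08, Tr(Pρ) = 0.95 ≥ 1 − δ/δ′, and trace distance 0.05, comfortably below √(δ/δ′) ≈ 0.28.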
We are now ready to prove Lemma 3.3.

Proof of Lemma 3.3. We split the Hilbert space into three parts W, X and Y, and construct H = H_1 + H_2 as follows. H_2 is the query Hamiltonian of Equation (2), acting on X ⊗ Y, and therefore by Lemma 3.4 the space of eigenvectors of H_2 with eigenvalues less than λ(H_2) + ε is spanned by states of the form |x⟩_X ⊗ |φ⟩_Y, where x is a correct string of answers for the queries to the C oracle.
H_1 = H_prop + H_in is the classical Hamiltonian encoding the evolution of a classical P circuit, using the Cook-Levin construction of Section 3.2.2, where H_prop is as defined in Equation (3). For clarity, H_prop and H_in act on W and W ⊗ X, respectively. We think of W as "laid out in a 2D grid" as in Figure 1, and of X as playing the role of a "message" register passing information between H_1 and H_2. We modify the Hamiltonian H_in which initializes the qubits at the start of the classical circuit. For each qubit X_i in X, we initialize a corresponding qubit of the first (t = 1) row of W into the same state with a penalty term |1⟩⟨1|_{X_i} ⊗ |0⟩⟨0| + |0⟩⟨0|_{X_i} ⊗ |1⟩⟨1|. All other qubits in the first (t = 1) row of W are initialized to |0⟩ with a penalty |1⟩⟨1|. The full construction is depicted diagrammatically in Figure 2. Note that H = H_1 + H_2 has the form stated in the claim.

Figure 2: The structure of the Hamiltonian H = H_1 + H_2 used in Lemma 3.3, for the case of 3 queries.

We can argue about the low-energy eigenspace of H as follows. Since the ground spaces of H_1 and H_2 have non-trivial intersection (H_1 acts on the space W ⊗ X and H_2 acts on X ⊗ Y, so the two overlap only on the X register, on which they are both diagonal in the standard basis), and since we may assume without loss of generality that λ(H_2) + ε is inverse polynomially bounded below 1 (otherwise, we can scale H_1 by an appropriate fixed polynomial), we conclude the space of eigenstates of H with eigenvalue less than λ(H) + ε, henceforth denoted ℋ_low, is spanned by states of the form |Φ⟩ = |w⟩_W ⊗ |x⟩_X ⊗ |φ⟩_Y, where x is a string of correct answers to the oracle queries and w is the classical string encoding the correct computation of the P circuit acting on x. The qubit corresponding to the output bit of the P circuit will be in the state |1⟩ (resp. |0⟩) in a YES (resp. NO) instance of ∀-APX-SIM.
To complete the proof, let the observable be A = Z_out, a Pauli Z measurement on the qubit corresponding to the output bit of the P circuit, and let δ = ε/16 and δ′ = ε. Consider any state |ψ⟩ with ⟨ψ|H|ψ⟩ ≤ λ(H) + δ. Then by Lemma 3.5, there exists a state |ψ′⟩ ∈ ℋ_low such that

$$\frac{1}{2}\,\big\|\, |\psi\rangle\langle\psi| - |\psi'\rangle\langle\psi'| \,\big\|_1 \le \sqrt{\delta/\delta'} = \frac{1}{4},$$

which implies by Hölder's inequality that |⟨ψ|A|ψ⟩ − ⟨ψ′|A|ψ′⟩| ≤ ‖A‖ · ‖ |ψ⟩⟨ψ| − |ψ′⟩⟨ψ′| ‖_1 ≤ 1/2, so that ⟨ψ|A|ψ⟩ is ≤ −1/2 in a YES instance and ≥ 1/2 in a NO instance, as required.

Proof. The containment P^C[log] ⊆ P^||C follows from the same argument used to show P^NP[log] ⊆ P^||NP in [Bei91], which we summarized in Section 1.1. By Lemma 3.1, APX-SIM is contained in P^C[log] for Hamiltonians and observables from F. And by Lemma 3.3, ∀-APX-SIM is P^||C-hard for Hamiltonians from F, even when the observable is a single Pauli Z measurement, which is contained in F by the assumption that F contains any classical Hamiltonian H_cl. Since ∀-APX-SIM trivially reduces to APX-SIM, we thus have that APX-SIM is similarly P^||C-hard, and the result follows.

Simulations and APX-SIM for physical classes of Hamiltonians
In order to study the complexity of APX-SIM for physically motivated Hamiltonians in Section 5, we require two tools: first, hardness results for parallel query classes P ||C , given in Section 3, and second, an understanding of how simulations affect the hardness of the problem APX-SIM, which this section focuses on. Specifically, we consider a simplified notion of simulation, defined below, which is a special case of the full definition given in [CMP18]. This simpler case includes all of the important details necessary for the general case. For full proofs with regard to the general definition of simulation, see Appendix A.
We say that a family F′ of Hamiltonians can simulate a family F of Hamiltonians if, for any H ∈ F and any η, ε > 0, and ∆ ≥ ∆_0 for some ∆_0 > 0, there exists H′ ∈ F′ such that H′ is a (∆, η, ε)-simulation of H. We say that the simulation is efficient if, for H acting on n qudits, ‖H′‖ = poly(n, 1/η, 1/ε, ∆); H′ and {V_i} are computable in polynomial time given H, ∆, η and ε, provided that ∆, 1/η, 1/ε are O(poly n); and each isometry V_i maps from at most one qudit to O(1) qudits.
We remark that unlike in [CMP18], here we have the additional requirement that the local isometry V is efficiently computable. This ensures that given some input Hamiltonian H and local observable A, we can use the notion of simulation to efficiently produce a simulating Hamiltonian H′ and a simulating observable A′ (see proof of Lemma 4.2 below). As far as we are aware, all known constructions satisfying the notion of efficient simulation from [CMP18] fulfill this additional requirement (see proof of Theorem 1.5 for examples). Note that eigenvalues are preserved up to a small additive error ε in a simulation, but that the YES instance in the definition of APX-SIM is not robust to such perturbations of eigenvalues when the spectral gap is very small. We therefore do not expect to show directly that hardness of APX-SIM is preserved by simulations, and instead we work with the problem ∀-APX-SIM.

Here, we provide a proof only for the special case where the simulation is of the form given in Definition 4.1; for a full proof of the general case, see Appendix A.
Let us leave ∆, η, ε arbitrary for now, and assume we have a simulation of the form given in Definition 4.1. Then, there exists an isometry V : ℋ → ℋ′ (where ℋ and ℋ′ are the spaces on which H and H′ act, respectively) which maps onto the space of eigenvectors of H′ with eigenvalues at most ∆, i.e. onto S_{≤∆} := Span{|ψ⟩ : H′|ψ⟩ = λ|ψ⟩, λ ≤ ∆}.

Let |ψ′⟩ be a low-energy state of H′ satisfying ⟨ψ′|H′|ψ′⟩ ≤ λ(H′) + δ′ for δ′ to be set later. First, we show that |ψ′⟩ is close to a state V|ψ⟩ where |ψ⟩ is a low-energy state of H; then, we will show that there exists an observable A′, depending only on A and the isometries V_i, such that ⟨ψ′|A′|ψ′⟩ approximates ⟨ψ|A|ψ⟩ for any choice of |ψ⟩. Since by Definition 4.1 V is efficiently computable, our choice of A′ will be as well.
For any local measurement A S acting on subset of S qubits H S (here H S is the Hilbert space for qudits in set S ⊆ [n]), we can define the local measurement where to get to (11), we have used the triangle inequality to bound: Therefore, to ensure that Π ′ is a YES (resp. NO) instance if Π is a YES (resp. NO) instance, we will completes the proof.
As a corollary of our results, we obtain Theorem 1.5, which gives a complete classification of the complexity of APX-SIM when restricted to families of Hamiltonians and measurements built up from a set of interactions S. We restate it here for convenience:

Theorem 1.5. Let S be an arbitrary fixed subset of Hermitian matrices on at most 2 qubits. Then the APX-SIM problem, restricted to Hamiltonians H and measurements A given as a linear combination of terms from S and the identity I, is

1. in P, if every matrix in S is 1-local;

2. P^NP[log]-complete, if S does not satisfy the previous condition and there exists U ∈ SU(2) such that U diagonalizes all 1-qubit matrices in S and U^⊗2 diagonalizes all 2-qubit matrices in S;

3. P^StoqMA[log]-complete, if S does not satisfy the previous condition and there exists U ∈ SU(2) such that, for each 2-qubit matrix H_i ∈ S, U^⊗2 H_i (U†)^⊗2 = α_i Z^⊗2 + A_i ⊗ I + I ⊗ B_i with α_i ∈ ℝ and A_i, B_i arbitrary single-qubit Hermitian matrices;

4. P^QMA[log]-complete, otherwise.

Proof. We first discuss containment in the claimed complexity classes, and then hardness.

Containment. In the first case it is trivial to simulate the outcome of 1-local measurements on the ground state of a 1-local Hamiltonian, as the ground state is an easily calculated product state. For the other three cases, it was shown in [CM16] and [BH17] that k-LH for these three families of Hamiltonians is complete for the classes NP, StoqMA, QMA, respectively. Therefore, by Lemma 3.1, APX-SIM for these three families is contained in P^NP[log], P^StoqMA[log] and P^QMA[log], respectively.

Hardness. We show hardness for the parallel-query classes P^||NP, P^||StoqMA and P^||QMA in cases 2-4, respectively. To achieve this, we first apply Lemma 3.3 to conclude that ∀-APX-SIM is hard for classes P^||NP, P^||StoqMA and P^||QMA for the families of classical, stoquastic and arbitrary local Hamiltonians, respectively. (In contrast to the Hamiltonians of cases 2-4 of our claim here, the sets of classical, stoquastic and arbitrary local Hamiltonians do contain all diagonal Hamiltonians, and thus satisfy the preconditions of Lemma 3.3.) We then use simulations, in combination with Lemma 4.2, to reduce the sets of classical, stoquastic, and arbitrary local Hamiltonians to the Hamiltonians in cases 2, 3, 4 of our claim here, respectively.
Specifically, it was shown in [CMP18] that the three families of Hamiltonians in cases 2-4 of our claim can efficiently simulate all classical, stoquastic and arbitrary local Hamiltonians, respectively, via some local isometry V (see Definition 4.1). It follows by Lemma 4.2 (which states that simulations act like hardness reductions) that ∀-APX-SIM is hard for P ||NP , P ||StoqMA and P ||QMA respectively, with respect to (using the notation of Lemma 4.2) a local observable A ′ (in the larger, simulating, space) such that A ′ = V AV † (where in our case A will equal Pauli Z due to the proof of Lemma 3.3). The only obstacle to achieving our current claim is that we also require A ′ to be chosen as a linear combination of terms from S and I. This is what the remainder of the proof shall show.
Observation (*). To begin, note the proof of Lemma 3.3 used the single-qubit observable Z, since we encoded the P machine's output in a single bit, which we assumed was set to |0⟩ for "reject" and |1⟩ for "accept". However, without loss of generality, we may alter the starting P machine to encode its output in some more general function on two bits, such as the parity function. (For example, the P machine can be assumed to output a 2-bit string q, such that q has odd parity if and only if the P machine wishes to accept.) We use this observation as follows. Consider any classical observable A with two distinct eigenvalues λ_x < λ_y corresponding to eigenstates |x⟩ and |y⟩, respectively, for distinct strings x, y ∈ {0, 1}^2. Then, assuming the specification of A is independent of the number of qubits in the system (thus, A is specified to within a constant number of bits of precision, and so λ_y − λ_x ∈ Θ(1)), if we set the P machine to output x when it wishes to accept and y when it wishes to reject, a measurement with observable A suffices to distinguish these two cases. With this observation in hand, we consider cases 2-4 of our claim, in particular with respect to the action of isometry V.
Case 2: P^||NP-completeness. First note that in this case we can assume without loss of generality that all interactions in S are diagonal (by performing a global basis change of U^⊗n if necessary). Since we are not in the first case, we know also that there is a 2-local interaction in S with at least two distinct eigenvalues. By Observation (*), it will suffice to simulate such an observable on a particular pair of qubits in the original system; call this operator A. For the P^NP[log] case, the isometry V appends some ancilla qubits in a computational basis state (in the U^⊗n basis) [DlCC16]. We can therefore choose A′ to be the same 2-local observable A, but acting on the corresponding qubits in the larger, simulating system; that is, if we let A′ = A ⊗ I (where the identity term acts on the ancilla qubits), then V†A′V = A as desired.
Case 3: P^||StoqMA-completeness. For the third case, one can check that the reductions in [BH17] correspond to a simulation with an isometry V which maps each qubit |0⟩ → |0011⟩ and |1⟩ → |1100⟩ and appends some additional ancilla qubits in a computational basis state (see discussion in Section 9.4 of [CMP18]). Thus, a classical 2-local observable Z ⊗ Z + diag(A) ⊗ I + I ⊗ diag(B) (which we may use by Observation (*)) can be simulated in the larger, simulating space on physical qubits 1, 2, 3, 4 (logical qubit 1) and 5, 6, 7, 8 (logical qubit 2) via:

$$V^\dagger \,(Z_1 Z_5 + A_1 + B_5)\, V = Z\otimes Z + \mathrm{diag}(A)\otimes I + I \otimes \mathrm{diag}(B),$$

where diag(A) denotes the diagonal part of A, i.e. diag(A) = Σ_{i=0}^{1} |i⟩⟨i|A|i⟩⟨i|. Thus, measuring observable (Z_1 Z_5 + A_1 + B_5) on the larger, simulating Hamiltonian H′ (which has the desired form of Case 3 here) is equivalent to measuring Z ⊗ Z + diag(A) ⊗ I + I ⊗ diag(B) on the starting Hamiltonian H in the simulation (again, using notation of Lemma 4.2).
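The identity used in Case 3 can be verified directly by computing matrix elements between encoded basis states. The sketch below does this in plain Python, ignoring the ancilla qubits (which sit in a fixed basis state and do not affect the result); the zero-based qubit positions 0 and 4 stand for physical qubits 1 and 5, and all function names are illustrative assumptions.

```python
from itertools import product

# Encoding of one logical qubit into four physical qubits: |0> -> |0011>, |1> -> |1100>.
ENC = {0: (0, 0, 1, 1), 1: (1, 1, 0, 0)}
Z = [[1, 0], [0, -1]]
LOGICAL = list(product((0, 1), repeat=2))


def encode(a, b):
    """Physical basis string (8 bits) encoding the logical basis state |a b>."""
    return ENC[a] + ENC[b]


def one_qubit_element(M, k, s, t):
    """Matrix element <s| M_k |t>, where M acts on position k of a basis string."""
    rest_equal = all(si == ti for i, (si, ti) in enumerate(zip(s, t)) if i != k)
    return M[s[k]][t[k]] if rest_equal else 0


def zz_element(j, k, s, t):
    """Matrix element <s| Z_j Z_k |t> (diagonal in the computational basis)."""
    return Z[s[j]][s[j]] * Z[s[k]][s[k]] if s == t else 0


def simulated_observable(A, B):
    """V^dagger (Z_1 Z_5 + A_1 + B_5) V in the logical basis 00, 01, 10, 11."""
    return [[zz_element(0, 4, encode(*r), encode(*c))
             + one_qubit_element(A, 0, encode(*r), encode(*c))
             + one_qubit_element(B, 4, encode(*r), encode(*c))
             for c in LOGICAL] for r in LOGICAL]


def target_observable(A, B):
    """Z (x) Z + diag(A) (x) I + I (x) diag(B) in the same logical basis."""
    return [[(Z[a][a] * Z[b][b] + A[a][a] + B[b][b]) if (a, b) == (c, d) else 0
             for (c, d) in LOGICAL] for (a, b) in LOGICAL]
```

The off-diagonal entries of A and B drop out because |0011⟩ and |1100⟩ differ on all four physical qubits, so a single-qubit operator cannot connect them; this is why only diag(A) and diag(B) survive.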
Case 4: P ||QMA -completeness. The final case is slightly more complicated. When showing that these Hamiltonians are universal, the one step with a non-trivial isometry is simulating {X, Z, XX, ZZ}-Hamiltonians with {XX + Y Y }-Hamiltonians or {XX + Y Y + ZZ}-Hamiltonians in Theorem 41 of [CMP18]. In both of these cases, the isometry V maps each qubit via action In the proof of Theorem 41 of [CMP18], it is shown that a single Z observable can be reproduced by The proof is completed by Corollary 1.3 (i.e. logarithmic adaptive queries are equivalent to polynomially many parallel queries).

Spatially sparse construction
We now combine the tools developed in the previous sections to study the complexity of APX-SIM for physical Hamiltonians. Our approach is to show that ∀-APX-SIM is P ||QMA -hard even for Hamiltonians on a spatially sparse interaction graph, defined below:

Lemma 5.2. ∀-APX-SIM is P^||QMA-hard even when b − a = Ω(1), the observable A is 1-local (single-qubit), and the Hamiltonian H is 4-local and is restricted to a spatially sparse interaction graph.
Here, we adapt the proof of Lemma 3.3. Recall that the Hamiltonian H in Lemma 3.3 is composed of two parts H = H 1 + H 2 , where H 2 uses (a simplification of) Ambainis's query Hamiltonian on each of the registers X i ⊗ Y i to encode the answer to that query into the state of X i (see Equation (2)), and H 1 encodes the evolution of the P circuit using the Cook-Levin construction on the W register (controlling on the states of the X i registers). This is represented by Figure 2.
We arrange the qubits of the W register on a square lattice and note that H_1 is already manifestly spatially sparse. This is one of the advantages of using the Cook-Levin construction over the Kitaev history state construction. Furthermore, the Hamiltonian H_{Y_i}, corresponding to the i-th QMA query, can be chosen to be spatially sparse; in fact, it can be chosen to have its interactions on the edges of a 2D square lattice [OT08], and so we also lay out the qubits of each Y_i register on a square lattice.
But the interaction graph of this Hamiltonian is still far from spatially sparse because in (the modified version of) Ambainis's query Hamiltonian H_2, every qubit of Y_i interacts with X_i. We will solve this problem by replacing each single-qubit X_i register with a multi-qubit register of n_i qubits labeled by {X_i(j)}_{j=1}^{n_i}, for n_i the number of qubits of Y_i. We spread out the qubits of the X_i register in space around the Y_i register, and modify H_2 so that each term is controlled only on a nearby qubit in the X_i register. To make this work we need to introduce a third term H_3 which ensures that all the qubits in each X_i register are either all |0⟩ or all |1⟩.

Proof of Lemma 5.2. We will construct a Hamiltonian on the registers W, X_i and Y_i for i ∈ {1, . . . , m}, for which the problem ∀-APX-SIM encodes the output of a P^||QMA circuit, where m is the number of parallel queries to the QMA oracle.
Let the qubits of W and Y i be arranged on distinct parts of a square lattice. For each qubit of Y i , there is a corresponding qubit in X i , and X i contains a path of qubits leading from Y i to W. See Figure 3 for an example layout in the case m = 3.
Let E_i be the set of edges of the square lattice of qubits of Y_i (i.e. not including the edges connecting Y_i to X_i in Figure 3) and let H_{Y_i} = Σ_{(j,k)∈E_i} h^i_{Y_i(j,k)} be a 2D nearest neighbor Hamiltonian on Y_i corresponding to the i-th query. We have used the subscript notation Y_i(j, k) to denote the action of an operator on the j-th and k-th qubits of the Y_i register. H_{Y_i} has ground state energy less than a_i if query i is a YES instance and energy greater than b_i in a NO instance. Then, let H_2 be the query Hamiltonian of Equation (2), modified so that each local term h^i_{Y_i(j,k)} is controlled on a single qubit of X_i via the 3-local constraint |1⟩⟨1|_{X_i(g(j,k))} ⊗ h^i_{Y_i(j,k)}, where g(j, k) is the location of the "nearest" qubit in X_i to edge (j, k) in Y_i. Here, the choice "nearest" is somewhat arbitrary; for concreteness, one can set g(j, k) = j, i.e. pick the vertex in X_i which aligns with the first coordinate of the edge (j, k). (In this sense, Figure 3 is not entirely accurate, since it depicts the 3-local constraint |1⟩⟨1|_{X_i(g(j,k))} ⊗ h^i_{Y_i(j,k)} as a pair of 2-local constraints. This is done solely for the purpose of simplifying the illustration, as otherwise one would need to draw hyperedges of size 3.)

Let H_1 = H_prop + H_in be the Cook-Levin Hamiltonian where H_prop is exactly as in Lemma 3.3. Let H_in initialize the qubits of the first (t = 1) row of the qubits in W. For each query i, we have a penalty term |1⟩⟨1|_{X_i(1)} ⊗ |0⟩⟨0| + |0⟩⟨0|_{X_i(1)} ⊗ |1⟩⟨1| which effectively copies the state of X_i(1), the qubit in X_i nearest to W, onto the i-th qubit of the first row of W. For all the remaining qubits in the first (t = 1) row of W, we have a penalty term |1⟩⟨1|, effectively initializing the qubit into the |0⟩ state.
Restricted to the subspace ℋ where each X_i register is either all |0⟩ or all |1⟩, H_1 + H_2 is exactly the same Hamiltonian as in Lemma 3.3. It remains to give a high energy penalty to all other states not in this subspace. We do this with H_3 = Σ_{i=1}^{m} H_3^{(i)}, where H_3^{(i)} acts on X_i:

$$H_3^{(i)} = \Delta_i \sum_{(j,k)\in G_i} \big(|01\rangle\langle 01| + |10\rangle\langle 10|\big)_{X_i(j,k)},$$

where G_i is the set of edges between the qubits of the X_i register. G_i consists of edges between nearest neighbors on the square lattice E_i and on the path of qubits from Y_i to W. The overall Hamiltonian H = H_1 + H_2 + H_3 is therefore spatially sparse. H_3^{(i)} is a classical Hamiltonian, so all of its eigenstates can be taken to be of the form |x⟩ for some x ∈ {0, 1}^{n_i}. Its ground space 𝒢_i contains |0⟩^{⊗n_i} and |1⟩^{⊗n_i}, and all states in 𝒢_i^⊥ have energy at least ∆_i.

Choosing each ∆_i sufficiently large ensures that all states in 𝒢_i^⊥ have energy greater than λ(H) + δ. Then H = H_1 + H_2 + H_3 is block diagonal with respect to the split of each subspace 𝒢_i ⊕ 𝒢_i^⊥; restricted to the spaces 𝒢_i, H is exactly the Hamiltonian from Lemma 3.3, and all states in spaces 𝒢_i^⊥ have energy greater than λ(H) + δ. The result then follows just as in the proof of Lemma 3.3.
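The role of H_3 can be illustrated with a brute-force check on a small graph. The sketch below assumes a penalty of the form ∆ times the number of edges whose endpoints disagree (a natural classical choice with the properties stated above, but an assumption as far as the exact form of H_3 goes), on a hypothetical register laid out as a 2×2 square with a two-qubit path attached.

```python
from itertools import product


def h3_energy(x, edges, delta):
    """Energy of basis state x: delta for every edge whose endpoints disagree."""
    return delta * sum(x[j] != x[k] for j, k in edges)


def h3_summary(n, edges, delta):
    """Return (list of zero-energy strings, smallest non-zero energy)."""
    ground, excited_min = [], None
    for x in product((0, 1), repeat=n):
        e = h3_energy(x, edges, delta)
        if e == 0:
            ground.append(x)
        elif excited_min is None or e < excited_min:
            excited_min = e
    return ground, excited_min


# Hypothetical layout: qubits 0-3 form a square, qubits 4 and 5 form a path towards W.
EDGES = [(0, 1), (1, 3), (3, 2), (2, 0), (3, 4), (4, 5)]
```

Because the edge set is connected, the only zero-energy strings are all-zeros and all-ones, and every other string cuts at least one edge and so pays at least ∆. Connectivity matters here: on a disconnected edge set, each component could choose its own value at no cost.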
Finally we restate Theorem 1.6 which shows APX-SIM is hard not only for families of Hamiltonians which are universal -that is, families that can efficiently simulate any k-local Hamiltonian -but also for more restricted families of Hamiltonians which can only efficiently simulate the family of spatially sparse Hamiltonians. As stated in Section 1.2, this then yields the desired hardness results for APX-SIM on physical Hamiltonians such as the Heisenberg interaction on a 2D lattice (see, e.g., Corollary 1.7).
Theorem 1.6. Let F be a family of Hamiltonians which can efficiently simulate any spatially sparse Hamiltonian. Then, APX-SIM is P QMA[log] -complete even when restricted to a single-qubit observable and a Hamiltonian from the family F.

Simulating measurements on a 1D line
In this section, we show that APX-SIM remains P QMA[log] -complete even on a line. Below, we reproduce the statement of the main theorem of this section for convenience.
Theorem 1.10. APX-SIM is P QMA[log] -complete even when restricted to Hamiltonians on a 1D line of 8-dimensional qudits and single-qudit observables.
We prove Theorem 1.10 in three sections. We first describe our construction in Section 6.1. We then show correctness of the construction in Section 6.2, with the proofs of various lemmas deferred to Section 6.2.1.

Our 1D hardness construction
We give a reduction from P^||QMA to ∀-APX-SIM, which by Theorem 1.2 and the fact that ∀-APX-SIM trivially reduces to APX-SIM yields P^QMA[log]-hardness of APX-SIM. Let Π be a P^||QMA computation which takes in an input of size n and which consists of a uniformly generated polynomial-size classical circuit C making m = poly(n) parallel 2-LH queries π_i := (H_i, a_i, b_i) to a QMA oracle. As in Section 3.2, we treat the "answer register" in which C receives answers to its m queries as a proof register.
Our high-level approach consists of three steps: (1) construct a "master" circuit V composed of the verification circuits V_i corresponding to each query π_i and of the circuit C; (2) run V through the 1D circuit-to-Hamiltonian construction of [HNN13] to obtain a 1D Hamiltonian G with local dimension 8 constructed such that the low-energy space S of G must consist of history states (of the form described in [HNN13]); and (3) carefully add additional 1-local penalty terms acting on the output qubits corresponding to each verification circuit V_i to obtain the final Hamiltonian H such that the low-energy space must encode satisfying proofs to each V_i whenever possible. This final step of "fine-grained splitting" of S forces the output qubits of the circuits V_i to encode correct answers to query π_i, and thus the final circuit C receives a correct proof, hence leading the history states of step (2) to encode a correct simulation of Π. The answer to the computation Π can then be read off the ground state of H via an appropriate single-qudit measurement.
1. Construction of V. Suppose each query π_i has corresponding QMA verification circuit V_i. Without loss of generality, we may henceforth assume that the completeness/soundness error of V_i is at most p ≤ 2^{−n}, for p to be set later, by standard error reduction [AN02, MW05]; thus, if a particular query π_i is valid, then either there exists a proof such that V_i outputs YES with probability at least 1 − p or no proof causes V_i to output YES with probability greater than p. Next, since Π is a P^||QMA computation, all queries and corresponding V_i can be precomputed in polynomial time. We view the "master circuit" V as consisting of two phases:

1. (Verification phase) Given supposed proofs for each query, V runs all verification circuits V_i in parallel, where V_i acts on space Y_i ⊗ W_i ⊗ X_i, for proof register Y_i, ancilla register W_i, and single-qubit output register X_i.

2. (Simulated classical phase) The simulated P circuit C now receives the query answers X := X_1 ⊗ · · · ⊗ X_m as its proof register as well as an ancilla register W_0. It outputs a single qubit to an output register X_0.
This completes the construction of V, which acts on (⊗_{i=1}^{m} Y_i ⊗ W_i ⊗ X_i) ⊗ W_0 ⊗ X_0; we write Y := Y_1 ⊗ · · · ⊗ Y_m for the joint proof register. Crucially, note that given a set of proofs in register Y, V does not necessarily yield the same answer as Π, since a malicious prover could intentionally send a "bad" proof to a YES query, flipping the final answer of V.
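The last point, that V alone can be fooled by a bad proof, is easy to see in a purely classical toy model. The sketch below replaces each quantum verifier by a deterministic function from a proof to an accept bit; the verifiers, proofs, and the classical circuit are all hypothetical stand-ins chosen for illustration, not part of the construction.

```python
def master_circuit(verifiers, classical_circuit, proofs):
    """Two-phase toy version of V.

    Verification phase: run every verifier on its proof to fill the answer bits.
    Simulated classical phase: feed the answer bits to the classical circuit.
    Returns (output bit, tuple of answer bits).
    """
    answers = tuple(verify(proof) for verify, proof in zip(verifiers, proofs))
    return classical_circuit(answers), answers


# Hypothetical instance: query 1 is a YES instance whose only accepting proof is
# "good"; query 2 is a NO instance, so its verifier rejects every proof.
VERIFIERS = [lambda proof: int(proof == "good"), lambda proof: 0]


def CLASSICAL(answers):
    """Toy P circuit: accept iff query 1 is answered YES and query 2 NO."""
    return answers[0] & (1 - answers[1])
```

With proofs `("good", "anything")` the toy V outputs 1, the correct answer; with `("bad", "anything")` the first answer bit flips to 0 and V outputs 0, even though every step of the circuit was executed faithfully. This is the behavior that the additional penalty terms of step (3) are meant to suppress in the low-energy space.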

2. Construction of G.
We now plug V into the circuit-to-Hamiltonian construction of Hallgren, Nagaj, and Narayanaswami [HNN13] to obtain a nearest-neighbor 1D Hamiltonian G ′ = ∆ in H in + ∆ prop H prop + ∆ pen H pen + H out , where ∆ in , ∆ prop , and ∆ pen are at most polynomials in n which we will set as needed; we review this construction more closely below. Set G = G ′ − H out , since in our setting the task of "checking the output" will be delegated to the observable A. Note that as an intermediate step, [HNN13] maps V to a circuit V ′ which it then maps to G ′ ; we describe the role of V ′ in the following review. Our construction will make two trivial assumptions about the behavior of V ′ , including how it arranges its query answers between the verification phase and the simulated classical phase and how it stores its output in the final timestep; we defer details about these assumptions until we define our "fine-grained splitting" in step 3 and when we define our observable.
Review of 1D QMA construction [HNN13]. Suppose an arbitrary circuit U acts on n qubits. Begin by arbitrarily arranging these qubits along a line. The circuit U is then "linearized", meaning it is mapped to a new circuit U′ which consists of R rounds in which each round applies a sequence of n − 1 two-qubit gates acting on nearest neighbors. The i-th gate in a round acts on qubits (i, i + 1). This "linearization" is achieved in polynomial time by inserting swap and identity gates as needed, and U′ is at most polynomially larger than U. To reduce U′ to an instance of k-LH, we wish to design a mapping similar to Kitaev's circuit-to-Hamiltonian construction for showing QMA-hardness of 5-LH on general geometry [KSV02]. In both settings, the goal is to design an H which enforces a structure on any state in its low-energy space. In the construction of [KSV02], H = H_in + H_prop + H_stab + H_out, and the minimizing state of H has the form of a history state:

$$|\psi_{\rm hist}\rangle = \frac{1}{\sqrt{L+1}} \sum_{t=0}^{L} U_t \cdots U_1 \big(|\psi\rangle \otimes |0\cdots 0\rangle_W\big) \otimes |t\rangle_C ,$$

where U = U_L · · · U_1 and |ψ⟩ is the initial state of the proof register. Intuitively, H_stab forces a structure on the clock register C of basis states |0⟩, |1⟩, . . . , such that each will correspond to a timestep of U. Then, H_in ensures the ancilla register W is set to the all |0⟩ state when |t⟩ = |0⟩. The term H_prop ensures that the workspaces entangled with timesteps |t⟩ and |t + 1⟩ are related by the 2-qubit gate U_{t+1}. Together, these terms ensure that a minimizing state |ψ_hist⟩ encodes a correct simulation of the circuit U, and that all low-energy states are close to |ψ_hist⟩. In fact, a valid |ψ_hist⟩ lies in the nullspace of H_in + H_prop + H_stab. Finally, H_out penalizes the low-energy space if the output qubit has overlap with |0⟩.

Now in the 1D setting, the goal remains the same: design H such that the structure of its low-energy state is a superposition over a sequence of states corresponding to timesteps in the computation of U′. But, we now appear unable to entangle the workspace with a separate clock register using nearest neighbor interactions.
Instead, the constructions of [AGIK09, HNN13] employ qudits of higher dimension as a means to label the qubits, with each labeling encoding a particular timestep. [HNN13] then doubles the number of qudits in order to lower the necessary number of labels. The construction of [HNN13] thus maps U ′ to a Hamiltonian H = H in + H prop + H out + H pen acting on 2nR qudits of dimension 8, where the qudits are arranged on a 1D line in R blocks of 2n qudits (i.e. one block per round in U ′ ).
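The "linearization" step of the review above can be sketched for classical reversible gates, which stand in for two-qubit gates here. The version below is deliberately wasteful: it uses one non-trivial gate per round and pads the rest of the round with identities, which is still polynomial but is not claimed to match the scheduling of [HNN13]. Gates are modeled as functions on a pair of bits, and all names are illustrative assumptions.

```python
from itertools import product

IDENTITY = lambda left, right: (left, right)
SWAP = lambda left, right: (right, left)
CNOT = lambda control, target: (control, control ^ target)


def flipped(gate):
    """Same gate, but with its two arguments sitting on (right, left) wires."""
    return lambda left, right: gate(right, left)[::-1]


def linearize(n, circuit):
    """Map gates (a, b, gate) on arbitrary bits to rounds of n-1 neighbor gates.

    Returns (rounds, where): slot i of every round acts on wires (i, i+1), and
    where[q] is the wire holding logical bit q after the last round.
    """
    where = list(range(n))
    rounds = []

    def one_gate_round(slot, gate):
        rnd = [IDENTITY] * (n - 1)
        rnd[slot] = gate
        rounds.append(rnd)

    def swap_wires(slot):
        one_gate_round(slot, SWAP)
        for q in range(n):
            if where[q] == slot:
                where[q] = slot + 1
            elif where[q] == slot + 1:
                where[q] = slot

    for a, b, gate in circuit:
        while abs(where[a] - where[b]) > 1:      # walk bit a towards bit b
            swap_wires(where[a] if where[a] < where[b] else where[a] - 1)
        if where[a] < where[b]:
            one_gate_round(where[a], gate)
        else:
            one_gate_round(where[b], flipped(gate))
    return rounds, where


def run_original(circuit, bits):
    bits = list(bits)
    for a, b, gate in circuit:
        bits[a], bits[b] = gate(bits[a], bits[b])
    return bits


def run_linear(rounds, bits):
    bits = list(bits)
    for rnd in rounds:
        for i, gate in enumerate(rnd):
            bits[i], bits[i + 1] = gate(bits[i], bits[i + 1])
    return bits


def check_equivalent(n, circuit):
    """Compare both circuits on all inputs, reading outputs through `where`."""
    rounds, where = linearize(n, circuit)
    return all(
        all(run_linear(rounds, x)[where[q]] == run_original(circuit, x)[q] for q in range(n))
        for x in product((0, 1), repeat=n)
    )
```

The linearized circuit leaves the logical bits in a permuted order, recorded in `where`; a final layer of swap rounds could undo this permutation if the original wire order is required.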
Let us further describe the idea of labeling, or "marking", of qudits. For example, a qubit α|0⟩ + β|1⟩ may be encoded as α|A⟩ + β|B⟩ if that qubit is ready for a gate to be applied or as α|C⟩ + β|D⟩ if that round's gate has already been applied, where |A⟩, |B⟩, |C⟩, |D⟩ are some basis states. The possible configurations, or arrangements, of labels along the line form a set of orthogonal spaces. [HNN13] thus introduces a Hamiltonian term H_pen which enforces a set of "legal configurations" of the workspace, penalizing all other configurations. We then map each of the configurations which remain in the low-energy space of H to timesteps in the computation of U′, effectively assigning the job of encoding the workspace in a particular timestep to a particular configuration of qudits. We note that the crucial feature of the set of legal configurations developed by [HNN13] is that they are sufficiently identifiable solely by 2-local nearest neighbor checks such that penalties can be correctly assigned when constructing 1D analogs of the terms H_in, H_prop, H_out. Similar to the general geometry case of [KSV02], the construction of [HNN13] enforces that the nullspace of H_in + H_prop + H_pen consists of history states such that |ψ_hist⟩ is a superposition over states in each legal configuration, |ψ_0⟩ encodes a properly initialized workspace, and each pair |ψ_t⟩ and |ψ_{t+1}⟩ are related according to the corresponding timestep of U′. Finally, again similar to the general geometry case, all low-energy states must be close to |ψ_hist⟩ (we make these two claims explicit and give proofs in Lemma 6.3).
The full description of the labeling, the legal configurations, and their mapping to timesteps by [HNN13] is rather involved. Here, we introduce sufficient details for our later analysis. We begin with a single block of 2n qudits, where recall each block is used to encode a single round (taken from [HNN13]): Recall the design of U ′ began by arranging the qubits of U arbitrarily on the line; the i-th qubit on that line corresponds to qudits 2i − 1 and 2i in (16). Thus, each qubit of U ′ , henceforth denoted a logical qubit, is encoded into two consecutive qudits. Each pair of qudits representing a logical qubit is depicted as separated by a for clarity. The standard basis for each 8-dimensional qudit is labeled by where, as described earlier, the current state of a qudit can be used to encode a logical qubit and to label the qudit. The first four states should be thought of as 1-dimensional labels; they are used to ensure the correct propagation of the circuit and do not encode a logical qubit. The final four states are used to either label a qudit with ◮ , in which case a logical qubit is encoded as a superposition of | ◮ 0 and | ◮ 1 , or with , in which case a logical qubit is encoded as a superposition of | 0 and | 1 . To make this example more concrete, a product state of (α |0 + β |1 ) ⊗n on n logical qubits could be encoded as Next, here is an example depicting multiple blocks (from Table 2 of [HNN13]): where the blocks are delineated by . The labels × to the left depict "dead" qudits, while the labels to the right depict "unborn" qudits. By construction, all logical qubits are encoded in a block between the dead and unborn labels. In this example, the logical qubits line up with the beginning of a new block, beginning with ◮ and ending with the first . At a high level, the set of legal configurations is mapped to a sequence of timesteps as follows. 
The first timestep corresponds to a configuration similar to (16), with n logical qubits encoded in the leftmost block of 2n qudits, with no × labels anywhere, and with the "gate" label ◮ on the first qudit. The second configuration has the ◮ label shifted to the right, on the second qudit. Next, the third configuration has the second qudit labeled and the third qudit labeled ◮ . This propagation of the ◮ label rightwards continues, with each step corresponding to another legal configuration, until it reaches the end of the block. As the ◮ passes between logical qubits (i, i + 1), the corresponding configurations map to timesteps i and i + 1 of round 1, and H prop enforces that configurations are related by the application of gate U ′ i . Thus, when we reach a configuration with ◮ at the end of the block, i.e. ◮ , all gates in the current round will have been applied. Next, before encoding the next round of gates, our goal becomes to shift all of the logical qubits encoded in the current block rightwards 2n spots into the second block. To do this, the ◮ label becomes a special label and moves to the left one spot at a time until it reaches the end of the logical qubits (here, the leftwards ). As the label moves left, it shifts each logical qubit to the right one spot, i.e. | → | . This process repeats, with a label propagating rightwards to the end of the logical qubits (now past the rightwards ), then the label propagating to the left, shifting logical qubits to the right, and so on, until the logical qubits have shifted entirely into the second block. Then, the gate label ◮ once again transitions down the line, with successive configurations encoding the second round of gates of U ′ . Throughout this sequence, labels to the right are consumed, while all qudits to the left are labeled × . This procedure continues until the entire circuit has been simulated.
Lastly, we observe that the final timestep of $U'$ is encoded by [HNN13] in the following configuration:

3. Adding 1-local "sifters". We now add 1-local Hamiltonian terms which serve to "sift" through bad proofs, or more accurately to split the ground space of $G$, so as to force low-energy states to encode correct query answers. As previously described, even a correct simulation of the circuit $V$ may not output the correct answer for instance $\Pi$ if a malicious prover supplies incorrect proofs to the query registers $Y_i$; in particular, a prover might send a proof which accepts with low probability even though $\pi_i$ is a YES-instance. Intuitively, we wish to penalize states encoding a proof $|\psi_i\rangle$ which leads verifier $V_i$ to reject with high probability when there exists a proof $|\phi_i\rangle$ such that $V_i$ would have accepted with high probability (here, query $\pi_i$ is a YES-instance). For answer register $X_i$, we add a "sifter" penalty term $\epsilon\,|0\rangle\langle 0|_{X_i}$, for $\epsilon$ some inverse polynomial to be set later. These terms are similar to the $H_{\mathrm{out}}$ term from other Hamiltonian constructions; here, however, we are concerned not only with the ground space but also with the low-energy space. As in other constructions, we must penalize NO answers enough to ensure the ground space encodes YES answers when possible. But, given a correct NO answer, the penalty must be small enough that the energy stays gapped below that of any state which encodes an incorrect YES, such as those which encode an invalid computation leading to YES. However, because the encoding enforced by $G$ shifts the block of logical qubits rightwards along the line as the computation progresses, the location of a particular logical qubit's encoding depends on the current timestep. Thus, in order to properly act on logical qubit $X_i$, we must be careful to specify the configuration which the penalty term acts on.
We may assume that once $V'$ finishes simulating all of the circuits $V_i$, it arranges each of the outputs in the first $m$ logical qubits on the line, finishing by the end of some round $r^* - 1$, such that the $i$-th logical qubit on the line is the qubit which $V$ stored in $X_i$. (The value of $r^*$ can be determined during the construction of $V'$.) We may also assume that $V'$ then "pauses" by applying only identity gates in round $r^*$. This round is encoded in block $r^*$, and since each block is comprised of $2n$ qudits, the answers to queries 1 to $m$ are thus simultaneously stored in qudits

$$ q_i := (2n)(r^* - 1) + (2i - 1). \qquad (20) $$
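As a quick sanity check on the indexing above, the following minimal Python sketch computes the positions $q_i$ and the block each one falls in. The function names `answer_qudit_index` and `block_of`, and the convention that qudits, rounds, and queries are all counted from 1, are assumptions made for illustration; they are not notation from [HNN13].

```python
def answer_qudit_index(n: int, r_star: int, i: int) -> int:
    """Return q_i = 2n(r* - 1) + (2i - 1), counting qudits from 1.

    n      : number of logical qubits (each block holds 2n qudits)
    r_star : index of the "pause" round r* (hypothetical 1-based convention)
    i      : query index, 1 <= i <= n
    """
    if n < 1 or r_star < 1 or not (1 <= i <= n):
        raise ValueError("expected n >= 1, r_star >= 1 and 1 <= i <= n")
    return 2 * n * (r_star - 1) + (2 * i - 1)


def block_of(q: int, n: int) -> int:
    """Return the 1-based block (of 2n qudits) that contains qudit q."""
    return (q - 1) // (2 * n) + 1


if __name__ == "__main__":
    n, r_star = 3, 2
    positions = [answer_qudit_index(n, r_star, i) for i in range(1, n + 1)]
    print(positions)                           # odd positions inside block r*
    print([block_of(q, n) for q in positions])
```

Under these conventions each $q_i$ is the first of the two qudits encoding logical qubit $i$, and all of them lie inside block $r^*$.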
The $m$ sifter terms are given by

$$ H_{\mathrm{out},i} := \epsilon\, |{◮}0\rangle\langle{◮}0|_{q_i}, $$

where the subscript denotes the qudit which the term acts on and $\epsilon$ is to be set later. Note that there is a unique legal configuration in which any given qudit is labeled ◮, so $H_{\mathrm{out},i}$ will apply to at most one state $|\psi_t\rangle$ in the history state of Equation (15). Finally, we define $H_{\mathrm{out}} = \sum_{i=1}^{m} H_{\mathrm{out},i}$.
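The final Hamiltonian. Our final Hamiltonian is $H := G + H_{\mathrm{out}} = \Delta_{\mathrm{in}} H_{\mathrm{in}} + \Delta_{\mathrm{prop}} H_{\mathrm{prop}} + \Delta_{\mathrm{pen}} H_{\mathrm{pen}} + H_{\mathrm{out}}$, with $\Delta_{\mathrm{in}}, \Delta_{\mathrm{prop}}, \Delta_{\mathrm{pen}}$ polynomials to be set later.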
The observable. Recall the configuration from (19), which corresponds to the final timestep in the computation of a circuit passed to the construction of [HNN13]. Note that this is the unique timestep in which the final qudit is labeled ◮. We assume, without loss of generality, that $V'$ places its final output in the rightmost logical qubit on the line. Thus, we choose the single-qudit observable $A = |{◮}0\rangle\langle{◮}0|_{2nR}$, where the subscript denotes that $A$ acts on the rightmost qudit on the line, and where $R$ is the number of rounds in $V'$.
Setting parameters. Let $L$ denote the number of legal configurations which the history state in (15) is summed over, which is at most polynomial in $n$. We have that $H$ is $k$-local and $A$ is $\ell$-local for $k := 2$ and $\ell := 1$. Set $\epsilon = 1/(8m)$, where recall $m$ is the (polynomial) number of queries. Then, set $p$, the completeness/soundness error of each $V_i$, to some inverse exponential in $n$ such that $p < \epsilon$ for all $n$. Set $a = 1/(4L)$ and $b = 3/(4L)$. We will set $\delta$ to a sufficiently small fixed inverse polynomial in $n$ in the proof of Lemma 6.4, which will then set $\Delta_{\mathrm{in}}, \Delta_{\mathrm{prop}}, \Delta_{\mathrm{pen}}$ to sufficiently large fixed polynomials in $n$ via the proof of Lemma 6.3. This concludes our deterministic polynomial-time mapping of the input $\mathrm{P}^{\|\mathrm{QMA}}$ computation $\Pi$ to the 1D ∀-APX-SIM instance $(H, A, k, \ell, a, b, \delta)$.

Correctness
We now prove Theorem 1.10 by showing correctness of our construction from Section 6.1. A number of lemmas required in the proof are deferred to Section 6.2.1 to ease the exposition; in particular, we require Lemma 6.3, which explicitly proves two facts about the low-energy space of the construction of [HNN13], Lemma 6.4, which shows that a history state in our construction must simultaneously encode nearly correct answers for all valid queries π i , and Lemma 6.5, which states a Commutative Quantum Union Bound.
Proof of Theorem 1.10. Containment in $\mathrm{P}^{\mathrm{QMA}[\log]}$ was already shown for up to $O(\log n)$-local $H$ by [Amb14], with no restriction on the geometry. Our goal is now to show $\mathrm{P}^{\|\mathrm{QMA}}$-hardness, which by Theorem 1.2 yields $\mathrm{P}^{\mathrm{QMA}[\log]}$-hardness. We show hardness for the problem ∀-APX-SIM, which recall from Section 1.2 trivially reduces to APX-SIM, thus yielding hardness for APX-SIM. Let $\Pi$ be a $\mathrm{P}^{\|\mathrm{QMA}}$ computation and map it to the ∀-APX-SIM instance $(H, A, k, \ell, a, b, \delta)$ as described in Section 6.1. The proof proceeds in two parts: we first show that low-energy states must necessarily encode correct query answers, and subsequently apply this to show correctness in the YES and NO cases for $\Pi$.
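Low energy states approximately encode correct query answers. Recall that $H = G + H_{\mathrm{out}}$. Let $\delta, \gamma$ denote arbitrary inverse polynomials in $n$ which will be set later in Lemma 6.4. Consider any state $|\psi\rangle$ such that $\langle\psi|H|\psi\rangle \le \lambda(H) + \delta$. Since $H_{\mathrm{out}} \succeq 0$, $\langle\psi|G|\psi\rangle \le \lambda(H) + \delta$ as well. By Lemma 6.3, for sufficiently large fixed polynomials $\Delta_{\mathrm{in}}, \Delta_{\mathrm{prop}}, \Delta_{\mathrm{pen}}$, two statements thus hold: First, the nullspace $S$ of the Hamiltonian $G = \Delta_{\mathrm{in}} H_{\mathrm{in}} + \Delta_{\mathrm{prop}} H_{\mathrm{prop}} + \Delta_{\mathrm{pen}} H_{\mathrm{pen}}$ is the span of all correctly encoded history states, as defined in Equation (15). Second, there exists a correctly encoded history state $|\psi_{\mathrm{hist}}\rangle$ such that

$$ \bigl\|\, |\psi\rangle\langle\psi| - |\psi_{\mathrm{hist}}\rangle\langle\psi_{\mathrm{hist}}| \,\bigr\|_{\mathrm{tr}} \le \gamma. \qquad (21) $$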
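Combining Equation (21) with the Hölder inequality and the fact that $\|H_{\mathrm{out}}\|_\infty = m\epsilon$ yields that

$$ \bigl|\mathrm{Tr}(H_{\mathrm{out}}|\psi\rangle\langle\psi|) - \mathrm{Tr}(H_{\mathrm{out}}|\psi_{\mathrm{hist}}\rangle\langle\psi_{\mathrm{hist}}|)\bigr| \le \|H_{\mathrm{out}}\|_\infty \, \bigl\|\, |\psi\rangle\langle\psi| - |\psi_{\mathrm{hist}}\rangle\langle\psi_{\mathrm{hist}}| \,\bigr\|_{\mathrm{tr}} \le m\epsilon\gamma. $$

Since $|\psi_{\mathrm{hist}}\rangle$ is a null state of $G$ and $\langle\psi|H_{\mathrm{out}}|\psi\rangle \le \langle\psi|H|\psi\rangle \le \lambda(H) + \delta$, we conclude

$$ \langle\psi_{\mathrm{hist}}|H|\psi_{\mathrm{hist}}\rangle \le \lambda(H) + \delta + m\epsilon\gamma. \qquad (22) $$

Next, let $I \subseteq \{1, \ldots, m\}$ be the set of indices corresponding to valid queries $\pi_i$, and for all $i \in I$ define $x_i = 1$ if $\pi_i$ is a YES-instance and $x_i = 0$ if $\pi_i$ is a NO-instance.⁴ Recall now from Section 6.1 that at the beginning of round $r^*$, $V'$ has encoded the answer to the $i$-th QMA query in qudit $q_i$ (defined in Equation (20)). Let $|\psi_{t^*}\rangle$ denote the unique (normalized) state in the superposition comprising $|\psi_{\mathrm{hist}}\rangle$ in which $q_1$ is labeled ◮ (i.e. the first timestep corresponding to round $r^*$). Since $V'$ applies only identity gates during round $r^*$, the qubits encoded in qudits $q_i$ during timestep $t^*$, in which $q_1$ is labeled ◮ and all other $q_i$ carry the other logical-qubit label, are exactly the same as in the successive timesteps in which the other $q_i$ are labeled by ◮. More formally, $|\langle\psi_{t^*}|x_i\rangle_{q_i}|^2 = L\,|\langle\psi_{\mathrm{hist}}|{◮}x_i\rangle_{q_i}|^2$ for any $i \in I$, and so by Lemma 6.4,

$$ |\langle\psi_{t^*}|x_i\rangle_{q_i}|^2 \ge 1 - \epsilon, \qquad (23) $$

where⁵ the ket on $q_i$ carries the label which $q_i$ holds at timestep $t^*$ (that is, ◮ when $i = 1$ and the other logical-qubit label otherwise), and where the factor of $L^{-1}$ is removed due to the normalization of $|\psi_{t^*}\rangle$. This holds for any single query $\pi_i$, $i \in I$; from this, we can obtain that $|\psi_{t^*}\rangle$ simultaneously encodes nearly correct query answers to all valid queries. To do so, define $\Gamma := \prod_{i\in I} |x_i\rangle\langle x_i|_{q_i}$ (with the same labeling convention). Then, by the Commutative Quantum Union Bound (Lemma 6.5),

$$ \langle\psi_{t^*}|\Gamma|\psi_{t^*}\rangle \ge 1 - m\epsilon. \qquad (24) $$

It follows that we may write $|\psi_{t^*}\rangle = \alpha|\phi_1\rangle + \beta|\phi_2\rangle$ for unit vectors $|\phi_1\rangle, |\phi_2\rangle$ such that $\Gamma|\phi_1\rangle = |\phi_1\rangle$ and $\Gamma|\phi_2\rangle = 0$, and where $\alpha, \beta \in \mathbb{C}$, $|\alpha|^2 + |\beta|^2 = 1$, and $|\alpha|^2 \ge 1 - m\epsilon$. Intuitively, $|\phi_1\rangle$ is the part of $|\psi_{t^*}\rangle$ that encodes correct strings of query answers on $I$, while $|\phi_2\rangle$ encodes strings with at least one incorrect query answer in $I$. For clarity, $|\phi_1\rangle$ may encode a superposition of multiple distinct correct strings of query answers, since queries with indices not in $I$ may be answered arbitrarily.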
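To make the final bound on $|\alpha|^2$ concrete, the short derivation below spells out how the per-query guarantee and Lemma 6.5 combine. It is a sketch under two assumptions: that each valid query satisfies $\langle\psi_{t^*}|P_i|\psi_{t^*}\rangle \ge 1-\epsilon$ for $P_i := |x_i\rangle\langle x_i|_{q_i}$ (the shorthand $P_i$ is introduced here only for illustration), and that $\epsilon = 1/(8m)$ as set in Section 6.1.

```latex
% The P_i act on distinct qudits, hence pairwise commute, and \Gamma = \prod_{i \in I} P_i.
\begin{align*}
  1 - |\alpha|^2
    &= 1 - \langle\psi_{t^*}|\Gamma|\psi_{t^*}\rangle \\
    &\le \sum_{i \in I} \bigl(1 - \langle\psi_{t^*}|P_i|\psi_{t^*}\rangle\bigr)
       && \text{(Lemma 6.5)} \\
    &\le |I|\,\epsilon \;\le\; m\epsilon \;=\; \tfrac{1}{8}
       && \text{(using } \epsilon = 1/(8m)\text{)} .
\end{align*}
% Hence |\alpha|^2 \ge 1 - m\epsilon = 7/8.
```

With this choice of $\epsilon$, at least a $7/8$ fraction of the weight of $|\psi_{t^*}\rangle$ lies on strings that answer every valid query correctly.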
4 Without loss of generality, we may assume at least one query is valid ($I \neq \emptyset$). This is because if all queries are invalid, then all simulations of the P circuit $C$ must output the same answer no matter the sequence of query answers $C$ receives. Thus, all history states will encode the same final answer, and $\alpha$ (defined after (24)) equals 1, satisfying the lower bound $|\alpha|^2 \ge 1 - m\epsilon$.
5 We implicitly apply identity on all qudits other than $q_i$, i.e.
Application to YES versus NO cases for Π. We have shown that for any low energy state |ψ , there exists a history state |ψ hist close to |ψ which has large amplitude on all the correct query answers for set I in round r * . We can now analyze the YES and NO cases for our P QMA[log] problem Π.
Recall that $|\phi_1\rangle$ may be a superposition over multiple correct query strings (due to invalid queries $\pi_i$ for $i \notin I$). Nevertheless, since the classical circuit $C$ for the $\mathrm{P}^{\mathrm{QMA}[\log]}$ machine is required to output the same answer regardless of how invalid queries are answered (i.e. for any given correct string of query answers), all query strings which $|\phi_1\rangle$ is a superposition over lead $C$ to output the same, correct final answer. Thus, setting $y = 0$ if $\Pi$ is a YES-instance and $y = 1$ if $\Pi$ is a NO-instance, we have where the factor of $L^{-1}$ is due to the fact that $A$ applies only to the final configuration/timestep. Combining Equation (21) where recall $\lambda(H_2|_S)$ denotes the smallest eigenvalue of $H_2$ restricted to space $S$.
We now prove the lemmas required for Theorem 1.10.
Lemma 6.3. Assume the notation of Section 6. Proof. The analysis of G is more subtle than that of, say, the 5-local Kitaev circuit-to-Hamiltonian construction [KSV02]. The latter required the analysis of two orthogonal subspaces acted on invariantly by the Hamiltonian in question; the span of all correctly encoded history states, and the span of all states with an incorrectly encoded clock register (i.e. illegal configurations). In [HNN13], however, due to the restrictions of encoding in 1D, there are two types of illegal configurations which can arise -those which are detectable by local checks, and those which are not -and G does not act invariantly on the spaces of legal and illegal configurations. The soundness analysis of the QMA-hardness construction of [HNN13] (see Section 6 therein, which we follow below) hence independently analyzes three types of subspaces which are acted on invariantly by H prop : (1) The span of legal configurations and certain locally detectable illegal configurations, (2) the span of certain other locally detectable illegal configurations, and (3) the span of illegal configurations which are not locally detectable. We shall henceforth refer to these subspaces as S 1 , S 2 , and S 3 , respectively.
Proof of claim 1. This claim is implicit in [HNN13]; we sketch a proof to make it explicit here. Claim 2 of [HNN13] and the subsequent discussion explicitly show that any valid history state is a null state of $G$. For the reverse containment, Section 6.2 of [HNN13] shows that for sufficiently large polynomials $\Delta_{\mathrm{prop}}, \Delta_{\mathrm{pen}}$, $\lambda((\Delta_{\mathrm{prop}} H_{\mathrm{prop}} + \Delta_{\mathrm{pen}} H_{\mathrm{pen}})|_{S_3}) \in \Omega(1)$. That $\lambda(G|_{S_2}) \ge \Delta_{\mathrm{pen}}$ follows since $H_{\mathrm{pen}}$ is a sum of pairwise commuting projectors. Thus, $\mathrm{Null}(G)$ resides in $S_1$. Section 6.1 of [HNN13] shows that $\mathrm{Null}(H_{\mathrm{prop}}|_{S_1 \cap \mathrm{Null}(H_{\mathrm{pen}})})$ is spanned by valid history states. We conclude that the span of all valid history states contains $\mathrm{Null}(G)$.
Proof of claim 2. We know from claim 1 that $\mathrm{Null}(G)$ is precisely the span of all correctly encoded history states. Let $C$ denote the orthogonal complement of $\mathrm{Null}(G)$. Then, we know from the proof of claim 1 that $\lambda(G|_{C \cap S_2}) \ge \Delta_{\mathrm{pen}} \in \Omega(1)$, and that $\lambda((\Delta_{\mathrm{prop}} H_{\mathrm{prop}} + \Delta_{\mathrm{pen}} H_{\mathrm{pen}})|_{C \cap S_3}) \in \Omega(1)$. (Here we have used the fact that $S_2 \cup S_3 \subseteq C$.) Since $\delta$ is assumed to be inverse polynomial in $n$, and since we know from claim 1 that $\lambda(G) = 0$, it follows that no vector $|\psi\rangle$ from $S_2$ or $S_3$ can attain $\langle\psi|G|\psi\rangle \le \lambda(G) + \delta$.
(Note that this requires upper bounding terms of the form $K_2 := \|H_2\|_\infty$, which is easily done via the triangle inequality for the spectral norm and the fact that projections can only decrease maximum eigenvalues.)

Lemma 6.4. Assume the notation of Section 6.2. For all $i \in I$, it holds that

$$ |\langle\psi_{\mathrm{hist}}|{◮}x_i\rangle_{q_i}|^2 \ge \frac{1-\epsilon}{L}, $$

where recall $q_i$ is the index of the qudit which encodes the output corresponding to query $\pi_i$ following the verification phase.
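Proof. For clarity, the factor of $L^{-1}$ comes from the $L$ configurations which $|\psi_{\mathrm{hist}}\rangle$ is a sum over. Recall there is a unique configuration in which any given qudit is labeled ◮, implying all history states $|\psi_{\mathrm{hist}}\rangle$ satisfy

$$ |\langle\psi_{\mathrm{hist}}|{◮}0\rangle_{q_i}|^2 + |\langle\psi_{\mathrm{hist}}|{◮}1\rangle_{q_i}|^2 = \frac{1}{L}. \qquad (28) $$

We prove our claim by contradiction via an exchange argument. Suppose there exists a valid query⁶ $\pi_j$ with correct answer $x_j$ such that

$$ |\langle\psi_{\mathrm{hist}}|{◮}x_j\rangle_{q_j}|^2 < \frac{1-\epsilon}{L}. $$

Since $|\psi_{\mathrm{hist}}\rangle$ is a correctly encoded history state, we claim $\pi_j$ must be a YES-instance. For if $\pi_j$ were a NO-instance, then all simulations of $V_j$ (on any possible proof) output NO with probability at least $1 - p$. Thus, $|\psi_{\mathrm{hist}}\rangle$ always encodes an output qubit such that

$$ |\langle\psi_{\mathrm{hist}}|{◮}0\rangle_{q_j}|^2 \ge \frac{1-p}{L} \ge \frac{1-\epsilon}{L}, $$

where the last inequality uses $p < \epsilon$, which would contradict our supposition.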
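Given that $\pi_j$ is a YES-instance, we have that $|\langle\psi_{\mathrm{hist}}|{◮}1\rangle_{q_j}|^2 \le (1-\epsilon)/L$, and so by Equation (28), $\langle\psi_{\mathrm{hist}}|H_{\mathrm{out},j}|\psi_{\mathrm{hist}}\rangle \ge \epsilon^2/L$. Further, since $\pi_j$ is a YES-instance, there exists a QMA proof $|\omega\rangle$ which causes $V_j$ to output YES with probability at least $1 - p$. By exchanging the QMA proof which $|\psi_{\mathrm{hist}}\rangle$ encodes for circuit $V_j$ with the proof $|\omega\rangle$, we obtain a new history state $|\psi'_{\mathrm{hist}}\rangle$ which satisfies

$$ |\langle\psi'_{\mathrm{hist}}|{◮}1\rangle_{q_j}|^2 \ge \frac{1-p}{L}, $$

and so $\langle\psi'_{\mathrm{hist}}|H_{\mathrm{out},j}|\psi'_{\mathrm{hist}}\rangle \le p\epsilon/L$. Hence,

$$ \langle\psi_{\mathrm{hist}}|H_{\mathrm{out},j}|\psi_{\mathrm{hist}}\rangle - \langle\psi'_{\mathrm{hist}}|H_{\mathrm{out},j}|\psi'_{\mathrm{hist}}\rangle \ge \frac{(\epsilon - p)\epsilon}{L}, \qquad (29) $$

i.e. flipping the incorrect query answer saves a non-trivial energy penalty on $H_{\mathrm{out},j}$. We now use this to obtain the desired contradiction. Recall that $H = G + H_{\mathrm{out}}$. We make two observations. First, because all the QMA queries are made in parallel, flipping the answer to query $\pi_j$ does not affect the other queries the P machine makes or the answers it receives. Thus, $|\psi_{\mathrm{hist}}\rangle$ and $|\psi'_{\mathrm{hist}}\rangle$ obtain the same energy on all terms of $H_{\mathrm{out}}$ other than $H_{\mathrm{out},j}$, and Equation (29) holds for $H_{\mathrm{out}}$ in place of $H_{\mathrm{out},j}$. (Analyzing adaptive queries, rather than parallel, would require that penalties for later queries be carefully weighted less than penalties for earlier queries [Amb14], leading to a significantly more involved analysis.) Second, both $|\psi_{\mathrm{hist}}\rangle$ and $|\psi'_{\mathrm{hist}}\rangle$ are null states of $G$, and so we may substitute $H$ for $H_{\mathrm{out}}$, yielding

$$ \langle\psi_{\mathrm{hist}}|H|\psi_{\mathrm{hist}}\rangle - \langle\psi'_{\mathrm{hist}}|H|\psi'_{\mathrm{hist}}\rangle \ge \frac{(\epsilon - p)\epsilon}{L}. \qquad (30) $$

Now, recall from Equation (22) that $\langle\psi_{\mathrm{hist}}|H|\psi_{\mathrm{hist}}\rangle \le \lambda(H) + \delta + m\epsilon\gamma$. Since $\delta$ and $\gamma$ are inverse polynomials which (by Lemma 6.3) we are free to choose as needed (the choice of $\delta$ and $\gamma$, in turn, will mandate the choices of $\Delta_{\mathrm{in}}, \Delta_{\mathrm{prop}}, \Delta_{\mathrm{pen}}$ via Lemma 6.3), we set $\delta = \gamma = 1/(256 m^2 L)$ (where recall $L$ and $m$ are fixed polynomials in $n$). These choices of $\delta, \gamma$ satisfy $\delta + m\epsilon\gamma < (\epsilon - p)\epsilon/L$. Combined with Equation (30) and the fact that $\langle\psi'_{\mathrm{hist}}|H|\psi'_{\mathrm{hist}}\rangle \ge \lambda(H)$, this gives $\langle\psi_{\mathrm{hist}}|H|\psi_{\mathrm{hist}}\rangle > \lambda(H) + \delta + m\epsilon\gamma$, i.e. the energy of $|\psi_{\mathrm{hist}}\rangle$ could not have been this close to the ground state energy of $H$. Hence, we have a contradiction, completing the proof.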
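The parameter inequality $\delta + m\epsilon\gamma < (\epsilon - p)\epsilon/L$ used above can be checked with exact rational arithmetic. The sketch below is illustrative only: the function name `exchange_gap_holds` and the sample values of $m$, $L$, and $p$ are assumptions, not values from the construction. Working the algebra through with $\epsilon = 1/(8m)$ and $\delta = \gamma = 1/(256m^2L)$ suggests the inequality holds exactly when $p < 23/(256m)$, which an inverse-exponential $p$ satisfies for sufficiently large $n$.

```python
from fractions import Fraction


def exchange_gap_holds(m: int, L: int, p: Fraction) -> bool:
    """Check delta + m*eps*gamma < (eps - p)*eps/L with the paper's settings.

    m : number of queries, L : number of legal configurations,
    p : completeness/soundness error of each verifier (sample value).
    """
    eps = Fraction(1, 8 * m)                 # eps = 1/(8m)
    delta = Fraction(1, 256 * m * m * L)     # delta = gamma = 1/(256 m^2 L)
    gamma = delta
    lhs = delta + m * eps * gamma            # slack allowed by Equation (22)
    rhs = (eps - p) * eps / L                # energy saved by the exchange
    return lhs < rhs


if __name__ == "__main__":
    m, L = 10, 1000                          # hypothetical sample sizes
    print(exchange_gap_holds(m, L, Fraction(1, 2**20)))     # tiny p
    print(exchange_gap_holds(m, L, Fraction(23, 256 * m)))  # boundary value of p
```

Using `Fraction` rather than floating point keeps the comparison exact, which matters at the boundary where the two sides coincide.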
Finally, we require a known quantum analogue of the union bound for commuting operators (see, e.g. [OMW19]). Generalizations to non-commuting projectors are given in [Sen12,Gao15,OMW19].
The simple proof of Lemma 6.5 is given in Appendix B for completeness.

$\rho_\delta = \sum_i \mu_i |\phi_i\rangle\langle\phi_i|$ where the $|\phi_i\rangle$ are orthogonal states with energy $\langle\phi_i|H|\phi_i\rangle \le \lambda(H) + \delta$ and thus, for observable $A$ given as part of the F-∀-APX-SIM input, Let $U = VV^\dagger$, which satisfies $U E(A) = E(A) U$ for any $A$, and so $E(I) U\rho U^\dagger = U E(I)\rho U^\dagger = U\rho U^\dagger$. Now we need to choose $A'$ such that $A' E(I) = E(A)$. (Two notes: First, $E(I) \neq I$ necessarily, as $P$ and $Q$ need not sum to identity. Second, setting $A' = E(A)$ is not necessarily desirable, as $P$ and $Q$ may be non-local projectors.) For example if We note that the locality of $A'$ depends on the number of qudits which $V_i$ maps to, which is $O(1)$ by the definition of efficient simulation. Then We note that $\|\rho - U\rho U^\dagger\|_1 \le 2\eta$ follows from $\|U - VV^\dagger\| \le \eta$, and that $VV^\dagger\rho = P_{\le\Delta}\,\rho = \rho$. Therefore we just need to choose $\Delta, \epsilon, \eta, \delta'$ such that this is less than $(b - a)/3$ and then set $a' = a + (b - a)/3$ and $b' = b - (b - a)/3$.

B Proof of commutative quantum union bound

Lemma 6.5 (Commutative Quantum Union Bound). Let $\{P_i\}_{i=1}^{m}$ be a set of pairwise commuting projectors, each satisfying $0 \preceq P_i \preceq I$. Then for any quantum state $\rho$,

$$ 1 - \mathrm{Tr}(P_m \cdots P_1\, \rho\, P_1 \cdots P_m) \le \sum_{i=1}^{m} \mathrm{Tr}((I - P_i)\rho). $$
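Proof. We proceed by induction on $m$. The case of $m = 1$ is trivial. Consider $m > 1$. Since the $P_i$ pairwise commute, $\mathrm{Tr}(P_m \cdots P_1\, \rho\, P_1 \cdots P_m) = \mathrm{Tr}(P_m \cdots P_1 \rho) =: \mathrm{Tr}(P_m M \rho)$, where we write $M := P_{m-1} \cdots P_1$ for brevity, and $M$ is a projector. Then

$$ 1 - \mathrm{Tr}(P_m M \rho) = \mathrm{Tr}((I - M)\rho) + \mathrm{Tr}((I - P_m) M \rho) \le \mathrm{Tr}((I - M)\rho) + \mathrm{Tr}((I - P_m)\rho), $$

where the inequality holds because $M$ commutes with $P_m$, so that $(I - P_m)M$ is a projector satisfying $(I - P_m)M \preceq I - P_m$. Applying the induction hypothesis to $\mathrm{Tr}((I - M)\rho) = 1 - \mathrm{Tr}(P_{m-1} \cdots P_1\, \rho\, P_1 \cdots P_{m-1})$ completes the proof.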
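The bound of Lemma 6.5 can also be checked numerically in a simplified setting. The sketch below assumes we work in a common eigenbasis of the commuting projectors, where each $P_i$ is a 0/1 diagonal matrix and only the diagonal of $\rho$ (a probability vector) enters either side of the inequality. The function name `union_bound_sides` and the mask representation are hypothetical, chosen for illustration; this checks instances of the bound and is not a proof.

```python
import random
from fractions import Fraction


def union_bound_sides(probs, masks):
    """Return (lhs, rhs) of the commutative quantum union bound.

    probs : diagonal of rho in the common eigenbasis (sums to 1)
    masks : one 0/1 list per projector P_i, giving its diagonal entries
    """
    d = len(probs)
    # Tr(P_m ... P_1 rho P_1 ... P_m): weight on basis states kept by every P_i.
    joint = sum(probs[k] for k in range(d) if all(mask[k] for mask in masks))
    lhs = 1 - joint
    # Sum over i of Tr((I - P_i) rho): weight rejected by each P_i separately.
    rhs = sum(sum(probs[k] for k in range(d) if not mask[k]) for mask in masks)
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(1000):
        d, m = rng.randint(1, 6), rng.randint(1, 5)
        weights = [rng.randint(1, 10) for _ in range(d)]
        total = sum(weights)
        probs = [Fraction(w, total) for w in weights]
        masks = [[rng.randint(0, 1) for _ in range(d)] for _ in range(m)]
        lhs, rhs = union_bound_sides(probs, masks)
        assert lhs <= rhs
    print("bound held on all sampled instances")
```

In this diagonal picture the lemma reduces to the classical union bound, which is why the commuting case admits such a short proof.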