Rapid mixing of path integral Monte Carlo for 1D
stoquastic Hamiltonians
Elizabeth Crosson
1
and Aram W. Harrow
2
1
Center for Quantum Information and Control, University of New Mexico.
2
Center for Theoretical Physics, Massachusetts Institute of Technology.
Path integral quantum Monte Carlo (PIMC) is a method for estimating thermal
equilibrium properties of stoquastic quantum spin systems by sampling from a classical
Gibbs distribution using Markov chain Monte Carlo. The PIMC method has been
widely used to study the physics of materials and for simulated quantum annealing, but
these successful applications are rarely accompanied by formal proofs that the Markov
chains underlying PIMC rapidly converge to the desired equilibrium distribution.
In this work we analyze the mixing time of PIMC for 1D stoquastic Hamiltonians,
including disordered transverse Ising models (TIM) with long-range algebraically decay-
ing interactions as well as disordered XY spin chains with nearest-neighbor interactions.
By bounding the convergence time to the equilibrium distribution we rigorously justify
the use of PIMC to approximate partition functions and expectations of observables
for these models at inverse temperatures that scale at most logarithmically with the
number of qubits.
The mixing time analysis is based on the canonical paths method applied to the
single-site Metropolis Markov chain for the Gibbs distribution of 2D classical spin mod-
els with couplings related to the interactions in the quantum Hamiltonian. Since the
system has strongly nonisotropic couplings that grow with system size, it does not fall
into the known cases where 2D classical spin models are known to mix rapidly.
1 Introduction
The application of Markov chain Monte Carlo methods to thermal states of a limited class of quan-
tum systems was ﬁrst proposed by Suzuki, Miyashita and Kuroda 40 years ago [43]. Their method
applied to Hamiltonians with real and nonpositive oﬀ-diagonal matrix elements in a particular
basis, a property that was called “being free of the sign problem” for many years until the term
“stoquastic” was eventually adopted [12] to emphasize a connection with nonnegative matrices.
Since then there have been a growing number of works on rigorous classical simulations for various
stoquastic systems [8, 9, 11, 15]. Many quantum systems of physical and computational interest
are stoquastic, including spinless particles moving on arbitrary graphs with position dependent
interactions, as well as generalized transverse Ising models, which are particularly notable for their
use in quantum annealing [1, 19, 31].
Elizabeth Crosson: crosson@unm.edu
Aram W. Harrow : aram@mit.edu
1
arXiv:1812.02144v3 [quant-ph] 5 Feb 2021
Statement of results. A general n-qubit 1D stoquastic Hamiltonian with nearest-neighbor in-
teractions has the form,
H =
n
j=1
H
j,j+1
, (1)
where each H
j,j+1
acts non-trivially on (at most) the sites j, j + 1 (with n + 1 identiﬁed with 1)
and each H
j,j+1
1. Deﬁne the “computational basis” {|1, |−1⟩} for each qubit to be the
eigenstates of the σ
z
operator. We assume H is stoquastic in the computational basis [11, 12],
which means that in this basis each H
j,j+1
is a real symmetric matrix with nonpositive oﬀ-diagonal
entries,
z|H
j,j+1
|z
0 for all z, z
{−1, 1}
n
with z ̸= z
. (2)
For a given inverse temperature β we consider the quantum partition function Z
β
and the quantum
Gibbs state ρ
β
,
ρ
β
:=
e
βH
Z
β
, Z
β
:= tr
e
βH
, (3)
and also the expectation value of observables Q in the thermal state, ⟨Q⟩
β
:= tr[Qρ
β
]. For the
special class of systems called 1D generalized transverse Ising models (TIM) we go beyond (1)
and allow systems with long-range interactions that decay suﬃciently quickly (faster than inverse
square) with distance,
H =
n
j=1
Γ
j
σ
x
j
+
n
j=1
K
z
j
σ
z
j
+
n
j,k=1
K
zz
jk
σ
z
j
σ
z
k
, Γ := min
j=1,...,n
Γ
j
> 0, (4)
K
z
j
[1, 1] , and |K
zz
jk
| |i j|
(2+ξ)
for ξ > 0. (5)
The form (5) simpliﬁes both the presentation of the PIMC method and our proof that the mixing
time of PIMC for these models is at most poly(n, e
β
, Γ
1
). We also show the mixing time of PIMC
is at most poly(n, e
β
, Γ
1
) for generalized TIM with σ
x
σ
x
and σ
y
σ
y
interactions,
H =
n
j=1
Γ
j
σ
x
j
n
j=1
K
xx
j
σ
x
j
σ
x
j+1
n
j=1
K
yy
j
σ
y
j
σ
y
j+1
+
n
j=1
K
zz
j
σ
z
j
σ
z
j+1
+ K
z
j
σ
z
j
, (6)
K
zz
j
, K
z
j
[1, 1] , K
xx
j
0 , K
yy
j
[K
xx
j
, K
xx
j
] and Γ := min
j=1,...,n
Γ
j
> 0.
The mixing time bounds can be used to construct an FPRAS (fully polynomial-time randomized
approximation scheme) for the partition function of models of the form (5) and (6) and inverse
temperatures β = O(log n). In addition to the partition function there is a similar FPRAS for the
expectation value of observables that are sparse and row-computable (in the computational basis)
in the thermal state ρ
β
for β = O(log n). For general 1D stoquastic Hamiltonians (1) we prove
that the PIMC mixing time is at most quasi-polynomial in the system size, which yields a quasi-
polynomial time approximation algorithm for the partition function and for the approximation of
observables.
Theorem 1. There is an algorithm which takes as input an inverse temperature β and an n-qubit
1D stoquastic Hamiltonian H and outputs an estimate
˜
Z
β
of the quantum partition function Z
β
.
If also given an observable Q 0 that is sparse and row-computable in the computational basis,
the algorithm outputs an estimate
˜
Q of its expectation value in the state ρ
β
.
2
These estimates satisfy
|
˜
Z
β
Z
β
| δ
mult
Z
β
+ δ
add
(7)
|
˜
Q tr[Qρ
β
]| δ
mult
tr[Qρ
β
] + δ
add
. (8)
with probability 1 δ
fail
. The algorithm runs in time at most
poly(n, e
β
, Γ
1
, δ
1
mult
, log
δ
1
add
, log(δ
1
fail
))
for models of the form (5) and (6), and in time at most
poly(2
β
log
n
δ
mult
2
, log(1
add
), log(1
fail
))
for models of the form (1).
Relation to previous work. A variety of quantum Monte Carlo methods have been devised for
estimating equilibrium properties of stoquastic Hamiltonians, which can be broadly divided into
diﬀusion methods [21, 27, 40] and path integral methods [2, 36, 39, 42, 43]. Diﬀusion Monte Carlo
methods use Hamiltonian matrix elements to deﬁne transition rates for a substochastic random
process, along with birth and death population dynamics or importance sampling to control sta-
tistical ﬂuctuations. Examples of proven eﬃcient convergence for diﬀusion Monte Carlo methods
include simulating frustration-free stoquastic adiabatic computation [11] and simulating the “go
with the winners” method [26]. Disadvantages of diﬀusion methods include quantum amplitude
vs quantum probability obstructions for some variants [13, 27], as well as the need for importance
sampling with trial wave functions that may in general not be available [8].
PIMC methods, in contrast, are based on Markov chains deﬁned on an enlarged state space con-
sisting of many coupled copies of a high-temperature classical system along a so-called “imaginary-
time” direction [43]. PIMC has been used to give an FPRAS for the ferromagnetic case of TIM [8]
on general interaction graphs by a reduction to a classic result on sampling the Gibbs distribution
of classical ferromagnetic Ising models [28]. Also, the Glauber dynamics for an inﬁnite dimensional
variant of PIMC has been shown to have optimal temporal mixing for ferromagnetic TIM at high
temperature [14] and for ferromagnetic TIM on regular trees [35]. Here “ferromagnetic” means of
the form in (5) but with each K
z
j
= 0 and each K
zz
j
0, i.e. lower energy is achieved by aligning
adjacent spins. Since such interactions cannot be frustrated, these models are considered easier
to analyze. Disadvantages of PIMC methods include the signiﬁcant overhead created by adding
the imaginary-time direction, and also the possibility of slow mixing due to topological obstruc-
tions [3, 23]. The general presence of obstructions to rapid mixing for the standard versions of both
diﬀusion and path integral methods also motivates the continuing development of new algorithms
for simulating stoquastic Hamiltonians [2, 9].
To our knowledge, the simulations in this work are the ﬁrst provably eﬃcient Monte Carlo
methods for a large class of generalized TIM systems, which diﬀer from ferromagnetic TIM by
allowing couplings of both signs between neighboring spins, as well as local ﬁelds of both signs.
A generalized TIM in 2D or higher dimensions can encode NP-complete constraint satisfaction
problems, which is the basis for the use of these models in quantum annealing [1, 19, 31]. Besides
their relative simplicity from the standpoint of experimental implementation, generalized TIM on
interaction graphs of degree 3 are also known to be universal for stoquastic quantum annealing [10,
3
17]. The application of Monte Carlo methods to stoquastic quantum annealing is called simulated
quantum annealing, and a signiﬁcant open question is whether some variant of simulated quantum
annealing can eﬃciently simulate stoquastic quantum annealing. This open question has motivated
several recent proposals for non-stoquastic quantum annealing [16, 25, 41, 46]. Some recent progress
has been made by analyzing speciﬁc models for which both quantum annealing and simulated
quantum annealing can be exponentially faster than classical simulated annealing [6, 7, 15, 20, 30,
32, 37, 38].
Another class of simulation methods for 1D quantum systems are those based on matrix product
states [45]. These methods do not require the Hamiltonian to be stoquastic and have been applied
in rigorous classical simulations of systems with limited entanglement [4, 22, 33]. Their runtime
scales exponentially with spectral gap, while our runtime bound for PIMC scales exponentially
with the inverse temperature, which is qualitatively similar. These kinds of simulations are not
expected to obsolete PIMC in general, however, as the latter is used in practice to simulate highly
entangled stoquastic systems. Furthermore, the version of PIMC that we consider is closely related
to the practical implementations of PIMC, and so this work can be seen as a rigorous analysis of
a heuristic approximation algorithm that was originally developed and applied with great success
by the physics community.
Organization of the remaining sections. Section 2 reviews the PIMC method and sketches
the proof of our mixing time analysis. Section 3.1 deﬁnes the Suzuki-Trotter approximation to
the partition function which is used to map 1D stoquastic Hamiltonians onto 2D classical spin
systems. Section 3.2 and Section 3.3 describe the use of samples from the Gibbs distribution of
the 2D classical system to approximate quantum observables and the quantum partition function.
Section 3.4 proves a concentration theorem for PIMC that justiﬁes the analysis of a restricted state
space in later sections. Section 4.1 reviews the canonical paths method that we use to analyze the
PIMC mixing time. This method is then applied to generalized TIM in Section 4.3, generalized
TIM with XX interactions in Section 4.4, and general 1D stoquastic Hamiltonians in Section 4.5.
2 Overview
In the PIMC method the Hamiltonian (1) is related by Suzuki’s quantum-to-classical mapping
[42, 44] to a system of 1} classical spins on a 2D lattice Λ = {1, . . . , L} × {1, . . . , n}. The new
site index {1, . . . , L} is known in physics as the “imaginary-time” direction, and it is often useful
to think of the set of spin conﬁgurations := {−1, 1}
n×L
as consisting of “time slices” along the
L direction, i.e. for z we write z := (z
1
, . . . , z
L
) with each z
i
{−1, 1}
n
. The mapping
introduces a local energy function E
β
: R on the classical spin conﬁgurations in such a way
that an estimate of the classical partition function Z :=
z
e
βE
β
(z)
can be used to estimate the
quantum partition function Z, and so that samples from the Gibbs distribution π(z) := e
βE
β
(z)
/Z
can be used to estimate expectation values in the quantum Gibbs state. For the nearest-neighbor
version of the transverse Ising models in (5) the energy function E
β
is
E
β
(z) =
(i,j)Λ
K
zz
j,j+1
L
z
i,j
z
i,j+1
+
K
z
j
L
z
i,j
β
1
J
i
z
i,j
z
i+1,j
, z
L+1,j
:= z
1,j
, (9)
4
where z
i,j
denotes the spin in conﬁguration z at the site (i, j) Λ, and J
i
:=
1
2
log coth
βΓ
i
L
(see
Section 3.1). The couplings along the imaginary-time direction are ferromagnetic, but the couplings
along the spatial direction can have varying signs as well as local ﬁelds. Our choices of parameters
are such that L = poly(n), and so the general case considers J
i
with magnitude up to O(log n).
Therefore the couplings of the 2D model are highly anisotropic, as in ﬁgure 1.
space
imaginary-time
Figure 1: The local ﬁelds and the couplings along the spatial direction can be either positive
or negative but have magnitudes reduced by 1/L (dotted lines), whereas the couplings along
the imaginary-time direction can have magnitude growing logarithmically with system size
but are always ferromagnetic (solid lines).
The PIMC method proceeds by attempting to sample from π and approximate Z using the
Markov chain Monte Carlo method with single-site Metropolis transition probabilities. With this
in mind, the primary question is to determine the mixing time τ for this Markov chain, which is
the minimum number of steps of this Markov chain needed to sample from a distribution that is
close to π. We will use the method of canonical paths that was originally introduced by Jerrum and
Sinclair (reviewed in [29, 34]) in order to bound the mixing time. The idea of the method is that
for every pair of states x, y the Markov chain P needs to route an amount of “traﬃc” equal to
π(x)π(y) along some path γ
x,y
which connects x to y using the transitions of the Markov chain.
Deﬁne the congestion across a particular transition e = (z, z
) to be the ratio of the total traﬃc
through it (i.e.
x,y:eγ
x,y
π(x)π(y)) to the amount of probability ﬂow across it at equilibrium,
which equals π(z)P (z, z
). The mixing time can then be upper bounded in terms of the maximum
congestion through any transition, using standard arguments which we review in Section 4.1.
Our problem now reduces to ﬁnding good sets of paths between every pair of conﬁgurations
x, y . We use a set of paths inspired by those used to analyze the (unweighted) random walk
on the boolean hypercube [29]. Speciﬁcally we will replace the entries of x with entries of y one
at a time. A crucial requirement is that the energies of the intermediate states do not get too
large, as this would create a large congestion. For the model (9) the paths ﬁrst update the classical
spins corresponding to the site of qubit 1 in consecutive imaginary-time order, and then proceed
to do the same for the sites of qubits 2 through n. For each qubit j {1, . . . , n} the line of
sites {(1, j), (2, j), . . . , (L, j)} is called the worldline of the j-th qubit, and so these canonical paths
have the form of updating each qubit worldline consecutively, always ﬁnishing one worldline before
starting the next.
An example of an intermediate step along this path is depicted in Fig. 2. Notice that intermedi-
ate steps have only introduced a constant number of broken bonds between spins that are neighbors
5
y
1,5
y
1,4
y
1,3
y
1,2
y
1,1
y
2,1
y
2,2
y
2,3
y
2,4
y
2,5
y
3,1
y
3,2
x
3,3
x
3,4
x
3,5
x
4,5
x
4,4
x
4,3
x
4,2
x
4,1
x
5,1
x
5,2
x
5,3
x
5,4
x
5,5
x
6,5
x
6,4
x
6,3
x
6,2
x
6,1
space n
imaginary-time L
Figure 2: An intermediate point along a canonical path from x to y. We have ﬁnished updating qubits
1 and 2 and are in the middle of updating qubit 3. The red line indicates the domain wall formed
between the x and y conﬁgurations. The change in the classical energy created by this domain wall
determines the Markov chain congestion for this set of canonical paths.
in the imaginary time direction, each contributing an amount of energy that is O(β
1
log n), and
at most L broken bonds between spins that are neighbors in the space direction, each contributing
energy O(1/L). The relatively low energy of the intermediate conﬁgurations along the path is the
key to the formal proof of rapid mixing for PIMC applied to 1D generalized TIM in Section 4.3.
For 1D generalized TIM with long-range interactions, the energy of the intermediate conﬁguration
z will include contributions from spins that are arbitrarily far away from the cut. Using the as-
sumption on the rate of decay of the magnitude of these interactions, the total contribution to the
energy is upper bounded by
n/2
j=1
n
k=
n
2
+1
|j k|
(2+ξ)
= O(ξ
2
+ n
ξ
),
where the bound is proven by viewing the summation as a right Riemann sum of the corresponding
integral, which yields an upper bound because the integrand is a monotonically decreasing function.
Since we regard ξ as a constant this contribution to the energy is O(1).
For Hamiltonians of the form (6) the 2-local oﬀ-diagonal terms give rise to a classical energy
function with interactions involving four spins at a time,
E
β
(z) :=
(i,j)Λ
K
zz
j
L
z
i,j
z
i,j+1
+
K
z
j
L
z
i,j
β
1
h
i,j
(z
i,j
, z
i,j+1
, z
i+1,j
, z
i+1,j+1
)
, (10)
The h
i,j
terms (which are implicitly deﬁned by the Gibbs distribution (22)) correspond to strong
ferromagnetic interactions between neighboring pairs of spins in neighboring time slices, and violat-
ing them increases the energy by an amount that is O(log n). If z
i,j
̸= z
i+1,j
and z
i,j+1
̸= z
i+1,j+1
then we say there is a 2-local jump between time slice i and time slice i + 1, because the result-
ing energy contribution from the 4-body interaction will depend on the magnitude of the 2-local
oﬀ-diagonal quantum terms i.e. K
xx
j
and K
yy
j
.
6
The canonical paths we have described so far will lead to intermediate conﬁgurations that
potentially create or destroy a non-constant number of 2-local jumps across the domain wall pictured
in ﬁgure 2. If K
xx
j
is larger than K
x
j
, then conﬁgurations x and y which contain a non-constant
number of 2-local jumps will result in the canonical paths described so far overloading some edges
with large congestion. To overcome this we include additional path segments along the canonical
paths that remove 2-local jumps from the worldlines near the domain wall in ﬁgure 2. This greatly
increases the number of paths passing through each edge, but we are still able to obtain a poly(n, e
β
)
upper bound on the congestion by noticing that most of these paths have endpoints with very small
stationary probabilities.
For Hamiltonians of the form (1) it is possible that the oﬀ-diagonal terms in (1) do not allow
for the PIMC Markov chain with single-site transitions to be ergodic. In this situation some
conﬁgurations z will have π(z) = 0, which creates a new obstacle to the canonical paths we
have described so far. To restore ergodicity under single-site updates we add a ﬁctitious transverse
ﬁeld to the Hamiltonian with a weak coupling that goes to zero with system size:
H H Γ
n
i=1
σ
x
i
(11)
with Γ δ
mult
/n so that the subsequent estimate for the partition function will not be further than
the allowed error tolerance away from its intended value.
Once the ﬁctitious ﬁeld is added it once again makes sense to attempt to use the same canonical
paths that were applied to systems of the form (6). But now the fact that the oﬀ-diagonal matrix
elements of the Hamiltonian are no longer symmetric under the global operation of ﬂipping all
spins means that removing a single (1-local or 2-local) jump from a given worldline can decrease
the stationary probability of that conﬁguration by a factor scaling exponentially in the number of
jumps remaining in that worldline. To overcome this problem we show in Section 3.4 that typical
conﬁgurations distributed according to π will have at most O(β log n) jumps in any particular
worldline. This means that the worst-case intermediate conﬁgurations just described may have a
stationary probability reduced by n
O(β log n)
, and this factor bottlenecks the mixing time estimate.
Using these facts we describe canonical paths in Section 4.5 within a restricted state space
in
which every conﬁguration has O(β log n) jumps per worldline, and by either restricting the PIMC
Markov chain to
or applying the leaky random walk conductance bound [15] we obtain a bound
on the mixing time within the subset
that establishes theorem 1.
3 Path integral Monte Carlo
Sections 3.1 through 3.3 rigorously present the underpinnings of the standard PIMC method: the
quantum-to-classical mapping, and its application to approximating expectation values of observ-
ables as well as the partition function. Section 3.4 proves a general concentration of measure bound
which shows that the number of jumps along any worldline in PIMC is O(β log n) with high proba-
bility. Note that this result applies to general stoquastic Hamiltonians (not just the 1D case). This
result is used to analyze a restricted state space in Section 4.5.
3.1 Quantum-to-classical mapping
The ﬁrst step of the quantum-to-classical mapping is to split H as in (1) into a sum of commuting
Hamiltonians. For each local term H
i
we construct local operators A
i
, B
i
, C
i
such that βH
i
=
7
A
i
+ B
i
+ C
i
and [A
i
, A
j
] = [B
i
, B
j
] = [C
i
, C
j
] = 0 for all i, j. Let A =
i
A
i
, B =
i
B
i
,
C =
i
C
i
, and hereafter let A
i
match the diagonal part of H
i
in the computational basis, while
B
i
and C
i
contain the oﬀ-diagonal terms of H
i
when i is odd or even, respectively. For positive
integers L deﬁne the Suzuki-Trotter approximation of the partition function Z = tr
e
A+B+C
,
Z
β,L
:= tr
e
A
2L
e
B
L
e
A
2L
e
C
L
L
. (12)
The fact that lim
L→∞
Z
β,L
= Z
β
is the content of the 1875 Lie product formula, while for ﬁnite L
the Suzuki-Trotter approximation is related to the quantum partition function (3) by
e
δ
Z
β
Z
β,L
e
δ
Z
β
, (13)
for some δ that is O
H
3
/L
3
. This error bound follows from two facts about the matrix expo-
nential: a variant of the usual Suzuki-Trotter expansion, and a continuity bound. Both appear in
[8] for transverse Ising models, and the proof for three commuting layers A, B, C is similar and so
we omit the details.
Before considering the general case of H as in (1) it is useful to consider the simpliﬁed case of
(12) for generalized TIM of the form (5) by taking A = β diag(H), B = β (H diag(H)), and
C = 0 so that the Suzuki-Trotter approximation reduces to
Z
β,L
= tr
e
A
2L
e
B
L
e
A
2L
L
= tr
e
A
L
e
B
L
L
. (14)
For transverse Ising models, the next step is to expand the Suzuki-Trotter approximation (14)
to the partition function by inserting several complete sets of states,
Z
β,L
=
z
1
∈{−1,1}
n
z
1
|
e
A
L
e
B
L
L
|z
1
(15)
=
z
1
,...,z
L
L
i=1
z
i
|e
A
L
e
B
L
|z
i+1
, z
L+1
:= z
1
(16)
=
z
1
,...,z
L
L
i=1
e
A(z
i
)
L
z
i
|e
B
L
|z
i+1
. (17)
where in the last step we have used the fact that A = β diag(H) is diagonal in the computational
basis and deﬁned A(z
i
) := z
i
|A|z
i
. The Gibbs distribution of the classical 2D spin system is
deﬁned by equating e
βE
β
(z
1
,...,z
L
)
with the summand of (17). The closed form (9) is obtained
applying the identity e
ωσ
x
= cosh(ω)I + sinh(ω)σ
x
to
z
i
|e
B
L
|z
i+1
= z
i
|e
βΓ
L
n
j=1
σ
x
j
|z
i+1
=
n
j=1
z
i,j
|e
βΓ
L
σ
x
j
|z
i+1,j
.
For a general 1D stoquastic Hamiltonian of the form in (1) we will apply (12) with
A = β diag(H) , B = β
iOdd
[H
i
diag(H
i
)] , C = β
iEven
[H
i
diag(H
i
)] , (18)
8
Expanding (12) with 2L complete sets of states,
Z
β,L
=
z
1
∈{−1,1}
n
z
1
|
e
A
2L
e
B
L
e
A
2L
e
C
L
L
|z
1
(19)
=
z
1
,...,z
2L
2L1
iOdd
z
i
|e
A
2L
e
B
L
|z
i+1
⟩⟨z
i+1
|e
A
2L
e
C
L
|z
i+2
(20)
=
z
1
,...,z
2L
e
2L
i=1
A(z
i
)
2L
2L1
iOdd
z
i
|e
B
L
|z
i+1
⟩⟨z
i+1
|e
C
L
|z
i+2
, (21)
where periodic boundary conditions z
2L+1
:= z
1
are taken in (20). At this point the essential
diﬀerence from the transverse Ising case is clear: the oﬀ-diagonal part of the Hamiltonian is split
into two commuting layers, these layers alternate along the imaginary-time direction, and this
causes the coupling between classical spins at neighboring time slices to alternate as well. For ease
of notation deﬁne B
i
to be 2B if i is odd, and 2C if i is even, and rescale L L/2 to obtain a
distribution π from (21),
π(z) :=
1
Z
β,L
e
L
i=1
A(z
i
)
L
L
i=1
z
i
|e
B
i
L
|z
i+1
. (22)
3.2 Approximating observables
In this section we consider an observable O for which we would like to estimate ⟨O
β
:= tr[Oρ
β
].
We justify this by demonstrating closeness of ⟨O
β
to a value O
β,L
that is deﬁned in terms of the
Suzuki-Trotter approximation in equation (34). The expectation ⟨O
β
can be expressed as the
derivative of a partition function by deﬁning
Z
β
(ζ) := tr[e
βH+ζO
], (23)
and using the fact that
⟨O
β
=
1
Z
β
(0)
Z
β
ζ
(0). (24)
The next proposition shows that ⟨O
β
can be approximated by a ﬁnite diﬀerence.
Proposition 2. Suppose that
ˆ
Z
β
(0) and
ˆ
Z
β
(ζ) satisfy
e
δ
ˆ
Z
β
(0)
Z
β
(0)
,
ˆ
Z
β
(ζ)
Z
β
(ζ)
e
δ
,
for ζ =
3δ/∥O. Then the estimator
⟨O
β
:=
ˆ
Z
β
(ζ)
ˆ
Z
β
(0)
ζ
ˆ
Z
β
(0)
. (25)
satisﬁes the bound
|
⟨O
β
⟨O
β
| 2
δ∥O (26)
9
Proof. We use ﬁrst Taylor’s theorem, then the bound tr AB A∥∥B
1
to obtain:
Z
β
(ζ) Z
β
(0) ζ
Z
β
ζ
(0)
ζ
2
2
max
ζ
[0]
2
Z
β
ζ
2
(ζ
) (27a)
ζ
2
∥O
2
max
ζ
[0]
Z
β
(ζ
) (27b)
ζ
2
∥O
2
Z
β
(0)e
ζ∥O∥
. (27c)
Dividing both sides of (27) by ζZ
β
(0) we have
⟨O
β
Z
β
(ζ) Z
β
(0)
ζZ
β
(0)
=
⟨O
β
Z
β
(ζ)
ζZ
β
(0)
1
ζ
ζ∥O
2
e
ζ∥O∥
. (28)
Then
⟨O
β
⟨O
β
=
ˆ
Z
β
(ζ)
ζ
ˆ
Z
β
(0)
1
ζ
⟨O
β
(29)
e
2δ
Z
β
(ζ)
ζZ
β
(0)
1
ζ
⟨O
β
(30)
(e
2δ
1)
Z
β
(ζ)
ζZ
β
(0)
+ ζ∥O∥
2
e
ζ∥O∥
(31)
(3δ + ζ∥O∥
2
)e
ζ∥O∥
(32)
2
δ∥O (33)
In the last step we have used the assumption that δ 1/21 to simplify the numerical constants. A
similar argument shows that
⟨O
β
⟨O
β
2
δ∥O, thus completing the proof.
By (25), (13), and the triangle inequality, the expectation ⟨O
β
can be approximated by
O
β,L
=
1
Z
β,L(0)
ζ
Z
β,L
(0), (34)
where Z
β,L
(ζ) is a Suzuki-Trotter approximation to (23). If O is diagonal use the form in Section 3.1
with A A + ζO,
Z
β,L
(ζ) =
z
e
1
L
L
i=1
(A(z
i
)+ζO(z
i
))
L
i=1
z
i
|e
B
i
L
|z
i+1
. (35)
Evaluating the derivative at ζ = 0,
1
Z
β,L
(0)
ζ
Z
β,L
(0) =
1
Z
β,L
z
1
L
L
i=1
O(z
i
)
e
1
L
L
i=1
A(z
i
)
L
i=1
z
i
|e
B
i
L
|z
i+1
(36)
=
z
π(z)
1
L
L
i=1
O(z
i
)
=
1
L
L
i=1
O(z
i
)
π
= ⟨O(z
1
)
π
(37)
10
where the ﬁnal expression follows by the symmetry of π under cyclic permutations of the time
slices. For oﬀ-diagonal O,
Z
β,L
(ζ) =
z
e
1
L
L
i=1
A(z
i
)
L
i=1
z
i
|e
1
L
(B
i
+ζO/2)
|z
i+1
. (38)
Evaluating the derivative at ζ = 0,
O
β,L
=
1
Z
β,L
(0)
z
e
1
L
L
i=1
A(z
i
)
L
k=1
z
k
|
O
2L
e
B
k
L
|z
k+1
L
i=1
i̸=k
z
i
|e
B
i
L
|z
i+1
(39)
=
z
1
2L
L
k=1
z
k
|Oe
B
k
L
|z
k+1
z
k
|e
B
k
L
|z
k+1
π(z) (40)
=
1
2L
L
k=1
z
k
|Oe
B
k
L
|z
k+1
z
k
|e
B
k
L
|z
k+1
π
(41)
=
1
2
z
z
1
|Oe
B
1
L
|z
2
z
1
|e
B
1
L
|z
2
+
z
2
|Oe
B
2
L
|z
3
z
2
|e
B
2
L
|z
3
π(z) (42)
where the last step follows by symmetry of π under even cyclic permutations of the time slices.
Since the summand of (42) is eﬃciently computable and has bounded magnitude we can estimate
it using a standard Monte Carlo procedure (see lemma 3) once one has an eﬃcient meanns of
obtaining independent samples from π.
3.3 Approximating partition functions
In Section 4 we will prove that for any tolerance δ > 0 the PIMC method can eﬃciently output an
element of sampled from a distribution ˜π which is within |˜π π| δ. Here we explain how these
samples can be used to approximate the partition function with an application of the standard
telescoping product technique [5, 28]
1
. We express the partition function Z(β) as a product of
terms involving the partition function at higher temperatures,
Z(β) = Z(0)
Z(β
1
)
Z(β
0
)
. . .
Z(β
k
)
Z(β
k1
)
(43)
where the temperature schedule β
0
< β
1
< . . . < β
k
with β
0
= 0 and β
k
= β is chosen so that each
ratio in the telescoping product is bounded. Deﬁning w
β
i
(z) := Z(β
i
)π
β
i
(z), note that
w
β
i
w
β
i1
π
β
i1
=
z
w
β
i
(z)
w
β
i1
(z)
π
β
i1
(z) =
Z(β
i
)
Z(β
i1
)
. (44)
1
The techniques used in this section to estimate the partition function (or equivalently the free energy) are
necessary because the entropy term in the free energy cannot be directly estimated as an observable. If one is only
interested in the estimation of speciﬁc observables then it suﬃces to use the end result(42).
11
At inﬁnite temperature the quantum partition function is Z(0) = 2
n
. Further, a uniform schedule
with k = O(βHlog(βH)) ensures that
1
e
Z(β
i
)
Z(β
i1
)
1. (45)
Note that in inﬁnite temperature limit β = 0 the classical eﬀective systems (9) and (10) have
inﬁnite coupling strength between spins in the imaginary-time direction, and so the Markov chain
we analyze in Section 4 will not be ergodic. This is ununsual from the perspective of classical spin
systems, but it happens in this case because of the temperature dependence of the couplings. To
deal with this the ﬁrst expectation w
β
1
/w
β
0
π
β
0
in the telescoping product can be computed using
the fact that π
β
0
is equivalent to the uniform distribution on the subset
classical
consisting of
conﬁgurations (z
1
, . . . , z
L
) which have z
i
= z
j
for all i, j = 1, . . . , L.
Finally, in order to bound the number of samples needed to compute the expectation values in
the telescoping product we will make use of the Hoeﬀding bound.
Lemma 3 ([24]). Let X
1
, . . . , X
t
be independent random variables satisfying |X
i
| 1 and E[X
i
] =
¯
X. Then
P[
1
t
t
i=1
X
i
¯
X
δ] 2e
2
/2
(46)
Deﬁne M (z) = e
2δ
w
β
i
(z)/w
β
i+1
(z), so that |M(z)| 1. Observe that |⟨M
π
M
˜π
|
π ˜π
1
M δ. We will estimate M
˜π
by drawing t samples from ˜π, which we call z
(1)
, . . . , z
(t)
.
Our estimator will simply be the sample mean, namely
ˆ
¯
M :=
1
t
t
i=1
M(z
i
). (47)
By Lemma 3 we have
P[|
ˆ
¯
M M
˜π
| δ] 2e
2
/2
. (48)
We conclude that taking t = O(1
2
) samples yields an O(δ) approximation to M
˜π
with high
probability. Therefore a polynomial number of samples will suﬃce to estimate each term of the
telescoping product.
3.4 Restricting the PIMC state space
In this section we justify restricting PIMC to a subset of the state space that limits the number of
jumps in any particular worldline to be O(β log n) (this restricted state space will be used for the
mixing time analysis in Section 4.5). For a conﬁguration z = (z
1
, ..., z
L
) the number of jumps in
the worldline of the j-th qubit can be expressed in terms of the Pauli operator X
j
,
d
j
(z) := |{i : z
i,j
̸= z
i+1,j
}| =
L
i=1
z
i
|X
j
|z
i+1
.
The strategy is to bound the k-th moment of the random variable d
j
with respect to the QMC
stationary distribution π and use the fact that
Pr[d
j
a]
π
π
d
k
j
a
k
.
12
These moments satisfy
π
d
k
j
:=
z
d
k
j
(z)π(z)
=
z
L
i=1
z
i
|X
j
|z
i+1
k
π(z)
=
z
L
i
1
,...,i
k
=1
k
r=1
z
i
r
|X
j
|z
i
r
+1
π(z)
=
1
Z
β,L
z
L
i
1
,...,i
k
=1
k
r=1
z
i
r
|X
j
|z
i
r
+1
e
L
i=1
A(z
i
)
L
L
i=1
z
i
|e
B
i
L
|z
i+1
⟩⟩
. (49)
where A, B
i
are deﬁned in Section 3.1. To merge the two products in the expression above into a
simpler trace expression, we want to show the following inequality for any neighboring time slices
z
i
and z
i+1
,
z
i
|X
j
|z
i+1
4
β
L
z
i
|X
j
e
B
i
/L
|z
i+1
z
i
|e
B
i
/L
|z
i+1
, (50)
where we assume the local terms are normalized so that for some Γ > 0
Γ |⟨z
i
|H|z
i+1
⟩| 1 z
i
, z
i+1
: |z
i
z
i+1
| = 1.
If z
i
|X
j
|z
i+1
= 0 then (50) is satisﬁed. Otherwise if z
i
|X
j
|z
i+1
= 1 then the numerator on the
RHS of (50) satisﬁes
z
i
|X
j
e
B
i
/L
|z
i+1
= z
i+1
|e
B
i
/L
|z
i+1
e
βH/L
1
2
. (51)
To upper bound the denominator of the RHS of (50) , the fact that B
i
is a commuting Hamiltonian
implies
z
i
|e
B
i
/L
|z
i+1
= z
i,j
z
i,j+1
|e
B
i,j
/L
|z
i+1,j
, z
i+1,j+1
=
q=1
1
q!
z
i,j
z
i,j+1
|(B
i,j
/L)
q
|z
i+1,j
, z
i+1,j+1
e
βH
j
L
1
1
ln(2)
β
L
,
where in the last step we have used H
j
1,
β
L
ln(2) and the convexity of e
x
. Using ln(2)/2
1/4 we have established (50).
To apply this result to (49) we will use the permutation symmetry of the i
1
, ..., i
k
variables to
restrict to a sum over ordered k-tuples (i
1
, ..., i
k
) which satisfy i
1
i
2
. . . i
k
.
π
d
k
j
k!
Z
z
1i
1
...i
k
L
k
r=1
z
i
r
|X
j
|z
i
r
+1
e
L
i=1
A(z
i
)
L
L
i=1
z
i
|e
B
i
L
|z
i+1
⟩⟩
(52)
4
k
k!β
k
ZL
k
z
1i
1
...i
k
L
k
r=1
z
i
r
|X
j
e
B
i
L
|z
i
r
+1
z
i
r
|e
B
i
L
|z
i
r
+1
e
L
i=1
A(z
i
)
L
L
i=1
z
i
|e
B
i
L
|z
i+1
⟩⟩
(53)
13
Deﬁning t
r
= (i
r
i
r1
)β/L (as well as t
1
= r
1
β/L, t
k
= (L i
k
)β/L) this expression can be seen
as the trace of a product of operators,
π
d
k
j
k!(4β)
k
ZL
k
L
t
1
,...,t
k
t
1
+...+t
k
=β
tr [M
t
1
R
t
1
X
j
L
t
2
M
t
2
R
t
2
X
j
...X
j
L
t
k
M
t
k
] (54)
where
M
t
i
:=
e
B
even
/L
e
A/L
e
B
odd
/L
e
A/L
e
B
even
/L
t
i
β/L⌋−2
and L
i
,R
t
i
can each denote one of several products of operators depending on t
i
, which is determined
by where the X
j
operators disrupt the sequence,
L
t
i
=
e
A/L
e
A/L
e
B
odd
/L
e
A/L
e
B
odd
/L
e
A/L
e
A/L
e
B
odd
/L
e
A/L
e
B
even
/L
, R
t
i
=
e
B
even
/L
e
A/L
e
B
even
/L
e
B
odd
/L
e
A/L
e
B
even
/L
e
A/L
e
B
odd
/L
e
A/L
e
B
even
/L
.
The purpose of this decomposition is that each M
t
i
is a PSD operator raised to an positive integer
power, and the rest of the Matrices L
t
i
, R
t
i
, X
j
are all either PSD operators or are explicit products
of 2,3, or 4 PSD operators. Now the traces in (54) can be bounded using classic inequality due to
Ky Fan [18, Theorem 1]. Let P
1
...P
N
satisfy P
i
0 for each i, and let σ(P
i
) be a diagonal matrix
with the eigenvalues of P
i
along the diagonal in descending order. Then
tr [P
1
...P
N
] tr [σ(P
1
)...σ(P
N
)]
For our purposes deﬁne the notation σ(R
t
i
) to be the product of diagonal matrices of eigenvalues
of the operators that make up R
t
i
e.g. if R
t
i
= e
A/L
e
B
even
/L
then σ(R
t
i
) = σ
e
A/L
σ
e
B
even
/L
.
Furthermore we may take B
even
, B
odd
0 without loss of generality (by shifting the overall Hamil-
tonian by a multiple of the identity), so that σ(R
t
i
), σ(L
t
i
) for each t
i
. Therefore each trace in
(54) satisﬁes
tr [M
t
1
R
t
1
X
j
L
t
2
M
t
2
R
t
2
X
j
...X
j
L
t
k
M
t
k
]
tr [σ (M
t
1
) σ (R
t
1
) σ (X
j
) σ (L
t
2
) σ (M
t
2
) σ (R
t
2
) σ (X
j
) ...σ (X
j
) σ (L
t
k
) σ (M
t
k
)] (55)
tr [σ (M
t
1
) σ (M
t
2
) ...σ (M
t
k
)] (56)
= tr
e
B
even
/L
e
A/L
e
B
odd
/L
e
A/L
e
B
even
/L
L2k
= Z
β,L2k
where the fact that σ(R
t
i
), σ(L
t
i
) is used to go from (55) to (56). Now for k L we have
Z
β,L
Z
β,L2k
, so this implies that
π
d
k
j
k!β
k
. Therefore taking k = c log n for some c > 0 and
using k! (k/e)
k
yields
Pr
π
[d
j
log n]
k!β
k
c
k
β
k
log
k
b
=
k!
k
k
e
k
= n
c
.
14
4 Proofs of rapid mixing
Having established the correctness of the PIMC method in section 3 we now procede to analyze
the mixing time of the associated Markov chains. The mixing time analysis is divided into cases.
In section 4.1 we review the main techniques that are common to each case in our analysis. Section
4.3 contains the mixing analysis for transverse Ising models, while sections 4.4 and 4.5 analyze
Hamiltonians of the form (6) and (1) , respectively.
4.1 Markov chains, mixing times, and canonical paths
Let P be the transition matrix of a reversible Markov chain deﬁned on a state space with
stationary distribution π. Let z be the state of the chain at time t = 0, and deﬁne P
t
(z, ·) to be
the distribution of the state of the chain at time t. The variation distance of the chain at time t
starting from the initial state z at time t = 0 is
d
z
(t) := max
A
|P
t
(z, A) π(A)| =
1
2
z
|P
t
(z, z
) π(z
)| (57)
Deﬁne τ(ϵ) to be the worst-case time needed to be within variation distance ϵ of the stationary
distribution,
τ(ϵ) := max
z
min
t
{t : d
z
(t
) ϵ t t
}. (58)
Deﬁne the mixing time to be τ
mix
:= τ(1/4), and note that τ(ϵ) = τ
mix
log ϵ
1
[34].
To bound the mixing time of P we use the method of canonical paths [29]. A path from x
to y is a sequence γ
x,y
= (v
1
, . . . , v
k
) of states with v
1
= x, v
k
= y and P (v
i
, v
i
+ 1) > 0
for i = 1, . . . , k 1. A set P = {γ
x,y
} containing a path that connects every pair of states is
called a set of canonical paths. If = max
γ∈P
|γ| is the maximum length of any path in P and
π
min
:= min
z
π(z), then the mixing time τ
mix
is O(R ln π
1
min
), where the congestion R is deﬁned
as
R := max
(v,v
)E(Ω)
1
π(v)P (v, v
)
γ
x,y
∈P
(v,v
)γ
x,y
π(x)π(y) (59)
In order to compute the congestion we will use a standard technique called encoding. For
an edge e = (v, v
), let P(e) be the set of paths in P which pass through e. An encoding is an
assignment of an injective function η
e
: P(e) to every edge e E(Ω). More generally, it can be
useful to consider injective encodings η
e
: P(e) × Θ that map at most G = |Θ| paths through
the e to each state in [28]. Let η
e
(γ
x,y
) = (η
e
(x, y), θ
e
(x, y)), and suppose there is some M such
that
π(x)π (y) Mπ (v) π (η
e
(x, y)) x, y , (60)
15
then the congestion satisﬁes
R = max
(v,v
)E(Ω)
1
π(v)P (v, v
)
γ
x,y
∈P
(v,v
)γ
x,y
π(x)π(y) (61)
max
(v,v
)E(Ω)
M · G
π(v)P (v, v
)
γ
x,y
∈P
(v,v
)γ
x,y
π(v)π(η
e
(x, y)) (62)
M · G · P
1
min
(63)
where P
min
:= min
(z,z
)E(Ω)
P (z, z
), (62) follows from property (60), and (63) follows from the
injectivity property of the encoding.
4.2 PIMC transition probabilities
To sample from the classical Gibbs distribution (22) the PIMC method uses a Markov chain which
chooses a site in the classical lattice Λ uniformly at random and proposes to ﬂip the bit at that
site to make a transition z z
with an acceptance probability P(z, z
) given by the Metropolis
rule [34],
P (z, z
) :=
1
2nL
min
1,
π(z
)
π(z)
, z ̸= z
1
z
′′
̸=z
P (z, z
′′
), z = z
. (64)
As long as the state space is a connected graph, the transition matrix P with transitions satisfying
(64) will converge to the stationary distribution π.
For the sake of lower bounding the minimum transition probability P
min
from the previous
section it therefore suﬃces to lower bound π(z
)(z) where z and z
diﬀer only on a single site.
Suppose z diﬀers from z
at position (i, j), then after canceling common factors in (22) we have
π(z
)
π(z)
= e
1
L
[
A(z
i
)A(z
i
)
]
z
i1
|e
B
i1
L
|z
i
⟩⟨z
i
|e
B
i
L
|z
i+1
z
i1
|e
B
i1
L
|z
i
⟩⟨z
i
|e
B
i
L
|z
i+1
. (65)
Depending on the oﬀ-diagonal matrix elements in B
i
, the numerator of (65) can potentially be
zero. This means that the state space may not be connected by single bit ﬂips, because the
stationary distribution does not have support on all of . This can be avoided in general by adding
a 1-local transverse ﬁeld ﬁeld with suﬃciently small magnitude Γ to avoid disturbing the resulting
approximation. With this we can lower bound the ratio (65) using the fact that B
i
is a nonnegative
matrix, which implies
z|
I +
B
i
L
|z
z|e
B
i
L
|z
for all z, z
{−1, 1}
n
. (66)
The denominator of (65) is upper bounded by 1, and (66) implies that the numerator is at least
(βΓ/L)
2
, therefore
π(z
)
π(z)
e
4β
L
βΓ
L
2
. (67)
16
4.3 Transverse Ising models
In this section we will describe a set of canonical paths which will be used to show rapid mixing
of PIMC applied to Hamiltonians of the form (5). Let x, y be spin conﬁgurations with
x = (x
1,1
, . . . , x
L,1
, . . . , x
1,n
. . . , x
L,n
) and y = (y
1,1
, . . . , y
L,1
, . . . , y
1,n
. . . , y
L,n
).
It will be convenient to choose an ordering on the space [L] × [n]. We order bits ﬁrst by the
second coordinate and then the ﬁrst, so that (r, l) (s, m) if l m or if r s and l = m. For any
1 r L and 1 m n deﬁne
[ (r, m)] := ({1, . . . , L} × {1, . . . , m 1}) ({1, . . . , r} × {m}) (68)
[> (r, m)] := ({r + 1, . . . , L} × {m}) ({1, . . . , L} × {m + 1, . . . , n}) . (69)
The canonical path γ
x,y
from x to y consists of simply replacing each bit of x with the corre-
sponding bit of y following the above ordering. This can be thought of as n steps, each of which
consists of L substeps that correspond to transitions of the Markov chain. The r-th substep of the
m-th step consists of ﬂipping the bit in the r-th Trotter slice of the m-th qubit if x
r,m
̸= y
r,m
or
leaving it alone if x
r,m
= y
r,m
. In the former case, this contributes an edge (v, v
) E(Ω) with
v = (y
1,1
, . . . , y
r1,m
, x
r,m
, x
r+1,m
, . . . , x
L,n
) (70)
v
= (y
1,1
, . . . , y
r1,m
, y
r,m
, x
r+1,m
. . . , x
L,n
). (71)
In the latter case (i.e. x
r,m
= y
r,m
), this step does not add an edge to the path. Thus the number
of edges in γ
x,y
equals the number of positions on which x and y disagree, which is always nL.
In Section 4.1 we deﬁned an encoding of a path to be an injective map η
e
from the set of paths
through e to the state space . The encoding we’ll use for this path is
η
(v,v
)
(γ
x,y
) := x · y · v, (72)
where · to denote elementwise multiplication. To see that x, y are uniquely determined by v
together with η = η
(v,v
)
(γ
x,y
), observe that v · η = x · y, which speciﬁes the bits that are ﬂipped
along the path γ
x,y
. Moreover, the edge (v, v
) speciﬁes the location (r, m) of the bit being ﬂipped
in the transition from v to v
. Let (v · η)
(r,m)
denote the vector which equals v · η = x · y
on the coordinates in [ (r, m)] and equals 1 on the coordinates in [> (r, m)]; deﬁne (v · η)
>(r,m)
analogously. Then we can recover x by calculating v
·(v ·η)
(r,m)
, and can recover y by calculating
v
· (v · η)
>(r,m)
.
For any subset of the lattice sites A Λ, deﬁne the restriction E
A
β
of the classical energy
function E
β
by restricting the sum in (9) to sites (i, j) A, so that for all z we have
E
A
β
(z) =
(i,j)A
K
z
j
L
z
i,j
β
1
J
i
z
i,j
z
i+1,j
+
kA
K
zz
jk
L
z
i,j
z
i,k
. (73)
Now we compute
E
β
(v) = E
(r,m)
β
(y) + E
>(r,m)
β
(x) + B (74)
E
β
(η) = E
(r,m)
β
(x) + E
>(r,m)
β
(y) + B
(75)
17
where
B := β
1
J
m
(y
1,m
x
L,m
+ y
r,m
x
r+1,m
) +
1
L
r
i=1
K
z
m
y
i,m
+
1
L
L
i=r
K
z
m
x
i,m
+
1
L
r
i=1
K
zz
m,m+1
y
i,m
x
i,m+1
+
1
L
L
i=r
K
zz
m1,m
y
i,m1
x
i,m
+
1
L
L
i=1
m1
j=1
n
k=m
K
zz
jk
y
j
x
k
,
B
:= β
1
J
m
(x
1,m
x
L,m
+ x
r,m
x
r+1,m
) +
1
L
r
i=1
K
z
m
x
i,m
+
1
L
L
i=r
K
z
m
y
i,m
+
1
L
r
i=1
K
zz
m,m+1
x
i,m
y
i,m+1
+
1
L
L
i=r
K
zz
m1,m
x
i,m1
y
i,m
+
1
L
L
i=1
m1
j=1
n
k=m
K
zz
jk
x
j
y
k
.
We also need
E
β
(x) = E
(r,m)
β
(x) + E
>(r,m)
β
(x) + B
X
(76)
E
β
(y) = E
(r,m)
β
(y) + E
>(r,m)
β
(y) + B
Y
(77)
where
B
X
:=
L
i=1
K
z
m
L
x
i,m
+ β
1
J
m
x
i,m
x
i+1,m
+
1
L
r
i=1
K
zz
m,m+1
x
i,m
x
i,m+1
+
1
L
L
i=r
K
zz
m1,m
x
i,m1
x
i,m
+
1
L
L
i=1
m1
j=1
n
k=m
K
zz
jk
x
j
x
k
,
and B
Y
is deﬁned similarly. The value of M needed to satisfy (60) for this encoding can now be
determined by
π(x)π(y)
π(v)π(η)
= e
β
(
E
β
(x)+E
β
(y)E
β
(v)E
β
(η)
)
= e
β(B+B
B
X
B
Y
)
(78)
To upper bound the exponent in the worst-case deﬁne J := max
m=1,...,n
J
m
, then
β(B + B
B
X
B
Y
) 8J + 4β + 4β
n/2
j=1
n
k=
n
2
+1
K
zz
jk
.
To upper bound the sum we use the assumption K
zz
jk
|j k|
(2+ξ)
and regard it as a lower
Riemann sum for an integral,
n/2
j=1
n
k=
n
2
+1
|j k|
(2+ξ)
j=
n
2
j=1
k=n
k=
n
2
+1
djdk |j k|
(2+ξ)
=
1 2
ξ+1
n
ξ
+ (n 1)
ξ
ξ
2
+ ξ
,
18
and since ξ > 0 it suﬃces in (63) to take M = O
e
8J+4β
. If the 1D quantum system has
periodic boundary conditions connecting qubit 1 to qubit n, then the same analysis would show
that M = O
e
8(J+β)
suﬃces for (60). Therefore by (61)-(63) the congestion R satisﬁes
R M · P
1
min
= O
e
8(β+J)
· 2nL · e
2
(
J+
β
L
)
· max
(z,z
)
π(z
π(z)
the factor of 2nL comes from the chain being lazy and the choice of random single-site updates. Note
that max
(z,z
)
(π(z
(z)) is O(L
2
Γ
2
) from (67) and also := max
x,y
|γ
x,y
| = nL and ln π
1
min
=
ln(Z ·e
nLJ+
) nL(J + β + ln 2), using Z 2
nL
. In Section 3.1 we show that L = poly(n, δ
1
mult
)
suﬃces to approximate the partition function with the desired precision. Therefore the mixing
time bounds in Section 4.1 imply that the PIMC method generates samples as was assumed in
Section 3.3, which completes the proof of theorem 1 for models of the form (5).
4.4 Polynomial time mixing in theorem 1
In this section we will prove rapid mixing of PIMC for Hamiltonians of the form (6) by describing
paths between arbitrary states x, y . It will be convenient to represent states in the form
z = [¯z
1
, . . . , ¯z
n
] with ¯z
j
:= (z
1,j
, . . . , z
L,j
) denoting the spin values along the j-th worldline in
conﬁguration z . We say that a “double jump” occurs in conﬁguration z at position (i, j) if
(1 + z
i,j
z
i+1,j
) (1 + z
i,j+1
z
i+1,j+1
) ̸= 0. For any consecutive pair of worldlines j, j + 1, deﬁne the
number of double jumps d(¯z
j
, ¯z
j+1
),
d(¯z
j
, ¯z
j+1
) :=
1
4
L
i=1
(1 + z
i,j
z
i+1,j
) (1 + z
i,j+1
z
i+1,j+1
) . (79)
For any conﬁguration z and worldline j we will deﬁne a path segment to a new conﬁguration
[¯z
1
, . . . , ¯z
j2
, ¯z
j1
, ¯z
′′
j
, ¯z
j+1
, ¯z
j+2
, . . . , ¯z
n
] with all of the double jumps removed from worldline j,
d(¯z
j1
, ¯z
′′
j
) = d(¯z
′′
j
, ¯z
j+1
) = 0 (80)
The double prime notation indicates that the j-th worldline does not contain any double jumps,
while the single prime indicates a worldline that shares no jumps with its double primed neighboring
worldline.
There are many possible paths to choose from for removing double jumps in the i-th world-
line. Due to the form of the Hamitonian (6), changing the placement of a double jump along
the imaginary-time direction has no eﬀect on the energy of the eﬀective classical system, except
possibly for the part which arises from the diagonal part of the Hamiltonian. Similar reasoning
holds for the ordering of the single and double jumps; changing the ordering can only change the
eﬀective classical energy through the diagonal part of the Hamiltonian. Similarly, changing the
imaginary-time position of the double jumps in ¯x
i1
, ¯x
i
will have no eﬀect on the contributions to
the eﬀective classical energy that comes from double jumps occuring in ¯x
i2
, ¯x
i1
. Finally, note
that changing the position of a double jump along the imaginary-time direction can only aﬀect
the classical energy through the diagonal part of the Hamiltonian, reducing the probability of the
conﬁguration by at most e
4β
(the factor of 4 is due to the potential change from the diagonal part
of the Hamiltonian when 3 worldlines are changed arbitrarily).
19
From the above considerations it follows that the imaginary-time position of the double jumps in
a particular worldline can be changed freely with the stationary weight of the resulting conﬁguration
decreased by at most a factor of e
4β
(the factor of 4 bounds the potential change from the
diagonal part of the Hamiltonian when 3 worldlines are changed arbitrarily). Furthermore, since
double jumps arising from the term σ
x
i
σ
x
i+1
are simply bit-ﬂips, two such jumps will cancel out
(“annihilate”) if they coincide. This allows for paths that remove these double jumps in pairs simply
by moving them around, with the additional guarantee that this will not lower the stationary weight
by more than e
4β
. Therefore the path from [¯z
1
, . . . , ¯z
n
] to [¯z
1
, . . . , ¯z
j2
, ¯z
j1
, ¯z
′′
j
, ¯z
j+1
, ¯z
j+2
, . . . , ¯z
n
]
is deﬁned to remove double jumps in pairs by shifting the position of the double jump at the least
imaginary-time position along the imaginary-time direction until it encounters the next double
jump and the pair is annihilated.
The canonical path from x to y proceeds through the worldlines in order 1, . . . , n, prepping
each pair of consecutive worldlines to remove double jumps, then updating the j-th worldline from
¯x
′′
j
to ¯y
′′
j
. The full update of the ﬁrst worldline proceeds as,
[¯x
1
, ¯x
2
, . . . , ¯x
n
] [¯x
′′
1
, ¯x
2
, ¯x
3
, . . . , ¯x
n
] [¯y
′′
1
, ¯x
2
, ¯x
3
, . . . , ¯x
n
] [¯y
′′
1
, ¯x
′′
2
, ¯x
3
, ¯x
4
, . . . , ¯x
n
]
[¯y
′′
1
, ¯y
′′
2
, ¯x
3
, ¯x
4
, . . . , ¯x
n
] [¯y
1
, ¯y
2
, ¯x
3
, ¯x
4
, . . . , ¯x
n
].
The purpose of this series of steps (each of which consists of many single-site updates) is to avoid
passing through intermediate conﬁgurations with too many additional double jumps occuring across
the 1D domain wall formed near the worldlines which are being updated. The worst case interme-
diate conﬁgurations along the paths will have the form,
z = [¯y
1
, . . . , ¯y
j2
, ¯y
j1
, (y
′′
1,j
, . . . , y
′′
r,j
, x
′′
r+1,j
, . . . , x
′′
L,j
), ¯x
j+1
¯x
j+2
, . . . , ¯x
n
]. (81)
This system of paths is highly degenerate (a particular conﬁguration such as (81) will appear in
many paths). To bound the congestion we will use an encoding function η
(z,z
)
: P(z, z
) × Θ.
If η
(z,z
)
(γ
xy
) = (η, θ), then
η = [¯x
1
, . . . , ¯x
j2
, ¯x
j1
, (x
′′
1,j
, . . . , x
′′
r,j
, y
′′
r+1,j
, . . . , y
′′
L,j
), ¯y
j+1
, ¯y
j+2
, . . . , ¯y
n
]. (82)
and θ contains all of the information needed to reconstruct ¯x
j1
, ¯x
j
, ¯x
j+1
, ¯y
j1
, ¯y
j
, ¯y
j+1
from ¯x
j1
, ¯x
′′
j
,
¯x
j+1
, ¯y
j1
, ¯y
′′
j
, ¯y
j+1
. This lost information must specify the location of all of the jumps that have
been removed from these 6 worldlines. This means that the encoding sends exponentially paths to
each state, and so to obtain a polynomial time mixing bound we have to slightly generalize (63)
to account for the fact that the stationary probability on the endpoints of most of these paths is
suﬃciently small.
For any v deﬁne d
j
(v) := d(¯v
j1
, ¯v
j
) + d(¯v
j
, ¯v
j+1
) (which is just the total number of double
jumps in the j-th worldline of v). Let q be the number of additional double jumps in the j-th
worldlines of x and y which are not present in the j-th worldlines of z and η,
q = d
j
(x) + d
j
(y) d
j
(z) d
j
(η) (83)
Deﬁne P
q
z,z
to be the subset of canonical paths passing through (z, z
) with endpoints x and y that
satisfy (83). With these deﬁnitions the congestion (59) becomes
R max
(z,z
)
1
π(z)P (z, z
)
L
q=0
γ
xy
∈P
q
z,z
π(x)π(y), (84)
20
Let Θ =
6L
q=1
Θ
q
where G(q) = |Θ
q
| =
6L
q
is the number of ways to restore q jumps to the
undetermined worldlines. Now suppose we ﬁnd M(q) such that
π(x)π(y) M(q) · π(z)π(η), (85)
for all γ
xy
P
q
z,z
, then this implies
R P
1
min
L
q=0
M(q)
γ
xy
∈P
q
z,z
π(η) (86)
P
1
min
L
q=0
M(q)G(q)
η
π(η) (87)
= P
1
min
L
q=0
M(q)G(q) (88)
To satisfy (85) we need M(q) to satisfy
M(q)
π(x)π(y)
π(z)π(η)
(89)
=
π([¯x
j2
, ¯x
j1
, ¯x
j
, ¯x
j+1
, ¯x
j+2
])π([¯y
j2
, ¯y
j1
, ¯y
j
, ¯y
j+1
, ¯y
j+2
])
π([¯y
j2
, ¯y
j1
, ¯x
j
+ ¯y
j
, ¯x
j+1
, ¯x
j+2
])π([¯x
j2
, ¯x
j1
, ¯x
j
+ ¯y
j
, ¯y
j+1
, ¯y
j+2
])
(90)
e
4β
π([¯x
j1
, ¯x
j
, ¯x
j+1
])π([¯y
j1
, ¯y
j
, ¯y
j+1
])
π([¯y
j
, ¯x
j
+ ¯y
j
, ¯x
j+1
])π([¯x
j1
, ¯x
j
+ ¯y
j
, ¯y
j+1
])
(91)
where (90) to (91) follows because the the terms in the classical energy function which correspond
to oﬀ-diagonal parts of the Hamiltonian involving qubits j 2 and j + 2 are unaﬀected by the
potentially changed spin values in the primed j 1 and j + 1 worldlines. The factor of e
4β
comes
from upper bounding the change in the classical energy function due to the diagonal terms of the
Hamiltonian between qubits j 2, j 1 and j + 1, j + 2. The eﬀect of the diagonal part of the
Hamiltonian on the ratio expression remaining in (91) can be bounded by a factor of e
8β
. The
additional single jumps introduced near the domain wall can increase the ratio by at most L
12
(using (67), and noting that the worst case occurs when 2 new single jumps are introduced in
each of 3 worldlines, in each of z and η). Each of the double jumps in the numerator which are
not present in the denominator decreases the ratio by at least (2β/L). Therefore we may take
M(q) = e
12β
(2β)
q
L