How many qubits are needed for quantum computational supremacy?

Alexander M. Dalzell^{1,2}, Aram W. Harrow^{3}, Dax Enshan Koh^{4,5}, and Rolando L. La Placa^{3}

1 Institute for Quantum Information and Matter, California Institute of Technology, Pasadena, CA 91125, USA
2 Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
3 Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
4 Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
5 Zapata Computing, Inc., 100 Federal Street, 20th Floor, Boston, Massachusetts 02110, USA
Alexander M. Dalzell: adalzell@caltech.edu
Aram W. Harrow: aram@mit.edu
Dax Enshan Koh: dax.koh@zapatacomputing.com
Rolando L. La Placa: rlaplaca@mit.edu

Quantum computational supremacy arguments, which describe a way for a quantum computer to perform a task that cannot also be done by a classical computer, typically require some sort of computational assumption related to the limitations of classical computation. One common assumption is that the polynomial hierarchy (PH) does not collapse, a stronger version of the statement that P ≠ NP, which leads to the conclusion that any classical simulation of certain families of quantum circuits requires time scaling worse than any polynomial in the size of the circuits. However, the asymptotic nature of this conclusion prevents us from calculating exactly how many qubits these quantum circuits must have for their classical simulation to be intractable on modern classical supercomputers. We refine these quantum computational supremacy arguments and perform such a calculation by imposing fine-grained versions of the non-collapse conjecture. Our first two conjectures poly3-NSETH(a) and per-int-NSETH(b) take specific classical counting problems related to the number of zeros of a degree-3 polynomial in n variables over F_2 or the permanent of an n × n integer-valued matrix, and assert that any non-deterministic algorithm that solves them requires 2^{cn} time steps, where c ∈ {a, b}. A third conjecture poly3-ave-SBSETH(a′) asserts a similar statement about average-case algorithms living in the exponential-time version of the complexity class SBP. We analyze evidence for these conjectures and argue that they are plausible when a = 1/2, b = 0.999 and a′ = 1/2.
Imposing poly3-NSETH(1/2) and per-int-NSETH(0.999), and assuming that the runtime of a hypothetical quantum circuit simulation algorithm would scale linearly with the number of gates/constraints/optical elements, we conclude that Instantaneous Quantum Polynomial-Time (IQP) circuits with 208 qubits and 500 gates, Quantum Approximate Optimization Algorithm (QAOA) circuits with 420 qubits and 500 constraints, and boson sampling circuits (i.e. linear optical networks) with 98 photons and 500 optical elements are large enough for the task of producing samples from their output distributions up to constant multiplicative error to be intractable on current technology. Imposing poly3-ave-SBSETH(1/2), we additionally rule out simulations with constant additive error for IQP and QAOA circuits of the same size. Without the assumption of linearly increasing simulation time, we can make analogous statements for circuits with slightly fewer qubits but requiring 10^4 to 10^7 gates.
1 Introduction
Quantum computational supremacy (QCS) is the goal
of carrying out a computational task on a quantum
computer that cannot be performed by any classical
computer [51]. Ingredients of this include choosing an
appropriate task, building a quantum device that can
perform it, ideally verifying that it was done correctly,
and finally using arguments from complexity theory to
support the claim that no classical computer can do
the same [32]. Recent advances indicate that the experimental ingredient might be available very soon: indeed, Google recently reported [7] that it has attained QCS with a 53-qubit superconducting device (later we comment more on how this announcement fits in with
our analysis) but the choice of task, its verification,
and its complexity-theoretic justification remain impor-
tant open theoretical research questions. In particu-
lar, based on the current status of complexity theory,
establishing limitations on classical computing for the
purpose of assessing how close we are to demonstrating
QCS requires making conjectures, and thus we are pre-
sented with a range of choices. If we make stronger
conjectures then we can use a smaller and more re-
stricted quantum computer while ruling out the exis-
tence of more powerful classical simulation algorithms.
Weaker conjectures, on the other hand, are more defen-
sible and can be based on more widely studied mathe-
matical principles.
A leading example of a strong conjecture is the
Quantum Threshold Assumption (QUATH) proposed
by Aaronson and Chen [4], which states that there is no
efficient (i.e. polynomial-time) classical algorithm that
takes as input a description of a random quantum cir-
cuit C, and decides whether |⟨0^n|C|0^n⟩|^2 is greater or less than the median of all |⟨0^n|C|0^n⟩|^2 values, with success probability at least 1/2 + Ω(1/2^n) over the choice of C. This conjecture gives one of the strongest possible
statements about the hardness of simulating quantum
circuits that is not already ruled out by known simula-
tions.
A weaker conjecture is the statement that the polyno-
mial hierarchy (PH) does not collapse, which is closely
related to the assertion that P ≠ NP. Under this as-
sumption, it has been shown that there cannot exist an
efficient classical algorithm to produce samples from the
output distribution of certain families of quantum cir-
cuits [3, 5, 11, 13, 14, 16, 24, 28, 31, 38, 39, 46, 47, 58],
up to constant multiplicative error. The three fami-
lies we focus on in this work are Instantaneous Quan-
tum Polynomial-time (IQP) circuits [16, 56], Quantum
Approximate Optimization Algorithm (QAOA) circuits
[24, 25], and boson sampling circuits (i.e. linear optical
networks) [3], all of which are among those whose simu-
lation is hard for the PH. Indeed, a key selling point for
work in QCS is that it could be based not on the con-
jectured hardness of a particular quantum circuit family
or even quantum mechanics in general, but instead on
highly plausible, purely classical computational conjec-
tures, such as the non-collapse of the PH.
However, the non-collapse of the PH is in a sense too
weak of a conjecture to be practically useful. The con-
jecture rules out polynomial-time simulation algorithms
for these families of circuits, but does not describe a
concrete superpolynomial lower bound. Thus, assum-
ing only the non-collapse of the PH would be consis-
tent with a simulation of an n-qubit quantum system
running in time n^{f(n)} for an arbitrarily slowly growing function f(n), say log log log log(n). A stronger con-
jecture might lead to a requirement that simulation al-
gorithms be exponential time, meaning that there is
some constant c for which its runtime is 2^{cn}. Even
this, though, is not strong enough; it remains possible
that the constant c is sufficiently small that we cannot
rule out a scenario where highly parallelized state-of-
the-art classical supercomputers, which operate at as
many as 10^{17} to 10^{18} floating-point operations per second
(FLOPS), are able to simulate any circuit that might be
experimentally realized in the near-term. For example,
Neville et al. [50], as well as Clifford and Clifford [21] re-
cently developed classical algorithms that produce sam-
ples from the output of boson sampling circuits, the for-
mer of which has been shown to simulate n = 30 pho-
tons on a standard laptop in just half an hour, contra-
dicting the belief of many that 20 to 30 photons are suf-
ficient to demonstrate a definitive quantum advantage
over classical computation. A stronger conjecture that
restricts the value of the exponential factor c, a so-called
“fine-grained” conjecture, is needed to move forward on
assessing the viability of QCS protocols. The frame-
work of fine-grained complexity has gathered much in-
terest in its own right in the last decade (see [66] for
survey), yielding unexpected connections between the
fine-grained runtime of solutions to different problems.
In this work, we examine existing QCS arguments for
IQP, QAOA, and boson sampling circuits from a fine-
grained perspective. While many previous arguments
[3, 16, 24] center on the counting complexity class PP,
which can be related to quantum circuits via postselec-
tion [1], the fine-graining process runs more smoothly
when we instead use the counting class coC=P, which was first utilized in the context of QCS in [27, 28]. The class coC=P is the set of languages for which there exists an efficient classical probabilistic algorithm that accepts with probability exactly 1/2 only on inputs not in the language. It can be related to quantum circuits via non-determinism: coC=P = NQP [26], where NQP, a quantum analogue of NP, is the class of languages for which there exists an efficient quantum circuit that has non-zero acceptance probability only on inputs in the language. Moreover, this equality still holds when we restrict NQP to quantum computations with IQP, QAOA, or boson sampling circuits. Additionally, it is known that if coC=P were to be equal to NP, the PH would collapse to the second level [26, 60]. Thus, by making the assumption that there is a problem in coC=P that does not admit a non-deterministic polynomial-time solution, i.e. coC=P ⊄ NP, we conclude that there does not exist a classical simulation algorithm that samples from the output distribution of IQP or QAOA circuits up to constant multiplicative error, for this would imply NP = NQP = coC=P, contradicting the assumption.
To make a fine-grained version of this statement,
we pick a specific coC=P-complete problem related to the number of zeros of degree-3 polynomials over the field F_2, which we call poly3-NONBALANCED, and we assume that poly3-NONBALANCED does not have a non-deterministic algorithm running in fewer than T(n) time steps for an explicit function T(n). We choose T(n) = 2^{an−1} for a fixed constant a and call this conjecture the degree-3 polynomial Non-deterministic Strong Exponential Time Hypothesis (poly3-NSETH(a)). It is clear that poly3-NSETH(a) is false when a > 1 due to the brute-force deterministic counting algorithm that iterates through each of the 2^n possible inputs to the function f. However, a non-trivial algorithm by Lokshtanov, Paturi, Tamaki, Williams and Yu (LPTWY) [42] gives a better-than-brute-force, deterministic algorithm for counting zeros of systems of degree-k polynomials that rules out poly3-NSETH(a) whenever a > 0.9965. It may be possible to improve this constant while keeping the same basic method but, as we discuss in Appendix B, we expect any such improvements to be small. Refuting poly3-NSETH(a) for values of a substantially below 1 would require the development of novel techniques.
Assuming poly3-NSETH(a), in Section 3 we derive
a fine-grained lower bound on the runtime for any
multiplicative-error classical simulation algorithm for
QAOA and IQP circuits with n qubits. In essence,
what we show is that a classical simulation algorithm
that beats our lower bounds could be used as a subrou-
tine to break poly3-NSETH(a). Then, we repeat the
process for boson sampling circuits with n photons by
replacing poly3-NSETH(a) with a similar conjecture we
call per-int-NSETH(b) involving the permanent of n×n
integer-valued matrices. In this case, however, there is
no known algorithm that can rule out any values of b
when b < 1. Accordingly, the lower bound we derive
on the simulation time of boson sampling circuits when
we take b = 0.999 is essentially tight, matching the run-
time of the naive simulation algorithm up to factors
logarithmic in the total runtime. Recently, a similar ap-
proach was applied to obtain lower bounds on the diffi-
culty of computing output probabilities of quantum cir-
cuits based on the SETH conjecture [34]. Our work has
the disadvantage of using a less well-studied and possi-
bly stronger conjecture (poly3-NSETH(a)) but the ad-
vantage of ruling out classical algorithms for sampling,
i.e. for the same tasks performed by the quantum com-
puter. In Section 3.4, we discuss evidence for our con-
jectures poly3-NSETH(a) and per-int-NSETH(b), and
discuss their relationship to other proposed fine-grained
conjectures.
However, realistic near-term quantum devices, which
are subject to non-negligible noise, sample from a dis-
tribution that differs from ideal with constant additive
error, not constant multiplicative error, and ruling out
additive-error classical simulation algorithms presents
additional technical challenges. Previous work ruled out
polynomial-time simulation algorithms of this type by
conjecturing the non-collapse of the PH and addition-
ally that certain problems we know to be hard for the
PH in the worst case are also hard in the average case
[3, 14, 15, 17]. In Section 4, we present an argument
that also rules out some exponential-time additive-
error simulation algorithms for IQP and QAOA circuits
based on a fine-grained conjecture we call poly3-ave-SBSETH(a′). Our analysis may be viewed generally
as a fine-grained version of previous work; however,
our approach has crucial novel elements that go be-
yond simply being fine-grained. Unlike previous work
on additive-error QCS, our analysis bypasses the need
for Stockmeyer’s theorem (which would incur signifi-
cant fine-grained overhead) and as a result it is per-
haps conceptually simpler. Additionally, in order to
give evidence for poly3-ave-SBSETH(a′), which is both
fine-grained and average-case, we make several techni-
cal observations which could be of independent interest.
We prove an average-case black-box query lower bound,
and we give a self-reduction from the worst case to a hy-
brid average/worst-case for the problem of computing
the number of zeros of a degree-3 polynomial over F_2.
Along the way, we also provide a more complete picture
of the distribution of the number of zeros for uniformly
random degree-3 polynomials over F_2, extending work
that showed the distribution anticoncentrates [17] by
formally proving that all its moments approach those
of a Gaussian distribution as the number of variables in
the polynomial increases.
Finally in Section 5, we show how these lower bounds
lead to estimates for the number of qubits that would
be sufficient for quantum supremacy under our con-
jectures. We conclude that classically simulating (up
to multiplicative error) general IQP circuits with 93/a
qubits, QAOA circuits with 185/a qubits, or boson sam-
pling circuits with 93/b photons would require one cen-
tury for today’s fastest supercomputers, which we con-
sider to be a good measure of intractability. For addi-
tive error we may replace a with a′. We believe values for a, a′, and b leading to plausible conjectures are a = 1/2, which is substantially below best known better-than-brute-force algorithms, a′ = 1/2, which
matches best known algorithms, and b = 0.999, which
is roughly equivalent to asserting that the best known
brute force algorithm is optimal up to subexponential
factors. The relative factor of two in the number of
qubits for QAOA circuits comes from a need for an-
cilla qubits in constructing a QAOA circuit to solve the
poly3-NONBALANCED problem. However, these circuits
must have 10^4 to 10^7 gates for these bounds to apply.
Under the additional assumption that the complexity of
any simulation algorithm would scale linearly with the
number of gates, we conclude that circuits with only
500 gates and 208 IQP qubits, 420 QAOA qubits, or
98 photons would be sufficient for QCS. By compari-
son, factoring a 1024-bit integer, which is sufficiently
beyond the capabilities of today’s classical computers
running best known algorithms, has been estimated to
require more than 2000 qubits and on the order of 10^{11} gates using Shor's algorithm [54].
2 Background
2.1 Counting complexity and quantum computational supremacy
The computational assumptions underlying our work
and many previous QCS results utilize a relationship be-
tween quantum circuits and counting complexity classes
that is not seen to exist for classical computation. To
understand this relationship, we quickly review several
definitions and key results.
Let n ≥ 1, and f : {0, 1}^n → {0, 1} be a Boolean function. The gap of f is defined to be

gap(f) = Σ_{x ∈ {0,1}^n} (−1)^{f(x)}.   (1)

Note that the number of zeros of f may be written in terms of the gap, as follows:

|{x ∈ {0, 1}^n : f(x) = 0}| = (2^n + gap(f))/2.   (2)
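As a concrete illustration (not part of the original text), the following short Python sketch checks the relation between Eq. (1) and Eq. (2) by brute force; the Boolean function used here is an arbitrary stand-in.

```python
from itertools import product

def gap(f, n):
    """Compute gap(f) = sum over x of (-1)^f(x), as in Eq. (1)."""
    return sum((-1) ** f(x) for x in product([0, 1], repeat=n))

n = 3
f = lambda x: (x[0] & x[1]) ^ x[2]   # arbitrary example Boolean function
num_zeros = sum(1 for x in product([0, 1], repeat=n) if f(x) == 0)
assert num_zeros == (2 ** n + gap(f, n)) // 2   # Eq. (2)
```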
Various complexity classes may be defined in terms of the gap. The class #P is defined to be the class of functions f : {0, 1}^* → ℕ for which there exists a polynomial p and a polynomial-time Turing machine M such that for all x ∈ {0, 1}^*,

f(x) = |{y ∈ {0, 1}^{p(|x|)} : M(x, y) = 0}| = (2^{p(|x|)} + gap(M(x, ·)))/2.   (3)

Thus, #P contains functions that count the number of zeros of a polynomial-time computable Boolean function.
A language L is in PP if there exists a polynomial p and a polynomial-time Turing machine M such that for all x ∈ {0, 1}^*,

x ∈ L ⟺ |{y ∈ {0, 1}^{p(|x|)} : M(x, y) = 0}| < |{y ∈ {0, 1}^{p(|x|)} : M(x, y) = 1}| ⟺ gap(M(x, ·)) < 0.   (4)

The class NP is defined similarly, but where

x ∈ L ⟺ |{y ∈ {0, 1}^{p(|x|)} : M(x, y) = 1}| ≠ 0 ⟺ gap(M(x, ·)) ≠ 2^{p(|x|)},   (5)

and the class coC=P, where

x ∈ L ⟺ |{y ∈ {0, 1}^{p(|x|)} : M(x, y) = 0}| ≠ |{y ∈ {0, 1}^{p(|x|)} : M(x, y) = 1}| ⟺ gap(M(x, ·)) ≠ 0.   (6)
By interpreting M as a probabilistic algorithm and y as the random string of bits used by M, we can redefine NP, PP, and coC=P as the classes of languages for which there exists a polynomial-time Turing machine M whose acceptance probability on input x is non-zero, at least 1/2, and not equal to 1/2, respectively, only when x is in the language.
Of these classes, only NP is known to be part of the
polynomial hierarchy (PH), which is a class composed
of an infinite number of levels generalizing the notion
of NP. Furthermore, the other three classes, #P, PP,
and coC=P, which we refer to as counting classes, are known to be hard for the PH: Toda's theorem [59] tells us that a #P or PP oracle is sufficient to solve any problem in the PH in polynomial time, and other work by Toda and Ogiwara [60] shows that there is a randomized reduction from any problem in the PH to a coC=P problem. Stated another way, if PP or coC=P were to be contained in a level of the PH, the PH would necessarily collapse, meaning that the entire PH would be contained within one of its levels. For example, if P = NP, then the entire PH would be equal to P, its zeroth level. The assumption that the PH does not collapse is thus a stronger version of the statement P ≠ NP, and it is widely believed for similar reasons.
Furthermore, these counting classes can be connected
to quantum circuits. Aaronson showed that PP =
PostBQP [1], where PostBQP is the set of problems solv-
able by quantum circuits that have the (unphysical)
power to choose, or postselect the value of measure-
ment outcomes that normally would be probabilistic.
By contrast, classical circuits endowed with this same
power form the class PostBPP which is known to lie in
the third level of the PH [30].
The story is similar for coC=P. It was shown that coC=P = NQP [26], where NQP is the quantum generalization of the class NP, defined to be the set of languages L for which there exists a polynomial-time uniformly generated¹ family of circuits {C_x} such that for all strings x, x is in the language L if and only if the quantum circuit C_x has a non-zero acceptance probability.

¹ We say a circuit family {C_x} is uniformly generated if there exists a polynomial-time Turing machine that on input x outputs the description of the circuit C_x.
This can also be thought of as PostBQP with one-sided
error. If there existed an efficient classical algorithm to
produce samples from the output distribution of quan-
tum circuits up to constant multiplicative error, then
NP would be equal to NQP, and therefore to coC=P, leading to the collapse of the PH (to the second level [26-28]).
We refer to simulation algorithms of this type as ap-
proximate simulation algorithms with multiplicative er-
ror. Stated precisely, if Q(y) is the probability that a quantum circuit produces the output y, then a classical simulation algorithm has multiplicative error ε if its probability of producing outcome y is P(y) and

|P(y) − Q(y)| ≤ ε Q(y)   (7)

for all possible outcomes y.

This contrasts with a simulation algorithm with additive error ε, for which

Σ_y |P(y) − Q(y)| ≤ ε.   (8)
The argument we have sketched only rules out
polynomial-time simulation algorithms with multiplica-
tive error. In Section 4, we discuss arguments
[3, 14, 15, 17] that rule out additive-error simulation
algorithms. These are more complex and require im-
posing additional conjectures.
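To make the two error notions concrete, the following Python sketch (illustrative only; the distributions here are made up) tests whether a sampler's distribution P satisfies the multiplicative-error condition of Eq. (7) or the additive-error condition of Eq. (8) relative to an ideal distribution Q.

```python
def within_multiplicative_error(P, Q, eps):
    """Check |P(y) - Q(y)| <= eps * Q(y) for every outcome y, as in Eq. (7)."""
    return all(abs(P.get(y, 0.0) - Q[y]) <= eps * Q[y] for y in Q)

def within_additive_error(P, Q, eps):
    """Check sum_y |P(y) - Q(y)| <= eps, as in Eq. (8)."""
    return sum(abs(P.get(y, 0.0) - Q[y]) for y in Q) <= eps

Q = {'00': 0.5, '01': 0.25, '10': 0.25, '11': 0.0}    # ideal output distribution
P = {'00': 0.48, '01': 0.26, '10': 0.25, '11': 0.01}  # sampler's distribution
print(within_multiplicative_error(P, Q, 0.1))  # False: Q('11') = 0 forces P('11') = 0
print(within_additive_error(P, Q, 0.1))        # True: the l1 distance here is 0.04
```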
2.2 IQP Circuits
The previous argument only considers simulation algo-
rithms for arbitrary quantum circuits, but the result can
be extended to also rule out efficient simulation algo-
rithms for subclasses of quantum circuits. An example
of one such subclass is the set of instantaneous quantum
circuits [16, 56]. Problems that can be solved by instan-
taneous quantum circuits with a polynomial number
of gates form the Instantaneous Quantum Polynomial-
time (IQP) complexity class, and we will refer to the
circuits themselves as IQP circuits. There are several
equivalent ways to define the IQP model; we do so as
follows.
An IQP circuit is a circuit where a Hadamard gate
is applied to each qubit at the beginning and end of
the computation, but the rest of the gates, which we
refer to as the internal gates, are diagonal. Each qubit
begins in the |0⟩ state but is immediately sent to the |+⟩ = H|0⟩ = (|0⟩ + |1⟩)/√2 state under the Hadamard operation, and each qubit is measured at the end of
the computation in the computational basis. All of the
internal diagonal gates commute, and therefore can be
implemented in any order. An example of an IQP cir-
cuit is shown in Figure 1.
[Figure 1: Example of an IQP circuit. Each qubit must begin and end with a Hadamard gate, and all internal gates must be diagonal in the Z basis. The vertical lines indicate controlled operations, and T refers to the gate T = exp(iπZ/8).]
2.3 Quantum approximate optimization algorithm (QAOA) circuits
Another class of circuits that is not efficiently simula-
ble classically if the polynomial hierarchy does not col-
lapse are quantum approximate optimization algorithm
(QAOA) circuits [24, 25], which have some similarities
with IQP circuits. In a sense, QAOA can be thought of
as multiple rounds of instantaneous operations.
A QAOA circuit operates on n qubits, which begin in the |0⟩ state but are immediately hit with a Hadamard gate, as in the IQP model (note that [24, 25] chose a different but equivalent convention). An integer p and angles γ_i, β_i for i = 1, 2, . . . , p are chosen. A diagonal Hamiltonian C is specified such that C = Σ_α C_α, where each C_α is a constraint on a small subset of the bits, meaning for any bit string z, either C_α|z⟩ = 0 or C_α|z⟩ = |z⟩ and only a few bits of z are involved in determining which is the case. We define the Hamiltonian B = Σ_{j=1}^{n} X_j, where X_j is the Pauli-X operation applied to qubit j, and let U(H, θ) = exp(iHθ). The remainder of the circuit consists of applying U(C, γ_1), U(B, β_1), U(C, γ_2), U(B, β_2), etc. for a total of 2p operations. Finally the qubits are measured in the computational basis. The general framework for a QAOA circuit is depicted in Figure 2.
Since U(C, γ_j) = Π_α U(C_α, γ_j), the gate U(C, γ_j) can be performed as a sequence of commuting gates that perform the unitaries associated with the constraints C_α. Thus each U(C, γ_j) could form the internal portion of an instantaneous quantum circuit.
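To make the structure of a p = 1 QAOA circuit concrete, here is a minimal numerical sketch (not from the paper; the constraint set, angles, and qubit count are arbitrary choices, and the sign convention follows U(H, θ) = exp(iHθ) as above).

```python
import numpy as np
from functools import reduce

n = 3                                     # toy example with three qubits
dim = 2 ** n
X = np.array([[0.0, 1.0], [1.0, 0.0]])

# Diagonal constraint Hamiltonian C: one constraint per pair below, satisfied
# when both bits are 1, so C|z> = (number of satisfied constraints) |z>.
pairs = [(0, 1), (1, 2)]
C_diag = np.zeros(dim)
for z in range(dim):
    bits = [(z >> (n - 1 - j)) & 1 for j in range(n)]
    C_diag[z] = sum(bits[a] & bits[b] for a, b in pairs)

gamma, beta = 0.7, 0.3                    # arbitrary angles for illustration

# exp(i*beta*B) with B = sum_j X_j factorizes into single-qubit rotations.
rot = np.cos(beta) * np.eye(2) + 1j * np.sin(beta) * X
U_B = reduce(np.kron, [rot] * n)

plus = np.ones(dim) / np.sqrt(dim)        # H^{tensor n} |0...0>
state = U_B @ (np.exp(1j * gamma * C_diag) * plus)   # p = 1 QAOA state
probs = np.abs(state) ** 2                # computational-basis distribution
assert np.isclose(probs.sum(), 1.0)
```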
Importantly, since the operator C is a sum of many constraints, it represents a constraint satisfaction problem. For all bit strings z, C|z⟩ = λ_z|z⟩, and a common problem asks us to find the maximum value of λ_z. There is evidence that QAOA circuits might be able to approximate this optimum value of λ_z more efficiently than classical algorithms when p > 1 [25], so in comparison to IQP circuits, QAOA circuits might have more practical value.
[Figure 2: Framework for a QAOA circuit. Each qubit begins with a Hadamard gate, and then 2p operations are performed alternating between applying Hamiltonian C and applying Hamiltonian B.]
2.4 Boson sampling circuits
The IQP and QAOA models are restrictions on the more
general quantum circuit model. But quantum circuits
are not the only model for computation on a quantum
device. Linear quantum optical experiments, for exam-
ple, can be modeled as a system of beam splitters and
phase shifters acting upon identical photons existing in
a certain number of different optical modes. Like the
IQP and QAOA models, the linear optical model is not
believed to be as powerful as the general quantum cir-
cuit model, but under the assumption that the PH does
not collapse, it has been shown that classical simulation
up to constant multiplicative error requires more than
polynomial time [3].
The basic framework [3] for the linear optical model
is as follows. Suppose the system has n photons among m modes. A state of the system is a superposition Σ_R α_R |R⟩, where each |R⟩ corresponds to a configuration of the n photons among the m modes, represented by the tuple R = (r_1, . . . , r_m) where each r_i is a non-negative integer and Σ_i r_i = n.
Passing these photons through a linear optical net-
work composed of beam splitters and phase shifters,
which we call a boson sampling circuit, gives rise to a
transformation on this Hilbert space. Valid transfor-
mations can be written as φ(U), where U is any m × m unitary and φ is a fixed \binom{n+m−1}{n}-dimensional representation of U(m). The unitary U fully describes the choice of circuit, and any U can be exactly implemented using only m(m + 1)/2 total beam splitters and phase shifters [53]. We can define φ(U) by its matrix elements ⟨R|φ(U)|R′⟩, which will be related to the permanent of n × n matrices formed from U. The permanent of an n × n matrix A is given by the formula

Per(A) = Σ_{σ ∈ S_n} Π_{i=1}^{n} A_{i,σ(i)},   (9)

where S_n is the group of permutations on {1, . . . , n}. Then, the matrix elements are

⟨R|φ(U)|R′⟩ = Per(U_{(R,R′)}) / √(r_1! · · · r_m! r′_1! · · · r′_m!),   (10)

where U_{(R,R′)} is the n × n matrix formed by taking r_i copies of row i and r′_j copies of column j from U [3]. As an example, if n = 3, m = 2, R = (2, 1), R′ = (1, 2), and

U = (1/√2) [[1, i], [i, 1]],   (11)

then

U_{(R,R′)} = (1/√2) [[1, i, i], [1, i, i], [i, 1, 1]].   (12)
This sampling task is called BosonSampling since it
could (in theory) be applied to any system of not only
photons but any non-interacting bosons.
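The following Python sketch (not from the paper) computes the permanent directly from the definition in Eq. (9), builds U_{(R,R′)} as described above, and evaluates the amplitude of Eq. (10) for the worked example of Eqs. (11) and (12).

```python
import math
import numpy as np
from itertools import permutations

def per(A):
    """Permanent of a square matrix, directly from Eq. (9)."""
    n = A.shape[0]
    return sum(np.prod([A[i, s[i]] for i in range(n)]) for s in permutations(range(n)))

def submatrix(U, R, Rp):
    """Build U_(R,R') by taking r_i copies of row i and r'_j copies of column j."""
    rows = [i for i, r in enumerate(R) for _ in range(r)]
    cols = [j for j, r in enumerate(Rp) for _ in range(r)]
    return U[np.ix_(rows, cols)]

# The worked example of Eqs. (11)-(12): n = 3 photons in m = 2 modes.
U = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)
R, Rp = (2, 1), (1, 2)
M = submatrix(U, R, Rp)
amp = per(M) / math.sqrt(math.prod(math.factorial(r) for r in R + Rp))  # Eq. (10)
print(M)
print(amp)
```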
3 Simulation algorithms with multiplicative error
The goal of this section will be to perform a fine-
grained analysis that rules out certain multiplicative-
error classical simulation algorithms for the three pre-
viously mentioned families of quantum circuits.
3.1 Degree-3 polynomials and the problem
poly3-NONBALANCED
Each of the three quantum circuit families that we have defined is especially amenable to this analysis due to its natural connection to a specific counting problem. The specific counting problem we will use for our analysis of IQP and QAOA is called poly3-NONBALANCED. The input to the problem is a polynomial over the field F_2 in n variables with degree at most 3 and no constant term. Since the only non-zero element in F_2 is 1, every term in the polynomial has coefficient 1. One example could be f(z) = z_1 + z_2 + z_1 z_2 + z_1 z_2 z_3. Evaluating f for a given string z to determine whether f(z) = 0 or f(z) = 1 can be done efficiently, but since there are 2^n possible strings z, the brute-force method takes exponential time to count the number of strings z for
which f(z) = 0, or equivalently, to compute gap(f) where gap is given by Eq. (1). LPTWY [42] gave a deterministic algorithm for computing the gap of degree-3 polynomials in time scaling slightly better than brute force, but it still has exponential time poly(n) 2^{0.9965n}.

The question posed by poly3-NONBALANCED is whether gap(f) ≠ 0, that is, whether f does not have the same number of 0 and 1 outputs. Thus, poly3-NONBALANCED is in the class coC=P.
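As an illustration of the brute-force baseline (not part of the original text), the sketch below represents a degree-3 polynomial over F_2 as a list of terms and decides poly3-NONBALANCED by enumerating all 2^n inputs.

```python
from itertools import product

def gap_poly3(terms, n):
    """gap(f) for a degree-3 polynomial over F_2 given as a list of terms,
    each term a tuple of between one and three variable indices (coefficient 1)."""
    g = 0
    for z in product([0, 1], repeat=n):
        val = sum(all(z[i] for i in t) for t in terms) % 2   # evaluate f(z) over F_2
        g += (-1) ** val
    return g

# f(z) = z1 + z2 + z1 z2 + z1 z2 z3, the running example in the text (0-indexed here).
terms = [(0,), (1,), (0, 1), (0, 1, 2)]
print(gap_poly3(terms, 3) != 0)    # True iff f is not balanced
```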
The problem poly3-NONBALANCED is a natural problem to work with because there is an elegant correspondence between degree-3 polynomials and IQP circuits involving Pauli Z gates, controlled-Z (CZ) gates, and controlled-controlled-Z (CCZ) gates [43]. Specifically, if f is degree 3 then let

U′_f = Σ_{z ∈ F_2^n} (−1)^{f(z)} |z⟩⟨z|   (13)

and let U_f = H^{⊗n} U′_f H^{⊗n}. We can implement an IQP circuit C_f that evaluates to U_f as follows: if the term z_i appears in f, then within the diagonal portion of C_f we perform the gate Z on qubit i; if the term z_i z_j appears, we perform the CZ gate between qubits i and j; and if the term z_i z_j z_k appears, we perform the CCZ gate between the three qubits. For example, for the polynomial f(z) = z_1 + z_2 + z_1 z_2 + z_1 z_2 z_3, the circuit C_f is shown in Figure 3.
[Figure 3: IQP circuit C_f corresponding to the degree-3 polynomial f(z) = z_1 + z_2 + z_1 z_2 + z_1 z_2 z_3. The unitary U_f implemented by the circuit has the property that ⟨0̄|U_f|0̄⟩ = gap(f)/2^n, where in this case n = 3.]
The crucial property of this correspondence is that ⟨0̄|U_f|0̄⟩ = gap(f)/2^n, where |0̄⟩ is shorthand for the starting |0⟩^{⊗n} state. This is easily seen by noting that the initial set of H gates generates the equal superposition state |B⟩ = Σ_{x=0}^{2^n−1} |x⟩/√(2^n), so ⟨0̄|U_f|0̄⟩ = ⟨B|U′_f|B⟩ where U′_f is implemented by the internal diagonal portion of C_f. Since U′_f applies a (−1) phase to states |z⟩ for which f(z) = 1,

⟨0̄|U_f|0̄⟩ = Σ_{y=0}^{2^n−1} Σ_{x=0}^{2^n−1} (−1)^{f(x)} ⟨y|x⟩/2^n = Σ_{x=0}^{2^n−1} (−1)^{f(x)}/2^n = gap(f)/2^n.

Thus, gap(f) can be computed by calculating the amplitude of the |0̄⟩ state produced by the circuit. If we define acceptance to occur when |0̄⟩ is obtained upon measurement, then the circuit C_f has non-zero acceptance probability only when gap(f) ≠ 0. This illustrates an explicit NQP algorithm for poly3-NONBALANCED, which was guaranteed to exist since NQP = coC=P.
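A small numerical check of this identity (illustrative only, not part of the original analysis) constructs U′_f as a diagonal matrix and conjugates by Hadamards; the polynomial is the same example used in Figure 3.

```python
import numpy as np
from functools import reduce
from itertools import product

n = 3
H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
H = reduce(np.kron, [H1] * n)

# f(z) = z1 + z2 + z1 z2 + z1 z2 z3 (0-indexed), as in Figure 3.
terms = [(0,), (1,), (0, 1), (0, 1, 2)]
def f(z):
    return sum(all(z[i] for i in t) for t in terms) % 2

zs = list(product([0, 1], repeat=n))
U_diag = np.diag([(-1.0) ** f(z) for z in zs])   # U'_f from Eq. (13)
U_f = H @ U_diag @ H                              # U_f = H^{tensor n} U'_f H^{tensor n}

gap_f = sum((-1) ** f(z) for z in zs)
assert np.isclose(U_f[0, 0], gap_f / 2 ** n)      # <0|U_f|0> = gap(f)/2^n
```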
Also crucial to note is that poly3-NONBALANCED is complete for the class coC=P. This is shown by adapting Montanaro's proof [43] that computing gap(f) for a degree-3 polynomial f over F_2 is #P-complete. In that proof, Montanaro reduces from the problem of computing gap(g) for an arbitrary Boolean function g, which is #P-complete by definition. Since deciding whether gap(g) ≠ 0 is coC=P-complete by definition, and the reduction has gap(g) ≠ 0 if and only if gap(f) ≠ 0, this also shows that poly3-NONBALANCED is coC=P-complete. One immediate consequence of this fact is that NIQP, the class NQP restricted to quantum circuits of the IQP type, is equal to coC=P (and hence NQP), since the circuit C_f is an NIQP solution to a coC=P-complete problem.
3.2 The permanent and the problem
per-int-NONZERO
In close analogy to the correspondence between degree-3
polynomials and IQP circuits composed of Z, CZ, and
CCZ gates, there is a correspondence between matrix
permanents and boson sampling circuits.
We have already seen in the definition of the linear
optical model that any amplitude in a boson sampling
circuit on n photons can be recast as the permanent
of an n × n matrix, but the converse is also true: the
permanent of any n ×n matrix can be encoded into the
amplitude of a boson sampling circuit on n photons, up
to a known constant of proportionality.
To see how this works, given an n × n complex matrix A, we will construct a 2n × 2n unitary matrix U_A whose upper-left n × n block is equal to cA for some c > 0. If we take R = R′ = (1^n, 0^n) (i.e. 1 repeated n times, followed by 0 repeated n times), then we will have Per(U_{A(R,R′)}) = c^n Per(A). Thus Per(A) is proportional to a particular boson sampling amplitude with c an easily computable proportionality constant.

We can choose c to be ‖A‖^{−1}, where ‖A‖ is the largest singular value of A. (Note that if we want the proportionality to hold uniformly across some class of A, we should choose c to satisfy c‖A‖ ≤ 1 for all A in this class.) Then {cA, √(I_n − c^2 A†A)} are Kraus operators for a valid quantum operation, where I_n is the n × n identity matrix, and

[ cA
  √(I_n − c^2 A†A) ]   (14)
is an isometry. We can extend this isometry to the following unitary:

U_A = [ cA                  D
        √(I_n − c^2 A†A)    −c (I_n − c^2 A†A)^{−1/2} A† D ],   (15)

where D = (I_n + c^2 A (I_n − c^2 A†A)^{−1} A†)^{−1/2}, which is well defined since the argument of the inverse square root is positive definite and Hermitian. Thus the permanent of an arbitrary n × n matrix can be encoded into a boson sampling circuit with n photons and 2n modes.
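The sketch below (not from the paper) carries out this dilation numerically. Note two assumptions of the sketch: the bottom-right block is one valid unitary completion consistent with the partially garbled Eq. (15), and c is taken strictly below ‖A‖^{−1} so that the inverse square roots are well conditioned.

```python
import numpy as np

def herm_sqrt(M):
    """Principal square root of a Hermitian positive semidefinite matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.conj().T

n = 3
A = np.array([[2, -1, 0], [1, 3, 1], [0, 2, -2]], dtype=complex)  # arbitrary integer matrix
c = 0.99 / np.linalg.norm(A, 2)          # strictly below 1/||A|| (largest singular value)

I_n = np.eye(n)
E = herm_sqrt(I_n - c**2 * A.conj().T @ A)
D = herm_sqrt(I_n - c**2 * A @ A.conj().T)   # equals the inverse-square-root form of D
# Bottom-right block chosen so that the columns of U_A are orthonormal (cf. Eq. (15)).
F = -c * np.linalg.inv(E) @ A.conj().T @ D
U_A = np.block([[c * A, D], [E, F]])

assert np.allclose(U_A.conj().T @ U_A, np.eye(2 * n), atol=1e-8)  # unitary
assert np.allclose(U_A[:n, :n], c * A)                              # top-left block is cA
```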
The matrix permanent is playing the role for boson
sampling circuits that the gap of degree-3 polynomials
played for IQP circuits with Z, CZ, and CCZ gates;
thus, it is natural to use the computational problem of
determining if the permanent of an integer-valued ma-
trix is not equal to 0, which we call per-int-NONZERO,
in place of poly3-NONBALANCED.
In fact, per-int-NONZERO and poly3-NONBALANCED
have several similarities. For example, like comput-
ing the number of zeros of a degree-3 polynomial,
computing the permanent of an integer-valued ma-
trix is #P-complete, a fact famously first demon-
strated by Valiant [61], and later reproved by Aaron-
son [2] using the linear optical framework. This com-
pleteness extends to per-int-NONZERO, which we show
in Appendix A is coC=P-complete by reduction from
poly3-NONBALANCED.
Additionally, for both problems, the best known algorithm is exponential and has runtime close to or equaling 2^n. While poly3-NONBALANCED can be solved in poly(n) 2^{0.9965n} time, the best known algorithm for computing the permanent [12] requires 2^{n−Ω(√(n/log log n))} deterministic time, which is only a subexponential improvement over the naive algorithm that utilizes Ryser's formula for the permanent [55] and requires at least n2^n basic arithmetic operations. Using Ryser's formula is an improvement over the O(n!) time steps implied by Eq. (9), but its scaling is reminiscent of that required to solve a #P problem by brute force. In principle it is possible that a faster algorithm exists for per-int-NONZERO, where we do not care about the actual value of the permanent, only whether it is nonzero, but such methods are only known in special cases, such as nonnegative matrices.
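For reference, here is a minimal implementation of Ryser's formula (illustrative only, not from the paper). The n2^n operation count quoted above requires Gray-code ordering of the subsets; this simple version pays an extra factor of n.

```python
from itertools import combinations
from math import prod

def permanent_ryser(A):
    """Permanent of an n x n matrix via Ryser's formula (O(2^n * n^2) version)."""
    n = len(A)
    total = 0
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            row_products = prod(sum(A[i][j] for j in cols) for i in range(n))
            total += (-1) ** k * row_products
    return (-1) ** n * total

A = [[1, 2], [3, 4]]
print(permanent_ryser(A))   # 1*4 + 2*3 = 10
```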
Crucially, our construction shows that boson sampling circuits can solve per-int-NONZERO in non-deterministic polynomial time, since given A we have shown how to construct a circuit corresponding to unitary U_A with acceptance probability that is non-zero only when Per(A) is non-zero. This shows that NBosonP, the linear optical analogue of NIQP, is equal to coC=P and by extension, to NQP.
3.3 Lower bounds for multiplicative-error simulations
3.3.1 For IQP Circuits
In the previous section, we described how to construct an n-qubit IQP circuit C_f corresponding to a degree-3 polynomial f over n variables such that the acceptance probability of C_f is non-zero if and only if gap(f) ≠ 0. The number of terms in f, and hence the number of internal diagonal gates in C_f, is at most

g_1(n) = \binom{n}{3} + \binom{n}{2} + \binom{n}{1} = (n^3 + 5n)/6.   (16)

Now, suppose we had a classical algorithm that, for any q, produces samples from the output distribution of any IQP circuit with q qubits and g_1(q) internal gates, up to some multiplicative error constant, in s_1(q) time steps for some function s_1. Throughout, we will assume all classical algorithms run in the Word RAM model of computation.

Using this algorithm to simulate the IQP circuit C_f generates a non-deterministic classical algorithm for poly3-NONBALANCED running in s_1(n) time steps. That is, the classical probabilistic algorithm that results from this simulation accepts on at least one computational path if and only if the function f is not balanced.
Now, we impose a fine-grained version of the non-
collapse assumption, which we motivate later in the sec-
tion.
Conjecture 1. [poly3-NSETH(a)] Any non-deterministic classical algorithm (in the Word RAM model of computation) that solves poly3-NONBALANCED requires in the worst case 2^{an−1} time steps, where n is the number of variables in the poly3-NONBALANCED instance.
In the Word RAM model with word size w, memory is infinite and basic arithmetic operations on words of length w take one time step. For concreteness, we assume that w = log_2(N) where N is the length of the input encoding the degree-3 polynomial (N = O(g_1(n) log_2(n))). This way the words can index the locations where the input data is stored. The Word RAM model has previously been used for fine-grained analyses [66] and aims to represent how a real computer operates as faithfully as possible.
Our conjecture immediately yields a lower bound on the simulation function s_1:

s_1(n) ≥ 2^{an−1}.   (17)
This lower bound result relies on poly3-NSETH(a),
which we have not yet motivated. In particular, for
our qubit calculations we will take the specific value
of a = 1/2. This value is comfortably below the best
known limit a < 0.9965 from [42], whose algorithm is
reproduced in Appendix B. In Section 3.4, we attempt
to provide additional motivation for poly3-NSETH(1/2)
by showing its consistency with other fine-grained con-
jectures.
To our knowledge, the best known upper bound on s_1(n) comes from the naive poly(n) 2^n-time “Schrödinger-style” simulation algorithm that updates each of the 2^n amplitudes describing the state vector after each gate is performed, so this lower bound is not tight.
3.3.2 For QAOA circuits
To perform the same analysis for QAOA circuits, we will turn the IQP circuit C_f into a QAOA circuit. The modifications required are straightforward. We set p, the number of rounds of QAOA computation, equal to 1, and we let rotation angles γ = π/2 and β = π/4. The first layer of Hadamard gates in C_f is already built into the QAOA framework. To implement the Z, CZ, and CCZ gates we write Z = exp(i2γ|1⟩⟨1|), CZ = exp(i2γ|11⟩⟨11|), and CCZ = exp(i2γ|111⟩⟨111|) and build our constraint Hamiltonian C accordingly: for each Z gate we add two copies of the constraint that is satisfied only when the bit acted upon is 1; for each CZ gate we add two copies of the constraint that is satisfied when both bits involved are 1; and for each CCZ gate we add two copies of the constraint that is satisfied when all three bits involved are 1. Now, the operation exp(iγC) has exactly the effect of all the Z, CZ, and CCZ gates combined.
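A quick numerical sanity check (not part of the original text) that two copies of each constraint, with γ = π/2, reproduce the Z, CZ, and CCZ gates:

```python
import numpy as np

gamma = np.pi / 2

def phase_on_all_ones(k, copies):
    """exp(i * copies * gamma * P) where P projects onto |1...1> of k qubits."""
    diag = np.ones(2 ** k, dtype=complex)
    diag[-1] = np.exp(1j * copies * gamma)
    return np.diag(diag)

Z = np.diag([1, -1]).astype(complex)
CZ = np.diag([1, 1, 1, -1]).astype(complex)
CCZ = np.diag([1, 1, 1, 1, 1, 1, 1, -1]).astype(complex)

# Two copies of the corresponding constraint reproduce each diagonal gate.
assert np.allclose(phase_on_all_ones(1, 2), Z)
assert np.allclose(phase_on_all_ones(2, 2), CZ)
assert np.allclose(phase_on_all_ones(3, 2), CCZ)
```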
The final step is to implement the final column of H gates, which is not built into the QAOA framework. First we define

H̃ = (1/√2) [[1, i], [i, 1]] = exp(i(π/4)X) = exp(iβX).   (18)

Note that H̃ = H exp(i(π/4)Z) H, which can be rewritten H = exp(i(π/2)|1⟩⟨1|) H H̃ (up to an unimportant global phase). Thus, we can replace the H gates in the final column of the circuit C_f with the right-hand side of the previous expression, the first factor of which can be performed by adding one copy of the |1⟩⟨1| constraint to C. As described in [24], the second factor (H gate) can be implemented by introducing an ancilla qubit and four new constraints between the original qubit and the ancilla. The original qubit is measured and if outcome |0⟩ is obtained, the state of the ancilla is H applied to the input state on the original qubit. Thus we have teleported the H gate onto the ancilla qubit within the QAOA framework. This is described in full in [24], and we reproduce the gadget in Figure 4.
[Figure 4: Gadget that uses an ancilla qubit to implement the H gate within the QAOA framework. Here the gate Q is the diagonal two-qubit gate diag(1, −i, 1, i), which can be written as exp(i(π/2)(3|01⟩⟨01| + |11⟩⟨11|)). Thus, it can be implemented by adding 4 constraints to the constraint Hamiltonian C. The H̃ gate is implemented by applying the Hamiltonian B with β = π/4.]
After replacing each H gate with the gadget from Figure 4, every qubit begins with an H gate, is acted upon by exp(iγC), and ends with a H̃ gate, which is implemented by the exp(iβB) step of the QAOA framework. Thus, the resulting circuit is a QAOA circuit.

We had to introduce one ancilla per qubit in C_f, so our QAOA circuit has 2n qubits, instead of just n. However, it is still true that ⟨0̄|V_f|0̄⟩ ∝ gap(f), where V_f is now the unitary implemented by this new QAOA circuit and |0̄⟩ is the state |0⟩^{⊗2n}. Hence the acceptance probability is non-zero if and only if f is not balanced.

The circuit requires 2 constraints per term in the polynomial f, and an additional 5 constraints per qubit for the Hadamard gates at the end of the computation (1 from introducing H̃ and 4 from the gadget in Figure 4). This yields at most

g_2(2n) = 2 g_1(n) + 5n = (n^3 + 20n)/3   (19)

constraints.
As in the IQP case, we suppose a classical simulation algorithm produces samples from the output distribution of QAOA circuits with q qubits and g_2(q) constraints, up to a multiplicative error constant, in time s_2(q). Then, under the same conjecture poly3-NSETH(a), we have

s_2(2n) ≥ 2^{an−1},   (20)

which simplifies to

s_2(n) ≥ 2^{an/2−1}.   (21)

The exponentiality of this lower bound is weaker by a factor of two in comparison to the lower bound for IQP circuits in Eq. (17), due to the fact that one ancilla was introduced per variable to turn the circuit C_f into a QAOA circuit. However, the best known upper bound for QAOA simulation is the naive poly(n) 2^n
brute-force algorithm, as was the case for IQP circuits.
This indicates that one might be able to eliminate the
factor of two by replacing poly3-NONBALANCED with an-
other problem. Such a problem should be solvable by a
QAOA circuit but not by non-deterministic algorithms
running much faster than brute force. We leave this for
future work.
3.3.3 For boson sampling circuits
The story for boson sampling circuits is nearly identical, except using a conjecture related to the problem per-int-NONZERO instead of poly3-NONBALANCED. Given an integer-valued n × n matrix A, we showed previously how to construct a boson sampling circuit with n photons, described by unitary U_A, that has non-zero acceptance probability only when Per(A) ≠ 0. This circuit has 2n modes, and hence requires at most

g_3(n) = (2n)(2n + 1)/2 = 2n^2 + n   (22)

circuit elements, i.e. beam splitters and phase shifters.

Paralleling our IQP and QAOA analysis, we suppose we have a classical algorithm that produces samples from the output distribution of a boson sampling circuit with q photons and g_3(q) total beam splitters and phase shifters, up to some multiplicative error constant, in s_3(q) time steps for some function s_3.

Using this algorithm to simulate the boson sampling circuit described by U_A generates a non-deterministic algorithm for per-int-NONZERO running in s_3(n) time steps.
We replace poly3-NSETH(a) with the version for per-int-NONZERO:

Conjecture 2. [per-int-NSETH(b)] Any non-deterministic classical algorithm (in the Word RAM model of computation) that solves per-int-NONZERO requires in the worst case 2^{bn−1} time steps, where n is the number of rows in the per-int-NONZERO instance.

Unlike poly3-NSETH(a), as far as we are aware there is no known better-than-brute-force algorithm ruling out the conjecture for any value b < 1. The algorithm in [12], which is better than brute force by subexponential factors, only rules out b = 1.
This conjecture implies a lower bound on the simulation function:

s_3(n) ≥ 2^{bn−1}.   (23)
Producing samples from the output of boson sampling circuits naively requires one to compute the permanent for many of the amplitudes. However, in the case of a binary output, where acceptance is defined to correspond to exactly one photon configuration, only one permanent need be calculated: the one associated with the accepting configuration. Thus the asymptotic scaling of this lower bound when b = 1 − δ is essentially tight with naive simulation methods as δ → 0, since Ryser's formula can be used to evaluate the permanent and simulate a boson sampling circuit in O(n2^n) time steps.
3.4 Evidence for conjectures
Where previous quantum computational supremacy ar-
guments only ruled out simulation algorithms with
polynomial runtime, our analysis also rules out some al-
gorithms with exponential runtime. These conclusions
come at the expense of imposing stronger, fine-grained
conjectures, but such assumptions are necessary for ex-
tracting the fine-grained lower bounds we seek.
Thus, our conjectures are necessarily less plausible
than the statement that the PH does not collapse, and
definitively proving our conjectures is impossible with-
out simultaneously settling major open problems in
complexity theory. However, we can give evidence for
these conjectures by thinking about how one might try
to refute them, and showing how they fit into the land-
scape of previously proposed fine-grained conjectures.
We start with poly3-NSETH(a) and discuss why cer-
tain techniques for refuting it cannot work, how cur-
rent techniques fall short of refuting it for values of a
significantly lower than 1, and why we should expect
that completely different techniques would be needed
to produce algorithms that rule out a < 1/2. Then, we
discuss how poly3-NSETH(a) fits in consistently with
other results in fine-grained complexity theory. Finally,
we discuss how per-int-NSETH(b) is similar and differ-
ent in these regards.
The conjecture poly3-NSETH(a) asserts that deter-
mining whether a Boolean function is balanced takes
non-deterministic exponential time, where that Boolean
function takes the form of a degree-3 polynomial. It is
worth noting that we can prove this conjecture with
a = 1 for Boolean functions in the black-box setting,
where the non-deterministic algorithm can only inter-
act with the Boolean function by querying its value on
certain inputs.
Theorem 1. Let f : {0, 1}^n → {0, 1} be a Boolean function. A non-deterministic algorithm with black-box access to f that accepts iff |{x : f(x) = 0}| ≠ 2^{n−1}, that is, iff f is not balanced, must make at least 2^{n−1} + 1 queries to f. Moreover, this bound is optimal.
Proof. First we prove the lower bound on the number of queries. Suppose M is a non-deterministic algorithm with black-box access to f that accepts whenever f is not balanced. Let f_0 be a Boolean function that
is not balanced; thus, at least one computation path of M accepts if f = f_0. Choose one such path and let S ⊆ {0, 1}^n be the set of queries made by M on this computation path. Suppose for contradiction that |S| ≤ 2^{n−1}. Since at most half the possible inputs are in S, it is possible to construct another Boolean function f_1 that is balanced and agrees with f_0 on the set S. Since f_0 and f_1 agree on S, the computation that accepted when f = f_0 will proceed identically and accept when f = f_1. Thus M accepts when f = f_1, which is balanced, yielding a contradiction. We conclude that |S| ≥ 2^{n−1} + 1.

We can see that it is possible for M to achieve this bound as follows: M non-deterministically chooses 2^{n−1} + 1 of the 2^n possible inputs to f and queries f on these inputs. If all of the queries yield the same value, it accepts. Otherwise, it rejects. If f is balanced, M will reject no matter which set of queries it makes, whereas if f is not balanced, there is at least one set of 2^{n−1} + 1 inputs on which f takes the same value and M will accept, so the algorithm succeeds.
Theorem 1 shows that the poly3-NSETH(1) conjec-
ture cannot be disproved using an algorithm that simply
evaluates the degree-3 polynomial f for different inputs.
Indeed, the algorithm by LPTWY [42] exploits the fact
that the Boolean functions are degree-3 polynomials in
order to refute poly3-NSETH(a) for a > 0.9965. Refut-
ing poly3-NSETH(a) for even smaller values of a would
require more techniques that further utilize the struc-
ture associated with the poly3-NONBALANCED problem.
In fact, the algorithm in [42] is substantially more general than what is necessary for our purposes; their deterministic algorithm counts the number of solutions to a system of m degree-k polynomial equations over finite field F_q. The problem poly3-NONBALANCED is concerned only with the case where m = 1, k = 3, q = 2, and all that matters is whether the number of zeros is equal to half the possible inputs. For this special case, the algorithm is considerably simpler, and we reproduce it in Appendix B. The basic technique is as follows: we fix some fraction (1 − δ) of the n variables and call R the number of zeros of f consistent with those fixed values. We can compute in time O(2^{0.15n+0.85δn}) a representation of R as a polynomial with integer coefficients over the (1 − δ)n fixed variables. Then, (even though R has an exponential number of monomials in its representation) it is noted that one can evaluate R for all 2^{(1−δ)n} possible inputs in total time O(2^{(1−δ)n}), as long as δ < 0.0035. By evaluating and summing R on all of its inputs, we compute the total number of zeros, and the total runtime is O(2^{(1−δ)n}), which is better than brute force when we choose δ positive.
Note that this algorithm is deterministic, and giving
it the power of non-determinism can only make it faster.
However, by inspection of the algorithm from [42], we
see no clear way for non-determinism to be directly uti-
lized to further accelerate the algorithm. This is con-
sistent with the finding in Theorem 1 that (asymptot-
ically speaking) the best non-deterministic algorithms
are no faster than the best deterministic algorithms
for the NONBALANCED problem in the black-box setting.
However, it is worth mentioning that a gap between
best known deterministic and non-deterministic algo-
rithms has been observed for certain coNP-hard prob-
lems, for example in [67], where the problem of deter-
mining the unsatisfiability of a system of m degree-2
polynomials in n variables over F_2 is shown to be possible in poly(n) 2^{n/2} non-deterministic time, an improvement over the best known O(2^{0.8765n}) deterministic solution from LPTWY [42]. Additionally, when randomness is added to non-determinism, yielding what is known as a Merlin-Arthur (MA) protocol, similar results can be shown even for #P-hard problems like computing the permanent of n × n matrices, which is possible in poly(n) 2^{n/2} MA time [65] compared to O(2^n) deterministic time. These results cast some doubt on the assumption that non-determinism is a useless resource for solving poly3-NONBALANCED or per-int-NONZERO, although notably none of these methods appear to break the Ω(2^{n/2}) barrier.
Additionally, we mention that the authors of LPTWY
[42] were concerned primarily with showing that better-
than-brute-force algorithms were possible, perhaps leav-
ing room for optimization of their constants. In our re-
production of their algorithm when m = 1, k = 3, and
q = 2 in Appendix B, we have followed their analysis
and optimized the constants where possible yielding a
slightly better runtime than what is stated explicitly in
their paper.
The conclusion is that techniques exist to rule out
poly3-NSETH(1) but not for values of a much lower
than 1, even after some attempt at optimization. More-
over, we now provide evidence that drastically differ-
ent techniques would need to be used if one wished to
rule out poly3-NSETH(1/2); that is, ruling out poly3-
NSETH(a) when a < 1/2 could not be done by mak-
ing only slight modifications or improvements using the
same approach from [42]. Our reasoning stems from the
tradeoff between the two contributions to the runtime
of the algorithm: first, the computation of the polyno-
mial representation for R; and second, the evaluation
of R for all 2^{(1−δ)n} possible inputs. When δ is smaller than 0.0035, the second contribution dominates for a total runtime O(2^{(1−δ)n}). However, if this step were to be improved to allow for δ to exceed 1/2, the first contribution to the runtime would begin to dominate for a total
runtime of O(2^{0.15n+0.85δn}) > O(2^{n/2}). In other words, if we try to fix fewer than half of the variables, computing the representation of R (which involves cycling through the 2^{δn} strings of unfixed variables) will necessarily take longer than evaluating R, and ultimately it will be impossible to produce an algorithm with runtime below 2^{n/2} through this method. While this is no proof, it increases the plausibility of poly3-NSETH(1/2).
Next we discuss how poly3-NSETH(a) contrasts with previously proposed fine-grained conjectures. Well-known conjectures include the Exponential Time Hypothesis (ETH), which claims that there exists some c such that no O(2^{cn}) time algorithm for k-SAT exists, and the Strong Exponential-Time Hypothesis (SETH) [19, 35], which states that for any ε one can choose k large enough such that there is no O(2^{(1−ε)n}) algorithm for k-SAT. In other words, SETH states that no algorithm for k-SAT does substantially better than the naive brute-force algorithm when k is unbounded.

There is substantial evidence for ETH and SETH, even beyond the fact that decades of research on the SAT problem have failed to refute them. For instance, SETH implies fine-grained lower bounds on problems in P that match long-established upper bounds. One example is the orthogonal vectors (OV) problem, which asks if a set of n vectors has a pair that is orthogonal. There is a brute-force O(n^2) solution to OV, but O(n^{2−ε}) is impossible for any ε > 0 assuming SETH [63, 64]. Thus, SETH being true would provide a satisfying rationale for why attempts to find faster algorithms for problems like OV have failed. On the other hand, the refutation of SETH would imply the existence of novel circuit lower bounds [36].
There are yet more fine-grained conjectures: replac-
ing the problem k-SAT with #k-SAT yields #ETH and
#SETH, the counting versions of ETH and SETH.
These hypotheses have interesting consequences of their
own; for example, #ETH implies that computing the
permanent cannot be done in subexponential time [23].
Additionally, if k-TAUT is the question of whether a
k-DNF formula is satisfied by all its inputs (which is
coNP-complete), then the statement that no O(2^{(1−ε)n}) algorithm exists for k-TAUT with unbounded k is called
the Non-deterministic Strong Exponential Time Hy-
pothesis (NSETH) [20]. Like SETH, NSETH’s refuta-
tion would imply circuit lower bounds [20, 36]. Addi-
tionally, NSETH is consistent with unconditional lower
bounds that have been established in proof complexity
[9, 52].
The conjecture poly3-NSETH(a) is similar to NSETH
in that it asserts the non-existence of non-deterministic
algorithms for a problem that is hard for coNP (indeed,
poly3-NONBALANCED is hard for the whole PH), and it
is similar to #SETH in that it considers a counting
problem. It is different from all of these conjectures
because it is not based on satisfiability formulas, but
rather on degree-3 polynomials over the field F_2, a problem that has been far less studied. Additionally, poly3-NSETH(a) goes beyond previous conjectures to assert not only that algorithms require O(2^{an}) time, but that they actually require at least 2^{an−1} time steps. It is
not conventional to worry about constant prefactors as
we have in this analysis, but doing so is necessary to
perform practical runtime estimates. On this front, our
analysis is robust in the sense that if poly3-NSETH(a)
or per-int-NSETH(b) were to fail by only a constant
prefactor, the number of additional qubits we would es-
timate would increase only logarithmically in that con-
stant.
We are unable to show that poly3-NSETH(a) is for-
mally implied by any of the previously introduced con-
jectures. However, assuming ETH, we can prove that
the deterministic version of poly3-NSETH(a) holds for
at least some a, i.e. that there does not exist a determin-
istic O(2^{an}) time algorithm for poly3-NONBALANCED.
Theorem 2. Assuming ETH, there exists a constant a such that every deterministic algorithm that solves poly3-NONBALANCED requires Ω(2^{an}) time.
Proof. Suppose for contradiction that no such constant existed; thus for any a there is an algorithm for poly3-NONBALANCED running in less than O(2^{an}) time.
We give a reduction from k-SAT to poly3-NONBALANCED
showing that this leads to a contradiction with ETH.
The reduction is similar to that from [43] showing
that counting the number of zeros of a degree-3 poly-
nomial is #P-complete. Given a k-SAT instance φ with
n variables and m clauses, we can use the sparsifica-
tion lemma to assume that m is O(n) [23, 35]. Then
we introduce one additional variable x_{n+1} and examine the formula φ′ = x_{n+1}(1 ⊕ φ). Note that φ is satisfiable if and only if φ′ is not balanced. There is a quantum circuit C, made up only of O(m) CCZ and O(m) Hadamard gates, that computes the value of φ′(z) into an auxiliary register for any input z on the first n + 1 qubits. The circuit also requires O(m) ancilla qubits that begin and end in the |0⟩ state. As described in [43], the circuit can be associated with a degree-3 polynomial f (the H gates are replaced by gadgets similar to that in Figure 4, introducing more ancilla qubits but turning the circuit into an IQP circuit) that has O(n + m) variables, such that gap(f) = gap(φ′). Thus, given any constant c, we can choose a small enough such that an O(2^{an}) time algorithm for determining whether f is not balanced implies an O(2^{cn}) algorithm for k-SAT. Since we assumed the existence of the former, ETH is contradicted, proving the claim.
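As a small sanity check on the role of the extra variable x_{n+1} in this reduction, the following Python sketch (our own illustration, with φ supplied as a Python function) brute-forces gap(φ′) and confirms that φ is satisfiable exactly when φ′ = x_{n+1}(1 ⊕ φ) is not balanced.

from itertools import product

def gap(f, n):
    # gap(f) = #{x : f(x) = 0} - #{x : f(x) = 1} over all n-bit inputs.
    return sum(1 if f(x) == 0 else -1 for x in product((0, 1), repeat=n))

def check_reduction(phi, n):
    # phi' on n+1 bits: phi'(x) = x_{n+1} * (1 XOR phi(x_1..x_n)).
    phi_prime = lambda x: x[n] & (1 ^ phi(x[:n]))
    satisfiable = any(phi(x) for x in product((0, 1), repeat=n))
    not_balanced = gap(phi_prime, n + 1) != 0
    return satisfiable == not_balanced

# A small CNF-style example: (x0 or x1) and (not x0 or x2).
phi = lambda x: int((x[0] or x[1]) and ((not x[0]) or x[2]))
print(check_reduction(phi, 3))  # True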
A variant of this claim shows that, like computing the
permanent, computing the number of zeros of a degree-3 polynomial over F_2 cannot be done in subexponential
time, assuming #ETH. This observation provides a link
between poly3-NSETH(a) and per-int-NSETH(b).
In comparison to poly3-NSETH(a), per-int-
NSETH(b) has advantages and disadvantages. There
is no analogous black-box argument we can make for
per-int-NSETH(b). On the other hand, there is no
known non-trivial algorithm that rules out the con-
jecture for any b < 1, making it possible that solving
per-int-NONZERO with Ryser’s formula is essentially
optimal. The possible optimality of Ryser’s formula is
also bolstered by work in [37], where it is uncondition-
ally proven that a monotone circuit requires n(2
n1
1)
multiplications to compute the permanent, essentially
matching the complexity of Ryser’s formula. This was
recently extended to show similar lower bounds on
monotone circuits that estimate output amplitudes of
quantum circuits [34]. Of course, per-int-NSETH(1 − δ)
for vanishing δ goes further and asserts that computa-
tion via Ryser’s formula is optimal even with the power
of non-determinism. Thus our conjecture formalizes the
statement that non-determinism cannot significantly
speed up computing whether the permanent is nonzero.
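To make the comparison with Ryser's formula concrete, the following short Python implementation (ours) evaluates the permanent of an n × n integer matrix by the inclusion-exclusion formula; with Gray-code updates it uses O(2^n · n) arithmetic operations, while this straightforward version uses O(2^n · n^2).

from itertools import combinations

def permanent_ryser(A):
    # Ryser's formula: per(A) = (-1)^n * sum over nonempty column subsets S
    # of (-1)^{|S|} * prod_i (sum_{j in S} A[i][j]).
    n = len(A)
    total = 0
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            prod = 1
            for row in A:
                prod *= sum(row[j] for j in cols)
            total += (-1) ** k * prod
    return (-1) ** n * total

# 2x2 check: per([[a, b], [c, d]]) = a*d + b*c.
print(permanent_ryser([[1, 2], [3, 4]]))  # 10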
4 Simulation algorithms with additive
error
4.1 Overview
The conclusions in the previous section state that any
classical algorithm running in time less than the stated
lower bound cannot sample from a distribution that has
constant multiplicative error with respect to the true
output distribution of the quantum circuit. However,
real-world near-term quantum devices also cannot sam-
ple from such a distribution since each gate they per-
form has imperfect fidelity. It is more accurate to model
the device distribution as having some constant additive
error with respect to the true circuit distribution, as in
the sense of Eq. (8). Thus we would like to also rule
out classical simulation algorithms with constant addi-
tive error. At first glance, this seems challenging since
the quantum advantage our results exploit rests on the
ability to construct a quantum state with an acceptance
probability that is exactly 0 in one case and not 0 in an-
other. Constant multiplicative error preserves whether
or not an output probability is 0, but constant additive
error does not.
With minor modifications, our multiplicative-error
analysis could be made robust to a situation where the
additive error on individual output probabilities is ex-
ponentially small. When the total additive error is a
constant (i.e. Eq. (8) with = O(1)), the typical amount
of error on a randomly chosen output probability will
indeed be exponentially small (since there are exponen-
tially many possible outputs); however, it is possible
that a constant amount of error could be concentrated
on the particular output in the distribution that dictates
when to accept, while most others have exponentially
small error. This issue, or something similar to it, arises
whether one uses NQP = coC_=P or PostBQP = PP in
their argument for quantum computational supremacy.
In previous work on QCS, this situation has been han-
dled by hiding the accepting output randomly among
the exponentially many possible outputs, which makes
it highly likely that the amount of error on that out-
put probability is typical. Then, it must be conjectured
that the hard classical problem solved by the quantum
circuit is still hard in the average case; i.e. it cannot
be solved quickly even when the algorithm is allowed to
fail on some small fraction of the instances. More con-
cretely, previous work conjectured that it is #P-hard to
approximate gap(f)^2 on some sufficiently large constant fraction of n-variable degree-3 polynomial instances f [17], or analogously, that it is #P-hard to approximate Per(A)^2 on some sufficiently large constant fraction of n × n matrix instances A [3].
We present a fine-grained version of this argument.
As in the multiplicative case, we avoid a direct fine-
graining of the exact logical path relying on PP =
PostBQP taken by [3, 17]. The central reason for this
choice is so that we can avoid the step in the analysis
where one classically estimates the acceptance probabil-
ity of a classical randomized algorithm using resources
that fall within the polynomial hierarchy (Stockmeyer’s
theorem [57]). In particular, this estimation can be
done efficiently with an oracle in the second level of
the PH, i.e. P^{Σ^P_2}, or with randomness and an NP oracle, i.e. BPP^{NP}. This is a costly step in the fine-grained
world since constructing this algorithm explicitly would
introduce significant overhead that would make our fine-
grained lower bounds, and hence qubit estimates, worse.
Additionally, the fine-grained conjecture we would need
to make would pertain to algorithms lying in the third
level of the exponential-time hierarchy (the natural gen-
eralization of the PH) where little is known about fine-
grained algorithms.
However, it is apparent that using the NQP = coC_=P
route will also be insufficient for this purpose. One
reason was previously mentioned: NQP is not robust
to additive error. But another important reason is
that the problem poly3-NONBALANCED is an average-
case easy problem simply because all but an exponen-
tially small fraction of degree-3 polynomials are not bal-
anced (under plausible assumptions about the distribu-
tion of gap(f) with respect to random f). Hence one
can construct a deterministic classical algorithm that
outputs YES for every input instance f and succeeds
on an overwhelmingly large fraction of instances.
Instead, we will use a third relationship between
quantum and classical counting complexity to underlie
our fine-grained, additive-error analysis: SBQP = A_0PP [40]. The class SBQP is the quantum analogue of SBP, which is, in a sense, an error-robust version of NP that formally lies between NP and MA [18], within the second level of the PH. Meanwhile A_0PP is a classical counting complexity class that is hard for the PH [40, 62], similar to PP and coC_=P. In the context of QCS arguments,
the class SBP was first used in [27, 28].
The specific counting problem we will solve with
quantum circuits, which we call poly3-SGAP, still pertains to the gap of degree-3 polynomials over F_2, but the question is different: now, given f, we ask if gap(f)^2 is less than 2^{n−2} or if it is at least 2^{n−1}, promised that
one or the other is the case. We are able to prove that
this problem is not a naively easy average-case problem
in the way poly3-NONBALANCED is, and we show how
the ability to produce samples from quantum circuits
with some small constant additive error would lead to
a classical algorithm with “SB power” (replacing non-
deterministic power in the analysis from previous sec-
tions) that solves the problem on a large fraction of
instances. Our average-case conjecture asserts that any
classical algorithm with SB power that succeeds on a
large fraction of instances must have at least some ex-
ponential runtime, and this yields a lower bound on
the sampling time for IQP and QAOA circuits. Be-
yond providing us with a fine-grained lower bound from
which we can estimate the number of qubits for QCS,
this analysis reveals a new perspective on additive-error
QCS arguments, one that avoids the need for Stockmeyer counting by replacing estimation of the quantity gap(f)^2 with a decision problem of similar flavor.
We do not perform a fine-grained analysis of additive-
error simulations for boson sampling circuits. However,
we believe it would be possible to perform such an anal-
ysis with some additional work. This would require
conjecturing the fine-grained average-case hardness of
deciding whether Per(A)^2 is at least a certain thresh-
old, or at most a smaller threshold, for random matri-
ces A. While there are no fundamental barriers to this
analysis, the details would be more complicated than
for IQP and QAOA, stemming from two factors. First,
for IQP we can draw from a discrete set, the set of
degree-3 polynomials. However, for boson sampling in
the additive-error setting, we must draw from a contin-
uous set, the set of Gaussian complex matrices. In a
fine-grained analysis, we would need to describe a spe-
cific implementation that discretizes this set. Second,
and more importantly, unlike the hiding mechanism for
IQP, the hiding mechanism for boson sampling is only
approximate, and it is conceptually more complicated
than for IQP. For boson sampling, given a matrix A, one
hides which amplitude they are interested in by drawing
a large unitary U randomly from the Haar distribution,
and post-selecting on choices for which the matrix A is
a submatrix of U. If U is sufficiently larger than A, and
the entries of A are i.i.d. Gaussian, then the position of
the submatrix A in U will be approximately hidden [3].
4.2 Quantum computational supremacy with the
complexity class SBP
We now provide the definition of the classical complexity class SBP. A language L is in SBP if there exist polynomials p and q, a constant c > 1, and a polynomial-time Turing machine M such that for all x ∈ {0, 1}*,

x ∈ L ⟹ |{y ∈ {0, 1}^{p(|x|)} : M(x, y) = 1}| ≥ 2^{q(|x|)}
x ∉ L ⟹ |{y ∈ {0, 1}^{p(|x|)} : M(x, y) = 1}| ≤ 2^{q(|x|)}/c.
Thinking of M as a randomized machine with random bits y, we see that the acceptance probability of M must be greater than some exponentially small threshold 2^{−(p(|x|)−q(|x|))} when the input x is in L, and the acceptance probability must be smaller than some threshold that is a factor of c smaller on all inputs x not in the language. One may view this as a more error-robust version of NP, which can also be worded in terms of thresholds: the machine must accept with at least some exponentially small probability (corresponding to a single computation path) when x ∈ L and at most some smaller probability (namely 0) when x ∉ L. This interpretation makes it clear that the class SBP contains NP.
It is also known that SBP is contained in the second
level of the PH [18]. Just as we might say a classical
algorithm has the power of non-determinism to mean
that it accepts its input if at least one of its compu-
tation paths accepts, we will say a classical algorithm
has “SB power” and mean that it accepts its input as
long as some exponentially small fraction of its compu-
tation paths accept, and rejects if fewer than 1/c times
as many computation paths accept. Of course, there is
a third possibility, that the number of accepting paths
lies in the window between the two thresholds, and in
this case the algorithm with SB power can act arbitrar-
ily.
The quantum complexity class SBQP is the quantum generalization, defined roughly as languages that can be solved by polynomial-time quantum computation with SB power. It was shown that SBQP = A_0PP [40] where A_0PP is a classical counting complexity class defined
[62] to contain languages L for which there are polynomials p and q, a constant c > 1, and a polynomial-time Turing machine M such that

x ∈ L ⟹ gap(M(x, ·)) ≥ 2^{q(|x|)}
x ∉ L ⟹ gap(M(x, ·)) ≤ 2^{q(|x|)}/c.
The similarity to the definition of SBP is apparent.
Formally speaking, we will actually be using promise versions of these complexity classes. For promise classes, languages are replaced by pairs of disjoint sets (L_YES, L_NO), where acceptance criteria must apply for inputs in L_YES and rejection criteria for inputs in L_NO, but there may be inputs that lie in neither set (and there are no requirements on how the Turing machine or quantum circuits act on these inputs).
It was shown that the class PromiseSBQP = PromiseA_0PP is hard for the PH, i.e. PH ⊆ P^{PromiseSBQP} [40]. This yields an alternate route to quantum computational supremacy: if one could classically simulate quantum circuits (with multiplicative error), then PromiseSBQP = PromiseSBP, which would imply the PH collapses to the third level. By exploiting NQP ⊆ SBQP this statement could be improved to yield a collapse to the second level [27, 28].
4.3 The problem poly3-SGAP
To perform a fine-grained analysis we pick a spe-
cific problem suited to an analysis using SBP, which
we call poly3-SGAP. This problem plays the role of
poly3-NONBALANCED from the previous sections.
The input to poly3-SGAP is a degree-3 polynomial f over F_2 with n variables and no constant term, and the output should be YES if (gap(f)/2^n)^2 ≥ 2^{−n−1}, and NO if (gap(f)/2^n)^2 ≤ 2^{−n−2}. This is a promise problem; we are promised that the input f does not satisfy 2^{−n−2} < (gap(f)/2^n)^2 < 2^{−n−1}. Refer to Figure 5 for a schematic of the division between YES and NO instances under the assumption that gap(f) is distributed as a Gaussian random variable. We let the set F^prom_n refer to all degree-3 polynomials with n variables and no constant term that satisfy the promise, and if f ∈ F^prom_n, we let S(f) = 0 if the answer to poly3-SGAP is NO and S(f) = 1 if it is YES; thus poly3-SGAP is simply the task of computing the (partial) function S.
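To make the definition concrete, the following Python sketch (with our own encoding of a degree-3 polynomial as a list of monomials, each a tuple of at most three variable indices) brute-forces gap(f) and evaluates S(f) when the promise holds.

from itertools import product

def gap_poly3(monomials, n):
    # gap(f) = sum_x (-1)^{f(x)}, where f(x) is the XOR of the monomials.
    g = 0
    for x in product((0, 1), repeat=n):
        fx = 0
        for mono in monomials:
            term = 1
            for i in mono:
                term &= x[i]
            fx ^= term
        g += 1 if fx == 0 else -1
    return g

def poly3_sgap(monomials, n):
    # Return 1 (YES), 0 (NO), or None if f violates the promise.
    p = (gap_poly3(monomials, n) / 2 ** n) ** 2
    if p >= 2 ** (-n - 1):
        return 1
    if p <= 2 ** (-n - 2):
        return 0
    return None

# f(x) = x0*x1*x2 + x1*x3 + x0 over F_2, with n = 4 variables.
print(poly3_sgap([(0, 1, 2), (1, 3), (0,)], 4))  # 0 (a NO instance)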
Since one can reduce any Boolean function to a degree-3 polynomial over F_2, poly3-SGAP is a prototypical PromiseSBQP problem. Indeed poly3-SGAP captures the hardness of PromiseSBQP since a poly3-SGAP oracle is sufficient to simulate a #P oracle, which can be seen by adapting the proof in [40] that P^{#P} = P^{PromiseSBQP}.
Figure 5: Distribution of 2^{−n} gap(f)^2 for randomly drawn f, under the assumption that gap(f) is distributed as a Gaussian with mean 0 and standard deviation 2^{n/2}. For polynomials f falling in the red (blue) area, occupying roughly 38% (48%) of the distribution, the answer to poly3-SGAP should be NO (YES). Polynomials falling in the gray area do not satisfy the promise.
By the correspondence between IQP circuits and degree-3 polynomials, it is also clear that poly3-SGAP can be solved by an IQP circuit equipped with SB power, using only n qubits and g_1(n) gates. Given the input f, one simply needs to run the circuit C_f, whose acceptance probability is exactly (gap(f)/2^n)^2; the acceptance probability is greater than 2^{−n−1} when the YES criteria are met, and it is less than 2^{−n−2} when the NO criteria are met.

Unlike non-determinism, SB power affords the brute-force algorithm a quadratic speedup; poly3-SGAP can be solved in poly(n)·2^{0.5n} classical SB time. How this is done is discussed later in Theorem 5.
4.4 Lower bounds for additive-error simulation
4.4.1 For IQP circuits
Deriving fine-grained lower bounds for the additive-
error case will follow the same structure as the
multiplicative-error case, with a few differences. We
begin by assuming we have a classical simulation algorithm that, for some fixed constant ε, produces samples up to ε additive error from any IQP circuit with q qubits and g_1(q) internal gates in s_4(q) time steps. We
will use this classical algorithm as a subroutine to solve
poly3-SGAP in the average case.
Given as input a random instance f from F^prom_n, we let f̄ refer to the degree-3 polynomial that has no degree-1 terms, but whose degree-2 and degree-3 terms are the same as f. Moreover let the set [f̄] contain all polynomials g for which f̄ = ḡ. Let the n-bit string δ_f ∈ {0, 1}^n be defined such that (δ_f)_i = 1 if and only if the degree-1 term x_i appears in the polynomial f(x).
Thus f(x) = f̄(x) + δ_f · x, where δ_f · x = Σ_i (δ_f)_i x_i. We construct the IQP circuit C_{f̄}, which has n qubits and g_1(n) gates. It was noted previously that

⟨0̄| U_{f̄} |0̄⟩ = gap(f̄)/2^n,   (24)

where U_f is the unitary implemented by the circuit C_f and |0̄⟩ is the all-|0⟩ starting state. This can easily be generalized to additionally show that

⟨δ_f| U_{f̄} |0̄⟩ = gap(f)/2^n,   (25)

that is, the probability of measuring outcome δ_f after running U_{f̄} is exactly proportional to gap(f)^2.
Thus we construct the following classical algorithm A: given f, we use our classical simulation algorithm to produce a sample from an output distribution that approximates that of C_{f̄} up to ε additive error. If outcome δ_f is obtained, A accepts, and otherwise A rejects. For any z ∈ {0, 1}^n, let P_{f̄}(z) be the probability that the classical simulation algorithm produces outcome z on the circuit C_{f̄} and let Q_{f̄}(z) be the probability associated with the true distribution. So the acceptance probability of the classical algorithm we constructed is exactly P_{f̄}(δ_f).
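In code form, algorithm A is only a few lines; the Python sketch below (ours) uses a hypothetical stand-in sample_from_simulator for the assumed classical simulation algorithm, since no concrete simulator is specified.

import random

def algorithm_A(f_monomials, n, sample_from_simulator):
    # delta_f: indicator of which degree-1 terms x_i appear in f.
    delta_f = tuple(int((i,) in f_monomials) for i in range(n))
    # f-bar: drop the degree-1 monomials, keep degree-2 and degree-3 terms.
    f_bar = [m for m in f_monomials if len(m) > 1]
    # One sample from the (approximate) output distribution of C_{f-bar};
    # accept exactly when the sampled string equals delta_f.
    return sample_from_simulator(f_bar, n) == delta_f

# Toy stand-in sampler (uniformly random outputs), just to show the interface.
toy_sampler = lambda monomials, n: tuple(random.randint(0, 1) for _ in range(n))
print(algorithm_A([(0, 1, 2), (1, 3), (0,)], 4, toy_sampler))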
We claim that the algorithm A, when considered as an
algorithm with so-called SB power, solves poly3-SGAP
on a large fraction (1 − 60ε) of instances. This is formally stated as follows.
Theorem 3. For f ∈ F^prom_n, let A(f) evaluate to 1 if S(f) = 1 and P_{f̄}(δ_f) ≥ 2^{−n−1} · 5/6, or if S(f) = 0 and P_{f̄}(δ_f) ≤ 2^{−n−1} · 2/3. Otherwise it evaluates to 0. Then the fraction of f ∈ F^prom_n for which A(f) = 1 is at least 1 − 60ε.
Proof. Extend the function A to the entire set of degree-3 polynomials with n variables by letting A(g) = 1 if g ∉ F^prom_n. Now suppose the statement were false. By Lemma 4, below, the fraction of degree-3 polynomials that lie in F^prom_n, i.e. that satisfy the promise, is at least 1/5; hence, the fraction of all degree-3 polynomials for which A(f) = 0 is at least 12ε. Thus, there must be some g where the fraction of g′ ∈ [ḡ] for which A(g′) = 0 is at least 12ε. For each such g′ with A(g′) = 0, it is either the case that S(g′) = 1, Q_{ḡ}(δ_{g′}) ≥ 2^{−n−1}, and P_{ḡ}(δ_{g′}) < 2^{−n−1} · 5/6, or that S(g′) = 0, Q_{ḡ}(δ_{g′}) ≤ 2^{−n−1} · 1/2, and P_{ḡ}(δ_{g′}) > 2^{−n−1} · 2/3. In particular, |Q_{ḡ}(δ_{g′}) − P_{ḡ}(δ_{g′})| ≥ 2^{−n}/12 for all such g′, and the number of g′ for which this is true is at least 2^n · 12ε. Thus Σ_z |P_{ḡ}(z) − Q_{ḡ}(z)| > ε, contradicting that the classical simulation algorithm has additive error at most ε.
Lemma 4. For all n, the fraction of degree-3 polynomials f with n variables for which f ∈ F^prom_n is at least 1/5.
Proof. In Ref. [17], it was shown that E((gap(f)/2^n)^4) ≤ 3 · 2^{−2n}, where the expectation is taken over all degree-3 polynomials with n variables. Moreover, E((gap(f)/2^n)^2) = 2^{−n}. The Paley-Zygmund inequality states that for a nonnegative random variable Z, Pr(Z > θE(Z)) ≥ (1 − θ)^2 E(Z)^2/E(Z^2). Taking Z = gap(f)^2/2^{2n} and θ = 1/2, we find that Pr((gap(f)/2^n)^2 ≥ 2^{−n−1}) ≥ 1/12. Additionally, in Lemma 9 in Appendix C, we extend the method of [17] to compute more moments and prove that Pr((gap(f)/2^n)^2 ≤ 2^{−n−2}) ≥ 0.12. Taken together, this shows that at least 1/5 of the instances satisfy the promise.
Now we make an average-case fine-grained conjecture
about SB algorithms that solve poly3-SGAP.
Conjecture 3 (poly3-ave-SBSETH(a′)). Suppose a classical probabilistic algorithm (in the Word RAM model of computation) runs in p(n) time steps and has the following property: there exists a function q(n) ≤ p(n) and a constant c such that for at least a 11/12 fraction of instances f from F^prom_n, either gap(f)^2 ≥ 2^{n−1} and the probability the algorithm accepts f is at least 2^{−q(n)}, or gap(f)^2 ≤ 2^{n−2} and the probability the algorithm accepts f is at most 2^{−q(n)}/c. Then, the algorithm must satisfy p(n) ≥ 2^{a′n−1}. That is, there is no classical algorithm with SB power that solves poly3-SGAP and runs in fewer than 2^{a′n−1} timesteps.
As long as ε < 1/720, the algorithm A is exactly such an algorithm, with runtime s_4(n). Thus, assuming poly3-ave-SBSETH(a′), we obtain the lower bound

s_4(n) ≥ 2^{a′n−1}.   (26)
The choice 11/12 ≈ 0.92 in Conjecture 3 is motivated by a rigorous bound on the maximum success probability of a naive algorithm that always outputs the same answer on the poly3-SGAP problem. We discuss this further in Section 4.5 and prove this bound in Appendix C. If we assume that the gap of random degree-3 polynomials is distributed as a Gaussian, as depicted in Figure 5, this bound would be improved to roughly 0.56. Moreover, the same assumption would allow the result in Lemma 4 to be improved from 1/5 to 0.86. Assuming the Gaussian gap distribution and replacing 11/12 with 0.56 in Conjecture 3, the error tolerance of our additive-error approximation algorithm could be improved from ε < 1/720 to ε < 1/32, which would be significantly more attainable experimentally.
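A back-of-envelope check of these tolerances, under our reading that the factor 60 in Theorem 3 arises as 12 divided by the promise fraction from Lemma 4 (1/5 rigorously, 0.86 under the Gaussian assumption):

# Require (failure fraction of A) = (12 / promise_fraction) * eps
# to be at most 1 - (success fraction demanded in Conjecture 3).
for promise_fraction, success_needed in [(1 / 5, 11 / 12), (0.86, 0.56)]:
    eps_max = (1 - success_needed) * promise_fraction / 12
    print(eps_max, 1 / eps_max)
# First line: eps_max = 1/720; second line: eps_max is roughly 1/32.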
4.4.2 For QAOA circuits
We previously demonstrated how one turns the IQP cir-
cuit C_f into a QAOA circuit by introducing n ancilla
qubits. Using this construction as the only modification to the previous analysis yields the following lower bound:

s_5(n) ≥ 2^{a′n/2 − 1},   (27)

where s_5(n) is the number of time steps for a hypothetical algorithm that simulates any QAOA circuit with n qubits and g_2(n) gates, up to additive error at most ε < 1/720.
4.5 Evidence for average-case conjecture
Our fine-grained lower bounds derived in the previous subsection rest on poly3-ave-SBSETH(a′), a conjecture that asserts the nonexistence of fine-grained exponential-time classical algorithms with SB power that solve a hard problem in the average case. There are two main differences between poly3-ave-SBSETH(a′) and the previously discussed poly3-NSETH(a). First, poly3-ave-SBSETH(a′) pertains to SB classical algorithms instead of non-deterministic classical algorithms. SB power is at least as strong as non-determinism: SBP contains NP. Yet, SBP still lies only in the second level of the PH, somewhere between NP and MA. This aspect makes poly3-ave-SBSETH(a′) less plausible than poly3-NSETH(a) when a′ = a; as we will see later, the move from non-determinism to SB yields a quadratic speedup of the black-box query complexity, but the similarity between the two models makes additional speedups based on this aspect unlikely. The second and more substantial difference is that poly3-ave-SBSETH(a′) is about an average-case problem, while poly3-NSETH(a) is about a worst-case problem. Indeed, while the problem poly3-SGAP is known to be hard for the PH in the worst case, its complexity is unknown in the average case, much less its fine-grained complexity.
Note that in [8] and [29], average-case fine-grained
complexity was studied in the context of polynomial-
time algorithms; they defined various average-case
problems in P and showed that standard worst-case
fine-grained conjectures like SETH imply lower bounds
for these problems that essentially match known upper
bounds. Ideally, we would be able to show something
similar for our problem, but we are unable to do so due
to the fact that our problem pertains to SB algorithms
for #P-hard problems.
Nonetheless, in the remainder of this section, we
strive to put the average-case aspect of poly3-ave-SBSETH(a′) in context and provide as much evidence as possible in its favor.
A necessary condition for poly3-SGAP to be average-case hard is that naive deterministic algorithms that produce the same output on every input cannot solve it on more than a constant fraction of inputs. The maximum success probability of such an algorithm is the fraction of instances for which the output should be YES, or the fraction for which the output should be NO, whichever is larger. In the limit as n becomes large, this is given mathematically by

p_0 = liminf_{n→∞} [ max_{j=0,1} Pr_{f←F^prom_n}(S(f) = j) ].   (28)

We would say the problem is naively average-case easy if p_0 = 1. If, for large n, gap(f)/2^n were distributed as a Gaussian with mean 0 and variance 2^{−n}, as illustrated in Figure 5, then p_0 = 0.48/(0.38 + 0.48) ≈ 0.56. In Appendix C, we show that, in the limit of large n, the moments E((gap(f)/2^n)^{2k}) approach E(x^{2k}) when x is distributed as a Gaussian with mean 0 and variance 2^{−n}. Although we do not formally prove that this implies the distribution itself approaches a Gaussian distribution, we take it as evidence in favor of that hypothesis. Moreover, as we also show in Appendix C, our calculation of the moments formally implies that p_0 ≤ 11/12, indicating that poly3-SGAP is not naively average-case easy. This contrasts with the problem poly3-NONBALANCED, for which we expect it to be the case that p_0 = 1; the fraction of instances for which gap(f) = 0 vanishes with increasing n.
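Assuming this Gaussian model, the 38%/48% areas in Figure 5 and the value p_0 ≈ 0.56 can be recomputed directly (writing gap(f)/2^n = 2^{−n/2}·Z for a standard normal Z, the NO region is Z^2 ≤ 1/4 and the YES region is Z^2 ≥ 1/2):

from math import erf, sqrt

p_no = erf(0.5 / sqrt(2))                 # Pr(|Z| <= 1/2), about 0.38
p_yes = 1 - erf((1 / sqrt(2)) / sqrt(2))  # Pr(|Z| >= 1/sqrt(2)), about 0.48
p0 = p_yes / (p_no + p_yes)               # about 0.56
print(p_no, p_yes, p0)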
It remains possible that a more sophisticated algorithm could solve poly3-SGAP efficiently. We now provide arguments showing that certain approaches to finding such an algorithm are doomed to fail.
We begin by extending the black-box argument from
Section 3.4 so that it rules out SB algorithms solving the
SGAP problem in the average case for random general
Boolean functions on n variables:
Theorem 5 (informal). A black-box algorithm equipped with SB power that solves the SGAP problem on a 0.56 + ε fraction of instances for some ε > 0 must make at least Ω(2^{n/2}) queries, and moreover there is a matching upper bound.
Theorem 5. Let H_n contain all n-bit Boolean functions h for which either (gap(h)/2^n)^2 ≥ 2^{−n−1} or (gap(h)/2^n)^2 ≤ 2^{−n−2}. Suppose a randomized algorithm A with black-box access to h ∈ H_n has the following properties: (1) A makes L queries to h, and (2) there exists a function t(n) and constants n_0, c > 1, and ε > 0, such that whenever n ≥ n_0 and for a fraction at least 0.56 + ε of h ∈ H_n, A(h) accepts with probability at least t(n) if (gap(h)/2^n)^2 ≥ 2^{−n−1}, and accepts with probability at most t(n)/c if (gap(h)/2^n)^2 ≤ 2^{−n−2}. Then L ≥ Ω(2^{n/2}). Moreover, there exists such an algorithm with L = O(2^{n/2}).
Proof. The main idea behind the proof is similar to that
of Theorem 1: we take YES instances on which the algo-
rithm succeeds, minimally modify the instance so that
it is a NO instance, and argue that the algorithm must
behave the same way (unless it makes many queries).
Thus it cannot succeed on a large fraction of both the
YES and the NO instances. The implementation here
is more complicated than in Theorem 1 because the
algorithm we are dealing with is an SB algorithm and
additionally because we must worry about the fraction
of instances on which the algorithm can succeed, and
not just whether or not it succeeds on all instances.
The distribution of gap(h)/2^n for h a uniformly random Boolean function is a shifted and scaled binomial distribution with mean 0 and variance 2^{−n}, since each of the 2^n entries in the truth table is 0 or 1 with equal probability. As n → ∞, it approaches a Gaussian distribution with the same mean and variance. Let N ⊆ H_n contain instances h for which (gap(h)/2^n)^2 ≤ 2^{−n−2}, and let Y ⊆ H_n contain instances h for which 2^{−n−1} ≤ (gap(h)/2^n)^2 ≤ γ2^{−n}, where γ ≈ 2.76 is the constant for which erf(√2/4) + erf(1/2) = erf(√(2γ)/2), defined as such so that |N|/|Y| → 1 as n → ∞.

There exists an algorithm that simply always outputs YES and succeeds on a fraction 0.56 (as n → ∞) of instances in H_n (the instances in H_n \ N) without making any queries. If A succeeds on a fraction 0.56 + ε of instances in H_n, then it must succeed on at least a fraction 0.5 + ε of instances in N ∪ Y, and we will show that this implies it makes at least Ω(2^{n/2}) queries.
Fix an arbitrary choice of t(n), c, and ε. Consider an instance h ∈ Y and an even integer x such that (x/2^n)^2 ≤ 2^{−n−2}. Let N_{h,x} include all instances g ∈ N for which gap(g) = x and g and h differ on exactly |gap(g) − gap(h)|/2 ≤ O(2^{n/2}) of the 2^n input strings. That is, g can be formed from h solely by switching entries in the truth table from 0 to 1 when gap(h) > 0 or from 1 to 0 when gap(h) < 0. Consider the following process: first draw an instance h uniformly at random from Y and choose an integer x according to the binomial distribution with mean 0 and variance 2^n (rejecting and repeating until (x/2^n)^2 ≤ 2^{−n−2}), then with 1/2 probability output h and with 1/2 probability output a uniformly random instance from N_{h,x}. The output of this procedure is an element from N ∪ Y and as n → ∞, the distribution of outputs becomes arbitrarily close in ℓ_1 distance to the uniform distribution.
Now suppose the algorithm A succeeds on input h ∈ Y, i.e. its acceptance probability is at least t(n). Viewing A as a deterministic machine taking as input h as well as a uniformly random string r ∈ R, we have

|{r ∈ R : A(h, r) = 1}| ≥ t(n)|R|.   (29)
Consider a fixed string r for which A(h, r) = 1. The algorithm queries a fixed subset of size L among the 2^n input strings. If (for any fixed choice of x) we choose a function g ∈ N_{h,x} uniformly at random, then the set of input strings z for which g(z) ≠ h(z) has cardinality O(2^{n/2}) and is equally likely to be any subset of that size containing only strings for which h(z) = 0 (resp. h(z) = 1) when gap(h) > 0 (resp. gap(h) < 0). Thus, by the union bound, the probability over the choice of g ∈ N_{h,x} that A(h, r) queries some string z for which h(z) ≠ g(z) is at most O(L2^{−n/2}). If all L queried strings z have g(z) = h(z), then A proceeds the same on inputs (g, r) as it does on inputs (h, r) and outputs 1. This will be true for every r for which A(h, r) = 1 and for every x. Thus we can say

E_{g∈N_{h,x}}(|{r ∈ R : A(h, r) = A(g, r) = 1}|) ≥ (1 − O(L2^{−n/2})) |{r ∈ R : A(h, r) = 1}|,   (30)
which implies

Pr_{g∈N_{h,x}}[ |{r ∈ R : A(h, r) = A(g, r) = 1}| / |R| ≤ t(n)/c ]
    ≤ Pr_{g∈N_{h,x}}[ |{r ∈ R : A(h, r) = A(g, r) = 1}| / |{r ∈ R : A(h, r) = 1}| ≤ 1/c ]
    ≤ O( L2^{−n/2} / (1 − 1/c) ),   (31)

where the last line follows from a rearrangement of Eq. (30) and Markov's inequality. If L = o(2^{n/2}), then this quantity approaches 0 as n → ∞. Now consider again the procedure of choosing a random instance h from Y, a random x, and a random g from N_{h,x}, then with equal chance running A on h or running A on g. The bound above shows that this procedure can succeed with probability at most 1/2 + o(1). For any ε, we may choose n large enough so that the success fraction is smaller than 1/2 + ε/2, and so that the procedure draws from a distribution that is ε/2-close to uniform over N ∪ Y, completing the proof that there is no algorithm satisfying the conditions in the statement with L = o(2^{n/2}).
To show the upper bound, that L = O(2^{n/2}) is possible, we give a simple algorithm A which succeeds for every instance in H_n for any n. On an input h, the algorithm randomly selects L = 10 · 2^{n/2} input strings {z_i} and queries h(z_i) for each z_i. If h(z_i) = h(z_j) for every i, j, then the algorithm accepts. Otherwise, it rejects. If (gap(h)/2^n)^2 ≥ 2^{−n−1}, then the probability of acceptance is at least

(1/2 + |gap(h)|/2^{n+1})^L ≥ 2^{−L}(1 + 2^{−(n+1)/2})^L ≥ 2^{−L} exp(0.9 · L · 2^{−(n+1)/2}) = 2^{−10·2^{n/2}} exp(9/√2),   (32)
where in the second-to-last line we have used the bound log(1 + u) ≥ 0.9u when u is sufficiently small. Meanwhile, if (gap(h)/2^n)^2 ≤ 2^{−n−2}, then the probability of acceptance is at most

2(1/2 + |gap(h)|/2^{n+1})^L ≤ 2 · 2^{−L}(1 + 2^{−n/2−1})^L ≤ 2 · 2^{−10·2^{n/2}} exp(L · 2^{−n/2−1}) = 2^{−10·2^{n/2}} · 2 exp(5).   (33)

Thus, we may choose t(n) = 2^{−10·2^{n/2}} exp(9/√2) and c = 1.5, and then for every h, if (gap(h)/2^n)^2 is above the threshold the acceptance probability is above t(n) and if it is below the threshold the acceptance probability is below t(n)/c. This completes the proof.
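A minimal Python sketch of the query-efficient algorithm used in this upper bound (the oracle h is supplied as a function; the constant 10 follows the proof):

import math
import random

def sb_sgap_query_algorithm(h_oracle, n):
    # Query L = 10 * 2^(n/2) uniformly random inputs and accept iff all
    # queried values of h agree.
    L = math.ceil(10 * 2 ** (n / 2))
    values = {h_oracle(tuple(random.randint(0, 1) for _ in range(n)))
              for _ in range(L)}
    return len(values) == 1

# Toy oracle: the constant-0 function has |gap| = 2^n, far above the YES
# threshold, so this run always accepts.
print(sb_sgap_query_algorithm(lambda x: 0, 8))  # True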
The fact that the bound is tight formally rules out poly3-ave-SBSETH(a′) for values of a′ > 0.5. However, it indicates that refuting poly3-ave-SBSETH(a′) for values of a′ < 0.5 would require an algorithm that goes beyond simply evaluating the function f(z) for various strings z; it would have to utilize the fact that f is a random degree-3 polynomial over F_2, and not a general Boolean function.
Note that the average-case aspect of the problem has no effect on the black-box query complexity in this case. The complexity is quadratically weaker (Θ(2^{n/2})) than what we found for the NONBALANCED problem (Θ(2^n)) in the worst case in Theorem 1, but this difference is a result of moving from non-deterministic algorithms to SB algorithms. It is not a result of moving from the worst case to the average case. Indeed, the algorithm we give to show the upper bound O(2^{n/2}) on the query complexity in Theorem 5 solves the SGAP problem not only in the average case, but also in the worst case, demonstrating that there is no asymptotic difference between the average-case and worst-case black-box query complexity for SB algorithms solving the SGAP problem. We take this as evidence that the average-case aspect of poly3-ave-SBSETH(a′) would not play a crucial role in its veracity.
We bolster this intuition by giving a worst-case-to-quasi-average-case reduction for counting zeros of degree-3 polynomials over F_2. The “quasi”-average-case problem we reduce to is the following: for every degree-3 polynomial f, with high probability count the number of zeros of g, where g is a random degree-3 polynomial whose degree-2 and degree-3 parts agree with f. This problem is harder than the fully average-case version where the degree-2 and degree-3 parts are also randomized.
Theorem 6. For any degree-3 polynomial f with n variables and no constant term, let [f̄] be the set of all degree-3 polynomials whose degree-2 and degree-3 parts agree with f. Suppose we have an algorithm A with runtime T(n) that, for every f, computes gap(g) correctly on at least a fraction 1 − 1/(3n) of instances g ∈ [f̄]. Then there is a randomized algorithm A′ with runtime nT(n) + poly(n) that computes gap(f) correctly for every f.

Corollary 6.1. It is #P-hard to, for every f, compute gap(g) on a fraction 1 − 1/(3n) of instances g ∈ [f̄].
Proof of Theorem 6. Let P be the quasi-average-case problem described in the statement. We will show that an oracle to P can be used to efficiently compute gap(f) in the worst case. Given a degree-3 polynomial f with n variables, choose a polynomial g ∈ [f̄] uniformly at random and use the oracle to P to compute gap(g). Write the degree-1 polynomial f(x) + g(x) as Σ_j u_j x_j for some vector u ∈ F_2^n. If u is the 0 vector, then f = g and we may simply output gap(g). Otherwise, there is an index j for which u_j = 1. Without loss of generality assume j = n. Then we can perform the bijective linear change of variables x′_n = Σ_j u_j x_j and x′_k = x_k for all k ≠ n. This yields new degree-3 polynomials f′(x′) and g′(x′) such that gap(f) = gap(f′), gap(g) = gap(g′) (since the transformation is bijective), and g′(x′) = f′(x′) + x′_n. We can express

gap(g) = gap(g′) = gap(f′|x′_n = 0) − gap(f′|x′_n = 1)
gap(f) = gap(f′) = gap(f′|x′_n = 0) + gap(f′|x′_n = 1),

where f′|x′_n denotes the degree-3 polynomial on n − 1 variables formed by taking f′ and fixing the value of x′_n. Thus,

gap(f) = gap(g) + 2 gap(f′|x′_n = 1).   (34)

We recursively compute gap(f′|x′_n = 1), unless it has just one variable, in which case we compute it by brute force, and we add twice the result to gap(g) to produce our output gap(f).
Each level of recursion requires poly(n) steps to per-
form the linear transformation as well as one oracle call,
which takes at most T (n) time steps. There are n recur-
sion levels, meaning the total runtime of the algorithm
is only nT (n) + poly(n).
The algorithm will succeed as long as it computes gap(g) correctly at each of the n levels of recursion. Since each of these occurs with probability at least 1 − 1/(3n), by the union bound the procedure as a whole succeeds with at least 2/3 probability. Note that this success probability is over the randomness of the algorithm itself (it could be boosted by repetition) and not over the randomness in the instance.
This worst-case-to-quasi-average-case reduction is subideal for several reasons. First, it is only quasi-average-case and not fully average-case. However, randomizing the instances only over [f̄] and not over the
entire set of degree-3 polynomials actually fits fairly well with the framework of our analysis, since for a fixed IQP circuit C_f the probability of sampling each of the 2^n outputs corresponds with (gap(g)/2^n)^2 for a different g ∈ [f̄]. A larger issue is that the worst-case-to-quasi-average-case reduction only works for computing gap(f) and not for computing gap(f)^2: if one knows gap(g)^2 for a random g ∈ [f̄], it is not clear how to determine whether gap(g) = ±√(gap(g)^2), leading to an exponentially large tree of possible values for gap(f) after n levels of recursion. Finally, like worst-to-average-case reductions for computing the permanent [3, 41] or the output probability of a random quantum circuit [15, 48, 49], our reduction requires the computation of gap(g) at each step to be exact, as any errors would be uncontrollably amplified by the recursion process. However, we note that the idea behind our reduction of exploiting gap-preserving linear transformations of degree-3 polynomials over F_2 is quite distinct from the strategy of polynomial interpolation that underlies these previous worst-case-to-average-case reductions.
Despite these shortcomings, the reduction rules out certain approaches to refuting our conjecture poly3-ave-SBSETH(a′). In particular, an algorithm that refutes it by exactly computing gap(f), like the algorithm from LPTWY [42], would need to succeed in both the average case and the worst case, or otherwise succeed in the average case but not the quasi-average case. Taken together, the black-box bound and the worst-case-to-quasi-average-case reduction bolster the plausibility of poly3-ave-SBSETH(a′), at least relative to poly3-NSETH(a), because they present roadblocks on ways to utilize the average-case nature of the former conjecture to refute it without also refuting the latter.
For our qubit calculations, we take a′ = 0.5 since any larger number is formally refuted by the black-box result and any smaller number would yield a non-trivial classical algorithm.
5 Number of qubits to achieve quantum
computational supremacy
5.1 Our estimate
We can use the lower bounds on the runtime of a hypo-
thetical classical simulation algorithm for IQP, QAOA,
and boson sampling circuits in Eqs. (17), (21), (23),
(26), and (27) to estimate the minimum number of
qubits required for classical simulation of these circuit
models to be intractable.
The fastest supercomputers today can perform between 10^17 and 10^18 FLOPS (floating-point operations per second).² Using our lower bounds, we can determine the number of qubits/photons q such that the lower bound on s_i(q) is equal to 10^18 · 60 · 60 · 24 · 365 · 100, the maximum number of floating-point operations today's supercomputers can perform in one century, for i = 1, 2, 3, 4, 5. Using our multiplicative-error lower bounds, we calculate that for IQP circuits it is 93/a qubits (from Eq. (17)), for QAOA circuits it is 185/a qubits (from Eq. (21)), and for boson sampling circuits it is 93/b photons (from Eq. (23)). The additive-error bounds replace a by a′ (from Eqs. (26) and (27)). We take a = a′ = 1/2 and b = 0.999, and these estimates become 185 qubits for IQP circuits, 370 qubits for QAOA circuits, and 93 photons for boson sampling circuits. For these values of a, a′, b, the number of circuit elements needed for the lower bound to apply is g_1(185) = 1,060,000 gates for IQP circuits, g_2(370) = 2,110,000 constraints for QAOA circuits, and g_3(93) = 17,400 beam splitters and phase shifters for boson sampling circuits.
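These figures can be reproduced with a few lines of arithmetic, using the 2^{aq−1} form of the lower bounds (and 2^{aq/2−1} for QAOA):

from math import ceil, log2

ops_per_century = 1e18 * 60 * 60 * 24 * 365 * 100  # about 3.15e27 operations

def qubits_needed(exponent_per_qubit):
    # Smallest q with 2^(exponent_per_qubit * q - 1) >= ops_per_century.
    return ceil((log2(ops_per_century) + 1) / exponent_per_qubit)

print(qubits_needed(0.5))      # IQP, a = 1/2        -> 185
print(qubits_needed(0.25))     # QAOA, a'/2 = 1/4    -> 370
print(qubits_needed(0.999))    # boson sampling, b   -> 93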
Thus, assuming one operation in the Word RAM
model of computation corresponds to one floating-point
operation on a supercomputer, and assuming our con-
jectures poly3-NSETH(1/2), per-int-NSETH(0.999),
and poly3-ave-SBSETH(1/2), we conclude that classi-
cally simulating circuits of the sizes quoted above (up to
either multiplicative or sufficiently small constant addi-
tive error) would take at least a century on modern clas-
sical technology, a timespan we take to be sufficiently
intractable.
If, additionally, we assume that the runtime of the
classical simulation algorithm grows linearly with the
number of circuit elements (e.g., the naive “Schrödinger-
style” simulation algorithm that updates the state vec-
tor after each gate), then we can make a similar state-
ment for circuits with many fewer gates. The cost of
this reduction in gates is only a few additional qubits,
due to the exponential scaling of the lower bound. We
can estimate the number of qubits required by finding q
such that s_i(q)/g_i(q) = 10^18 · 60 · 60 · 24 · 365/5, the maximum number of supercomputer operations in 1/500 of a century, for i = 1, 2, 3, 4, 5. We conclude that an IQP
circuit with 208 qubits and 500 gates, a QAOA cir-
cuit with 420 qubits and 500 constraints, and a boson
sampling circuit with 98 photons and 500 linear optical
elements each would require at least one century (one year per 5 circuit elements) to be simulated using a
classical simulation algorithm of this type running on
state-of-the-art supercomputers.
Here we remark again that to make these estimates,
we have diverged from conventional complexity theory
² A list of the fastest supercomputers is maintained at https://www.top500.org/statistics/list/.
to make conjectures that fix even the constant prefac-
tors on the lower bounds for algorithmic runtimes, and
additionally, we must assert that these runtime lower
bounds hold not only asymptotically but also in the
90 < n < 500 range where our estimates fall. However,
these estimates are robust in the sense that modifica-
tions to the constant prefactors have only minor effect
on the final estimate.
We also note that the relative factor of two in the es-
timate for QAOA circuits is a direct consequence of the
fact that one ancilla qubit was introduced per variable
in order to implement the H gates at the end of the IQP
circuit C_f within the QAOA framework. This illustrates
how our estimate relies on finding a natural problem
for these restricted models of quantum circuits and an
efficient way to solve that problem within the model.
Indeed, an earlier iteration of this estimate based on
the satisfiability problem instead of the degree-3 poly-
nomial problem or matrix permanent required many an-
cilla qubits and led to a qubit estimate above 10,000.
5.2 Relationship to Google’s quantum computa-
tional supremacy experiment
Since the appearance of the first version of this paper, a
collaboration between Google and others reported that
it has achieved quantum computational supremacy with
53 superconducting qubits on a programmable quantum
device [7]. Their interpretation and approach to quan-
tum computational supremacy differs in certain respects
from the perspective presented in our work. For one, the
task they have completed on their quantum computer,
Random Circuit Sampling (RCS) of quantum circuits
with local gates on 2D lattices, is not one of the propos-
als we have considered for our analysis. RCS does not
fit nicely into our analysis since, while it is hard to clas-
sically perform RCS noiselessly (assuming non-collapse
of PH), RCS does not appear to correspond naturally
with a specific purely classical counting problem that
has garnered independent study in classical computer
science.
Additionally, their quantum device is noisy and pro-
duces samples from a distribution that has neither small
multiplicative nor small additive error with the ideal
distribution. Rather, they argue that their device ex-
periences global depolarizing noise and samples from
a distribution that has small fidelity F > 0 with the
ideal distribution, i.e. the device distribution (approx-
imately) satisfies P(x) = F·Q(x) + (1 − F)/2^n, where
Q is the ideal distribution. When F = 1 (noiseless),
this task cannot be performed classically in polynomial
time unless the PH collapses, and when F = 0, P is the
uniform distribution and sampling from it is classically
easy. They argue that the task is hard when F is a small
constant (in their experiment on the order of 10^{−3}) by
invoking what is essentially a fine-grained reduction in
the same spirit as what we present here (see Supplemen-
tary Material of [7]). They show how a time T classical
simulation drawing samples from P (x) implies a time
O(F T ) classical Arthur-Merlin (AM) protocol that de-
cides whether Q(x) is greater than one threshold or less
than a smaller threshold (assuming one is the case),
which is essentially the RCS analogue of poly3-SGAP.
As such, we will call this problem RCS-SGAP. Indeed, in
light of our analysis, perhaps they could have opted to
reduce to an SB algorithm, as we have, instead of an
AM algorithm with a similar factor O(F ) overhead.
Thus, if they wished to derive a rigorous lower bound
similar to the ones in our paper, they would need
to impose a fine-grained conjecture asserting the non-
existence of classical exponential-time AM protocols for
RCS-SGAP. While it is certainly possible that such a con-
jecture could be true, the fact that it would not be about
a classical computational problem would make it harder
to begin to give evidence for it. The conjecture would
essentially be an assertion that quantum circuits do not
admit classical simulations that are substantially faster
than brute-force, which would ideally be the conclusion
of a QCS argument based off some unrelated conjecture
instead of the conjecture itself.
However, it is apparent from their work that their
interpretation of QCS is slightly different from ours.
While we hope to rule out all possibility of compet-
itive classical simulation algorithms, both known and
unknown, they meticulously compare and benchmark
their quantum device against best known classical algo-
rithms for performing RCS, while putting less emphasis
on arguing that yet-to-be-developed classical algorithms
for RCS will not be substantially faster than ones that
are currently known. On the one hand, this makes
sense since ultimately QCS is about the practical as-
cendancy of quantum computers over classical ones, and
unknown classical algorithms are certainly not practi-
cal. On the other hand, one of the founding rationales
[3] for sampling-based QCS was that it could be based
on classical conjectures like the non-collapse of the PH
whose refutation would bring cascading consequences to
many other problems in classical complexity theory, as
opposed to conjectures that merely assert that problems
that are easy for quantum computers but not known
to be easy for classical ones, such as factoring, in fact
do not admit efficient classical algorithms. Under their
more practically oriented interpretation, it is less im-
portant to base conjectures off of classical problems if
one carefully studies best known existing algorithms for
the quantum problem.
In the end, they are able to declare QCS with only
53 qubits, a substantially smaller number than our es-
timates. This difference comes from our conservative
approach to supercomputer performance, but more im-
portantly from our aforementioned attempt to rule out
all hypothetical simulation algorithms, known and un-
known, and not just assume that the best known al-
gorithms are optimal. Their effort to study the perfor-
mance of best known classical simulation algorithms for
RCS on one of the best supercomputers in the world re-
veals how supercomputer memory limitations are a key
consideration that we have not introduced into our anal-
ysis. However, it is difficult for us to make a similarly
detailed assessment of a supercomputer’s performance
when the classical algorithms we consider are hypothet-
ical and the only thing we know about them is their
runtime, forcing us to be conservative. Additionally,
by insisting that we make conjectures about classical
problems and not about the hardness of simulating the
circuits themselves, we end up with simulation lower
bounds that are not tight with best known simulation
algorithms, except in the case of boson sampling cir-
cuits. Our conjectures lead to qubit estimates that are
larger by roughly a factor of 2 for IQP circuits and a fac-
tor of 4 for QAOA circuits, compared to if we had simply
conjectured that best known simulation algorithms for
those circuit families were optimal.
Given our conservative approach and our more de-
manding requirements for QCS, it is in our view an
encouraging sign that our qubit estimates are within a
small constant factor of what the Google experiment
was able to achieve.
6 Conclusion
Previous quantum computational supremacy arguments
proved that polynomial-time simulation algorithms for
certain kinds of quantum circuits would imply unex-
pected algorithms for classical counting problems within
the polynomial-time hierarchy. We have taken this fur-
ther by showing that even somewhat mild improvements
over exponential-time best known simulation algorithms
would imply non-trivial and unexpected algorithms for
specific counting problems in certain cases. Thus, by
conjecturing that these non-trivial classical counting al-
gorithms cannot exist, we obtain lower bounds on the
runtime of the simulation algorithms. In the case of
boson sampling circuits, these lower bounds are essen-
tially asymptotically tight when the strongest form of
our conjecture is imposed.
Our conjectures for multiplicative-error simulation,
poly3-NSETH(a) and per-int-NSETH(b), are fine-
grained manifestations of the assumption that the PH
does not collapse. Meanwhile, our conjecture for
additive-error simulation, poly3-ave-SBSETH(a′), is a
fine-grained manifestation of the non-collapse assump-
tion plus the statement that a certain counting problem
is hard for the PH even on average. While unproven,
the non-collapse conjecture is extremely plausible; its
refutation would entail many unexpected ramifications
in complexity theory. This contrasts with the assump-
tion that factoring has no efficient classical algorithm,
which would also entail hardness of simulation but is
less plausible because the consequences of its refuta-
tion on our current understanding of complexity theory
would be minimal. Of course, the fine-grained nature
of our conjectures makes them less plausible than the
non-collapse of the PH, but they are in line with current
knowledge and beliefs in fine-grained complexity theory
when a, a′ ≤ 1/2 and b < 1.
It is worth comparing our approach with using a fine-grained version of the conjecture that PP ⊄ Σ_3^P, which is the complexity theoretic conjecture proposed in Aaronson-Arkhipov [3]. An immediate advantage of our approach is that we avoid invoking Stockmeyer's theorem [57] to estimate the acceptance probability of a hypothetical classical simulation algorithm for quantum circuits in Σ_3^P; the fine-grained cost of this step would be significant, ultimately increasing our qubit estimates by roughly a factor of 3 [22]. Additionally, to understand the range of plausible fine-grained conjectures relating to Σ_3 algorithms, we might start with oracle bounds, analogous to our Theorem 1. Known oracle lower bounds for the majority function show only that Σ_3 circuits that compute the majority of an oracle function (the oracle analogue of PP) need size Ω(2^{n/5}). This would correspond to taking a or b equal to 1/5, which would increase our qubit estimates by a factor of 2.5 or 5 respectively. The proof is also more complex, involving the switching lemma [22]. Thus our approach based on coC_=P ⊄ NP instead of PP ⊄ Σ_3^P yields both a much simpler proof and a tighter bound.
The main motivation for imposing these fine-grained
conjectures was to make an estimate of how large quan-
tum circuits must be to rule out practical classical simu-
lation on state-of-the-art classical computers. We chose
the IQP, QAOA, and boson sampling models for our
analysis because they are prominent QCS proposals
and because their close connection with specific hard
counting problems (i.e. computing the gap of degree-
3 polynomials over F_2 and computing the permanent)
made them amenable to perform a fine-grained analysis
with little unnecessary overhead. Our estimate relies on
poly3-NSETH(1/2), per-int-NSETH(0.999), and poly3-
ave-SBSETH(1/2), but it is somewhat robust to failure
of these conjectures in the sense that if they fail in fa-
vor of mildly weaker versions, our estimate will increase
only slightly. For example, replacing these conjectures
with the slightly weaker poly3-NSETH(1/2d), per-int-
NSETH(1/d), and poly3-ave-SBSETH(1/2d) increases
the qubit estimate by only a factor of d, and replacing
2^{cn−1} time steps with 2^{cn−1}/d time steps in either conjecture (i.e. c ∈ {a, b}) increases the estimate by only log_2(d) qubits.
Our qubit estimates of fewer than 200 qubits for
IQP circuits, fewer than 400 qubits for QAOA circuits,
and fewer than 100 photons for boson sampling circuits
are beyond current experimental capabilities but poten-
tially within reach in the near future. Additionally, our
estimate for boson sampling circuits is consistent with
recently improved simulation algorithms [21, 50] that
can simulate circuits with up to as many as 50 pho-
tons but would quickly become intractable for higher
numbers of photons.
Here we emphasize the importance of ruling out sim-
ulations with O(1) additive error and not merely sim-
ulations with O(1) multiplicative error. Experimen-
tal noise in real quantum systems without fault tol-
erance is likely to be large enough that most realistic
devices could not achieve the noise rates for which our
multiplicative-error bounds apply. This is not as prob-
lematic for the additive-error bounds; quantum systems
with small but potentially realistic noise rates would generate an output distribution that has ε = O(1) additive error with respect to the ideal output distribution. Our bounds require that this additive error satisfy ε < 1/720, and we argue that this could probably be improved to ε < 1/32. These numbers are in line with previous additive-error QCS analyses (e.g., ε < 1/192 in [17]), but finding ways to boost them might make QCS
considerably more attainable for near-term devices un-
der our analysis.
An important place our analysis can be improved
is in providing additional evidence for our conjec-
tures. While the conjectures poly3-NSETH(a), per-int-
NSETH(b), and poly3-ave-SBSETH(a′) are consistent
with other fine-grained conjectures like SETH, NSETH,
and #SETH, it is an open question whether it is pos-
sible to prove a concrete relationship with one of these
conjectures, which would be important evidence in their
favor. In particular, poly3-ave-SBSETH(a′), which is
a fine-grained statement about SB algorithms for an
average-case problem, is unlike any conjecture previ-
ously proposed of which we are aware. The evidence we
have provided, in the form of a black-box average-case
query lower bound and a worst-case-to-quasi-average-
case reduction, rules out certain methods one might use
to refute it, but given little previous work on statements
of this nature, it is hard for us to assess the likelihood
that other methods for refuting it exist.
Nevertheless, if anything, this highlights a weakness
in QCS more generally. To arrive at practical qubit es-
timates, QCS arguments must be made fine-grained,
but doing so uncovers the need to make conjectures
about classical problems whose fine-grained complexity
has not previously garnered considerable attention.
Finally, we conclude by noting that our analysis
would likely be applicable to many other classes of
quantum circuits whose efficient classical simulation en-
tails the collapse of the PH. Indeed, since the release
of the first version of this paper, our method was ex-
tended in [45] to derive fine-grained lower bounds for
the simulation (up to multiplicative error) of the one-
clean-qubit (DQC1) model and the Hadamard-classical circuit model (HC1Q) [47], as well as to provide a lower bound on Clifford+T simulation in terms of the number of T gates.
pelling argument that it would be difficult to show that
NSETH implies a version of the conjectures underlying
their (and our) analysis. Additionally, in [33], the fine-
grained lower bounds for simulation from our work and
from [34] (which used SETH to rule out exponential-
time algorithms that compute output probabilities of
quantum circuits) were extended to be based on other
fine-grained assumptions, including the well-studied Or-
thogonal Vectors, 3-SUM, and All Pairs Shortest Path
conjectures [66]. Other models that are universal un-
der post-selection where this method may apply include
various kinds of extended Clifford circuits [38, 39], and
conjugated Clifford circuits [14].
Note: The first version of this paper included only the
lower bounds for multiplicative-error simulations and
did not include any of the content of Section 4. We
would like to draw the reader’s attention to Ref. [44] by
Morimae and Tamaki, which was posted as the additive-
error lower bounds that appear in the current version
of this paper were in preparation. Ref. [44] is an in-
dependent analysis for additive-error QCS that shows
several results, some of which overlap with ours in Sec-
tion 4, based on new fine-grained conjectures that are
similar in spirit to, but different in detail from, our poly3-ave-SBSETH(a′).
Acknowledgments
We are grateful to Ashley Montanaro for important sug-
gestions and conversations about degree-3 polynomials
and the computational problem underlying this work.
We would also like to thank Virginia Williams and Ryan
Williams for useful discussions, and for pointing out
Ref. [42] and correcting an error in an earlier version
of the paper. Finally, we acknowledge Richard Kueng
for suggesting the proof strategy in Lemma 9, as well as
Tomoyuki Morimae and Suguru Tamaki for comments
on a draft of this paper.
AMD acknowledges support from the Undergraduate
Research Opportunities Program (UROP) at MIT, the
Dominic Orr Fellowship at Caltech, and the National
Science Foundation Graduate Research Fellowship un-
der Grant No. DGE-1745301. AWH was funded by NSF
grants CCF-1452616 and CCF-1729369, ARO contract
W911NF-17-1-0433 and the MIT-IBM Watson AI Lab
under the project Machine Learning in Hilbert space.
DEK was supported by the National Science Scholar-
ship from the Agency for Science, Technology and Re-
search (A*STAR) and Enabling Practical-scale Quan-
tum Computation (EPiQC), a National Science Foun-
dation (NSF) Expedition in Computing, under grant
CCF-1729369. RLP was funded by the MIT School
of Science Fellowship, the MIT Department of Physics,
and the MIT-IBM Watson AI Lab.
A Reduction from poly3-NONBALANCED to per-int-NONZERO
Valiant famously showed that computing the perma-
nent of an integer matrix is #P-hard by reduction from
#3SAT [61]. A concise reproduction of this proof can
be found in [6]. The main idea for our reduction is the
same, the only change being in the details of the clause
and variable gadgets we use for the construction.
There is a bijective correspondence between n × n matrices and directed graphs with n vertices, where the entry A_{ij} of a matrix A corresponds to the edge weight from vertex i to vertex j in the associated graph G_A. A cycle cover of G_A is a subset of the edges of G_A forming some number of cycles in which each vertex appears in exactly one cycle. The weight of a cycle cover is the product of the weights of all the edges traversed by one of the cycles. From the definition of the permanent in Eq. (9), we can see that the sum of the weights of all the cycle covers of G_A is given by Per(A).
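As an illustration of this correspondence (not part of the original proof), the following minimal Python sketch computes Per(A) by summing over cycle covers, i.e. over permutations σ, each weighted by the product of the traversed edge weights:

```python
import itertools

def permanent(A):
    """Permanent of a square matrix A (given as a list of rows), written as a
    sum over cycle covers: each permutation sigma is a cycle cover of G_A
    with weight prod_i A[i][sigma(i)]."""
    n = len(A)
    total = 0
    for sigma in itertools.permutations(range(n)):
        weight = 1
        for i in range(n):
            weight *= A[i][sigma[i]]
        total += weight
    return total

print(permanent([[1, 2], [3, 4]]))  # 1*4 + 2*3 = 10
```

For the 2 × 2 example, the two cycle covers (two self-loops, or one 2-cycle) contribute 1 · 4 and 2 · 3, so Per(A) = 10.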
It will be straightforward to convert the reduction
from #3SAT to computing the permanent into a re-
duction from poly3-NONBALANCED to per-int-NONZERO
since degree-3 polynomials and 3-CNF formulas have a
common structure in the sense that both involve n vari-
ables where groups of three variables appear together in
terms/clauses.
Suppose we are given a degree-3 polynomial f with n variables and m terms. We build a corresponding graph G_f by including one term gadget for each of the m terms and one variable gadget for each of the n variables, and then connecting them in a certain way. These gadgets are shown in Figure 6 and Figure 7. If a term of f has fewer than three variables, we can repeat one of the variables that appears in that term (e.g. x_1 x_2 = x_1 x_2 x_2), and thereby assume that each term has three variables. Each term gadget has three dotted edges corresponding to the three variables that appear in that term. A variable that appears t times will have t dotted edges in its variable gadget. Thus, each dotted variable edge from some node u to node u′ has a corresponding dotted edge from node v to node v′ in a term gadget associated with a term in which that variable appears. Each such pair of dotted edges indicates that the nodes u, u′, v and v′ should be connected using the XOR gadget shown in Figure 8. Thus, the dotted edges are not part of the final graph. The XOR gadget has the effect of ensuring that any cycle cover of the graph uses one of the two dotted edges but not both. The effective weight of an edge connected to an XOR gadget is 4.
Figure 6: Gadget for each term in the degree-3 polynomial f. Unlabeled edges are assumed to have weight 1. The three dashed lines are connected via the XOR gadget to the dashed lines in the variable gadgets for the variables that appear in the term, as exemplified by the labeling of vertices v and v′ in the context of Figure 8. If all three variables are true, the term gadget will contribute a cycle cover factor of −1, excluding the factors of 4 from dotted edges. If at least one variable is false, the term will contribute a cycle cover factor of 1.
Every cycle cover of G_f corresponds to some setting of the variables z_1, . . . , z_n. If the cycle cover traverses the solid lines at the top of the variable gadget associated with variable z_j, then the corresponding setting has z_j = 1. In this case, the cycle cover cannot also traverse the dotted lines at the bottom of the z_j variable gadget. Thus, due to the XOR gadget, the cycle cover must traverse the dotted lines corresponding to z_j in each term gadget associated with a term in which z_j appears.
On the other hand, if the cycle cover uses the dotted lines in the z_j gadget instead of the solid lines at the top, this corresponds to z_j = 0, and the cycle cover cannot
also traverse the edges corresponding to z_j in the term gadgets associated with terms in which z_j appears.

Figure 7: Gadget for each variable in the degree-3 polynomial f. The number of dashed lines is equal to the number of terms in which the variable appears, so this example is for a variable that appears in four terms. The dashed lines are connected to the dashed lines in the term gadget in which that variable appears via the XOR gadget, as exemplified by the labeling of vertices u and u′ in the context of Figure 8.
When all three dotted edges of a term gadget are traversed, this corresponds to all three variables in the term being set to 1. There is only one way to cycle cover the term gadget in this case, and it has a weight of −1, excluding the factors of 4 that come from the dotted edges in the XOR gadget. Meanwhile, if at least one dotted edge in the term gadget is not traversed, the total weight of all cycle covers will contribute a factor of 1, again excluding the factors of 4. Thus, each assignment z for which f(z) = 0 corresponds to cycle covers that satisfy an even number of terms, with total weight 4^{3m}, since exactly 3m XOR gadgets are involved. Each assignment for which f(z) = 1 corresponds to cycle covers that satisfy an odd number of terms, with total weight −4^{3m}. Thus, the total cycle cover weight of G_f, and by extension the permanent of the integer-valued matrix corresponding to G_f, is nonzero if and only if gap(f) ≠ 0. The number of vertices in G_f is a polynomial in the number of variables of f, so this completes the reduction from poly3-NONBALANCED to per-int-NONZERO. Since poly3-NONBALANCED is coC_=P-complete, per-int-NONZERO is coC_=P-complete as well.
B Better-than-brute-force solution to
poly3-NONBALANCED
LPTWY [42] gave a better-than-brute-force random-
ized algorithm that determines whether a system of m
degree-k polynomial equations over finite field F_q has a solution (i.e. a setting of the variables that makes all m polynomials equal to 0). They also derandomized this procedure to create a better-than-brute-force deterministic algorithm that counts the number of solutions to a system of m degree-k polynomial equations over finite field F_q. Applying their deterministic algorithm for the special case m = 1, k = 3, q = 2 (for which it is considerably simpler) yields a deterministic solution for poly3-NONBALANCED. We give a simple reproduction of their algorithm in this case below.

Figure 8: XOR gadget that connects dotted lines from node u to u′ in the variable gadget with dotted lines from node v to v′ in the term gadget. The effect of the XOR gadget is that any cycle cover must use either the edge from u to u′ or the edge from v to v′, but not both. Each XOR gadget contributes a factor of 4 to the weight of the cycle cover.
Theorem 7. There is a deterministic algorithm for poly3-NONBALANCED running in time poly(n)2^{(1−δ)n}, where δ = 0.0035.
Proof. The algorithm beats brute force by finding a clever way to efficiently represent the number of zeros of a degree-3 polynomial with n variables when (1 − δ)n of the variables have been fixed. Then, by summing the number of zeros associated with the 2^{(1−δ)n} possible settings of these variables, the algorithm computes the total number of zeros in poly(n)2^{(1−δ)n} time, which is better than the brute-force poly(n)2^n.
First we describe the algorithm. The input is the degree-3 polynomial f, which has n variables. In the following we have x ∈ {0, 1}^n, and we let y be the first (1 − δ)n bits of x and a be the last δn bits of x. Following the notation from [42], we define
\hat{Q}_l(y, a) = 1 - (1 - f(x))^l \sum_{j=0}^{l-1} \binom{l+j-1}{j} f(x)^j . \quad (35)
In [10], it is shown that if f(x) ≡ 0 (mod 2), then Q̂_l(y, a) ≡ 0 (mod 2^l), and if f(x) ≡ 1 (mod 2), then Q̂_l(y, a) ≡ 1 (mod 2^l). We define
R_l(y) = \sum_{a \in \{0,1\}^{\delta n}} \hat{Q}_l(y, a) \quad (36)
and observe that R_l(y) gives the number of settings x (mod 2^l) for which f(x) = 1 and the first (1 − δ)n bits of x are y.
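The modulus-amplifying property quoted from [10] can be checked mechanically. The short Python sketch below (our own illustration, not code from [42]) evaluates the polynomial of Eq. (35) with an integer F standing in for the value f(x) and confirms that the result reproduces the parity of F modulo 2^l:

```python
from math import comb

def amplify(F, l):
    # Modulus-amplifying polynomial of Eq. (35), with the integer F playing
    # the role of f(x): it maps F = 0 (mod 2) to 0 (mod 2^l) and
    # F = 1 (mod 2) to 1 (mod 2^l).
    return 1 - (1 - F) ** l * sum(comb(l + j - 1, j) * F ** j for j in range(l))

for l in range(1, 8):
    for F in range(-20, 21):
        assert amplify(F, l) % 2 ** l == F % 2
print("mod-2^l property holds in all small cases tested")
```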
The algorithm operates by enumerating all values of y, computing R_l(y) when l = δn (which is large enough so that the number of settings for which f(x) = 1 will never exceed 2^l for a given value of y), and summing all the results. This gives the total number of inputs x for which f(x) = 1. The algorithm rejects if this number is 2^{n−1}, and otherwise accepts.
There are two contributions to the runtime. The first is the computation of a representation of R_{δn}(y) as a sum of monomials in the (1 − δ)n variables of y with integer coefficients. Each monomial has degree at most 6δn − 3. The number of possible monomials with coefficient 1 over a variables with degree at most b is

M(a, b) = \binom{a+b}{b} \leq (1 + a/b)^b (1 + b/a)^a , \quad (37)
and, from Eq. (35), it is apparent that Q̂_{δn}(y, a) can be computed by a polynomially long sequence of sums or products of a pair of polynomials, where a product always includes either the polynomial (1 − f(x)) or f(x), which have degree only 3. Thus each step in the sequence takes time at most poly(n)M((1 − δ)n, 6δn − 3). For a certain value of a, a polynomial number of such steps is required to create a representation of Q̂_{δn}(y, a), and then R_{δn} is the sum over 2^{δn} such representations (one for each setting of a). Thus the total time is also bounded by poly(n)2^{δn}M((1 − δ)n, 6δn − 3).
The second contribution to the runtime is the evaluation of this polynomial for all points y, given its representation computed as described. It is shown in Lemma 2.3 of [42] that this evaluation can be performed in time poly(n)2^{(1−δ)n}, so long as the representation of R_{δn} has fewer than 2^{0.15(1−δ)n} monomials. This is satisfied as long as

M((1-\delta)n, 6\delta n - 3) \leq 2^{0.15(1-\delta)n} , \quad (38)
which, using Eq. (37), can be seen to occur whenever
δ < 0.0035. This is an improvement on the general
formula in [42], which when evaluated for k = 3 and
q = 2 yields a bound of δ < 0.00061.
Assuming δ satisfies this bound, the total runtime is the sum of the two contributions, poly(n)2^{δn}M((1 − δ)n, 6δn − 3) + poly(n)2^{(1−δ)n}. The first term is smaller than poly(n)2^{(0.85δ+0.15)n}, so the second term dominates, and the total runtime is poly(n)2^{(1−δ)n}, proving the theorem.
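The threshold δ < 0.0035 quoted in the proof can be reproduced numerically. The Python sketch below (our own check) uses the upper bound from Eq. (37) in the large-n limit, ignoring the −3 and the poly(n) factors, and scans for the point at which condition (38) first fails; the breakpoint lands just above 0.0035:

```python
import math

def exponent(delta):
    # (1/n) * log2 of the bound (1 + a/b)^b (1 + b/a)^a on M((1-delta)n, 6*delta*n),
    # with a = (1-delta)n and b = 6*delta*n, in the large-n limit
    a, b = 1.0 - delta, 6.0 * delta
    return b * math.log2(1.0 + a / b) + a * math.log2(1.0 + b / a)

for i in range(1, 1000):
    delta = i / 1e5
    if exponent(delta) > 0.15 * (1.0 - delta):
        print("condition (38) first fails near delta =", delta)
        break
```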
Consider ways in which the runtime could be improved. Suppose the evaluation time were to be improved such that the polynomial R_{δn} could be evaluated in time poly(n)2^{(1−δ)n} even when δ > 0.5. With no further changes to the algorithm, the first contribution to the runtime, stemming from the time required to compute the representation of R_{δn}, would now dominate, and the runtime would still exceed 2^{0.5n}. Moreover, as long as R_l is expressed as a sum over 2^{δn} terms as in Eq. (36), it is hard to see how any current techniques would allow this representation to be computed in less than 2^{0.5n} time when δ > 0.5.
Stated another way, this method of beating brute force by enumerating over only (1 − δ)n of the variables and evaluating the number of solutions with those variables fixed in 2^{(1−δ)n} time will surely break down when δ > 0.5, because there will be more variables not fixed than fixed, and the preparation of the efficient representation of the number of zeros will become the slowest step.
C The moments of the gap(f) distribution
In this appendix we compute the limiting value of the moments of the distribution of the quantity (gap(f)/2^n)^2, where f is drawn uniformly at random from the set of degree-3 polynomials with n variables over the field F_2. As the number of variables increases, these values become arbitrarily close to what they would be if gap(f)/2^n were a random Gaussian variable with mean 0 and variance 2^{−n}. We believe this observation could be relevant in other applications. Following the proof, we briefly comment on other possible implications, and also on what would be different if f were only a uniformly random degree-2 polynomial. Then, we use this fact to show that the amount of probability mass in the NO part of the distribution (see Figure 5) is at least 0.12 for sufficiently large n.
Let F_n be the set of degree-3 polynomials over the field F_2 with n variables and no constant term, and let ngap(f) = gap(f)/2^n.
Theorem 8. For any k and any ε, there exists a constant n_0 such that whenever n > n_0,

\left| 2^{nk}\, E_{f \in F_n}\!\left[ \mathrm{ngap}(f)^{2k} \right] - (2k-1)!! \right| \leq \varepsilon , \quad (39)

where (2k − 1)!! = 1 · 3 · 5 · · · (2k − 1).
Proof. A degree-3 polynomial f over F_2 with no constant term can be written

f(z_1, \dots, z_n) = \sum_{a,b,c=1}^{n} \alpha_{abc}\, z_a z_b z_c , \quad (40)
where α_{abc} ∈ {0, 1}. Note that z_p = z_p^2 = z_p^3 when z_p ∈ F_2. This is not a one-to-one mapping. Degree-1 monomials z_p correspond to the single term in the expansion for which a = b = c = p, but degree-2 monomials z_p z_q correspond to the six terms in which, for example, a = b = p and c = q. Degree-3 monomials also each correspond to six terms in the expansion. Using this correspondence,
\mathrm{ngap}(f) = 2^{-n} \sum_{x} (-1)^{f(x)} = 2^{-n} \sum_{x} (-1)^{\sum_{a,b,c} \alpha_{abc} x_a x_b x_c} = 2^{-n} \sum_{x} \prod_{a,b,c} (-1)^{\alpha_{abc} x_a x_b x_c} . \quad (41)
Choosing the coefficients α_{abc} uniformly at random from {0, 1} is equivalent to choosing a degree-3 polynomial from F_n uniformly at random. Thus we may express
2^{2nk}\, E_{f \in F_n}\!\left[ \mathrm{ngap}(f)^{2k} \right]
= E_{\alpha} \sum_{x^1,\dots,x^{2k} \in \mathbb{F}_2^n}\ \prod_{a,b,c=1}^{n} (-1)^{\alpha_{abc}\,(x^1_a x^1_b x^1_c + \dots + x^{2k}_a x^{2k}_b x^{2k}_c)}
= \sum_{x^1,\dots,x^{2k} \in \mathbb{F}_2^n}\ \prod_{a,b,c=1}^{n} E_{\alpha_{abc}}\!\left[ (-1)^{\alpha_{abc}\,(x^1_a x^1_b x^1_c + \dots + x^{2k}_a x^{2k}_b x^{2k}_c)} \right]
= \sum_{x^1,\dots,x^{2k} \in \mathbb{F}_2^n}\ \prod_{a,b,c=1}^{n} \mathbb{1}\!\left[ x^1_a x^1_b x^1_c + \dots + x^{2k}_a x^{2k}_b x^{2k}_c \equiv 0\ (\mathrm{mod}\ 2) \right] , \quad (42)

where 1[·] is the indicator function.
For each of the 2^{2nk} terms in the sum above, there is a corresponding 2k × n matrix X = (x^j_a) over F_2. This term contributes 1 to the overall expectation if for any three columns X_a, X_b, X_c (each a vector with 2k entries), ⟨X_a, X_b, X_c⟩ = 0, where
⟨u, v, w⟩ = \sum_{j=1}^{2k} u_j v_j w_j \ \mathrm{mod}\ 2 \quad (43)

is an extension of the standard inner product ⟨u, v⟩ = \sum_j u_j v_j mod 2 to three vectors. Otherwise, the term yields 0. Thus, the problem amounts to counting the number of 2k × n matrices X for which this condition is met. In what follows, we will perform this counting by showing that the number is dominated by matrices X whose 2k rows “pair up” into k sets of 2, where rows are identical to their pair. This is reminiscent of the Wick contractions used to compute Gaussian integrals and is fundamentally why the moments agree with those of a Gaussian in the limit n → ∞.
To count the number of terms X that yield 1, we use the properties of F_2^{2k} as a 2k-dimensional vector space with inner product ⟨·, ·⟩. We associate with each matrix X a subspace H_X ⊆ F_2^{2k} formed by taking linear combinations of the columns of X. Another way to say this is that H_X is the linear binary code generated by the transpose of X. For any subspace V we let V^⊥ contain the vectors that are orthogonal to all the elements of V. Note that a vector may be orthogonal to itself. We also define the operation × to be entry-wise vector multiplication between two vectors. This multiplication operation turns F_2^{2k} into an algebra. For any subspace V, we let V^× denote the subspace generated by pairwise products u × v where u, v are in V. We will be interested in subspaces H that satisfy the condition

H^× ⊆ H^⊥ . \quad (44)
We claim that a term X in the sum above yields 1 if and only if H_X satisfies condition (44). This is seen as follows.

If u ∈ H_X and v ∈ H_X^×, then ⟨u, v⟩ may be decomposed into a sum of ⟨X_a, X_b × X_c⟩ for various a, b, c by writing u and v as a linear combination of columns of X and of products of columns of X, respectively. If the term yields 1, then for any three columns X_a, X_b, X_c of X, ⟨X_a, X_b, X_c⟩ = ⟨X_a, X_b × X_c⟩ = 0, implying that H_X^× is orthogonal to H_X. Conversely, we have X_a ∈ H_X and X_b × X_c ∈ H_X^× for all a, b, c, so if condition (44) holds, then ⟨X_a, X_b × X_c⟩ = ⟨X_a, X_b, X_c⟩ = 0, implying the term X yields 1.
Given a subspace H of dimension d satisfying condition (44), for how many matrices X does H = H_X? All such X can be formed by choosing n columns among the 2^d elements in H, giving an initial count of 2^{dn}. However, for some of these matrices, H_X will merely be a (strict) subset of H. The number of these matrices is at most 2^{n(d−1)} · 2^d, since there are at most 2^d strict subspaces of H with dimension d − 1. In the limit of large n, this correction will vanish compared to 2^{dn}.
This establishes that the exponential growth of the number of matrices X for which H_X = H is governed by the dimension d of H, so to leading order we must merely calculate how many distinct H have maximal
dimension. We claim that the maximum dimension is
d = k and the number of H with this dimension is given
by the number of distinct ways to partition 2k indices
into k pairs. We see this as follows.
Assume H satisfies the condition. Note that the dimension of H^⊥ is 2k − d, and since H ⊆ H^× ⊆ H^⊥, we have d ≤ 2k − d, and hence d ≤ k. When d = k, H = H^× = H^⊥, which implies a couple of important facts. First, the all-ones vector 1 must be in H, since ⟨1, u⟩ = ⟨u, u⟩ = 0 and hence 1 ∈ H^⊥ = H. Second, the space H is a subalgebra of F_2^{2k}, since the fact that H = H^× implies it is closed under multiplication. We may choose a basis v^1, . . . , v^k for H, with v^1 = 1. For any fixed index a, we may additionally require that v^j_a = 1 (the ath entry of v^j) for all j, because we may always add 1 to a basis vector if v^j_a = 0. Since H is a subalgebra, v^1 × · · · × v^k ∈ H, and by construction it has a 1 in its ath entry. Moreover, 0 = ⟨1, v^1 × · · · × v^k⟩ since H ⊆ H^⊥. Therefore v^1 × · · · × v^k must also have a 1 in some other entry b ≠ a, and hence v^j_b = 1 for all j. Since there is a basis for H in which all the basis vectors have 1s in both index a and index b, every vector in H has the same value in index a and index b. This shows that if H has dimension k and satisfies condition (44), then every index 1 ≤ a ≤ 2k must have a partner b ≠ a for which u_a = u_b for every u ∈ H. Conversely, if every index pairs up in this sense, then H must satisfy condition (44), since each pair of indices always contributes a pair of 0s or a pair of 1s to the sum ⟨u, v × w⟩ = Σ_a u_a v_a w_a mod 2.
Thus the number of distinct k-dimensional H satis-
fying condition (44) is the number of ways to pair up
the 2k indices, given by the expression
(2k − 1) · (2k − 3) · · · 7 · 5 · 3 · 1 = (2k − 1)!! . \quad (45)
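For small k this count can be checked exhaustively. The Python sketch below (our own brute-force verification) enumerates all k-dimensional subspaces of F_2^{2k}, with vectors encoded as 2k-bit integers, and counts those satisfying condition (44); it reports 1, 3 and 15 for k = 1, 2, 3, matching (2k − 1)!!:

```python
import itertools

def subspaces(dim, k):
    """All dim-dimensional subspaces of F_2^{2k}, each returned as a frozenset
    of its elements (vectors encoded as 2k-bit integers)."""
    found = set()
    for basis in itertools.combinations(range(1, 1 << (2 * k)), dim):
        span = {0}
        for b in basis:                      # close the span under XOR
            span |= {s ^ b for s in span}
        if len(span) == 1 << dim:            # basis was linearly independent
            found.add(frozenset(span))
    return found

def triple(u, v, w):
    # <u, v, w> = sum_j u_j v_j w_j mod 2 on bit-vectors
    return bin(u & v & w).count("1") % 2

for k in (1, 2, 3):
    # condition (44) holds iff <u, v, w> = 0 for every u, v, w in H
    good = sum(all(triple(u, v, w) == 0 for u in H for v in H for w in H)
               for H in subspaces(k, k))
    print(k, good)   # expect 1, 3, 15, i.e. (2k-1)!!
```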
For each of these H, there are 2^{kn} matrices X for which H_X = H, with corrections of at most c_k 2^{(k−1)n}, where c_k depends only on k. In addition, there may be H with dimension d < k that satisfy condition (44), but for each of these the number of X for which H_X = H is at most 2^{dn} ≤ 2^{(k−1)n}. Moreover, the number of subspaces H with dimension d < k is given by some number c′_k depending only on k. Thus, we may choose n_0 large enough so that 2^{−n_0} c_k (2k − 1) · · · 5 · 3 · 1 ≤ ε/2 and 2^{−n_0} c′_k ≤ ε/2. Recalling the 2^{2nk} prefactor in Eq. (42), this proves the statement of the theorem.
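A small numerical experiment is consistent with the theorem. The Python sketch below (our own slow, pure-Python check) samples random degree-3 polynomials with no constant term at modest n, computes gap(f) by brute force, and estimates 2^{nk} E[ngap(f)^{2k}] for k = 1, 2, 3; the estimates drift toward (2k − 1)!! = 1, 3, 15 as n grows, up to Monte Carlo error:

```python
import itertools
import random

def estimate_moments(n, k_max=3, trials=1000, seed=0):
    rng = random.Random(seed)
    xs = list(itertools.product((0, 1), repeat=n))
    # multisets a <= b <= c index the monomials z_a z_b z_c of degree 1, 2 or 3
    monomials = list(itertools.combinations_with_replacement(range(n), 3))
    sums = [0.0] * (k_max + 1)
    for _ in range(trials):
        f = [m for m in monomials if rng.random() < 0.5]   # uniform f in F_n
        gap = sum((-1) ** (sum(x[a] & x[b] & x[c] for (a, b, c) in f) % 2)
                  for x in xs)
        for k in range(1, k_max + 1):
            sums[k] += (gap / 2 ** n) ** (2 * k)
    return [2 ** (n * k) * sums[k] / trials for k in range(1, k_max + 1)]

print(estimate_moments(7))   # roughly [1, 3, 15], up to sampling error
```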
If f were only a degree-2 polynomial, this theorem would not hold. In particular, terms in the sum associated with matrix X would yield 1 if and only if all pairs of columns X_a, X_b of X satisfy ⟨X_a, X_b⟩ = 0. It need not also be true that ⟨X_a, X_b, X_c⟩ = 0 for every trio of columns. Thus the condition (44) is replaced by the less restrictive statement H ⊆ H^⊥. Note that this condition is precisely the statement that the linear binary code H is contained in its dual. It is still the case that the maximum dimension of any such H is d = k, in which case H = H^⊥ (i.e. H is self-dual), but there are a larger number of subspaces H with dimension k that satisfy the new condition, since it is possible that H = H^⊥ while H^× forms a larger subspace. For example, we can take H to be generated by the columns of
\begin{pmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 1 \\
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 \\
1 & 1 & 1 & 0 \\
1 & 1 & 1 & 1
\end{pmatrix} . \quad (46)
Here, none of the rows are identical, so none of the indices have paired up. As expected, H^× ⊄ H^⊥, as seen by the fact that the inner product of the second column with the entrywise product of the third and fourth columns is 1. Yet it is still true that H ⊆ H^⊥.
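Both claims about this example can be confirmed mechanically. A short Python sketch (our own check) takes the columns of the matrix in Eq. (46), verifies that all pairwise inner products vanish mod 2 (so H ⊆ H^⊥), and exhibits a triple of columns whose three-wise inner product is 1 (so H^× ⊄ H^⊥):

```python
from itertools import combinations, combinations_with_replacement

# Columns of the matrix in Eq. (46); each column has 2k = 8 entries
cols = [[1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 1, 1, 0, 0, 1, 1],
        [0, 1, 0, 1, 0, 1, 0, 1]]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v)) % 2

def triple(u, v, w):
    return sum(a * b * c for a, b, c in zip(u, v, w)) % 2

# Pairwise orthogonality of the generators (each also with itself) extends to
# all of H by bilinearity, so this establishes H is contained in its dual.
print(all(inner(u, v) == 0 for u, v in combinations_with_replacement(cols, 2)))  # True
# The three-wise inner product does not vanish for the last three columns,
# so H^x is not contained in the dual of H.
print(any(triple(u, v, w) == 1 for u, v, w in combinations(cols, 3)))            # True
```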
We showed above that the number of subspaces H that satisfy condition (44) and have dimension k is (2k − 1)!!. There are additional subspaces that satisfy just the weaker degree-2 condition H = H^⊥, bringing the total to (whenever k > 1)
\frac{(2^{2k-1} - 2^1)(2^{2k-2} - 2^2) \cdots (2^{k+1} - 2^{k-1})}{(2^k - 2^1)(2^k - 2^2) \cdots (2^k - 2^{k-1})} \quad (47)
= (2^1 + 1)(2^2 + 1) \cdots (2^{k-1} + 1) , \quad (48)
which is derived by first choosing k 1 vectors that
are linearly independent to form a basis (along with the
all ones vector 1) for H (numerator), and then divid-
ing by the number of bases associated with a particular
subspace.
For k = 1, 2, 3, this number is equal to (2k − 1)!!, so the first three moments agree with a Gaussian even for degree-2 polynomials. However, for k = 4 this expression evaluates to 135, while 7!! = 105. Indeed, as k increases, the expression grows much more quickly, like 2^{O(k^2)}, than (2k − 1)!! grows.
The statement that the moments of the quantity
gap(f) match those of a Gaussian when f is a degree-3
polynomial could have consequences beyond the scope
of this work. It might be possible to use this analysis
to formally show that the gap(f) distribution itself ap-
proaches a Gaussian. It is also important to note that if
f is instead drawn uniformly at random from the set of
all Boolean functions, the distribution of gap(f ) would
be exactly a Binomial distribution, which has the same
moments for large n. Thus, in a sense, random degree-3
polynomials might be thought of as mimicking the behavior of completely random Boolean functions (with
many fewer parameters), and the same cannot be said for degree-2 polynomials. Perhaps this could be connected to the fact that computing gap(f) for degree-3 polynomials is #P-hard, while doing the same for degree-2 polynomials is easy.
Next we prove a statement about the probability mass
near 0 in the distribution for degree-3 polynomials.
Lemma 9. There exists an n_0 such that for all n > n_0,

\Pr_{f \in F_n}\!\left[ (\mathrm{gap}(f)/2^n)^2 \leq 2^{-n-2} \right] \geq 0.12 . \quad (49)
Proof. Let I(x) be the indicator function that evaluates to 1 when x ≤ 1/4 and to 0 otherwise. Suppose p(x) is a polynomial of degree L for which p(x) ≤ I(x) whenever x ≥ 0. By Theorem 8, the first L moments E_{f∈F_n}[2^{kn} ngap(f)^{2k}] can be made arbitrarily close to their Gaussian values by taking n_0 large. Thus for any ε, we may take n_0 large enough so that
\Pr_{f \in F_n}\!\left[ (\mathrm{gap}(f)/2^n)^2 \leq 2^{-n-2} \right] = E_{f \in F_n}\!\left[ I(2^n\, \mathrm{ngap}(f)^2) \right] \geq E_{f \in F_n}\!\left[ p(2^n\, \mathrm{ngap}(f)^2) \right] \geq E_{x \sim N(0,1)}\!\left[ p(x^2) \right] - \varepsilon , \quad (50)
where N(0, 1) is the Gaussian distribution with mean 0
and variance 1.
We construct a specific polynomial p(x) with degree
L = 15.
p(x) = \frac{\delta^2}{1 - \delta^2}\left[ T_L\!\left( \sqrt{1 + A - 4Ax} \right)^2 - 1 \right] , \quad (51)

where T_L is the Lth Chebyshev polynomial of the first kind, δ = 0.5, and A = T_{1/L}(1/δ)^2 − 1 = 0.00773.
It can be verified that for any odd choice of L and any choice of δ, this polynomial is no larger than I(x) for all x ≥ 0. We can also give the coefficients explicitly by writing

p(x) = \sum_{j=0}^{15} \frac{c_j}{(2j-1)!!}\, x^j , \quad (52)
where {c_j}_{j=0}^{15} is given by

\{1,\ -6.0672,\ 29.9730,\ -114.8688,\ 345.0021,\ -829.2997,\ 1620.0455,\ -2593.7392,\ 3410.0118,\ -3665.1216,\ 3183.4033,\ -2188.3186,\ 1149.8164,\ -435.1008,\ 105.8449,\ -12.4590\} . \quad (53)
Then, since E_{x∼N(0,1)}[x^{2j}] = (2j − 1)!!, it is easy to evaluate

E_{x \sim N(0,1)}\!\left[ p(x^2) \right] = \sum_{j=0}^{15} c_j = 0.1222 . \quad (54)
This proves the claim. Note that the bound could be
mildly improved by taking smaller δ and larger L.
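The numbers in Eqs. (52)–(54) can be reproduced from the printed coefficients. The Python sketch below (our own check) recomputes Σ_j c_j ≈ 0.1222 and spot-checks p(x) ≤ I(x) on a grid; we restrict the grid to x ≤ 5 and use a loose tolerance because the coefficients in Eq. (53) are rounded to four decimal places, which loses too much precision at large x:

```python
import math
import numpy as np

# Coefficients c_j from Eq. (53); p(x) = sum_j c_j / (2j-1)!! x^j as in Eq. (52)
c = np.array([1, -6.0672, 29.9730, -114.8688, 345.0021, -829.2997,
              1620.0455, -2593.7392, 3410.0118, -3665.1216, 3183.4033,
              -2188.3186, 1149.8164, -435.1008, 105.8449, -12.4590])
dfact = np.array([float(math.prod(range(1, 2 * j, 2))) for j in range(16)])  # (2j-1)!!

def p(x):
    return sum(c[j] / dfact[j] * x ** j for j in range(16))

# Gaussian expectation: E[x^{2j}] = (2j-1)!!, so E[p(x^2)] is just sum_j c_j
print(c.sum())   # ~0.1222

# Spot-check p(x) <= I(x) on [0, 5], where I(x) = 1 for x <= 1/4 and 0 otherwise
xs = np.arange(0.0, 5.0, 0.01)
assert np.all(p(xs) <= np.where(xs <= 0.25, 1.0, 0.0) + 1e-3)
```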
Corollary 9.1. The quantity

p_0 = \liminf_{n \to \infty}\ \max_{j=0,1}\ \Pr_{f \leftarrow F_n^{\mathrm{prom}}}\!\left( S(f) = j \right) \quad (55)

satisfies p_0 ≤ 11/12 for the poly3-SGAP problem.
Proof. The previous theorem states that at least 0.12
of the probability mass lies in the set of NO instances
(S(f) = 0) for sufficiently large n. Meanwhile, as dis-
cussed in Lemma 4, at least 1/12 of the probability
mass lies in the set of YES instances (S(f ) = 1) for
all n. In other words, the fraction of NO instances is
at most 11/12, and the fraction of YES instances is at
most 0.88, which proves the corollary.
References
[1] S. Aaronson. Quantum computing, postselection,
and probabilistic polynomial-time. Proceedings of
the Royal Society A: Mathematical, Physical and
Engineering Sciences, 461(2063):3473–3482, 2005.
DOI: 10.1098/rspa.2005.1546.
[2] S. Aaronson. A linear-optical proof that the per-
manent is #P-hard. Proceedings of the Royal
Society A: Mathematical, Physical and Engineer-
ing Sciences, 467(2136):3393–3405, 2011. DOI:
10.1098/rspa.2011.0232.
[3] S. Aaronson and A. Arkhipov. The computa-
tional complexity of linear optics. In Proceedings
of the Forty-Third Annual ACM Symposium on
Theory of Computing, pages 333–342, 2011. DOI:
10.1145/1993636.1993682.
[4] S. Aaronson and L. Chen. Complexity-theoretic
foundations of quantum supremacy experiments.
In 32nd Computational Complexity Conference
(CCC 2017), volume 79, pages 22:1–22:67, 2017.
DOI: 10.4230/LIPIcs.CCC.2017.22.
[5] S. Aaronson, A. Bouland, G. Kuperberg, and
S. Mehraban. The computational complexity
of ball permutations. In Proceedings of the
49th Annual ACM SIGACT Symposium on The-
ory of Computing, pages 317–327, 2017. DOI:
10.1145/3055399.3055453.
[6] S. Arora and B. Barak. Computational complexity:
a modern approach. Cambridge University Press,
2009.
[7] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C.
Bardin, R. Barends, R. Biswas, S. Boixo, F. G.
Brandao, D. A. Buell, et al. Quantum supremacy
using a programmable superconducting proces-
sor. Nature, 574(7779):505–510, 2019. DOI:
10.1038/s41586-019-1666-5.
[8] M. Ball, A. Rosen, M. Sabin, and P. N. Vasudevan.
Average-case fine-grained hardness. In Proceedings
of the 49th Annual ACM SIGACT Symposium on
Theory of Computing, page 483–496, 2017. DOI:
10.1145/3055399.3055466.
[9] C. Beck and R. Impagliazzo. Strong ETH holds
for regular resolution. In Proceedings of the
Forty-Fifth Annual ACM Symposium on The-
ory of Computing, page 487–494, 2013. DOI:
10.1145/2488608.2488669.
[10] R. Beigel and J. Tarui. On ACC. Compu-
tational Complexity, 4(4):350–366, 1994. DOI:
10.1007/BF01263423.
[11] J. Bermejo-Vega, D. Hangleiter, M. Schwarz,
R. Raussendorf, and J. Eisert. Architectures for
quantum simulation showing a quantum speedup.
Phys. Rev. X, 8:021010, Apr 2018. DOI:
10.1103/PhysRevX.8.021010.
[12] A. Björklund, P. Kaski, and R. Williams. Gener-
alized Kakeya sets for polynomial evaluation and
faster computation of fermionants. Algorithmica,
81(10):4010–4028, 2019. DOI: 10.1007/s00453-018-
0513-7.
[13] A. Bouland, L. Mančinska, and X. Zhang.
Complexity classification of two-qubit commut-
ing Hamiltonians. In 31st Conference on
Computational Complexity (CCC 2016), vol-
ume 50, pages 28:1–28:33, 2016. DOI:
10.4230/LIPIcs.CCC.2016.28.
[14] A. Bouland, J. F. Fitzsimons, and D. E. Koh.
Complexity classification of conjugated Clifford cir-
cuits. In 33rd Computational Complexity Confer-
ence (CCC 2018), volume 102, pages 21:1–21:25,
2018. DOI: 10.4230/LIPIcs.CCC.2018.21.
[15] A. Bouland, B. Fefferman, C. Nirkhe, and U. Vazi-
rani. On the complexity and verification of quan-
tum random circuit sampling. Nature Physics, 15
(2):159, 2019. DOI: 10.1038/s41567-018-0318-2.
[16] M. J. Bremner, R. Jozsa, and D. J. Shepherd. Clas-
sical simulation of commuting quantum computa-
tions implies collapse of the polynomial hierarchy.
Proceedings of the Royal Society A: Mathematical,
Physical and Engineering Sciences, 467(2126):459–
472, 2011. DOI: 10.1098/rspa.2010.0301.
[17] M. J. Bremner, A. Montanaro, and D. J. Shep-
herd. Average-case complexity versus approximate
simulation of commuting quantum computations.
Phys. Rev. Lett., 117:080501, Aug 2016. DOI:
10.1103/PhysRevLett.117.080501.
[18] E. Böhler, C. Glaßer, and D. Meister. Error-
bounded probabilistic computations between MA
and AM. Journal of Computer and Sys-
tem Sciences, 72(6):1043–1076, 2006. DOI:
10.1016/j.jcss.2006.05.001.
[19] C. Calabro, R. Impagliazzo, and R. Paturi. The
complexity of satisfiability of small depth circuits.
In Parameterized and Exact Computation, pages
75–85, 2009. DOI: 10.1007/978-3-642-11269-0_6.
[20] M. L. Carmosino, J. Gao, R. Impagliazzo, I. Mi-
hajlin, R. Paturi, and S. Schneider. Nondetermin-
istic extensions of the strong exponential time hy-
pothesis and consequences for non-reducibility. In
Proceedings of the 2016 ACM Conference on In-
novations in Theoretical Computer Science, page
261–270, 2016. DOI: 10.1145/2840728.2840746.
[21] P. Clifford and R. Clifford. The classical com-
plexity of boson sampling. In Proceedings of
the 2018 Annual ACM-SIAM Symposium on Dis-
crete Algorithms, pages 146–155, 2018. DOI:
10.1137/1.9781611975031.10.
[22] A. M. Dalzell. Lower bounds on the classical simu-
lation of quantum circuits for quantum supremacy.
Bachelor’s thesis, Massachusetts Institute of Tech-
nology, 2017. URL http://hdl.handle.net/
1721.1/111859.
[23] H. Dell, T. Husfeldt, D. Marx, N. Taslaman,
and M. Wahlén. Exponential time complex-
ity of the permanent and the Tutte polynomial.
ACM Trans. Algorithms, 10(4), Aug 2014. DOI:
10.1145/2635812.
[24] E. Farhi and A. W. Harrow. Quantum supremacy
through the quantum approximate optimization al-
gorithm. arXiv preprint arXiv:1602.07674, 2016.
[25] E. Farhi, J. Goldstone, and S. Gutmann. A quan-
tum approximate optimization algorithm. arXiv
preprint arXiv:1411.4028, 2014.
[26] S. Fenner, F. Green, S. Homer, and R. Pruim.
Determining acceptance possibility for a quantum
computation is hard for the polynomial hierar-
chy. Proceedings of the Royal Society of London.
Series A: Mathematical, Physical and Engineer-
ing Sciences, 455(1991):3953–3966, 1999. DOI:
10.1098/rspa.1999.0485.
[27] K. Fujii, H. Kobayashi, T. Morimae, H. Nishimura,
S. Tamate, and S. Tani. Power of quantum compu-
tation with few clean qubits. In 43rd International
Colloquium on Automata, Languages, and Pro-
gramming (ICALP 2016), volume 55, pages 13:1–
13:14, 2016. DOI: 10.4230/LIPIcs.ICALP.2016.13.
[28] K. Fujii, H. Kobayashi, T. Morimae, H. Nishimura,
S. Tamate, and S. Tani. Impossibility of classically
simulating one-clean-qubit model with multiplica-
tive error. Phys. Rev. Lett., 120:200502, May 2018.
DOI: 10.1103/PhysRevLett.120.200502.
[29] O. Goldreich and G. N. Rothblum. Worst-case
to average-case reductions for subclasses of P. In
Computational Complexity and Property Testing:
On the Interplay Between Randomness and Com-
putation, pages 249–295. 2020. DOI: 10.1007/978-
3-030-43662-9_15.
[30] Y. Han, L. A. Hemaspaandra, and T. Thierauf.
Threshold computation and cryptographic secu-
rity. SIAM Journal on Computing, 26(1):59–78,
1997. DOI: 10.1137/S0097539792240467.
[31] D. Hangleiter, J. Bermejo-Vega, M. Schwarz, and
J. Eisert. Anticoncentration theorems for schemes
showing a quantum speedup. Quantum, 2:65, May
2018. DOI: 10.22331/q-2018-05-22-65.
[32] A. W. Harrow and A. Montanaro. Quantum com-
putational supremacy. Nature, 549(7671):203–209,
2017. DOI: 10.1038/nature23458.
[33] R. Hayakawa, T. Morimae, and S. Tamaki. Fine-
grained quantum supremacy based on orthogonal
vectors, 3-sum and all-pairs shortest paths. arXiv
preprint arXiv:1902.08382, 2019.
[34] C. Huang, M. Newman, and M. Szegedy. Explicit
lower bounds on strong quantum simulation. arXiv
preprint arXiv:1804.10368, 2018.
[35] R. Impagliazzo, R. Paturi, and F. Zane. Which
problems have strongly exponential complexity?
Journal of Computer and System Sciences, 63(4):
512–530, 2001. DOI: 10.1006/jcss.2001.1774.
[36] H. Jahanjou, E. Miles, and E. Viola. Local re-
ductions. In Automata, Languages, and Program-
ming, pages 749–760, 2015. DOI: 10.1007/978-3-
662-47672-7_61.
[37] M. Jerrum and M. Snir. Some exact complexity
results for straight-line computations over semir-
ings. J. ACM, 29(3):874–897, Jul 1982. DOI:
10.1145/322326.322341.
[38] R. Jozsa and M. Van Den Nest. Classical sim-
ulation complexity of extended Clifford circuits.
Quantum Information & Computation, 14(7&8):
633–648, 2014. DOI: 10.26421/QIC14.7-8.
[39] D. E. Koh. Further extensions of Clifford circuits
and their classical simulation complexities. Quan-
tum Information & Computation, 17(3&4):0262–
0282, 2017. DOI: 10.26421/QIC17.3-4.
[40] G. Kuperberg. How hard is it to approximate the
Jones polynomial? Theory of Computing, 11(1):
183–219, 2015. DOI: 10.4086/toc.2015.v011a006.
[41] R. J. Lipton. New directions in testing. In
Distributed Computing and Cryptography, vol-
ume 2, pages 191–202, 1989. DOI: 10.1090/di-
macs/002/13.
[42] D. Lokshtanov, R. Paturi, S. Tamaki, R. Williams,
and H. Yu. Beating brute force for systems of poly-
nomial equations over finite fields. In Proceedings of
the 2017 Annual ACM-SIAM Symposium on Dis-
crete Algorithms, pages 2190–2202. 2017. DOI:
10.1137/1.9781611974782.143.
[43] A. Montanaro. Quantum circuits and low-degree
polynomials over F_2. Journal of Physics A: Math-
ematical and Theoretical, 50(8):084002, Jan 2017.
DOI: 10.1088/1751-8121/aa565f.
[44] T. Morimae and S. Tamaki. Additive-error fine-
grained quantum supremacy. arXiv preprint
arXiv:1912.06336, 2019.
[45] T. Morimae and S. Tamaki. Fine-grained quantum
computational supremacy. Quantum Information
& Computation, 19(13&14):1089–1115, 2019. DOI:
10.26421/QIC19.13-14.
[46] T. Morimae, K. Fujii, and J. F. Fitzsimons. Hard-
ness of classically simulating the one-clean-qubit
model. Phys. Rev. Lett., 112:130502, Apr 2014.
DOI: 10.1103/PhysRevLett.112.130502.
[47] T. Morimae, Y. Takeuchi, and H. Nishimura.
Merlin-Arthur with efficient quantum Merlin and
quantum supremacy for the second level of the
Fourier hierarchy. Quantum, 2:106, Nov 2018.
DOI: 10.22331/q-2018-11-15-106.
[48] R. Movassagh. Efficient unitary paths and quan-
tum computational supremacy: A proof of average-
case hardness of random circuit sampling. arXiv
preprint arXiv:1810.04681, 2018.
[49] R. Movassagh. Cayley path and quantum compu-
tational supremacy: A proof of average-case #P-
hardness of random circuit sampling with quanti-
fied robustness. arXiv preprint arXiv:1909.06210,
2019.
[50] A. Neville, C. Sparrow, R. Clifford, E. John-
ston, P. M. Birchall, A. Montanaro, and
A. Laing. Classical boson sampling algorithms
with superior performance to near-term experi-
ments. Nature Physics, 13(12):1153, 2017. DOI:
10.1038/nphys4270.
[51] J. Preskill. Quantum computing and the entan-
glement frontier. arXiv preprint arXiv:1203.5813,
2012.
[52] P. Pudlák and R. Impagliazzo. A lower bound
for DLL algorithms for k-SAT (preliminary ver-
sion). In Proceedings of the Eleventh Annual ACM-
SIAM Symposium on Discrete Algorithms, page
128–136, 2000. URL https://dl.acm.org/doi/
abs/10.5555/338219.338244.
[53] M. Reck, A. Zeilinger, H. J. Bernstein, and
P. Bertani. Experimental realization of any dis-
crete unitary operator. Phys. Rev. Lett., 73:58–61,
Jul 1994. DOI: 10.1103/PhysRevLett.73.58.
[54] M. Roetteler, M. Naehrig, K. M. Svore, and
K. Lauter. Quantum resource estimates for com-
puting elliptic curve discrete logarithms. In Ad-
vances in Cryptology – ASIACRYPT 2017, pages
241–270, 2017. DOI: 10.1007/978-3-319-70697-
9_9.
[55] H. J. Ryser. Combinatorial mathematics, vol-
ume 14. 1963. DOI: 10.5948/UPO9781614440147.
[56] D. Shepherd and M. J. Bremner. Temporally un-
structured quantum computation. Proceedings of
the Royal Society A: Mathematical, Physical and
Engineering Sciences, 465(2105):1413–1439, 2009.
DOI: 10.1098/rspa.2008.0443.
[57] L. Stockmeyer. The complexity of approximate
counting. In Proceedings of the Fifteenth Annual
ACM Symposium on Theory of Computing, page
118–126, 1983. DOI: 10.1145/800061.808740.
[58] B. M. Terhal and D. P. DiVincenzo. Adaptive
quantum computation, constant depth quantum
circuits and Arthur-Merlin games. Quantum Infor-
mation & Computation, 4(2):134–145, 2004. DOI:
10.26421/QIC4.2.
[59] S. Toda. PP is as hard as the polynomial-time hi-
erarchy. SIAM Journal on Computing, 20(5):865–
877, 1991. DOI: 10.1137/0220053.
[60] S. Toda and M. Ogiwara. Counting classes are
at least as hard as the polynomial-time hierarchy.
SIAM Journal on Computing, 21(2):316–328, 1992.
DOI: 10.1137/0221023.
[61] L. Valiant. The complexity of computing the per-
manent. Theoretical Computer Science, 8(2):189–
201, 1979. DOI: 10.1016/0304-3975(79)90044-6.
[62] M. Vyalyi. QMA=PP implies that PP con-
tains PH. In ECCCTR: Electronic Colloquium
on Computational Complexity, technical reports,
2003. URL https://eccc.weizmann.ac.il/
report/2003/021/.
[63] R. Williams. A new algorithm for optimal 2-
constraint satisfaction and its implications. The-
oretical Computer Science, 348(2):357–365, 2005.
DOI: 10.1016/j.tcs.2005.09.023.
[64] R. Williams and H. Yu. Finding orthogonal vec-
tors in discrete structures. In Proceedings of
the 2014 Annual ACM-SIAM Symposium on Dis-
crete Algorithms, pages 1867–1877. 2014. DOI:
10.1137/1.9781611973402.135.
[65] R. R. Williams. Strong ETH breaks with Merlin
and Arthur: Short non-interactive proofs of batch
evaluation. In 31st Conference on Computational
Complexity (CCC 2016), volume 50, pages 2:1–
2:17, 2016. DOI: 10.4230/LIPIcs.CCC.2016.2.
[66] V. V. Williams. Hardness of easy problems: Basing
hardness on popular conjectures such as the strong
exponential time hypothesis (invited talk). In 10th
International Symposium on Parameterized and
Exact Computation (IPEC 2015), volume 43, pages
17–29, 2015. DOI: 10.4230/LIPIcs.IPEC.2015.17.
[67] A. R. Woods. Unsatisfiable systems of equations,
over a finite field. In Proceedings 39th Annual
Symposium on Foundations of Computer Science
(Cat. No.98CB36280), pages 202–211, 1998. DOI:
10.1109/SFCS.1998.743444.