How many qubits are needed for quantum computational supremacy?
Alexander M. Dalzell¹,², Aram W. Harrow³, Dax Enshan Koh⁴,⁵, and Rolando L. La Placa³
¹Institute for Quantum Information and Matter, California Institute of Technology, Pasadena, CA 91125, USA
²Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
³Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
⁴Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
⁵Zapata Computing, Inc., 100 Federal Street, 20th Floor, Boston, Massachusetts 02110, USA
Quantum computational supremacy arguments, which describe a way for a quantum computer to perform a task that cannot also be done by a classical computer, typically require some sort of computational assumption related to the limitations of classical computation. One common assumption is that the polynomial hierarchy (PH) does not collapse, a stronger version of the statement that P ≠ NP, which leads to the conclusion that any classical simulation of certain families of quantum circuits requires time scaling worse than any polynomial in the size of the circuits. However, the asymptotic nature of this conclusion prevents us from calculating exactly how many qubits these quantum circuits must have for their classical simulation to be intractable on modern classical supercomputers. We refine these quantum computational supremacy arguments and perform such a calculation by imposing fine-grained versions of the non-collapse conjecture. Our first two conjectures, poly3-NSETH(a) and per-int-NSETH(b), take specific classical counting problems related to the number of zeros of a degree-3 polynomial in n variables over F₂ or the permanent of an n × n integer-valued matrix, and assert that any non-deterministic algorithm that solves them requires 2^{cn} time steps, where c ∈ {a, b}. A third conjecture, poly3-ave-SBSETH(a′), asserts a similar statement about average-case algorithms living in the exponential-time version of the complexity class SBP. We analyze evidence for these conjectures and argue that they are plausible when a = 1/2, b = 0.999, and a′ = 1/2.
Alexander M. Dalzell: adalzell@caltech.edu
Aram W. Harrow: aram@mit.edu
Dax Enshan Koh: dax.koh@zapatacomputing.com
Rolando L. La Placa: rlaplaca@mit.edu
Imposing poly3-NSETH(1/2) and per-int-NSETH(0.999), and assuming that the runtime of a hypothetical quantum circuit simulation algorithm would scale linearly with the number of gates/constraints/optical elements, we conclude that Instantaneous Quantum Polynomial-Time (IQP) circuits with 208 qubits and 500 gates, Quantum Approximate Optimization Algorithm (QAOA) circuits with 420 qubits and 500 constraints, and boson sampling circuits (i.e. linear optical networks) with 98 photons and 500 optical elements are large enough for the task of producing samples from their output distributions up to constant multiplicative error to be intractable on current technology. Imposing poly3-ave-SBSETH(1/2), we additionally rule out simulations with constant additive error for IQP and QAOA circuits of the same size. Without the assumption of linearly increasing simulation time, we can make analogous statements for circuits with slightly fewer qubits but requiring 10⁴ to 10⁷ gates.
1 Introduction
Quantum computational supremacy (QCS) is the goal of carrying out a computational task on a quantum computer that cannot be performed by any classical computer [51]. Ingredients of this include choosing an appropriate task, building a quantum device that can perform it, ideally verifying that it was done correctly, and finally using arguments from complexity theory to support the claim that no classical computer can do the same [32]. Recent advances indicate that the experimental ingredient might be available very soon; indeed, Google recently reported [7] that it has attained QCS with a 53-qubit superconducting device (later we comment more on how this announcement fits in with our analysis), but the choice of task, its verification, and its complexity-theoretic justification remain important open theoretical research questions. In particular, based on the current status of complexity theory, establishing limitations on classical computing for the purpose of assessing how close we are to demonstrating QCS requires making conjectures, and thus we are presented with a range of choices. If we make stronger conjectures then we can use a smaller and more restricted quantum computer while ruling out the existence of more powerful classical simulation algorithms. Weaker conjectures, on the other hand, are more defensible and can be based on more widely studied mathematical principles.
A leading example of a strong conjecture is the Quantum Threshold Assumption (QUATH) proposed by Aaronson and Chen [4], which states that there is no efficient (i.e. polynomial-time) classical algorithm that takes as input a description of a random quantum circuit C and decides whether |⟨0ⁿ|C|0ⁿ⟩|² is greater or less than the median of all |⟨0ⁿ|C|0ⁿ⟩|² values, with success probability at least 1/2 + Ω(2⁻ⁿ) over the choice of C. This conjecture gives one of the strongest possible statements about the hardness of simulating quantum circuits that is not already ruled out by known simulations.
A weaker conjecture is the statement that the polynomial hierarchy (PH) does not collapse, which is closely related to the assertion that P ≠ NP. Under this assumption, it has been shown that there cannot exist an efficient classical algorithm to produce samples from the output distribution of certain families of quantum circuits [3, 5, 11, 13, 14, 16, 24, 28, 31, 38, 39, 46, 47, 58], up to constant multiplicative error. The three families we focus on in this work are Instantaneous Quantum Polynomial-time (IQP) circuits [16, 56], Quantum Approximate Optimization Algorithm (QAOA) circuits [24, 25], and boson sampling circuits (i.e. linear optical networks) [3], all of which are among those whose simulation is hard for the PH. Indeed, a key selling point for work in QCS is that it could be based not on the conjectured hardness of a particular quantum circuit family or even quantum mechanics in general, but instead on highly plausible, purely classical computational conjectures, such as the non-collapse of the PH.
However, the non-collapse of the PH is in a sense too weak of a conjecture to be practically useful. The conjecture rules out polynomial-time simulation algorithms for these families of circuits, but does not describe a concrete superpolynomial lower bound. Thus, assuming only the non-collapse of the PH would be consistent with a simulation of an n-qubit quantum system running in time n^{f(n)} for an arbitrarily slowly growing function f(n), say log log log log(n). A stronger conjecture might lead to a requirement that simulation algorithms be exponential time, meaning that there is some constant c for which the runtime is 2^{cn}. Even this, though, is not strong enough; it remains possible that the constant c is sufficiently small that we cannot rule out a scenario where highly parallelized state-of-the-art classical supercomputers, which operate at as many as 10¹⁷–10¹⁸ floating-point operations per second (FLOPS), are able to simulate any circuit that might be experimentally realized in the near term. For example, Neville et al. [50], as well as Clifford and Clifford [21], recently developed classical algorithms that produce samples from the output of boson sampling circuits, the former of which has been shown to simulate n = 30 photons on a standard laptop in just half an hour, contradicting the belief of many that 20 to 30 photons are sufficient to demonstrate a definitive quantum advantage over classical computation. A stronger conjecture that restricts the value of the exponential factor c, a so-called "fine-grained" conjecture, is needed to move forward on assessing the viability of QCS protocols. The framework of fine-grained complexity has gathered much interest in its own right in the last decade (see [66] for a survey), yielding unexpected connections between the fine-grained runtimes of solutions to different problems.
In this work, we examine existing QCS arguments for IQP, QAOA, and boson sampling circuits from a fine-grained perspective. While many previous arguments [3, 16, 24] center on the counting complexity class PP, which can be related to quantum circuits via postselection [1], the fine-graining process runs more smoothly when we instead use the counting class coC=P, which was first utilized in the context of QCS in [27, 28]. The class coC=P is the set of languages for which there exists an efficient classical probabilistic algorithm that accepts with probability exactly 1/2 only on inputs not in the language. It can be related to quantum circuits via non-determinism: coC=P = NQP [26], where NQP, a quantum analogue of NP, is the class of languages for which there exists an efficient quantum circuit that has non-zero acceptance probability only on inputs in the language. Moreover, this equality still holds when we restrict NQP to quantum computations with IQP, QAOA, or boson sampling circuits. Additionally, it is known that if coC=P were to be equal to NP, the PH would collapse to the second level [26, 60]. Thus, by making the assumption that there is a problem in coC=P that does not admit a non-deterministic polynomial-time solution, i.e. coC=P ⊄ NP, we conclude that there does not exist a classical simulation algorithm that samples from the output distribution of IQP or QAOA circuits up to constant multiplicative error, for this would imply NP = NQP = coC=P, contradicting the assumption.
To make a fine-grained version of this statement, we pick a specific coC=P-complete problem related to the number of zeros of degree-3 polynomials over the field F₂, which we call poly3-NONBALANCED, and we assume that poly3-NONBALANCED does not have a non-deterministic algorithm running in fewer than T(n) time steps for an explicit function T(n). We choose T(n) = 2^{an−1} for a fixed constant a and call this conjecture the degree-3 polynomial Non-deterministic Strong Exponential Time Hypothesis (poly3-NSETH(a)). It is clear that poly3-NSETH(a) is false when a > 1 due to the brute-force deterministic counting algorithm that iterates through each of the 2ⁿ possible inputs to the function f. However, a non-trivial algorithm by Lokshtanov, Paturi, Tamaki, Williams and Yu (LPTWY) [42] gives a better-than-brute-force, deterministic algorithm for counting zeros of systems of degree-k polynomials that rules out poly3-NSETH(a) whenever a > 0.9965. It may be possible to improve this constant while keeping the same basic method but, as we discuss in Appendix B, we expect any such improvements to be small. Refuting poly3-NSETH(a) for values of a substantially below 1 would require the development of novel techniques.
Assuming poly3-NSETH(a), in Section 3 we derive
a fine-grained lower bound on the runtime for any
multiplicative-error classical simulation algorithm for
QAOA and IQP circuits with n qubits. In essence,
what we show is that a classical simulation algorithm
that beats our lower bounds could be used as a subrou-
tine to break poly3-NSETH(a). Then, we repeat the
process for boson sampling circuits with n photons by
replacing poly3-NSETH(a) with a similar conjecture we
call per-int-NSETH(b) involving the permanent of n×n
integer-valued matrices. In this case, however, there is
no known algorithm that can rule out any values of b
when b < 1. Accordingly, the lower bound we derive
on the simulation time of boson sampling circuits when
we take b = 0.999 is essentially tight, matching the run-
time of the naive simulation algorithm up to factors
logarithmic in the total runtime. Recently, a similar ap-
proach was applied to obtain lower bounds on the diffi-
culty of computing output probabilities of quantum cir-
cuits based on the SETH conjecture [34]. Our work has
the disadvantage of using a less well-studied and possi-
bly stronger conjecture (poly3-NSETH(a)) but the ad-
vantage of ruling out classical algorithms for sampling,
i.e. for the same tasks performed by the quantum com-
puter. In Section 3.4, we discuss evidence for our con-
jectures poly3-NSETH(a) and per-int-NSETH(b), and
discuss their relationship to other proposed fine-grained
conjectures.
However, realistic near-term quantum devices, which are subject to non-negligible noise, sample from a distribution that differs from the ideal one with constant additive error, not constant multiplicative error, and ruling out additive-error classical simulation algorithms presents additional technical challenges. Previous work ruled out polynomial-time simulation algorithms of this type by conjecturing the non-collapse of the PH and additionally that certain problems we know to be hard for the PH in the worst case are also hard in the average case [3, 14, 15, 17]. In Section 4, we present an argument that also rules out some exponential-time additive-error simulation algorithms for IQP and QAOA circuits based on a fine-grained conjecture we call poly3-ave-SBSETH(a′). Our analysis may be viewed generally as a fine-grained version of previous work; however, our approach has crucial novel elements that go beyond simply being fine-grained. Unlike previous work on additive-error QCS, our analysis bypasses the need for Stockmeyer's theorem (which would incur significant fine-grained overhead) and as a result it is perhaps conceptually simpler. Additionally, in order to give evidence for poly3-ave-SBSETH(a′), which is both fine-grained and average-case, we make several technical observations which could be of independent interest. We prove an average-case black-box query lower bound, and we give a self-reduction from the worst case to a hybrid average/worst case for the problem of computing the number of zeros of a degree-3 polynomial over F₂. Along the way, we also provide a more complete picture of the distribution of the number of zeros for uniformly random degree-3 polynomials over F₂, extending work that showed the distribution anticoncentrates [17] by formally proving that all its moments approach those of a Gaussian distribution as the number of variables in the polynomial increases.
Finally, in Section 5, we show how these lower bounds lead to estimates for the number of qubits that would be sufficient for quantum supremacy under our conjectures. We conclude that classically simulating (up to multiplicative error) general IQP circuits with 93/a qubits, QAOA circuits with 185/a qubits, or boson sampling circuits with 93/b photons would require one century for today's fastest supercomputers, which we consider to be a good measure of intractability. For additive error we may replace a with a′. We believe values for a, a′, and b leading to plausible conjectures are a = 1/2, which is substantially below the best known better-than-brute-force algorithms, a′ = 1/2, which matches the best known algorithms, and b = 0.999, which is roughly equivalent to asserting that the best known brute-force algorithm is optimal up to subexponential factors. The relative factor of two in the number of qubits for QAOA circuits comes from a need for ancilla qubits in constructing a QAOA circuit to solve the poly3-NONBALANCED problem. However, these circuits must have 10⁴ to 10⁷ gates for these bounds to apply. Under the additional assumption that the complexity of any simulation algorithm would scale linearly with the number of gates, we conclude that circuits with only 500 gates and 208 IQP qubits, 420 QAOA qubits, or 98 photons would be sufficient for QCS. By comparison, factoring a 1024-bit integer, which is sufficiently beyond the capabilities of today's classical computers running the best known algorithms, has been estimated to require more than 2000 qubits and on the order of 10¹¹ gates using Shor's algorithm [54].
2 Background
2.1 Counting complexity and quantum compu-
tational supremacy
The computational assumptions underlying our work
and many previous QCS results utilize a relationship be-
tween quantum circuits and counting complexity classes
that is not seen to exist for classical computation. To
understand this relationship, we quickly review several
definitions and key results.
Let n ≥ 1, and let f : {0,1}ⁿ → {0,1} be a Boolean function. The gap of f is defined to be

gap(f) = Σ_{x∈{0,1}ⁿ} (−1)^{f(x)}.   (1)

Note that the number of zeros of f may be written in terms of the gap, as follows:

|{x ∈ {0,1}ⁿ : f(x) = 0}| = (1/2)(2ⁿ + gap(f)).   (2)
Various complexity classes may be defined in terms of the gap. The class #P is defined to be the class of functions f : {0,1}* → ℕ for which there exists a polynomial p and a polynomial-time Turing machine M such that for all x ∈ {0,1}*,

f(x) = |{y ∈ {0,1}^{p(|x|)} : M(x, y) = 0}| = (1/2)(2^{p(|x|)} + gap(M(x, ·))).   (3)

Thus, #P contains functions that count the number of zeros of a polynomial-time computable Boolean function.
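To make these definitions concrete, here is a minimal brute-force sketch (ours; the toy function f is an arbitrary illustrative choice, not an instance from the paper):

```python
# Brute-force illustration of Eqs. (1)-(2): the gap of a Boolean
# function and its zero count. Exponential in n, for intuition only.
from itertools import product

def gap(f, n):
    # gap(f) = sum over x in {0,1}^n of (-1)^f(x)
    return sum((-1) ** f(x) for x in product((0, 1), repeat=n))

def num_zeros(f, n):
    # Eq. (2): |{x : f(x) = 0}| = (2^n + gap(f)) / 2
    return (2 ** n + gap(f, n)) // 2

if __name__ == "__main__":
    n = 3
    f = lambda x: x[0] & x[1] ^ x[2]  # toy Boolean function (AND then XOR)
    assert num_zeros(f, n) == sum(1 for x in product((0, 1), repeat=n) if f(x) == 0)
    print(gap(f, n), num_zeros(f, n))
```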
A language L is in PP if there exists a polynomial p and a polynomial-time Turing machine M such that for all x ∈ {0,1}*,

x ∈ L ⟺ |{y ∈ {0,1}^{p(|x|)} : M(x, y) = 0}| < |{y ∈ {0,1}^{p(|x|)} : M(x, y) = 1}| ⟺ gap(M(x, ·)) < 0.   (4)

The class NP is defined similarly, but where

x ∈ L ⟺ |{y ∈ {0,1}^{p(|x|)} : M(x, y) = 1}| ≠ 0 ⟺ gap(M(x, ·)) ≠ 2^{p(|x|)},   (5)

and the class coC=P, where

x ∈ L ⟺ |{y ∈ {0,1}^{p(|x|)} : M(x, y) = 0}| ≠ |{y ∈ {0,1}^{p(|x|)} : M(x, y) = 1}| ⟺ gap(M(x, ·)) ≠ 0.   (6)
By interpreting M as a probabilistic algorithm and y as the random string of bits used by M, we can redefine NP, PP, and coC=P as the classes of languages for which there exists a polynomial-time Turing machine M whose acceptance probability on input x is non-zero, at least 1/2, and not equal to 1/2, respectively, only when x is in the language.
Of these classes, only NP is known to be part of the polynomial hierarchy (PH), which is a class composed of an infinite number of levels generalizing the notion of NP. Furthermore, the other three classes, #P, PP, and coC=P, which we refer to as counting classes, are known to be hard for the PH: Toda's theorem [59] tells us that a #P or PP oracle is sufficient to solve any problem in the PH in polynomial time, and other work by Toda and Ogiwara [60] shows that there is a randomized reduction from any problem in the PH to a coC=P problem. Stated another way, if PP or coC=P were to be contained in a level of the PH, the PH would necessarily collapse, meaning that the entire PH would be contained within one of its levels. For example, if P = NP, then the entire PH would be equal to P, its zeroth level. The assumption that the PH does not collapse is thus a stronger version of the statement P ≠ NP, and it is widely believed for similar reasons.
Furthermore, these counting classes can be connected to quantum circuits. Aaronson showed that PP = PostBQP [1], where PostBQP is the set of problems solvable by quantum circuits that have the (unphysical) power to choose, or postselect, the values of measurement outcomes that normally would be probabilistic. By contrast, classical circuits endowed with this same power form the class PostBPP, which is known to lie in the third level of the PH [30].
The story is similar for coC=P. It was shown that coC=P = NQP [26], where NQP is the quantum generalization of the class NP, defined to be the set of languages L for which there exists a polynomial-time uniformly generated¹ family of circuits {C_x} such that for all strings x, x is in the language L if and only if the quantum circuit C_x has a non-zero acceptance probability.

¹We say a circuit family {C_x} is uniformly generated if there exists a polynomial-time Turing machine that on input x outputs the description of the circuit C_x.
This can also be thought of as PostBQP with one-sided error. If there existed an efficient classical algorithm to produce samples from the output distribution of quantum circuits up to constant multiplicative error, then NP would be equal to NQP, and therefore to coC=P, leading to the collapse of the PH (to the second level [26–28]).
We refer to simulation algorithms of this type as approximate simulation algorithms with multiplicative error. Stated precisely, if Q(y) is the probability that a quantum circuit produces the output y, then a classical simulation algorithm has multiplicative error ε if its probability of producing outcome y is P(y) and

|P(y) − Q(y)| ≤ ε Q(y)   (7)

for all possible outcomes y.

This contrasts with a simulation algorithm with additive error ε, for which

Σ_y |P(y) − Q(y)| ≤ ε.   (8)
The argument we have sketched only rules out
polynomial-time simulation algorithms with multiplica-
tive error. In Section 4, we discuss arguments
[3, 14, 15, 17] that rule out additive-error simulation
algorithms. These are more complex and require im-
posing additional conjectures.
2.2 IQP Circuits
The previous argument only considers simulation algo-
rithms for arbitrary quantum circuits, but the result can
be extended to also rule out efficient simulation algo-
rithms for subclasses of quantum circuits. An example
of one such subclass is the set of instantaneous quantum
circuits [16, 56]. Problems that can be solved by instan-
taneous quantum circuits with a polynomial number
of gates form the Instantaneous Quantum Polynomial-
time (IQP) complexity class, and we will refer to the
circuits themselves as IQP circuits. There are several
equivalent ways to define the IQP model; we do so as
follows.
An IQP circuit is a circuit in which a Hadamard gate is applied to each qubit at the beginning and end of the computation, but the rest of the gates, which we refer to as the internal gates, are diagonal. Each qubit begins in the |0⟩ state but is immediately sent to the |+⟩ = H|0⟩ = (|0⟩ + |1⟩)/√2 state under the Hadamard operation, and each qubit is measured at the end of the computation in the computational basis. All of the internal diagonal gates commute, and therefore can be implemented in any order. An example of an IQP circuit is shown in Figure 1.
Figure 1: Example of an IQP circuit. Each qubit must begin and end with a Hadamard gate, and all internal gates must be diagonal in the Z basis. The vertical lines indicate controlled operations, and T refers to the gate T = exp(−iπZ/8).
2.3 Quantum approximate optimization algo-
rithm (QAOA) circuits
Another class of circuits that are not efficiently simulable classically if the polynomial hierarchy does not collapse is that of quantum approximate optimization algorithm (QAOA) circuits [24, 25], which have some similarities with IQP circuits. In a sense, QAOA can be thought of as multiple rounds of instantaneous operations.
A QAOA circuit operates on n qubits, which begin in the |0⟩ state but are immediately hit with a Hadamard gate, as in the IQP model (note that [24, 25] chose a different but equivalent convention). An integer p and angles γᵢ, βᵢ for i = 1, 2, ..., p are chosen. A diagonal Hamiltonian C is specified such that C = Σ_α C_α, where each C_α is a constraint on a small subset of the bits, meaning that for any bit string z, either C_α|z⟩ = 0 or C_α|z⟩ = |z⟩, and only a few bits of z are involved in determining which is the case. We define the Hamiltonian B = Σ_{j=1}^{n} X_j, where X_j is the Pauli-X operation applied to qubit j, and let U(H, θ) = exp(−iHθ). The remainder of the circuit consists of applying U(C, γ₁), U(B, β₁), U(C, γ₂), U(B, β₂), etc., for a total of 2p operations. Finally, the qubits are measured in the computational basis. The general framework for a QAOA circuit is depicted in Figure 2.
Since U(C, γ_j) = Π_α U(C_α, γ_j), the gate U(C, γ_j) can be performed as a sequence of commuting gates that perform the unitaries associated with the constraints C_α. Thus each U(C, γ_j) could form the internal portion of an instantaneous quantum circuit.
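To make this framework concrete, the following is a minimal dense-vector sketch of one QAOA evolution; the constraint Hamiltonian (clauses checking that pairs of bits are both 1), the angles, and all parameter values are arbitrary illustrative choices, not values from the paper:

```python
# Minimal sketch of the QAOA framework: U(C, gamma) = exp(-i*gamma*C)
# acts as a diagonal phase, U(B, beta) = exp(-i*beta*B) factors into
# single-qubit X rotations because the X_j commute.
import numpy as np
from scipy.linalg import expm

n, p = 3, 2
gammas, betas = [0.7, 0.3], [0.4, 0.9]      # arbitrary angles
pairs = [(0, 1), (1, 2)]                    # toy constraints: bits i and j both 1

dim = 2 ** n
# bit i of each basis index (most significant bit first)
bits = (np.arange(dim)[:, None] >> np.arange(n)[::-1]) & 1
# Diagonal of C: number of satisfied constraints for each basis state z
C_diag = sum((bits[:, i] & bits[:, j]) for i, j in pairs).astype(float)

X = np.array([[0, 1], [1, 0]], dtype=complex)
def U_B(beta):
    u = expm(-1j * beta * X)                # exp(-i*beta*X) on one qubit
    out = np.array([[1.0 + 0j]])
    for _ in range(n):
        out = np.kron(out, u)
    return out

state = np.full(dim, 1 / np.sqrt(dim), dtype=complex)  # uniform superposition
for gamma, beta in zip(gammas, betas):
    state = np.exp(-1j * gamma * C_diag) * state       # U(C, gamma)
    state = U_B(beta) @ state                          # U(B, beta)
print(np.abs(state) ** 2)                              # output distribution
```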
Importantly, since the operator C is a sum of many constraints, it represents a constraint satisfaction problem. For all bit strings z, C|z⟩ = λ_z|z⟩, and a common problem asks us to find the maximum value of λ_z. There is evidence that QAOA circuits might be able to approximate this optimum value of λ_z more efficiently than classical algorithms when p > 1 [25], so in comparison to IQP circuits, QAOA circuits might have more practical value.

Figure 2: Framework for a QAOA circuit. Each qubit begins with a Hadamard gate, and then 2p operations are performed, alternating between applying Hamiltonian C and applying Hamiltonian B.
2.4 Boson sampling circuits
The IQP and QAOA models are restrictions on the more
general quantum circuit model. But quantum circuits
are not the only model for computation on a quantum
device. Linear quantum optical experiments, for exam-
ple, can be modeled as a system of beam splitters and
phase shifters acting upon identical photons existing in
a certain number of different optical modes. Like the
IQP and QAOA models, the linear optical model is not
believed to be as powerful as the general quantum cir-
cuit model, but under the assumption that the PH does
not collapse, it has been shown that classical simulation
up to constant multiplicative error requires more than
polynomial time [3].
The basic framework [3] for the linear optical model is as follows. Suppose the system has n photons among m modes. A state of the system is a superposition Σ_R α_R |R⟩, where each |R⟩ corresponds to a configuration of the n photons among the m modes, represented by the tuple R = (r₁, ..., r_m), where each rᵢ is a non-negative integer and Σᵢ rᵢ = n.
Passing these photons through a linear optical network composed of beam splitters and phase shifters, which we call a boson sampling circuit, gives rise to a transformation on this Hilbert space. Valid transformations can be written as φ(U), where U is any m × m unitary and φ is a fixed (n+m−1 choose n)-dimensional representation of U(m). The unitary U fully describes the choice of circuit, and any U can be exactly implemented using only m(m+1)/2 total beam splitters and phase shifters [53]. We can define φ(U) by its matrix elements ⟨R|φ(U)|R′⟩, which will be related to the permanents of n × n matrices formed from U. The permanent of an n × n matrix A is given by the formula

Per(A) = Σ_{σ∈S_n} Π_{i=1}^{n} A_{i,σ(i)},   (9)

where S_n is the group of permutations on {1, ..., n}.
Then, the matrix elements are

⟨R|φ(U)|R′⟩ = Per(U_{(R,R′)}) / √(r₁! ··· r_m! r′₁! ··· r′_m!),   (10)

where U_{(R,R′)} is the n × n matrix formed by taking rᵢ copies of row i and r′_j copies of column j from U [3]. As an example, if n = 3, m = 2, R = (2, 1), R′ = (1, 2), and

U = (1/√2) [ 1  i
             i  1 ],   (11)

then

U_{(R,R′)} = (1/√2) [ 1  i  i
                      1  i  i
                      i  1  1 ].   (12)
This sampling task is called BosonSampling since it could (in theory) be applied to a system of any non-interacting bosons, not only photons.
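The example above can be reproduced with a few lines of code; the following sketch (ours, for illustration) builds U_{(R,R′)} by duplicating rows and columns and evaluates Eq. (10) with a naive permanent:

```python
# Sketch reproducing Eqs. (10)-(12): repeat rows/columns of U and
# compute the amplitude <R|phi(U)|R'> from the permanent of the result.
import numpy as np
from itertools import permutations
from math import factorial, prod, sqrt

def permanent(A):
    # naive sum over permutations, Eq. (9); fine for tiny matrices
    n = A.shape[0]
    return sum(prod(A[i, s[i]] for i in range(n)) for s in permutations(range(n)))

def submatrix(U, R, Rp):
    rows = [i for i, r in enumerate(R) for _ in range(r)]
    cols = [j for j, r in enumerate(Rp) for _ in range(r)]
    return U[np.ix_(rows, cols)]

def amplitude(U, R, Rp):
    norm = sqrt(prod(factorial(r) for r in R) * prod(factorial(r) for r in Rp))
    return permanent(submatrix(U, R, Rp)) / norm

# The worked example from the text: n = 3 photons, m = 2 modes
U = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)
R, Rp = (2, 1), (1, 2)
print(amplitude(U, R, Rp))
```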
3 Simulation algorithms with multi-
plicative error
The goal of this section will be to perform a fine-
grained analysis that rules out certain multiplicative-
error classical simulation algorithms for the three pre-
viously mentioned families of quantum circuits.
3.1 Degree-3 polynomials and the problem
poly3-NONBALANCED
Each of the three quantum circuit families that we have defined is especially amenable to this analysis due to its natural connection to specific counting problems.

The specific counting problem we will use for our analysis of IQP and QAOA is called poly3-NONBALANCED. The input to the problem is a polynomial over the field F₂ in n variables with degree at most 3 and no constant term. Since the only non-zero element in F₂ is 1, every term in the polynomial has coefficient 1. One example could be f(z) = z₁ + z₂ + z₁z₂ + z₁z₂z₃. Evaluating f for a given string z to determine whether f(z) = 0 or f(z) = 1 can be done efficiently, but since there are 2ⁿ possible strings z, the brute-force method takes exponential time to count the number of strings z for which f(z) = 0, or equivalently, to compute gap(f), where gap is given by Eq. (1). LPTWY [42] gave a deterministic algorithm for computing the gap of degree-3 polynomials in time scaling slightly better than brute force, but it still has exponential runtime poly(n)2^{0.9965n}.

The question posed by poly3-NONBALANCED is whether gap(f) ≠ 0, that is, whether f does not have the same number of 0 and 1 outputs. Thus, poly3-NONBALANCED is in the class coC=P.
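For concreteness, here is a brute-force sketch (ours) that decides poly3-NONBALANCED for the small example above; as the text notes, this approach takes time exponential in n:

```python
# Brute-force poly3-NONBALANCED: a degree-3 polynomial over F_2 is
# given as a list of terms (tuples of 0-indexed variable indices).
from itertools import product

def poly_eval(terms, z):
    # f(z) = sum of monomials mod 2
    return sum(all(z[i] for i in t) for t in terms) % 2

def gap(terms, n):
    return sum((-1) ** poly_eval(terms, z) for z in product((0, 1), repeat=n))

# f(z) = z1 + z2 + z1*z2 + z1*z2*z3, the example from the text
terms = [(0,), (1,), (0, 1), (0, 1, 2)]
n = 3
print(gap(terms, n), gap(terms, n) != 0)  # gap = -2, so f is not balanced
```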
The problem poly3-NONBALANCED is a natural problem to work with because there is an elegant correspondence between degree-3 polynomials and IQP circuits involving Pauli Z gates, controlled-Z (CZ) gates, and controlled-controlled-Z (CCZ) gates [43]. Specifically, if f is degree 3, then let

U′_f = Σ_{z∈F₂ⁿ} (−1)^{f(z)} |z⟩⟨z|   (13)

and let U_f = H^{⊗n} U′_f H^{⊗n}. We can implement an IQP circuit C_f that evaluates to U_f as follows: if the term zᵢ appears in f, then within the diagonal portion of C_f we perform the gate Z on qubit i; if the term zᵢz_j appears, we perform the CZ gate between qubits i and j; and if the term zᵢz_jz_k appears, we perform the CCZ gate between the three qubits. For example, for the polynomial f(z) = z₁ + z₂ + z₁z₂ + z₁z₂z₃, the circuit C_f is shown in Figure 3.
Figure 3: IQP circuit C_f corresponding to the degree-3 polynomial f(z) = z₁ + z₂ + z₁z₂ + z₁z₂z₃. The unitary U_f implemented by the circuit has the property that ⟨0̄|U_f|0̄⟩ = gap(f)/2ⁿ, where in this case n = 3.
The crucial property of this correspondence is that ⟨0̄|U_f|0̄⟩ = gap(f)/2ⁿ, where |0̄⟩ is shorthand for the starting |0⟩^{⊗n} state. This is easily seen by noting that the initial set of H gates generates the equal superposition state |B⟩ = Σ_{x=0}^{2ⁿ−1} |x⟩/√(2ⁿ), so ⟨0̄|U_f|0̄⟩ = ⟨B|U′_f|B⟩, where U′_f is implemented by the internal diagonal portion of C_f. Since U′_f applies a (−1) phase to states |z⟩ for which f(z) = 1,

⟨0̄|U_f|0̄⟩ = Σ_{y=0}^{2ⁿ−1} Σ_{x=0}^{2ⁿ−1} (−1)^{f(x)} ⟨y|x⟩/2ⁿ = Σ_{x=0}^{2ⁿ−1} (−1)^{f(x)}/2ⁿ = gap(f)/2ⁿ.

Thus, gap(f) can be computed by calculating the amplitude of the |0̄⟩ state produced by the circuit. If we define acceptance to occur when |0̄⟩ is obtained upon measurement, then the circuit C_f has non-zero acceptance probability only when gap(f) ≠ 0. This illustrates an explicit NQP algorithm for poly3-NONBALANCED, which was guaranteed to exist since NQP = coC=P.
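This identity is easy to verify numerically for the example above; the following sketch (ours) builds U′_f and U_f directly and checks ⟨0̄|U_f|0̄⟩ = gap(f)/2ⁿ:

```python
# Numerical check that <0...0|U_f|0...0> = gap(f)/2^n for the example
# polynomial f(z) = z1 + z2 + z1*z2 + z1*z2*z3 (0-indexed below).
import numpy as np
from itertools import product

terms = [(0,), (1,), (0, 1), (0, 1, 2)]
n = 3
dim = 2 ** n

# Diagonal part U'_f = sum_z (-1)^f(z) |z><z|, Eq. (13)
f_vals = [sum(all(z[i] for i in t) for t in terms) % 2
          for z in product((0, 1), repeat=n)]
U_diag = np.diag([(-1.0) ** v for v in f_vals])

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)                      # H tensored n times

U_f = Hn @ U_diag @ Hn                       # U_f = H^{(x)n} U'_f H^{(x)n}
gap_f = sum((-1) ** v for v in f_vals)       # gap(f) = -2 here
assert np.isclose(U_f[0, 0].real, gap_f / dim)
print(U_f[0, 0], gap_f / dim)
```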
Also crucial to note is that poly3-NONBALANCED is complete for the class coC=P. This is shown by adapting Montanaro's proof [43] that computing gap(f) for a degree-3 polynomial f over F₂ is #P-complete. In that proof, Montanaro reduces from the problem of computing gap(g) for an arbitrary Boolean function g, which is #P-complete by definition. Since deciding whether gap(g) ≠ 0 is coC=P-complete by definition, and the reduction has gap(g) ≠ 0 if and only if gap(f) ≠ 0, this also shows that poly3-NONBALANCED is coC=P-complete. One immediate consequence of this fact is that NIQP, the class NQP restricted to quantum circuits of the IQP type, is equal to coC=P (and hence NQP), since the circuit C_f is an NIQP solution to a coC=P-complete problem.
3.2 The permanent and the problem
per-int-NONZERO
In close analogy to the correspondence between degree-3
polynomials and IQP circuits composed of Z, CZ, and
CCZ gates, there is a correspondence between matrix
permanents and boson sampling circuits.
We have already seen in the definition of the linear
optical model that any amplitude in a boson sampling
circuit on n photons can be recast as the permanent
of an n × n matrix, but the converse is also true: the
permanent of any n ×n matrix can be encoded into the
amplitude of a boson sampling circuit on n photons, up
to a known constant of proportionality.
To see how this works, given an n × n complex matrix A, we will construct a 2n × 2n unitary matrix U_A whose upper-left n × n block is equal to cA for some c > 0. If we take R = R′ = (1ⁿ, 0ⁿ) (i.e. 1 repeated n times, followed by 0 repeated n times), then we will have Per(U_{A(R,R′)}) = cⁿ Per(A). Thus Per(A) is proportional to a particular boson sampling amplitude, with c an easily computable proportionality constant.

We can choose c to be ‖A‖⁻¹, where ‖A‖ is the largest singular value of A. (Note that if we want the proportionality to hold uniformly across some class of A, we should choose c to satisfy c‖A‖ ≤ 1 for all A in this class.) Then {cA, √(I_n − c²A†A)} are Kraus operators for a valid quantum operation, where I_n is the n × n identity matrix, and

[ cA
  √(I_n − c²A†A) ]   (14)
is an isometry. We can extend this isometry to the following unitary:

U_A = [ cA               D
        √(I_n − c²A†A)   −(I_n − c²A†A)^{−1/2} cA†D ],   (15)

where D = (I_n + c²A(I_n − c²A†A)^{−1}A†)^{−1/2}, which is well defined since the argument of the inverse square root is positive definite and Hermitian. Thus the permanent of an arbitrary n × n matrix can be encoded into a boson sampling circuit with n photons and 2n modes.
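The construction can be checked numerically; the sketch below (ours, with an arbitrary random integer matrix) builds U_A, verifies unitarity, and confirms Per(U_{A(R,R′)}) = cⁿ Per(A) for R = R′ = (1ⁿ, 0ⁿ):

```python
# Sketch of the unitary dilation in Eq. (15). We use the equivalent
# closed form D = sqrt(I - c^2 A A^dag) and bottom-right block -c A^dag,
# which equals -(I - c^2 A^dag A)^(-1/2) c A^dag D.
import numpy as np
from scipy.linalg import sqrtm
from itertools import permutations
from math import prod

def permanent(M):
    k = M.shape[0]
    return sum(prod(M[i, s[i]] for i in range(k)) for s in permutations(range(k)))

rng = np.random.default_rng(0)
n = 3
A = rng.integers(-3, 4, size=(n, n)).astype(complex)  # integer-valued matrix
c = 1 / np.linalg.norm(A, 2)                          # 1 / largest singular value
K = c * A
I = np.eye(n)

U_A = np.block([
    [K,                          sqrtm(I - K @ K.conj().T)],
    [sqrtm(I - K.conj().T @ K), -K.conj().T              ],
])

assert np.allclose(U_A @ U_A.conj().T, np.eye(2 * n), atol=1e-6)  # unitary
# R = R' = (1^n, 0^n) selects the top-left block, which is exactly cA
assert np.isclose(permanent(U_A[:n, :n]), c ** n * permanent(A))
```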
The matrix permanent is playing the role for boson
sampling circuits that the gap of degree-3 polynomials
played for IQP circuits with Z, CZ, and CCZ gates;
thus, it is natural to use the computational problem of
determining if the permanent of an integer-valued ma-
trix is not equal to 0, which we call per-int-NONZERO,
in place of poly3-NONBALANCED.
In fact, per-int-NONZERO and poly3-NONBALANCED have several similarities. For example, like computing the number of zeros of a degree-3 polynomial, computing the permanent of an integer-valued matrix is #P-complete, a fact famously first demonstrated by Valiant [61], and later reproved by Aaronson [2] using the linear optical framework. This completeness extends to per-int-NONZERO, which we show in Appendix A is coC=P-complete by reduction from poly3-NONBALANCED.
Additionally, for both problems, the best known algorithm is exponential and has runtime close to or equaling 2ⁿ. While poly3-NONBALANCED can be solved in poly(n)2^{0.9965n} time, the best known algorithm for computing the permanent [12] requires 2^{n−Ω(√(n/log log n))} deterministic time, which is only a subexponential improvement over the naive algorithm that utilizes Ryser's formula for the permanent [55] and requires at least n2ⁿ basic arithmetic operations. Using Ryser's formula is an improvement over the O(n!) time steps implied by Eq. (9), but its scaling is reminiscent of that required to solve a #P problem by brute force. In principle it is possible that a faster algorithm exists for per-int-NONZERO, where we do not care about the actual value of the permanent, only whether it is nonzero, but such methods are only known in special cases, such as nonnegative matrices.
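For concreteness, here is a short sketch (ours) of Ryser's formula, which evaluates the permanent with O(n2ⁿ) arithmetic operations by inclusion-exclusion over column subsets:

```python
# Ryser's formula:
#   Per(A) = (-1)^n * sum over nonempty column subsets S of
#            (-1)^{|S|} * prod_i (sum_{j in S} A_ij)
import numpy as np

def permanent_ryser(A):
    n = A.shape[0]
    total = 0.0
    for mask in range(1, 2 ** n):                 # nonempty column subsets
        cols = [j for j in range(n) if (mask >> j) & 1]
        row_sums = A[:, cols].sum(axis=1)
        total += (-1) ** len(cols) * np.prod(row_sums)
    return (-1) ** n * total

A = np.arange(1, 10).reshape(3, 3).astype(float)  # [[1,2,3],[4,5,6],[7,8,9]]
print(permanent_ryser(A))                         # 450.0 for this matrix
```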
Crucially, our construction shows that boson sampling circuits can solve per-int-NONZERO in non-deterministic polynomial time, since given A we have shown how to construct a circuit corresponding to the unitary U_A with acceptance probability that is non-zero only when Per(A) is non-zero. This shows that NBosonP, the linear optical analogue of NIQP, is equal to coC=P and, by extension, to NQP.
3.3 Lower bounds for multiplicative-error simu-
lations
3.3.1 For IQP Circuits
In the previous section, we described how to construct an n-qubit IQP circuit C_f corresponding to a degree-3 polynomial f over n variables such that the acceptance probability of C_f is non-zero if and only if gap(f) ≠ 0. The number of terms in f, and hence the number of internal diagonal gates in C_f, is at most

g₁(n) = (n choose 3) + (n choose 2) + (n choose 1) = (n³ + 5n)/6.   (16)

Now, suppose we had a classical algorithm that, for any q, produces samples from the output distribution of any IQP circuit with q qubits and g₁(q) internal gates, up to some multiplicative error constant, in s₁(q) time steps for some function s₁. Throughout, we will assume all classical algorithms run in the Word RAM model of computation.
Using this algorithm to simulate the IQP circuit C_f generates a non-deterministic classical algorithm for poly3-NONBALANCED running in s₁(n) time steps. That is, the classical probabilistic algorithm that results from this simulation accepts on at least one computational path if and only if the function f is not balanced.

Now, we impose a fine-grained version of the non-collapse assumption, which we motivate later in the section.
Conjecture 1. [poly3-NSETH(a)]
Any non-deterministic classical algorithm (in the Word RAM model of computation) that solves poly3-NONBALANCED requires in the worst case 2^{an−1} time steps, where n is the number of variables in the poly3-NONBALANCED instance.
In the Word RAM model with word size w, memory is infinite and basic arithmetic operations on words of length w take one time step. For concreteness, we assume that w = log₂(N), where N is the length of the input encoding the degree-3 polynomial (N = O(g₁(n) log₂(n))). This way the words can index the locations where the input data is stored. The Word RAM model has previously been used for fine-grained analyses [66] and aims to represent how a real computer operates as faithfully as possible.
Our conjecture immediately yields a lower bound on the simulation function s₁:

s₁(n) ≥ 2^{an−1}.   (17)

This lower bound result relies on poly3-NSETH(a), which we have not yet motivated. In particular, for our qubit calculations we will take the specific value of a = 1/2. This value is comfortably below the best known limit a < 0.9965 from [42], whose algorithm is reproduced in Appendix B. In Section 3.4, we attempt to provide additional motivation for poly3-NSETH(1/2) by showing its consistency with other fine-grained conjectures.
To our knowledge, the best known upper bound on s₁(n) comes from the naive poly(n)2ⁿ-time "Schrödinger-style" simulation algorithm that updates each of the 2ⁿ amplitudes describing the state vector after each gate is performed, so this lower bound is not tight.
3.3.2 For QAOA circuits
To perform the same analysis for QAOA circuits, we
will turn the IQP circuit C
f
into a QAOA circuit. The
modifications required are straightforward. We set p,
the number of rounds of QAOA computation, equal to
1, and we let rotation angles γ = π/2 and β = π/4.
The first layer of Hadamard gates in C
f
is already built
into the QAOA framework. To implement the Z, CZ,
and CCZ gates we write Z = exp(i2γ |1ih1|), CZ =
exp(i2γ |11ih11|), and CCZ = exp(i2γ |111ih111|)
and build our constraint Hamiltonian C accordingly:
for each Z gate we add two copies of the constraint
that is satisfied only when the bit acted upon is 1; for
each CZ gate we add two copies of the constraint that
is satisfied when both bits involved are 1; and for each
CCZ gate we add two copies of the constraint that is
satisfied when all three bits involved are 1. Now, the
operation exp(C) has exactly the effect of all the Z,
CZ, and CCZ gates combined.
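As a sanity check on this bookkeeping (under the conventions U(H, θ) = exp(−iHθ) and γ = −π/2 used above), one can verify numerically that two copies of each constraint reproduce the Z and CZ gates:

```python
# Two copies of a |1><1|-type constraint, evolved under U(C, gamma)
# with gamma = -pi/2, give exp(-i*2*gamma*P) = Z (and likewise CZ).
import numpy as np

gamma = -np.pi / 2

def U_constraint(diag_projector, copies):
    # U(C, gamma) for C = copies * projector, U(H, theta) = exp(-i*H*theta)
    return np.diag(np.exp(-1j * gamma * copies * diag_projector))

Z = U_constraint(np.array([0, 1]), 2)         # one-qubit constraint, 2 copies
CZ = U_constraint(np.array([0, 0, 0, 1]), 2)  # two-qubit constraint, 2 copies
assert np.allclose(Z, np.diag([1, -1]))
assert np.allclose(CZ, np.diag([1, 1, 1, -1]))
```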
The final step is to implement the final column of H gates, which is not built into the QAOA framework. First we define

H̃ = (1/√2) [ 1  i
             i  1 ] = exp(i(π/4)X) = U(X, −π/4).   (18)

Note that H̃ = H exp(i(π/4)Z) H, which can be rewritten as H = exp(i(π/2)|1⟩⟨1|) H H̃ (up to an unimportant global phase). Thus, we can replace the H gates in the final column of the circuit C_f with the right-hand side of the previous expression, the first factor of which can be performed by adding one copy of the |1⟩⟨1| constraint to C. As described in [24], the second factor (the H gate) can be implemented by introducing an ancilla qubit and four new constraints between the original qubit and the ancilla. The original qubit is measured, and if outcome |0⟩ is obtained, the state of the ancilla is H applied to the input state on the original qubit. Thus we have teleported the H gate onto the ancilla qubit within the QAOA framework. This is described in full in [24], and we reproduce the gadget in Figure 4.
Figure 4: Gadget that uses an ancilla qubit to implement the H gate within the QAOA framework. Here the gate Q is the diagonal two-qubit gate diag(1, −i, 1, i), which can be written as exp(i(π/2)(3|01⟩⟨01| + |11⟩⟨11|)). Thus, it can be implemented by adding 4 constraints to the constraint Hamiltonian C. The H̃ gate is implemented by applying the Hamiltonian B with β = −π/4.
After replacing each H gate with the gadget from Figure 4, every qubit begins with an H gate, is acted upon by exp(−iγC), and ends with a H̃ gate, which is implemented by the exp(−iβB) step of the QAOA framework. Thus, the resulting circuit is a QAOA circuit.
We had to introduce one ancilla per qubit in C_f, so our QAOA circuit has 2n qubits instead of just n. However, it is still true that ⟨0̄|V_f|0̄⟩ ∝ gap(f), where V_f is now the unitary implemented by this new QAOA circuit and |0̄⟩ is the state |0⟩^{⊗2n}. Hence the acceptance probability is non-zero if and only if f is not balanced.
The circuit requires 2 constraints per term in the polynomial f, and an additional 5 constraints per qubit for the Hadamard gates at the end of the computation (1 from introducing H̃ and 4 from the gadget in Figure 4). This yields at most

g₂(2n) = 2g₁(n) + 5n = (n³ + 20n)/3   (19)

constraints.
As in the IQP case, we suppose a classical simulation algorithm produces samples from the output distribution of QAOA circuits with q qubits and g₂(q) constraints, up to some multiplicative error constant, in time s₂(q). Then, under the same conjecture poly3-NSETH(a), we have

s₂(2n) ≥ 2^{an−1},   (20)

which simplifies to

s₂(n) ≥ 2^{an/2−1}.   (21)
The exponentiality of this lower bound is weaker by a factor of two in comparison to the lower bound for IQP circuits in Eq. (17), due to the fact that one ancilla was introduced per variable to turn the circuit C_f into a QAOA circuit. However, the best known upper bound for QAOA simulation is the naive poly(n)2ⁿ brute-force algorithm, as was the case for IQP circuits. This indicates that one might be able to eliminate the factor of two by replacing poly3-NONBALANCED with another problem. Such a problem should be solvable by a QAOA circuit but not by non-deterministic algorithms running much faster than brute force. We leave this for future work.
3.3.3 For boson sampling circuits
The story for boson sampling circuits is nearly identical, except using a conjecture related to the problem per-int-NONZERO instead of poly3-NONBALANCED.

Given an integer-valued n × n matrix A, we showed previously how to construct a boson sampling circuit with n photons, described by the unitary U_A, that has non-zero acceptance probability only when Per(A) ≠ 0. This circuit has 2n modes, and hence requires at most

g₃(n) = (2n)(2n + 1)/2 = 2n² + n   (22)

circuit elements, i.e. beam splitters and phase shifters.

Paralleling our IQP and QAOA analysis, we suppose we have a classical algorithm that produces samples from the output distribution of a boson sampling circuit with q photons and g₃(q) total beam splitters and phase shifters, up to some multiplicative error constant, in s₃(q) time steps for some function s₃.

Using this algorithm to simulate the boson sampling circuit described by U_A generates a non-deterministic algorithm for per-int-NONZERO running in s₃(n) time steps.
We replace poly3-NSETH(a) with the analogous conjecture for per-int-NONZERO.

Conjecture 2. [per-int-NSETH(b)] Any non-deterministic classical algorithm (in the Word RAM model of computation) that solves per-int-NONZERO requires in the worst case 2^{bn−1} time steps, where n is the number of rows in the per-int-NONZERO instance.

Unlike for poly3-NSETH(a), as far as we are aware there is no known better-than-brute-force algorithm ruling out the conjecture for any value b < 1. The algorithm in [12], which is better than brute force by subexponential factors, rules out b = 1.
This conjecture implies a lower bound on the simulation function:

s₃(n) ≥ 2^{bn−1}.   (23)
Producing samples from the output of boson sampling circuits naively requires one to compute the permanent for many of the amplitudes. However, in the case of a binary output, where acceptance is defined to correspond to exactly one photon configuration, only one permanent need be calculated: the one associated with the accepting configuration. Thus the asymptotic scaling of this lower bound when b = 1 − δ is essentially tight with naive simulation methods as δ → 0, since Ryser's formula can be used to evaluate the permanent and simulate a boson sampling circuit in O(n2ⁿ) time steps.
3.4 Evidence for conjectures
Where previous quantum computational supremacy ar-
guments only ruled out simulation algorithms with
polynomial runtime, our analysis also rules out some al-
gorithms with exponential runtime. These conclusions
come at the expense of imposing stronger, fine-grained
conjectures, but such assumptions are necessary for ex-
tracting the fine-grained lower bounds we seek.
Thus, our conjectures are necessarily less plausible
than the statement that the PH does not collapse, and
definitively proving our conjectures is impossible with-
out simultaneously settling major open problems in
complexity theory. However, we can give evidence for
these conjectures by thinking about how one might try
to refute them, and showing how they fit into the land-
scape of previously proposed fine-grained conjectures.
We start with poly3-NSETH(a) and discuss why cer-
tain techniques for refuting it cannot work, how cur-
rent techniques fall short of refuting it for values of a
significantly lower than 1, and why we should expect
that completely different techniques would be needed
to produce algorithms that rule out a < 1/2. Then, we
discuss how poly3-NSETH(a) fits in consistently with
other results in fine-grained complexity theory. Finally,
we discuss how per-int-NSETH(b) is similar and differ-
ent in these regards.
The conjecture poly3-NSETH(a) asserts that deter-
mining whether a Boolean function is balanced takes
non-deterministic exponential time, where that Boolean
function takes the form of a degree-3 polynomial. It is
worth noting that we can prove this conjecture with
a = 1 for Boolean functions in the black-box setting,
where the non-deterministic algorithm can only inter-
act with the Boolean function by querying its value on
certain inputs.
Theorem 1. Let f : {0,1}ⁿ → {0,1} be a Boolean function. A non-deterministic algorithm with black-box access to f that accepts iff |{x : f(x) = 0}| ≠ 2^{n−1}, that is, iff f is not balanced, must make at least 2^{n−1} + 1 queries to f. Moreover, this bound is optimal.
Proof. First we prove the lower bound on the number of queries. Suppose M is a non-deterministic algorithm with black-box access to f that accepts whenever f is not balanced. Let f₀ be a Boolean function that is not balanced; thus, at least one computation path of M accepts if f = f₀. Choose one such path and let S ⊆ {0,1}ⁿ be the set of queries made by M on this computation path. Suppose for contradiction that |S| ≤ 2^{n−1}. Since at most half the possible inputs are in S, it is possible to construct another Boolean function f₁ that is balanced and agrees with f₀ on the set S. Since f₀ and f₁ agree on S, the computation path that accepted when f = f₀ will proceed identically and accept when f = f₁. Thus M accepts when f = f₁, which is balanced, yielding a contradiction. We conclude that |S| ≥ 2^{n−1} + 1.
We can see that it is possible for M to achieve this bound as follows: M non-deterministically chooses 2^{n−1} + 1 of the 2ⁿ possible inputs to f and queries f on these inputs. If all of the queries yield the same value, it accepts. Otherwise, it rejects. If f is balanced, M will reject no matter which set of queries it makes, whereas if f is not balanced, there is at least one set of 2^{n−1} + 1 inputs on which f takes the same value and M will accept, so the algorithm succeeds.
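The optimal strategy in the proof can be checked exhaustively for small n; the following sketch (ours, for illustration) simulates the non-deterministic choice by iterating over all query sets of size 2^{n−1} + 1:

```python
# Exhaustive check of the matching upper bound in Theorem 1 for tiny n.
from itertools import combinations, product

n = 3
N = 2 ** n

def nondet_accepts(fvals):
    # fvals: tuple of f's outputs on all 2^n inputs. M accepts iff some
    # query set of size 2^(n-1)+1 sees all-equal values.
    return any(len(set(fvals[i] for i in S)) == 1
               for S in combinations(range(N), N // 2 + 1))

# check M's answer against the truth for every Boolean function on n bits
for fvals in product((0, 1), repeat=N):
    not_balanced = sum(fvals) != N // 2
    assert nondet_accepts(fvals) == not_balanced
```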
Theorem 1 shows that the poly3-NSETH(1) conjecture cannot be disproved using an algorithm that simply evaluates the degree-3 polynomial f on different inputs. Indeed, the algorithm by LPTWY [42] exploits the fact that the Boolean functions are degree-3 polynomials in order to refute poly3-NSETH(a) for a > 0.9965. Refuting poly3-NSETH(a) for even smaller values of a would require additional techniques that further utilize the structure associated with the poly3-NONBALANCED problem.
In fact, the algorithm in [42] is substantially more general than what is necessary for our purposes; their deterministic algorithm counts the number of solutions to a system of m degree-k polynomial equations over a finite field F_q. The problem poly3-NONBALANCED is concerned only with the case where m = 1, k = 3, q = 2, and all that matters is whether the number of zeros is equal to half the possible inputs. For this special case, the algorithm is considerably simpler, and we reproduce it in Appendix B. The basic technique is as follows: we fix some fraction (1 − δ) of the n variables and call R the number of zeros of f consistent with those fixed values. We can compute in time O(2^{0.15n+0.85δn}) a representation of R as a polynomial with integer coefficients over the (1 − δ)n fixed variables. Then, (even though R has an exponential number of monomials in its representation) it is noted that one can evaluate R for all 2^{(1−δ)n} possible inputs in total time O(2^{(1−δ)n}), as long as δ < 0.0035. By evaluating and summing R on all of its inputs, we compute the total number of zeros, and the total runtime is O(2^{(1−δ)n}), which is better than brute force when we choose δ positive.
Note that this algorithm is deterministic, and giving it the power of non-determinism can only make it faster. However, by inspection of the algorithm from [42], we see no clear way for non-determinism to be directly utilized to further accelerate the algorithm. This is consistent with the finding in Theorem 1 that (asymptotically speaking) the best non-deterministic algorithms are no faster than the best deterministic algorithms for the NONBALANCED problem in the black-box setting. However, it is worth mentioning that a gap between the best known deterministic and non-deterministic algorithms has been observed for certain coNP-hard problems, for example in [67], where the problem of determining the unsatisfiability of a system of m degree-2 polynomials in n variables over F₂ is shown to be possible in poly(n)2^{n/2} non-deterministic time, an improvement over the best known O(2^{0.8765n}) deterministic solution from LPTWY [42]. Additionally, when randomness is added to non-determinism, yielding what is known as a Merlin-Arthur (MA) protocol, similar results can be shown even for #P-hard problems like computing the permanent of n × n matrices, which is possible in poly(n)2^{n/2} MA time [65] compared to O(2ⁿ) deterministic time. These results cast some doubt on the assumption that non-determinism is a useless resource for solving poly3-NONBALANCED or per-int-NONZERO, although notably none of these methods appear to break the Ω(2^{n/2}) barrier.
Additionally, we mention that the authors of LPTWY [42] were concerned primarily with showing that better-than-brute-force algorithms were possible, perhaps leaving room for optimization of their constants. In our reproduction of their algorithm when m = 1, k = 3, and q = 2 in Appendix B, we have followed their analysis and optimized the constants where possible, yielding a slightly better runtime than what is stated explicitly in their paper.
The conclusion is that techniques exist to rule out poly3-NSETH(1) but not for values of a much lower than 1, even after some attempt at optimization. Moreover, we now provide evidence that drastically different techniques would need to be used if one wished to rule out poly3-NSETH(1/2); that is, ruling out poly3-NSETH(a) when a < 1/2 could not be done by making only slight modifications or improvements using the same approach from [42]. Our reasoning stems from the tradeoff between the two contributions to the runtime of the algorithm: first, the computation of the polynomial representation for R; and second, the evaluation of R for all 2^{(1−δ)n} possible inputs. When δ is smaller than 0.0035, the second contribution dominates for a total runtime of O(2^{(1−δ)n}). However, if this step were to be improved to allow δ to exceed 1/2, the first contribution to the runtime would begin to dominate, for a total runtime of O(2^{0.15n+0.85δn}) > O(2^{n/2}). In other words, if we try to fix fewer than half of the variables, computing the representation of R (which involves cycling through the 2^{δn} strings of unfixed variables) will necessarily take longer than evaluating R, and ultimately it will be impossible to produce an algorithm with runtime below 2^{n/2} through this method. While this is no proof, it increases the plausibility of poly3-NSETH(1/2).
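To make the barrier quantitative, one can balance the two exponents quoted above (a back-of-the-envelope calculation, not an analysis from the paper):

```latex
\[
  \underbrace{O\!\left(2^{0.15n + 0.85\delta n}\right)}_{\text{build } R}
  \quad\text{vs.}\quad
  \underbrace{O\!\left(2^{(1-\delta)n}\right)}_{\text{evaluate } R},
  \qquad
  0.15 + 0.85\,\delta = 1 - \delta
  \;\Longrightarrow\;
  \delta = \tfrac{0.85}{1.85} \approx 0.46.
\]
```

Even if the evaluation step could be pushed all the way to this balance point, the overall exponent would be (1 − 0.46)n ≈ 0.54n, still above n/2, in agreement with the discussion above.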
Next we discuss how poly3-NSETH(a) contrasts with previously proposed fine-grained conjectures. Well-known conjectures include the Exponential Time Hypothesis (ETH), which claims that there exists some c such that no O(2^{cn}) time algorithm for k-SAT exists, and the Strong Exponential-Time Hypothesis (SETH) [19, 35], which states that for any ε one can choose k large enough such that there is no O(2^{(1−ε)n}) algorithm for k-SAT. In other words, SETH states that no algorithm for k-SAT does substantially better than the naive brute-force algorithm when k is unbounded.

There is substantial evidence for ETH and SETH, even beyond the fact that decades of research on the SAT problem have failed to refute them. For instance, SETH implies fine-grained lower bounds on problems in P that match long-established upper bounds. One example is the orthogonal vectors (OV) problem, which asks if a set of n vectors has a pair that is orthogonal. There is a brute-force O(n²) solution to OV, but O(n^{2−ε}) is impossible for any ε > 0 assuming SETH [63, 64]. Thus, SETH being true would provide a satisfying rationale for why attempts to find faster algorithms for problems like OV have failed. On the other hand, the refutation of SETH would imply the existence of novel circuit lower bounds [36].
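For reference, the brute-force OV algorithm just mentioned is simply a pairwise check; the sketch below (our illustration) runs in O(n²d) time for n vectors of dimension d:

```python
# Brute-force orthogonal vectors: check every pair for a zero inner
# product. SETH implies no O(n^(2-eps)) algorithm exists.
from itertools import combinations

def has_orthogonal_pair(vectors):
    # vectors: list of 0/1 tuples of equal dimension d
    return any(all(a * b == 0 for a, b in zip(u, v))
               for u, v in combinations(vectors, 2))

print(has_orthogonal_pair([(1, 0, 1), (0, 1, 1), (0, 1, 0)]))  # True: v1 and v3
```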
There are yet more fine-grained conjectures: replac-
ing the problem k-SAT with #k-SAT yields #ETH and
#SETH, the counting versions of ETH and SETH.
These