Magic State Distillation: Not as Costly as You Think
Daniel Litinski @ Dahlem Center for Complex Quantum Systems, Freie Universit
¨
at Berlin, Arnimallee 14, 14195 Berlin, Germany
Despite significant overhead reductions since
its first proposal, magic state distillation is often
considered to be a very costly procedure that
dominates the resource cost of fault-tolerant
quantum computers. The goal of this work is
to demonstrate that this is not true. By writ-
ing distillation circuits in a form that separates
qubits that are capable of error detection from
those that are not, most logical qubits used for
distillation can be encoded at a very low code
distance. This significantly reduces the space-
time cost of distillation, as well as the number
of qubits. In extreme cases, it can cost less to
distill a magic state than to perform a logical
Clifford gate on full-distance logical qubits.
Quantum error correction is expected to be an es-
sential part of a large-scale quantum computer [13].
The most promising quantum error-correcting codes
are two-dimensional topological codes such as surface
codes [4, 5]. With these codes, the preparation of log-
ical Pauli eigenstates and the measurement of logical
Pauli product operators (e.g., via lattice surgery [6
8]) are fault-tolerant operations, which is sufficient for
fault-tolerant logical Clifford gates. However, logical
non-Clifford gates, such as T gates, cannot be executed
directly. Instead, they can be performed by preparing
a resource state that is consumed to execute a non-
Clifford gate [9]. For T gates, this resource state is a
magic state |mi = (|0i + e
/4
|1i)/
2. Such a state
can be used to perform a π/8 rotation P
π/8
= e
iP π/8
,
where P is an n-qubit Pauli product operator from the
set {X, Y, Z, }
n
. Here, X, Y and Z are Pauli op-
erators, such that Z
π/8
corresponds to a T gate. If a
magic state is available, a logical n-qubit P
π/8
gate is
performed by measuring the logical Pauli product P Z
acting on the n qubits and the magic state, see Fig. 1.
The problem with magic states is that, with surface
codes, only faulty magic states can be prepared. They
are faulty in the sense that they are initialized with
an error probability proportional to the physical error
rate p
phys
, regardless of the code distance. If these
states are used to perform logical P
π/8
rotations, one
out of every 1/p
phys
logical gates is expected to be
faulty. Since faulty gates spoil the outcome of a com-
putation, but classically intractable quantum compu-
tations with a useful computational result typically in-
volve more than 10
8
T gates [10], low-error magic states
are required to execute gates with a low error probabil-
ity. One possibility to generate low-error magic states is
via a magic state distillation protocol. These protocols
are short error-detecting quantum computations that
use multiple high-error magic states to generate fewer
low-error states. Many such protocols [8, 1121] have
been developed since magic state distillation was first
proposed [9], gradually decreasing the cost of distilla-
tion. Even though state-of-the-art protocols are orders
of magnitude more efficient than the earliest propos-
als, magic state distillation is still often described as
a costly procedure and the leading contributing factor
to the overhead of fault-tolerant quantum computing,
which is the primary motivation for research into alter-
natives to magic state distillation [2227].
In this work, we reduce the cost of distillation by an-
other order of magnitude. On the level of circuits, none
of the distillation protocols discussed in this work are
new. Rather, the circuits are written in a way that the
number of qubits is low and the circuit depth is high.
The overhead reduction is achieved by finding surface-
code implementations of these protocols in which the
code distance of each surface-code patch is never higher
than required to achieve a specific output error prob-
ability, as was previously proposed in Ref. [21]. This
yields protocols that not only have a low space-time
cost, but also a small qubit footprint.
The cost of distillation. How does one quantify
the cost of a distillation protocol? The space-time cost
is often quantified in units of d
3
, where d is the code
distance. However, this can be confusing, if different
code distances are used in different parts of the quantum
computer. Consider an n
q
-qubit quantum computation
with n
T
T gates and a quantum computer consisting of
a block of qubits used to distill magic states and a block
of qubits used to store the n
q
data qubits and consume
Figure 1: A P
π/8
rotation on n qubits can be performed by
measuring the Pauli product operator P Z acting on the n
qubits and a magic state |mi = (|0i + e
iπ/4
|1i)/
2. The
magic state is discarded via an X measurement. Measurement
outcomes of 1 of the P Z or X measurement prompt a
P
π/4
or P
π/2
correction, respectively.
Accepted in Quantum 2019-10-30, click title to verify 1
arXiv:1905.06903v3 [quant-ph] 6 Nov 2019
Protocol p
phys
p
out
Qubits Cycles
Space-time cost per output state
Qubitcycles Full distance
(15-to-1)
7,3,3
10
4
4.4 × 10
8
810 18.1 14,600 5.49d
3
/d = 11 3.33d
3
/d = 13
(15-to-1)
9,3,3
10
4
9.3 × 10
10
1,150 18.1 20,700 4.71d
3
/d = 13 3.07d
3
/d = 15
(15-to-1)
11,5,5
10
4
1.9 × 10
11
2,070 30.0 62,000 9.19d
3
/d = 15 6.31d
3
/d = 17
(15-to-1)
4
9,3,3
× (20-to-4)
15,7,9
10
4
2.4 × 10
15
16,400 90.3 371,000 27.0d
3
/d = 19 20.0d
3
/d = 21
(15-to-1)
4
9,3,3
× (15-to-1)
25,9,9
10
4
6.3 × 10
25
18,600 67.8 1,260,000 25.9d
3
/d = 29 21.2d
3
/d = 31
(15-to-1)
17,7,7
10
3
4.5 × 10
8
4,620 42.6 197,000 6.30d
3
/d = 25 4.04d
3
/d = 29
(15-to-1)
6
13,5,5
× (20-to-4)
23,11,13
10
3
1.4 × 10
10
43,300 130 1,410,000 28.9d
3
/d = 29 19.6d
3
/d = 33
(15-to-1)
4
13,5,5
× (20-to-4)
27,13,15
10
3
2.6 × 10
11
46,800 157 1,840,000 30.9d
3
/d = 31 21.5d
3
/d = 35
(15-to-1)
6
11,5,5
× (15-to-1)
25,11,11
10
3
2.7 × 10
12
30,700 82.5 2,540,000 35.3d
3
/d = 33 25.0d
3
/d = 37
(15-to-1)
6
13,5,5
× (15-to-1)
29,11,13
10
3
3.3 × 10
14
39,100 97.5 3,810,000 37.6d
3
/d = 37 27.7d
3
/d = 41
(15-to-1)
6
17,7,7
× (15-to-1)
41,17,17
10
3
4.5 × 10
20
73,400 128 9,370,000 39.8d
3
/d = 49 31.5d
3
/d = 53
Small-footprint and synthillation protocols
(15-to-1)
9,3,3
10
4
1.5 × 10
9
762 36.2 27,600 6.27d
3
/d = 13 4.08d
3
/d = 15
(15-to-1)
9,5,5
× (15-to-1)
21,9,11
10
3
6.1 × 10
10
7,780 469 3,650,000 74.7d
3
/d = 29 50.7d
3
/d = 33
(15-to-1)
4
7,3,3
× (8-to-CCZ)
15,7,9
10
4
7.2 × 10
14
12,400 36.1 447,000 32.6d
3
/d = 19 24.1d
3
/d = 21
(15-to-1)
6
13,7,7
× (8-to-CCZ)
25,15,15
10
3
5.2 × 10
11
47,000 60.0 2,820,000 47.4d
3
/d = 31 32.9d
3
/d = 35
Historical numbers
(15-to-1) in Ref. [28] 10
4
3.5 × 10
11
3,720 143 532,000 121d
3
(d = 13)
(15-to-1) × (8-to-2) in Ref. [21] 10
3
2.7 × 10
11
148,000 202 14,900,000 251d
3
(d = 31)
(15-to-1) × (8-to-CCZ) in Ref. [21] 10
3
5.3 × 10
11
134,000 171 22,800,000 473d
3
(d = 31)
(15-to-1) × (15-to-1) in Ref. [8] 10
3
1.0 × 10
14
177,000 202 35,700,000 599d
3
(d = 31)
(15-to-1) × (15-to-1) in Ref. [5] 10
3
3.0 × 10
15
800,000 250 200,000,000 2544d
3
(d = 34)
Table 1: Comparison of different distillation protocols with respect to the following characteristics: physical error rate p
phys
, output
error probability per output state p
out
, space cost in qubits, time cost in surface-code cycles, and space-time cost in qubitcycles.
The last two columns report the space-time cost in (physical data qubits) × (code cycles) measured in units of the full distance
d
3
, where d is the distance required for the data qubits of a 100-qubit (left column) or 10,000-qubit (right column) computation
with at most 1/p
out
T gates. The subscripts and superscripts in the protocol description indicate the code distances and number
of level-1 distillation blocks used in the protocol, as explained in Secs. 3-6.
magic states [28]. The distance required for the storage
of the data qubits depends on n
q
and n
T
, as it needs to
be high enough to guarantee that the probability of an
error on any of the n
q
qubits during the entire n
T
-T gate
computation is sufficiently low. In other words, this
distance is governed by the space-time volume of the
computation n
q
·n
T
, as weight-d/2 error strings in this
space-time volume can potentially corrupt the output of
the computation. We will refer to this distance as the
full distance. The code distances used for distillation, on
the other hand, are completely irrelevant. The protocol
merely needs to produce magic states with an output
error probability that is lower than 1/n
T
, for which it
uses a certain number of qubits for a certain number of
code cycles. Since the full distance depends on n
q
, but
the space-time cost of a distillation protocol does not,
it is more meaningful to quantify the space-time cost in
terms of qubitcycles, i.e., qubits · cycles.
Results. Table 1 shows the space-time costs of the
protocols that are constructed in the following sections.
These protocols generate states with different output
error probabilities p
out
, assuming physical circuit-level
error rates p
phys
of 10
3
and 10
4
. The more T gates
need to be executed, the lower p
out
needs to be. Each
protocol is characterized by the space cost in terms of
physical qubits (including ancilla qubits) and the time
cost in terms of code cycles, where a code cycle cor-
responds to measuring all surface-code check operators
exactly once. These numbers can be multiplied to ob-
tain the space-time cost in qubitcycles. This is a mean-
ingful figure of merit that should be minimized. It is
more meaningful than only the space cost or only the
time cost, since distillation protocols can be straight-
forwardly parallelized, using twice as many qubits to
distill states twice as fast.
Even though this is not necessarily a meaningful
quantity, we report the space-time cost in terms of the
full distance d for two different choices of d in the last
two columns of Tab. 1. While the smallest classically
intractable quantum computations require 100 qubits,
Accepted in Quantum 2019-10-30, click title to verify 2
Figure 2: A sequence of 16 π/8 rotations on 5 qubits that is non-trivially equivalent to the identity.
more complicated quantum algorithms use thousands of
qubits, such as factoring 2048-bit numbers using Shor’s
algorithm. The lower and higher values of d are cho-
sen such that they are sufficient for a 100-qubit and
10,000-qubit computation with at most 1/p
out
T gates,
respectively. The reported costs in terms of the full
distance are in terms of (physical data qubits)×(code
cycles), i.e., they do not consider physical measurement
ancillas and are therefore smaller by a factor of 2. This
is done to more easily compare the numbers to the cost
of storing a d × d surface-code patch for d code cycles,
which is 1d
3
in terms of (physical data qubits)×(code
cycles).
How to interpret the cost. Table 1 shows pro-
tocols that generate one magic state, 4 magic states,
or one |CCZi state that can be used to execute a Tof-
foli gate. For protocols that generate multiple magic
states, the space-time cost and output error are per
magic state. Our protocols feature order-of-magnitude
overhead reductions compared to the previous state of
the art for all parameter regimes. One example is the
(15-to-1)
9,3,3
protocol, where the subscripts label the
code distances used in the protocol, as explained in
Sec. 3. For p
phys
= 10
4
, it generates magic states with
p
out
= 9.3 × 10
10
, sufficiently low for classically in-
tractable 100-qubit computations with 10
8
T gates. In
a quantum computer that can execute one T gate every
d code cycles, 231 logical qubits at d = 13 would be used
to store the 100 qubits with a low error rate [28], taking
into account the routing overhead. A space-time cost of
4.71d
3
for distillation implies that a footprint equivalent
to 4.71 full-distance qubits would be able to distill one
magic state every d code cycles. In this example, 2%
of the approximately 80,000 physical qubits are used for
distillation. The numbers become even more extreme
for the example of a 10,000-qubit computation with 10
8
T gates. Here, the 10,000 data qubits are stored using
20,000 logical qubits with d = 15, which means that
the space-time cost of distillation is 3.07d
3
per magic
state. For a quantum computation on more qubits or
with a lower overall error probability, distance-17 data
qubits might be required, reducing the cost to 2.11d
3
. In
this example, the cost to distill a magic state would be
lower than the space-time cost of a full-distance logical
CNOT gate, which is 3d
3
per qubit [6], demonstrating
that the cost of magic state distillation is not very high,
and that space-time costs that are quantified in units
of d
3
are of limited usefulness. These numbers are ad-
mittedly a bit contrived, but even in the more realistic
case of a 100-qubit computation with p
phys
= 10
3
and
p
out
10
10
, only 10% of all physical qubits are used
for distillation.
The main message is that magic state distillation is
not the dominant cost in a surface-code-based quantum
computer. Rather, the large overhead of surface codes
is due to their low encoding rate, which implies that a
large number of qubits is required to simply store all
data qubits of the computation.
Overview. In the following sections, we discuss how
the protocols in Tab. 1 are constructed. We start in
Sec. 1 by reviewing how distillation circuits work and
how their performance is quantified. Distillation pro-
tocols require faulty T gates on the level of logical
qubits, which are usually performed via state injection
and measurement. In Sec. 2, we introduce additional
protocols for faulty logical T gates based on shrinking
patches and faulty T measurements, which avoid Clif-
ford corrections and use fewer qubits and cycles. Next,
in Sec. 3, we go through the construction of the low-cost
15-to-1 protocol. In Sec. 4, we construct two-level pro-
tocols, where 15-to-1-distillation output states are fed
into a second level of 15-to-1 or 20-to-4. In Sec. 5, we
discuss synthillation protocols, i.e., the distillation of re-
source states that perform entire layers of π/8 rotations.
Specifically, we show the example of |CCZi state dis-
tillation, which can replace four T -gate magic states for
the execution of a controlled-controlled-Z gate. For the
protocols in Tab. 1, the distillation costs of CCZ states
are lower than the cost of four T -gate magic states with
a similar p
out
, indicating that synthillation can lower
Accepted in Quantum 2019-10-30, click title to verify 3
5 6 7 8 9 10 11 12 13 14 151-4
Figure 3: 15-to-1 distillation circuit.
the cost compared to the distillation of T -gate magic
states. Finally, in Sec. 6, we discuss how protocols with
a higher space-time cost, but smaller qubit footprint can
be constructed. The examples shown in Tab. 1 reduce
the error rate from 10
3
or 10
4
to 10
9
, but use only
as few as 762 or 7,780 physical qubits.
1 Distillation circuits
Magic state distillation protocols can be understood in
terms of quantum error-correcting codes with transver-
sal T gates [9, 11], but it is conceptually simpler to
explain them in terms of circuits [17]. When writing
quantum circuits as sequences of Pauli product rota-
tions P
ϕ
= e
iP ϕ
, specifically π/8 rotations P
π/8
, cer-
tain sequences are equivalent to the identity. While
some of these sequences are trivial, e.g., P
π/8
followed
by P
π/8
, there also exist non-trivial sequences. One
such sequence of 16 rotations on 5 qubits is shown in
Fig. 2. In general, such sequences are described by tri-
orthogonal matrices [11, 17]. The equivalent concept
of phase-polynomial identities is used in the context of
circuit optimization [29].
If we multiply the circuit in Fig. 2 by a single-qubit
rotation Z
π/8
on the first qubit, the first rotation will
be cancelled and the remaining circuit will consist of
15 rotations, as in Fig. 3. Since the 16-rotation cir-
cuit is equivalent to the identity, the 15-rotation cir-
cuit is equivalent to a single Z
π/8
rotation on the first
qubit. In other words, if the initial state is |+i
5
, where
|+i = (|0i+ |1i)/
2, then the circuit prepares the state
|emi|+i
4
. Here, |emi = (|0i+e
/4
|1i)/
2 is a state
that can be used to perform π/8 rotations in the same
way as |mi, but the outcome of the P Z measurement
in Fig. 1 needs to be interpreted differently, i.e., this
state is a magic state.
Because all rotations in Fig. 3 act non-trivially on
qubits 2-5, these qubits can be used to detect errors.
If the circuit is executed without errors, qubits 2-5 are
initialized in the |+i state and returned to the |+i state,
i.e., have an outcome of +1 upon X measurement. Er-
rors are detected, if any of these measurement outcomes
are 1, in which case the protocol fails and the state is
discarded.
The 15-to-1 protocol [9] is sometimes characterized
as having an output error probability of 35p
3
. This
assumes that every P
π/8
rotation generates a Pauli er-
ror P = P
π/2
with a probability of p. Since these are
Z-type Pauli errors, they will flip all X measurement
outcomes of the qubits that they act on. Therefore,
any one faulty P
π/8
gate can be detected. Furthermore,
there is no combination of two faulty gates that can
go undetected. However, some combinations of three
faulty gates, e.g., rotations 5, 11 and 14, will cause a
Z Pauli error on the output state, but will not trigger
any flipped X measurement outcomes. Since there are
35 such combinations, the probability to generate an
undetected error is 35p
3
to leading order.
To compute the subleading corrections to the output
error, this process can be simulated numerically. Start-
ing with the initial state ρ
init
= |+ih+|
5
, each of the
15 rotations is applied by mapping
ρ (1 p) · P
π/8
ρP
π/8
+ p · P
5π/8
ρP
5π/8
. (1)
The output state is determined by projecting into the
subspace with the correct measurement outcomes using
the projectors Π
X
= ( + X)/2, i.e.,
ρ
out
=
1
1 p
fail
( Π
4
X
)ρ( Π
4
X
) , (2)
where
p
fail
= 1 tr
( Π
4
X
)ρ
(3)
is the failure probability of the protocol. The output
error probability is computed by comparing the ideal
output state ρ
ideal
= |emihem| |+ih+|
4
to the actual
output state ρ
out
. This is done by computing the infi-
Accepted in Quantum 2019-10-30, click title to verify 4
4 5 6
7
8
9 11 12 13 14 15 16 17 18 19 20
1-3
10
Figure 4: 20-to-4 distillation circuit.
delity
p
out
= 1 F (ρ
ideal
, ρ
out
)
= 1 tr
q
ρ
out
ρ
ideal
ρ
out
2
= 1 tr (ρ
ideal
ρ
out
) ,
(4)
where the last equality holds, because ρ
ideal
is a pure
state. The infidelity corresponds to the probability that
a faulty magic state that is used to perform a gate
in a quantum circuit will lead to an error of this cir-
cuit’s outcome [30]. Notably, in the examples that we
consider, the trace distance tr
p
(ρ
ideal
ρ
out
)
2
/2 yields
identical or at least similar results. For the example
of p = 10
4
, the approximate output error probabil-
ity is 35p
3
= 3.5 × 10
11
, whereas the exact result is
p
out
= 3.501 × 10
11
.
Random Pauli errors. If faulty P
π/8
rotations are
performed by preparing faulty magic states and using
the circuit in Fig. 1, then the output error depends
on the error model for the preparation of faulty magic
states. In particular, if the faulty magic state is affected
by a random Pauli error with probability p, i.e., by an
X, Y or Z error with probabilities p/3, respectively,
then this translates into a probability of p/3 of per-
forming either a P
π/8
, P
3π/8
or P
5π/8
rotation instead
of a P
π/8
rotation. In other words, after each rotation,
the state is mapped to
ρ (1 p) · P
π/8
ρP
π/8
+
p
3
· P
π/8
ρP
π/8
+
p
3
· P
3π/8
ρP
3π/8
+
p
3
· P
5π/8
ρP
5π/8
,
(5)
so there is a p/3 probability of either a P
π/4
, P
π/4
or
P
π/2
error. These first two errors are more forgiving
than a proper P
π/2
Pauli error, since they effectively
only lead to a Pauli error with 50% probability. As a
consequence, we expect each of the 35 combinations of
three faulty rotations to contribute to the output error
with (8/27)p
3
instead of 1p
3
: Out of the 27 combina-
tions in {P
π/4
, P
π/4
, P
π/2
}
3
, there is one combination
with three P
π/2
’s, which leads to an undetected error.
There are 6 combinations with two P
π/2
’s leading to an
error with a 50% probability, 20 combinations with one
P
π/2
leading to an error with a 25% probability, and 8
combinations with no P
π/2
’s, leading to an error with a
12.5% probability. Therefore, the output error should
be p
out
= 35 ·
8
27
p
3
10.3704p
3
to leading order. In-
deed, a numerical treatment of the full density matrix
for p = 10
4
yields p
out
= 1.03724 × 10
11
.
Coherent errors. The previous two error models
randomly applied Pauli errors with a certain probabil-
ity. One might object that, for physical qubits, this is
not necessarily a realistic error model. A more realis-
tic error model would take coherent errors into account,
such as systematic under- and over-rotation. Distilla-
tion circuits can also detect these errors, but their per-
formance is indeed worse than for incoherent errors. For
example, consider the map
ρ P
π/8+ϕ
ρP
π/8+ϕ
, (6)
which systematically over-rotates each gate by an ex-
cess angle ϕ. A gate that over-rotates by an an-
gle ϕ = arcsin(1/100) has the same gate fidelity as
a gate that applies a Z error with a probability of
10
4
. However, the infidelity of the output magic state
p
out
= 1.22 × 10
9
is higher by almost two orders of
magnitude compared to the incoherent case. In our
resource analysis, we will be working with incoherent
circuit-level Pauli noise, applying errors according to
Eq. (5), but with three different probabilities for the
three different errors. Still, we comment on how coher-
ent errors might affect the output error in Sec. 4.
20-to-4 distillation. Distillation protocols can out-
put more than one magic state. If the 16-rotation circuit
in Fig. 2 is multiplied by two Z
π/8
rotations, one on the
first and one on the second qubit, a 14-rotation circuit is
Accepted in Quantum 2019-10-30, click title to verify 5
obtained that outputs a |emi|emi⊗|+i
3
state, i.e., two
magic states. Similarly, using a 24-rotation circuit that
non-trivially corresponds to the identity, a 20-rotation
circuit that outputs a |emi
4
|+i
3
state can be ob-
tained. This is the 20-to-4 protocol [11] shown in Fig. 4.
With a Z-Pauli error model, there are 22 pairs of rota-
tions that can lead to an output error. Therefore, the
probability of an output error is 22p
2
to leading order.
However, since four states are produced, one should in-
terpret this as p
out
= 5.5p
2
per magic state. In other
words, the probability that the resource state |emi
4
will
cause an error in a circuit is 22p
2
, but, since this re-
source state executes four π/8 rotations, this translates
into a 5.5p
2
error probability per gate. In a numerical
simulation, the output error per state is determined via
the infidelity between the projected output state and
the ideal output state |emihem|
4
|+ih+|
3
divided by
four. For p = 10
4
, this yields p
out
= 5.505 × 10
8
per
output state.
2 Faulty logical T gates
We use the notation of Ref. [28] to draw arrangements of
logical surface-code qubits, where patches with dashed
and solid edges represent d×d surface-code patches with
X and Z boundaries. Logical operations are performed
by measuring products of logical Pauli operators via
lattice surgery [68]. A naive layout for the 15-to-1
protocol is shown in Fig. 5a, where the five qubits of the
15-to-1 circuit are placed next to each other with their
Z boundaries facing up and down. A 5d × d ancillary
space above and below these five qubits can be used to
measure Pauli product operators between these qubits
to perform π/8 rotations.
The code distance determines the logical error rate of
the encoded qubits, which also depends on the under-
lying error model. Here, we consider circuit-level noise,
where each physical gate, state initialization and mea-
surement outcome is affected by a Pauli error with prob-
ability p
phys
. Using a minimum-weight perfect match-
ing decoder for such a noise model, the logical error rate
per code cycle [8] can be approximated as
p
L
(p
phys
, d) = 0.1(100p
phys
)
(d+1)/2
. (7)
Since a failure to decode X or Z syndromes correctly
leads to logical Z or X errors, respectively, we will as-
sume that logical X and Z errors each occur with a
probability of 0.5p
L
(p
phys
, d) per code cycle.
Not all errors are equally harmful in the context of
distillation protocols. Consider X and Z errors that af-
fect one of the five qubits during the 15-to-1 protocol.
Z errors affecting the first qubit (i.e., the output qubit)
are always detrimental, since they cannot be detected
ancilla
ancilla
ancilla
ancilla
(b)(a)
Figure 5: A naive arrangement (a) of logical qubits could con-
sist of five d × d patches initialized in the |+i state and two
additional 5d × d ancilla regions for Pauli product measure-
ments. The arrangement that we consider (b) consists of one
d
X
×d
X
patch, four d
Z
×d
X
patches, and two ancilla regions
with a width d
X
for Pauli product measurements.
and contribute to the overall output error of the proto-
col. The effect of X errors on any of the five qubits is to
turn all previous P
π/8
rotations that acted on this qubit
into P
π/8
rotations. For instance, consider an X error
on the third qubit after rotation 7 in Fig. 3. This X er-
ror can be commuted to the beginning of the circuit and
absorbed into the initial |+i state. The commutation
turns rotations 2, 5 and 6 into π/8 rotations, since X
and Z anti-commute. As errors on multiple rotations
can lead to undetected errors, X errors should also be
avoided.
Z errors on qubits 2-5, on the other hand, are less
damaging. They are detectable, as they have the same
effect as Z errors that affect rotations 1-4. Therefore,
it is not necessary to encode the logical Z operators of
qubits 2-5 with the same distance as their logical X op-
erators. Instead, we encode these qubits using rectan-
gular d
X
×d
Z
patches with d
Z
d
X
. Their probability
of X errors is 0.5(d
Z
/d
X
) ·p
L
(p
phys
, d
X
) per code cycle,
since the X distance is d
X
, but the number of possible
X error strings is lower by a factor of (d
Z
/d
X
) compared
to a square patch. Correspondingly, the probability of Z
errors is 0.5(d
X
/d
Z
)·p
L
(p
phys
, d
Z
), since the Z distance
is d
Z
, but the number of Z error strings is higher by a
factor of (d
X
/d
Z
) compared to a square patch. We also
fix the distance used in the ancillary region to d
X
. Fi-
nally, there is a third distance d
m
which determines the
number of code cycles used in lattice surgery. This af-
fects the error of the Pauli product measurements used
for logical gates, which can be detected by the distilla-
tion protocol.
In total, we end up with the arrangement shown in
Fig. 5b that is characterized by three code distances
d
X
, d
Z
and d
m
, where d
X
and d
Z
are spatial distances,
and d
m
is the temporal distance. Before we construct
a surface-code implementation of the 15-to-1 protocol,
we first discuss two different ways of performing faulty
logical π/8 rotations with surface codes: the traditional
method based on state injection, and a protocol based
Accepted in Quantum 2019-10-30, click title to verify 6
(a)
(b)
Figure 6: A faulty T gate performed via state injection (a) and
a faulty T measurement (b).
on faulty T measurements.
2.1 State injection
The standard method to perform faulty T gates with
topological codes is via state injection and measure-
ment. State injection is a protocol that prepares an
arbitrary logical state |ψ
L
i from a corresponding arbi-
trary physical state |ψi. Several such state-injection
protocols exist [5, 6, 3134], but none of them are fault-
tolerant, i.e., the error probability of |ψ
L
i is always pro-
portional to p
phys
. The simplest protocol [6] starts with
a physical state |ψi, i.e., a 1 × 1 surface-code patch,
and then grows it into a d × 1 patch, and finally into
a d × d patch. This is not a very efficient protocol,
since growing patches involves measuring stabilizers for
d code cycles. The qubit, therefore, spends many cycles
in a distance-1 state, which increases the error proba-
bility.
More sophisticated state-injection protocols use post-
selection [32, 34] to decrease the error. If the er-
ror rate of single-qubit operations is significantly lower
than the error rate of two-qubit gates, the error due
to state injection can even be lower than p
phys
. In
circuit-level noise, a single number p
phys
characterizes
all gates. However, physical systems typically fea-
ture significantly better single-qubit operations than
two-qubit gates. In state-of-the-art superconducting-
qubit [35] and ion-trap [36] architectures, for instance,
the fidelities of single-qubit and two-qubit gates differ
by up to almost two orders of magnitude. Since two-
qubit gates are typically the lowest-fidelity operations,
and syndrome-readout circuits of surface codes mostly
consist of two-qubit gates, the characteristic error rate
p
phys
in circuit-level noise will be largely determined by
the error rate of two-qubit gates. If the two-qubit error
rate is p
phys
, but the single-qubit error rate is p
phys
/10,
state injection can produce magic states with an error
as low as
13
30
p
phys
[34] in just two code cycles. However,
there is a certain failure rate of the protocol due to post-
selection, which increases the length of the protocol.
While state injection can be used to prepare faulty
Figure 7: A faulty P
π/8
rotation corresponds to a P Z mea-
surement involving a |+i ancilla, followed by a faulty T mea-
surement of the ancilla.
magic states, it cannot be used to directly execute P
π/8
rotations. Instead, state injection is used indirectly by
preparing a faulty magic state and measuring P Z
via lattice surgery, as shown in Fig. 1. With a 50%
probability, a P
π/4
correction is required. Performing
this correction operation either requires extra time or
extra space. In any case, this Clifford correction has an
effect on the distillation protocol and, therefore, needs
to be performed, increasing the space-time cost of the
protocol. For this reason, we will avoid state injection,
and instead construct a protocol that executes faulty
P
π/8
rotations without the need for Clifford corrections.
2.2 Faulty T measurements
When a T gate is performed on a qubit |ψi via state in-
jection, a faulty magic state is prepared, entangled with
|ψi, and measured, as shown in Fig. 6a. The faulty
preparation can be treated as a T gate applied on a
|+i state followed by a random X, Y or Z Pauli error.
The idea of faulty T measurements is to avoid the Clif-
ford correction by reversing the order of the entangling
operation and the faulty T gate, as shown in Fig. 6b.
Here, |ψi is first entangled with a |0i qubit. Next, a
sequence of a random Pauli error, a T gate and an X
measurement is performed, which we refer to as a faulty
T measurement. Now, the correction operation in re-
sponse to the X measurement is no longer a Clifford
gate, but a Pauli Z operation, which requires no addi-
tional hardware operations. X, Y and Z errors lead to
S
, S and Z errors on |ψi, respectively. Thus, a P
π/8
rotation can be performed by measuring P Z involving
an ancilla qubit initialized in the |+i state, followed by
a faulty T measurement of the ancilla qubit, as shown
in Fig. 7.
With surface codes, protocols for faulty T measure-
ments are exactly identical to protocols for state injec-
tion, except that the order of operations is reversed.
Here, we describe a simplified protocol to demonstrate
the working principle of faulty T measurements. Sim-
ilarly to the case of state injection, one can construct
significantly more sophisticated protocols, as we discuss
in Appendix A.
One particularly simple state-injection protocol is
Accepted in Quantum 2019-10-30, click title to verify 7
Figure 8: Surface-code implementation of a faulty T measure-
ment. Bright and dark faces correspond to Z-type and X-type
stabilizers, respectively.
performed by growing a physical qubit (a 1 × 1 patch)
into a d × 1 patch and then into a d × d patch [6].
The corresponding faulty-T -measurement protocol can
be performed by shrinking patches. Suppose that a log-
ical qubit |ψ
L
i = α |0
L
i + β |1
L
i is encoded in a d × d
patch as in Fig. 8, with logical Z operators as strings
from left to right. This d × d patch can be shrunk to
a d × 1 patch by measuring all green qubits in the X
basis. The remaining d × 1 patch encodes the qubit in
a d-qubit XX repetition code
|ψ
L
i =
α
2
(|+i
d
+|−i
d
)+
β
2
(|+i
d
|−i
d
) , (8)
where the logical Z operator is Z
d
, and the logical X
operator corresponds to the X operator on any of the
d qubits. Next, the d × 1 patch is shrunk to a 1 × 1
patch by measuring all red qubits in the Z basis. In
fact, the red and green measurements can be performed
simultaneously. The product of all Z measurement out-
comes (and also preceding stabilizer measurements) de-
termines an X Pauli correction on the remaining qubit,
which now stores the logical information in its physical
Pauli operators. Finally, a physical T gate is applied
to the remaining qubit, before it is measured in the X
basis.
Much like state injection, faulty T measurements are
not fault-tolerant protocols, in the sense that their er-
ror rate is proportional to the physical error rate and
does not decrease with the code distance. For a Pauli
error model, these error rates can be understood as the
probabilities of the Pauli-error operations in the dashed
boxes in Fig. 6. For simplicity, we will assume that
faulty T measurements have a Pauli error rate of p
phys
,
meaning that, effectively, the blue qubit is affected by
an X, Y or Z error with a probability of p
phys
/3 for
each Pauli. When used to execute a P
π/8
rotation, this
implies that this gate will have a P
π/4
, P
π/4
or P
π/2
error with a probability of p
phys
/3 for each error. This
assumption is actually very inaccurate for the protocol
(1)
(2) (3)
Step 2: d
m
code cycles
Step 1: 0 code cycles
Step 3: d
m
code cycles
Figure 9: Example of a faulty T measurement to perform a
(Z
1
Z
4
Z
5
)
π/8
rotation.
shown in Fig. 8, since, for this protocol, the error scales
with the code distance d, as any single-qubit X or Y er-
ror on the red qubits translates into a logical error in the
Accepted in Quantum 2019-10-30, click title to verify 8
Figure 10: Shrinking the |+i patch in the top right corner to a d
m
×1 patch produces a stabilizer configuration that is topologically
equivalent to the one shown in Fig. 9. This does not reduce the code distance compared to Fig. 9,
faulty T measurement. This can be avoided by changing
the measurement pattern, similar to the state-injection
protocols of Refs. [31, 33]. Moreover, the error rate can
be significantly suppressed by adding “pre-selection” to
the faulty-T -measurement protocol, the analogous op-
eration to post-selection in the state-injection protocol
of Ref. [32], wherein a faulty T measurement is deferred
until the stabilizer checks surrounding the sensitive blue
qubit report no syndromes for two code cycles. Because
such a faulty-T -measurement protocol is identical to the
state-injection protocol of Ref. [32], apart from the re-
versed order of operations, its error rate can be lower
than p
phys
, if single-qubit operations have a significantly
higher fidelity than two-qubit operations.
In any case, the specific choice of faulty-T -
measurement protocol (or state-injection protocol) will
not matter for the distillation protocols that we con-
struct in the following sections, which is why the more
sophisticated T -measurement protocols are discussed in
Appendix A. Moreover, we show in Appendix A that,
for many relevant distillation protocols, a higher error
rate for faulty T measurements has only a small effect
on the performance of the distillation protocols, even if
the error rate of faulty T measurements is assumed to
be higher by an order of magnitude, 10p
phys
instead of
p
phys
. In our following resource estimates, we will use
the simplified assumption of an error rate of p
phys
for
faulty T measurements.
We now show how to use faulty T measurements to
perform P
π/8
rotations in a surface-code arrangement
similar to Fig. 5b. Suppose that we want to execute
a P
π/8
= (Z
1
Z
4
Z
5
)
π/8
rotation on three qubits,
i.e., rotation 9 in Fig. 3. In the first step in Fig. 9,
we start with the five qubits of the 15-to-1 protocol,
(d
X
+ 4d
Z
) × d
X
unused qubits in the ancillary region
and an additional d
m
×d
m
unused qubits in the top right
corner. In this example, d
X
= 5, d
Z
= 3 and d
m
=
3. Next, following the circuit in Fig. 7, we initialize
a d
m
× d
m
patch in the |+i state and use multi-patch
lattice surgery [8, 28] to measure Z
1
Z
4
Z
5
Z
a
, where
Z
i
is the Z operator of qubit i and Z
a
is the Z operator
of the ancilla in the top right corner. The multi-patch
lattice-surgery operation [8] is performed by initializing
the physical qubits in the ancilla region in the |+i state
and measuring the new stabilizers for d
m
code cycles,
after which the qubits in the ancilla region are measured
in the X basis. The outcome of the Z
1
Z
4
Z
5
Z
a
measurement is encoded in the measurement outcomes
of the Z stabilizers in the ancilla region. Finally, in the
last step, a faulty T measurement is performed on the
d
m
× d
m
patch.
Next, we explain how we perform the error esti-
mate for this protocol. For a protocol that performs
a P
π/8
rotation, we consider Z and X storage errors on
qubits 1-5 and P
π/2
, P
π/4
and P
π/4
errors in the ap-
plication of the rotation. As explained previously, the
incorrect decoding of X and Z syndromes on qubits
1-5 leads to Z and X storage errors. These occur
with a probability of 0.5(d
X
/d
H
) · p
L
(p
phys
, d
H
) and
0.5(d
H
/d
X
)·p
L
(p
phys
, d
X
), respectively, where d
H
= d
X
for qubit 1, and d
H
= d
Z
for qubits 2-5. The incorrect
decoding of the X syndrome in the ancilla region causes
Z error strings connecting the X boundaries in the an-
cilla region. Depending on the exact location of these
error strings, this can cause Z errors on one or multiple
qubits. For instance, the error string highlighted in red
and labeled ‘(2)’ in Fig. 9 is topologically equivalent to
error string ‘(1)’, and therefore causes a Z error on qubit
1. Error string ‘(3)’, on the other hand, is less harmful,
since it is equivalent to a Z
1
Z
4
error affecting qubits
1 and 4. Still, as a simplified pessimistic estimate, we
will assume that any such error directly contributes to
the Z error of the output qubit, i.e., qubit 1. This hap-
pens with a probability of 0.5(l/d
X
) ·p
L
(p
phys
, d
X
) ·d
m
,
where l is the length of the ancilla patch. In our exam-
ple, l = d
X
+ 4d
Z
.
Accepted in Quantum 2019-10-30, click title to verify 9
Step 1: d
m
cycles
Arrangement
Step 2: 2d
m
cycles Step 3: 3d
m
cycles
Step 7: 6d
m
cyclesStep 6: 6d
m
cyclesStep 5: 5d
m
cyclesStep 4: 4d
m
cycles
out
out out out
Figure 11: Surface-code implementation of the (15-to-1)
d
X
,d
Z
,d
m
protocol using 2 · (d
X
+ 4d
Z
) · 3d
X
+ 4d
m
physical qubits for
6d
m
code cycles.
The incorrect decoding of the Z syndrome in the an-
cilla region causes X error strings. While there are no
Z boundaries in the ancilla region for the X errors to
connect to, in a space-time picture, these errors can
also condense at the temporal boundaries set by the ini-
tializations of the physical qubits in the |+i state and
their measurement in the X basis. Because the lattice-
surgery stabilizers are measured for d
m
code cycles,
the probability of this error is governed by d
m
. Since
0.5p
L
(p
phys
, d
m
) · d
m
is the X error rate of a d
m
× d
m
patch stored for d
m
code cycles, but the lattice surgery
takes place over an area of l×d
X
, we estimate the prob-
ability of such an error as 0.5l·d
X
/d
2
m
·p
L
(p
phys
, d
m
)·d
m
.
Such an error leads to an incorrect interpretation of
the lattice-surgery measurement outcome. As shown in
Fig. 7, incorrectly interpreting the outcome of the P Z
measurement causes an X error on the |+i qubit, which
turns the Z
π/8
rotation into a Z
π/8
rotation, causing
an on overall P
π/4
error.
Z and X storage errors affecting the |+i qubit in
the top right corner each occur with a probability of
0.5p
L
(p
phys
, d
m
) · d
m
and cause P
π/2
and P
π/4
errors,
respectively, as explained by inserting X and Z errors
on the |+i qubit in Fig. 7. Finally, the single-qubit X,
Y and Z errors with a probability of p
phys
/3 during the
faulty T measurement of Fig. 8 contribute to the P
π/4
,
P
π/4
and P
π/2
error, respectively.
In order to save qubits, we can shrink the d
m
× d
m
patch in the top right corner to a d
m
×1 patch. As shown
in Fig. 10, this produces a stabilizer configuration that
is topologically equivalent to the configuration of Fig. 9
and maintains the code distance. In principle, the Z
storage error on the |+i qubit is now reduced, but we
will still use the error estimate explained in the previous
paragraph.
To summarize our error estimate, in addition to the
usual storage errors, a Z error affects the output qubit
with a probability of 0.5(l/d
X
) · p
L
(p
phys
, d
X
) · d
m
.
P
π/2
, P
π/4
and P
π/4
errors each occur with a prob-
ability of p
phys
/3. Furthermore, a P
π/4
error occurs
with a probability of 0.5l · d
X
/d
2
m
· p
L
(p
phys
, d
m
) · d
m
.
Moreover, additional P
π/4
and P
π/2
errors occur with
a probability of 0.5p
L
(p
phys
, d
m
) · d
m
. Since we are not
running an actual decoder to obtain these error prob-
abilities, our error estimate should not be interpreted
as a rigorous simulation, but as a pessimistic ballpark
estimate.
3 15-to-1 distillation
Our implementation of the 15-to-1 protocol of Fig. 3 is
shown in Fig. 11. It starts by initializing qubits 2-4 in
the |+i state. In the first step, rotations 1-3 and 5 are
performed. The single-qubit rotations are performed
by initializing a d
Z
× d
m
surface-code patch in the |+i
state, measuring Z Z, and performing faulty T mea-
surement of the d
Z
× d
m
patch. Multi-qubit rotations
are performed via fast faulty T measurements. In step
2, qubit 1 is initialized in the |+i state and rotations 6
and 7 are performed. In step 3, qubit 5 is initialized in
the |+i state and rotations 4, 8 and 9 are performed.
In steps 4-6, rotations 10-15 are performed. Finally,
in step 7, qubits 2-5 are measured in the X basis. If
all measurement outcomes are +1, qubit 1 is a distilled
Accepted in Quantum 2019-10-30, click title to verify 10
Figure 12: Quantum circuit for delayed-choice P
π/8
rotations.
The choice of measurement basis for the single-qubit |+i mea-
surement decides whether a P
π/8
rotation is performed, or no
operation at all.
magic state which can be used to execute a low-error
P
π/8
rotation. The distillation block can now be used
to distill the next magic state.
In order to prevent the output state from blocking
the space reserved for qubit 1 for d
X
code cycles, the
consumption of the output state for the execution of a
P
π/8
rotation can already be initiated in step 5, since
this process takes d
X
code cycles. In the protocol shown
in Fig. 11, steps 5-7 and step 1 of the subsequent distil-
lation can be used to measure qubit 1, i.e., a total of 3d
m
code cycles. If d
X
> 3d
m
, the output state will block
the space reserved for qubit 1 and slow down the proto-
col. This can be prevented by reordering the rotations.
In any case, we only consider cases with d
X
3d
m
.
One may be concerned that, when consuming the out-
put state in step 5, we still do not know if the distilla-
tion protocol will succeed, i.e., whether the output state
is faulty or not. This problem can be solved by using
the circuit in Fig. 12, where an additional ancilla qubit
initialized in the |+i state is used. To execute a P
π/8
rotation, the operator P Z Z between the qubits,
the magic state and the additional ancilla state is mea-
sured. Depending on whether the |+i state is measured
in the Z or X basis, the P
π/8
rotation is applied or not.
This can be used to consume a magic state before the
outcome of its distillation is known. If the distillation
protocol for this magic state fails, the |+i state can be
measured in the X basis to prevent the faulty magic
state from being used.
In total, this 15-to-1 protocol has a space cost of
2 ·(d
X
+ 4d
Z
) ·3d
X
+ 4d
m
physical qubits, taking phys-
ical measurement ancillas into account. The time cost
is 6d
m
/(1 p
fail
) code cycles, where p
fail
is the failure
probability of the protocol. As discussed in Sec. 1, p
fail
and the output error p
out
are determined numerically.
To this end, the 5-qubit density matrix is simulated,
taking into account errors from storage and faulty T
measurements. All P
π/8
rotations have a P
π/2
, P
π/4
and P
π/4
error of p
phys
/3 and additional errors due
to the fast faulty T measurement protocol, as discussed
in Sec. 2, with the exception that, for rotations that do
not involve qubit 1, Z errors in the ancilla region do not
cause Z errors on qubit 1. For single-qubit rotations,
X errors on the d
Z
× d
m
qubit cause P
π/4
errors and
occur with a probability of 0.5d
Z
p
L
(p
phys
, d
m
), whereas
Z errors spread to the adjacent qubit with a probability
of 0.5(d
2
m
/d
Z
)p
L
(p
phys
, d
Z
).
After each step (apart from step 7), d
m
code cycles
worth of X and Z storage errors are applied to all five
qubits. In addition, Z or X errors on the d
X
× d
X
an-
cilla patch used to consume the output state are added
as Z or X storage errors to the output state for d
X
code cycles. Finally, the output error probability p
out
is computed as the infidelity 1 F(ρ
ideal
, ρ
out
).
We refer to a 15-to-1 protocol characterized by the
code distances d
X
, d
Z
and d
m
as a (15-to-1)
d
X
,d
Z
,d
m
protocol. For p
phys
= 10
4
, we find that the protocol
(15-to-1)
7,3,3
has a p
out
= 4.4 × 10
8
, where we round
the output error to two significant digits. It has a space
cost of 810 qubits and a time cost of 18.1 code cycles.
These two numbers can be multiplied to obtain a space-
time cost of 14,600 qubitcycles to three significant dig-
its. We also report the space-time cost in terms of the
full distance d
full
. Consider a 100-qubit quantum com-
putation with a T -gate count of n
T
, where n
T
< 1/p
out
.
Using the construction from Ref. [28], the entire com-
putation can be finished in n
T
· d
full
code cycles, if 231
distance-d
full
surface-code patches are used to store the
100 qubits. The probability that any of these qubits is
affected by a storage error that can spoil the outcome
of the computation is
p
storage error
= 231 · n
T
· d
full
· p
L
(p
phys
, d
full
) . (9)
The storage error can be higher, if the computation
does not exclusively consist of π/8 rotations, but also
involves Pauli product measurements, as this increases
the length of the computation. Using magic states with
p
out
for each T gate, there is a probability of n
T
· p
out
that a faulty T gate will spoil the outcome of the com-
putation. If we demand that storage errors leads to a
relative increase of the error probability by 1% in the
best case, then d = d
full
needs to satisfy
231 · n
T
· d · p
L
(p
phys
, d) < 0.01 · n
T
· p
out
. (10)
Evidently, d does not depend on n
T
, but only on the
number of qubits in the computation. For a 10,000-
qubit computation, the qubits need to be stored using
20,284 distance-d qubits, and the condition changes to
20284 · d · p
L
(p
phys
, d) < 0.01 · p
out
. (11)
We pick the smallest odd-integer d that satisfies the
condition. For p
out
= 4.4×10
8
, d = 11 in the 100-qubit
Accepted in Quantum 2019-10-30, click title to verify 11
L1
(a) Qubit arrangement for the (15-to-1) ×(15-to-1) protocol
(b) (Z
1
Z
2
Z
3
)
π/8
and (Z
1
Z
4
Z
5
)
π/8
rotation
(c) Auto-corrected π/8 rotation
L1
L1 L1
L1L1
L1 L1
L1L1
L1 L1
L1L1
L1 L1
Figure 13: (a) Example of a qubit arrangement used for two-level (15-to-1)
n
L1
d
X
,d
Z
,d
m
×(15-to-1)
d
X2
,d
Z2
,d
m2
distillation with n
L1
= 8
level-1 blocks. (b) Example of two rotations performed in this arrangement. (c) The rotations are performed using auto-corrected
π/8 rotations, which consume a magic state and apply the Clifford correction by appropriately choosing the measurement basis of
the |0i ancilla qubit.
case and d = 13 in the 10,000-qubit case. The space-
time cost reported in the last two columns of Tab. 1
is in terms of (physical data qubits)×(code cycles), i.e.,
5.49d
3
or 3.33d
3
. These numbers are remarkably low, as
the cost to explicitly perform, e.g., a logical CNOT gate
on two distance-d qubits [6] is 3d
3
. Still, the only truly
meaningful number quantifying the space-time cost is
the cost in terms of qubitcycles.
The other protocols reported in Tab. 1 for p
phys
=
10
4
are (15-to-1)
9,3,3
and (15-to-1)
11,5,5
, reducing the
error to p
out
= 9.3 × 10
10
and p
out
= 1.9 × 10
11
, re-
spectively. For a higher physical error rate of p
phys
=
10
3
, (15-to-1)
17,7,7
has a p
out
= 4.5 × 10
8
. Inter-
ested readers can verify these numbers using the Python
script or Mathematica notebook provided in the Supple-
mentary Material [37], where they can also try out other
parameters. The Supplementary Material contains the
resource cost calculations for all protocols considered in
this paper.
15-to-1 protocols cannot generate arbitrarily good
output states, as p
out
is limited by 10p
3
phys
. In order to
distill higher-fidelity magic states, we turn to two-level
protocols.
4 Two-level protocols
The idea of two-level protocols is to use distilled magic
states (level-1 states) to perform a second round of dis-
tillation. We first discuss (15-to-1) × (15-to-1) proto-
cols, where 15-to-1 output states are used for a second
level of 15-to-1 distillation. The qubit arrangement used
for this protocol is shown in Fig. 13a. It is described
by three additional code distances d
X2
, d
Z2
and d
m2
,
and the number of level-1 distillation blocks n
L1
, where
n
L1
is an even integer. The central region consists of
a (d
X2
+ 4d
Z2
) × 3d
X2
array of qubits where the sec-
ond level of distillation takes place. To the left and
right are the level-1 distillation blocks that feed level-
1 states into the upper and lower ancilla region of the
level-2 block, respectively. Each of these two level-1 re-
gion consists of n
L1
/2 level-1 blocks. These blocks are
characterized by the level-1 distances d
X
, d
Z
and d
m
,
such that each level-1 region produces one magic state
every 6d
m
/(1 p
fail
)/(n
L1
/2) code cycles.
In addition, each level-1 region has a 3d
m2
×4d
m2
ar-
ray of qubits that separates the level-1 blocks from the
level-2 block. As level-1 states are produced, they are
transported into this intermediate region. For this pur-
pose, each level-1 region has two spots that are reserved
for level-1 output states. While one of these spots is be-
ing filled with a newly generated level-1 state, the magic
state in the other spot can be consumed to execute a
P
π/8
rotation in the level-2 block, as shown in Fig. 13b.
These rotations are performed using the auto-corrected
π/8 rotation [28] shown in Fig. 13c. Here, the operator
P Z between the level-2 qubits and a level-1 magic
state |mi is measured. Simultaneously, Z Y between
|mi and an ancillary |0i state is measured. Depend-
ing on the outcome of the P Z measurement, the |0i
qubit is either read out in the X or Z basis, essentially
performing a Clifford correction or not.
As shown in Fig. 13b, this kind of measurement can
be performed in d
m2
code cycles using a configuration
similar to a faulty T measurement. While one magic
state in each level-1 region is used to execute a level-2
π/8 rotation, a second level-1 state is being transported
Accepted in Quantum 2019-10-30, click title to verify 12
Step 6: 6t
L1
cycles Step 7: 7t
L1
cycles
Step 8: 8t
L1
cycles
Step 1: t
L1
cycles Step 2: 2t
L1
cycles Step 3: 3t
L1
cycles
Step 4: 4t
L1
cycles
Step 5: 5t
L1
cycles
Figure 14: Surface-code implementation of the (15-to-1) × (15-to-1) protocol.
to be used for the subsequent level-2 rotation. In the
top and bottom ancilla region, one such level-2 rotation
can be performed every
t
L1
= max(d
m2
, 6d
m
/(1 p
fail
)/(n
L1
/2)) (12)
code cycles. If level-1 states are produced slowly, t
L1
is
determined by the output rate of the level-1 factories.
If these states are produced fast, t
L1
will be limited by
d
m2
, the duration of the auto-corrected π/8 rotations.
The entire (15-to-1) ×(15-to-1) protocol is shown in
Fig 14, focusing on the level-2 region. In steps 1 and
2, the first four rotations of the 15-to-1 circuit are per-
formed. Since these are single-qubit rotations, |0i ancil-
las are not required, as Clifford corrections correspond
to Pauli corrections for these first four rotations. In
steps 3-7, rotations 5-14 are performed. The consump-
tion of the output state is initiated in step 7. In step
8, qubit 4 is measured in the X basis and rotation 15
is performed in the bottom ancilla region. Since the
space reserved for qubit 4 is now empty, the top ancilla
region can be used to perform the first rotation of the
subsequent distillation round, such that the next round
of distillation will only take 7t
L1
instead of 8t
L1
code
cycles. For this reason, the time cost of this distillation
block is 7.5t
L1
code cycles. Again, the consumption of
the output state will slow down the protocol, if it takes
longer than 3d
m2
code cycles, so the distances should
be chosen such that d
X2
3d
m2
.
Error analysis. The level-1 blocks output level-1
states with an output error of p
out
= p
L1
. This error
contributes to the probability of a P
π/2
error, when this
state is used for a P
π/8
rotation. Furthermore, the level-
1 state accumulates additional errors as it is moved to
the level-2 block. As shown in Fig. 13b, it traverses a
region of width d
m2
and a maximum length of n
L1
/4 ·
(d
X
+ 4d
Z
) + 2d
m2
for d
m2
code cycles, before ending
up in a 2d
m2
× 2d
m2
region, where it stays for another
d
m2
code cycles. Therefore, we define
l
move
= n
L1
/4 · (d
X
+ 4d
Z
) + 10d
m2
, (13)
as the length of the ancilla region that increases the
storage error of the level-1 state. In this sense, the
X and Z error of the level-1 state are each increased
by 0.5l
move
· p
L
(p
phys
, d
m2
), contributing to the P
π/2
and P
π/4
error, respectively. The error analysis of
the level-2 block is analogous to faulty T measure-
ments. For an ancilla of length l, where l can be up
to d
X2
+ 4d
Z2
+ d
m2
, X errors lead to a P
π/4
error
with a probability of 0.5(l · d
X2
/d
m2
) · p
L
(p
phys
, d
m2
).
Moreover, if qubit 1 is part of the rotation, it is affected
by an additional Z storage error with a probability of
0.5(l · d
m2
/d
X2
) · p
L
(p
phys
, d
X2
). Finally, the X and Z
Accepted in Quantum 2019-10-30, click title to verify 13
Step 1: t
L1
cycles Step 2: 2t
L1
cycles
Step 3: 3t
L1
cycles
Step 4: 4t
L1
cycles
Step 5: 5t
L1
cycles
Step 7: 7t
L1
cyclesStep 6: 6t
L1
cycles
Step 8: 8t
L1
cycles
Step 9: 9t
L1
cycles
Step 10: 10t
L1
cycles
Figure 15: Surface-code implementation of the (15-to-1) × (20-to-4) protocol.
error of the output state increases by d
X2
·p
L
(p
phys
, d
X2
)
while it is being consumed.
Results. The space cost of the protocols has three
contributions: 2 · (d
X2
+ 4d
Z2
) · 3d
X2
qubits from the
level-2 block, 2n
L1
((d
X
+ 4d
Z
)(3d
X
+ d
m2
/2) + 2d
m
)
qubits from the level-1 blocks, including the ancilla
that they are connected to, and 2 · (20d
2
m2
+ 2d
X2
d
m2
)
qubits from the additional ancilla qubits. Note that,
in contrast to the 15-to-1 factory, the footprint of
the (15-to-1)×(15-to-1) factory is not necessarily rect-
angular. The time cost is 7.5t
L1
. Applying errors
numerically according to our error analysis, we ob-
tain the output error. We label our protocols as
(15-to-1)
n
L1
d
X
,d
Z
,d
m
× (15-to-1)
d
X2
,d
Z2
,d
m2
. For p
phys
=
10
4
, we find that a (15-to-1)
4
9,3,3
× (15-to-1)
25,9,9
pro-
tocol produces output states with p
out
= 6.3×10
25
for
1,260,000 qubitcycles. For p
phys
= 10
3
, the protocol
(15-to-1)
6
11,5,5
× (15-to-1)
29,11,13
produces magic states
with p
out
= 3.3 ×10
14
for 3,810,000 qubitcycles. More
protocols can be found in Tab. 1 or generated using the
Python script or Mathematica notebook provided in the
Supplementary Material [37].
20-to-4 dist illation. For output states with a de-
sired error rate that is lower than what is possible
with one level of 15-to-1, but higher than what can
be achieved with two levels of 15-to-1, it can be more
efficient to use a level-2 protocol that is cheaper, but
features less error suppression. One such protocol is
the 20-to-4 protocol of Fig. 4. The implementation of a
(15-to-1)×(20-to-4) is shown in Fig. 15 and is very sim-
ilar to a (15-to-1)×(15-to-1) protocol. The main differ-
ence is that the length of the level-2 region increases to
4d
X2
+3d
Z2
, as the 20-to-4 circuit acts on seven qubits,
four of which are output states.
For protocols that generate multiple output states,
it is particularly important to pick a suitable order in
which the rotations are performed, in order to avoid
congestion. If four output states are generated at the
end of the protocol, but the computation demands that
they are consumed one after the other, then they will
block the level-2 factory for many cycles. In step 1 of
our protocol, rotations 1 and 2 are performed. In step
2, qubit 1 is initialized in the |+i state and rotations 4
and 5 are performed. Simultaneously, the measurement
of qubit 1 as an output state can be initiated. In steps
3-6, rotations 3 and 6-12 are performed. In step 7, ro-
tations 13 and 14 are performed and the consumption
of output state 2 can be initiated. The snapshots in
Fig. 15 are drawn in a way that assumes that it takes
3t
L1
to consume a magic state, but this depends on
d
X2
. In steps 8-9, rotations 15-18 are performed and
Accepted in Quantum 2019-10-30, click title to verify 14
2
3
4
5
6 7 81
(a) Non-trivial representation of the identity
(b) 8-to-CCZ synthillation circuit
Figure 16: (a) These 15 rotations are non-trivially equivalent to the identity. (b) 8-to-CCZ synthillation circuit obtained by
multiplying the circuit in (a) with a 7-rotation CCZ gate.
the consumption of qubit 3 can be initiated. In step 10,
the last two rotations are performed and the protocol
finishes after 10t
L1
code cycles. The first three steps of
the subsequent round of distillation can be used to ini-
tiate the measurement of output state 4 and finish the
measurements of all remaining output states. If output
states are consumed one after the other, e.g., to perform
π/8 rotations one after the other, this protocol allocates
up to 2.5t
L1
code cycles for each output state, which is
sufficient for the parameters that we consider.
The error analysis of protocols labeled as
(15-to-1)
n
L1
d
X
,d
Z
,d
m
× (20-to-4)
d
X2
,d
Z2
,d
m2
is identical
to two-level 15-to-1 protocols, albeit with a dif-
ferent space and time cost, and the extra step of
dividing the space-time cost and output error by
four. Moreover, if multiple output qubits are part
of a rotation, the additional Z error due to Z-error
strings in the ancilla region with a probability of
0.5(l ·d
m2
/d
X2
) ·p
L
(p
phys
, d
X2
) is applied to all output
qubits that are part of the rotation. For p
phys
= 10
4
,
we find that the (15-to-1)
4
9,3,3
× (20-to-4)
15,7,9
protocol
generates states with an error of p
out
= 2.4 × 10
15
per output state and a space-time cost of 371,000
qubitcycles per magic state. For p
phys
= 10
3
,
the protocols (15-to-1)
6
13,5,5
× (20-to-4)
21,11,13
and
(15-to-1)
4
13,5,5
× (20-to-4)
27,13,15
have output errors of
1.4 ×10
10
and 2.6 ×10
11
per state with a space-time
cost of 1,410,000 and 1,840,000 qubitcycles per output
state.
What about coherent errors? As discussed in
Sec. 1, the performance of distillation protocols can be
worse, if the underlying error model features coherent
errors. In a circuit-level study of surface codes, coher-
ent errors are difficult to analyze. If the logical error
rate of surface codes can be maintained at p
L
(p
phys
, d)
even in the presence of coherent errors, and the only
effect of coherent errors is to under- or over-rotate the
physical T gate used in faulty T measurements, then
one can argue that the effect of coherent errors is not
too significant. These errors would then increase the
minimum achievable output error rate of the first level
of distillation, albeit by an error that is governed by the
single-qubit error rate. While this can be a problem for
single-level distillation schemes, this is less detrimen-
tal for two-level distillation schemes, as only the first
level is affected. Since level-1 blocks typically output
states that have a fidelity that is much lower than the
maximum achievable level-1 fidelity, the overall output
fidelity would be barely affected. For instance, the level-
1 block of the (15-to-1)
4
13,5,5
×(20-to-4)
27,13,15
protocol
outputs states with p
out
10
6
, whereas the lowest
possible error of the 15-to-1 protocol for p
phys
= 10
3
is p
out
10
8
. Still, a more careful treatment of coher-
ent errors is necessary, but is beyond the scope of this
work.
What about feed-forward? One possible limiting
factor in quantum computers is the feed-forward time,
i.e., the time it takes to react to measurement outcomes,
deciding which operation should be performed next. In
Accepted in Quantum 2019-10-30, click title to verify 15
our protocols, some qubits need to be measured in the
X or Z basis depending on previous measurement out-
comes, which is used to avoid Clifford corrections. In
Fig. 13a, these are the qubits in the intermediate region
between the level-1 blocks and the level-2 block. If the
feed-forward time is a bottleneck in a given architecture,
these qubits need to be stored for some additional time
before being read out. A slowdown due to feed-forward
can be avoided by using additional ancilla qubits in this
ancilla region. In any case, long feed-forward times in-
crease the overall space-time cost.
The constructions discussed in the previous sections
can be used to implement any distillation protocol that
can be expressed as a sequence of Z-type π/8 rotations,
e.g., all protocols that are based on triorthogonal ma-
trices [11, 19]. A very similar class of protocols are
synthillation protocols, whose implementation we dis-
cuss in the following section.
5 Synthillation
Synthillation [17, 38] is a portmanteau of the words
(gate) synthesis and distillation. The idea of synthilla-
tion is to generate resource states that do not execute
single T gates or π/8 rotations, but entire layers of
commuting π/8 rotations. The simplest example is the
|CCZi resource state, which is prepared by applying
a controlled-controlled-Z (CCZ) gate to a |+i
3
state.
This state can be used to perform a CCZ gate, which
can be written as a sequence of seven commuting π/8
rotations [28, 39]. However, it is also possible to exe-
cute CCZ gates using four T gates [40], or even only
two T gates, if these CCZ gates are part of a compute-
uncompute circuit [41].
Synthillation circuits can be obtained the same way
as ordinary distillation circuits. We start with a non-
trivial representation of the identity in Fig. 16a as a
sequence of 15 π/8 rotations on 4 qubits. Next, we
cancel the first three and last four rotations by multi-
plying the entire circuit with the corresponding +π/8
and π/8 rotations, i.e., the 7 rotations on the right-
hand side of Fig. 16b. These 7 rotations happen to
correspond to the decomposition of the CCZ gate into
7 π/8 rotations. In other words, the circuit on the left-
hand side of Fig. 16b prepares a |CCZi state on the
first three qubits and acts trivially on the fourth qubit.
Therefore, the fourth qubit can be used to detect errors.
If any one of the 8 rotations fails, the X measurement
outcome will flip. However, any pair of errors will go
undetected. Therefore, the output error to leading or-
der is 28p
2
under Z-Pauli noise.
Since the 8-to-CCZ circuit is a sequence of Z-type
π/8 rotations, it can be implemented the same way as
Step 1: t
L1
cycles
Step 2: 2t
L1
cycles
Step 3: 3t
L1
cycles
Step 4: 4t
L1
cycles
Figure 17: Surface-code implementation of the (15-to-1) ×
(8-to-CCZ) protocol.
a two-level distillation protocol, as shown in Fig. 17.
The level-2 block has a width of 3d
X2
+ d
Z2
. Per-
forming two rotations every t
L1
code cycles, the pro-
tocol finishes after 4t
L1
code cycles. In our numerical
error analysis, we obtain the output error as the in-
fidelity between the output state and the ideal state
ρ
ideal
= |CCZihCCZ| |+ih+|. Labelling our pro-
tocols as (15-to-1)
n
L1
d
X
,d
Z
,d
m
×(8-to-CCZ)
d
X2
,d
Z2
,d
m2
, we
find that, for p
phys
= 10
3
, the protocol (15-to-1)
6
13,7,7
×
(8-to-CCZ)
25,15,15
generates |CCZi states with a fi-
delity of p
out
= 5.2 × 10
11
and a space-time cost
of 2,820,000 qubitcycles. If the execution of a CCZ
Accepted in Quantum 2019-10-30, click title to verify 16
gate using ordinary magic states requires four of these
states, they need to have a quarter of this output er-
ror to achieve the same gate error. In comparison, a
(15-to-1) × (20-to-4) protocol can generate one T -gate
magic state with p
out
= 2.6 × 10
11
using 1,840,000
qubitcycles per state. In other words, four such states
would cost 7,360,000 qubitcycles, more than twice as
expensive as a |CCZi state.
While this implies that synthillation protocols can re-
duce the distillation cost, this ignores the fact that the
consumption of the output state can congest the dis-
tillation block. In the construction of Fig. 17, there
are 2t
L1
code cycles for the consumption of each state.
If all states can be consumed simultaneously, then this
might be sufficient. However, if the rest of the quantum
computer is such, that the three output states need to
be consumed one after the other, the synthillation pro-
tocol may be slowed down significantly, increasing the
overall space-time cost. This can be avoided by using
additional qubits to temporarily store the output state,
although this does not prevent an increase in the space-
time cost. Another possibility is to slow down the pro-
tocol to increase the time available for the consumption
of the output states, e.g, by using slow measurements
instead of fast measurements.
For p
phys
= 10
4
, we find that a (15-to-1)
4
7,3,3
×
(8-to-CCZ)
15,7,9
generates output states with p
out
=
7.2 × 10
14
using 447,000 qubitcycles, which is also
cheaper compared to the cost to distill ordinary magic
states with the same fidelity. Such synthillation proto-
cols can be used to generate resource states that execute
any arbitrary sequence of π/8 rotations. Schemes to
obtain such protocols are found in Ref. [17]. However,
note that the problem of output states congesting the
distillation block becomes more severe, if the generated
output state consists of many qubits.
6 Small-footprint protocols
So far, our protocols have focused on minimizing the
space-time cost. In this section, we outline how proto-
cols can be designed to minimize qubit footprint, i.e.,
the space cost, by sacrificing space-time overhead. We
discuss the example of (15-to-1) and (15-to-1)×(15-to-1)
protocols with a target output error of p
out
10
9
.
15-to-1. The footprint can be straightforwardly re-
duced by using only one region of ancilla qubits in-
stead of two. This reduces the footprint to 4(d
X
+
4d
Z
)d
X
+ 2d
m
physical qubits, as shown in Fig. 18.
Since only once ancilla region is used, the time cost
doubles compared to the protocol of Fig. 11 to 12d
m
.
The error estimate is identical to the ordinary 15-to-1
protocol. We find that, for p
phys
= 10
4
, the small-
Figure 18: Qubit arrangement of a small-footprint implemen-
tation of the 15-to-1 protocol.
(b) (Z
1
Z
3
Z
5
)
π/8
rotation
(a) Small-footprint (15-to-1)×(15-to-1)
Figure 19: (a) Qubit arrangement for the small-footprint imple-
mentation of the (15-to-1)×(15-to-1) protocol. (b) Example
of a (Z
1
Z
3
Z
5
)
π/8
rotation executed in 2d
m2
cycles.
footprint (15-to-1)
9,3,3
protocol produces magic states
with p
out
= 1.5×10
9
using 762 qubits. However, since
the protocol takes 36.2 code cycles, the space-time cost
of 27,600 qubitcycles is higher than for comparable pro-
tocols with a similar output error.
Two-level 15-to-1 distillation. For two levels of
15-to-1 with a small footprint, we use the arrangement
shown in Fig. 19a, which consists of a (d
X2
+ 4d
Z2
) ×
2d
X2
level-2 block, a (d
X
+4d
Z
)×3d
X
level-1 block, and
an intermediate region of size 2d
m2
×2d
m2
+d
m2
×d
X2
.
The level-1 block outputs level-1 states every 6d
m
code cycles. In principle, it is possible to use the small-
Accepted in Quantum 2019-10-30, click title to verify 17
footprint level-1 block of Fig. 18, but in the interest
of not increasing the space-time cost by too much, we
use the 15-to-1 block introduced in Sec. 3. When a
level-1 state is generated, it can be consumed for a
level-2 rotation in 2d
m2
code cycles. An example of
a (Z
1
Z
3
Z
5
)
π/8
is shown in Fig. 19b. In the first
d
m2
code cycles, a level-1 state is transported from the
level-1 block to the intermediate region. In the sec-
ond d
m2
code cycles, a Pauli product measurement is
performed, executing a P
π/8
rotation according to the
circuit in Fig. 13c. Transporting the level-1 state into
the intermediate region increases its X and Z error by
5d
m2
·p
L
(p
phys
, d
m2
). If we define the time that it takes
to execute a level-2 rotation as
t
L1
= max(2d
m2
, 6d
m
/(1 p
fail
)) , (14)
then the distillation protocol takes 15t
L1
to finish.
In our numerical analysis, we find that, for p
phys
=
10
3
, a small-footprint (15-to-1)
9,5,5
× (15-to-1)
21,9,11
protocol generates magic states with p
out
= 6.1 ×10
10
using only 7,780 physical qubits. With a time cost of
469 cycles, the space-time cost is 3,650,000 qubitcycles
per output state. While the space cost is very low,
this protocol sacrifices space-time cost, as an ordinary
(15-to-1) × (20-to-4) protocol can generate states with
p
out
= 1.4 × 10
10
for only 1,420,000 qubitcycles.
7 Conclusion
We have constructed magic state distillation protocols
that reduce the space-time cost by approximately 90%
compared to the previous state of the art. Since our
results were not obtained by simulating entire surface-
code patches and running an actual decoder, but via a
careful error analysis, the numbers reported in Tab. 1
should be taken with a grain of salt. The protocols dis-
cussed in this paper should rather be regarded as a proof
of principle, demonstrating that the overhead of distil-
lation can be reduced significantly by carefully tuning
the code distances of the different qubits that are part
of the distillation protocol. In any case, exact numbers
will have a strong dependence on the hardware-specific
error parameters and the decoding procedure.
There is still plenty of room for optimization. For one,
we only considered very simple distillation protocols,
i.e., the 15-to-1 distillation protocol, a 20-to-4 block-
code protocol and the synthillation of |CCZi states.
Perhaps, more sophisticated distillation circuits can fur-
ther reduce the cost. While the 20-to-4 protocol is part
of an entire family of (3k + 8)-to-k protocols, it seems
unlikely that a higher k will decrease the cost, since the
space cost is governed by k + 3, and the time cost is
governed by 3k + 8. Therefore, the space-time cost per
output state is governed by (3k+8)(k+3)/k, which hap-
pens to have minima at k = 2 and k = 4 for even integer
k. Still, since it is possible to generate arbitrarily many
such protocols based on triorthogonal codes [11, 19],
one could look for protocols that minimize the space-
time cost in this manner. For two-level distillation, it
remains unclear which combination of protocols reduces
the cost. It also remains unclear whether the space-time
cost can be decreased by using protocols that reduce
the number of π/8 rotation in the distillation circuit
by adding Clifford gates [3], or protocols that employ
catalyst states that need to be stored at a higher code
distance [21]. One could also construct protocols with
more than two levels of distillation.
The resource requirements for fault-tolerant surface-
code-based quantum computing can be daunting.
Hopefully, this work helps demonstrate that this is not
mainly due to the overhead of magic-state distillation,
but rather due to the low encoding rate of topological
codes, implying that thousands of physical qubits are
required to encode a single logical qubit.
Acknowledgments
I would like to thank Earl Campbell and Craig Gidney
for discussions about the advantages of two-level dis-
tillation protocols, Lingling Lao for discussions about
state injection, and Markus Kesselring for discussions
about the surface-code implementations. This work
has been supported by the Deutsche Forschungsgemein-
schaft (Bonn) within the network CRC TR 183.
References
[1] J. Preskill, Reliable quantum computers, Proc. Roy.
Soc. Lond. A 454, 385 (1998).
[2] B. M. Terhal, Quantum error correction for quan-
tum memories, Rev. Mod. Phys. 87, 307 (2015).
[3] E. T. Campbell, B. M. Terhal, and C. Vuil-
lot, Roads towards fault-tolerant universal quantum
computation, Nature 549, 172 (2017).
[4] A. Y. Kitaev, Fault-tolerant quantum computation
by anyons, Ann. Phys. 303, 2 (2003).
[5] A. G. Fowler, M. Mariantoni, J. M. Martinis, and
A. N. Cleland, Surface codes: Towards practical
large-scale quantum computation, Phys. Rev. A 86,
032324 (2012).
[6] C. Horsman, A. G. Fowler, S. Devitt, and R. V.
Meter, Surface code quantum computing by lattice
surgery, New J. Phys. 14, 123011 (2012).
[7] D. Litinski and F. v. Oppen, Lattice Surgery with a
Twist: Simplifying Clifford Gates of Surface Codes,
Quantum 2, 62 (2018).
Accepted in Quantum 2019-10-30, click title to verify 18
[8] A. G. Fowler and C. Gidney, Low over-
head quantum computation using lattice surgery,
arXiv:1808.06709 (2018).
[9] S. Bravyi and A. Kitaev, Universal quantum com-
putation with ideal Clifford gates and noisy ancil-
las, Phys. Rev. A 71, 022316 (2005).
[10] R. Babbush, C. Gidney, D. W. Berry, N. Wiebe,
J. McClean, A. Paler, A. Fowler, and H. Neven,
Encoding electronic spectra in quantum circuits
with linear T complexity, Phys. Rev. X 8, 041015
(2018).
[11] S. Bravyi and J. Haah, Magic-state distillation with
low overhead, Phys. Rev. A 86, 052329 (2012).
[12] A. G. Fowler, S. J. Devitt, and C. Jones, Surface
code implementation of block code state distillation,
Scientific Rep. 3, 1939 (2013).
[13] A. M. Meier, B. Eastin, and E. Knill, Magic-
state distillation with the four-qubit code, Quant.
Inf. Comp. 13, 195 (2013).
[14] C. Jones, Multilevel distillation of magic states
for quantum computing, Phys. Rev. A 87, 042305
(2013).
[15] G. Duclos-Cianci and K. M. Svore, Distillation of
nonstabilizer states for universal quantum compu-
tation, Phys. Rev. A 88, 042325 (2013).
[16] G. Duclos-Cianci and D. Poulin, Reducing the
quantum-computing overhead with complex gate
distillation, Phys. Rev. A 91, 042315 (2015).
[17] E. T. Campbell and M. Howard, Unified framework
for magic state distillation and multiqubit gate syn-
thesis with reduced resource cost, Phys. Rev. A 95,
022316 (2017).
[18] J. O’Gorman and E. T. Campbell, Quantum com-
putation with realistic magic-state factories, Phys.
Rev. A 95, 032338 (2017).
[19] J. Haah and M. B. Hastings, Codes and Protocols
for Distilling T , controlled-S, and Toffoli Gates,
Quantum 2, 71 (2018).
[20] E. T. Campbell and M. Howard, Magic state
parity-checker with pre-distilled components,
Quantum 2, 56 (2018).
[21] C. Gidney and A. G. Fowler, Efficient magic state
factories with a catalyzed |CCZi to 2|T i transfor-
mation, Quantum 3, 135 (2019).
[22] C. Jones, P. Brooks, and J. Harrington, Gauge
color codes in two dimensions, Phys. Rev. A 93,
052332 (2016).
[23] S. Bravyi and A. Cross, Doubled color codes,
arXiv:1509.03239 (2015).
[24] T. Jochym-O’Connor and S. D. Bartlett, Stacked
codes: Universal fault-tolerant quantum computa-
tion in a two-dimensional layout, Phys. Rev. A 93,
022323 (2016).
[25] H. Bombin, 2D quantum computation with 3D
topological codes, arXiv:1810.09571 (2018).
[26] C. Chamberland and A. W. Cross, Fault-tolerant
magic state preparation with flag qubits, Quantum
3, 143 (2019).
[27] B. J. Brown, A fault-tolerant non-Clifford
gate for the surface code in two dimensions,
arXiv:1903.11634 (2019).
[28] D. Litinski, A Game of Surface Codes: Large-Scale
Quantum Computing with Lattice Surgery, Quan-
tum 3, 128 (2019).
[29] M. Amy and M. Mosca, T-count optimization and
Reed-Muller codes, IEEE Transactions on Informa-
tion Theory , 1 (2019).
[30] J. Haah, M. B. Hastings, D. Poulin, and
D. Wecker, Magic state distillation with low space
overhead and optimal asymptotic input count,
Quantum 1, 31 (2017).
[31] A. J. Landahl and C. Ryan-Anderson, Quan-
tum computing by color-code lattice surgery,
arXiv:1407.5103 (2014).
[32] Y. Li, A magic states fidelity can be superior to the
operations that created it, New J. Phys. 17, 023037
(2015).
[33] J. Lodyga, P. Mazurek, A. Grudka, and
M. Horodecki, Simple scheme for encoding and de-
coding a qubit in unknown state for various topo-
logical codes, Scientific Rep. 5, 8975 (2015).
[34] L. Lao et al., Preparing high-fidelity magic states
with low costs, in preparation .
[35] Cramming more power into a quantum device,
https://www.ibm.com/blogs/research/2019/03/
power-quantum-device/, accessed: 2019-05-09.
[36] K. Wright, K. Beck, S. Debnath, J. Amini,
Y. Nam, N. Grzesiak, J.-S. Chen, N. Pisenti,
M. Chmielewski, C. Collins, et al., Benchmarking
an 11-qubit quantum computer, arXiv:1903.08181
(2019).
[37] The Python script and Mathematica
notebook can be found on GitHub, see
https://github.com/litinski/magicstates .
[38] E. T. Campbell and M. Howard, Unifying gate syn-
thesis and magic state distillation, Phys. Rev. Lett.
118, 060501 (2017).
[39] P. Selinger, Quantum circuits of T -depth one,
Phys. Rev. A 87, 042302 (2013).
[40] C. Jones, Low-overhead constructions for the fault-
tolerant Toffoli gate, Phys. Rev. A 87, 022328
(2013).
[41] C. Gidney, Halving the cost of quantum addition,
Quantum 2, 74 (2018).
[42] B. J. Brown and S. Roberts, Universal fault-
tolerant measurement-based quantum computation,
arXiv:1811.11780 (2018).
Accepted in Quantum 2019-10-30, click title to verify 19
Figure 20: Protocol of a faulty T measurement in the spirit of Ref. [33]. It can be thought of as a d × d patch being shrunk to a
(d 2) × (d 2) patch, then a (d 4) × (d 4) patch, and so on, until a physical T gate is performed on the remaining 1 × 1
patch (the blue qubit), before it is measured in the X basis. Since all these shrinking operations can be performed simultaneously,
the T -measurement protocol corresponds to just the last step of the figure, where all red qubits are measured in the Z basis, all
green qubits in the X basis, and a faulty T measurement is performed on the blue qubit.
A Faulty T measurements
This appendix discusses faulty T measurements in
more detail. Specifically, we discuss approaches to de-
crease their error rate to justify the use of a faulty-T -
measurement error rate of p
phys
in the main text and
the potential implications of a higher error rate.
The T -measurement protocol discussed in Sec. 2.2 is
simple, but the operation of shrinking a d×d patch to a
d ×1 patch is associated with an error that scales with
the code distance d. While this might not be problem-
atic for small code distances, it implies that this specific
T -measurement protocol will not work for larger code
distances. A more advanced version of this protocol is
shown in Fig. 20. This protocol essentially corresponds
to the state-injection protocol of Ref. [33] in reverse. A
d×d patch is shrunk to a (d2)×(d2) patch by mea-
suring physical qubits along the boundary in the X or
Z basis. Next, the patch is shrunk to a (d 4) ×(d 4)
patch, a (d6)×(d6) patch, etc., until we end up with
a 1 ×1 patch, i.e., a single physical qubit corresponding
to the logical qubit. This qubit is then measured in the
T basis. Since the qubit is in a distance-1 state only at
the very end of the protocol, the error rate of this T-
measurement protocol is not proportional to the code
distance d.
The shrinking operations can all be performed simul-
taneously, such that the faulty T measurement corre-
sponds to measuring all red qubits in the last step of
Fig. 20 in the Z basis, and all green qubits in the X
basis. A physical T gate is applied to the remaining
blue qubit, after which it is measured in the X basis.
This protocol is almost identical to the T-measurement
protocol of Sec. 2.2, with the main difference being the
position of the blue qubit. In the protocol of Sec. 2.2,
the blue qubit is in the corner of the patch, whereas in
Fig. 20, the blue qubit is in the center of the patch.
In order to examine the error sources of this pro-
tocol, we consider the space-time diagram of the T -
measurement protocol of Fig. 20, which is shown in
Fig. 21. It is identical to the space-time diagram of the
state-injection protocol of Ref. [33] shown in Ref. [42].
Such space-time diagrams are useful to visualize the
effect of physical Pauli errors and stabilizer measure-
ment errors on the encoded logical information. Strings
of errors connecting the red boundaries in the space-
time diagram lead to logical Z errors, whereas error
strings connecting the green boundaries lead to logical
X errors. In our usual estimates of idling distance-d
logical qubits, we take into account the weight-d error
strings that precede the faulty T measurement, i.e., er-
ror strings similar to the ones labelled (1) and (2) in
Fig. 21, which have a probability of p
L
(p
phys
, d). Dur-
ing state injection or faulty T measurements, there are
some additional low-weight errors corresponding to er-
ror strings that connect two time-like boundaries during
the T measurement, i.e., single-qubit errors on the blue
qubit (3) and strings similar to the one labelled (4) in
Fig. 21.
Pre-selection. These low-weight errors are the
dominant contribution to the error rate of such state-
injection of faulty-T -measurement protocols. Specifi-
cally, for circuit-level Pauli noise, single-qubit errors af-
fecting the qubits close to the blue qubit and failures
t
(1)
(2)
(3)
(4)
Figure 21: Space-time diagram of the T -measurement protocol
of Fig. 20 with time increasing to the right, and four examples
of error strings that cause logical errors.
Accepted in Quantum 2019-10-30, click title to verify 20
Step 1 Step 2 Step 3
Figure 22: The finite rejection rate of pre-selection can be prevented from significantly increasing the time cost of distillation
protocols by using an additional d
m
×d
m
patch during the execution of P
π/8
rotations. While one d
m
×d
m
patch is idling until the
pre-selection condition is satisfied i.e., until the stabilizers in a region surrounding the sensitive physical qubit have not reported
any syndromes for two code cycles a second d
m
× d
m
patch can be used to execute the next P
π/8
rotation.
of two-qubit gates involved in the syndrome readout of
check operators close to the blue qubit contribute to
these low-weight errors. In the case of state-injection
protocols, these errors can be suppressed through the
addition of post-selection [32], particularly, if the er-
ror rate of two-qubit gates is significantly higher than
the error rate of single-qubit operations. An injected
magic state is only accepted, if the stabilizer checks in
a certain region around the sensitive, blue qubit report
no syndrome for two code cycles. Since at least two
circuit-level two-qubit errors are required to generate
an unreported error, the dominant contribution to the
error rate of the state-injection protocol will be gov-
erned by single-qubit errors on the sensitive physical
qubits. However, this post-selection adds a certain fail-
ure probability to the state-injection protocol, as some
instances of state injection will be rejected. The proto-
col in Ref. [32] reduces the error rate of state injection
to 2p
phys
/3 for two-qubit error rates of p
2
= p
phys
and one-qubit error rates of p
1
= p
phys
/10 with a rejec-
tion rate of around 50% for p
phys
= 10
3
. In Ref. [32], a
distance-7 region around the sensitive qubit was used for
post-selection, leading to a high rejection rate. Prelim-
inary numerical results by Lao et al. [34] indicate that
by post-selecting a distance-3 region around the sensi-
tive qubit, the error rate of state injection still stays
below p
phys
, while the rejection rate drops to 4% for
p
phys
= 10
3
.
Since faulty T measurements are identical to state
injection, apart from the time direction being reversed,
the same approach can be used to reduce the error rate
of faulty-T -measurement protocols. The analogous op-
eration is pre-selection: The single-qubit measurements
in the protocol of Fig. 21 are delayed, until the check
operators in a region around the blue qubit have re-
ported no syndrome for two consecutive code cycles.
This justifies the use of a faulty-T -measurement error
rate of p
phys
in the main text, but also adds a time cost
to the fault-T-measurement protocol through the rejec-
tion rate. While one might be worried that, particularly
for high p
phys
, the finite rejection rate in this faulty-
T -measurement protocol increases the time it takes to
execute a faulty P
π/8
rotation, this does not need to be
the case, if a few extra qubits are added to the protocol,
as shown in Fig. 22. Here, in step 1, a P
π/8
rotation is
performed using the protocol of Fig. 9. The d
m
× d
m
patch corresponding to the qubit that needs to be read
out using a faulty T measurement is then left idling, un-
til the pre-selection condition is satisfied and the qubit
can be measured. While this patch is idling, a second
d
m
× d
m
patch is used to perform the next P
π/8
rota-
tion in step 2. With a high probability, the d
m
× d
m
patch left idling in step 1 will have been read out by the
beginning of step 3, such that the freed-up space can be
used to initialize a d
m
×d
m
patch for the next rotation.
In rare cases, the d
m
× d
m
patch left idling in step 1
will have failed the pre-selection after, in this example,
2d
m
code cycles, in which case it needs to keep idling,
increasing the overall time cost of the distillation pro-
tocol. If this happens sufficiently rarely, the additional
time cost is negligible.
The time required to perform a faulty T measure-
ment with pre-selection depends on the rejection rate.
In the above example, if, e.g., the probability of fail-
ing the pre-selection is 50% every two code cycles
and d
m
= 5, then the probability of the d
m
× d
m
patch having failed the pre-selection after 2d
m
code
cycles is 1%, implying that the overall time cost of
level-1 distillation protocols increases by 1% compared
to the numbers presented in Tab. 1. This increase is
even lower under the assumption of a rejection rate of
4%. However, the numbers in Tab. 1 do not take into
account the increased space cost due to the additional
d
m
× d
m
patches that are used to avoid the time-cost
increase. If, as in Fig. 22, one uses two d
m
×d
m
patches
for each measurement region, then the space cost for,
e.g., a (15-to-1)
11,5,5
protocol increases by 9% com-
pared to the numbers in Tab. 1. Note that, in two-
level protocols, the increased space cost only affects the
first level of distillation, but not the second, such that,
Accepted in Quantum 2019-10-30, click title to verify 21
Protocol p
phys
p
out
Qubits Cycles
Space-time cost per output state
Qubitcycles Full distance
(15-to-1)
9,3,3
10
4
2.1 × 10
8
1,150 18.2 20,900 4.75d
3
/d = 13 3.10d
3
/d = 15
(15-to-1)
6
7,3,3
× (20-to-4)
13,5,7
10
4
1.4 × 10
12
13,200 70.0 231,000 23.5d
3
/d = 17 16.9d
3
/d = 19
(15-to-1)
4
9,3,3
× (20-to-4)
15,7,9
10
4
6.6 × 10
15
16,400 91.2 374,000 27.3d
3
/d = 19 20.2d
3
/d = 21
(15-to-1)
4
9,3,3
× (15-to-1)
25,9,9
10
4
4.2 × 10
22
18,600 68.4 1,270,000 32.4d
3
/d = 27 26.1d
3
/d = 29
(15-to-1)
6
13,5,5
× (20-to-4)
21,11,13
10
3
5.7 × 10
9
40,700 130 1,325,000 33.7d
3
/d = 27 22.2d
3
/d = 31
(15-to-1)
6
11,5,5
× (15-to-1)
21,9,11
10
3
2.1 × 10
10
27,400 85.7 2,350,000 48.1d
3
/d = 29 32.7d
3
/d = 33
(15-to-1)
6
11,5,5
× (15-to-1)
23,11,11
10
3
2.5 × 10
11
29,500 85.7 2,530,000 42.5d
3
/d = 31 29.5d
3
/d = 35
(15-to-1)
6
11,5,5
× (15-to-1)
25,11,11
10
3
6.4 × 10
12
30,700 85.7 2,630,000 36.7d
3
/d = 33 26.0d
3
/d = 37
(15-to-1)
8
13,7,7
× (15-to-1)
29,13,13
10
3
1.5 × 10
13
52,400 97.5 5,110,000 59.6d
3
/d = 35 43.1d
3
/d = 39
Table 2: Selection of distillation protocols under the assumption of a faulty-T -measurement error rate of 10p
phys
instead of 1p
phys
as in Tab. 1.
e.g., for a (15-to-1)
6
11,5,5
×(15-to-1)
25,11,11
protocol, the
space cost increases by less than 2%. Furthermore, the
error estimate in the main text does not take into ac-
count that, in cases where the d
m
× d
m
patches is left
idling for many code cycles, the error rate of the cor-
responding P
π/8
rotations is increased. However, with
pre-selection, the error rate of faulty T measurements
can be lower than the p
phys
assumed in the main text, if
single-qubit operations are significantly less noisy than
two-qubit operations.
Replacing faulty T measurements with state
injection. After shrinking the d×d patch to a physical
qubit, the preceding stabilizer measurement outcomes
need to be decoded before a T (or T
) gate and an X
measurement can be applied to the physical qubit. If
the time required for this decoding operation is long
compared to the decoherence time of idling physical
qubits, the error rate of faulty T measurements will be
significantly increased. In this case, one should use state
injection instead of faulty T measurements, executing
P
π/8
rotations via auto-corrected π/8 rotations [28], the
same way P
π/8
rotations are executed in level-2 distil-
lation protocols (see Fig. 13). This has a higher space
cost compared to faulty T measurements.
What if faulty T measurements have a higher
error rate? If one uses a simple faulty-T -measurement
or state-injection protocol without post-selection of pre-
selection, one might be worried that an increased T -
measurement error rate could adversely affect the per-
formance of the distillation protocols discussed in the
main text. Specifically, we can consider the effect on
the output error rate p
out
, if the T -measurement Pauli
error rate is 10p
phys
instead of p
phys
. We find that, for
some of the distillation protocol shown in Tab. 1, an
increased T -measurement error has only a small effect
on the performance.
A collection of protocols is shown in Tab. 2. The
main effect of an increased faulty-T-measurement er-
ror rate is an increase of the lowest achievable out-
put error rate p
out
for a given family of distillation
protocols. For instance, the lowest achievable p
out
for
a one-level 15-to-1 protocol increases from 10.37p
3
to
10370p
3
. For example, for p
phys
= 10
4
, a one-level
15-to-1 protocol can no longer produce magic states
with p
out
10
11
and, instead, a (15-to-1) × (20-to-4)
protocol needs to be used, significantly increasing the
space-time cost to produce these states. On the other
hand, the (15-to-1)
4
9,3,3
× (20-to-4)
15,7,9
protocol pro-
duces states with a p
out
that is far away from the low-
est achievable p
out
of a (15-to-1) × (20-to-4) protocol
with p
phys
= 10
4
. Compared to Tab. 1, the out-
put error rate increases from p
out
= 2.4 × 10
15
to
p
out
= 6.6×10
15
. Due to an increased failure probabil-
ity, there is also a very small increase in space-time cost
by 0.8%. Finally, (15-to-1) ×(15-to-1) protocols can no
longer produce magic states with an output error below
p
out
10
22
.
For p
phys
= 10
3
, we find a similar trend. A one-
level protocol can no longer be used to produce magic
states with p
out
10
8
and a more expensive two-level
protocol needs to be used. For p
out
= 2.5 × 10
11
, a
(15-to-1) × (20-to-4) protocol needs to be replaced by
a (15-to-1) × (15-to-1) protocol, increasing the space-
time cost by 37.5%. Again, we find protocols which are
only mildly affected by the increased error rate, such
as the (15-to-1)
6
11,5,5
× (15-to-1)
25,11,11
protocol, which
previously produced magic states with an output error
of p
out
= 2.7 × 10
12
, but now produces magic states
with an output error of p
out
= 6.4×10
12
with a space-
time cost increase of 3.5% due to the increased failure
probability. The lowest achievable output error rate of
(15-to-1)×(15-to-1) protocols increases to p
out
10
14
,
making the production of magic states with error rates
close to this limit very costly. For such states and states
with a lower error rate, a three-level protocol might need
to be used.
Accepted in Quantum 2019-10-30, click title to verify 22