Fermat's Library | q-2019-12-02-205 annotated/explained version.

Magic State Distillation: Not as Costly as You Think

Daniel Litinski @ Dahlem Center for Complex Quantum Systems, Freie Universit

at Berlin, Arnimallee 14, 14195 Berlin, Germany

Despite signiﬁcant overhead reductions since

its ﬁrst proposal, magic state distillation is often

considered to be a very costly procedure that

dominates the resource cost of fault-tolerant

quantum computers. The goal of this work is

to demonstrate that this is not true. By writ-

ing distillation circuits in a form that separates

qubits that are capable of error detection from

those that are not, most logical qubits used for

distillation can be encoded at a very low code

distance. This signiﬁcantly reduces the space-

time cost of distillation, as well as the number

of qubits. In extreme cases, it can cost less to

distill a magic state than to perform a logical

Cliﬀord gate on full-distance logical qubits.

Quantum error correction is expected to be an es-

sential part of a large-scale quantum computer [1–3].

The most promising quantum error-correcting codes

are two-dimensional topological codes such as surface

codes [4, 5]. With these codes, the preparation of log-

ical Pauli eigenstates and the measurement of logical

Pauli product operators (e.g., via lattice surgery [6–

8]) are fault-tolerant operations, which is suﬃcient for

fault-tolerant logical Cliﬀord gates. However, logical

non-Cliﬀord gates, such as T gates, cannot be executed

directly. Instead, they can be performed by preparing

a resource state that is consumed to execute a non-

Cliﬀord gate [9]. For T gates, this resource state is a

magic state |mi = (|0i + e

iπ/4

|1i)/

√

2. Such a state

can be used to perform a π/8 rotation P

π/8

= e

−iP π/8

where P is an n-qubit Pauli product operator from the

set {X, Y, Z, }

⊗n

. Here, X, Y and Z are Pauli op-

erators, such that Z

π/8

corresponds to a T gate. If a

magic state is available, a logical n-qubit P

π/8

gate is

performed by measuring the logical Pauli product P ⊗Z

acting on the n qubits and the magic state, see Fig. 1.

The problem with magic states is that, with surface

codes, only faulty magic states can be prepared. They

are faulty in the sense that they are initialized with

an error probability proportional to the physical error

rate p

phys

, regardless of the code distance. If these

states are used to perform logical P

π/8

rotations, one

out of every ∼1/p

phys

logical gates is expected to be

faulty. Since faulty gates spoil the outcome of a com-

putation, but classically intractable quantum compu-

tations with a useful computational result typically in-

volve more than 10

T gates [10], low-error magic states

are required to execute gates with a low error probabil-

ity. One possibility to generate low-error magic states is

via a magic state distillation protocol. These protocols

are short error-detecting quantum computations that

use multiple high-error magic states to generate fewer

low-error states. Many such protocols [8, 11–21] have

been developed since magic state distillation was ﬁrst

proposed [9], gradually decreasing the cost of distilla-

tion. Even though state-of-the-art protocols are orders

of magnitude more eﬃcient than the earliest propos-

als, magic state distillation is still often described as

a costly procedure and the leading contributing factor

to the overhead of fault-tolerant quantum computing,

which is the primary motivation for research into alter-

natives to magic state distillation [22–27].

In this work, we reduce the cost of distillation by an-

other order of magnitude. On the level of circuits, none

of the distillation protocols discussed in this work are

new. Rather, the circuits are written in a way that the

number of qubits is low and the circuit depth is high.

The overhead reduction is achieved by ﬁnding surface-

code implementations of these protocols in which the

code distance of each surface-code patch is never higher

than required to achieve a speciﬁc output error prob-

ability, as was previously proposed in Ref. [21]. This

yields protocols that not only have a low space-time

cost, but also a small qubit footprint.

The cost of distillation. How does one quantify

the cost of a distillation protocol? The space-time cost

is often quantiﬁed in units of d

, where d is the code

distance. However, this can be confusing, if diﬀerent

code distances are used in diﬀerent parts of the quantum

computer. Consider an n

-qubit quantum computation

with n

T gates and a quantum computer consisting of

a block of qubits used to distill magic states and a block

of qubits used to store the n

data qubits and consume

Figure 1: A P

π/8

rotation on n qubits can be performed by

measuring the Pauli product operator P ⊗ Z acting on the n

qubits and a magic state |mi = (|0i + e

iπ/4

|1i)/

√

2. The

magic state is discarded via an X measurement. Measurement

outcomes of −1 of the P ⊗ Z or X measurement prompt a

π/4

or P

π/2

correction, respectively.

Accepted in Quantum 2019-10-30, click title to verify 1

arXiv:1905.06903v3 [quant-ph] 6 Nov 2019

Protocol p

phys

out

Qubits Cycles

Space-time cost per output state

Qubitcycles Full distance

(15-to-1)

7,3,3

−4

4.4 × 10

−8

810 18.1 14,600 5.49d

/d = 11 3.33d

/d = 13

(15-to-1)

9,3,3

−4

9.3 × 10

−10

1,150 18.1 20,700 4.71d

/d = 13 3.07d

/d = 15

(15-to-1)

11,5,5

−4

1.9 × 10

−11

2,070 30.0 62,000 9.19d

/d = 15 6.31d

/d = 17

(15-to-1)

9,3,3

× (20-to-4)

15,7,9

−4

2.4 × 10

−15

16,400 90.3 371,000 27.0d

/d = 19 20.0d

/d = 21

(15-to-1)

9,3,3

× (15-to-1)

25,9,9

−4

6.3 × 10

−25

18,600 67.8 1,260,000 25.9d

/d = 29 21.2d

/d = 31

(15-to-1)

17,7,7

−3

4.5 × 10

−8

4,620 42.6 197,000 6.30d

/d = 25 4.04d

/d = 29

(15-to-1)

13,5,5

× (20-to-4)

23,11,13

−3

1.4 × 10

−10

43,300 130 1,410,000 28.9d

/d = 29 19.6d

/d = 33

(15-to-1)

13,5,5

× (20-to-4)

27,13,15

−3

2.6 × 10

−11

46,800 157 1,840,000 30.9d

/d = 31 21.5d

/d = 35

(15-to-1)

11,5,5

× (15-to-1)

25,11,11

−3

2.7 × 10

−12

30,700 82.5 2,540,000 35.3d

/d = 33 25.0d

/d = 37

(15-to-1)

13,5,5

× (15-to-1)

29,11,13

−3

3.3 × 10

−14

39,100 97.5 3,810,000 37.6d

/d = 37 27.7d

/d = 41

(15-to-1)

17,7,7

× (15-to-1)

41,17,17

−3

4.5 × 10

−20

73,400 128 9,370,000 39.8d

/d = 49 31.5d

/d = 53

Small-footprint and synthillation protocols

(15-to-1)

9,3,3

−4

1.5 × 10

−9

762 36.2 27,600 6.27d

/d = 13 4.08d

/d = 15

(15-to-1)

9,5,5

× (15-to-1)

21,9,11

−3

6.1 × 10

−10

7,780 469 3,650,000 74.7d

/d = 29 50.7d

/d = 33

(15-to-1)

7,3,3

× (8-to-CCZ)

15,7,9

−4

7.2 × 10

−14

12,400 36.1 447,000 32.6d

/d = 19 24.1d

/d = 21

(15-to-1)

13,7,7

× (8-to-CCZ)

25,15,15

−3

5.2 × 10

−11

47,000 60.0 2,820,000 47.4d

/d = 31 32.9d

/d = 35

Historical numbers

(15-to-1) in Ref. [28] 10

−4

3.5 × 10

−11

3,720 143 532,000 121d

(d = 13)

(15-to-1) × (8-to-2) in Ref. [21] 10

−3

2.7 × 10

−11

148,000 202 14,900,000 251d

(d = 31)

(15-to-1) × (8-to-CCZ) in Ref. [21] 10

−3

5.3 × 10

−11

134,000 171 22,800,000 473d

(d = 31)

(15-to-1) × (15-to-1) in Ref. [8] 10

−3

1.0 × 10

−14

177,000 202 35,700,000 599d

(d = 31)

(15-to-1) × (15-to-1) in Ref. [5] 10

−3

3.0 × 10

−15

800,000 250 200,000,000 2544d

(d = 34)

Table 1: Comparison of diﬀerent distillation protocols with respect to the following characteristics: physical error rate p

phys

, output

error probability per output state p

out

, space cost in qubits, time cost in surface-code cycles, and space-time cost in qubitcycles.

The last two columns report the space-time cost in (physical data qubits) × (code cycles) measured in units of the full distance

, where d is the distance required for the data qubits of a 100-qubit (left column) or 10,000-qubit (right column) computation

with at most 1/p

out

T gates. The subscripts and superscripts in the protocol description indicate the code distances and number

of level-1 distillation blocks used in the protocol, as explained in Secs. 3-6.

magic states [28]. The distance required for the storage

of the data qubits depends on n

and n

, as it needs to

be high enough to guarantee that the probability of an

error on any of the n

qubits during the entire n

-T gate

computation is suﬃciently low. In other words, this

distance is governed by the space-time volume of the

computation ∼n

·n

, as weight-d/2 error strings in this

space-time volume can potentially corrupt the output of

the computation. We will refer to this distance as the

full distance. The code distances used for distillation, on

the other hand, are completely irrelevant. The protocol

merely needs to produce magic states with an output

error probability that is lower than 1/n

, for which it

uses a certain number of qubits for a certain number of

code cycles. Since the full distance depends on n

, but

the space-time cost of a distillation protocol does not,

it is more meaningful to quantify the space-time cost in

terms of qubitcycles, i.e., qubits · cycles.

Results. Table 1 shows the space-time costs of the

protocols that are constructed in the following sections.

These protocols generate states with diﬀerent output

error probabilities p

out

, assuming physical circuit-level

error rates p

phys

of 10

−3

and 10

−4

. The more T gates

need to be executed, the lower p

out

needs to be. Each

protocol is characterized by the space cost in terms of

physical qubits (including ancilla qubits) and the time

cost in terms of code cycles, where a code cycle cor-

responds to measuring all surface-code check operators

exactly once. These numbers can be multiplied to ob-

tain the space-time cost in qubitcycles. This is a mean-

ingful ﬁgure of merit that should be minimized. It is

more meaningful than only the space cost or only the

time cost, since distillation protocols can be straight-

forwardly parallelized, using twice as many qubits to

distill states twice as fast.

Even though this is not necessarily a meaningful

quantity, we report the space-time cost in terms of the

full distance d for two diﬀerent choices of d in the last

two columns of Tab. 1. While the smallest classically

intractable quantum computations require ∼100 qubits,

Accepted in Quantum 2019-10-30, click title to verify 2

Figure 2: A sequence of 16 π/8 rotations on 5 qubits that is non-trivially equivalent to the identity.

more complicated quantum algorithms use thousands of

qubits, such as factoring 2048-bit numbers using Shor’s

algorithm. The lower and higher values of d are cho-

sen such that they are suﬃcient for a 100-qubit and

10,000-qubit computation with at most 1/p

out

T gates,

respectively. The reported costs in terms of the full

distance are in terms of (physical data qubits)×(code

cycles), i.e., they do not consider physical measurement

ancillas and are therefore smaller by a factor of 2. This

is done to more easily compare the numbers to the cost

of storing a d × d surface-code patch for d code cycles,

which is 1d

in terms of (physical data qubits)×(code

cycles).

How to interpret the cost. Table 1 shows pro-

tocols that generate one magic state, 4 magic states,

or one |CCZi state that can be used to execute a Tof-

foli gate. For protocols that generate multiple magic

states, the space-time cost and output error are per

magic state. Our protocols feature order-of-magnitude

overhead reductions compared to the previous state of

the art for all parameter regimes. One example is the

(15-to-1)

9,3,3

protocol, where the subscripts label the

code distances used in the protocol, as explained in

Sec. 3. For p

phys

= 10

−4

, it generates magic states with

out

= 9.3 × 10

−10

, suﬃciently low for classically in-

tractable 100-qubit computations with 10

T gates. In

a quantum computer that can execute one T gate every

d code cycles, 231 logical qubits at d = 13 would be used

to store the 100 qubits with a low error rate [28], taking

into account the routing overhead. A space-time cost of

4.71d

for distillation implies that a footprint equivalent

to 4.71 full-distance qubits would be able to distill one

magic state every d code cycles. In this example, ≈2%

of the approximately 80,000 physical qubits are used for

distillation. The numbers become even more extreme

for the example of a 10,000-qubit computation with 10

T gates. Here, the 10,000 data qubits are stored using

∼20,000 logical qubits with d = 15, which means that

the space-time cost of distillation is 3.07d

per magic

state. For a quantum computation on more qubits or

with a lower overall error probability, distance-17 data

qubits might be required, reducing the cost to 2.11d

. In

this example, the cost to distill a magic state would be

lower than the space-time cost of a full-distance logical

CNOT gate, which is 3d

per qubit [6], demonstrating

that the cost of magic state distillation is not very high,

and that space-time costs that are quantiﬁed in units

of d

are of limited usefulness. These numbers are ad-

mittedly a bit contrived, but even in the more realistic

case of a 100-qubit computation with p

phys

= 10

−3

and

out

≈ 10

−10

, only ≈10% of all physical qubits are used

for distillation.

The main message is that magic state distillation is

not the dominant cost in a surface-code-based quantum

computer. Rather, the large overhead of surface codes

is due to their low encoding rate, which implies that a

large number of qubits is required to simply store all

data qubits of the computation.

Overview. In the following sections, we discuss how

the protocols in Tab. 1 are constructed. We start in

Sec. 1 by reviewing how distillation circuits work and

how their performance is quantiﬁed. Distillation pro-

tocols require faulty T gates on the level of logical

qubits, which are usually performed via state injection

and measurement. In Sec. 2, we introduce additional

protocols for faulty logical T gates based on shrinking

patches and faulty T measurements, which avoid Clif-

ford corrections and use fewer qubits and cycles. Next,

in Sec. 3, we go through the construction of the low-cost

15-to-1 protocol. In Sec. 4, we construct two-level pro-

tocols, where 15-to-1-distillation output states are fed

into a second level of 15-to-1 or 20-to-4. In Sec. 5, we

discuss synthillation protocols, i.e., the distillation of re-

source states that perform entire layers of π/8 rotations.

Speciﬁcally, we show the example of |CCZi state dis-

tillation, which can replace four T -gate magic states for

the execution of a controlled-controlled-Z gate. For the

protocols in Tab. 1, the distillation costs of CCZ states

are lower than the cost of four T -gate magic states with

a similar p

out

, indicating that synthillation can lower

Accepted in Quantum 2019-10-30, click title to verify 3

5 6 7 8 9 10 11 12 13 14 151-4

Figure 3: 15-to-1 distillation circuit.

the cost compared to the distillation of T -gate magic

states. Finally, in Sec. 6, we discuss how protocols with

a higher space-time cost, but smaller qubit footprint can

be constructed. The examples shown in Tab. 1 reduce

the error rate from 10

−3

or 10

−4

to ∼10

−9

, but use only

as few as 762 or 7,780 physical qubits.

1 Distillation circuits

Magic state distillation protocols can be understood in

terms of quantum error-correcting codes with transver-

sal T gates [9, 11], but it is conceptually simpler to

explain them in terms of circuits [17]. When writing

quantum circuits as sequences of Pauli product rota-

tions P

= e

−iP ϕ

, speciﬁcally π/8 rotations P

π/8

, cer-

tain sequences are equivalent to the identity. While

some of these sequences are trivial, e.g., P

π/8

followed

by P

−π/8

, there also exist non-trivial sequences. One

such sequence of 16 rotations on 5 qubits is shown in

Fig. 2. In general, such sequences are described by tri-

orthogonal matrices [11, 17]. The equivalent concept

of phase-polynomial identities is used in the context of

circuit optimization [29].

If we multiply the circuit in Fig. 2 by a single-qubit

rotation Z

−π/8

on the ﬁrst qubit, the ﬁrst rotation will

be cancelled and the remaining circuit will consist of

15 rotations, as in Fig. 3. Since the 16-rotation cir-

cuit is equivalent to the identity, the 15-rotation cir-

cuit is equivalent to a single Z

−π/8

rotation on the ﬁrst

qubit. In other words, if the initial state is |+i

⊗5

, where

|+i = (|0i+ |1i)/

√

2, then the circuit prepares the state

|emi⊗|+i

⊗4

. Here, |emi = (|0i+e

−iπ/4

|1i)/

√

2 is a state

that can be used to perform π/8 rotations in the same

way as |mi, but the outcome of the P ⊗Z measurement

in Fig. 1 needs to be interpreted diﬀerently, i.e., this

state is a magic state.

Because all rotations in Fig. 3 act non-trivially on

qubits 2-5, these qubits can be used to detect errors.

If the circuit is executed without errors, qubits 2-5 are

initialized in the |+i state and returned to the |+i state,

i.e., have an outcome of +1 upon X measurement. Er-

rors are detected, if any of these measurement outcomes

are −1, in which case the protocol fails and the state is

discarded.

The 15-to-1 protocol [9] is sometimes characterized

as having an output error probability of 35p

. This

assumes that every P

π/8

rotation generates a Pauli er-

ror P = P

π/2

with a probability of p. Since these are

Z-type Pauli errors, they will ﬂip all X measurement

outcomes of the qubits that they act on. Therefore,

any one faulty P

π/8

gate can be detected. Furthermore,

there is no combination of two faulty gates that can

go undetected. However, some combinations of three

faulty gates, e.g., rotations 5, 11 and 14, will cause a

Z Pauli error on the output state, but will not trigger

any ﬂipped X measurement outcomes. Since there are

35 such combinations, the probability to generate an

undetected error is 35p

to leading order.

To compute the subleading corrections to the output

error, this process can be simulated numerically. Start-

ing with the initial state ρ

init

= |+ih+|

⊗5

, each of the

15 rotations is applied by mapping

ρ → (1 − p) · P

π/8

ρP

†

π/8

+ p · P

5π/8

ρP

†

5π/8

. (1)

The output state is determined by projecting into the

subspace with the correct measurement outcomes using

the projectors Π

= ( + X)/2, i.e.,

out

1 − p

fail

( ⊗ Π

⊗4

)ρ( ⊗ Π

⊗4

) , (2)

where

fail

= 1 − tr



( ⊗ Π

⊗4

)ρ



(3)

is the failure probability of the protocol. The output

error probability is computed by comparing the ideal

output state ρ

ideal

= |emihem| ⊗ |+ih+|

⊗4

to the actual

output state ρ

out

. This is done by computing the inﬁ-

Accepted in Quantum 2019-10-30, click title to verify 4

4 5 6

9 11 12 13 14 15 16 17 18 19 20

1-3

Figure 4: 20-to-4 distillation circuit.

delity

out

= 1 − F (ρ

ideal

, ρ

out

)

= 1 − tr



√

out

ideal

√

out



= 1 − tr (ρ

ideal

out

) ,

(4)

where the last equality holds, because ρ

ideal

is a pure

state. The inﬁdelity corresponds to the probability that

a faulty magic state that is used to perform a gate

in a quantum circuit will lead to an error of this cir-

cuit’s outcome [30]. Notably, in the examples that we

consider, the trace distance tr

(ρ

ideal

− ρ

out

)

/2 yields

identical or at least similar results. For the example

of p = 10

−4

, the approximate output error probabil-

ity is 35p

= 3.5 × 10

−11

, whereas the exact result is

out

= 3.501 × 10

−11

Random Pauli errors. If faulty P

π/8

rotations are

performed by preparing faulty magic states and using

the circuit in Fig. 1, then the output error depends

on the error model for the preparation of faulty magic

states. In particular, if the faulty magic state is aﬀected

by a random Pauli error with probability p, i.e., by an

X, Y or Z error with probabilities p/3, respectively,

then this translates into a probability of p/3 of per-

forming either a P

−π/8

, P

3π/8

or P

5π/8

rotation instead

of a P

π/8

rotation. In other words, after each rotation,

the state is mapped to

ρ → (1 − p) · P

π/8

ρP

†

π/8

· P

−π/8

ρP

†

−π/8

· P

3π/8

ρP

†

3π/8

· P

5π/8

ρP

†

5π/8

(5)

so there is a p/3 probability of either a P

−π/4

, P

π/4

π/2

error. These ﬁrst two errors are more forgiving

than a proper P

π/2

Pauli error, since they eﬀectively

only lead to a Pauli error with 50% probability. As a

consequence, we expect each of the 35 combinations of

three faulty rotations to contribute to the output error

with (8/27)p

instead of 1p

: Out of the 27 combina-

tions in {P

−π/4

, P

π/4

, P

π/2

}

, there is one combination

with three P

π/2

’s, which leads to an undetected error.

There are 6 combinations with two P

π/2

’s leading to an

error with a 50% probability, 20 combinations with one

π/2

leading to an error with a 25% probability, and 8

combinations with no P

π/2

’s, leading to an error with a

12.5% probability. Therefore, the output error should

be p

out

= 35 ·

≈ 10.3704p

to leading order. In-

deed, a numerical treatment of the full density matrix

for p = 10

−4

yields p

out

= 1.03724 × 10

−11

Coherent errors. The previous two error models

randomly applied Pauli errors with a certain probabil-

ity. One might object that, for physical qubits, this is

not necessarily a realistic error model. A more realis-

tic error model would take coherent errors into account,

such as systematic under- and over-rotation. Distilla-

tion circuits can also detect these errors, but their per-

formance is indeed worse than for incoherent errors. For

example, consider the map

ρ → P

π/8+ϕ

ρP

†

π/8+ϕ

, (6)

which systematically over-rotates each gate by an ex-

cess angle ϕ. A gate that over-rotates by an an-

gle ϕ = arcsin(1/100) has the same gate ﬁdelity as

a gate that applies a Z error with a probability of

−4

. However, the inﬁdelity of the output magic state

out

= 1.22 × 10

−9

is higher by almost two orders of

magnitude compared to the incoherent case. In our

resource analysis, we will be working with incoherent

circuit-level Pauli noise, applying errors according to

Eq. (5), but with three diﬀerent probabilities for the

three diﬀerent errors. Still, we comment on how coher-

ent errors might aﬀect the output error in Sec. 4.

20-to-4 distillation. Distillation protocols can out-

put more than one magic state. If the 16-rotation circuit

in Fig. 2 is multiplied by two Z

−π/8

rotations, one on the

ﬁrst and one on the second qubit, a 14-rotation circuit is

Accepted in Quantum 2019-10-30, click title to verify 5

obtained that outputs a |emi⊗|emi⊗|+i

⊗3

state, i.e., two

magic states. Similarly, using a 24-rotation circuit that

non-trivially corresponds to the identity, a 20-rotation

circuit that outputs a |emi

⊗4

⊗ |+i

⊗3

state can be ob-

tained. This is the 20-to-4 protocol [11] shown in Fig. 4.

With a Z-Pauli error model, there are 22 pairs of rota-

tions that can lead to an output error. Therefore, the

probability of an output error is 22p

to leading order.

However, since four states are produced, one should in-

terpret this as p

out

= 5.5p

per magic state. In other

words, the probability that the resource state |emi

⊗4

will

cause an error in a circuit is 22p

, but, since this re-

source state executes four π/8 rotations, this translates

into a 5.5p

error probability per gate. In a numerical

simulation, the output error per state is determined via

the inﬁdelity between the projected output state and

the ideal output state |emihem|

⊗4

⊗|+ih+|

⊗3

divided by

four. For p = 10

−4

, this yields p

out

= 5.505 × 10

−8

per

output state.

2 Faulty logical T gates

We use the notation of Ref. [28] to draw arrangements of

logical surface-code qubits, where patches with dashed

and solid edges represent d×d surface-code patches with

X and Z boundaries. Logical operations are performed

by measuring products of logical Pauli operators via

lattice surgery [6–8]. A naive layout for the 15-to-1

protocol is shown in Fig. 5a, where the ﬁve qubits of the

15-to-1 circuit are placed next to each other with their

Z boundaries facing up and down. A 5d × d ancillary

space above and below these ﬁve qubits can be used to

measure Pauli product operators between these qubits

to perform π/8 rotations.

The code distance determines the logical error rate of

the encoded qubits, which also depends on the under-

lying error model. Here, we consider circuit-level noise,

where each physical gate, state initialization and mea-

surement outcome is aﬀected by a Pauli error with prob-

ability p

phys

. Using a minimum-weight perfect match-

ing decoder for such a noise model, the logical error rate

per code cycle [8] can be approximated as

phys

, d) = 0.1(100p

phys

)

(d+1)/2

. (7)

Since a failure to decode X or Z syndromes correctly

leads to logical Z or X errors, respectively, we will as-

sume that logical X and Z errors each occur with a

probability of 0.5p

phys

, d) per code cycle.

Not all errors are equally harmful in the context of

distillation protocols. Consider X and Z errors that af-

fect one of the ﬁve qubits during the 15-to-1 protocol.

Z errors aﬀecting the ﬁrst qubit (i.e., the output qubit)

are always detrimental, since they cannot be detected

ancilla

(b)(a)

Figure 5: A naive arrangement (a) of logical qubits could con-

sist of ﬁve d × d patches initialized in the |+i state and two

additional 5d × d ancilla regions for Pauli product measure-

ments. The arrangement that we consider (b) consists of one

×d

patch, four d

×d

patches, and two ancilla regions

with a width d

for Pauli product measurements.

and contribute to the overall output error of the proto-

col. The eﬀect of X errors on any of the ﬁve qubits is to

turn all previous P

π/8

rotations that acted on this qubit

into P

−π/8

rotations. For instance, consider an X error

on the third qubit after rotation 7 in Fig. 3. This X er-

ror can be commuted to the beginning of the circuit and

absorbed into the initial |+i state. The commutation

turns rotations 2, 5 and 6 into −π/8 rotations, since X

and Z anti-commute. As errors on multiple rotations

can lead to undetected errors, X errors should also be

avoided.

Z errors on qubits 2-5, on the other hand, are less

damaging. They are detectable, as they have the same

eﬀect as Z errors that aﬀect rotations 1-4. Therefore,

it is not necessary to encode the logical Z operators of

qubits 2-5 with the same distance as their logical X op-

erators. Instead, we encode these qubits using rectan-

gular d

×d

patches with d

≤ d

. Their probability

of X errors is 0.5(d

) ·p

phys

, d

) per code cycle,

since the X distance is d

, but the number of possible

X error strings is lower by a factor of (d

) compared

to a square patch. Correspondingly, the probability of Z

errors is 0.5(d

)·p

phys

, d

), since the Z distance

is d

, but the number of Z error strings is higher by a

factor of (d

) compared to a square patch. We also

ﬁx the distance used in the ancillary region to d

. Fi-

nally, there is a third distance d

which determines the

number of code cycles used in lattice surgery. This af-

fects the error of the Pauli product measurements used

for logical gates, which can be detected by the distilla-

tion protocol.

In total, we end up with the arrangement shown in

Fig. 5b that is characterized by three code distances

, d

and d

, where d

and d

are spatial distances,

and d

is the temporal distance. Before we construct

a surface-code implementation of the 15-to-1 protocol,

we ﬁrst discuss two diﬀerent ways of performing faulty

logical π/8 rotations with surface codes: the traditional

method based on state injection, and a protocol based

Accepted in Quantum 2019-10-30, click title to verify 6

(a)

(b)

Figure 6: A faulty T gate performed via state injection (a) and

a faulty T measurement (b).

on faulty T measurements.

2.1 State injection

The standard method to perform faulty T gates with

topological codes is via state injection and measure-

ment. State injection is a protocol that prepares an

arbitrary logical state |ψ

i from a corresponding arbi-

trary physical state |ψi. Several such state-injection

protocols exist [5, 6, 31–34], but none of them are fault-

tolerant, i.e., the error probability of |ψ

i is always pro-

portional to p

phys

. The simplest protocol [6] starts with

a physical state |ψi, i.e., a 1 × 1 surface-code patch,

and then grows it into a d × 1 patch, and ﬁnally into

a d × d patch. This is not a very eﬃcient protocol,

since growing patches involves measuring stabilizers for

d code cycles. The qubit, therefore, spends many cycles

in a distance-1 state, which increases the error proba-

bility.

More sophisticated state-injection protocols use post-

selection [32, 34] to decrease the error. If the er-

ror rate of single-qubit operations is signiﬁcantly lower

than the error rate of two-qubit gates, the error due

to state injection can even be lower than p

phys

. In

circuit-level noise, a single number p

phys

characterizes

all gates. However, physical systems typically fea-

ture signiﬁcantly better single-qubit operations than

two-qubit gates. In state-of-the-art superconducting-

qubit [35] and ion-trap [36] architectures, for instance,

the ﬁdelities of single-qubit and two-qubit gates diﬀer

by up to almost two orders of magnitude. Since two-

qubit gates are typically the lowest-ﬁdelity operations,

and syndrome-readout circuits of surface codes mostly

consist of two-qubit gates, the characteristic error rate

phys

in circuit-level noise will be largely determined by

the error rate of two-qubit gates. If the two-qubit error

rate is p

phys

, but the single-qubit error rate is p

phys

/10,

state injection can produce magic states with an error

as low as

phys

[34] in just two code cycles. However,

there is a certain failure rate of the protocol due to post-

selection, which increases the length of the protocol.

While state injection can be used to prepare faulty

Figure 7: A faulty P

π/8

rotation corresponds to a P ⊗ Z mea-

surement involving a |+i ancilla, followed by a faulty T mea-

surement of the ancilla.

magic states, it cannot be used to directly execute P

π/8

rotations. Instead, state injection is used indirectly by

preparing a faulty magic state and measuring P ⊗ Z

via lattice surgery, as shown in Fig. 1. With a 50%

probability, a P

π/4

correction is required. Performing

this correction operation either requires extra time or

extra space. In any case, this Cliﬀord correction has an

eﬀect on the distillation protocol and, therefore, needs

to be performed, increasing the space-time cost of the

protocol. For this reason, we will avoid state injection,

and instead construct a protocol that executes faulty

π/8

rotations without the need for Cliﬀord corrections.

2.2 Faulty T measurements

When a T gate is performed on a qubit |ψi via state in-

jection, a faulty magic state is prepared, entangled with

|ψi, and measured, as shown in Fig. 6a. The faulty

preparation can be treated as a T gate applied on a

|+i state followed by a random X, Y or Z Pauli error.

The idea of faulty T measurements is to avoid the Clif-

ford correction by reversing the order of the entangling

operation and the faulty T gate, as shown in Fig. 6b.

Here, |ψi is ﬁrst entangled with a |0i qubit. Next, a

sequence of a random Pauli error, a T gate and an X

measurement is performed, which we refer to as a faulty

T measurement. Now, the correction operation in re-

sponse to the X measurement is no longer a Cliﬀord

gate, but a Pauli Z operation, which requires no addi-

tional hardware operations. X, Y and Z errors lead to

†

, S and Z errors on |ψi, respectively. Thus, a P

π/8

rotation can be performed by measuring P ⊗Z involving

an ancilla qubit initialized in the |+i state, followed by

a faulty T measurement of the ancilla qubit, as shown

in Fig. 7.

With surface codes, protocols for faulty T measure-

ments are exactly identical to protocols for state injec-

tion, except that the order of operations is reversed.

Here, we describe a simpliﬁed protocol to demonstrate

the working principle of faulty T measurements. Sim-

ilarly to the case of state injection, one can construct

signiﬁcantly more sophisticated protocols, as we discuss

in Appendix A.

One particularly simple state-injection protocol is

Accepted in Quantum 2019-10-30, click title to verify 7

Figure 8: Surface-code implementation of a faulty T measure-

ment. Bright and dark faces correspond to Z-type and X-type

stabilizers, respectively.

performed by growing a physical qubit (a 1 × 1 patch)

into a d × 1 patch and then into a d × d patch [6].

The corresponding faulty-T -measurement protocol can

be performed by shrinking patches. Suppose that a log-

ical qubit |ψ

i = α |0

i + β |1

i is encoded in a d × d

patch as in Fig. 8, with logical Z operators as strings

from left to right. This d × d patch can be shrunk to

a d × 1 patch by measuring all green qubits in the X

basis. The remaining d × 1 patch encodes the qubit in

a d-qubit XX repetition code

|ψ

i =

√

(|+i

⊗d

+|−i

⊗d

√

(|+i

⊗d

−|−i

⊗d

) , (8)

where the logical Z operator is Z

⊗d

, and the logical X

operator corresponds to the X operator on any of the

d qubits. Next, the d × 1 patch is shrunk to a 1 × 1

patch by measuring all red qubits in the Z basis. In

fact, the red and green measurements can be performed

simultaneously. The product of all Z measurement out-

comes (and also preceding stabilizer measurements) de-

termines an X Pauli correction on the remaining qubit,

which now stores the logical information in its physical

Pauli operators. Finally, a physical T gate is applied

to the remaining qubit, before it is measured in the X

basis.

Much like state injection, faulty T measurements are

not fault-tolerant protocols, in the sense that their er-

ror rate is proportional to the physical error rate and

does not decrease with the code distance. For a Pauli

error model, these error rates can be understood as the

probabilities of the Pauli-error operations in the dashed

boxes in Fig. 6. For simplicity, we will assume that

faulty T measurements have a Pauli error rate of p

phys

meaning that, eﬀectively, the blue qubit is aﬀected by

an X, Y or Z error with a probability of p

phys

/3 for

each Pauli. When used to execute a P

π/8

rotation, this

implies that this gate will have a P

−π/4

, P

π/4

or P

π/2

error with a probability of p

phys

/3 for each error. This

assumption is actually very inaccurate for the protocol

(1)

(2) (3)

Step 2: d

code cycles

Step 1: 0 code cycles

Step 3: d

code cycles

Figure 9: Example of a faulty T measurement to perform a

⊗ Z

)

π/8

rotation.

shown in Fig. 8, since, for this protocol, the error scales

with the code distance d, as any single-qubit X or Y er-

ror on the red qubits translates into a logical error in the

Accepted in Quantum 2019-10-30, click title to verify 8

Figure 10: Shrinking the |+i patch in the top right corner to a d

×1 patch produces a stabilizer conﬁguration that is topologically

equivalent to the one shown in Fig. 9. This does not reduce the code distance compared to Fig. 9,

faulty T measurement. This can be avoided by changing

the measurement pattern, similar to the state-injection

protocols of Refs. [31, 33]. Moreover, the error rate can

be signiﬁcantly suppressed by adding “pre-selection” to

the faulty-T -measurement protocol, the analogous op-

eration to post-selection in the state-injection protocol

of Ref. [32], wherein a faulty T measurement is deferred

until the stabilizer checks surrounding the sensitive blue

qubit report no syndromes for two code cycles. Because

such a faulty-T -measurement protocol is identical to the

state-injection protocol of Ref. [32], apart from the re-

versed order of operations, its error rate can be lower

than p

phys

, if single-qubit operations have a signiﬁcantly

higher ﬁdelity than two-qubit operations.

In any case, the speciﬁc choice of faulty-T -

measurement protocol (or state-injection protocol) will

not matter for the distillation protocols that we con-

struct in the following sections, which is why the more

sophisticated T -measurement protocols are discussed in

Appendix A. Moreover, we show in Appendix A that,

for many relevant distillation protocols, a higher error

rate for faulty T measurements has only a small eﬀect

on the performance of the distillation protocols, even if

the error rate of faulty T measurements is assumed to

be higher by an order of magnitude, 10p

phys

instead of

phys

. In our following resource estimates, we will use

the simpliﬁed assumption of an error rate of p

phys

for

faulty T measurements.

We now show how to use faulty T measurements to

perform P

π/8

rotations in a surface-code arrangement

similar to Fig. 5b. Suppose that we want to execute

a P

π/8

= (Z

⊗ Z

)

π/8

rotation on three qubits,

i.e., rotation 9 in Fig. 3. In the ﬁrst step in Fig. 9,

we start with the ﬁve qubits of the 15-to-1 protocol,

+ 4d

) × d

unused qubits in the ancillary region

and an additional d

×d

unused qubits in the top right

corner. In this example, d

= 5, d

= 3 and d

3. Next, following the circuit in Fig. 7, we initialize

a d

× d

patch in the |+i state and use multi-patch

lattice surgery [8, 28] to measure Z

⊗Z

, where

is the Z operator of qubit i and Z

is the Z operator

of the ancilla in the top right corner. The multi-patch

lattice-surgery operation [8] is performed by initializing

the physical qubits in the ancilla region in the |+i state

and measuring the new stabilizers for d

code cycles,

after which the qubits in the ancilla region are measured

in the X basis. The outcome of the Z

⊗Z

measurement is encoded in the measurement outcomes

of the Z stabilizers in the ancilla region. Finally, in the

last step, a faulty T measurement is performed on the

× d

patch.

Next, we explain how we perform the error esti-

mate for this protocol. For a protocol that performs

a P

π/8

rotation, we consider Z and X storage errors on

qubits 1-5 and P

π/2

, P

π/4

and P

−π/4

errors in the ap-

plication of the rotation. As explained previously, the

incorrect decoding of X and Z syndromes on qubits

1-5 leads to Z and X storage errors. These occur

with a probability of 0.5(d

) · p

phys

, d

) and

0.5(d

)·p

phys

, d

), respectively, where d

= d

for qubit 1, and d

= d

for qubits 2-5. The incorrect

decoding of the X syndrome in the ancilla region causes

Z error strings connecting the X boundaries in the an-

cilla region. Depending on the exact location of these

error strings, this can cause Z errors on one or multiple

qubits. For instance, the error string highlighted in red

and labeled ‘(2)’ in Fig. 9 is topologically equivalent to

error string ‘(1)’, and therefore causes a Z error on qubit

1. Error string ‘(3)’, on the other hand, is less harmful,

since it is equivalent to a Z

⊗Z

error aﬀecting qubits

1 and 4. Still, as a simpliﬁed pessimistic estimate, we

will assume that any such error directly contributes to

the Z error of the output qubit, i.e., qubit 1. This hap-

pens with a probability of 0.5(l/d

) ·p

phys

, d

) ·d

where l is the length of the ancilla patch. In our exam-

ple, l = d

+ 4d

Accepted in Quantum 2019-10-30, click title to verify 9

Step 1: d

cycles

Arrangement

Step 2: 2d

cycles Step 3: 3d

cycles

Step 7: 6d

cyclesStep 6: 6d

cyclesStep 5: 5d

cyclesStep 4: 4d

cycles

out

out out out

Figure 11: Surface-code implementation of the (15-to-1)

protocol using 2 · (d

+ 4d

) · 3d

+ 4d

physical qubits for

code cycles.

The incorrect decoding of the Z syndrome in the an-

cilla region causes X error strings. While there are no

Z boundaries in the ancilla region for the X errors to

connect to, in a space-time picture, these errors can

also condense at the temporal boundaries set by the ini-

tializations of the physical qubits in the |+i state and

their measurement in the X basis. Because the lattice-

surgery stabilizers are measured for d

code cycles,

the probability of this error is governed by d

. Since

0.5p

phys

, d

) · d

is the X error rate of a d

× d

patch stored for d

code cycles, but the lattice surgery

takes place over an area of l×d

, we estimate the prob-

ability of such an error as 0.5l·d

·p

phys

, d

)·d

Such an error leads to an incorrect interpretation of

the lattice-surgery measurement outcome. As shown in

Fig. 7, incorrectly interpreting the outcome of the P ⊗Z

measurement causes an X error on the |+i qubit, which

turns the Z

π/8

rotation into a Z

−π/8

rotation, causing

an on overall P

−π/4

error.

Z and X storage errors aﬀecting the |+i qubit in

the top right corner each occur with a probability of

0.5p

phys

, d

) · d

and cause P

π/2

and P

−π/4

errors,

respectively, as explained by inserting X and Z errors

on the |+i qubit in Fig. 7. Finally, the single-qubit X,

Y and Z errors with a probability of p

phys

/3 during the

faulty T measurement of Fig. 8 contribute to the P

−π/4

π/4

and P

π/2

error, respectively.

In order to save qubits, we can shrink the d

× d

patch in the top right corner to a d

×1 patch. As shown

in Fig. 10, this produces a stabilizer conﬁguration that

is topologically equivalent to the conﬁguration of Fig. 9

and maintains the code distance. In principle, the Z

storage error on the |+i qubit is now reduced, but we

will still use the error estimate explained in the previous

paragraph.

To summarize our error estimate, in addition to the

usual storage errors, a Z error aﬀects the output qubit

with a probability of 0.5(l/d

) · p

phys

, d

) · d

π/2

, P

−π/4

and P

π/4

errors each occur with a prob-

ability of p

phys

/3. Furthermore, a P

−π/4

error occurs

with a probability of 0.5l · d

· p

phys

, d

) · d

Moreover, additional P

−π/4

and P

π/2

errors occur with

a probability of 0.5p

phys

, d

) · d

. Since we are not

running an actual decoder to obtain these error prob-

abilities, our error estimate should not be interpreted

as a rigorous simulation, but as a pessimistic ballpark

estimate.

3 15-to-1 distillation

Our implementation of the 15-to-1 protocol of Fig. 3 is

shown in Fig. 11. It starts by initializing qubits 2-4 in

the |+i state. In the ﬁrst step, rotations 1-3 and 5 are

performed. The single-qubit rotations are performed

by initializing a d

× d

surface-code patch in the |+i

state, measuring Z ⊗ Z, and performing faulty T mea-

surement of the d

× d

patch. Multi-qubit rotations

are performed via fast faulty T measurements. In step

2, qubit 1 is initialized in the |+i state and rotations 6

and 7 are performed. In step 3, qubit 5 is initialized in

the |+i state and rotations 4, 8 and 9 are performed.

In steps 4-6, rotations 10-15 are performed. Finally,

in step 7, qubits 2-5 are measured in the X basis. If

all measurement outcomes are +1, qubit 1 is a distilled

Accepted in Quantum 2019-10-30, click title to verify 10

Figure 12: Quantum circuit for delayed-choice P

π/8

rotations.

The choice of measurement basis for the single-qubit |+i mea-

surement decides whether a P

π/8

rotation is performed, or no

operation at all.

magic state which can be used to execute a low-error

π/8

rotation. The distillation block can now be used

to distill the next magic state.

In order to prevent the output state from blocking

the space reserved for qubit 1 for d

code cycles, the

consumption of the output state for the execution of a

π/8

rotation can already be initiated in step 5, since

this process takes d

code cycles. In the protocol shown

in Fig. 11, steps 5-7 and step 1 of the subsequent distil-

lation can be used to measure qubit 1, i.e., a total of 3d

code cycles. If d

> 3d

, the output state will block

the space reserved for qubit 1 and slow down the proto-

col. This can be prevented by reordering the rotations.

In any case, we only consider cases with d

≤ 3d

One may be concerned that, when consuming the out-

put state in step 5, we still do not know if the distilla-

tion protocol will succeed, i.e., whether the output state

is faulty or not. This problem can be solved by using

the circuit in Fig. 12, where an additional ancilla qubit

initialized in the |+i state is used. To execute a P

π/8

rotation, the operator P ⊗ Z ⊗ Z between the qubits,

the magic state and the additional ancilla state is mea-

sured. Depending on whether the |+i state is measured

in the Z or X basis, the P

π/8

rotation is applied or not.

This can be used to consume a magic state before the

outcome of its distillation is known. If the distillation

protocol for this magic state fails, the |+i state can be

measured in the X basis to prevent the faulty magic

state from being used.

In total, this 15-to-1 protocol has a space cost of

2 ·(d

+ 4d

) ·3d

+ 4d

physical qubits, taking phys-

ical measurement ancillas into account. The time cost

is 6d

/(1 − p

fail

) code cycles, where p

fail

is the failure

probability of the protocol. As discussed in Sec. 1, p

fail

and the output error p

out

are determined numerically.

To this end, the 5-qubit density matrix is simulated,

taking into account errors from storage and faulty T

measurements. All P

π/8

rotations have a P

π/2

, P

π/4

and P

−π/4

error of p

phys

/3 and additional errors due

to the fast faulty T measurement protocol, as discussed

in Sec. 2, with the exception that, for rotations that do

not involve qubit 1, Z errors in the ancilla region do not

cause Z errors on qubit 1. For single-qubit rotations,

X errors on the d

× d

qubit cause P

−π/4

errors and

occur with a probability of 0.5d

phys

, d

), whereas

Z errors spread to the adjacent qubit with a probability

of 0.5(d

phys

, d

After each step (apart from step 7), d

code cycles

worth of X and Z storage errors are applied to all ﬁve

qubits. In addition, Z or X errors on the d

× d

an-

cilla patch used to consume the output state are added

as Z or X storage errors to the output state for d

code cycles. Finally, the output error probability p

out

is computed as the inﬁdelity 1 −F(ρ

ideal

, ρ

out

We refer to a 15-to-1 protocol characterized by the

code distances d

, d

and d

as a (15-to-1)

protocol. For p

phys

= 10

−4

, we ﬁnd that the protocol

(15-to-1)

7,3,3

has a p

out

= 4.4 × 10

−8

, where we round

the output error to two signiﬁcant digits. It has a space

cost of 810 qubits and a time cost of 18.1 code cycles.

These two numbers can be multiplied to obtain a space-

time cost of 14,600 qubitcycles to three signiﬁcant dig-

its. We also report the space-time cost in terms of the

full distance d

full

. Consider a 100-qubit quantum com-

putation with a T -gate count of n

, where n

< 1/p

out

Using the construction from Ref. [28], the entire com-

putation can be ﬁnished in n

· d

full

code cycles, if 231

distance-d

full

surface-code patches are used to store the

100 qubits. The probability that any of these qubits is

aﬀected by a storage error that can spoil the outcome

of the computation is

storage error

= 231 · n

· d

full

· p

phys

, d

full

) . (9)

The storage error can be higher, if the computation

does not exclusively consist of π/8 rotations, but also

involves Pauli product measurements, as this increases

the length of the computation. Using magic states with

out

for each T gate, there is a probability of n

· p

out

that a faulty T gate will spoil the outcome of the com-

putation. If we demand that storage errors leads to a

relative increase of the error probability by 1% in the

best case, then d = d

full

needs to satisfy

231 · n

· d · p

phys

, d) < 0.01 · n

· p

out

. (10)

Evidently, d does not depend on n

, but only on the

number of qubits in the computation. For a 10,000-

qubit computation, the qubits need to be stored using

20,284 distance-d qubits, and the condition changes to

20284 · d · p

phys

, d) < 0.01 · p

out

. (11)

We pick the smallest odd-integer d that satisﬁes the

condition. For p

out

= 4.4×10

−8

, d = 11 in the 100-qubit

Accepted in Quantum 2019-10-30, click title to verify 11

(a) Qubit arrangement for the (15-to-1) ×(15-to-1) protocol

(b) (Z

⊗ Z

)

π/8

and (Z

⊗ Z

)

π/8

rotation

L1 L1

L1L1

L1 L1

L1L1

L1 L1

L1L1

L1 L1

Figure 13: (a) Example of a qubit arrangement used for two-level (15-to-1)

×(15-to-1)

distillation with n

= 8

level-1 blocks. (b) Example of two rotations performed in this arrangement. (c) The rotations are performed using auto-corrected

π/8 rotations, which consume a magic state and apply the Cliﬀord correction by appropriately choosing the measurement basis of

the |0i ancilla qubit.

case and d = 13 in the 10,000-qubit case. The space-

time cost reported in the last two columns of Tab. 1

is in terms of (physical data qubits)×(code cycles), i.e.,

5.49d

or 3.33d

. These numbers are remarkably low, as

the cost to explicitly perform, e.g., a logical CNOT gate

on two distance-d qubits [6] is 3d

. Still, the only truly

meaningful number quantifying the space-time cost is

the cost in terms of qubitcycles.

The other protocols reported in Tab. 1 for p

phys

−4

are (15-to-1)

9,3,3

and (15-to-1)

11,5,5

, reducing the

error to p

out

= 9.3 × 10

−10

and p

out

= 1.9 × 10

−11

, re-

spectively. For a higher physical error rate of p

phys

−3

, (15-to-1)

17,7,7

has a p

out

= 4.5 × 10

−8

. Inter-

ested readers can verify these numbers using the Python

script or Mathematica notebook provided in the Supple-

mentary Material [37], where they can also try out other

parameters. The Supplementary Material contains the

resource cost calculations for all protocols considered in

this paper.

15-to-1 protocols cannot generate arbitrarily good

output states, as p

out

is limited by ∼10p

phys

. In order to

distill higher-ﬁdelity magic states, we turn to two-level

protocols.

4 Two-level protocols

The idea of two-level protocols is to use distilled magic

states (level-1 states) to perform a second round of dis-

tillation. We ﬁrst discuss (15-to-1) × (15-to-1) proto-

cols, where 15-to-1 output states are used for a second

level of 15-to-1 distillation. The qubit arrangement used

for this protocol is shown in Fig. 13a. It is described

by three additional code distances d

, d

and d

and the number of level-1 distillation blocks n

, where

is an even integer. The central region consists of

a (d

+ 4d

) × 3d

array of qubits where the sec-

ond level of distillation takes place. To the left and

right are the level-1 distillation blocks that feed level-

1 states into the upper and lower ancilla region of the

level-2 block, respectively. Each of these two level-1 re-

gion consists of n

/2 level-1 blocks. These blocks are

characterized by the level-1 distances d

, d

and d

such that each level-1 region produces one magic state

every 6d

/(1 − p

fail

)/(n

/2) code cycles.

In addition, each level-1 region has a 3d

×4d

ar-

ray of qubits that separates the level-1 blocks from the

level-2 block. As level-1 states are produced, they are

transported into this intermediate region. For this pur-

pose, each level-1 region has two spots that are reserved

for level-1 output states. While one of these spots is be-

ing ﬁlled with a newly generated level-1 state, the magic

state in the other spot can be consumed to execute a

π/8

rotation in the level-2 block, as shown in Fig. 13b.

These rotations are performed using the auto-corrected

π/8 rotation [28] shown in Fig. 13c. Here, the operator

P ⊗ Z between the level-2 qubits and a level-1 magic

state |mi is measured. Simultaneously, Z ⊗ Y between

|mi and an ancillary |0i state is measured. Depend-

ing on the outcome of the P ⊗ Z measurement, the |0i

qubit is either read out in the X or Z basis, essentially

performing a Cliﬀord correction or not.

As shown in Fig. 13b, this kind of measurement can

be performed in d

code cycles using a conﬁguration

similar to a faulty T measurement. While one magic

state in each level-1 region is used to execute a level-2

π/8 rotation, a second level-1 state is being transported

Accepted in Quantum 2019-10-30, click title to verify 12

Step 6: 6t

cycles Step 7: 7t

cycles

Step 8: 8t

cycles

Step 1: t

cycles Step 2: 2t

cycles Step 3: 3t

cycles

Step 4: 4t

cycles

Step 5: 5t

cycles

Figure 14: Surface-code implementation of the (15-to-1) × (15-to-1) protocol.

to be used for the subsequent level-2 rotation. In the

top and bottom ancilla region, one such level-2 rotation

can be performed every

= max(d

, 6d

/(1 − p

fail

)/(n

/2)) (12)

code cycles. If level-1 states are produced slowly, t

determined by the output rate of the level-1 factories.

If these states are produced fast, t

will be limited by

, the duration of the auto-corrected π/8 rotations.

The entire (15-to-1) ×(15-to-1) protocol is shown in

Fig 14, focusing on the level-2 region. In steps 1 and

2, the ﬁrst four rotations of the 15-to-1 circuit are per-

formed. Since these are single-qubit rotations, |0i ancil-

las are not required, as Cliﬀord corrections correspond

to Pauli corrections for these ﬁrst four rotations. In

steps 3-7, rotations 5-14 are performed. The consump-

tion of the output state is initiated in step 7. In step

8, qubit 4 is measured in the X basis and rotation 15

is performed in the bottom ancilla region. Since the

space reserved for qubit 4 is now empty, the top ancilla

region can be used to perform the ﬁrst rotation of the

subsequent distillation round, such that the next round

of distillation will only take 7t

instead of 8t

code

cycles. For this reason, the time cost of this distillation

block is 7.5t

code cycles. Again, the consumption of

the output state will slow down the protocol, if it takes

longer than 3d

code cycles, so the distances should

be chosen such that d

≤ 3d

Error analysis. The level-1 blocks output level-1

states with an output error of p

out

= p

. This error

contributes to the probability of a P

π/2

error, when this

state is used for a P

π/8

rotation. Furthermore, the level-

1 state accumulates additional errors as it is moved to

the level-2 block. As shown in Fig. 13b, it traverses a

region of width d

and a maximum length of n

/4 ·

+ 4d

) + 2d

for d

code cycles, before ending

up in a 2d

× 2d

region, where it stays for another

code cycles. Therefore, we deﬁne

move

= n

/4 · (d

+ 4d

) + 10d

, (13)

as the length of the ancilla region that increases the

storage error of the level-1 state. In this sense, the

X and Z error of the level-1 state are each increased

by 0.5l

move

· p

phys

, d

), contributing to the P

π/2

and P

−π/4

error, respectively. The error analysis of

the level-2 block is analogous to faulty T measure-

ments. For an ancilla of length l, where l can be up

to d

+ 4d

+ d

, X errors lead to a P

−π/4

error

with a probability of 0.5(l · d

) · p

phys

, d

Moreover, if qubit 1 is part of the rotation, it is aﬀected

by an additional Z storage error with a probability of

0.5(l · d

) · p

phys

, d

). Finally, the X and Z

Accepted in Quantum 2019-10-30, click title to verify 13

Step 1: t

cycles Step 2: 2t

cycles

Step 3: 3t

cycles

Step 4: 4t

cycles

Step 5: 5t

cycles

Step 7: 7t

cyclesStep 6: 6t

cycles

Step 8: 8t

cycles

Step 9: 9t

cycles

Step 10: 10t

cycles

Figure 15: Surface-code implementation of the (15-to-1) × (20-to-4) protocol.

error of the output state increases by d

·p

phys

, d

)

while it is being consumed.

Results. The space cost of the protocols has three

contributions: 2 · (d

+ 4d

) · 3d

qubits from the

level-2 block, 2n

((d

+ 4d

)(3d

+ d

/2) + 2d

)

qubits from the level-1 blocks, including the ancilla

that they are connected to, and 2 · (20d

+ 2d

)

qubits from the additional ancilla qubits. Note that,

in contrast to the 15-to-1 factory, the footprint of

the (15-to-1)×(15-to-1) factory is not necessarily rect-

angular. The time cost is 7.5t

. Applying errors

numerically according to our error analysis, we ob-

tain the output error. We label our protocols as

(15-to-1)

× (15-to-1)

. For p

phys

−4

, we ﬁnd that a (15-to-1)

9,3,3

× (15-to-1)

25,9,9

pro-

tocol produces output states with p

out

= 6.3×10

−25

for

1,260,000 qubitcycles. For p

phys

= 10

−3

, the protocol

(15-to-1)

11,5,5

× (15-to-1)

29,11,13

produces magic states

with p

out

= 3.3 ×10

−14

for 3,810,000 qubitcycles. More

protocols can be found in Tab. 1 or generated using the

Python script or Mathematica notebook provided in the

Supplementary Material [37].

20-to-4 dist illation. For output states with a de-

sired error rate that is lower than what is possible

with one level of 15-to-1, but higher than what can

be achieved with two levels of 15-to-1, it can be more

eﬃcient to use a level-2 protocol that is cheaper, but

features less error suppression. One such protocol is

the 20-to-4 protocol of Fig. 4. The implementation of a

(15-to-1)×(20-to-4) is shown in Fig. 15 and is very sim-

ilar to a (15-to-1)×(15-to-1) protocol. The main diﬀer-

ence is that the length of the level-2 region increases to

+3d

, as the 20-to-4 circuit acts on seven qubits,

four of which are output states.

For protocols that generate multiple output states,

it is particularly important to pick a suitable order in

which the rotations are performed, in order to avoid

congestion. If four output states are generated at the

end of the protocol, but the computation demands that

they are consumed one after the other, then they will

block the level-2 factory for many cycles. In step 1 of

our protocol, rotations 1 and 2 are performed. In step

2, qubit 1 is initialized in the |+i state and rotations 4

and 5 are performed. Simultaneously, the measurement

of qubit 1 as an output state can be initiated. In steps

3-6, rotations 3 and 6-12 are performed. In step 7, ro-

tations 13 and 14 are performed and the consumption

of output state 2 can be initiated. The snapshots in

Fig. 15 are drawn in a way that assumes that it takes

to consume a magic state, but this depends on

. In steps 8-9, rotations 15-18 are performed and

Accepted in Quantum 2019-10-30, click title to verify 14

6 7 81

(a) Non-trivial representation of the identity

(b) 8-to-CCZ synthillation circuit

Figure 16: (a) These 15 rotations are non-trivially equivalent to the identity. (b) 8-to-CCZ synthillation circuit obtained by

multiplying the circuit in (a) with a 7-rotation CCZ gate.

the consumption of qubit 3 can be initiated. In step 10,

the last two rotations are performed and the protocol

ﬁnishes after 10t

code cycles. The ﬁrst three steps of

the subsequent round of distillation can be used to ini-

tiate the measurement of output state 4 and ﬁnish the

measurements of all remaining output states. If output

states are consumed one after the other, e.g., to perform

π/8 rotations one after the other, this protocol allocates

up to 2.5t

code cycles for each output state, which is

suﬃcient for the parameters that we consider.

The error analysis of protocols labeled as

(15-to-1)

× (20-to-4)

is identical

to two-level 15-to-1 protocols, albeit with a dif-

ferent space and time cost, and the extra step of

dividing the space-time cost and output error by

four. Moreover, if multiple output qubits are part

of a rotation, the additional Z error due to Z-error

strings in the ancilla region with a probability of

0.5(l ·d

) ·p

phys

, d

) is applied to all output

qubits that are part of the rotation. For p

phys

= 10

−4

we ﬁnd that the (15-to-1)

9,3,3

× (20-to-4)

15,7,9

protocol

generates states with an error of p

out

= 2.4 × 10

−15

per output state and a space-time cost of 371,000

qubitcycles per magic state. For p

phys

= 10

−3

the protocols (15-to-1)

13,5,5

× (20-to-4)

21,11,13

and

(15-to-1)

13,5,5

× (20-to-4)

27,13,15

have output errors of

1.4 ×10

−10

and 2.6 ×10

−11

per state with a space-time

cost of 1,410,000 and 1,840,000 qubitcycles per output

state.

What about coherent errors? As discussed in

Sec. 1, the performance of distillation protocols can be

worse, if the underlying error model features coherent

errors. In a circuit-level study of surface codes, coher-

ent errors are diﬃcult to analyze. If the logical error

rate of surface codes can be maintained at p

phys

, d)

even in the presence of coherent errors, and the only

eﬀect of coherent errors is to under- or over-rotate the

physical T gate used in faulty T measurements, then

one can argue that the eﬀect of coherent errors is not

too signiﬁcant. These errors would then increase the

minimum achievable output error rate of the ﬁrst level

of distillation, albeit by an error that is governed by the

single-qubit error rate. While this can be a problem for

single-level distillation schemes, this is less detrimen-

tal for two-level distillation schemes, as only the ﬁrst

level is aﬀected. Since level-1 blocks typically output

states that have a ﬁdelity that is much lower than the

maximum achievable level-1 ﬁdelity, the overall output

ﬁdelity would be barely aﬀected. For instance, the level-

1 block of the (15-to-1)

13,5,5

×(20-to-4)

27,13,15

protocol

outputs states with p

out

≈ 10

−6

, whereas the lowest

possible error of the 15-to-1 protocol for p

phys

= 10

−3

is p

out

≈ 10

−8

. Still, a more careful treatment of coher-

ent errors is necessary, but is beyond the scope of this

work.

What about feed-forward? One possible limiting

factor in quantum computers is the feed-forward time,

i.e., the time it takes to react to measurement outcomes,

deciding which operation should be performed next. In

Accepted in Quantum 2019-10-30, click title to verify 15

our protocols, some qubits need to be measured in the

X or Z basis depending on previous measurement out-

comes, which is used to avoid Cliﬀord corrections. In

Fig. 13a, these are the qubits in the intermediate region

between the level-1 blocks and the level-2 block. If the

feed-forward time is a bottleneck in a given architecture,

these qubits need to be stored for some additional time

before being read out. A slowdown due to feed-forward

can be avoided by using additional ancilla qubits in this

ancilla region. In any case, long feed-forward times in-

crease the overall space-time cost.

The constructions discussed in the previous sections

can be used to implement any distillation protocol that

can be expressed as a sequence of Z-type π/8 rotations,

e.g., all protocols that are based on triorthogonal ma-

trices [11, 19]. A very similar class of protocols are

synthillation protocols, whose implementation we dis-

cuss in the following section.

5 Synthillation

Synthillation [17, 38] is a portmanteau of the words

(gate) synthesis and distillation. The idea of synthilla-

tion is to generate resource states that do not execute

single T gates or π/8 rotations, but entire layers of

commuting π/8 rotations. The simplest example is the

|CCZi resource state, which is prepared by applying

a controlled-controlled-Z (CCZ) gate to a |+i

⊗3

state.

This state can be used to perform a CCZ gate, which

can be written as a sequence of seven commuting π/8

rotations [28, 39]. However, it is also possible to exe-

cute CCZ gates using four T gates [40], or even only

two T gates, if these CCZ gates are part of a compute-

uncompute circuit [41].

Synthillation circuits can be obtained the same way

as ordinary distillation circuits. We start with a non-

trivial representation of the identity in Fig. 16a as a

sequence of 15 π/8 rotations on 4 qubits. Next, we

cancel the ﬁrst three and last four rotations by multi-

plying the entire circuit with the corresponding +π/8

and −π/8 rotations, i.e., the 7 rotations on the right-

hand side of Fig. 16b. These 7 rotations happen to

correspond to the decomposition of the CCZ gate into

7 π/8 rotations. In other words, the circuit on the left-

hand side of Fig. 16b prepares a |CCZi state on the

ﬁrst three qubits and acts trivially on the fourth qubit.

Therefore, the fourth qubit can be used to detect errors.

If any one of the 8 rotations fails, the X measurement

outcome will ﬂip. However, any pair of errors will go

undetected. Therefore, the output error to leading or-

der is 28p

under Z-Pauli noise.

Since the 8-to-CCZ circuit is a sequence of Z-type

π/8 rotations, it can be implemented the same way as

Step 1: t

cycles

Step 2: 2t

cycles

Step 3: 3t

cycles

Step 4: 4t

cycles

Figure 17: Surface-code implementation of the (15-to-1) ×

(8-to-CCZ) protocol.

a two-level distillation protocol, as shown in Fig. 17.

The level-2 block has a width of 3d

+ d

. Per-

forming two rotations every t

code cycles, the pro-

tocol ﬁnishes after 4t

code cycles. In our numerical

error analysis, we obtain the output error as the in-

ﬁdelity between the output state and the ideal state

ideal

= |CCZihCCZ| ⊗ |+ih+|. Labelling our pro-

tocols as (15-to-1)

×(8-to-CCZ)

, we

ﬁnd that, for p

phys

= 10

−3

, the protocol (15-to-1)

13,7,7

(8-to-CCZ)

25,15,15

generates |CCZi states with a ﬁ-

delity of p

out

= 5.2 × 10

−11

and a space-time cost

of 2,820,000 qubitcycles. If the execution of a CCZ

Accepted in Quantum 2019-10-30, click title to verify 16

gate using ordinary magic states requires four of these

states, they need to have a quarter of this output er-

ror to achieve the same gate error. In comparison, a

(15-to-1) × (20-to-4) protocol can generate one T -gate

magic state with p

out

= 2.6 × 10

−11

using 1,840,000

qubitcycles per state. In other words, four such states

would cost 7,360,000 qubitcycles, more than twice as

expensive as a |CCZi state.

While this implies that synthillation protocols can re-

duce the distillation cost, this ignores the fact that the

consumption of the output state can congest the dis-

tillation block. In the construction of Fig. 17, there

are 2t

code cycles for the consumption of each state.

If all states can be consumed simultaneously, then this

might be suﬃcient. However, if the rest of the quantum

computer is such, that the three output states need to

be consumed one after the other, the synthillation pro-

tocol may be slowed down signiﬁcantly, increasing the

overall space-time cost. This can be avoided by using

additional qubits to temporarily store the output state,

although this does not prevent an increase in the space-

time cost. Another possibility is to slow down the pro-

tocol to increase the time available for the consumption

of the output states, e.g, by using slow measurements

instead of fast measurements.

For p

phys

= 10

−4

, we ﬁnd that a (15-to-1)

7,3,3

(8-to-CCZ)

15,7,9

generates output states with p

out

7.2 × 10

−14

using 447,000 qubitcycles, which is also

cheaper compared to the cost to distill ordinary magic

states with the same ﬁdelity. Such synthillation proto-

cols can be used to generate resource states that execute

any arbitrary sequence of π/8 rotations. Schemes to

obtain such protocols are found in Ref. [17]. However,

note that the problem of output states congesting the

distillation block becomes more severe, if the generated

output state consists of many qubits.

6 Small-footprint protocols

So far, our protocols have focused on minimizing the

space-time cost. In this section, we outline how proto-

cols can be designed to minimize qubit footprint, i.e.,

the space cost, by sacriﬁcing space-time overhead. We

discuss the example of (15-to-1) and (15-to-1)×(15-to-1)

protocols with a target output error of p

out

≈ 10

−9

15-to-1. The footprint can be straightforwardly re-

duced by using only one region of ancilla qubits in-

stead of two. This reduces the footprint to 4(d

+ 2d

physical qubits, as shown in Fig. 18.

Since only once ancilla region is used, the time cost

doubles compared to the protocol of Fig. 11 to 12d

The error estimate is identical to the ordinary 15-to-1

protocol. We ﬁnd that, for p

phys

= 10

−4

, the small-

Figure 18: Qubit arrangement of a small-footprint implemen-

tation of the 15-to-1 protocol.

(b) (Z

⊗ Z

)

π/8

rotation

(a) Small-footprint (15-to-1)×(15-to-1)

Figure 19: (a) Qubit arrangement for the small-footprint imple-

mentation of the (15-to-1)×(15-to-1) protocol. (b) Example

of a (Z

⊗ Z

)

π/8

rotation executed in 2d

cycles.

footprint (15-to-1)

9,3,3

protocol produces magic states

with p

out

= 1.5×10

−9

using 762 qubits. However, since

the protocol takes 36.2 code cycles, the space-time cost

of 27,600 qubitcycles is higher than for comparable pro-

tocols with a similar output error.

Two-level 15-to-1 distillation. For two levels of

15-to-1 with a small footprint, we use the arrangement

shown in Fig. 19a, which consists of a (d

+ 4d

) ×

level-2 block, a (d

+4d

)×3d

level-1 block, and

an intermediate region of size 2d

×2d

×d

The level-1 block outputs level-1 states every ∼6d

code cycles. In principle, it is possible to use the small-

Accepted in Quantum 2019-10-30, click title to verify 17

footprint level-1 block of Fig. 18, but in the interest

of not increasing the space-time cost by too much, we

use the 15-to-1 block introduced in Sec. 3. When a

level-1 state is generated, it can be consumed for a

level-2 rotation in 2d

code cycles. An example of

a (Z

⊗ Z

)

π/8

is shown in Fig. 19b. In the ﬁrst

code cycles, a level-1 state is transported from the

level-1 block to the intermediate region. In the sec-

ond d

code cycles, a Pauli product measurement is

performed, executing a P

π/8

rotation according to the

circuit in Fig. 13c. Transporting the level-1 state into

the intermediate region increases its X and Z error by

·p

phys

, d

). If we deﬁne the time that it takes

to execute a level-2 rotation as

= max(2d

, 6d

/(1 − p

fail

)) , (14)

then the distillation protocol takes 15t

to ﬁnish.

In our numerical analysis, we ﬁnd that, for p

phys

−3

, a small-footprint (15-to-1)

9,5,5

× (15-to-1)

21,9,11

protocol generates magic states with p

out

= 6.1 ×10

−10

using only 7,780 physical qubits. With a time cost of

469 cycles, the space-time cost is 3,650,000 qubitcycles

per output state. While the space cost is very low,

this protocol sacriﬁces space-time cost, as an ordinary

(15-to-1) × (20-to-4) protocol can generate states with

out

= 1.4 × 10

−10

for only 1,420,000 qubitcycles.

7 Conclusion

We have constructed magic state distillation protocols

that reduce the space-time cost by approximately 90%

compared to the previous state of the art. Since our

results were not obtained by simulating entire surface-

code patches and running an actual decoder, but via a

careful error analysis, the numbers reported in Tab. 1

should be taken with a grain of salt. The protocols dis-

cussed in this paper should rather be regarded as a proof

of principle, demonstrating that the overhead of distil-

lation can be reduced signiﬁcantly by carefully tuning

the code distances of the diﬀerent qubits that are part

of the distillation protocol. In any case, exact numbers

will have a strong dependence on the hardware-speciﬁc

error parameters and the decoding procedure.

There is still plenty of room for optimization. For one,

we only considered very simple distillation protocols,

i.e., the 15-to-1 distillation protocol, a 20-to-4 block-

code protocol and the synthillation of |CCZi states.

Perhaps, more sophisticated distillation circuits can fur-

ther reduce the cost. While the 20-to-4 protocol is part

of an entire family of (3k + 8)-to-k protocols, it seems

unlikely that a higher k will decrease the cost, since the

space cost is governed by k + 3, and the time cost is

governed by 3k + 8. Therefore, the space-time cost per

output state is governed by (3k+8)(k+3)/k, which hap-

pens to have minima at k = 2 and k = 4 for even integer

k. Still, since it is possible to generate arbitrarily many

such protocols based on triorthogonal codes [11, 19],

one could look for protocols that minimize the space-

time cost in this manner. For two-level distillation, it

remains unclear which combination of protocols reduces

the cost. It also remains unclear whether the space-time

cost can be decreased by using protocols that reduce

the number of π/8 rotation in the distillation circuit

by adding Cliﬀord gates [3], or protocols that employ

catalyst states that need to be stored at a higher code

distance [21]. One could also construct protocols with

more than two levels of distillation.

The resource requirements for fault-tolerant surface-

code-based quantum computing can be daunting.

Hopefully, this work helps demonstrate that this is not

mainly due to the overhead of magic-state distillation,

but rather due to the low encoding rate of topological

codes, implying that thousands of physical qubits are

required to encode a single logical qubit.

Acknowledgments

I would like to thank Earl Campbell and Craig Gidney

for discussions about the advantages of two-level dis-

tillation protocols, Lingling Lao for discussions about

state injection, and Markus Kesselring for discussions

about the surface-code implementations. This work

has been supported by the Deutsche Forschungsgemein-

schaft (Bonn) within the network CRC TR 183.

References

[1] J. Preskill, Reliable quantum computers, Proc. Roy.

Soc. Lond. A 454, 385 (1998).

[2] B. M. Terhal, Quantum error correction for quan-

tum memories, Rev. Mod. Phys. 87, 307 (2015).

[3] E. T. Campbell, B. M. Terhal, and C. Vuil-

lot, Roads towards fault-tolerant universal quantum

computation, Nature 549, 172 (2017).

[4] A. Y. Kitaev, Fault-tolerant quantum computation

by anyons, Ann. Phys. 303, 2 (2003).

[5] A. G. Fowler, M. Mariantoni, J. M. Martinis, and

A. N. Cleland, Surface codes: Towards practical

large-scale quantum computation, Phys. Rev. A 86,

032324 (2012).

[6] C. Horsman, A. G. Fowler, S. Devitt, and R. V.

Meter, Surface code quantum computing by lattice

surgery, New J. Phys. 14, 123011 (2012).

[7] D. Litinski and F. v. Oppen, Lattice Surgery with a

Twist: Simplifying Cliﬀord Gates of Surface Codes,

Quantum 2, 62 (2018).

Accepted in Quantum 2019-10-30, click title to verify 18

[8] A. G. Fowler and C. Gidney, Low over-

head quantum computation using lattice surgery,

arXiv:1808.06709 (2018).

[9] S. Bravyi and A. Kitaev, Universal quantum com-

putation with ideal Cliﬀord gates and noisy ancil-

las, Phys. Rev. A 71, 022316 (2005).

[10] R. Babbush, C. Gidney, D. W. Berry, N. Wiebe,

J. McClean, A. Paler, A. Fowler, and H. Neven,

Encoding electronic spectra in quantum circuits

with linear T complexity, Phys. Rev. X 8, 041015

(2018).

[11] S. Bravyi and J. Haah, Magic-state distillation with

low overhead, Phys. Rev. A 86, 052329 (2012).

[12] A. G. Fowler, S. J. Devitt, and C. Jones, Surface

code implementation of block code state distillation,

Scientiﬁc Rep. 3, 1939 (2013).

[13] A. M. Meier, B. Eastin, and E. Knill, Magic-

state distillation with the four-qubit code, Quant.

Inf. Comp. 13, 195 (2013).

[14] C. Jones, Multilevel distillation of magic states

for quantum computing, Phys. Rev. A 87, 042305

(2013).

[15] G. Duclos-Cianci and K. M. Svore, Distillation of

nonstabilizer states for universal quantum compu-

tation, Phys. Rev. A 88, 042325 (2013).

[16] G. Duclos-Cianci and D. Poulin, Reducing the

quantum-computing overhead with complex gate

distillation, Phys. Rev. A 91, 042315 (2015).

[17] E. T. Campbell and M. Howard, Uniﬁed framework

for magic state distillation and multiqubit gate syn-

thesis with reduced resource cost, Phys. Rev. A 95,

022316 (2017).

[18] J. O’Gorman and E. T. Campbell, Quantum com-

putation with realistic magic-state factories, Phys.

Rev. A 95, 032338 (2017).

[19] J. Haah and M. B. Hastings, Codes and Protocols

for Distilling T , controlled-S, and Toﬀoli Gates,

Quantum 2, 71 (2018).

[20] E. T. Campbell and M. Howard, Magic state

parity-checker with pre-distilled components,

Quantum 2, 56 (2018).

[21] C. Gidney and A. G. Fowler, Eﬃcient magic state

factories with a catalyzed |CCZi to 2|T i transfor-

mation, Quantum 3, 135 (2019).

[22] C. Jones, P. Brooks, and J. Harrington, Gauge

color codes in two dimensions, Phys. Rev. A 93,

052332 (2016).

[23] S. Bravyi and A. Cross, Doubled color codes,

arXiv:1509.03239 (2015).

[24] T. Jochym-O’Connor and S. D. Bartlett, Stacked

codes: Universal fault-tolerant quantum computa-

tion in a two-dimensional layout, Phys. Rev. A 93,

022323 (2016).

[25] H. Bombin, 2D quantum computation with 3D

topological codes, arXiv:1810.09571 (2018).

[26] C. Chamberland and A. W. Cross, Fault-tolerant

magic state preparation with ﬂag qubits, Quantum

3, 143 (2019).

[27] B. J. Brown, A fault-tolerant non-Cliﬀord

gate for the surface code in two dimensions,

arXiv:1903.11634 (2019).

[28] D. Litinski, A Game of Surface Codes: Large-Scale

Quantum Computing with Lattice Surgery, Quan-

tum 3, 128 (2019).

[29] M. Amy and M. Mosca, T-count optimization and

Reed-Muller codes, IEEE Transactions on Informa-

tion Theory , 1 (2019).

[30] J. Haah, M. B. Hastings, D. Poulin, and

D. Wecker, Magic state distillation with low space

overhead and optimal asymptotic input count,

Quantum 1, 31 (2017).

[31] A. J. Landahl and C. Ryan-Anderson, Quan-

tum computing by color-code lattice surgery,

arXiv:1407.5103 (2014).

[32] Y. Li, A magic states ﬁdelity can be superior to the

operations that created it, New J. Phys. 17, 023037

(2015).

[33] J. Lodyga, P. Mazurek, A. Grudka, and

M. Horodecki, Simple scheme for encoding and de-

coding a qubit in unknown state for various topo-

logical codes, Scientiﬁc Rep. 5, 8975 (2015).

[34] L. Lao et al., Preparing high-ﬁdelity magic states

with low costs, in preparation .

[35] Cramming more power into a quantum device,

https://www.ibm.com/blogs/research/2019/03/

power-quantum-device/, accessed: 2019-05-09.

[36] K. Wright, K. Beck, S. Debnath, J. Amini,

Y. Nam, N. Grzesiak, J.-S. Chen, N. Pisenti,

M. Chmielewski, C. Collins, et al., Benchmarking

an 11-qubit quantum computer, arXiv:1903.08181

(2019).

[37] The Python script and Mathematica

notebook can be found on GitHub, see

https://github.com/litinski/magicstates .

[38] E. T. Campbell and M. Howard, Unifying gate syn-

thesis and magic state distillation, Phys. Rev. Lett.

118, 060501 (2017).

[39] P. Selinger, Quantum circuits of T -depth one,

Phys. Rev. A 87, 042302 (2013).

[40] C. Jones, Low-overhead constructions for the fault-

tolerant Toﬀoli gate, Phys. Rev. A 87, 022328

(2013).

[41] C. Gidney, Halving the cost of quantum addition,

Quantum 2, 74 (2018).

[42] B. J. Brown and S. Roberts, Universal fault-

tolerant measurement-based quantum computation,

arXiv:1811.11780 (2018).

Accepted in Quantum 2019-10-30, click title to verify 19

Figure 20: Protocol of a faulty T measurement in the spirit of Ref. [33]. It can be thought of as a d × d patch being shrunk to a

(d − 2) × (d − 2) patch, then a (d − 4) × (d − 4) patch, and so on, until a physical T gate is performed on the remaining 1 × 1

patch (the blue qubit), before it is measured in the X basis. Since all these shrinking operations can be performed simultaneously,

the T -measurement protocol corresponds to just the last step of the ﬁgure, where all red qubits are measured in the Z basis, all

green qubits in the X basis, and a faulty T measurement is performed on the blue qubit.

A Faulty T measurements

This appendix discusses faulty T measurements in

more detail. Speciﬁcally, we discuss approaches to de-

crease their error rate to justify the use of a faulty-T -

measurement error rate of p

phys

in the main text and

the potential implications of a higher error rate.

The T -measurement protocol discussed in Sec. 2.2 is

simple, but the operation of shrinking a d×d patch to a

d ×1 patch is associated with an error that scales with

the code distance d. While this might not be problem-

atic for small code distances, it implies that this speciﬁc

T -measurement protocol will not work for larger code

distances. A more advanced version of this protocol is

shown in Fig. 20. This protocol essentially corresponds

to the state-injection protocol of Ref. [33] in reverse. A

d×d patch is shrunk to a (d−2)×(d−2) patch by mea-

suring physical qubits along the boundary in the X or

Z basis. Next, the patch is shrunk to a (d −4) ×(d −4)

patch, a (d−6)×(d−6) patch, etc., until we end up with

a 1 ×1 patch, i.e., a single physical qubit corresponding

to the logical qubit. This qubit is then measured in the

T basis. Since the qubit is in a distance-1 state only at

the very end of the protocol, the error rate of this T-

measurement protocol is not proportional to the code

distance d.

The shrinking operations can all be performed simul-

taneously, such that the faulty T measurement corre-

sponds to measuring all red qubits in the last step of

Fig. 20 in the Z basis, and all green qubits in the X

basis. A physical T gate is applied to the remaining

blue qubit, after which it is measured in the X basis.

This protocol is almost identical to the T-measurement

protocol of Sec. 2.2, with the main diﬀerence being the

position of the blue qubit. In the protocol of Sec. 2.2,

the blue qubit is in the corner of the patch, whereas in

Fig. 20, the blue qubit is in the center of the patch.

In order to examine the error sources of this pro-

tocol, we consider the space-time diagram of the T -

measurement protocol of Fig. 20, which is shown in

Fig. 21. It is identical to the space-time diagram of the

state-injection protocol of Ref. [33] shown in Ref. [42].

Such space-time diagrams are useful to visualize the

eﬀect of physical Pauli errors and stabilizer measure-

ment errors on the encoded logical information. Strings

of errors connecting the red boundaries in the space-

time diagram lead to logical Z errors, whereas error

strings connecting the green boundaries lead to logical

X errors. In our usual estimates of idling distance-d

logical qubits, we take into account the weight-d error

strings that precede the faulty T measurement, i.e., er-

ror strings similar to the ones labelled (1) and (2) in

Fig. 21, which have a probability of p

phys

, d). Dur-

ing state injection or faulty T measurements, there are

some additional low-weight errors corresponding to er-

ror strings that connect two time-like boundaries during

the T measurement, i.e., single-qubit errors on the blue

qubit (3) and strings similar to the one labelled (4) in

Fig. 21.

Pre-selection. These low-weight errors are the

dominant contribution to the error rate of such state-

injection of faulty-T -measurement protocols. Speciﬁ-

cally, for circuit-level Pauli noise, single-qubit errors af-

fecting the qubits close to the blue qubit and failures

(1)

(2)

(3)

(4)

Figure 21: Space-time diagram of the T -measurement protocol

of Fig. 20 with time increasing to the right, and four examples

of error strings that cause logical errors.

Accepted in Quantum 2019-10-30, click title to verify 20

Step 1 Step 2 Step 3

Figure 22: The ﬁnite rejection rate of pre-selection can be prevented from signiﬁcantly increasing the time cost of distillation

protocols by using an additional d

×d

patch during the execution of P

π/8

rotations. While one d

×d

patch is idling until the

pre-selection condition is satisﬁed – i.e., until the stabilizers in a region surrounding the sensitive physical qubit have not reported

any syndromes for two code cycles – a second d

× d

patch can be used to execute the next P

π/8

rotation.

of two-qubit gates involved in the syndrome readout of

check operators close to the blue qubit contribute to

these low-weight errors. In the case of state-injection

protocols, these errors can be suppressed through the

addition of post-selection [32], particularly, if the er-

ror rate of two-qubit gates is signiﬁcantly higher than

the error rate of single-qubit operations. An injected

magic state is only accepted, if the stabilizer checks in

a certain region around the sensitive, blue qubit report

no syndrome for two code cycles. Since at least two

circuit-level two-qubit errors are required to generate

an unreported error, the dominant contribution to the

error rate of the state-injection protocol will be gov-

erned by single-qubit errors on the sensitive physical

qubits. However, this post-selection adds a certain fail-

ure probability to the state-injection protocol, as some

instances of state injection will be rejected. The proto-

col in Ref. [32] reduces the error rate of state injection

to ∼2p

phys

/3 for two-qubit error rates of p

= p

phys

and one-qubit error rates of p

= p

phys

/10 with a rejec-

tion rate of around 50% for p

phys

= 10

−3

. In Ref. [32], a

distance-7 region around the sensitive qubit was used for

post-selection, leading to a high rejection rate. Prelim-

inary numerical results by Lao et al. [34] indicate that

by post-selecting a distance-3 region around the sensi-

tive qubit, the error rate of state injection still stays

below p

phys

, while the rejection rate drops to ∼4% for

phys

= 10

−3

Since faulty T measurements are identical to state

injection, apart from the time direction being reversed,

the same approach can be used to reduce the error rate

of faulty-T -measurement protocols. The analogous op-

eration is pre-selection: The single-qubit measurements

in the protocol of Fig. 21 are delayed, until the check

operators in a region around the blue qubit have re-

ported no syndrome for two consecutive code cycles.

This justiﬁes the use of a faulty-T -measurement error

rate of p

phys

in the main text, but also adds a time cost

to the fault-T-measurement protocol through the rejec-

tion rate. While one might be worried that, particularly

for high p

phys

, the ﬁnite rejection rate in this faulty-

T -measurement protocol increases the time it takes to

execute a faulty P

π/8

rotation, this does not need to be

the case, if a few extra qubits are added to the protocol,

as shown in Fig. 22. Here, in step 1, a P

π/8

rotation is

performed using the protocol of Fig. 9. The d

× d

patch corresponding to the qubit that needs to be read

out using a faulty T measurement is then left idling, un-

til the pre-selection condition is satisﬁed and the qubit

can be measured. While this patch is idling, a second

× d

patch is used to perform the next P

π/8

rota-

tion in step 2. With a high probability, the d

× d

patch left idling in step 1 will have been read out by the

beginning of step 3, such that the freed-up space can be

used to initialize a d

×d

patch for the next rotation.

In rare cases, the d

× d

patch left idling in step 1

will have failed the pre-selection after, in this example,

code cycles, in which case it needs to keep idling,

increasing the overall time cost of the distillation pro-

tocol. If this happens suﬃciently rarely, the additional

time cost is negligible.

The time required to perform a faulty T measure-

ment with pre-selection depends on the rejection rate.

In the above example, if, e.g., the probability of fail-

ing the pre-selection is 50% every two code cycles

and d

= 5, then the probability of the d

× d

patch having failed the pre-selection after 2d

code

cycles is ∼1%, implying that the overall time cost of

level-1 distillation protocols increases by ∼1% compared

to the numbers presented in Tab. 1. This increase is

even lower under the assumption of a rejection rate of

∼4%. However, the numbers in Tab. 1 do not take into

account the increased space cost due to the additional

× d

patches that are used to avoid the time-cost

increase. If, as in Fig. 22, one uses two d

×d

patches

for each measurement region, then the space cost for,

e.g., a (15-to-1)

11,5,5

protocol increases by ∼9% com-

pared to the numbers in Tab. 1. Note that, in two-

level protocols, the increased space cost only aﬀects the

ﬁrst level of distillation, but not the second, such that,

Accepted in Quantum 2019-10-30, click title to verify 21

Protocol p

phys

out

Qubits Cycles

Space-time cost per output state

Qubitcycles Full distance

(15-to-1)

9,3,3

−4

2.1 × 10

−8

1,150 18.2 20,900 4.75d

/d = 13 3.10d

/d = 15

(15-to-1)

7,3,3

× (20-to-4)

13,5,7

−4

1.4 × 10

−12

13,200 70.0 231,000 23.5d

/d = 17 16.9d

/d = 19

(15-to-1)

9,3,3

× (20-to-4)

15,7,9

−4

6.6 × 10

−15

16,400 91.2 374,000 27.3d

/d = 19 20.2d

/d = 21

(15-to-1)

9,3,3

× (15-to-1)

25,9,9

−4

4.2 × 10

−22

18,600 68.4 1,270,000 32.4d

/d = 27 26.1d

/d = 29

(15-to-1)

13,5,5

× (20-to-4)

21,11,13

−3

5.7 × 10

−9

40,700 130 1,325,000 33.7d

/d = 27 22.2d

/d = 31

(15-to-1)

11,5,5

× (15-to-1)

21,9,11

−3

2.1 × 10

−10

27,400 85.7 2,350,000 48.1d

/d = 29 32.7d

/d = 33

(15-to-1)

11,5,5

× (15-to-1)

23,11,11

−3

2.5 × 10

−11

29,500 85.7 2,530,000 42.5d

/d = 31 29.5d

/d = 35

(15-to-1)

11,5,5

× (15-to-1)

25,11,11

−3

6.4 × 10

−12

30,700 85.7 2,630,000 36.7d

/d = 33 26.0d

/d = 37

(15-to-1)

13,7,7

× (15-to-1)

29,13,13

−3

1.5 × 10

−13

52,400 97.5 5,110,000 59.6d

/d = 35 43.1d

/d = 39

Table 2: Selection of distillation protocols under the assumption of a faulty-T -measurement error rate of 10p

phys

instead of 1p

phys

as in Tab. 1.

e.g., for a (15-to-1)

11,5,5

×(15-to-1)

25,11,11

protocol, the

space cost increases by less than 2%. Furthermore, the

error estimate in the main text does not take into ac-

count that, in cases where the d

× d

patches is left

idling for many code cycles, the error rate of the cor-

responding P

π/8

rotations is increased. However, with

pre-selection, the error rate of faulty T measurements

can be lower than the p

phys

assumed in the main text, if

single-qubit operations are signiﬁcantly less noisy than

two-qubit operations.

Replacing faulty T measurements with state

injection. After shrinking the d×d patch to a physical

qubit, the preceding stabilizer measurement outcomes

need to be decoded before a T (or T

†

) gate and an X

measurement can be applied to the physical qubit. If

the time required for this decoding operation is long

compared to the decoherence time of idling physical

qubits, the error rate of faulty T measurements will be

signiﬁcantly increased. In this case, one should use state

injection instead of faulty T measurements, executing

π/8

rotations via auto-corrected π/8 rotations [28], the

same way P

π/8

rotations are executed in level-2 distil-

lation protocols (see Fig. 13). This has a higher space

cost compared to faulty T measurements.

What if faulty T measurements have a higher

error rate? If one uses a simple faulty-T -measurement

or state-injection protocol without post-selection of pre-

selection, one might be worried that an increased T -

measurement error rate could adversely aﬀect the per-

formance of the distillation protocols discussed in the

main text. Speciﬁcally, we can consider the eﬀect on

the output error rate p

out

, if the T -measurement Pauli

error rate is 10p

phys

instead of p

phys

. We ﬁnd that, for

some of the distillation protocol shown in Tab. 1, an

increased T -measurement error has only a small eﬀect

on the performance.

A collection of protocols is shown in Tab. 2. The

main eﬀect of an increased faulty-T-measurement er-

ror rate is an increase of the lowest achievable out-

put error rate p

out

for a given family of distillation

protocols. For instance, the lowest achievable p

out

for

a one-level 15-to-1 protocol increases from 10.37p

10370p

. For example, for p

phys

= 10

−4

, a one-level

15-to-1 protocol can no longer produce magic states

with p

out

≈ 10

−11

and, instead, a (15-to-1) × (20-to-4)

protocol needs to be used, signiﬁcantly increasing the

space-time cost to produce these states. On the other

hand, the (15-to-1)

9,3,3

× (20-to-4)

15,7,9

protocol pro-

duces states with a p

out

that is far away from the low-

est achievable p

out

of a (15-to-1) × (20-to-4) protocol

with p

phys

= 10

−4

. Compared to Tab. 1, the out-

put error rate increases from p

out

= 2.4 × 10

−15

out

= 6.6×10

−15

. Due to an increased failure probabil-

ity, there is also a very small increase in space-time cost

by 0.8%. Finally, (15-to-1) ×(15-to-1) protocols can no

longer produce magic states with an output error below

out

≈ 10

−22

For p

phys

= 10

−3

, we ﬁnd a similar trend. A one-

level protocol can no longer be used to produce magic

states with p

out

≈ 10

−8

and a more expensive two-level

protocol needs to be used. For p

out

= 2.5 × 10

−11

, a

(15-to-1) × (20-to-4) protocol needs to be replaced by

a (15-to-1) × (15-to-1) protocol, increasing the space-

time cost by 37.5%. Again, we ﬁnd protocols which are

only mildly aﬀected by the increased error rate, such

as the (15-to-1)

11,5,5

× (15-to-1)

25,11,11

protocol, which

previously produced magic states with an output error

of p

out

= 2.7 × 10

−12

, but now produces magic states

with an output error of p

out

= 6.4×10

−12

with a space-

time cost increase of 3.5% due to the increased failure

probability. The lowest achievable output error rate of

(15-to-1)×(15-to-1) protocols increases to p

out

≈ 10

−14

making the production of magic states with error rates

close to this limit very costly. For such states and states

with a lower error rate, a three-level protocol might need

to be used.

Accepted in Quantum 2019-10-30, click title to verify 22

Comments

Products

Project