Fermat's Library | q-2020-08-06-304 annotated/explained version.

On maximum-likelihood decoding with circuit-level errors

Leonid P. Pryadko

Department of Physics & Astronomy, University of California, Riverside, California 92521, USA

(Dated: July 15, 2020)

Error probability distribution associated with a given Cliﬀord measurement circuit is described

exactly in terms of the circuit error-equivalence group, or the circuit subsystem code previously

introduced by Bacon, Flammia, Harrow, and Shi. This gives a prescription for maximum-likelihood

decoding with a given measurement circuit. Marginal distributions for subsets of circuit errors are

also analyzed; these generate a family of related asymmetric LDPC codes of varying degeneracy.

More generally, such a family is associated with any quantum code. Implications for decoding

highly-degenerate quantum codes are discussed.

I. INTRODUCTION

Quantum computation oﬀers exponential algorithmic

speed-up for some classically hard problems. This

promise is conditional in a fundamental way upon the

use of quantum error correction (QEC). However, despite

an enormous progress achieved both in theory and ex-

periment during the quarter century since the invention

of QEC[1], universal repetitive QEC protecting against

both phase and amplitude errors has not yet been demon-

strated in any of the qubit systems constructed to date.

This illustrates the enormous diﬃculty of building quan-

tum computers (QCs) with suﬃciently small error rates.

Given also the engineering diﬃculties with scaling up

the number of qubits[2], it is important that on the algo-

rithmic side one tries to achieve every optimization possi-

ble. Among other things, we would like to maximize the

probability of successful syndrome-based decoding with

QEC. Given the available hardware, this requires choos-

ing the optimal code, the optimal measurement circuit,

and the optimal decoder. In particular, a decoder should

be designed for the speciﬁc syndrome measurement cir-

cuit, as it must be aware of the associated correlations

between the errors[3–5]—at suﬃciently high error proba-

bilities such correlations are present even in fault-tolerant

(FT) circuits designed to prevent a single fault from af-

fecting multiple data qubits.

In contrast, the standard approach is to use a de-

coder optimized for the underlying code, regardless of

the actual circuit used for syndrome measurement. In

some cases, e.g., in the case of surface codes with

minimum-weight perfect matching (MWPM) decoder,

some leading-order correlations can be included in the

edge weights of the graph used for decoding[4, 6–8]. How-

ever, such a scheme is limited to codes and measure-

ment circuits where MWPM can be done eﬃciently, e.g.,

surface codes with single-ancilla measurement circuits[9],

and certain other classes or topological codes[10, 11].

Further, there is necessarily a decoding accuracy loss

when measurement circuit is not simply repeated, e.g.,

with code deformation or lattice surgery[12].

A feasible approach to building a decoder optimized for

the speciﬁc measurement circuit is to train a neural net-

work (NN) using extensive simulation data[13–20]. How-

ever, as is commonly the case with NNs, there is always

a question whether training has been suﬃcient to achieve

optimal decoding performance.

A parameterization of the probability distribution of

correlated quantum errors in terms of a spin model has

been recently considered by Chubb and Flammia[5]. In

particular, they describe how such a formalism can be

used for maximum-likelihood (ML) decoding in the pres-

ence of circuit-level noise. However, Chubb and Flam-

mia focused on larger circuits composed of measurement

blocks. Errors are assumed uncorrelated between the

blocks, while a model of error correlations at the output

of each block has to be constructed oﬄine, e.g., using

error propagation for a Cliﬀord measurement circuit. In

particular, Chubb and Flammia stopped short of analyz-

ing circuit-level errors, and only considered numerically

a toy model of correlated errors.

The goal of this work is to give an explicit numeri-

cally eﬃcient algorithm for analyzing error correlations

resulting from a given qubit-based Cliﬀord measurement

circuit, and for constructing decoders optimized for such

a circuit. Main result is that such correlations can be

accounted for by using phenomenological error model

(no error correlations) with the circuit-associated subsys-

tem code constructed by Bacon, Flammia, Harrow, and

Shi[21, 22]. Thus, any generic decoder capable of deal-

ing with uncorrelated data and syndrome measurement

errors in highly-degenerate sparse-generator subsystem

codes can be rendered to account for circuit-level error

correlations. Error correlations needed for implementing

the scheme of Chubb and Flammia are recovered by cal-

culating the marginals of the constructed distribution,

which can be done in practice using Ising model star-

polygon transformations[23]. The construction naturally

includes additional correlations between errors in diﬀer-

ent fault locations on the circuit.

An immediate application is for designing decoders

approaching true ML decoding for Cliﬀord circuits, to

be used in quantum memory. In particular, more ac-

curate decoding and optimization, with the ability to

account for error rates’ variations on individual qubits,

could help improve QEC to the level suﬃcient to pass

the break-even point and show coherent lifetimes longer

than that of an unprotected qubit in present-day or near-

future devices. Related techniques could also be more

arXiv:1909.06732v4 [quant-ph] 31 Jul 2020

widely applicable for design and analysis of FT protocols

and control schemes. Examples include error correction

with FT gadgets like ﬂag error correction[24–26], schemes

for protected evolution, e.g., using code deformations

where conventional approaches may result in a reduced

distance[12], and optimized single-shot error-correction

protocols[27–29].

In addition, analysis of marginal error distributions

and associated families of asymmetric codes of varying

degeneracy could become an important tool in the theory

of QECCs. In particular, it could help constructing bet-

ter decoders for highly-degenerate quantum LDPC codes.

Also, thorough analytical understanding of error correla-

tions could be useful for fundamental analysis of thresh-

olds, e.g., in order to extend and/or improve the bounds

in Refs. 30 and 31.

Paper outline: After this Introduction, an overview

of relevant notations and results in quantum codes and

multi-variable Bernoulli distributions is given in Sec. II.

Pauli error channels with correlated errors, including

circuit-level correlations, are discussed in Sec. III. Cor-

responding marginal distributions are constructed in

Sec. IV, followed by a discussion of possible implications

for exact and approximate ML decoding with circuit-level

errors in Sec. V. Sec. VI gives the concluding remarks.

II. BACKGROUND

A. Quantum codes

Generally, an n-qubit quantum code Q is a subspace

of the n-qubit Hilbert space H

⊗n

. A quantum [[n, k, d]]

stabilizer code is a 2

-dimensional subspace speciﬁed as a

common +1 eigenspace of all operators in an Abelian sta-

bilizer group S ⊂ P

, −11 6∈ S, where the n-qubit Pauli

group P

is generated by tensor products of single-qubit

Pauli operators. The stabilizer is typically speciﬁed in

terms of its generators, S = hS

, . . . , S

n−k

i. The weight

of a Pauli operator is the number of qubits that it af-

fects. The distance d of a quantum code is the minimum

weight of a Pauli operator E ∈ P

which commutes with

all operators from the stabilizer S, but is not a part of

the stabilizer, E 6∈ S. Such operators act non-trivially in

the code and are called logical operators.

Subsystem codes[32, 33] are a generalization of sta-

bilizer codes where only some of the logical qubits are

used. More precisely, given a stabilizer group S, the sta-

bilized subspace Q

is further decomposed into a tensor

product of two subspaces, Q

= Q

⊗ Q

, where Q

is a 2

-dimensional “logical” subspace used to store the

quantum information, while we do not care about the

state of the “gauge” qubits in the subspace Q

after the

recovery. Logical operators of the original code which

act non-trivially only in Q

, together with the operators

from the stabilizer group S, generate the non-Abelian

gauge group G which fully characterizes the subsystem

code. In particular, the center Z(G) is formed by the

elements of the original stabilizer S, up to a phase i

m ∈ {0, 1, 2, 3}.

A Pauli error E that anticommutes with an element

of the stabilizer, ES = −SE, S ∈ S, is called de-

tectable. Such an error results in a non-zero syndrome

s = {s

, . . . , s

} whose bits s

∈ {0, 1} are obtained by

measuring the eigenvalues (−1)

of the chosen set of sta-

bilizer generators S

, i = {1, . . . , r}, r ≥ n − k. Unlike

in the case of classical codes, there may be many equiv-

alent (mutually degenerate) errors resulting in the same

syndrome. For a subsystem code, errors E

degenerate

with E, denoted E

' E, have the form E

= EG, where

G ∈ G is an element of the gauge group; such errors

can not and need not be distinguished. Non-trivial logi-

cal operators of the subsystem code are logical operators

of the original stabilizer code which act non-trivially in

. A bare logical operator U acts trivially in Q

and

commutes with any element of G. Such restrictions do

not apply to dressed logical operators which are only re-

quired to commute with elements of the stabilizer S. In

any case, multiplication by a logical operator U gives an

error with the same syndrome but from a diﬀerent equiv-

alence class, EU 6' E.

Analysis of error correction is conveniently done using

quaternary, or an equivalent binary, representation of the

Pauli group[34, 35]. A Pauli operator U ≡ i

where u, v ∈ {0, 1}

⊗n

and X

= X

. . . X

, Z

. . . Z

, is mapped, up to a phase, to a length-2n

binary vector e = (u|v). Two Pauli operators U

, U

commute iﬀ the symplectic inner product

? e

≡ e

Σ e

, Σ ≡





, (1)

of the corresponding binary vectors is zero, e

? e

= 0.

Here I

is the n × n identity matrix. Thus, if H =

) is an m × 2n binary matrix whose rows rep-

resent stabilizer generators, and

H ≡ H Σ = (H

then the syndrome s of an error U

with binary represen-

tation e = (u|v) is given by s

= He

. If we similarly

denote G = (G

) an r × 2n matrix formed by the

gauge group generators (for a stabilizer code, G =

H),

the errors with binary representation e and e

= e + αG

are equivalent, e

' e, for any length-r binary string

α ∈ F

⊗m

. Generally, GH

= 0, and

rank H = n − k − κ, rank G = n −k + κ, (2)

where κ is the number of gauge qubits. Matrices G and

H can be viewed as generator matrices of an auxiliary

length-2n CSS code which encodes 2k qubits. These cor-

respond to a basis set of 2k independent logical oper-

ators of the original subsystem code. It will be conve-

nient to introduce a logical generating matrix L such that

= 0, rank L = 2k. Rows of L map to basis logical

operators; a non-zero linear combination of the rows of

L must be linearly independent from the rows of G.

Gauge or stabilizer generators can be measured with

a Cliﬀord circuit which consists of ancillary qubit initial-

ization and measurement in the preferred (e.g., Z) ba-

sis, and a set of unitary Cliﬀord gates, e.g., single-qubit

Hadamard H and phase P gates and two-qubit CNOT.

Generally, a Cliﬀord unitary U maps Pauli operators to

Pauli operators, E

= U

†

EU , E, E

∈ P

. Ignoring the

overall phase (for complete description, see Refs. 36 and

37), this corresponds to a linear map of the correspond-

ing binary vectors, (e

)

= Ce

, where C ≡ C

is a

symplectic matrix with the property C

ΣC = Σ.

B. Bernoulli distribution

Multi-variate Bernoulli distribution describes a joint

probability distribution of m single-bit variables x

∈ F

e.g., components of a binary vector x ∈ F

. Most gener-

ally, such a distribution can be speciﬁed in terms of 2

probabilities p

≥ 0, with normalization

x∈F

= 1.

A convenient representation of such a distribution as an

exponent of a polynomial of m binary variables with real

coeﬃcients was given by Dai et al. in Ref. 38. Namely, for

a single variable x ∈ {0, 1}, we can write

(x) = p

1−x

as a product of two terms, where p

+ p

= 1 are the out-

come probabilities. For m variables we have, similarly,

the product of 2

terms,

(x) = p

(1−x

)(1−x

)...(1−x

)

00...0

×p

(1−x

)(1−x

)...(1−x

m−1

00...01

. . . p

...x

11...1

, (3)

where the probabilities are assumed positive, p

> 0. For

any given x ∈ F

only one exponent is non-zero so that

the result is p

. Taking the logarithm and expanding,

one obtains the corresponding “energy” E ≡ −ln

(x)

as a polynomial of m binary variables. The correspond-

ing coeﬃcients can be viewed as binary cumulants[38];

presence of high-degree terms indicates a complex prob-

ability distribution with highly non-trivial correlations.

For applications it is more convenient to work with

spin variables s

= (−1)

≡ 1 − 2x

∈ ±1, and rewrite

the energy function using the general Ising representation

ﬁrst introduced by Wegner[39],

E ≡ E ({s

}) = −

+ const, R

, (4)

parameterized by the binary spin-bond incidence matrix

θ with m rows, and the bond coeﬃcients K

. While most

general m-variate Bernoulli distribution requires 2

− 1

bonds, in the absence of high-order correlations signiﬁ-

cantly fewer terms may be needed. Of particular inter-

est are distributions with sparse matrices θ, e.g., with

bounded row and column weights, which also limits the

number of columns. In the simplest case of independent

identically-distributed (i.i.d.) bits with equal set proba-

bilities p

= p, p

= 1 −p, we can take θ = I

, the iden-

tity matrix, and all coeﬃcients equal, K = ln[(1−p)/p]/2.

The logarithm ln[(1 − p)/p] is commonly called a log-

likelihood ratio (LLR); the coeﬃcient K here and K

Eq. (4) are thus called half-LLR coeﬃcients.

III. ERROR CORRELATIONS IN A CLIFFORD

MEASUREMENT CIRCUIT

A. Pauli error channel with correlations.

Consider the most general Pauli error channel

ρ →

e∈F

(e)E

ρE

†

, (5)

where

(e) is the probability of an error E

with binary

representation e, with the irrelevant phase disregarded.

Technically,

(e) describes a 2n-variate Bernoulli distri-

bution. The probability

(e) can be parameterized in

terms of a 2n×m binary coupling matrix θ with m < 2

columns, and a set of coeﬃcients K

, b ∈ {1, . . . , m},

(e; θ, {K

}) = Z

−1

exp

(−1)

[e θ]

, (6)

where [e θ]

in the exponent is the corresponding compo-

nent of the row-vector e θ. The normalization constant

Z ≡ Z(θ, {K

}) in Eq. (6) is the partition function of the

Ising model in Wegner’s form, cf. Eq. (4),

Z(θ, {K

}) =

∈±1}

. (7)

In the simplest case of independent X and Z errors, θ =

, the identity matrix, while e

= (1 − p

)/p

for

b ≤ n, and the corresponding expression with p

→ p

for n < b ≤ 2n. In the case of the depolarizing channel

with error probability p, we have, instead,

θ =



0 I



, e

= 3(1 − p)/p, (8)

where the additional column block is to account for corre-

lations between X and Z errors. Additional correlations

between the errors can be introduced by adding columns

to matrix θ and the corresponding coeﬃcients K

Given a probability distribution in the form (6), it is

easy to construct an expression for probability of an error

equivalent to e in a subsystem code with gauge generator

matrix G, extending the approach of Refs. 6, 40, and

41, and reproducing some of the results from Ref. 5. A

substitution e → e + αG and a summation over α gives

P(e

∈ F

' e) = 2

rank G−r



Gθ, {K

(−1)

[e θ]

}



Z (θ, {K

})

(9)

Here G is an r × 2n binary matrix, cf. Eq. (2), and the

prefactor accounts for a possible redundancy in the sum-

mation. Notice that the partition function in the nu-

merator has the same number of bonds m as that in the

denominator, but with the signs of the coeﬃcients K

corresponding to non-zero elements of the binary vector

 ≡ e θ ﬂipped. When both the gauge generator matrix

G and the error correlation matrix θ are sparse, the inci-

dence matrix Θ ≡ Gθ in the numerator of Eq. (9) must

also be sparse.

B. Cliﬀord measurement circuit and associated

input/output codes.

Let us now consider error correlations resulting from

a Cliﬀord measurement circuit. Speciﬁcally, following

Refs. 21 and 22, consider an n-qubit circuit, n = n

with n

data qubits and n

ancillary qubits. First, an-

cillary qubits are initialized to |0i, second, a collection

of Cliﬀord gates forms a unitary U, and ﬁnally the an-

cillary qubits are measured in the Z basis, see Fig. 1.

In the absence of errors and in the event of all measure-

ments returning +1 (zero syndrome), the corresponding

post-selected evolution is described by the matrix

V = (I

⊗n

⊗ h0|

⊗n

) U (I

⊗n

⊗ |0i

⊗n

), (10)

where I is a single-qubit identity operator. The circuit is

assumed to be a good error-detecting circuit (good EDC),

namely, V

†

V be proportional to the projector onto a sub-

space Q

⊆ H

†

V = cΠ

, c > 0 (11)

see Def. 4 in Ref. 22. Here Q

, called the input code,

is an [[n

, k, d

]] stabilizer code encoding k qubits with

distance d

|ψi

|ψ

′

|0i

FIG. 1. Generic circuit with n

data and n

ancillary qubits

initialized to |0i and measured in Z basis. In practice, an-

cillary qubit can be measured during evolution, and subse-

quently reused after initialization. However, there is no mech-

anism for adapting the gates to the measurement results.

A good EDC also deﬁnes an output code Q

⊆ H

⊗n

which encodes the same number of qubits k. Indeed,

since V

†

V = cΠ

, matrix V has only one non-zero singu-

lar value,

√

c; this immediately gives V V

†

= cΠ

, with

the projector onto a space Q

⊆ H

⊗n

of the same di-

mension 2

, the output code. Moreover, for any input

state |ψi ∈ Q

, the output |ψi

≡ V |ψi ∈ Q

is in the

output code, and the corresponding transformation is a

(scaled) Cliﬀord unitary.

Even though the map between Q

and Q

is unitary,

the distance d

of the output code does not necessarily

equals d

. In particular, adding a unitary decoding cir-

cuit on output data qubits may be used to render d

= 1.

C. Errors in a Cliﬀord circuit.

Using standard circuit identities, any circuit error E

can be propagated forward to the output of the circuit,

thus giving an equivalent data error E

(E) ∈ P

and

the (gauge) syndrome σ

(E) ∈ F

corresponding to the

measurement results. Clearly, there is a big redundancy

even if phases are ignored, as many circuit errors can

result in the same or equivalent E

(E) and σ

(E). The

goal is to ﬁnd the conditional probability distribution for

the equivalence class of the output error E

(E) given the

measured value of σ

(E).

In a given Cliﬀord circuit, consider N possible error lo-

cations, portions of horizontal wires starting and ending

on a gate or an input/output end of the wire. For exam-

ple, the circuit in Fig. 2 has N = 15 error locations. A

Pauli error may occur on any of these locations. A circuit

error E is a set of N single-qubit Pauli operators with-

out the phase, P

∈ {I, X, Y, Z}, i ∈ {1, . . . , N }. When

two circuit errors are applied sequentially, the result is

a circuit error whose elements are pointwise products of

Pauli operators with the phases dropped. The algebra

deﬁned by such a product is isomorphic to the N -qubit

Pauli group without the phase. This is an Abelian group

which also admits a representation in terms of length-2N

binary vectors e ≡ (u|v) ∈ F

. Multiplication of two

circuit errors amounts to addition of the corresponding

binary vectors e.

•

|0i

4 9 14

|0i

5 10 15

FIG. 2. Circuit measuring generators Z

and Z

of a

three-qubit toy code. Digits indicate distinct circuit locations.

Identity operators I are inserted so that the number of circuit

locations along each qubit wire be odd.

With these deﬁnitions, error propagation through a

Cliﬀord circuit can be described as products of the orig-

inal circuit error E with trivial circuit errors which have

no eﬀect on the outcome. The collection of all such errors

forms error equivalence group (EEG) of the circuit. The

generators of this group involved in error propagation are

trivial errors, each formed as a union of a single-qubit

Pauli P

, i ∈ {1, . . . , N} (but not in the output for data

qubits, or right before the measurements for ancillary

qubits) with the result of error propagation of P

across

the subsequent gate. For example, for the circuit in

Fig. 2, some of the EEG generators, with identity opera-

tors dropped, are {Z

, Z

}, {X

, X

}, {X

, X

}, and

, Z

} (propagation across the leftmost CNOT),

and {X

, X

}, {Z

, Z

} (propagation across the leftmost

identity gate labeled I).

In addition, for a circuit which includes qubits’ ini-

tialization and measurement, the list of EEG generators

includes errors Z

on ancillary qubits right after initial-

ization to |0i and right before the Z-basis measurement.

For the circuit in Fig. 2, these are {Z

}, {Z

} (ancillary

qubits right after initialization), and {Z

}, {Z

} (an-

cillary qubits just before measurement). It is important

that stabilizer group of the input code, after its elements

are promoted as circuit errors, forms a subgroup of thus

constructed full circuit EEG[22]. Same applies to the

stabilizer group of the output code.

In the binary representation, a single-qubit Hadamard

gate connecting circuit positions i and i

corre-

sponds to a pair of generators with non-zero elements

, v

) = (10|01) and (01|10) (X

propagates into

and v.v.), a phase gate similarly corresponds to

(10|10) and (01|11) (Z

propagates into Z

and X

into

), and a CNOT gate (10 00|10 00) (input Z

on the con-

trol and an output Z

on the same wire), (01 00|01 01)

(input X

on the control and X

, X

on the out-

puts), (00 01|00 01) (target X pass-through), and, ﬁnally,

(00 10|10 10). The generators for the single-qubit trivial

errors are even simpler, since a Z

maps to the weight-one

vector (u

, v

) = (10).

For a given circuit error E with binary form e ∈ F

any equivalent error can be obtained by adding a linear

combination of thus constructed circuit EEG generators

. It is convenient to combine the corresponding gen-

erators into a generator matrix G with 2N columns. As

discussed in the Appendix A, for a good EDC with the

rank of the input-code stabilizer group r

= n

− k and

the constant c = 1/2

, see Eq. (10) and below, the gen-

erator matrix has the rank

rank G = 2N − 2n

− f, f ≡ n

− κ −r

≥ 0. (12)

Generally, a non-zero value f > 0 indicates a measure-

ment redundancy.

With the circuit EEG generator matrix in place, it is

easy to construct a formal expression for the probability

of an error in a given equivalence class. Namely, if the

circuit error probability distribution is given by an analog

of Eq. (6), the probability of an error equivalent to a given

e ∈ F

is proportional to the Ising partition function,

P(e

∈ F

' e) = Const Z(Gθ, {K

(−1)

[eθ]

}), (13)

cf. Eq. (9). It is also easy to ﬁnd the conditional probabil-

ity distribution of output errors with a given syndrome.

Namely, given the binary form of an output data error

∈ F

⊗2n

and a syndrome vector σ

∈ F

⊗n

, we need

to form the corresponding vector e ≡ e(e

, σ

) ∈ F

ﬁlling only the components corresponding to the output

data qubits and ancillary X

just before the measure-

ments which give non-zero syndrome bits in σ

For ML decoding, we compare thus computed prob-

abilities for all inequivalent errors consistent with the

measured syndrome σ

. In particular, these include er-

rors that diﬀer by a logical operator of the input (or,

equivalently, the output) code, since non-trivial logical

operators are outside the circuit EEG. In other words,

length-2N binary vectors corresponding to logical opera-

tors are linearly independent from the rows of the circuit

EEG generator matrix G. It is convenient to deﬁne a

binary logical generator matrix L of dimension 2k ×2N,

whose rows correspond to mutually inequivalent logical

operators of the input code.

For the ease of decoding, it is also convenient to intro-

duce the parity-check matrix H, also with 2N columns,

whose rows are orthogonal to the rows of both matrices

G and L, and whose rank satisﬁes

rank H = 2N − rank G −rank L = n

− κ + r

. (14)

Clearly, H is dual to a matrix combing the rows of G and

L. As in the case of a subsystem code, see Eq. (2), these

matrices can be seen as forming a half of a CSS code,

with stabilizer generator matrices G

= G, G

= H,

and X-logical operator generator matrix L

= L.

The orthogonality requirement for rows of H can also

be interpreted in terms of the N -qubit Pauli group as-

sociated with the circuit. Namely, the Pauli operators

corresponding to the rows of the symplectic dual matrix

H = HΣ, see Eq. (1) and below, must commute with

generators of the circuit EEG. This guarantees that each

of these operators be a spackle, i.e., a circuit error where

the single-qubit Pauli operators in any time layer can be

obtained by error propagation from those in the previos

time layer, see Ref. 22. Respectively, row weights of H

scale linearly with the circuit depth.

D. Circuit subsystem code

The discussion in Ref. 22 focused on the special case

of good EDCs where each qubit line is required to have

an odd number of locations. In this special case, the

circuit EEG can be seen as the gauge group of a quan-

tum subsystem code of length N which encodes the same

number of qubits k as the input/output codes, and has

a distance not exceeding the corresponding distances,

d ≤ min(d

, d

). Respectively, rows of the matrix

which correspond to generators of the stabilizer group of

the subsystem code are necessarily given by linear com-

binations of the rows of G. In addition, circuit errors

corresponding to the rows of the matrix L can be seen as

dressed logical operators, while the bare logical operators

which commute with the elements of the circuit EEG can

be constructed as spackles.

In practice, any circuit can be easily modiﬁed to satisfy

this additional requirement by inserting a null (identity)

gate into each qubit line with an even number of loca-

tions, see Fig. 2 for an example. While it is not strictly

necessary to work with circuits that satisfy this require-

ment, it is convenient, as the additional structure of the

subsystem code can be used to verify the validity of the

constructed matrices.

However, once the matrices G, H, and L are con-

structed, there is no need to refer to the subsystem code.

In fact, the generator matrix row-reduction transforma-

tion described in the following Section IV preserves the

orthogonality and the relation Eq. (14) between the ranks

inherent in the CSS code map, but not the structure of

the circuit subsystem code.

E. Circuit code distance.

How good can a measurement protocol be? What are

the bounds on the distance d of the subsystem code as-

sociated with the circuit?

Generally, if d

and d

are the distances of the input

and the output codes, the distance of the corresponding

circuit code satisﬁes d ≤ min(d

, d

). This follows from

the fact that a logical operator of the input code, e.g.,

is naturally mapped to a (dressed) logical operator of

the circuit code. An important result in Ref. [22] is that

one can always design a fault-tolerant circuit so that the

distance d of the corresponding subsystem code be as

good as that of the input code, d = d

Unfortunately, circuit-code distance d does not have a

direct relation to the probability distribution of the out-

put errors; even single-qubit output errors may remain

undetected. This is a well known “feature” of quan-

tum error-correcting codes operating in a fault-tolerant

regime, even for codes with single-shot properties[27–29].

Indeed, regardless of the circuit structure, errors on the

data qubits in the locations just before the circuit output

will not be detected.

In comparison, with a formally deﬁned circuit code,

such an error can be propagated back to the input layer

and (when it is detectable, e.g., if its weight is smaller

than the distance d

of the output code) it would nec-

essarily anticommute with one or more combination(s)

of the ancillary qubits. The original error would thus

be detectable in the circuit code. Causality does not

permit such an operation with actual circuit evolution.

Formally, this functionality is removed due to assumed

ancillary qubit initialization to |0i.

Of course, even if a small-weight error goes undetected,

it may get corrected after one or more additional mea-

surement rounds. In practice, when an error-correcting

code is analyzed in a fault-tolerant setting, the standard

numerical procedure is to add a layer of perfect stabilizer

measurements (no measurement errors). This guarantees

that all small-weight errors at the end of the simulation

be detected, and thus recovers the distance d > 1 of the

circuit code, without the need to violate causality.

IV. MARGINAL DISTRIBUTIONS FOR

CORRELATED ERRORS

The circuit EEG fully describes correlations between

the circuit errors. However, it also contains a lot of ex-

cessive information: for the purpose of error correction,

we are only interested in the distribution of the output

errors and the syndrome, which are all supported at the

rightmost locations of the circuit. In addition, the large

size and sparsity of the circuit generator matrix G makes

decoding diﬃcult, except with the simplest circuits.

Present goal is to reduce the number of independent

variables, by constructing the marginal distribution for a

given subset of the variables. This amounts to a summa-

tion over the variables one is not interested in, e.g.,

P(e

s+1

, . . . , e

) =

. . .

P(e

, e

, . . . , e

). (15)

In the case of binary variables e

∈ {0, 1}, both the orig-

inal and the resulting marginal distributions are multi-

variate Bernoulli distributions, and each can be described

in terms of the Ising energy function (4).

A. Row-reduction transformation

1. Generator matrix and the coupling coeﬃcients

Given an n-variate Bernoulli distribution described by

the coupling matrix Θ, e.g., Θ = Gθ in Eq. (9), and a set

of half-LLR coeﬃcients K

, 1 ≤ b ≤ m, what are the cor-

responding parameters of the marginal distribution (15)?

In the equivalent Ising-model representation (4), the goal

is to describe the couplings between the remaining spins

after a partial summation. Such a star-polygon transfor-

mation for a general Ising model has been constructed in

Ref. 23. The transformation includes two special cases

long known in the Ising model literature: the Onsager’s

star-triangle transformation[42] and the (inverse) deco-

ration transformation[43, 44].

It is convenient to derive the result directly, focusing

on the marginal distribution after a summation over just

one spin variable s

∈ ±1 corresponding to i th row of

Θ. Without limiting generality, assume that the chosen

row has w non-zero elements in positions 1, 2, . . . , w,

decompose the corresponding bond variables R

= s

∈ ±1, 1 ≤ b ≤ w, and perform the summation explic-

itly (with the additional one-half factor for convenience),

≡

=±1

exp



b=1



= cosh



b=1



, (16)

where τ ∈ F

is a composite index with elements τ

such that T

= (−1)

. To exponentiate this expression,

rewrite B

by analogy with Eq. (3),

= B

1+τ

...

1+τ

00...0

1+τ

...

1+τ

w−1

1−τ

00...01

×. . . B

1−τ

...

1−τ

11...1

, (17)

where the coeﬃcients B

...

in the base of the exponents

are the hyperbolic cosines (16) of the sum of coeﬃcients

±K

with the signs ﬁxed, and matching exactly the signs

in the exponents. As in Eq. (3), after a substitution

of any given τ ∈ F

, only one term with the correct

index τ remains in the product. The modiﬁed bonds

and the corresponding coeﬃcients K

are obtained by

expanding the polynomial in the exponent of Eq. (17).

Because of the symmetry of the hyperbolic cosine, only

even-weight products of the original bonds result from

this expansion. Thus, in general, for an original row of

weight w, the corresponding w columns are combined to

produce w

= 2

w−1

−1 even-weight column combinations,

a change of ∆w = 2

w−1

− w − 1 columns.

Speciﬁcally, for a row of weight w = 1, the transfor-

mation amounts to simply dropping the row and the cor-

responding column of Θ. The values of K

remain the

same, except for the one value that is dropped.

For a row of weight w = 2, only the sum of the corre-

sponding columns is retained in Θ, with the coeﬃcient

1,2

cosh(K

+ K

)

cosh(K

− K

)

≡

cf. Eq. (16). Equivalently, tanh K

1,2

= tanh K

tanh K

For a row of weigth w = 3, the three columns of the

original matrix Θ are replaced by their pairwise sums,

with the coeﬃcient

1,2

000

001

010

011

for the combination of the ﬁrst two columns. The re-

maining coeﬃcients K

2,3

and K

3,1

can be obtained with

cyclic permutations of the indices.

For a row of weight w = 4, the four columns of the

original matrix Θ are replaced with six pairwise sums and

the seventh column combining all four original columns,

with the coeﬃcients, e.g.,

1,2

0000

0001

0010

0011

0100

0101

0110

0111

1,2,3,4

0000

0011

0101

0110

0001

0010

0100

0111

In general, the coeﬃcient K

in front of the product of

with indices b in an (even) subset J ⊆ {1, 2, . . . , w} is

given by the sum of logarithms of the hyperbolic cosines

with τ

= 0 (this accounts for symmetry of hyperbolic

cosine), with the coeﬃcients ±1/2

w−1

, where the sign is

determined by the parity of the weight of the subset τ [J]

restricted to J. It is easy to check that the numbers

of positive and negative coeﬃcients always match. Re-

spectively, the coeﬃcients for high-weight products are

typically small in magnitude.

2. Transformation for a parity check matrix

The row-reduction transformation can also be writ-

ten in terms of the parity-check matrix H, also with m

columns, and dual to Θ, such that

HΘ

= 0 and rank H = m −rank Θ. (18)

To this end, consider the row-reduction of Θ as a combi-

nation of the following elementary column steps:

(i) The 1st column of Θ is added to columns 2, 3, . . . ,

w of Θ; as a result the i th row of Θ has a non-zero

element only in the ﬁrst position.

(ii) The 1st column of thus modiﬁed Θ is dropped,

which leaves the i th row zero—it may be dropped

as well;

(iii) If w > 2, 2

w−1

−w combinations of two, three, . . . ,

w − 1 columns with indices b ≤ w − 1 are added

to the matrix Θ. These can be sorted by weight

so that each added column be a combination of

exactly two existing columns in the modiﬁed Θ.

The corresponding steps for H are:

) Columns b ∈ {2, 3, . . . , w} of H are added to its 1st

column, which becomes identically zero as a result.

(ii

) Drop all-zero 1st column from thus modiﬁed H.

(iii

) For each column, e.g., b

, added to Θ as a linear

combination of two existing columns b

and b

, H

acquires a new row with the support on {b

, b

}

to express this linear relation.

It is easy to check that row orthogonality, HΘ

= 0, is

preserved. Also, the rank of Θ is reduced by one, while

the increase of the rank of H matches the number of

columns added in step (iii

), so that the exact duality

(18) is preserved.

B. Marginal distribution for error equivalence

classes

This analysis is easily carried over to the problem of

syndrome-based ML decoding for an [[n, k]] subsystem

code under a Pauli channel characterized by a 2n × m

matrix θ and a set of m half-LLR coeﬃcients {K

}, see

Sec. III A. Given a gauge generator matrix G, the prob-

ability of an error equivalent to e is given by Eq. (9).

Generally, for ML decoding we need to choose the largest

of the 2

partition functions

Z(Gθ, {K

(−1)

θ]

), e

= e + αL, α ∈ F

. (19)

Typically, this needs to be done for a large number of er-

ror vectors e. Can the calculation be simpliﬁed en masse

by doing partial summation over the spins corresponding

to all rows of Gθ as described in the previous section?

The structure of the logical operators can be accounted

for by extending the rows of the generator matrix which

now has two row blocks,

Θ =

Gθ

Lθ

= K

(−1)

[eθ]

, (20)

and the half-LLR coeﬃcients

, b ∈ {1, 2, . . . , m}. A

matching parity check matrix H is dual to Θ, see Eq. (18).

1. Independent X and Z errors

Let us ﬁrst consider the simpler case of independent

X and Z errors, where matrix θ = I

. In this case H =

H =

HΣ, where

H is a generator matrix of the code’s

stabilizer group, see Eq. (2). Marginal distribution being

independent of the choice of the generator matrix, use

row transformations and column permutations to render

Θ =

A B

, (21)

H =



+ C



, (22)

where matrices B and C have ` ≡ 2n − r − 2k columns,

and r = rank(G). The matrices Θ and H are mutually

dual as can be immediately veriﬁed.

Row-reduction operations applied to each row in the

upper block of Θ correspond to:

) Column operations to set both blocks A and B to

zero, and conjugate column operations on H to set

its left-most column block to zero.

(ii

) Drop the upper row-block of the obtained Θ, as

well as the left-most column blocks of the resulting

Θ and H.

(iii

) Add an extra column block M

+ CM

to the re-

sulting Θ, where columns of M

and M

specify

the linear combinations of the columns in its two

remaining blocks, and a matching row-block to H.

As a result, the transformed matrices acquire the form



C M

+ CM



, (23)

, (24)

where double vertical lines are used to separate the newly

added columns.

When the described transformation is applied to any

of the original partition functions (19), the result is just

an exponential exp



(ﬁn)



of the sum of the ﬁnal

half-LLR coeﬃcients. Can identical columns of the ﬁnal

generator matrix (23) be similarly combined to simplify

the structure of the ﬁnal marginal distribution for error

equivalence classes? The answer is yes, as long as we

account for the eﬀect of the error e on the values of the

coeﬃcients K

(ﬁn)

In fact, it is easy to check that the row-reduction trans-

formations in Sec. IV A are such that the additional signs

in Eq. (19) only aﬀect the signs of the coeﬃcients K

(ﬁn)

Moreover, these signs correspond to the bits of the trans-

formed error vector, cf. Eq. (23),

(ﬁn)



+ ε



, (25)

where the vector ε

selects the equivalence class, and

= s + ε

C, with s ≡ eH

the original syndrome.

Clearly, the right-most blocks in Eqs. (23) and (25) are

obtained from ε ≡ (ε

|ε

) with 2k + ` = 2n − r com-

ponents as a right product with the combined matrix

M =





. All `

≡ 2n − r components of ε being inde-

pendent, there are

max

= 2

− 1 (26)

non-trivial combinations. Combining identical columns

in the transformation from ε to e

(ﬁn)

, we can ensure that

the ﬁnal matrices contain no more than m

max

columns.

With Eq. (2), r = n−k+κ, so that `

= n+k−κ, where

κ is the number of gauge qubits in the subsystem code.

For a stabilizer code, κ = 0, thus `

= n + k. Clearly, the

latter is just the number of logical generators, 2k, plus

the number of independent syndrome bits, n − k.

2. A more general Pauli channel

Instead of considering most general situation, consider

an important case of a Pauli channel where any single-

qubit error has a non-zero probability. Then, the inci-

dence matrix can be chosen to have an identity-matrix

block, θ = (I

|T ), where T is an (m − 2n) × 2n binary

matrix. As a result, the matrix Θ and the half-LLR co-

eﬃcients in Eq. (20) both acquire additional blocks of

linearly-dependent components, while the parity-check

matrix dual to Θ can be chosen in the form

H =

H 0

m−2n

. (27)

Since the relevant error in Eq. (20) is eθ = (e|eT ), the

lower row-block of H gives a zero syndrome, just like the

lower row-block in Eq. (24). Basically, after degeneracy

is taken into account, the number of independent com-

ponents of eθ is 2n − r, the same as for e; ﬁnal matrices

and H

can always be constructed to have m

≤ m

max

components given by Eq. (26). Of course, the actual re-

sulting matrices, as well as the ﬁnal half-LLR coeﬃcients

do depend on the assumed error model.

How much freedom is there to choose the matrices Θ

and H

? For the purpose of ML decoding, we need to

go over the entire linear space C

generated by the rows

of Θ

; the choice of basis is irrelevant. The same is true

regarding the parity check matrix H

. These are the same

symmetries as for a generator and a parity-check matrices

of a binary code.

In essence, the original quantum subsystem code with

gauge G and logical L generator matrices has been trans-

formed into a classical binary code, with the transfor-

mation dependent in a non-trivial fashion on the error

probability distribution. This binary code has length

≤ m

max

, and it encodes 2k bits.

Further, any non-trivial linear combination of rows of

Lθ with rows of Gθ has weight lower-bounded by the dis-

tance of the quantum code, which gives the lower bound

≥ d on the distance of this binary code. In general,

given the structure of the row-reduction transformation,

the distance may be quite a bit larger, possibly scaling

linearly with m

. Notice, however, that with highly non-

uniform error probability distribution, more relevant pa-

rameter is not the distance, but the corresponding quan-

tity weighted with half-LLR coeﬃcients K

(ﬁn)

, related

to the probability of the most-likely logical error. By

construction, this quantity is exactly the same as for the

original quantum code.

Let us consider an important case of a sparse origi-

nal parity check matrix H, e.g., with row and column

weights bounded. This requires a low-density parity-

check (LDPC) code with a sparse stabilizer generator

matrix

H = HΣ, and an error channel with the gen-

erator matrix θ = (I

|T ) also sparse. When acting

on the parity check matrix, each row-reduction trans-

formation drops a column of H, and may also add one

or more rows of weight 3, see Sec. IV A 2. Thus, the

parity-check matrix of the classical binary code describ-

ing the marginal probability distribution of error equiv-

alence classes must also be sparse, with row weights not

exceeding w

≤ max(w, 3), where w is the maximum row

weight of H in Eq. (27).

C. Marginal distribution for output errors in a

good measurement circuit

This discussion also applies to the code associated with

the error equivalence group of a good EDC. In this case

the matrices G and H have 2N columns each, where N is

the number of circuit locations, and their ranks are given

by Eqs. (12) and (14). Just as any circuit error can be

pushed all the way to the right, row-reduction can also be

done starting with the bits at the beginning of the circuit

and pushing toward its output. This way, a circuit error

equivalence class can be characterized by

= 2N −rank G = 2n

+ f = (n

+ k) + (n

−κ) (28)

bits, where n

+ k is the number of linearly-independent

error equivalence classes in an [[n

, k]] stabilizer code and

− κ is the number of syndrome bits measured in the

circuit. Alternatively, as in a subsystem code with κ

gauge qubits, `

= n

+k −κ is the exponent in Eq. (26),

and n

is the number of additional error positions right

before the measurements. As in the previous section, this

gives an upper bound on the maximum number M

columns in the matrices Θ

and H

that may be necessary,

≤ M

max

= 2

− 1. (29)

After row-reduction for all generators of the circuit EEG,

we get a classical [M

, 2k, d

] binary code, where k is the

number of encoded qubits, and d

≥ d.

Is this an LDPC code? This question has not been

answered in the previous section: the parity check matrix

H of the circuit code is not necessarily sparse as its row

weights scale linearly with circuit depth. To analyze the

sparsity of the output-error parity-check matrix H

, write

it in a block form similar to Eq. (27),

, (30)

where the upper row-block contains only the original

columns of the circuit EEG parity check matrix H re-

maining after row-reduction steps (i

) and (ii

), while any

row with exactly three non-zero entries added in a step

(iii

) goes to the lower row-block. Further, if we assume

circuit EEG H in the form (27), with the lower row-

block of bounded weight (e.g., w = 3 for depolarizing

errors), any potentially large-weight row in H

must cor-

respond to (i) a stabilizer generator of the output code,

or (ii) a generator of the group H

, see Ref. 22. All of

these have bounded weights for any bounded-weight sta-

bilizer LDPC code with a measurement circuit where sta-

bilizer generators are measured independently and with a

bounded number of ancillary qubits. On the other hand,

these conditions do not generally hold for any family of

concatenated codes based on a given code, or for sub-

system codes where a stabilizer generator may have an

unbounded weight or cannot be expressed as a product of

gauge generators with a bounded number of terms. Nev-

ertheless, even in these cases, given a potentially very

large number of columns (29), it is reasonable to expect

the ﬁnal H

to be a sparse matrix, with only a small frac-

tion of non-zero elements.

V. DECODING STRATEGIES

A. Decoding based on circuit EEG

The present approach is to analyze error propagation

in a Cliﬀord measurement circuit in terms of its circuit

EEG characterized by a logical generator matrix L and

either a generator matrix G or a parity check matrix H.

With a minor constraint on the circuit, these correspond

to the circuit subsystem code constructed in Refs. 21 and

22. More generally, these matrices form (a half of) a CSS

code, with generator matrices G

= G and G

= H,

while rows of L deﬁne X-type logical operators. Simply

put, any decoder capable of dealing with a CSS code with

uncorrelated X-type errors can now be used to account

for error correlations in a given measurement circuit.

Additional correlations can also be accounted for. As-

suming a Pauli error channel (with or without correla-

tions between diﬀerent circuit locations) characterized by

a coupling matrix θ and a set of half-LLR coeﬃcients K

(one per column), probability of a circuit error equivalent

to a given one is given by the Ising partition function (13),

with the spin-bond coupling matrix Gθ.

As already discussed, for ML decoding we need to ﬁnd

the largest of the Ising partition functions (19). Such

a calculation can be expensive. Indeed, given a code

encoding k qubits, we need to compute and choose the

largest of all 2

partition functions corresponding to all

non-trivial sectors; this can only be done in reasonable

time for a code with k small. Previously, a computa-

tionally eﬃcient ML decoder using this approach and 2D

tensor network contraction for computing the partition

functions has been constructed for surface codes[45] in

the channel model (perfect syndrome measurement).

Feasible approaches for evaluating the partition func-

tions (19) include tensor network contraction (see, e.g.,

in Ref. 46, for a 3D network contraction with complexity

scaling as ∝ nχ

, where χ is the bond dimension) and

Monte-Carlo (MC) methods constructed speciﬁcally for

eﬃcient calculations of free energy diﬀerences, e.g., the

non-equilibrium dynamics method [47] or the classical

Bennett acceptance ratio[48]. In application to surface

codes in FT regime, MC calculations are in essence sim-

ulations of bond-disordered 3D Ising model; such calcula-

tions can be done using GPU[49], FPGA[50], or TPU[51]

hardware acceleration.

Notice also that the circuit code is extremely degener-

ate: with Hadamard, Phase, and CNOT gates the rows

of the generator matrix G have (quaternary) weights not

exceeding three. On the other hand, the row weights

of the parity-check matrix H scale linearly with the cir-

cuit depth. By these reasons, iterative decoders like be-

lief propagation (BP) are expected to fare even worse

than with the usual (not so degenerate) quantum LDPC

codes[52–54].

B. Generator-based decoding via marginal

distributions

The calculation of the Ising partition functions needed

for ML decoding can be simpliﬁed via partial summation

over a subset of the spins. Technically, this corresponds

to using a marginal distribution for the subset of variables

needed, as discussed in Sec. IV.

Denote V the set of rows of the original generator ma-

trix G [also, rows in the upper block of Θ in Eq. (20)].

Then, row-reduction in an increasing sequence of subsets

≡ ∅ ⊂ I

⊂ I

⊂ . . . ⊂ I

≡ V (31)

deﬁnes a sequence of mutually-dual pairs of matrices

{Θ

(j)

, H

(j)

} with m

columns, where

(j)

, j ∈ {0, . . . , s}. (32)

The ranks of logical generator matrices remain the same,

rank L

(j)

= 2k, while the sequence r

≡ rank G

(j)

decreasing with increasing j, ending at r

= 0. In

essence, this deﬁnes a sequence of asymmetric (half) CSS

codes [[m

, 2k]], with generator matrices G

(j)

= G

(j)

and

(j)

= H

(j)

, where rows of L

(j)

deﬁne X-type logical

operators. The sequence ends with a CSS code with an

empty X-generator matrix, i.e., a classical binary code

with the parity check matrix H

(s)

For each of the codes in the sequence (32), there is

also a set of half-LLR coeﬃcients {K

(j)

, 1 ≤ b ≤ m

Given an error vector e ∈ F

matching the syndrome,

ML decoding can be done by choosing the largest of the

Ising partition functions [cf. Eq. (19)]



(j)

, {K

(j)

(−1)

}



, e

= e + αL

(j)

, α ∈ F

(33)

By construction, the result should be the same for ev-

ery j ≤ s. However, the complexities for computing the

partition functions may diﬀer dramatically.

One possibility for exact ML decoding is thus to choose

a subset I ⊂ V to optimize the partition function calcula-

tion. In particular, one option is to choose I

such that

all rows of weights one, two, and three are eliminated

from the corresponding matrix G

(1)

. This minimizes the

number of columns in the exact generator matrix of the

marginal-distribution. Even though all rows in the origi-

nal circuit EEG generator matrix G have row weights not

exceeding three, row-reduction on rows of weight three

tends to create higher-weight rows. Thus, in general,

> 0: the resulting partition function is not expected

to be trivial.

To quote some numbers, Tables I and II give the dimen-

sions of generator matrices and row-reduced generator

matrices for two code families: the toy codes [[n

, 1, d

= 1]] with stabilizer generators Z

i+1 mod n

0 ≤ i < n

, and the rotated toric codes (Ex. 11 in Ref. 55)

[[n

, 1, 2t + 1]] with n

= t

+ (t + 1)

and stabilizer gen-

erators Z

i+t mod n

i+t+1 mod n

i+2t+1 mod n

, where

0 ≤ i < n

and t = 1, 2, . . ., with up to three measure-

ments cycles. The simplest examples of the measurement

circuits used for these two code families are shown in

Figs. 3 and 4, respectively. In particular, while the origi-

nal generator matrix for the circuit EEG of the [[13, 1, 5]]

code with measurements repeated n

cyc

= 3 times has di-

mensions 1066 × 1092, row-reduction of rows of weight

up to three reduces the dimension to 182 × 247.

•

input

•

output

•

|0i

4 10 16

|0i

5 11 17

|0i

6 12 18





















FIG. 3. Single-cycle circuit measuring the overcomplete set

, Z

} of stabilizer generators for the toy code

with n

= 3 data and n

= 3 ancillary qubits.

Second possibility for exact ML decoding is to perform

a summation over all spins, as described in Sec. IV C.

This gives a probability distribution directly for the er-

ror equivalence classes; the logarithm of probability of

an error equivalent to e is given, up to an additive con-

stant, by a weighted sum of the half-LLR coeﬃcients

−2

(ﬁn)

. Unfortunately, such an exact expression

is expected to have an exponentially long list of coeﬃ-

cients, see Eqs. (28) and (29) for an upper bound. The

corresponding column numbers are large even for the rel-

atively simple circuits in Tables I and II. In fact, the

circuit parameters generator matrix dimensions remaining columns m



cyc

orig w = 2 w = 3 ﬁnal w

ﬁn

 = 10

−2

 = 10

−1

3 3 1 3 30 × 36 3 × 10 0 × 10 0 × 10 0 10 10 7

5 5 1 5 50 × 60 5 × 16 0 × 16 0 × 16 0 16 16 11

7 7 1 7 70 × 84 7 × 22 0 × 22 0 × 22 0 22 22 15

3 6 2 3 72 × 78 9 × 19 3 × 19 2 × 43 21 38 21 10

5 10 2 5 120 × 130 15 × 31 5 × 31 3 × 79 21 69 34 16

7 14 2 7 168 × 182 21 × 43 7 × 43 4 × 115 21 99 48 22

3 9 3 3 102 × 108 15 × 28 6 × 27 4 × 75 35 69 31 13

5 15 3 5 170 × 180 25 × 46 10 × 45 7 × 116 20 104 50 21

7 21 3 7 238 × 252 35 × 64 14 × 63 10 × 157 20 140 69 29

TABLE I. Parameters of the original and row-reduced generator matrices for repetition code circuits as in Fig. 3 but with n

cyc

measurement cycles, n

data and n

= n

cyc

ancillary qubits. Also shown are dimensions of row-reduced generator matrices

with rows of weights w = 2 and w = 3 (and smaller) eliminated; w

ﬁn

is the minimum row-weight of the ﬁnal generator matrix

with the smallest number of rows remaining. Remaining columns m



is the number of columns after columns with |K

| <  are

dropped from the ﬁnal generator matrix, assuming p = 0.05 corresponding to a half-LLR value K ≈ 1.472. The last column

gives the value of `

in the upper bound (29) on the number of columns.

circuit parameters generator matrix dimensions remaining columns m



cyc

orig w = 2 w = 3 w = 4 ﬁnal w

ﬁn

 = 10

−3

 = 10

−2

 = 10

−1

5 5 1 3 130 × 140 25 × 35 20 × 35 16 × 43 13 × 75 10 65 47 23 11

5 10 2 3 280 × 290 55 × 65 45 × 65 37 × 81 32 × 139 10 112 82 46 16

5 15 3 3 410 × 420 85 × 95 70 × 95 58 × 119 51 × 203 10 159 118 70 21

13 13 1 5 338 × 364 65 × 91 52 × 91 40 × 115 32 × 163 9 163 120 61 27

13 26 2 5 728 × 754 143 × 169 117 × 169 93 × 217 83 × 291 9 273 213 118 40

13 39 3 5 1066 × 1092 221 × 247 182 × 247 146 × 319 134 × 419 9 384 305 182 53

25 25 1 7 650 × 700 125 × 175 100 × 175 76 × 223 58 × 331 9 331 234 116 51

25 50 2 7 1400 × 1450 275 × 325 225 × 325 177 × 421 157 × 555 9 528 413 223 76

TABLE II. Same as in Tab. I but for rotated toric codes [[t

+ (t + 1)

, 1, 2t + 1]] with t = 1, 2, 3, represented as single-generator

cyclic codes, see Example 11 in Ref. 55.

•

• •

•

• •

•

• •

•

• •

•

• •

•

i |0i

⊕ ⊕ ⊕ ⊕

i |0i

⊕ ⊕ ⊕ ⊕

i |0i

⊕ ⊕ ⊕ ⊕

i |0i

⊕ ⊕ ⊕ ⊕

i |0i

⊕ ⊕ ⊕ ⊕

FIG. 4. Cingle-cycle measurement circuit for the ﬁve-qubit

code with n

= 5 ancillary qubits.

Mathematica program (which was not written for eﬃ-

ciency) failed to complete full row-reduction except for

the simplest repetition codes with n

cyc

= 1 round of mea-

surements.

In practice, the list of coeﬃcients K

(ﬁn)

often con-

tains a large number of entries with small magnitudes.

This suggests a range of approximations, where only suf-

ﬁciently large coeﬃcients are preserved, e.g., |K

(ﬁn)

| > ,

for a given  > 0. For incompletely reduced matrices in

Tables I and II, with  = 0.1, the number of columns is re-

duced, roughly, by a factor of two. On general grounds,

the reduction factor is expected to be much larger for

fully-reduced generator matrices.

Alternatively, only a ﬁxed number χ of the largest

in magnitude coeﬃcients may be preserved. This latter

approach is similar in spirit to approximate tensor net-

work contraction using singular value decomposition and

a ﬁxed maximum bond dimension. Notice that if only the

coeﬃcients corresponding to columns of the matrix H

Eq. (30) are preserved, we get an approximation similar

to the conventional phenomenological noise model.

The “history code” decoding algorithm suggested by

Chubb and Flammia[5], can be seen as a special case of

generator-based decoding. Here the measurement circuit

is assumed to have a block structure, and summation is

done over all circuit locations with the exception of those

at the output of each block. With the circuits in Tables

I and II, a block corresponds to a single measurement

cycle. The full probability distribution is then recov-

ered using Bayes rules, assuming no correlations between

measurements in diﬀerent blocks. Evidently, even in this

case, the complexity of the probability distribution ac-

counting for full error correlations could be prohibitive

for exact ML decoding.

C. Parity-based decoding via marginal

distributions

Summations over the spins in the subsets I

with in-

creasing j from a sequence like (31) give marginal distri-

butions which account exactly for increasing numbers of

alternative spin conﬁgurations. Respectively, the degen-

eracy of the corresponding row-reduced half CSS codes

decreases with increasing j, down to a classical code with

no degeneracy at the end of the sequence, j = s. A gen-

eral expectation is that the accuracy of minimum-energy

(ME) decoding would be increasing with increasing j.

ME decoding becomes strictly equivalent to ML decod-

ing for the end-of-the-sequence classical code.

Formally, given a parity-check matrix H and a set of

LLR weights K

, ME decoding aims to ﬁnd an error vec-

tor e which gives the correst syndrome eH

= s and

maximizes the error likelihood

(−1)

or, equiva-

lently, minimizes the error energy E =

. Com-

pared to generator-based decoding, an obvious advantage

is that there is no need to go over all 2

logical oper-

ators. Unfortunately, for generic codes, even the rela-

tively simple problem of ME decoding has an exponential

complexity[56]. Given that the intermediate codes in the

sequence (31) tend to be long, this makes it unpractical

to use generic ME decoding algorithms with exponential

complexity, e.g., the information subset[57, 58] or the

classical Viterbi[59] algorithm.

Notice, however, that the sparsity of parity-check ma-

trices H

(j)

increase with j. The ﬁnal matrix H

(s)

is ex-

pected to be sparse whenever the output code is an LDPC

code. Ideally, this classical code could be decoded with a

linear complexity using a variant of belief propagation

(BP) algorithm[60–63]. Assuming a suﬃciently small

fraction of failed convergence cases, the result would be

equivalent to ML decoding of the correlated errors.

Unfortunately, this does not look so promising given

the fact that this ﬁnal code is expected to have an expo-

nential length, see Eq. (29). Additionally, as conﬁrmed

with limited numerical simulations[64], having a large

number of small in magnitude LLR coeﬃcients tends to

reduce the convergence rate. Using approximate decod-

ing schemes with reduced number of LLR coeﬃcients as

discussed in Sec. V B is expected to help with both is-

sues. Notice, however, that for such a reduction certain

columns in the generator matrix Θ are merely dropped

(puncturing). The corresponding transformation for the

parity-check matrix H is shortening[65], which may re-

duce the sparsity of the resulting matrix and, in turn,

negatively aﬀect the convergence of BP decoding.

VI. DISCUSSION

Improving the accuracy of syndrome-based decoding in

the presence of circuit-level error correlations would both

increase the threshold to scalable quantum computation

and improve the ﬁnite-size performance of quantum er-

ror correction. Present results make two steps in this

direction. First, the observation that ML decoding un-

der these conditions amounts to decoding the code[21, 22]

associated with the circuit EEG, in the absence of cor-

relations. Second, the structure of this latter code can

be signiﬁcantly simpliﬁed using row-reduction transfor-

mations, while leaving the probability of ML decoding

unchanged. A variety of approximate decoding schemes

naturally follow, which interpolate between the exact ML

decoding and the decoding within a relatively simple phe-

nomenological error model, with an additional handle on

the degeneracy of the actual code to be used for decoding.

Designing practical decoding algorithms, e.g., in appli-

cation to surface codes with well-developed near-optimal

syndrome measurement circuits, would require a substan-

tial additional eﬀort. However, this does not look like an

unsolvable problem.

Decoding quantum LDPC codes is a major problem in

the theory of QECCs, especially in a fault-tolerant regime

with syndrome measurement errors present. While a

substantial progress has been made in recent years[66–

70], this problem remains open in application to generic

highly-degenerate codes. Transformations discussed in

Sec. IV change the degeneracy of a quantum code, and

can even map from a quantum to a classical code. This

opens new avenues to explore in decoding, in particular,

new ways to apply existing iterative decoding algorithms

to highly degenerate codes.

Circuit optimization: Traditional approach to quan-

tum error correction is to start with a code, come up

with an FT measurement circuit, compile it to a set of

gates available on a speciﬁc quantum computer, and then

ﬁnally design a decoder. Instead, one could start with the

list of permitted two-qubit gates on a particular device

and enumerate all good error-detecting circuits, increas-

ing circuit depth and the number of gates. Given the cir-

cuit, it is easy to ﬁnd the parameters of the input/output

codes, as well as construct the associated circuit code.

While at the end we would still need to evaluate and

compare the performance of thus constructed codes, such

a procedure could oﬀer a shortcut to circuit optimization

for speciﬁc hardware.

Acknowledgment: This work was supported in part

by the NSF Division of Physics via grant No. 1820939.

Appendix A: Ranks of the matrices

1. Rank of the circuit-EEG generator matrix

Consider a good error-detecting circuit in Fig. 1 with

the constant c = 1/2

in Eq. (11) and the input (or out-

put) code encoding k = n

− r

qubits, where r

is the

rank of the input-code stabilizer group. Here κ is an

integer[22]. Also assume that f measurements are re-

dundant, so that the total number of ancillary qubits is

= κ + r

+ f . Generators of the circuit EEG can be

used to propagate any circuit error all the way to the

output layer on the right, which requires 2N −2(n

)

independent generators, where N is the number of circuit

locations. In addition, there are n

ancillary Z

gener-

ators on the input and n

on the output layers. How-

ever, not all of them are independent. Indeed, when

the n

ancillary Z

operators are propagated to the

right, we get r

independent operators, each containing

a product of ancillary Z

and an element of the stabi-

lizer group, κ independent operators containing ancillary

, and f additional combinations that are redundant

except for a product of ancillary Z

. Overall, this gives

rank G = 2N − 2(n

+ n

) + n

+ κ + r

, which is the

same as in Eq. (12).

2. Rank of the circuit stabilizer group

The circuit stabilizer group must commute with any

EEG generator and also with any logical operator of

the output code (say). Necessarily, its elements must

be spackles[22]. Any spackle is uniquely determined by

its support on the input layer, thus, there are total of

2(n

+ n

) independent spackles. We also have to en-

sure commutativity with ancillary Z

generators on the

left (drop n

spackles) and on the right (drop κ spack-

les). Additional r

spackles have to be removed to en-

sure commutativity with the elements of the output code

stabilizer generators, and 2k spackles to ensure commu-

tativity with the output code logical operators. Overall,

this leaves

rank H = 2(n

+ n

) − n

− κ −r

− 2(n

− r

)

= n

− κ + r

which is exactly the result in Eq. (14).

[1] P. W. Shor, “Scheme for reducing decoherence in quan-

tum computer memory,” Phys. Rev. A 52, R2493 (1995).

[2] C. G. Almudever, L. Lao, X. Fu, N. Khammassi,

I. Ashraf, D. Iorga, S. Varsamopoulos, C. Eichler,

A. Wallraﬀ, L. Geck, A. Kruth, J. Knoch, H. Bluhm,

and K. Bertels, “The engineering challenges in quantum

computing,” in Design, Automation Test in Europe Con-

ference Exhibition (DATE), 2017 (2017) pp. 836–845.

[3] P. Aliferis, D. Gottesman, and J. Preskill, “Quan-

tum accuracy threshold for concatenated distance-3

codes,” Quantum Inf. Comput. 6, 97–165 (2006), quant-

ph/0504218.

[4] David S. Wang, Austin G. Fowler, and Lloyd C. L. Hol-

lenberg, “Surface code quantum computing with error

rates over 1%,” Phys. Rev. A 83, 020302 (2011).

[5] Christopher T. Chubb and Steven T. Flammia, “Statisti-

cal mechanical models for quantum codes with correlated

noise,” (2018), unpublished, 1809.10704.

[6] E. Dennis, A. Kitaev, A. Landahl, and J. Preskill,

“Topological quantum memory,” J. Math. Phys. 43, 4452

(2002).

[7] Austin G. Fowler, Adam C. Whiteside, and Lloyd C. L.

Hollenberg, “Towards practical classical processing for

the surface code,” Phys. Rev. Lett. 108, 180501 (2012).

[8] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N.

Cleland, “Surface codes: Towards practical large-scale

quantum computation,” Phys. Rev. A 86, 032324 (2012).

[9] Austin G. Fowler, Adam C. Whiteside, Angus L.

McInnes, and Alimohammad Rabbani, “Topological

code autotune,” Phys. Rev. X 2, 041003 (2012).

[10] Christopher Chamberland, Guanyu Zhu, Theodore J. Yo-

der, Jared B. Hertzberg, and Andrew W. Cross, “Topo-

logical and subsystem codes on low-degree graphs with

ﬂag qubits,” Phys. Rev. X 10, 011022 (2020).

[11] Christopher Chamberland, Aleksander Kubica,

Theodore J Yoder, and Guanyu Zhu, “Triangular

color codes on trivalent graphs with ﬂag qubits,” New

Journal of Physics 22, 023019 (2020).

[12] Christophe Vuillot, Lingling Lao, Ben Criger, Car-

men Garc´ıa Almud´ever, Koen Bertels, and Barbara M.

Terhal, “Code deformation and lattice surgery are gauge

ﬁxing,” New Journal of Physics 21, 033028 (2019).

[13] Giacomo Torlai and Roger G. Melko, “Neural decoder for

topological codes,” Phys. Rev. Lett. 119, 030501 (2017).

[14] S. Krastanov and L. Jiang, “Deep neural network prob-

abilistic decoder for stabilizer codes,” Scientiﬁc Reports

7, 11003 (2017), 1705.09334.

[15] N. P. Breuckmann and X. Ni, “Scalable neural network

decoders for higher dimensional quantum codes,” Quan-

tum 2, 68 (2018), 1710.09489.

[16] Zhih-Ahn Jia, Yuan-Hang Zhang, Yu-Chun Wu, Liang

Kong, Guang-Can Guo, and Guo-Ping Guo, “Eﬃcient

machine-learning representations of a surface code with

boundaries, defects, domain walls, and twists,” Phys.

Rev. A 99, 012307 (2019).

[17] Paul Baireuther, Thomas E. O’Brien, Brian Tarasin-

ski, and Carlo W. J. Beenakker, “Machine-learning-

assisted correction of correlated qubit errors in a topo-

logical code,” Quantum 2, 48 (2018).

[18] Christopher Chamberland and Pooya Ronagh, “Deep

neural decoders for near term fault-tolerant experi-

ments,” Quantum Science and Technology 3, 044002

(2018).

[19] P. Baireuther, M. D. Caio, B. Criger, C. W. J. Beenakker,

and T. E. O’Brien, “Neural network decoder for topolog-

ical color codes with circuit level noise,” New Journal of

Physics 21, 013003 (2019).

[20] Nishad Maskara, Aleksander Kubica, and Tomas

Jochym-O’Connor, “Advantages of versatile neural-

network decoding for topological codes,” Phys. Rev. A

99, 052351 (2019).

[21] D. Bacon, S. T. Flammia, A. W. Harrow, and J. Shi,

“Sparse quantum codes from quantum circuits,” in Pro-

ceedings of the Forty-Seventh Annual ACM on Sympo-

sium on Theory of Computing, STOC ’15 (ACM, New

York, NY, USA, 2015) pp. 327–334, 1411.3334.

[22] D. Bacon, S. T. Flammia, A. W. Harrow, and J. Shi,

“Sparse quantum codes from quantum circuits,” IEEE

Transactions on Information Theory 63, 2464–2479

(2017).

[23] Jozef Streˇcka, “Generalized algebraic transformations

and exactly solvable classical-quantum models,” Physics

Letters A 374, 3718 – 3722 (2010).

[24] Christopher Chamberland and Michael E. Beverland,

“Flag fault-tolerant error correction with arbitrary dis-

tance codes,” Quantum 2, 53 (2018), 1708.02246.

[25] C. Chamberland and A. W. Cross, “Fault-tolerant magic

state preparation with ﬂag qubits,” Quantum 3, 143

(2019), 1811.00566.

[26] Rui Chao and Ben W. Reichardt, “Quantum error correc-

tion with only two extra qubits,” Phys. Rev. Lett. 121,

050502 (2018).

[27] H´ector Bomb´ın, “Single-shot fault-tolerant quantum er-

ror correction,” Phys. Rev. X 5, 031043 (2015).

[28] Benjamin J. Brown, Naomi H. Nickerson, and Dan E.

Browne, “Fault-tolerant error correction with the gauge

color code,” Nature Communications 7, 12302 (2016).

[29] Earl T. Campbell, “A theory of single-shot error correc-

tion for adversarial noise,” Quantum Science and Tech-

nology 4, 025006 (2019), 1805.09271.

[30] I. Dumer, A. A. Kovalev, and L. P. Pryadko, “Thresh-

olds for correcting errors, erasures, and faulty syndrome

measurements in degenerate quantum codes,” Phys. Rev.

Lett. 115, 050502 (2015), 1412.6172.

[31] A. A. Kovalev, S. Prabhakar, I. Dumer, and L. P.

Pryadko, “Numerical and analytical bounds on threshold

error rates for hypergraph-product codes,” Phys. Rev. A

97, 062320 (2018), 1804.01950.

[32] David Poulin, “Stabilizer formalism for operator quan-

tum error correction,” Phys. Rev. Lett. 95, 230504

(2005).

[33] Dave Bacon, “Operator quantum error-correcting subsys-

tems for self-correcting quantum memories,” Phys. Rev.

A 73, 012340 (2006).

[34] Daniel Gottesman, Stabilizer Codes and Quantum Error

Correction, Ph.D. thesis, Caltech (1997).

[35] A. R. Calderbank, E. M. Rains, P. M. Shor, and

N. J. A. Sloane, “Quantum error correction via codes

over GF(4),” IEEE Trans. Info. Theory 44, 1369–1387

(1998).

[36] Jeroen Dehaene and Bart De Moor, “Cliﬀord group, sta-

bilizer states, and linear and quadratic operations over

GF(2),” Phys. Rev. A 68, 042318 (2003).

[37] Scott Aaronson and Daniel Gottesman, “Improved sim-

ulation of stabilizer circuits,” Phys. Rev. A 70, 052328

(2004).

[38] Bin Dai, Shilin Ding, and Grace Wahba, “Multivariate

Bernoulli distribution,” Bernoulli 19, 1465–1483 (2013).

[39] F. Wegner, “Duality in generalized Ising models and

phase transitions without local order parameters,” J.

Math. Phys. 2259, 12 (1971).

[40] A. J. Landahl, J. T. Anderson, and P. R. Rice, “Fault-

tolerant quantum computing with color codes,” (2011),

presented at QIP 2012, December 12 to December 16,

arXiv:1108.5738.

[41] A. A. Kovalev and L. P. Pryadko, “Spin glass re-

ﬂection of the decoding transition for quantum error-

correcting codes,” Quantum Inf. & Comp. 15, 0825

(2015), arXiv:1311.7688.

[42] Lars Onsager, “Crystal statistics. I. a two-dimensional

model with an order-disorder transition,” Phys. Rev. 65,

117–149 (1944).

[43] Shigeo Naya, “On the spontaneous magnetizations of

honeycomb and Kagom´e Ising lattices,” Progress of The-

oretical Physics 11, 53–62 (1954).

[44] Michael E. Fisher, “Transformations of Ising models,”

Phys. Rev. 113, 969–981 (1959).

[45] Sergey Bravyi, Martin Suchara, and Alexander Vargo,

“Eﬃcient algorithms for maximum likelihood decoding

in the surface code,” Phys. Rev. A 90, 032326 (2014).

[46] Markus Hauru, Clement Delcamp, and Sebastian Miz-

era, “Renormalization of tensor networks using graph-

independent local truncations,” Phys. Rev. B 97, 045111

(2018).

[47] M. de Koning, Wei Cai, A. Antonelli, and S. Yip,

“Eﬃcient free-energy calculations by the simulation of

nonequilibrium processes,” Computing in Science Engi-

neering 2, 88–96 (2000).

[48] Charles H. Bennett, “Eﬃcient estimation of free energy

diﬀerences from Monte Carlo data,” Journal of Compu-

tational Physics 22, 245268 (1976).

[49] Tobias Preis, Peter Virnau, Wolfgang Paul, and Jo-

hannes J. Schneider, “{GPU} accelerated monte carlo

simulation of the 2d and 3d ising model,” Journal of Com-

putational Physics 228, 4468 – 4477 (2009).

[50] A. Gilman, A. Leist, and K. A. Hawick, “3D lat-

tice Monte Carlo simulations on FPGAs,” in Proceed-

ings of the International Conference on Computer De-

sign (CDES) (The Steering Committee of The World

Congress in Computer Science, Computer Engineering

and Applied Computing (WorldComp), 2013).

[51] Kun Yang, Yi-Fan Chen, Georgios Roumpos, Chris

Colby, and John Anderson, “High performance Monte

Carlo simulation of Ising model on TPU clusters,”

(2019), unpublished, 1903.11714.

[52] D. Poulin and Y. Chung, “On the iterative decoding of

sparse quantum codes,” Quant. Info. and Comp. 8, 987

(2008), arXiv:0801.1241.

[53] Ye-Hua Liu and David Poulin, “Neural belief-

propagation decoders for quantum error-correcting

codes,” Phys. Rev. Lett. 122, 200501 (2019), 1811.07835.

[54] Alex Rigby, J. C. Olivier, and Peter Jarvis, “Modi-

ﬁed belief propagation decoders for quantum low-density

parity-check codes,” Phys. Rev. A 100, 012330 (2019),

1903.07404.

[55] A. A. Kovalev, I. Dumer, and L. P. Pryadko, “Design

of additive quantum codes via the code-word-stabilized

framework,” Phys. Rev. A 84, 062319 (2011).

[56] Pavithran Iyer and David Poulin, “Hardness of decoding

quantum stabilizer codes,” IEEE Transactions on Infor-

mation Theory 61, 5209–5223 (2015), arXiv:1310.3235.

[57] E. A. Kruk, “Decoding complexity bound for linear block

codes,” Probl. Peredachi Inf. 25, 103–107 (1989), (In

Russian).

[58] J. T. Coﬀey and R. M. Goodman, “The complexity of

information set decoding,” IEEE Trans. Info. Theory 36,

1031 –1037 (1990).

[59] Andrew J. Viterbi, “Error bounds for convolutional codes

and an asymptotically optimum decoding algorithm,”

IEEE Transactions on Information Theory 13, 260–269

(1967).

[60] R. G. Gallager, Low-Density Parity-Check Codes (M.I.T.

Press, Cambridge, Mass., 1963).

[61] M. P. C. Fossorier, “Iterative reliability-based decoding

of low-density parity check codes,” IEEE Journal on Se-

lected Areas in Communications 19, 908–917 (2001).

[62] Thomas J. Richardson and R¨udiger L. Urbanke, “The ca-

pacity of low-density parity-check codes under message-

passing decoding,” Information Theory, IEEE Transac-

tions on 47, 599–618 (2001).

[63] David Declerq, Marc Fossorier, and Ezio Biglieri, eds.,

Channel Coding. Theory, Algorithms, and Applications

(Academic Press Library in Mobile and Wireless Com-

munications, San Francisco, 2014).

[64] Weilei Zeng and Leonid P. Pryadko, “Iterative decoding

of row-reduced quantum LDPC codes,” (2020), unpub-

lished.

[65] F. J. MacWilliams and N. J. A. Sloane, The Theory

of Error-Correcting Codes (North-Holland, Amsterdam,

1981).

[66] Omar Fawzi, Antoine Grospellier, and Anthony Lever-

rier, “Eﬃcient decoding of random errors for quantum

expander codes,” (2017), unpublished, 1711.08351.

[67] Omar Fawzi, Antoine Grospellier, and Anthony Lever-

rier, “Constant overhead quantum fault-tolerance with

quantum expander codes,” in 59th IEEE Annual Sympo-

sium on Foundations of Computer Science, FOCS 2018,

Paris, France, October 7-9, 2018 (2018) pp. 743–754.

[68] A. Grospellier and A. Krishna, “Numerical study

of hypergraph product codes,” (2018), unpublished,

1810.03681.

[69] Pavel Panteleev and Gleb Kalachev, “Degenerate quan-

tum LDPC codes with good ﬁnite length performance,”

(2019), unpublished, 1904.02703.

[70] Antoine Grospellier, Lucien Grou`es, Anirudh Krishna,

and Anthony Leverrier, “Combining hard and soft de-

coders for hypergraph product codes,” (2020), unpub-

lished, arXiv:2004.11199.

Comments

Products

Project