Fermat's Library | q-2020-02-06-226 annotated/explained version.

Data re-uploading for a universal quantum classiﬁer

Adri

an P

erez-Salinas

1,2

, Alba Cervera-Lierta

1,2

, Elies Gil-Fuster

, and Jos

e I. Latorre

1,2,4,5

Barcelona Supercomputing Center

Institut de Ci

encies del Cosmos, Universitat de Barcelona, Barcelona, Spain

Dept. F

ısica Qu

antica i Astrof

ısica, Universitat de Barcelona, Barcelona, Spain.

Nikhef Theory Group, Science Park 105, 1098 XG Amsterdam, The Netherlands.

Center for Quantum Technologies, National University of Singapore, Singapore.

A single qubit provides suﬃcient compu-

tational capabilities to construct a universal

quantum classiﬁer when assisted with a clas-

sical subroutine. This fact may be surpris-

ing since a single qubit only oﬀers a simple

superposition of two states and single-qubit

gates only make a rotation in the Bloch sphere.

The key ingredient to circumvent these limita-

tions is to allow for multiple data re-uploading.

A quantum circuit can then be organized as

a series of data re-uploading and single-qubit

processing units. Furthermore, both data re-

uploading and measurements can accommo-

date multiple dimensions in the input and sev-

eral categories in the output, to conform to

a universal quantum classiﬁer. The extension

of this idea to several qubits enhances the eﬃ-

ciency of the strategy as entanglement expands

the superpositions carried along with the clas-

siﬁcation. Extensive benchmarking on diﬀer-

ent examples of the single- and multi-qubit

quantum classiﬁer validates its ability to de-

scribe and classify complex data.

1 Introduction

Quantum circuits that make use of a small number of

quantum resources are of most importance to the ﬁeld

of quantum computation. Indeed, algorithms that

need few qubits may prove relevant even if they do

not attempt any quantum advantage, as they may be

useful parts of larger circuits.

A reasonable question to ask is what is the lower

limit of quantum resources needed to achieve a given

computation. A naive estimation for the quantum

cost of a new proposed quantum algorithm is often

made based on analogies with classical algorithms.

But this may be misleading, as classical computation

can play with memory in a rather diﬀerent way as

quantum computers do. The question then turns to

the more reﬁned problem of establishing the absolute

minimum of quantum resources for a problem to be

solved.

We shall here explore the power and minimal needs

of quantum circuits assisted with a classical subrou-

tine to carry out a general supervised classiﬁcation

task, that is, the minimum number of qubits, quan-

tum operations and free parameters to be optimized

classically. Three elements in the computation need

renewed attention. The obvious ﬁrst concern is to

ﬁnd a way to upload data in a quantum computer.

Then, it is necessary to ﬁnd the optimal processing

of information, followed by an optimal measurement

strategy. We shall revisit these three issues in turn.

The non-trivial step we take here is to combine the

ﬁrst two, which is data uploading and processing.

There exist several strategies to design a quantum

classiﬁer. In general, they are inspired in well-known

classical techniques such as artiﬁcial neural networks

[1–3] or kernel methods used in classical machine

learning [4–10]. Some of these proposals [4–6] encode

the data values into a quantum state amplitude, which

is manipulated afterward. These approaches need an

eﬃcient way to prepare and access to these ampli-

tudes. State preparation algorithms are in general

costly in terms of quantum gates and circuit depth,

although some of these proposals use a speciﬁc state

preparation circuit that only require few single-qubit

gates. The access to the states that encode the data

can be done eﬃciently by using a quantum random

access memory (QRAM) [11]. However, this is experi-

mentally challenging and the construction of a QRAM

is still under development. Other proposals exploit

hybrid quantum-classical strategies[7–10]. The clas-

sical parts can be used to construct the correct en-

coding circuit or as a minimization method to extract

the optimal parameters of the quantum circuit, such

as the angles of the rotational gates. In the ﬁrst case,

the quantum circuit computes the hardest instances of

the classical classiﬁcation algorithm as, for example,

the inner products needed to obtain a kernel matrix.

In the second case, the data is classiﬁed directly by

using a parametrized quantum circuit, whose variables

are used to construct a cost function that should be

minimized classically. This last strategy is more con-

venient for a Near Intermediate Scale Quantum com-

putation (NISQ) since, in general, it requires short-

depth circuits, and its variational core makes it more

resistant to experimental errors. Our proposal be-

longs to this last category, the parametrized quantum

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 1

arXiv:1907.02085v2 [quant-ph] 30 Jan 2020

classiﬁers.

A crucial part of a quantum classiﬁcation algorithm

is how data is encoded into the circuit. Proposals

based on kernel methods design an encoding circuit

which implements a feature map from the data space

to the qubits Hilbert space. The construction of this

quantum feature map may vary depending on the al-

gorithm, but common strategies make use of the quan-

tum Fourier transform or introduce data in multiple

qubits using one- and two-qubit gates [9, 10]. Both

the properties of the tensor product and the entangle-

ment generated in those encoding circuits capture the

non-linearities of the data. In contrast, we argue that

there is no need to use highly sophisticated encoding

circuits nor a signiﬁcant number of qubits to intro-

duce these non-linearities. Single-qubit rotations ap-

plied multiple times along the circuit generate highly

non-trivial functions of the data values. The main dif-

ference between our approach and the ones described

above is that the circuit is not divided between the

encoding and processing parts, but implements both

multiple times along the algorithm.

Data re-uploading is considered as a manner of solv-

ing the limitations established by the no-cloning the-

orem. Quantum computers cannot copy data, but

classical devices can. For instance, a neural network

takes the same input many times when processing the

data in the hidden layer neurons. An analogous quan-

tum neural network can only use quantum data once.

Therefore, it makes sense to re-upload classical data

along a quantum computation to bypass this limita-

tion on the quantum circuit. By following this line

of thought, we present an equivalence between data

re-uploading and the Universal Approximation The-

orem applied to artiﬁcial neural networks [12]. Just

as a network composed of a single hidden layer with

enough neurons can reproduce any continuous func-

tion, a single-qubit classiﬁer can, in principle, achieve

the same by re-uploading the data enough times.

The single-qubit classiﬁer illustrates the computa-

tional power that a single qubit can handle. This pro-

posal is to be added to other few-qubit benchmarks

in machine learning [13]. The input redundancy has

also been proposed to construct complex encoding in

parametrized quantum circuits and in the construc-

tion of quantum feature maps [10, 14]. These and

other proposals mentioned in the previous paragraphs

are focused on representing classically intractable or

very complex kernel functions with few qubits. On

the contrary, the focus of this work is to distill the

minimal amount of quantum resources, i.e., the num-

ber of qubits and gates, needed for a given classiﬁca-

tion task quantiﬁed in terms of the number of qubits

and unitary operations. The main result of this work

is, indeed, to show that there is a trade-oﬀ between

the number of qubits needed to perform classiﬁcation

and multiple data re-uploading. That is, we may use

fewer qubits at the price of re-entering data several

times along the quantum computation.

We shall illustrate the power of a single- and multi-

qubit classiﬁers with data re-uploading with a series

of examples. First, we classify points in a plane that

is divided into two areas. Then, we extend the num-

ber of regions on a plane to be classiﬁed. Next, we

consider the classiﬁcation of multi-dimensional pat-

terns and, ﬁnally, we benchmark this quantum clas-

siﬁer with non-convex ﬁgures. For every example,

we train a parametrized quantum circuit that car-

ries out the task and we analyze its performance in

terms of the circuit architecture, i.e., for single- and

multi-qubit classiﬁers with and without entanglement

between qubits.

This paper is structured as follows. First, in Sec-

tion 2, we present the basic structure of a single-qubit

quantum classiﬁer. Data and processing parameters

are uploaded and re-uploaded using one-qubit gen-

eral rotations. For each data point, the ﬁnal state

of the circuit is compared with the target state as-

signed to its class, and the free parameters of the cir-

cuit are updated accordingly using a classical mini-

mization algorithm. Next, in Section 3, we motivate

the data re-uploading approach by using the Univer-

sal Approximation Theorem of artiﬁcial neural net-

works. In Section 4, we introduce the extension of

this classiﬁer to multiple qubits. Then, in Section 5,

we detail the minimization methods used to train the

quantum classiﬁers. Finally, in Section 6, we bench-

mark single- and multi-qubit quantum classiﬁers de-

ﬁned previously with problems of diﬀerent dimensions

and complexity and compare their performance re-

spect to classical classiﬁcation techniques. The con-

clusions of this proposal for a quantum classiﬁer are

exposed in Section 7.

2 Structure of a single-qubit quantum

classiﬁer

The global structure of any quantum circuit can be

divided into three elements: uploading of informa-

tion onto a quantum state, processing of the quan-

tum state, and measurement of the ﬁnal state. It is

far from obvious how to implement each of these ele-

ments optimally to perform a speciﬁc operation. We

shall now address them one at a time for the task of

classiﬁcation.

2.1 Re-uploading classical information

To load classical information onto a quantum circuit is

a highly non-trivial task [4]. A critical example is the

processing of big data. While there is no in-principle

obstruction to upload large amounts of data onto a

state, it is not obvious how to do it.

The problem we address here is not related to a

large amount of data. It is thus possible to consider a

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 2

quantum circuit where all data are loaded in the co-

eﬃcients of the initial wave function [8, 9, 13–15]. In

the simplest of cases, data are uploaded as rotations of

qubits in the computational basis. A quantum circuit

would then follow that should perform some classiﬁ-

cation.

This strategy would be insuﬃcient to create a uni-

versal quantum classiﬁer with a single qubit. A ﬁrst

limitation is that a single qubit only has two degrees

of freedom, thus only allowing to represent data in

a two-dimensional space. No quantum classiﬁer in

higher dimensions can be created if this architecture

is to be used. A second limitation is that, once data

is uploaded, the only quantum circuit available is a

rotation in the Bloch sphere. It is easy to prove that

a single rotation cannot capture any non-trivial sepa-

ration of patterns in the original data.

We need to turn to a diﬀerent strategy, which turns

out to be inspired by neural networks. In the case of

feed-forward neural networks, data are entered in a

network in such a way that they are processed by sub-

sequent layers of neurons. The key idea is to observe

that the original data are processed several times, one

for each neuron in the ﬁrst hidden layer. Strictly

speaking, data are re-uploaded onto the neural net-

work. If neural networks were aﬀected by some sort

of no-cloning theorem, they could not work as they

do. Coming back to the quantum circuit, we need to

design a new architecture where data can be intro-

duced several times into the circuit.

The central idea to build a universal quantum clas-

siﬁer with a single qubit is thus to re-upload classical

data along with the computation. Following the com-

parison with an artiﬁcial neural network with a single

hidden layer, we can represent this re-upload diagram-

matically, as it is shown in Figure 1. Data points in a

neural network are introduced in each processing unit,

represented with squares, which are the neurons of the

hidden layer. After the neurons process these data, a

ﬁnal neuron is necessary to construct the output to be

analyzed. Similarly, in the single-qubit quantum clas-

siﬁer, data points are introduced in each processing

unit, which this time corresponds to a unitary rota-

tion. However, each processing unit is aﬀected by the

previous ones and re-introduces the input data. The

ﬁnal output is a quantum state to be analyzed as it

will be explained in the next subsections.

The explicit form of this single-qubit classiﬁer is

shown in Figure 2. Classical data are re-introduced

several times in a sequence interspaced with process-

ing units. We shall consider the introduction of data

as a rotation of the qubit. This means that data from

three-dimensional space, ~x, can be re-uploaded using

unitaries that rotate the qubit U (~x). Later processing

units will also be rotations as discussed later on. The

whole structure needs to be trained in the classiﬁca-

tion of patterns.

As we shall see, the performance of the single-qubit

(a) Neural network (b) Quantum classiﬁer

Figure 1: Simpliﬁed working schemes of a neural network

and a single-qubit quantum classiﬁer with data re-uploading.

In the neural network, every neuron receives input from all

neurons of the previous layer. In contrast with that, the

single-qubit classiﬁer receives information from the previous

processing unit and the input (introduced classically). It pro-

cesses everything all together and the ﬁnal output of the

computation is a quantum state encoding several repetitions

of input uploads and processing parameters.

quantum classiﬁer will depend on the number of re-

uploads of classical data. This fact will be explored

in the results section.

2.2 Processing along re-uploading

The single-qubit classiﬁer belongs to the category of

parametrized quantum circuits. The performance of

the circuit is quantiﬁed by a ﬁgure of merit, some

speciﬁc χ

to be minimized and deﬁned later. We

need, though, to specify the processing gates present

in the circuit in terms of a classical set of parameters.

Given the simple structure of a single-qubit circuit

presented in Figure 2, the data is introduced in a sim-

ple rotation of the qubit, which is easy to character-

ize. We just need to use arbitrary single-qubit rota-

tions U (φ

, φ

) ∈ SU(2). We will write U(

φ) with

φ = (φ

, φ

). Then, the structure of the universal

quantum classiﬁer made with a single qubit is

φ, ~x) ≡ U (

)U(~x) . . . U(

)U(~x), (1)

which acts as

|ψi = U(

φ, ~x)|0i. (2)

The ﬁnal classiﬁcation of patterns will come from

the results of measurements on |ψi. We may introduce

the concept of processing layer as the combination

L(i) ≡ U (

)U(~x), (3)

so that the classiﬁer corresponds to

φ, ~x) = L(N) . . . L(1), (4)

where the depth of the circuit is 2N . The more layers

the more representation capabilities the circuit will

have, and the more powerful the classiﬁer will be-

come. Again, this follows from the analogy to neural

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 3

L(1) L(N)

|0i U (~x)

)

···

U (~x)

)

(a) Original scheme

L(1) L(N)

|0i U



, ~x



···



, ~x



(b) Compressed scheme

Figure 2: Single-qubit classiﬁer with data re-uploading. The

quantum circuit is divided into layer gates L(i), which con-

stitutes the classiﬁer building blocks. In the upper circuit,

each of these layers is composed of a U (~x) gate, which up-

loads the data, and a parametrized unitary gate U(

φ). We

apply this building block N times and ﬁnally compute a cost

function that is related to the ﬁdelity of the ﬁnal state of

the circuit with the corresponding target state of its class.

This cost function may be minimized by tunning the

pa-

rameters. Eventually, data and tunable parameters can be

introduced with a single unitary gate, as illustrated in the

bottom circuit.

networks, where the size of the intermediate hidden

layer of neurons is critical to represent complex func-

tions.

There is a way to compactify the quantum circuit

into a shorter one. This can be done if we incorporate

data and processing angles in a single step. Then, a

layer would only need a single rotation to introduce

data and tunable parameters, i.e. L(i) = U(

φ, ~x). In

addition, each data point can be uploaded with some

weight w

. These weights will play a similar role as

weights in artiﬁcial neural networks, as we will see in

the next section. Altogether, each layer gate can be

taken as

L(i) = U



+ ~w

◦~x



, (5)

where ~w

◦ ~x =



, w



is the Hadamard

product of two vectors. In case the data points have

dimension lesser than three, the rest of ~x components

are set to zero. Such an approach reduces the depth of

the circuit by half. Further combinations of layers into

fewer rotations are also possible, but the nonlinearity

inherent to subsequent rotations would be lost, and

the circuit would not be performing well.

Notice that data points are introduced linearly into

the rotational gate. Non-linearities will come from

the structure of these gates. We chose this encoding

function as we believe it is one of the lesser biased

ways to encode data with unknown properties. Due

to the structure of single-qubit unitary gates, we will

see that this encoding is particularly suited for data

with rotational symmetry. Still, it can also classify

other kinds of data structures. We can also apply

other encoding techniques, e.g. the ones proposed in

Ref. [10], but for the scope of this work, we have

just tested the linear encoding strategy as a proof of

concept of the performance of this quantum classiﬁer.

It is also possible to enlarge the dimensionality of

the input space in the following way. Let us extend

the deﬁnition of i-th layer to

L(i) = U



(k)

+ ~w

(k)

◦~x

(k)



···U



(1)

+ ~w

(1)

◦~x

(1)



(6)

where each data point is divided into k vectors of di-

mension three. In general, each unitary U could ab-

sorb as many variables as freedom in an SU(2) uni-

tary. Each set of variables act at a time, and all of

them have been shown to the circuit after k iterations.

Then, the layer structure follows. The complexity of

the circuit only increases linearly with the size of the

input space.

2.3 Measurement

The quantum circuit characterized by a series of pro-

cessing angles {θ

} and weights {w

} delivers a ﬁnal

state |ψi, which needs to be measured. The results

of each measurement are used to compute a χ

that

quantiﬁes the error made in the classiﬁcation. The

minimization of this quantity in terms of the classical

parameters of the circuit can be organized using any

preferred supervised machine learning technique.

The critical point in the quantum measurement is

to ﬁnd an optimal way to associate outputs from the

observations to target classes. The fundamental guid-

ing principle to be used is given by the idea of max-

imal orthogonality of outputs [16]. This is easily es-

tablished for a dichotomic classiﬁcation, where one of

two classes A and B have to be assigned to the ﬁnal

measurement of the single qubit. In such a case it

is possible to measure the output probabilities P (0)

for |0i and P (1) for |1i. A given pattern could be

classiﬁed into the A class if P (0) > P (1) and into B

otherwise. We may reﬁne this criterium by introduc-

ing a bias. That is, the pattern is classiﬁed as A if

P (0) > λ, and as B otherwise. The λ is chosen to op-

timize the success of classiﬁcation on a training set.

Results are then checked on an independent validation

set.

The assignment of classes to the output reading of

a single qubit becomes an involved issue when many

classes are present. For the sake of simplicity, let us

mention two examples for the case of classiﬁcation to

four distinct classes. One possible strategy consists on

comparing the probability P(0) to four sectors with

three thresholds: 0 ≤ λ

≤ λ

≤ 1. Then, the

value of P(0) will fall into one of them, and classiﬁ-

cation is issued. A second, more robust assignment is

obtained by computing the overlap of the ﬁnal state

to one of the states of a label states-set. This states-

set is to be chosen with maximal orthogonality among

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 4

Figure 3: Representation in the Bloch sphere of four and six

maximally orthogonal points, corresponding to the vertices

of a tetrahedron and an octahedron respectively. The single-

qubit classiﬁer will be trained to distribute the data points in

one of these vertices, each one representing a class.

all of them. This second method needs from the max-

imally orthogonal points in the Bloch sphere. Figure

3 shows the particular cases that can be applied to

a classiﬁcation task of four and six classes. In gen-

eral, a good measurement strategy may need some

prior computational eﬀort and reﬁned tomography of

the ﬁnal state. Since we are proposing a single-qubit

classiﬁer, the tomography protocol will only require

three measurements.

It is possible to interpret the single-qubit classi-

ﬁer in terms of geometry. The classiﬁer opens a 2-

dimensional Hilbert space, i.e., the Bloch sphere. As

we encode data and classiﬁcation within the param-

eters deﬁning rotations, this Hilbert space is enough

to achieve classiﬁcation. Any operation L(i) is a ro-

tation on the Bloch sphere surface. With this point

of view in mind, we can easily see that we can clas-

sify any point using only one unitary operation. We

can transport any point to any other point on the

Bloch sphere by nothing else than choosing the an-

gles of rotation properly. However, this does not

work for several data, as the optimal rotation for some

data points could be very inconvenient for some other

points. However, if more layers are applied, each one

will perform a diﬀerent rotation, and many diﬀerent

rotations together have the capability of enabling a

feature map. Data embedded in this feature space

can be easily separated into classes employing the re-

gions on the Bloch sphere.

2.3.1 A ﬁdelity cost function

We propose a very simple cost function motivated by

the geometrical interpretation introduced above. We

want to force the quantum states |ψ(

θ, ~w, ~x)i to be as

near as possible to one particular state on the Bloch

sphere. The angular distance between the label state

and the data state can be measured with the relative

ﬁdelity between the two states [17]. Thus, our aim

is to maximize the average ﬁdelity between the states

at the end of the quantum circuit and the label states

corresponding to their class. We deﬁne the following

cost function that carries out this task,

(

θ, ~w) =

µ=1



1 − |h

|ψ(

θ, ~w, ~x

)i|



, (7)

where |

i is the correct label state of the µ data

point, which will correspond to one of the classes.

2.3.2 A weighted ﬁdelity cost function

We shall next deﬁne a reﬁned version of the previous

ﬁdelity cost function to be minimized. The set of

maximally orthogonal states in the Bloch sphere, i.e.,

the label states, are written as |ψ

i, where c is the

class. Each of these label states represents one class

for the classiﬁer. Now, we will follow the lead usually

taken in neural network classiﬁcation.

Let us deﬁne the quantity

(

θ, ~w, ~x) = |h

|ψ(

θ, ~w, ~x)i|

, (8)

where M is the total number of training points, |

is the label state of the class c and |ψ(

θ, ~w, ~x)i is the

ﬁnal state of the qubit at the end of the circuit. This

ﬁdelity is to be compared with the expected ﬁdelity of

a successful classiﬁcation, Y

(~x). For example, given

a four-class classiﬁcation and using the vertices of a

tetrahedron as label states (as shown in Figure 3),

one expects Y

(~x) = 1, where s is the correct class,

and Y

(~x) = 1/3 for the other r classes. In general,

(~x) can be written as a vector with one entry equal

to 1, the one corresponding to the correct class, and

the others containing the overlap between the correct

class label state and the other label states.

With these deﬁnitions, we can construct a cost

function which turns out to be inspired by conven-

tional cost functions in artiﬁcial neural networks. By

weighting the ﬁdelities of the ﬁnal state of the circuit

with all label states, we deﬁne the weighted ﬁdelity

cost function as

(~α,

θ, ~w) =

µ=1

c=1



(

θ, ~w, ~x

) − Y

(~x

)



(9)

where M is the total number of training points, C

is the total number of classes, ~x

are the training

points and ~α = (α

, ··· , α

) are introduced as class

weights to be optimized together with

θ and ~w pa-

rameters. This weighted ﬁdelity has more parameters

than the ﬁdelity cost function. These parameters are

the weights for the ﬁdelities.

The main diﬀerence between the weighted ﬁdelity

cost function of Eq. (9) and the ﬁdelity cost function

of Eq. (7) is how many overlaps do we need to com-

pute. The χ

requires as many ﬁdelities as classes

every time we run the optimization subroutine, while

the χ

needs just one. This is not such a big diﬀer-

ence for a few classes and only one qubit. It is possible

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 5

to measure any state with a full tomography process

which, for one qubit, is achievable. However, for many

diﬀerent classes, we expect that one measurement will

be more eﬃcient than many.

Besides the weighted ﬁdelity cost function being

costlier than the ﬁdelity cost function, there is another

qualitative diﬀerence between both. The ﬁdelity cost

function forces the parameters to reach the maximum

in ﬁdelities. Loosely speaking, this ﬁdelity moves the

qubit state to where it should be. The weighted ﬁ-

delity forces the parameters to be close to a speciﬁed

conﬁguration of ﬁdelities. It moves the qubit state to

where it should be and moves it away from where it

should not. Therefore, we expect that the weighted ﬁ-

delity will work better than the ﬁdelity cost function.

Moreover, this extra cost in terms of the number of

parameters of the weighted ﬁdelity cost function will

only aﬀect the classical minimization part of the al-

gorithm. In a sense, we are increasing the classical

processing part to reduce the quantum resources re-

quired for the algorithm, i.e. the number of quantum

operations (layers). This fact gain importance in the

NISQ computation era.

3 Universality of the single-qubit clas-

siﬁer

After analyzing several classiﬁcation problems, we ob-

tain evidence that the single-qubit classiﬁer intro-

duced above can approximate any classiﬁcation func-

tion up to arbitrary precision. In this section, we pro-

vide the motivation for this statement based on the

Universal Approximation Theorem (UAT) of artiﬁcial

neural networks [12].

3.1 Universal Approximation Theorem

Theorem– Let I

= [0, 1]

be the m-dimensional unit

cube and C(I

) the space of continuous functions in

. Let the function ϕ : R → R be a nonconstant,

bounded and continuous function and f : I

→ R

a function. Then, for every  > 0, there exists an

integer N and a function h : I

→ R, deﬁned as

h(~x) =

i=1

ϕ ( ~w

·~x + b

) , (10)

with α

, b

∈ R and ~w

∈ R

, such that h is an ap-

proximate realization of f with precision , i.e.,

|h(~x) − f(~x)| <  (11)

for all ~x ∈ I

In artiﬁcial neural networks, ϕ is the activation

function, ~w

are the weights for each neuron, b

are the

biases and α

are the neuron weights that construct

the output function. Thus, this theorem establishes

that it is possible to reconstruct any continuous func-

tion with a single layer neural network of N neurons.

The proof of this theorem for the sigmoidal activation

function can be found in Ref. [18]. This theorem was

generalized for any nonconstant, bounded and contin-

uous activation function in Ref. [12]. Moreover, Ref.

[12] presents the following corollary of this theorem:

ϕ could be a nonconstant ﬁnite linear combination of

periodic functions, in particular, ϕ could be a non-

constant trigonometric polynomial.

3.2 Universal Quantum Circuit Approximation

The single-qubit classiﬁer is divided into several layers

which are general SU(2) rotational matrices. There

exist many possible decompositions of an SU(2) rota-

tional matrix. In particular, we use

φ) = U(φ

, φ

) = e

iφ

, (12)

where σ

are the conventional Pauli matrices. Using

the SU(2) group composition law, we can rewrite the

above parametrization in a single exponential,

φ) = e

i~ω(

φ)·~σ

, (13)

with ~ω(

φ) =



(

φ), ω

(

φ), ω

(

φ)



and

(

φ) = d N sin ((φ

− φ

)/2) sin (φ

/2) , (14)

(

φ) = d N cos ((φ

− φ

)/2) sin (φ

/2) , (15)

(

φ) = d N sin ((φ

+ φ

)/2) cos (φ

/2) , (16)

where N =



√

1 − cos



−1

and cos d =

cos ((φ

+ φ

)/2) cos (φ

/2).

The single-qubit classiﬁer codiﬁes the data points

into

φ parameters of the U unitary gate. In particu-

lar, we can re-upload data together with the tunable

parameters as deﬁned in Eq. (5), i.e.

φ(~x) = (φ

(~x), φ

(~x)) =

θ + ~w ◦ ~x. (17)

Thus,

U(~x) = U

(~x)U

N−1

(~x) ···U

(~x) =

i=1

i~ω(

(~x))·~σ

(18)

Next, we apply the Baker-Campbell-Hausdorﬀ (BCH)

formula [19] to the above equation,

U(~x) = exp

i=1

~ω(

(~x)) ·~σ + O

corr

. (19)

Notice that the remaining BCH terms O

corr

are

also proportional to Pauli matrices due to [σ

, σ

] =

2i

ijk

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 6

Each ~ω terms are trigonometric functions, uncon-

stant, bounded and continuous. Then

i=1

~ω(

(~x)) =

i=1



(

(~x)), ω

(

(~x)), ω

(

(~x))



i=1



(

+ ~w

◦~x), ω

(

+ ~w

◦~x), ω

(

+ ~w

◦~x)



= (f

(~x), f

(~x)) . (20)

We still have to deal with the remaining terms

corr

of the BCH expansion. Instead of applying

such expansion, we can use again the SU(2) group

composition law to obtain the analytical formula of

U(~x) = e

ξ(~x)·~σ

, where

ξ(~x) will be an inextricably

trigonometric function of ~x. The O

corr

terms are pro-

portional to ~σ matrices, so O

corr

= ~%(~x) · ~σ for some

function ~%(~x). Then,

U(~x) = e

ξ(~x)·~σ

= e

f(~x)·~σ+i~%(~x)·~σ

. (21)

Thus, O

corr

terms can be absorbed in

f(~x).

For each data point ~x, we obtain a ﬁnal state that

will contain these

ξ(~x) functions. With all train-

ing points, we construct a cost function that can in-

clude new parameters α

for each class if we use the

weighted ﬁdelity cost function of Eq. (9). The func-

tion obtained from the combination of

ξ(x) and α

expected to be complex enough to probably represent

almost any continuous function. However, more pa-

rameters are necessary to map this argument with the

UAT expression.

If we compare the parameters of the UAT with the

single-qubit circuit parameters, the ~w

will correspond

with the weights, the

with the biases b

, the number

of layers N of the quantum classiﬁer will correspond

with the number of neurons in the hidden layer and

~ω functions with the activation functions ϕ.

We have explained why it is necessary to re-upload

the data at each layer and why a single qubit could

be a universal classiﬁer. As has been stated before,

an artiﬁcial neural network introduces the data points

in each hidden neuron, weights them and adds some

bias. Here we cannot just copy each data point be-

cause the non-cloning theorem, so we have to re-

upload it at each layer.

4 From single- to multi-qubit quantum

classiﬁer

The single-qubit classiﬁer cannot carry any quantum

advantage respect classical classiﬁcation techniques

such as artiﬁcial neural networks. In the previous

sections, we have deﬁned a quantum mechanical ver-

sion of a neural network with a single hidden layer. In

general, a huge amount of hidden neurons is necessary

to approximate a target function with a single layer.

To circumvent this inconvenience, more hidden layers

are introduced, leading eventually to the concept of

deep neural networks.

By using the single-qubit classiﬁer formalism that

we have introduced in the previous sections, we pro-

pose its generalization to more qubits. The introduc-

tion of multiple qubits to this quantum classiﬁer may

improve its performance as more hidden layers im-

prove the classiﬁcation task of an artiﬁcial neural net-

work. With the introduction of entanglement between

these qubits, we reduce the number of layers of our

classiﬁer as well as propose a quantum classiﬁcation

method that can achieve quantum advantage.

Figure 1 shows the analogy between a neural net-

work with a single hidden layer and a single-qubit

classiﬁer. The generalization of this analogy is not

so obvious. A multi-qubit classiﬁer without entan-

glement could have some similarities with a convolu-

tional neural network, where each qubit could repre-

sent a neural network by itself. However, it is not clear

if the introduction of entanglement between qubits

can be understood as a deep neural network archi-

tecture. The discussion around this analogy as well

as an extended study of the performance of a multi-

qubit classiﬁer is beyond the scope of this work. In

the next subsections, we present a general proposal

for a multi-qubit classiﬁer which we compare with the

single-qubit one in Section 6.

4.1 Measurement strategy and cost function

for a multi-qubit classiﬁer

With a single-qubit classiﬁer, the measurement strat-

egy consisting on comparing the ﬁnal state of the

circuit with a pre-deﬁned target state was achiev-

able. Experimentally, one needs to perform a quan-

tum state tomography protocol of only three mea-

surements. However, if more qubits are to be con-

sidered, tomography protocols become exponentially

expensive in terms of number of measurements.

We propose two measurement strategies for a multi-

qubit classiﬁer. The ﬁrst one is the natural general-

ization of the single-qubit strategy, although it will

become unrealizable for a large number of qubits. We

compare the ﬁnal state of the circuit with one of the

states of the computational basis, one for each class.

The second strategy consist on focusing in one qubit

and depending on its state associate one or other class.

This is similar to previous proposals of binary multi-

qubit classiﬁers [7], although we add the possibility of

multiclass classiﬁcation by introducing several thresh-

olds (see Section 2).

Another part that should be adapted is the deﬁni-

tion of the cost function. In particular, we use diﬀer-

ent functions for each strategy explained above.

For the ﬁrst strategy, we use the ﬁdelity cost func-

tion of Eq. (7). Its generalization to more qubits is

straightforward. However, the orthogonal states used

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 7

|0i L

(1) L

(2) L

(3)

···

(N)

|0i L

(1) L

(2) L

(3)

···

(N)

(a) Ansatz with no entanglement

|0i L

(1)

•

(2)

•

···

•

(N)

|0i L

(1)

•

(2)

•

···

•

(N)

(b) Ansatz with entanglement

Figure 4: Two-qubit quantum classiﬁer circuit without en-

tanglement (top circuit) and with entanglement (bottom

circuit). Here, each layer includes a rotation with data re-

uploading in both qubits plus a CZ gate if there is entangle-

ment. The exception is the last layer, which does not have

any CZ gate associated to it. For a ﬁxed number of layers,

the number of parameters to be optimized doubles the one

needed for a single-qubit classiﬁer.

for a multi-qubit classiﬁer are taken as the computa-

tional basis states. A more sophisticated set of states

could be considered to improve the performance of

this method.

For the second strategy, we use the weighted ﬁdelity

cost function. As stated above, we just focus on one

qubit, thus

c,q

(

θ, ~w, ~x) = h

|ρ

(

θ, ~w, ~x)|

i, (22)

where ρ

is the reduced density matrix of the qubit to

be measured. Then, the weighted ﬁdelity cost func-

tion can be adapted as

(~α,

θ, ~w) =

µ=1

c=1

q=1



c,q

(

θ, ~w, ~x

) − Y

(~x

)



(23)

where we average over all Q qubits that form the clas-

siﬁer. Eventually, we can just measure one of these

qubits, reducing the number of parameters to be op-

timized.

4.2 Quantum circuits examples

The deﬁnition of a multi-qubit quantum classiﬁer cir-

cuit could be as free as is the deﬁnition of a multi-

layer neural network. In artiﬁcial neural networks,

it is far from obvious what should be the number of

hidden layers and neurons per layer to perform some

task. Besides, it is, in general, problem-dependent.

For a multi-qubit quantum classiﬁer, there is extra

degree of freedom in the circuit-design: how to in-

troduce the entanglement. This is precisely an open

problem in parametrized quantum circuits: to ﬁnd a

|0i L

(1) L

(2) L

(3)

···

(N)

|0i L

(1) L

(2) L

(3)

···

(N)

|0i L

(1) L

(2) L

(3)

···

(N)

|0i L

(1) L

(2) L

(3)

···

(N)

(a) Ansatz with no entanglement

|0i L

(1)

•

(2)

•

···

•

(N)

|0i L

(1)

•

(2)

•

···

•

(N)

|0i L

(1)

•

(2)

•

···

•

(N)

|0i L

(1)

•

(2)

•

···

•

(N)

(b) Ansatz with entanglement

Figure 5: Four-qubit quantum classiﬁer circuits. Without

entanglement (top circuit), each layer is composed by four

parallel rotations. With entanglement (bottom circuit) each

layer includes a parallel rotation and two parallel CZ gates.

The order of CZ gates alternates in each layer between (1)-

(2) and (3)-(4) qubits and (2)-(3) and (1)-(4) qubits. The

exception is in the last layer, which does not contain any CZ

gate. For a ﬁxed number of layers, the number of parameters

to be optimized quadruples the ones needed for a single-qubit

classiﬁer.

correct ansatz for the entangling structure of the cir-

cuit.

Figures 4 and 5 show the explicit circuits used in

this work. For a two-qubit classiﬁer without entangle-

ment, and similarly for a four-qubit classiﬁer, we iden-

tify each layer as parallel rotations on all qubits. We

introduce the entanglement using CZ gates between

rotations that are absorbed in the deﬁnition of layer.

For two-qubit classiﬁer with entanglement, we apply a

CZ gate after each rotation with exception of the last

layer. For a four-qubit classiﬁer, two CZ gates are ap-

plied after each rotation alternatively between (1)-(2)

and (3)-(4) qubits and (2)-(3) and (1)-(4) qubits.

The number of parameters needed to perform the

optimization doubles the ones needed for a single-

qubit classiﬁer for the two-qubit classiﬁer and quadru-

ples for the four-qubit classiﬁer. For N layers, the cir-

cuit depth is N for the non-entangling classiﬁers and

2N for the entangling classiﬁers.

5 Minimization methods

The practical training of a parametrized single-qubit

or multi-qubit quantum classiﬁer needs minimization

in the parameter space describing the circuit. This

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 8

is often referred as a hybrid algorithm, where classi-

cal and quantum logic coexist and beneﬁt from one

another. To be precise, the set of {θ

} angles and

} weights, together with α

q,l

parameters if appli-

cable, forms a space to be explored in search of a min-

imum χ

. In parameter landscapes as big as the ones

treated here, or in regular neural network classiﬁca-

tion, the appearance of local minima is ultimately un-

avoidable. The composition of rotation gates renders

a large product of independent trigonometric func-

tions. It is thus clear to see that our problem will

be overly populated with minima. The classical min-

imizer can easily get trapped in a not optimal one.

Our problem is reduced to minimizing a function

of many parameters. For a single-qubit classiﬁer, the

number of parameters is (3 + d)N where d is the di-

mension of the problem, i.e. the dimension of ~x, and

N is the number of layers. Three of these parameters

are the rotational angles and the other d correspond

with the ~w

weight. If using the weighted ﬁdelity cost

function, we should add C extra parameters, one for

each class.

In principle, one does not know how is the parame-

ter landscape of the cost function to be minimized. If

the cost function were, for example, a convex function,

a downhill strategy would be likely to work properly.

The pure downhill strategy is known as gradient de-

scent. In machine learning, the method commonly

used is a Stochastic Gradient Descent (SGD) [20].

There is another special method of minimization

known as L-BFGS-B [21]. This method has been used

in classical machine learning with very good results

[22].

The results we present from now on are starred by

the L-BFGS-B algorithm, as we found it is accurate

and relatively fast. We used open source software [23]

as the core of the minimization with own made func-

tions to minimize. The minimizer is taken as a black

box whose parameters are set by default. As this is

the ﬁrst attempt of constructing a single- or multi-

qubit classiﬁer, further improvements can be done on

the hyperparameters of minimization.

Nevertheless we have also tested a SGD algorithm

for the ﬁdelity cost function. This whole algorithm

has been developed by us following the steps from [17].

The details can be read in Appendix A. In general, we

found that L-BFGS-B algorithm is better than SGD.

This is something already observed in classical neu-

ral networks. When the training set is small, it is

often more convenient to use a L-BFGS-B strategy

than a SGD. We were forced to use small training

sets due to computational capabilities for our simula-

tions. Numerical evidences on this arise when solving

the problems we face for these single- and multi-qubit

classiﬁers with classical standard machine learning li-

braries [22]. This can be understood with a simple

argument. Neural networks or our quantum classiﬁer

are supposed to have plenty of local minima. Neural

networks have got huge products of non linear func-

tions. The odds of having local minima are then large.

In the quantum circuits side, there are nothing but

trigonometric functions. In both cases, if there are a

lot of training points it is more likely to ﬁnd some of

them capable of getting us out of local minima. If this

is the case, SGD is more useful for being faster. On

the contrary, when the training set is small, we have

to pick an algorithm less sensitive to local minima,

such as the L-BFGS-B.

6 Benchmark of a single- and multi-

qubit classiﬁer

We can now tackle some classiﬁcation problems. We

will prove that a single-qubit classiﬁer can perform

a multi-class classiﬁcation for multi-dimensional data

and that a multi-qubit classiﬁer, in general, improves

these results.

We construct several classiﬁers with diﬀerent num-

ber of layers. We then train the circuits with a train-

ing set of random data points to obtain the values of

the free parameters {θ

} and {w

} for each layer and

{α

} when applicable. We use the cost functions de-

ﬁned in Eq. (9) and Eq. (7). Then, we test the perfor-

mance of each classiﬁer with a test set independently

generated and one order of magnitud greater than the

training set. For the sake of reproducibility, we have

ﬁxed the same seed to generate all data points. For

this reason, the test and training set points are the

same for all problems. For more details, we provide

the explicit code used in this work [24].

We run a single-, two- and four-qubit classiﬁers,

with and without entanglement, using the two cost

functions described above. We benchmark several

classiﬁers formed by L = 1, 2, 3, 4, 5, 6, 8 and 10 layers.

In the following subsections, we describe the partic-

ular problems addressed with these single- and multi-

qubit classiﬁers with data re-uploading. We choose

four problem types: a simple binary classiﬁcation,

a classiﬁcation of a ﬁgure with multiple patterns, a

multi-dimensional classiﬁcation and a non-convex ﬁg-

ure.

The code used to deﬁne and benchmark the single-

and multi-qubit quantum classiﬁer is open and can be

found in Ref. [24].

6.1 Simple example: classiﬁcation of a circle

Let us start with a simple example. We create a ran-

dom set of data on a plane with coordinates ~x =

, x

) with x

∈ [−1, 1]. Our goal is to classify these

points according to x

< r

, i.e. if they are inside

or outside of a circle of radius r. The value of the ra-

dius is chosen in such a way that the areas inside and

outside it are equal, that is, r =

, so the proba-

bility of success if we label each data point randomly

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 9

Qubits 1 2 1 2 4

Layers

No Ent. Ent. No Ent. Ent. No Ent. Ent.

1 0.50 0.75 – 0.50 0.76 – 0.76 –

2 0.85 0.80 0.73 0.94 0.96 0.96 0.96 0.96

3 0.85 0.81 0.93 0.94 0.97 0.95 0.97 0.96

4 0.90 0.87 0.87 0.94 0.97 0.96 0.97 0.96

5 0.89 0.90 0.93 0.96 0.96 0.96 0.96 0.96

6 0.92 0.92 0.90 0.95 0.96 0.96 0.96 0.96

8 0.93 0.93 0.96 0.97 0.95 0.97 0.95 0.96

10 0.95 0.94 0.96 0.96 0.96 0.96 0.96 0.97

Table 1: Results of the single- and multi-qubit classiﬁers with data re-uploading for the circle problem. Numbers indicate the

success rate, i.e. number of data points classiﬁed correctly over total number of points. Words “Ent.” and “No Ent.” refer

to considering entanglement between qubits or not, respectively. We have used the L-BFGS-B minimization method with the

weighted ﬁdelity and ﬁdelity cost functions. For this problem, both cost functions lead to high success rates. The multi-qubit

classiﬁer increases this success rate but the introduction of entanglement does not aﬀect it signiﬁcantly.

is 50%. We create a train dataset with 200 random

entries. We then validate the single-qubit classiﬁer

against a test dataset with 4000 random points.

The results of this classiﬁcation are written in Ta-

ble 1. With the weighted ﬁdelity cost function, the

single-qubit classiﬁer achieves more than 90% of suc-

cess with only two layers, that is, 12 parameters. The

results are worse with the ﬁdelity cost function. For

a two-qubit and a four-qubit classiﬁer, two layers are

required to achieve 96% of success rate, that is, 22 pa-

rameters for the two-qubit and 42 for the four-qubit.

The introduction of entanglement does not change the

result in any case. The results show a saturation of

the success rate. Considering more layers or more

qubits does not change this success rate.

The characterization of a closed curved is a hard

problem for an artiﬁcial neural network that works

in a linear regime, although enough neurons, i.e. lin-

ear terms, can achieve a good approximation to any

function. On the contrary, the layers of a single-qubit

classiﬁer are rotational gates, which have an intrinsic

non-linear behavior. In a sense, a circle becomes an

easy function to classify as a linear function is for an

artiﬁcial neural network. The circle classiﬁcation is,

in a sense, trivial for a quantum classiﬁer. We need

to run these classiﬁers with more complex ﬁgures or

problems to test their performance.

It is interesting to compare classiﬁers with diﬀerent

number of layers. Figure 6 shows the result of the

classiﬁcation for a single-qubit classiﬁer of 1, 2, 4 and

8 layers. As with only one layer the best classiﬁcation

that can be achieved consist on dividing the plane in

half, with two layers the classiﬁer catches the circular

shape. As we consider more layers, the single-qubit

classiﬁer readjust the circle to match the correct ra-

dius.

(a) 1 layer (b) 2 layers

Figure 6: Results of the circle classiﬁcation obtained with a

single-qubit classiﬁer with diﬀerent number of layers using the

L-BFGS-B minimizer and the weighted ﬁdelity cost function.

With one layer, the best that the classiﬁer can do is to divide

the plane in half. With two layers, it catches the circular

shape which is readjusted as we consider more layers.

6.2 Classiﬁcation of multiple patterns

We want to show now that the single-qubit classiﬁer

can solve multi-class problems. We divide a 2D plane

into several regions and assign a label to each one.

We propose the following division: three regions cor-

responding to three circular sectors and the interme-

diate space between them. We call this problem the

3-circles problem. This is a hardly non-linear prob-

lem and, consequently, diﬃcult to solve for a classical

neural network in terms of computational power.

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 10

Qubits 1 2 1 2 4

Layers

No Ent. Ent. No Ent. Ent. No Ent. Ent.

1 0.73 0.56 – 0.75 0.81 – 0.88 –

2 0.79 0.77 0.78 0.76 0.90 0.83 0.90 0.89

3 0.79 0.76 0.75 0.78 0.88 0.89 0.90 0.89

4 0.84 0.80 0.80 0.86 0.84 0.91 0.90 0.90

5 0.87 0.84 0.81 0.88 0.87 0.89 0.88 0.92

6 0.90 0.88 0.86 0.85 0.88 0.89 0.89 0.90

8 0.89 0.85 0.89 0.89 0.91 0.90 0.88 0.91

10 0.91 0.86 0.90 0.92 0.90 0.91 0.87 0.91

Table 2: Results of the single- and multi-qubit classiﬁers with data re-uploading for the 3-circles problem. Numbers indicate

the success rate, i.e. number of data points classiﬁed correctly over total number of points. Words “Ent.” and “No Ent.”

refer to considering entanglement between qubits or not, respectively. We have used the L-BFGS-B minimization method with

the weighted ﬁdelity and ﬁdelity cost functions. Weighted ﬁdelity cost function presents better results than the ﬁdelity cost

function. The multi-qubit classiﬁer reaches 0.90 success rate with a lower number of layers than the single-qubit classiﬁer.

The introduction of entanglement slightly increases the success rate respect the non-entangled circuit.

Table 2 shows the results for this four-class prob-

lem. For a single-qubit classiﬁer, a maximum of 92%

of success is achieved with 10 layers, i.e. 54 parame-

ters. From these results, it seems that this problem

also saturates around 91% of success. However, the

introduction of more qubits and entanglement makes

possible this result possible with less parameters. For

two qubits with entanglement, 4 layers are necessary

to achieve the same success as with a single-qubit, i.e.

34 parameters. For four qubits without entanglement

4 layers are also required. Notice also that, although

the number of parameters increases signiﬁcantly with

the number of qubits, some of the eﬀective operations

are performed in parallel.

There is an eﬀect that arises from this more com-

plex classiﬁcation problem: local minima. Notice that

the success rate can decrease when we add more layers

into our quantum classiﬁer.

As with the previous problem, it is interesting to

compare the performance in terms of sucess rate of

classiﬁers with diﬀerent number of layers. Figure 7

shows the results for a two-qubit classiﬁer with no en-

tanglement for 1, 3, 4 and 10 layers. Even with only

one layer, the classiﬁer identiﬁes the four regions, be-

ing the more complicated to describe the central one.

As we consider more layers, the classiﬁer performs

better and adjust these four regions.

6.3 Classiﬁcation in multiple dimensions

As explained in Section 2, there is no restriction in

uploading multidimensional data. We can upload up

to three values per rotation since this is the degrees of

freedom of a SU(2) matrix. If the dimension of data is

larger than that, we can just split the data vector into

subsets and upload each one at a time, as described

explicitly in Eq. (6). Therefore, there is no reason to

limit the dimension of data to the number of degrees

of freedom of a qubit. We can in principle upload any

(a) 1 layer (b) 3 layers

Figure 7: Results for the 3-circles problem using a single-

qubit classiﬁer trained with the L-BFGS-B minimizer and the

weighted ﬁdelity cost function. With one layer, the classiﬁer

intuits the four regions although the central one is diﬃcult

to tackle. With more layers, this region is clearer for the

classiﬁer and it tries to adjust the circular regions.

kind of data if we apply enough gates.

Following this idea we will now move to a more

complicated classiﬁcation using data with 4 coordi-

nates. We use as a problem the four-dimensional

sphere, i.e. classifying data points according to

+ x

< 2/π. Similarly with the previous

problems, x

∈ [−1, 1] and the radius has been chosen

such that the volume of the hypersphere is half of the

total volume. This time, we will take 1000 random

points as the training set because the total volume

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 11

Qubits 1 2 1 2 4

Layers

No Ent. Ent. No Ent. Ent. No Ent. Ent.

1 0.87 0.87 – 0.87 0.87 – 0.90 –

2 0.87 0.87 0.87 0.87 0.92 0.91 0.90 0.98

3 0.87 0.87 0.87 0.89 0.89 0.97 – –

4 0.89 0.87 0.87 0.90 0.93 0.97 – –

5 0.89 0.87 0.87 0.90 0.93 0.98 – –

6 0.90 0.87 0.87 0.95 0.93 0.97 – –

8 0.91 0.87 0.87 0.97 0.94 0.97 – –

10 0.90 0.87 0.87 0.96 0.96 0.97 – –

Table 3: Results of the single- and multi-qubit classiﬁers with data re-uploading for the four-dimensional hypersphere problem.

Numbers indicate the success rate, i.e. the number of data points classiﬁed correctly over the total number of points. Words

“Ent.” and “No Ent.” refer to considering entanglement between qubits or not, respectively. We have used the L-BFGS-B

minimization method with the weighted ﬁdelity and ﬁdelity cost functions. The ﬁdelity cost function gets stuck in some local

minima for the multi-qubit classiﬁers. The results obtained with the weighted ﬁdelity cost function are much better, reaching

the 0.98 with only two layers for the four-qubit classiﬁer. Here, the introduction of entanglement improves signiﬁcantly the

performance of the multi-qubit classiﬁer.

Qubits 1 2 1 2 4

Layers No Ent. Ent. No Ent. Ent. No Ent. Ent.

1 0.34 0.51 – 0.43 0.77 – 0.81 –

2 0.57 0.63 0.59 0.76 0.79 0.82 0.87 0.96

3 0.80 0.68 0.65 0.68 0.94 0.95 0.92 0.94

4 0.84 0.78 0.89 0.79 0.93 0.96 0.93 0.96

5 0.92 0.86 0.82 0.88 0.96 0.96 0.96 0.95

6 0.93 0.91 0.93 0.91 0.93 0.96 0.97 0.96

8 0.90 0.89 0.90 0.92 0.94 0.95 0.95 0.94

10 0.90 0.91 0.92 0.93 0.95 0.96 0.95 0.95

Table 4: Results of the single- and multi-qubit classiﬁers with data re-uploading for the three-class annulus problem. Numbers

indicate the success rate, i.e. the number of data points classiﬁed correctly over the total number of points. Words “Ent.” and

“No Ent.” refer to considering entanglement between qubits or not, respectively. We have used the L-BFGS-B minimization

method with the weighted ﬁdelity and ﬁdelity cost functions. The weighted ﬁdelity cost function presents better success rates

than the ﬁdelity cost function. The multi-qubit classiﬁers improve the results obtained with the single-qubit classiﬁer but the

using of entanglement does not introduce signiﬁcant changes.

increases.

Results are shown in Table 3. A single-qubit

achieves 97% of success with eight layers (82 parame-

ters) using the weighted ﬁdelity cost function. Results

are better if we consider more qubits. For two qubits,

the best result is 98% and it only requires three en-

tangled layers (62 parameters). For four qubits, it

achieves 98% success rate with two layers with entan-

glement, i.e. 82 parameters.

6.4 Classiﬁcation of non-convex ﬁgures

As a ﬁnal benchmark, we propose the classiﬁcation of

a non-convex pattern. In particular, we classify the

points of an annulus with radii r

0.8 − 2/π and

√

0.8. We ﬁx three classes: points inside the

small circle, points in the annulus and points outside

the big circle. So, besides it being a non-convex clas-

siﬁcation task, it is also a multi-class problem. A sim-

pler example, with binary classiﬁcation, can be found

in Appendix B.

The results are shown in Table 4. It achieves 93% of

success with a single-qubit classiﬁer with 10 layers and

a weighted ﬁdelity cost function. With two qubits, it

achieves better results, 94% with three layers. With

four qubits, it reaches a 96% success rate with only

two layers with entanglement.

It is interesting to observe how the single-qubit clas-

siﬁer attempts to achieve the maximum possible re-

sults as we consider more and more layers. Figure 8

shows this evolution in terms of the number of layers

for a single-qubit classiﬁer trained with the weighted

ﬁdelity cost function. It requires four layers to learn

that there are three concentric patterns and the ad-

dition of more layers adjusts these three regions.

6.5 Comparison with classical classiﬁers

It is important to check if our proposal is in some

sense able to compete with actual technology of su-

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 12

(a) 1 layer (b) 2 layers (c) 3 layers (d) 4 layers

(e) 5 layers (f) 6 layers (g) 8 layers (h) 10 layers

Figure 8: Results obtained with the single-qubit classiﬁer for the annulus problem, using the weighted ﬁdelity cost function

during the training. The better results are obtained with a 10 layers classiﬁer (93% of success rate). As we consider more

qubits and entanglement, we can increase the success rate up to 96%, as shows Table 4.

pervised machine learning. To do so we have used the

standard machine learning library scikit-learn [22]

and solved the same problems as we have solved with

the quantum classiﬁer. We have included the four

problems presented in the main paper plus ﬁve extra

problems analyzed in Appendix B. The aim of this

classical benchmarking is not to make an extended

review of what classical machine learning is capable

to perform. The aim is to compare our simple quan-

tum classiﬁer to simple models such as shallow neural

networks and simple support vector machines.

The technical details of the classical classiﬁcation

are the following: the neural network has got one hid-

den layer with 100 neurons, a ReLu activation func-

tion and the solver lbfgs by scikit-learn. The sup-

port vector machine is the default sklearn.svm.SVC.

Some changes in the initialization parameters were

tested with no signiﬁcant diﬀerences.

Table 5 compares the best performance of a neural

network, support vector classiﬁer (SVC), the single-

qubit classiﬁer with ﬁdelity cost function and single-

qubit classiﬁer with a weighted ﬁdelity cost function.

In all problems, the performance of the single-qubit

classiﬁer is, at least, comparable with the classical

methods. In some problems, like the 3-circles problem

and the binary annulus problem, the results of the

single-qubit classiﬁer are better than with the classical

methods.

7 Conclusions

We have proposed a single-qubit classiﬁer that can

represent multidimensional complex ﬁgures. The core

of this quantum classiﬁer is the data re-uploading.

This formalism allows circumventing the limitations

of the no-cloning theorem to achieve a correct gener-

alization of an artiﬁcial neural network with a single

layer. In that sense, we have applied the Universal

Approximation Theorem to prove the universality of

a single-qubit classiﬁer.

The structure of this classiﬁer is the following.

Data and processing parameters are uploaded mul-

tiple times along the circuit by using one-qubit rota-

tions. The processing parameters of these rotations

are diﬀerent at each upload and should be optimized

using a classical minimization algorithm. To do so, we

have deﬁned two cost functions: one inspired in the

traditional neural networks cost functions (weighted

ﬁdelity cost function) and the other, simpler, consist-

ing of the computation of the ﬁdelity of the ﬁnal state

with respect to a target state. These target states

are deﬁned to be maximally orthogonal among them-

selves. Then, the single-qubit classiﬁer ﬁnds the opti-

mal rotations to separate the data points into diﬀerent

regions of the Bloch sphere, each one corresponding

with a particular class.

The single-qubit classiﬁer can be generalized to a

larger number of qubits. This allows the introduction

of entanglement between these qubits by adding two-

qubit gates between each layer of rotations. We use

a particular entangling ansantz as a proof of concept.

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 13

Problem

Classical classiﬁers Quantum classiﬁer

NN SVC χ

Circle 0.96 0.97 0.96 0.97

3 circles 0.88 0.66 0.91 0.91

Hypersphere 0.98 0.95 0.91 0.98

Annulus 0.96 0.77 0.93 0.97

Non-Convex 0.99 0.77 0.96 0.98

Binary annulus 0.94 0.79 0.95 0.97

Sphere 0.97 0.95 0.93 0.96

Squares 0.98 0.96 0.99 0.95

Wavy Lines 0.95 0.82 0.93 0.94

Table 5: Comparison between single-qubit quantum classiﬁer and two well-known classical classiﬁcation techniques: a neural

network (NN) with a single hidden layer composed of 100 neurons and a support vector classiﬁer (SVC), both with the default

parameters as deﬁned in scikit-learn python package. We analyze nine problems: the ﬁrst four are presented in Section 6

and the remaining ﬁve in Appendix B. Results of the single-qubit quantum classiﬁer are obtained with the ﬁdelity and weighted

ﬁdelity cost functions, χ

and χ

deﬁned in Eq. (7) and Eq. (9) respectively. This table shows the best success rate, being

1 the perfect classiﬁcation, obtained after running ten times the NN and SVC algorithms and the best results obtained with

single-qubit classiﬁers up to 10 layers.

The exploration of other possible ansatzes is out of

the scope of this work.

We have benchmarked several quantum classiﬁers

of this kind, made of a diﬀerent number of layers,

qubits and with and without entanglement. The pat-

terns chosen to test these classiﬁers are the points

inside and outside of a circle (simple example) and

similarly for a four-dimensional hypersphere (multi-

dimensional example); a two dimensional region com-

posed by three circles of diﬀerent size (multiple classes

example); and the points outside and inside of an an-

nulus (non-convex example). In all cases, the single-

qubit classiﬁer achieves more than 90% of the success

rate. The introduction of more qubits and entangle-

ment increases this success and reduces the number

of layers required. The weighted ﬁdelity cost function

turns out to be more convenient to achieve better re-

sults than the ﬁdelity cost function. In all problems,

the probability to get stuck in a local minima increases

with the number of layers, an expected result from an

optimization problem involving several parameters.

In summary, we have proposed a quantum classiﬁer

model that seems to be universal by exploiting the

non-linearities of the single-qubit rotational gates and

by re-uploading data several times.

Acknowledgements

APS and JIL acknowledge CaixaBank for its support

of this work through Barcelona Supercomputing Cen-

ter project CaixaBank Computaci´on Cu´antica. The

authors acknowledge the interesting discussions with

Quantic group team members. The authors also ac-

knowledge Xanadu Quantum Computing, in particu-

lar Shahnawaz Ahmed, for writing a tutorial of this

quantum classiﬁer using its full-stack library Penny-

lane [25].

References

[1] M. Schuld, I. Sinayskiy, and F. Petruc-

cione, Quantum Information Processing 13, 2567

(2014).

[2] K. H. Wan, O. Dahlsten, H. Kristj´ansson,

R. Gardner, and M. S. Kim, npj Quantum In-

formation 3, 36 (2017).

[3] E. Torrontegui and J. J. Garc´ıa-Ripoll, EPL (Eu-

rophysics Letters) 125, 30004 (2019).

[4] N. Wiebe, D. Braun, and S. Lloyd, Physics Re-

view Letters 109, 050505 (2012).

[5] P. Rebentrost, M. Mohseni, and S. Lloyd,

Physics Review Letters 113, 130503 (2014).

[6] J. Biamonte, P. Wittek, N. Pancotti, P. Reben-

trost, N. Wiebe, and S. Lloyd, Nature 549 , 195

(2017).

[7] E. Farhi and H. Neven, “Classiﬁcation with quan-

tum neural networks on near term processors,”

(2018), arXiv:1802.06002 [quant-ph] .

[8] M. Schuld, A. Bocharov, K. Svore, and

N. Wiebe, “Circuit-centric quantum classiﬁers,”

(2018), arXiv:1804.00633 [quant-ph] .

[9] V. Havl´ıˇcek, A. D. C´orcoles, K. Temme, A. W.

Harrow, A. Kandala, J. M. Chow, and J. M.

Gambetta, Nature 567, 209 (2019).

[10] M. Schuld and N. Killoran, Physics Review Let-

ters 122, 040504 (2019).

[11] V. Giovannetti, S. Lloyd, and L. Maccone,

Physics Review Letters 100, 160501 (2008).

[12] K. Hornik, Neural Networks 4, 251 (1991).

[13] R. Ghobadi, J. S. Oberoi, and E. Zahedinejhad,

“The power of one qubit in machine learning,”

(2019), arXiv:1905.01390 [quant-ph] .

[14] J. Gil Vidal and D. Oliver Theis, “Input re-

dundancy for parameterized quantum circuits,”

(2019), arXiv:1901.11434 [quant-ph] .

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 14

[15] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fu-

jii, Physics Review A 98, 032309 (2018).

[16] C. W. Helstrom, Quantum detection and estima-

tion theory / Carl W. Helstrom (Academic Press

New York, 1976) pp. ix, p. : 309.

[17] M. A. Nielsen and I. L. Chuang, Quantum Com-

putation and Quantum Information: 10th An-

niversary Edition, 10th ed. (Cambridge Univer-

sity Press, New York, NY, USA, 2011).

[18] G. Cybenko, Mathematics of Control, Signals,

and Systems 2, 303 (1989).

[19] B. C. Hall, Lie Groups, Lie Algebras, and Rep-

resentations An Elementary Introduction (Grad-

uate Texts in Mathematics, 222 (2nd ed.),

Springer, 2015).

[20] M. A. Nielsen, Neural networks and deep learn-

ing, Vol. 25 (Determination press USA, 2015).

[21] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, SIAM

Journal on Scientiﬁc Computing 16, 1190 (1995).

[22] F. Pedregosa, G. Varoquaux, A. Gramfort,

V. Michel, B. Thirion, O. Grisel, M. Blondel,

P. Prettenhofer, R. Weiss, V. Dubourg, J. Van-

derplas, A. Passos, D. Cournapeau, M. Brucher,

M. Perrot, and E. Duchesnay, Journal of Ma-

chine Learning Research 12, 2825 (2011).

[23] E. Jones, T. Oliphant, P. Peterson, et al., “SciPy:

Open source scientiﬁc tools for Python,” https:

//www.scipy.org/ (2001).

[24] A. P´erez-Salinas, “Quantum classiﬁer with

data re-uploading,” https://github.com/

AdrianPerezSalinas/universal_qlassifier

(2019).

[25] S. Ahmed, “Data-reuploading classifer,”

https://pennylane.ai/qml/app/tutorial_

data_reuploading_classifier.html (2019).

[26] J. Romero, R. Babbush, J. R. McClean,

C. Hempel, P. J. Love, and A. Aspuru-Guzik,

Quantum Science and Technology 4, 014008

(2018).

A The Stochastic Gradient Descent

method (SGD)

We propose an algorithm to compute the derivatives

needed to minimize the cost function that is inspired

in the stochastic gradient descent algorithm for neu-

ral networks. This kind of computation is not new,

as similar proposals can be found in [7, 8, 26]. We

show that, in analogy to neural networks, this algo-

rithm can be interpreted as a back-propagation, as

it takes intermediate steps of the circuit for exactly

computing the gradient of the function χ

. However,

it requires full access to the wave function at interme-

diate steps of the computation, which makes it costly

to be implemented experimentally.

A.1 To compute the gradient

Our goal is to ﬁnd the gradient of χ

for one data

point of the training set. In terms of Eq. (2) and Eq.

(4),

|ψi|

= |h

|L(N)L(N −1) ···L(1)|0i|

. (24)

Let us write explicitly an intermediate quantum state

|ψ

i of the single-qubit classiﬁer,

|ψ

i =

k=0

L(k)|0i = L(l)|ψ

l−1

i, (25)

where 1 ≤ l ≤ N and |ψ

i ≡ |0i. Next, notice that

we can recursively deﬁne the counterpart of the label

states |

i as

h∆

| = h

|L(N)L(N −1) . . . L(l+1) = h∆

l+1

|L(l+1),

(26)

where 1 ≤ l ≤ N and h∆

| ≡ h

|. A way to un-

derstand this notation is by checking that |ψ

i counts

gates from the beginning of the circuit, starting from

ﬁrst layer L(1), and h∆

| counts backwards from the

end of the circuit, starting from N

layer.

Thus, Eq. (24) can be rewritten as

|ψi|

= |h∆

|ψ

, (27)

where l ∈ [1, N ] deﬁnes the circuit layer where the

gradient will be computed.

Each layer L(l) contains as much as six tunable

parameters: three rotational angles represented with

the vector

and three weights ~w

. For a problem of

dimension d > 3, each layer is composed by sublayers

with up to six tunable parameters, as shown in Eq.(6).

Then, when we derive with respect to one parameter,

only the corresponding layer gate will be aﬀected by

it. We can split the set of gates in two parts: those

before the derivative gate and those after. Using the

recursive deﬁnitions presented above, the derivative

with respect to

= (θ

, θ

) angles can be written

∂χ

∂θ

= −2Re



h∆

∂L(l)

∂θ

|ψ

l−1



hψ

|∆



, (28)

for i = 1, 2, 3 and similarly with ∂χ

/∂ ~w

. This

derivative is not unitary. Nevertheless, this fact has

not importance in our algorithm since we will use its

value to update the values of the new parameters in

the next iteration classically.

The computation of the partial derivatives of L(l)

is straightforward do to its matrix unitary structure.

In general, L(l) is a general SU(2) matrix which can

be parametrized as

L(l) ≡ U(

) =





cos

(φ

+φ

)

−sin

−i

(φ

−φ

)

sin

(φ

−φ

)

cos

−i

(φ

+φ

)





(29)

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 15

where

= (φ

, φ

) =

+ ~w

◦~x. Thus,

∂L(l)

∂θ

∂L(l)

∂φ

, (30)

∂L(l)

∂w

∂L(l)

∂θ

, (31)

The key feature of taking the L(l) parametriza-

tion of Eq.(29) is that its derivatives are almost itself,

which simpliﬁes its computation signiﬁcantly. Notice

that

∂L(l)

∂θ

∂U (

+ w

◦~x)

∂θ

+ ~w

◦~x + π

(32)

where

is a vector of dimension three which compo-

nents are all zeros except the i-th one. The calcula-

tions for the derivative become simpler in this case,

as we just have to shift the proper parameter by π,

and use it to measure the derivative.

Once we have computed the gradient of the cost

function χ

, we can update the parameters by setting

← θ

− η

∂χ

∂θ

, (33)

← w

− η

∂χ

∂w

, (34)

where η is an adjustable learning rate.

A.2 To update parameters

There are several ways of updating the parameters.

One naive option is to update parameters every time

we compute the gradient of the function. Then we

start all over again and repeat until the parameters

become stable. This is not a good practice. If we

do so, we are moving point by point, optimizing and

classifying every data from the training set of our clas-

siﬁer. This classiﬁes all points slowly. At the late

steps of the minimization process, we will have to face

plenty of points which are already classiﬁed and do

not have important gradients. In summary, this is a

very ineﬃcient algorithm.

Another option is to compute the gradients for ev-

ery point, and then take the average of all gradients

and update parameters with it. However, this method

is likely to achieve poor accuracies. Some gradients

can be canceled out by some others and the stability

is achieved before it should be.

The method that is usually used is the so-called

batched optimization. This method is a mix of the

two previous ways of tackling the problem. First, we

have to split the training set into smaller subsets or

batches. We compute the averaged gradient of the

ﬁrst batch with respect to all points in the batch

and then update. Then, proceed similarly with the

second batch. We keep going until we ﬁnish all the

batches. The crucial step is to shuﬄe the training set

and split it into new diﬀerent mixed batches. This

way, we optimize the classiﬁer respect to subsets of

the original training set that are constantly chang-

ing which is more eﬃcient than a point-by-point opti-

mization, but statistically equivalent. In our SGD al-

gorithm, we took batches with 20 points out of a total

amount of 200 points for the two-dimensional prob-

lems, 500 for three-dimensional problems and 1000 for

four-dimensional problems.

Although we tested this SGD algorithm, we ob-

tained worse results than with the L-BFGS-B method.

In addition, the computational eﬀorts for this SGD al-

gorithm were larger as could not be fully optimized.

Thus we discard for the moment this SGD minimiza-

tion method.

B Classiﬁcation results in other prob-

lems

We tested our classiﬁer in several diﬀerent problems.

Here, we show a summary of the results obtained in

the other problems we tackled. As the ones deﬁned

in the main paper, we constrained the data space to

~x ∈ [−1, 1]

dim

. These extra classiﬁcation tasks aim

to cover diﬀerent kinds of training data, and show

that our quantum classiﬁer can adapt itself to large

varieties of problems.

The datasets have 200 random entries – except for

the sphere, which has 500 – and test the performance

of the classiﬁer against 4000 random points. All prob-

lems were designed without biases, i.e. a blind classi-

ﬁer will get successes of ∼50% for a binary classiﬁer,

∼33% for a ternary classiﬁer and ∼25% for a quater-

nary classiﬁer.

In general, we can observe that the classiﬁer im-

proves its performance as we increase the number of

layers until it comes into a stationary regime. More

qubits and entanglement usually allows entering in

the stationary regime with a lesser number of layers.

In the following subsections, we deﬁne these extra

problems and present the results of each one summa-

rized in a table. The best results obtained in each

problem are plotted in Figure 9.

Non-convex problem

The non-convex problem is made for testing two mu-

tually non-convex zones dividing a 2D area. The bor-

der of both zones lies on the function x

= −2x

3/2 sin(πx

). This border is chosen in a way that

there is no area small enough to leave it unclassiﬁed,

forcing the classiﬁer to catch these smaller zones.

Complete results can be read in Table 6. The

weighted ﬁdelity cost function gets the best perfor-

mance, 98%, with 6 layers and 1 qubit, (32 parame-

ters). Fidelity cost gets a result of 96% for the same

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 16

Qubits 1 2 1 2 4

Layers

No Ent. Ent. No Ent. Ent. No Ent. Ent.

1 0.49 0.55 – 0.49 0.76 – 0.76 –

2 0.82 0.75 0.75 0.86 0.94 0.85 0.96 0.96

3 0.93 0.74 0.85 0.96 0.95 0.95 0.95 0.97

4 0.93 0.74 0.88 0.95 0.96 0.97 0.95 0.96

5 0.91 0.95 0.90 0.97 0.95 0.96 0.95 0.97

6 0.96 0.94 0.93 0.98 0.97 0.97 0.95 0.97

8 0.96 0.96 0.95 0.98 0.98 0.97 0.96 0.97

10 0.95 0.92 0.95 0.96 0.96 0.96 0.96 0.97

Table 6: Results of the single- and multi-qubit classiﬁers with data re-uploading for the non-convex problem. Numbers indicate

the success rate, i.e. the number of data points classiﬁed correctly over the total number of points. Words “Ent.” and “No

Ent.” refer to considering entanglement between qubits or not respectively. We have used the L-BFGS-B minimization method

with the weighted ﬁdelity and ﬁdelity cost functions. Both cost functions lead to higher success rates, although the weighted

ﬁdelity cost function is better. It achieves the 0.98 success with two qubits, entanglement, and four layers.

conditions, while the best, 97%, is achieved for 2 en-

tangled qubits with 8 layers (80 parameters). The ﬁ-

nal ﬁgure obtained after the classiﬁcation can be seen

in Figure 9(a).

We can compare this result with the one obtained

in Table 5. Neural networks work well as they can

approximate our non-convex function properly. How-

ever, SVC performs worse, as they ﬁnd hard to deal

with non-convexity. In contrast with that, our quan-

tum classiﬁer is plastic enough to classify this data

with no major diﬃculty.

Binary annulus

This a binary version of the annulus problem. The

two classes are deﬁned as being inside or outside the

annulus with radii r

0.8 − 2/π and r

√

0.8.

This geometry, as happens with the three-class annu-

lus problem, is interesting because the classiﬁer ﬁnds

a way to connect areas of the problem which are dis-

connected. This involves a to and fro path for the

parameters.

The ﬁdelity cost function reaches the best result at

94%, with 2 qubits with entanglement and 6 layers,

which is 60 parameters. For the weighted ﬁdelity, we

reach up to 97% with 2 qubits with no entanglement

and 4 layers, 40 parameters. The results are written

in Table 7 and the best result is plotted in Figure

9(b).

If we compare these results to the classical ones in

Table 5, we see again that the SVC method is way

worse than the NN, which can be interpreted as the

sign of non-convexity this problem has.

Sphere

This quantum classiﬁer is able to classify multidi-

mensional data, as we have shown with the four-

dimensional hypersphere. We also tested a three-

dimensional ﬁgure, a sphere with radius r =

3/π.

For this problem, the ﬁdelity cost function reaches

its maximum, 93%, with a single-qubit classiﬁer of 10

layers (60 parameters). The same success is obtained

with a two-qubit entangled classiﬁer and 6 layers (72

parameters). With the weighted ﬁdelity, this success

rate grows up to 96% for two- and four- qubit classiﬁer

of 2 layers (24 and 48 parameters respectively) with

and without entanglement. All results are written in

Table 8.

Squares

This problem divides a 2D area into four quadrants

with straight lines. This is one of the easiest prob-

lems for a neural network. By construction, neural

networks can establish a separation between classes

by using biases, and thus straight lines are immedi-

ate to understand. We construct this problem to see

how a quantum classiﬁer performs against a neural

network in the latter’s ﬁeld.

Classical results are very good, up to 98% and 96%

for neural networks of 100 neurons and a single hidden

layer and SVC respectively. However, quantum clas-

siﬁer performs even better. The ﬁdelity cost function

reaches 99% of success in a two-qubit classiﬁer with-

out entanglement and 6 layers (60 parameters). Any

two-qubit result is comparable with the success rate of

the classical models. Something similar can be found

for the weighted ﬁdelity. The maximum success, 96%,

is obtained with a two-qubit entangled classiﬁer with

4 layers (40 parameters). The results are written in

Table 9 and the best performance is plotted in Figure

9(c).

Wavy lines

This problem is the four-class version of the non-

convex problem. Now the area is divided into four

regions by two diﬀerent functions. The borders’ equa-

tions are x

= sin(πx

) ± x

. The important feature

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 17

Qubits 1 2 1 2 4

Layers

No Ent. Ent. No Ent. Ent. No Ent. Ent.

1 0.44 0.50 – 0.44 0.59 – 0.66 –

2 0.48 0.50 0.51 0.53 0.73 0.72 0.70 0.96

3 0.91 0.50 0.56 0.74 0.75 0.95 0.78 0.96

4 0.80 0.74 0.56 0.86 0.97 0.97 0.92 0.96

5 0.90 0.93 0.88 0.89 0.97 0.96 0.97 0.94

6 0.92 0.91 0.94 0.95 0.94 0.95 0.95 0.93

8 0.90 0.93 0.95 0.92 0.94 0.94 0.96 0.94

10 0.90 0.92 0.91 0.92 0.95 0.93 0.96 0.93

Table 7: Results of the single- and multi-qubit classiﬁers with data re-uploading for the binary annulus problem. Numbers

indicate the success rate, i.e. the number of data points classiﬁed correctly over the total number of points. Words “Ent.” and

“No Ent.” refer to considering entanglement between qubits or not respectively. We have used the L-BFGS-B minimization

method with the weighted ﬁdelity and ﬁdelity cost functions. As happens in other problems, the results obtained with the

weighted ﬁdelity cost function are better than the ones obtained with the ﬁdelity cost function. The multi-qubit classiﬁers and

the introduction of entanglement increase the success rates.

Qubits 1 2 1 2 4

Layers No Ent. Ent. No Ent. Ent. No Ent. Ent.

1 0.53 0.70 – 0.53 0.70 – 0.70 –

2 0.77 0.73 0.53 0.78 0.94 0.96 0.96 0.96

3 0.76 0.74 0.77 0.78 0.92 0.94 0.94 0.95

4 0.84 0.83 0.78 0.89 0.92 0.94 0.95 0.94

5 0.89 0.85 0.77 0.90 0.94 0.94 0.95 0.94

6 0.90 0.89 0.88 0.92 0.87 0.93 0.94 0.94

8 0.89 0.87 0.90 0.93 0.92 0.89 0.94 0.93

10 0.93 0.91 0.90 0.93 0.94 0.92 0.92 0.92

Table 8: Results of the single- and multi-qubit classiﬁers with data re-uploading for the three-dimensional sphere problem.

Numbers indicate the success rate, i.e. the number of data points classiﬁed correctly over the total number of points. Words

“Ent.” and “No Ent.” refer to considering entanglement between qubits or not respectively. We have used the L-BFGS-B

minimization method with the weighted ﬁdelity and ﬁdelity cost functions. The weighted ﬁdelity cost function is better than

the ﬁdelity cost function. There are no signiﬁcant diﬀerences between the two-qubit and the four-qubit classiﬁers. Both are

better than the single-qubit classiﬁer and the introduction of entanglement does not increase the success rates.

Qubits 1 2 1 2 4

Layers No Ent. Ent. No Ent. Ent. No Ent. Ent.

1 0.58 0.48 – 0.70 0.92 – 0.90 –

2 0.76 0.96 0.97 0.74 0.91 0.94 0.95 0.95

3 0.90 0.96 0.98 0.90 0.94 0.95 0.95 0.95

4 0.89 0.98 0.96 0.88 0.94 0.95 0.95 0.95

5 0.91 0.97 0.98 0.89 0.94 0.94 0.95 0.94

6 0.92 0.99 0.94 0.93 0.94 0.94 0.94 0.94

8 0.93 0.98 0.94 0.93 0.94 0.95 0.95 0.94

10 0.94 0.97 0.93 0.94 0.94 0.94 0.94 0.93

Table 9: Results of the single- and multi-qubit classiﬁers with re-uploading data for the four-classes squares problem. Numbers

indicate the success rate, i.e. the number of data points classiﬁed correctly over the total number of points. Words “Ent.” and

“No Ent.” refer to considering entanglement between qubits or not respectively. We have used the L-BFGS-B minimization

method with the weighted ﬁdelity and ﬁdelity cost functions. In this problem, the ﬁdelity cost function presents better results

than the weighted ﬁdelity cost function. It achieves the 0.99 success with the two-qubit classiﬁer with six layers and no

entanglement.

of this problem is that there are some areas in the

problem too small to be caught by the classiﬁer.

As can be seen in Figure 9(d), most of the fail-

ure points are in these small non-convex areas. The

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 18

Qubits 1 2 1 2 4

Layers

No Ent. Ent. No Ent. Ent. No Ent. Ent.

1 0.70 0.52 – 0.76 0.75 – 0.88 –

2 0.86 0.75 0.80 0.84 0.89 0.88 0.91 0.92

3 0.74 0.82 0.84 0.84 0.92 0.91 0.92 0.92

4 0.80 0.85 0.87 0.87 0.89 0.93 0.92 0.93

5 0.85 0.90 0.88 0.87 0.92 0.92 0.93 0.93

6 0.92 0.92 0.91 0.88 0.93 0.94 0.93 0.93

8 0.90 0.91 0.91 0.92 0.92 0.92 0.93 0.94

10 0.92 0.91 0.93 0.90 0.93 0.93 0.93 0.93

Table 10: Results of the single- and multi-qubit classiﬁers with re-uploading data for the four-classes wavy lines problem.

Numbers indicate the success rate, i.e. the number of data points classiﬁed correctly over the total number of points. Words

“Ent.” and “No Ent.” refer to considering entanglement between qubits or not respectively. We have used the L-BFGS-B

minimization method with the weighted ﬁdelity and ﬁdelity cost functions. Results with the weighted ﬁdelity cost function are

slightly better than the ones obtained with the ﬁdelity cost function. The multi-qubit classiﬁers are vaguely better than the

single-qubit classiﬁers. The introduction of entanglement does not change signiﬁcantly the results.

(a) χ

, 1 qubit, 6 layers (b) χ

, 2 qubits without entanglement, 4 layers

, 2 qubits without entanglement, 6 layers (d) χ

, 2 qubits with entanglement, 6 layers

Figure 9: The best results for the 2D problems analyzed in this appendix. The caption of each ﬁgure includes the architecture

of the classiﬁer, i.e. number of qubits, layers and the use of entanglement. In the case of equal results with diﬀerent classiﬁers,

the simplest architecture has been chosen. Colors in the left part of each ﬁgure represent the diﬀerent classes obtained from

the classiﬁer outputs. The right part of each ﬁgure prints the points correctly classiﬁed (in green) and bad classiﬁed (in red).

Black solid lines deﬁne the problem boundaries. Notice that all missed points are located near the borders of the problem. This

means that the classiﬁer is understanding properly which is the problem most important features, but it lacks more training

points. This can be easily corrected if we increase the number of training points.

classiﬁer would rather adjust the rest of the points

instead of tuning those zones and losing everything

else. The results for this problem are not as good as

for other problems, but we still get 94% for the ﬁdelity

cost function, two entangled qubits and 10 layers (200

parameters) and the weighted ﬁdelity, four entangled

qubits and 4 layers (80 parameters).

It is remarkable to compare these results, written in

Table 10, with the ones obtained using classical mod-

els. The quantum classiﬁer approximately equals the

NN method for this problem and outperforms SVC,

94% against 95% (NN) and 82% (SVC).

Accepted in Quantum 2020-01-27, click title to verify. Published under CC-BY 4.0. 19

Comments

Products

Project