
1 Causal modeling and timelike Bell scenarios
In the following we employ the formalism of Bayesian
networks [28], which provides a natural framework for
classical causal modeling. A central concept in this
framework is that of a directed acyclic graph (DAG),
which consists of a set of nodes, representing the relevant
random variables¹ in the considered situation, and directed
edges, representing the causal relations between those
variables. A set of variables $X_1, \dots, X_n$ forms a
Bayesian network with respect to some DAG if and only if the
probability distribution $p(x_1, \dots, x_n)$ can be decomposed as
$$p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i) \tag{1}$$
where $\mathrm{PA}_i$ stands for the set of graph-theoretical
parents of the variable $X_i$ (i.e., all variables that have a
direct causal influence on $X_i$). Without loss of generality,
each variable can be understood as a deterministic function of
its parents and of a local noise term $U_i$ that supplies
potential randomness, $x_i = f_i(\mathrm{pa}_i, u_i)$. This
formalism thus enables a distinction between simple
statistical correlations and actual causation by explic-
itly specifying the underlying mechanism generating
the data.
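As a minimal sketch of the factorization in Eq. (1), the following Python snippet builds the joint distribution of a small Bayesian network as a product of conditionals. The three-node DAG (X1 -> X2, X1 -> X3), the probability tables, and all function names are hypothetical choices made for illustration only; they do not come from the text. The function `mechanism` illustrates the structural form x_i = f_i(pa_i, u_i), under the assumption that the local noise is uniform on [0, 1).

```python
from itertools import product

# Hypothetical DAG with binary variables: X1 -> X2 and X1 -> X3.
PARENTS = {"X1": (), "X2": ("X1",), "X3": ("X1",)}

# Illustrative tables for p(x_i = 1 | pa_i), keyed by the parents' values.
P_ONE = {
    "X1": {(): 0.3},
    "X2": {(0,): 0.2, (1,): 0.9},
    "X3": {(0,): 0.6, (1,): 0.1},
}


def conditional(node, value, assignment):
    """Return p(x_i = value | pa_i) for a full assignment of all nodes."""
    pa = tuple(assignment[parent] for parent in PARENTS[node])
    p_one = P_ONE[node][pa]
    return p_one if value == 1 else 1.0 - p_one


def joint(assignment):
    """Evaluate the product of conditionals, as in Eq. (1)."""
    prob = 1.0
    for node in PARENTS:
        prob *= conditional(node, assignment[node], assignment)
    return prob


def mechanism(node, pa, u):
    """Structural form x_i = f_i(pa_i, u_i), with u_i uniform on [0, 1)."""
    return 1 if u < P_ONE[node][pa] else 0


NAMES = list(PARENTS)
TABLE = {
    values: joint(dict(zip(NAMES, values)))
    for values in product((0, 1), repeat=len(NAMES))
}
TOTAL = sum(TABLE.values())  # a valid joint distribution sums to one
```

Changing the edges in `PARENTS` changes which conditional independences hold for every choice of tables, which is the sense in which the graph, rather than the numbers, carries the causal assumptions.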
Here we are interested in DAGs containing so-called
latent variables, which are empirically inaccessible. In
the context of Bell’s theorem [1] these are also known
as hidden variables. For any set of observed corre-
lations, there are in general many DAGs with hid-
den variables that could have produced these obser-
vations. Among these, causal inference is particularly
interested in those fulfilling the conditions of minimality
and faithfulness. Minimality requires that, given two possible
causal models, we choose the simpler one, that is, the one
capable of generating the smaller set of correlations (which
must still include the observed ones). Faithfulness, in turn,
requires the causal model to explain the observed data without
resorting to fine-tuning of the causal-statistical parameters.
In other words, any
observed (conditional) independence should be a con-
sequence of the causal structure itself, rather than a
specific choice of parameters. Faithful (i.e. non-fine-
tuned) models are therefore robust against changes in
the causal parameters and thus the preferred choice.
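To make the notion of fine-tuning concrete, here is a small sketch using a hypothetical linear model that is not taken from the text: X influences Y both directly (coefficient `a`) and through an intermediate variable Z (coefficients `b` and `c`), with independent unit-variance Gaussian noise assumed at each node. The covariance between X and Y is then a + b*c, so X and Y look uncorrelated only on the measure-zero parameter set a = -b*c. All names and parameter values are illustrative assumptions.

```python
import random


def cov_xy_exact(a, b, c):
    """Cov(X, Y) for X = U_X, Z = b*X + U_Z, Y = a*X + c*Z + U_Y,
    assuming independent noise terms with zero mean and unit variance."""
    return a + b * c


def cov_xy_sampled(a, b, c, n=20000, seed=0):
    """Monte Carlo estimate of Cov(X, Y) obtained by running the mechanisms."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = b * x + rng.gauss(0.0, 1.0)
        y = a * x + c * z + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


TUNED = (-0.5, 1.0, 0.5)      # a = -b*c: the two causal paths cancel exactly
PERTURBED = (-0.3, 1.0, 0.5)  # a nearby generic choice: the correlation returns
```

In the tuned case the observed independence is not implied by the graph, so an arbitrarily small change of `a` destroys it. A faithful model would instead explain the same independence by a graph in which X has no causal path to Y.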
To illustrate the last point, consider the paradig-
matic causal structure of Bell’s theorem in Fig. 1a.
This structure intuitively reflects the causal assump-
tions of Bell’s theorem, leading to the so-called local
hidden-variable (LHV) models. First, the two parties
are assumed to be spacelike separated, such that the
correlations between the measurement outcomes A and B can
only be mediated via a common source Λ, implying that
p(a, b|x, y, λ) = p(a|x, λ)p(b|y, λ). Second, it is assumed
that the experimenters can freely choose which observables to
measure (represented by the random variables X and Y),
independently of how the system was prepared, that is,
p(x, y, λ) = p(x, y)p(λ). Note that these constraints implied
by the causal model appear at an unobservable level, since
they explicitly involve the hidden variable Λ. Yet, they also
imply observable constraints in the form of no-signaling
conditions, expressed as p(a|x, y) = p(a|x) and
p(b|x, y) = p(b|y), and Bell inequalities [1, 29].

¹ We adopt the standard convention that uppercase letters
label random variables, while their values are denoted in
lower case.
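The observable consequences of the LHV model can be checked numerically in a small case. The sketch below assumes binary settings and outcomes in {-1, +1}, and uses the CHSH expression as one standard example of a Bell inequality; this choice, the random weights, and all names are illustrative assumptions, since the text does not single out a particular inequality. It enumerates the deterministic strategies, mixes them with weights p(λ), and evaluates the marginals and the CHSH value of the resulting behavior p(a, b|x, y).

```python
from itertools import product
import random

# A deterministic LHV strategy fixes the outcomes (a_0, a_1, b_0, b_1):
# a_x is Alice's outcome for setting x, b_y is Bob's outcome for setting y.
STRATEGIES = list(product((-1, 1), repeat=4))


def chsh(strategy):
    """CHSH expression E_00 + E_01 + E_10 - E_11 for one strategy."""
    a0, a1, b0, b1 = strategy
    return a0 * b0 + a0 * b1 + a1 * b0 - a1 * b1


def behavior(weights):
    """p(a, b | x, y) = sum over lambda of p(lambda) p(a|x, lambda) p(b|y, lambda)."""
    p = {}
    for x, y in product((0, 1), repeat=2):
        for a, b in product((-1, 1), repeat=2):
            p[(a, b, x, y)] = sum(
                w
                for w, s in zip(weights, STRATEGIES)
                if s[x] == a and s[2 + y] == b
            )
    return p


def alice_marginal(p, a, x, y):
    """p(a | x, y), obtained by summing over Bob's outcome."""
    return sum(p[(a, b, x, y)] for b in (-1, 1))


def bob_marginal(p, b, x, y):
    """p(b | x, y), obtained by summing over Alice's outcome."""
    return sum(p[(a, b, x, y)] for a in (-1, 1))


# Arbitrary (seeded) distribution p(lambda) over the 16 strategies.
_rng = random.Random(1)
_raw = [_rng.random() for _ in STRATEGIES]
WEIGHTS = [r / sum(_raw) for r in _raw]

P = behavior(WEIGHTS)
LHV_MAX = max(chsh(s) for s in STRATEGIES)
MIXTURE_CHSH = sum(w * chsh(s) for w, s in zip(WEIGHTS, STRATEGIES))
```

Because the CHSH expression is linear in the probabilities, the value of any mixture cannot exceed the largest deterministic value `LHV_MAX`, and the marginals returned by `alice_marginal` and `bob_marginal` do not depend on the distant setting for any choice of `WEIGHTS`.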
Figure 1: Causal structure underlying Bell’s theorem. (a)
Two observers, Alice and Bob, each have the choice of two
measurements represented by the random variables X and Y ,
respectively. The correlations between their measurement
outcomes, modeled as random variables A and B, respectively,
are mediated solely by a common cause in their past, the
hidden variable Λ. (b) Bell’s causal model augmented with
one-way communication from Alice to Bob. The initial state
of the joint system is specified by the ontic state Λ. First,
Alice performs a measurement with setting x, obtaining out-
come a. She then sends a message m to Bob, who performs
a measurement with setting y, obtaining outcome b.
While quantum correlations obey the no-signaling
conditions, they violate Bell inequalities [1, 29] and
are thus in conflict with the assumptions behind the
causal structure in Fig. 1a. In order to maintain a
classical causal explanation, the model in Fig. 1a must
therefore be augmented with additional resources;
something that can only be done at the cost of in-
troducing fine-tuning [26]. For instance, the causal
structure in Fig. 1b can reproduce all quantum corre-
lations, but at the same time allows, in principle, for
non-local correlations between X and B. Hence, in
order to satisfy the no-signaling condition p(b|x, y) =
p(b|y) the causal parameters must be chosen from a
set of measure zero [28], a signature of fine-tuning.
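The fine-tuning argument for the structure in Fig. 1b can be illustrated with a toy model. The strategy below is a hypothetical example, not taken from the text: the hidden variable is a shared bit λ with p(λ = 1) = q, Alice outputs a = λ and sends her setting as the message m = x, and Bob outputs b = λ XOR (m AND y). In this sketch, Bob's marginal is independent of x only for q = 1/2, and any other value of q produces signaling.

```python
from itertools import product


def behavior(q):
    """p(a, b | x, y) for a toy model with one-way communication.

    lambda is a shared bit with p(lambda = 1) = q. Alice outputs a = lambda
    and sends the message m = x; Bob outputs b = lambda XOR (m AND y).
    """
    p = {}
    for x, y in product((0, 1), repeat=2):
        for a, b in product((0, 1), repeat=2):
            p[(a, b, x, y)] = 0.0
        for lam, weight in ((0, 1.0 - q), (1, q)):
            a = lam
            m = x
            b = lam ^ (m & y)
            p[(a, b, x, y)] += weight
    return p


def bob_marginal(p, b, x, y):
    """p(b | x, y), obtained by summing over Alice's outcome."""
    return sum(p[(a, b, x, y)] for a in (0, 1))


def signaling(p):
    """Largest change of p(b | x, y) when only Alice's setting x changes."""
    return max(
        abs(bob_marginal(p, b, 0, y) - bob_marginal(p, b, 1, y))
        for b, y in product((0, 1), repeat=2)
    )


def chsh(p):
    """E_00 + E_01 + E_10 - E_11, with outcome bits mapped to (-1)**bit."""

    def corr(x, y):
        return sum(
            (-1) ** (a ^ b) * p[(a, b, x, y)]
            for a, b in product((0, 1), repeat=2)
        )

    return corr(0, 0) + corr(0, 1) + corr(1, 0) - corr(1, 1)
```

For q = 1/2 the model is no-signaling and reaches the algebraic maximum CHSH = 4, which is more than is needed to reproduce quantum correlations. Setting q = 0.6 leaves the causal structure unchanged but gives `signaling(behavior(0.6))` close to 0.2, showing that the no-signaling condition holds only for a special choice of parameters.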
Studying such non-local classical models can pro-
vide valuable insights into the relation between clas-
sical and quantum theory, and their applications [2].
However, at the same time such models lead to an
unfair comparison, since allowing for communication
makes not only classical, but also quantum models
more powerful. In practice, it is more natural to assume a
certain underlying causal structure and to ask what can be
achieved with classical and with quantum resources.
Bell’s theorem is a particular case
of this broader question, referring to spacelike sep-
arated events. However, there are often situations