The Do-Calculus Revisited
Judea Pearl
Keynote Lecture, August 17, 2012
UAI-2012 Conference, Catalina, CA
Abstract
The do-calculus was developed in 1995 to facilitate the identification
of causal effects in non-parametric models. The completeness proofs of
[Huang and Valtorta, 2006] and [Shpitser and Pearl, 2006] and the
graphical criteria of [Tian and Shpitser, 2010] have laid this
identification problem to rest. Recent explorations unveil the
usefulness of the do-calculus in three additional areas: mediation
analysis [Pearl, 2012], transportability [Pearl and Bareinboim, 2011]
and meta-synthesis. Meta-synthesis (freshly coined) is the task of
fusing empirical results from several diverse studies, conducted on
heterogeneous populations and under different conditions, so as to
synthesize an estimate of a causal relation in some target environment,
potentially different from those under study. The talk surveys these
results with emphasis on the challenges posed by meta-synthesis. For
background material, see <http://bayes.cs.ucla.edu/csl_papers.html>.
1 Introduction
Assuming readers are familiar with the basics of graphical models, I
will start by reviewing the problem of nonparametric identification and
how it was solved by the do-calculus and its derivatives. I will then
show how the do-calculus benefits mediation analysis (Section 2),
transportability problems (Section 3) and meta-synthesis (Section 4).
1.1 Causal Models, interventions, and
Identification
Definition 1. (Structural Equation Model) [Pearl, 2000, p. 203].
A structural equation model (SEM) M is defined as follows:
1. A set U of background or exogenous variables, representing factors
   outside the model, which nevertheless affect relationships within
   the model.
2. A set $V = \{V_1, \ldots, V_n\}$ of observed endogenous variables,
   where each $V_i$ is functionally dependent on a subset $PA_i$ of
   $U \cup V \setminus \{V_i\}$.
3. A set F of functions $\{f_1, \ldots, f_n\}$ such that each $f_i$
   determines the value of $V_i \in V$, $v_i = f_i(pa_i, u)$.
4. A joint probability distribution P(u) over U.
When an instantiation U = u is given, together with F, the model is
said to be “completely specified” (at the unit level); when the pair
$\langle P(u), F \rangle$ is given, the model is “fully specified” (at
the population level). Each fully specified model defines a causal
diagram G in which an arrow is drawn towards $V_i$ from each member of
its parent set $PA_i$.
Interventions and counterfactuals are defined through a mathematical
operator called do(x), which simulates physical interventions by
deleting certain functions from the model, replacing them with a
constant X = x, while keeping the rest of the model unchanged. The
resulting model is denoted $M_x$.
The postintervention distribution resulting from the action do(X = x)
is given by the equation
    $P_M(y \mid do(x)) = P_{M_x}(y)$    (1)
In words, in the framework of model M, the postintervention
distribution of outcome Y is defined as the probability that model
$M_x$ assigns to each outcome level Y = y. From this distribution,
which is readily computed from any fully specified model M, we are able
to assess treatment efficacy by comparing aspects of this distribution
at different levels of x. Counterfactuals are defined similarly through
the equation $Y_x(u) = Y_{M_x}(u)$ (see [Pearl, 2009, Ch. 7]).
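As an illustration of Eq. (1), the following is a minimal sketch of a fully specified model and its mutilated version $M_x$. The two-variable model, its structural functions, and the probabilities are hypothetical choices made only for this example; they do not come from the text.

```python
from itertools import product

# Hypothetical exogenous distribution P(u) over U = (U1, U2), two independent bits.
P_U = {(u1, u2): p1 * p2
       for (u1, p1), (u2, p2) in product([(0, 0.6), (1, 0.4)],
                                         [(0, 0.9), (1, 0.1)])}

def f_x(u):
    """Structural function for X: X copies the background factor U1."""
    return u[0]

def f_y(x, u):
    """Structural function for Y: (X or U1), flipped when the noise U2 = 1."""
    return (x | u[0]) ^ u[1]

def solve(u, do_x=None):
    """Solve M (do_x is None) or M_x, where f_x is replaced by the constant do_x."""
    x = f_x(u) if do_x is None else do_x
    return x, f_y(x, u)

def p_y1_do(x):
    """P(Y = 1 | do(X = x)): the probability that M_x assigns to Y = 1."""
    return sum(p for u, p in P_U.items() if solve(u, do_x=x)[1] == 1)

def p_y1_given(x):
    """P(Y = 1 | X = x), computed in the unmodified model M."""
    num = sum(p for u, p in P_U.items() if solve(u) == (x, 1))
    den = sum(p for u, p in P_U.items() if solve(u)[0] == x)
    return num / den

def counterfactual_y(x, u):
    """Unit-level counterfactual Y_x(u) = Y_{M_x}(u)."""
    return solve(u, do_x=x)[1]

# Acting and seeing differ here because U1 affects both X and Y.
print(p_y1_do(0), p_y1_given(0))
```

In this toy model the interventional and conditional probabilities of Y = 1 at X = 0 come out near 0.42 and 0.1 respectively, which is the gap that the identification problem is about.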
The following definition captures the requirement that
a causal query Q be estimable from the data:
In Nando de Freitas and Kevin Murphy (Eds.), Proceedings of the Twenty-Eighth Conference on Uncertainty
in Artificial Intelligence, Corvallis, OR: AUAI Press, 4-11, 2012.
TECHNICAL REPORT
R-402
August 2012
Definition 2. (Identifiability) [Pearl, 2000, p. 77].
A causal query Q(M) is identifiable, given a set of assumptions A, if
for any two (fully specified) models $M_1$ and $M_2$ that satisfy A, we
have
    $P(M_1) = P(M_2) \Rightarrow Q(M_1) = Q(M_2)$    (2)
In words, the functional details of $M_1$ and $M_2$ do not matter; what
matters is that the assumptions in A (e.g., those encoded in the
diagram) would constrain the variability of those details in such a way
that equality of P's would entail equality of Q's. When this happens, Q
depends on P only, and should therefore be expressible in terms of the
parameters of P.
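To see what a failure of Eq. (2) looks like, here is a minimal sketch with two hypothetical models that share the diagram X → Y with an unobserved common cause U of X and Y. The functions and the uniform P(u) are assumptions made for the example. Both models induce the same observational distribution, yet they disagree on the query Q = P(Y = 1 | do(X = 1)), so Q is not identifiable under that diagram.

```python
# Hypothetical exogenous distribution: a single fair bit U.
P_U = {0: 0.5, 1: 0.5}

def make_model(f_y):
    """Both models share X = U; they differ only in how Y is generated."""
    return {"f_x": lambda u: u, "f_y": f_y}

M1 = make_model(lambda x, u: u)  # Y listens only to the confounder U
M2 = make_model(lambda x, u: x)  # Y listens only to X

def observational(model):
    """Joint distribution P(x, y) induced by the model."""
    dist = {}
    for u, p in P_U.items():
        x = model["f_x"](u)
        y = model["f_y"](x, u)
        dist[(x, y)] = dist.get((x, y), 0.0) + p
    return dist

def query(model, x):
    """Q(M) = P(Y = 1 | do(X = x)): f_x is replaced by the constant x."""
    return sum(p for u, p in P_U.items() if model["f_y"](x, u) == 1)

print(observational(M1) == observational(M2))  # same P
print(query(M1, 1), query(M2, 1))              # different Q
```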
1.2 The Rules of do-calculus
When a query Q is given in the form of a do-expression, for example
$Q = P(y \mid do(x), z)$, its identifiability can be decided
systematically using an algebraic procedure known as the do-calculus
[Pearl, 1995]. It consists of three inference rules that permit us to
map interventional and observational distributions whenever certain
conditions hold in the causal diagram G.
Let X, Y, Z, and W be arbitrary disjoint sets of nodes in a causal DAG
G. We denote by $G_{\overline{X}}$ the graph obtained by deleting from
G all arrows pointing to nodes in X. Likewise, we denote by
$G_{\underline{X}}$ the graph obtained by deleting from G all arrows
emerging from nodes in X. To represent the deletion of both incoming
and outgoing arrows, we use the notation
$G_{\overline{X}\underline{Z}}$.
The following three rules are valid for every interventional
distribution compatible with G.
Rule 1 (Insertion/deletion of observations):
    $P(y \mid do(x), z, w) = P(y \mid do(x), w)$
    if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}}}$    (3)
Rule 2 (Action/observation exchange):
    $P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)$
    if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\underline{Z}}}$    (4)
Rule 3 (Insertion/deletion of actions):
    $P(y \mid do(x), do(z), w) = P(y \mid do(x), w)$
    if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\,\overline{Z(W)}}}$,    (5)
where Z(W) is the set of Z-nodes that are not ancestors of any W-node
in $G_{\overline{X}}$.
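The side conditions of the three rules are d-separation tests in mutilated graphs. The sketch below shows one way such a test could be coded, using the standard ancestral-moral-graph criterion; the helper names and the three-node example graph (C → X, C → Y, X → Y) are hypothetical and chosen only for illustration.

```python
def ancestors(edges, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    result, frontier = set(nodes), list(nodes)
    while frontier:
        child = frontier.pop()
        for a, b in edges:
            if b == child and a not in result:
                result.add(a)
                frontier.append(a)
    return result

def d_separated(edges, xs, ys, zs):
    """True if xs and ys are d-separated by zs (the three sets are disjoint)."""
    keep = ancestors(edges, set(xs) | set(ys) | set(zs))
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    # Moralize the ancestral graph: marry co-parents, then drop directions.
    links = {frozenset(e) for e in sub}
    for child in keep:
        parents = [a for a, b in sub if b == child]
        links |= {frozenset((p, q)) for p in parents for q in parents if p != q}
    # xs and ys are separated iff no undirected path avoids zs.
    seen, frontier = set(xs), list(xs)
    while frontier:
        node = frontier.pop()
        if node in ys:
            return False
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in seen and other not in zs:
                    seen.add(other)
                    frontier.append(other)
    return True

def cut_incoming(edges, nodes):
    """Graph with all arrows pointing into `nodes` deleted (overline)."""
    return [(a, b) for a, b in edges if b not in nodes]

def cut_outgoing(edges, nodes):
    """Graph with all arrows emerging from `nodes` deleted (underline)."""
    return [(a, b) for a, b in edges if a not in nodes]

# Hypothetical diagram: C confounds X and Y, and X affects Y.
G = [("C", "X"), ("C", "Y"), ("X", "Y")]

# Rule 2 with an empty do(x) part, the rule's Z = {X} and W = {C}:
# P(y | do(x), c) = P(y | x, c) holds if Y and X are separated by C
# once the arrows leaving X are removed.
print(d_separated(cut_outgoing(G, {"X"}), {"X"}, {"Y"}, {"C"}))
```

Without conditioning on C the same test fails, which reflects the open back-door path X ← C → Y.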
To establish identifiability of a query Q, one needs to repeatedly
apply the rules of do-calculus to Q, until the final expression no
longer contains a do-operator¹; this renders it estimable from
non-experimental data, and the final do-free expression can serve as an
estimator of Q. The do-calculus was proven to be complete for the
identifiability of causal effects [Shpitser and Pearl, 2006; Huang and
Valtorta, 2006], which means that if the do-operations cannot be
removed by repeated application of these three rules, Q is not
identifiable.
Parallel works by [Tian and Pearl, 2002] and [Shpitser and Pearl, 2006]
have led to graphical criteria for verifying the identifiability of Q
as well as polynomial time algorithms for constructing an estimator of
Q. This, from a mathematical viewpoint, closes the chapter of
nonparametric identification of causal effects.
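As a worked illustration of such a reduction, consider a hypothetical three-node diagram, assumed only for this example, in which a measured covariate C affects both X and Y, and X affects Y. The do-operator can be removed in two steps, yielding the familiar adjustment formula:

```latex
\begin{align*}
P(y \mid do(x))
  &= \sum_c P(y \mid do(x), c)\, P(c \mid do(x))
     && \text{probability calculus} \\
  &= \sum_c P(y \mid x, c)\, P(c \mid do(x))
     && \text{Rule 2, since } (Y \perp\!\!\!\perp X \mid C)_{G_{\underline{X}}} \\
  &= \sum_c P(y \mid x, c)\, P(c)
     && \text{Rule 3, since } (C \perp\!\!\!\perp X)_{G_{\overline{X}}}
\end{align*}
```

The final line contains no do-operator, so under the assumed diagram it can serve as an estimator of the query from observational data.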
2 Using do-Calculus for Identifying
Direct and Indirect Effects
Consider the mediation model in Fig. 1(a). (In this section, M stands
for the mediating variable.) The Controlled Direct Effect (CDE) of X on
Y is defined as
    $CDE(m) = E(Y \mid do(X = 1, M = m)) - E(Y \mid do(X = 0, M = m))$
and the Natural Direct Effect (NDE) is defined by the counterfactual
expression
    $NDE = E(Y_{X=1, M_{X=0}}) - E(Y_{X=0})$
The natural direct effect represents the effect transmitted from X to Y
while keeping some intermediate variable M at whatever level it
attained prior to the transition [Robins and Greenland, 1992; Pearl,
2001].
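The difference between the two definitions can be made concrete with a small fully specified model. The sketch below assumes a hypothetical unconfounded mediation model, with functions and probabilities invented for the example, and computes CDE(m) and NDE by enumerating the exogenous variables. In the NDE, the mediator is held at the value it would have taken under X = 0 for each unit u.

```python
from itertools import product

# Hypothetical exogenous noise terms for the mediator and the outcome.
P_UM = {0: 0.75, 1: 0.25}
P_UY = {0: 0.5, 1: 0.5}

def f_m(x, u_m):
    """Mediator: switched on by treatment or by its own noise term."""
    return max(x, u_m)

def f_y(x, m, u_y):
    """Outcome with an X-M interaction, so CDE(m) depends on m."""
    return 2 * x + 3 * m + 4 * x * m + u_y

def expect(g):
    """Expectation of g(u_m, u_y) over the exogenous distribution."""
    return sum(pm * py * g(um, uy)
               for (um, pm), (uy, py) in product(P_UM.items(), P_UY.items()))

def cde(m):
    """CDE(m) = E(Y | do(X=1, M=m)) - E(Y | do(X=0, M=m))."""
    return (expect(lambda um, uy: f_y(1, m, uy))
            - expect(lambda um, uy: f_y(0, m, uy)))

def nde():
    """NDE = E(Y_{X=1, M_{X=0}}) - E(Y_{X=0})."""
    return (expect(lambda um, uy: f_y(1, f_m(0, um), uy))
            - expect(lambda um, uy: f_y(0, f_m(0, um), uy)))

def total_effect():
    """E(Y_{X=1}) - E(Y_{X=0}), with the mediator responding to X."""
    return (expect(lambda um, uy: f_y(1, f_m(1, um), uy))
            - expect(lambda um, uy: f_y(0, f_m(0, um), uy)))

print(cde(0), cde(1), nde(), total_effect())
```

Here the NDE is the average of CDE(m) weighted by the distribution of M under X = 0, which is why it lies between CDE(0) and CDE(1).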
Since CDE is a do-expression, its identification is fully characterized
by the do-calculus. The NDE, however, is counterfactual and requires
more intricate conditions, as shown in Pearl (2001). When translated to
graphical language these conditions read:
Assumption-Set A [Pearl, 2001]
There exists a set W of measured covariates such that:
A-1 No member of W is a descendant of X.
A-2 W blocks all back-door paths from M to Y, disregarding the one
    through X.
A-3 The W-specific effect of X on M is identifiable using do-calculus.
A-4 The W-specific joint effect of {X, M} on Y is identifiable using
    do-calculus.

¹ Such derivations are illustrated in graphical detail in
[Pearl, 2009, p. 87].
The bulk of the literature on mediation analysis has chosen to express
identification conditions in the language of “ignorability” (i.e.,
independence among counterfactuals) which is rather opaque and has led
to significant deviations from assumption set A.
A typical example of overly stringent conditions that can be found in
the literature reads as follows:
    “Imai, Keele and Yamamoto (2010) showed that the sequential
    ignorability assumption must be satisfied in order to identify the
    average mediation effects. This key assumption implies that the
    treatment assignment is essentially random after adjusting for
    observed pretreatment covariates and that the assignment of
    mediator values is also essentially random once both observed
    treatment and the same set of observed pretreatment covariates are
    adjusted for.” [Imai, Jo, and Stuart, 2011]
When translated to graphical representation, these
conditions read:
Assumption-Set B
There exists a set W of measured covariates such that:
B-1 No member of W is a descendant of X.
B-2 W blocks all back-door paths from X to M .
B-3 W and X block all back-door paths from M to Y .
We see that assumption set A relaxes that in B in two ways.
First, we need not insist on using “the same set of observed
pretreatment covariates”; two separate sets can sometimes accomplish
what the same set does not. Second, conditions A-3 and A-4 invoke
do-calculus and thus open the door for identification criteria beyond
back-door adjustment (as in B-2 and B-3).
In the sequel we will show that these two features endow A with greater
identification power. (See also [Shpitser, 2012].)
2.1 Divide and conquer
Figure 1: (a) The basic unconfounded mediation model, showing the
treatment (X), mediator (M) and outcome (Y). (b) The mediator model
with an added covariate (W) that confounds both the $X \to M$ and the
$M \to Y$ relationships.
Figure 2: A mediation model with two dependent confounders, permitting
the decomposition of Eq. (6). Hollow circles stand for unmeasured
confounders. The model satisfies condition A but violates condition B.

Fig. 2 demonstrates how the “divide and conquer” flexibility translates
into an increase in identification power. Here, the $X \to M$
relationship requires an adjustment for $W_2$, and the $X \to Y$
relationship requires an adjustment for $W_3$. If we make the two
adjustments separately, we can identify NDE by the estimand:
    $NDE = \sum_m \sum_{w_2, w_3} P(W_2 = w_2, W_3 = w_3)
           \,[E(Y \mid X = 1, M = m, W_3 = w_3) - E(Y \mid X = 0, M = m, W_3 = w_3)]
           \,P(M = m \mid X = 0, W_2 = w_2)$    (6)
However, if we insist on adjusting for $W_2$ and $W_3$ simultaneously,
as required by assumption set B, the $X \to M$ relationship would
become confounded, by opening two colliders in tandem along the path
connecting X, $W_3$, $W_2$, and M. As a result, assumption set B would
deem the NDE to be unidentifiable; there is no covariate set W that
simultaneously deconfounds the two relationships.
2.2 Going beyond back-door adjustment
Figure 3 displays a model for which the natural direct effect achieves
its identifiability through multi-step adjustment (in this case using
the front-door procedure), permitted by A, though not through a
single-step adjustment, as demanded by B. In this model, the null set
$W = \emptyset$ satisfies conditions B-1 and B-3, but not condition
B-2; there is no set of covariates that would enable us to deconfound
the treatment-mediator relationship. Fortunately, condition A-3
requires only that we identify the effect of X on M by some do-calculus
method, not necessarily by rendering X random or unconfounded (or
ignorable). The presence of observed variable Z permits us to identify
this causal effect using the front-door condition [Pearl, 1995; 2009].
Figure 3: Measuring Z permits the identification of the effect of X on
M through the front-door procedure.
In Fig. 4, the front-door estimator needs to be applied to both the
$X \to M$ and the $X \to Y$ relationships. In addition, conditioning on
W is necessary, in order to satisfy condition A-2. Still, the
identification of $P(m \mid do(x), w)$ and $E(Y \mid do(m, x), w)$
presents no special problems to students of causal inference [Shpitser
and Pearl, 2006].
Figure 4: Measuring Z and T permits the identification of the effect of
X on M and X on Y for each specific w and leads to the identification
of the natural direct effect.
Figure 5: NDE is identified by adjusting for W and using Z to
deconfound the $X \to Y$ relationship.
Figure 5 demonstrates the role that an observed covariate (Z) on the
$X \to Y$ pathway can play in the identification of natural effects. In
this model, conditioning on W deconfounds both the $M \to Y$ and
$X \to M$ relationships but confounds the $X \to Y$ relationship.
However, the W-specific joint effect of {X, M} on Y is identifiable
through observations on Z (using the front-door estimand).
In Fig. 6, a covariate Z situated along the path from M to Y leads to
identifying NDE. Here the mediator-outcome relationship is unconfounded
(once we fix X), so we are at liberty to choose $W = \emptyset$ to
satisfy condition A-2. The treatment-mediator relationship is
confounded, and requires an adjustment for T (so does the
treatment-outcome relationship). However, conditioning on T will
confound the $\{M, X\} \to Y$ relationship (in violation of condition
A-4). Here, the presence of Z comes to our help, for it permits us to
estimate $P(Y \mid do(x, m), t)$, thus rendering NDE identifiable.
Figure 6: The confounding created by adjusting for T can be removed
using measurement of Z.
3 Using do-Calculus to Decide
Transportability
In applications involving identifiability, the role of the do-calculus
is to remove the do-operator from the query expression. We now discuss
a totally different application, to decide if experimental findings can
be transported to a new, potentially different environment, where only
passive observations can be performed. This problem, labeled
“transportability” in [Pearl and Bareinboim, 2011], can also be reduced
to syntactic operations using the do-calculus, but here the aim will be
to separate the do-operator from a set S of variables that indicate
disparities between the two environments.
We shall motivate the problem through the following three examples.
Example 1. We conduct a randomized trial in Los Angeles (LA) and
estimate the causal effect of exposure X on outcome Y for every age
group Z = z as depicted in Fig. 7(a). We now wish to generalize the
results to the population of New York City (NYC), but data alert us to
the fact that the study distribution P(x, y, z) in LA is significantly
different from the one in NYC (call the latter $P^*(x, y, z)$). In
particular, we notice that the average age in NYC is significantly
higher than that in LA. How are we to estimate the causal effect of X
on Y in NYC, denoted $P^*(y \mid do(x))$?
Example 2. Let the variable Z in Example 1 stand for subjects' language
proficiency, and let us assume that Z does not affect exposure (X) or
outcome (Y), yet it correlates with both, being a proxy for age which
is not measured in either study (see Fig. 7(b)). Given the observed
disparity $P(z) \neq P^*(z)$, how are we to estimate the causal effect
$P^*(y \mid do(x))$ for the target population of NYC from the
z-specific causal effect $P(y \mid do(x), z)$ estimated at the study
population of LA?
Example 3. Examine the case where Z is an X-dependent variable, say a
disease bio-marker, standing on the causal pathways between X and Y as
shown in Fig. 7(c). Assume further that the disparity
$P(z) \neq P^*(z)$ is discovered in each level of X and that, again,
both the average and the z-specific causal effect
$P(y \mid do(x), z)$ are estimated in the LA experiment, for all levels
of X and Z. Can we, based on information given, estimate the average
(or z-specific) causal effect in the target population of NYC?
To formalize problems of this sort, Pearl and Barein-
boim devised a graphical representation called “selec-
tion diagrams” which encodes knowledge about differ-
ences between populations. It is defined as follows:
Definition 3. (Selection Diagram).
Let $\langle M, M^* \rangle$ be a pair of structural causal models
(Definition 1) relative to domains $\langle \Pi, \Pi^* \rangle$,
sharing a causal diagram G. $\langle M, M^* \rangle$ is said to induce
a selection diagram D if D is constructed as follows:
1. Every edge in G is also an edge in D;
2. D contains an extra edge $S_i \to V_i$ whenever there exists a
   discrepancy $f_i \neq f_i^*$ or $P(U_i) \neq P^*(U_i)$ between M and
   $M^*$.
In summary, the S-variables locate the mechanisms where structural
discrepancies between the two populations are suspected to take place.
Alternatively, the absence of a selection node pointing to a variable
represents the assumption that the mechanism responsible for assigning
value to that variable is the same in the two populations, as shown in
Fig. 7.
Using selection diagrams as the basic representational language, and
harnessing the concepts of intervention, do-calculus, and
identifiability (Section 1), we can now give the notion of
transportability a formal definition.
Definition 4. (Transportability)
Let D be a selection diagram relative to domains
$\langle \Pi, \Pi^* \rangle$. Let $\langle P, I \rangle$ be the pair of
observational and interventional distributions of $\Pi$, and $P^*$ be
the observational distribution of $\Pi^*$. The causal relation
$R(\Pi^*) = P^*(y \mid do(x), z)$ is said to be transportable from
$\Pi$ to $\Pi^*$ in D if $R(\Pi^*)$ is uniquely computable from
$P, P^*, I$ in any model that induces D.
Theorem 1. Let D be the selection diagram characterizing two
populations, $\Pi$ and $\Pi^*$, and S a set of selection variables in
D. The relation $R = P^*(y \mid do(x), z)$ is transportable from $\Pi$
to $\Pi^*$ if the expression $P(y \mid do(x), z, s)$ is reducible,
using the rules of do-calculus, to an expression in which S appears
only as a conditioning variable in do-free terms.
Figure 7: Selection diagrams depicting Examples 1–3. In (a) the two
populations differ in age distributions. In (b) the populations differ
in how Z depends on age (an unmeasured variable, represented by the
hollow circle) and the age distributions are the same. In (c) the
populations differ in how Z depends on X.
This criterion was proven to be both sufficient and necessary for
causal effects, namely $R = P(y \mid do(x))$ [Bareinboim and Pearl,
2012b]. Theorem 1 does not specify the sequence of rules leading to the
needed reduction when such a sequence exists. [Bareinboim and Pearl,
2012b] established a complete and effective graphical procedure of
confirming transportability which also synthesizes the transport
formula whenever possible. For example, the transport formulae derived
for the three models in Fig. 7 are (respectively):
    $P^*(y \mid do(x)) = \sum_z P(y \mid do(x), z)\, P^*(z)$    (7)
    $P^*(y \mid do(x)) = P(y \mid do(x))$    (8)
    $P^*(y \mid do(x)) = \sum_z P(y \mid z, x)\, P^*(z \mid x)$    (9)
Each transport formula determines for the investigator what information
needs to be taken from the experimental and observational studies and
how they ought to be combined to yield an unbiased estimate of R.
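As a concrete reading of Eq. (7), the sketch below re-weights z-specific experimental effects from the study population by the age distribution observed in the target population. The function name and all numbers are hypothetical and serve only to show which quantity comes from which population.

```python
def transport_eq7(p_y_do_x_given_z, p_star_z):
    """Eq. (7): sum over z of P(y | do(x), z) * P*(z).

    p_y_do_x_given_z[z] comes from the experiment in the study population;
    p_star_z[z] comes from passive observation in the target population.
    """
    return sum(p_y_do_x_given_z[z] * p_star_z[z] for z in p_star_z)

# Hypothetical z-specific effects P(y | do(x), z) measured in LA.
effect_by_age = {"young": 0.8, "old": 0.4}

# Hypothetical age distributions: P(z) in LA and P*(z) in NYC.
p_z_la = {"young": 0.7, "old": 0.3}
p_z_nyc = {"young": 0.4, "old": 0.6}

print(transport_eq7(effect_by_age, p_z_la))   # average effect in LA
print(transport_eq7(effect_by_age, p_z_nyc))  # transported effect for NYC
```

With these made-up numbers the LA average (about 0.68) overstates the NYC effect (about 0.56), because the older NYC population responds less to the exposure.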
4 From “Meta-analysis” to
“Meta-synthesis”
“Meta analysis” is a data fusion problem aimed at
combining results from many experimental and obser-
vational studies, each conducted on a different popu-
lation and under a different set of conditions, so as to
synthesize an aggregate measure of effect size that is
“better,” in some sense, than any one study in isola-
tion. This fusion problem has received enormous at-
tention in the health and social sciences, where data
are scarce and experiments are costly.
Unfortunately, current techniques of meta-analysis do little more than
take weighted averages of the various studies, thus averaging apples
and oranges to infer properties of bananas. One should be able to do
better. Using “selection diagrams” to encode commonalities among
studies, we should be able to “synthesize” an estimator that is
guaranteed to provide an unbiased estimate of the desired quantity
based on information that each study shares with the target
environment.
The basic idea is captured in the following definition and theorem.
Definition 5. (Meta-identifiability):
A relation R is said to be “meta-identifiable” from a set of
populations $(\Pi_1, \Pi_2, \ldots, \Pi_K)$ to a target population
$\Pi^*$ iff it is identifiable from the information set
$I = \{I(\Pi_1), I(\Pi_2), \ldots, I(\Pi_K), I(\Pi^*)\}$, where
$I(\Pi_k)$ stands for the information provided by population $\Pi_k$.
Theorem 2. (Meta-identifiability):
Given a set of studies $\{\Pi_1, \Pi_2, \ldots, \Pi_K\}$ characterized
by selection diagrams $\{D_1, D_2, \ldots, D_K\}$ relative to a target
population $\Pi^*$, a relation $R(\Pi^*)$ is “meta-identifiable” if it
can be decomposed into a set of sub-relations of the form:
    $R_k = P(V_k \mid do(W_k), Z_k), \quad k = 1, 2, \ldots, K$
such that each $R_k$ is transportable from some $D_k$.
Theorem 2 reduces the problem of meta-synthesis to a set of
transportability problems, and calls for a systematic way of
decomposing R to fit the information provided by I.
Exemplifying meta-synthesis
Consider the diagrams depicted in Fig. 8, each representing a study
conducted on a different population and under a different set of
conditions. Solid circles represent variables that were measured in the
respective study and hollow circles variables that remained unmeasured.
A boxed arrow ($S \to$) represents an external influence affecting a
mechanism by which the study population is assumed to differ from the
target population $\Pi^*$, shown in Fig. 8(a). For example, Fig. 8(c)
represents an observational study on population $\Pi_c$ in which
variables X, Z and Y were measured, W was not measured and the prior
probability $P_c(z)$ differs from that of the target population
$P^*(z)$. Diagrams (b)–(f) represent observational studies while
(g)–(i) stand for experimental studies with X randomized (hence the
missing arrows into X).
Despite differences in populations, measurements and conditions, each
of the studies may provide information that bears on the target
relation $R(\Pi^*)$ which, in this example, we take to be the causal
effect of X on Y, $P^*(y \mid do(x))$ or, given the structure of
Fig. 8(a),
    $R(\Pi^*) = P^*(y \mid do(x)) = \sum_z P^*(y \mid x, z)\, P^*(z)$.
Figure 8: Diagrams representing 8 studies ((b)–(i)) conducted under
different conditions on different populations, aiming to estimate the
causal effect of X on Y in the target population, shown in 8(a).
While $R(\Pi^*)$ can be estimated directly from some of the studies
(e.g., (g)) or indirectly from others (e.g., (b) and (d)), it cannot be
estimated from those studies in which the population differs
substantially from $\Pi^*$ (e.g., (c), (e), (f)). The estimates of R
provided by the former studies may differ from each other due to
sampling variations and measurement errors, and can be aggregated in
the standard tradition of meta analysis. The latter studies, however,
should not be averaged with the former, since they do not provide
unbiased estimates of R. Still, they are not totally useless, for they
can provide information that renders the former estimates more
accurate. For example, although we cannot identify R from study 8(c),
since $P_c(z)$ differs from the unknown $P^*(z)$, we can nevertheless
use the estimates of $P_c(x \mid z)$, $P_c(y \mid z, x)$ that 8(c)
provides to improve the accuracy of $P^*(x \mid z)$ and
$P^*(y \mid z, x)$,² which may be needed for estimating R by indirect
methods. For example, $P^*(y \mid z, x)$ is needed in study 8(b) if we
use the estimator $R = \sum_z P^*(y \mid x, z)\, P^*(z)$, while
$P^*(x \mid z)$ is needed if we use the inverse probability estimator
$R = \sum_z P^*(x, y, z) / P^*(x \mid z)$.

² The absence of boxed arrows into X and Y in Fig. 8(c) implies the
equalities $P_c(x \mid z) = P^*(x \mid z)$ and
$P_c(y \mid z, x) = P^*(y \mid z, x)$.
Similarly, consider the randomized studies depicted in 8(h) and 8(i).
Neither is sufficient for identifying R in isolation yet, taken
together, they permit us to borrow $P_i(w \mid do(x))$ from 8(i) and
$P_h(y \mid w, do(x))$ from 8(h) and synthesize a bias-free estimator:
    $R = \sum_w P^*(y \mid w, do(x))\, P^*(w \mid do(x))
       = \sum_w P_h(y \mid w, do(x))\, P_i(w \mid do(x))$
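A minimal sketch of this synthesis step, assuming the two ingredients have already been estimated in their respective studies; the function name and the numbers are hypothetical.

```python
def synthesize(p_h_y_given_w_do_x, p_i_w_do_x):
    """R = sum_w P_h(y | w, do(x)) * P_i(w | do(x)).

    The first table is borrowed from study (h), the second from study (i);
    neither study supplies both factors on its own.
    """
    return sum(p_h_y_given_w_do_x[w] * p_i_w_do_x[w] for w in p_i_w_do_x)

# Hypothetical estimates for a fixed treatment level x and outcome level y.
p_h = {0: 0.2, 1: 0.9}   # P_h(y | w, do(x)) for w = 0, 1
p_i = {0: 0.3, 1: 0.7}   # P_i(w | do(x)) for w = 0, 1

print(synthesize(p_h, p_i))
```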
The challenge of meta-synthesis is to take a collection of studies,
annotated with their respective selection diagrams (as in Fig. 8), and
construct an estimator of a specified relation $R(\Pi^*)$ that makes
maximum use of the samples available, by exploiting the commonalities
among the populations studied and the target population $\Pi^*$. As the
relation $R(\Pi^*)$ changes, the synthesis strategy will change as
well.
It is hard not to speculate that data-pooling strategies based on the
principles outlined here will one day replace the blind methods
currently used in meta analysis.
Knowledge-guided Domain Adaptation
It is commonly assumed that causal knowledge is necessary only when
interventions are contemplated and that in purely predictive tasks,
probabilistic knowledge suffices. When dealing with generalization
across domains, however, causal knowledge can be valuable, and in fact
necessary, even in predictive or classification tasks.
The idea is simple; causal knowledge is essentially knowledge about the
mechanisms that remain invariant under change. Suppose we learn a
probability distribution P(x, y, z) in one environment and we ask how
this probability would change when we move to a new environment that
differs slightly from the former. If we have knowledge of the causal
mechanism generating P, we could then represent where we suspect the
change to occur and channel all our computational resources to re-learn
the local relationship that has changed or is likely to have changed,
while keeping all other relationships invariant.
For example, assume that $P^*(x, y, z)$ is the probability distribution
in the target environment and our interest lies in estimating
$P^*(x \mid z)$ knowing that the causal diagram behind P is the chain
$X \to Y \to Z$, and that the process determining Y, represented by
$P^*(y \mid x)$, is the only one that changed. We can simply re-learn
$P^*(y \mid x)$ and estimate our target relation $P^*(x \mid z)$
without measuring Z in the new environment. (This is done using
$P^*(x, y, z) = P(x)\, P^*(y \mid x)\, P(z \mid y)$ with the first and
third terms transported from the source environment.)
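The chain example can be sketched in a few lines. Only the middle factor is re-learned in the target environment; the other two tables are reused from the source. The function name and all probability tables below are hypothetical.

```python
def posterior_x_given_z(p_x, p_y_given_x, p_z_given_y, z):
    """P(x | z) for the chain X -> Y -> Z, from its three local factors."""
    joint = {x: sum(p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
                    for y in p_z_given_y)
             for x in p_x}
    norm = sum(joint.values())
    return {x: value / norm for x, value in joint.items()}

# Factors learned in the source environment (hypothetical numbers).
p_x = {0: 0.5, 1: 0.5}                                   # P(x)
p_y_given_x_source = {0: {0: 0.9, 1: 0.1},               # P(y | x)
                      1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.8, 1: 0.2},                      # P(z | y)
               1: {0: 0.1, 1: 0.9}}

# The only factor re-learned in the target environment: P*(y | x).
p_y_given_x_target = {0: {0: 0.6, 1: 0.4},
                      1: {0: 0.3, 1: 0.7}}

# P*(x | z = 1), obtained without measuring Z in the target environment.
target = posterior_x_given_z(p_x, p_y_given_x_target, p_z_given_y, z=1)
print(target)
```

The design point is that the target-side data requirement shrinks to samples of (X, Y) pairs, since P(x) and P(z | y) are assumed invariant across the two environments.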
In complex problems, the savings gained by focusing on only a small
subset of variables in $P^*$ can be enormous, because any reduction in
the number of measured variables translates into substantial reduction
in the number of samples needed to achieve a given level of prediction
accuracy.
As can be expected, for a given transported relation, the subset of
variables that can be ignored in the target environment is not unique,
and should therefore be chosen so as to minimize both measurement costs
and sampling variability. This opens up a host of new theoretical
questions about transportability in causal graphs. For example,
deciding if a relation is transportable when we forbid measurement of a
given subset of variables in $P^*$ [Pearl and Bareinboim, 2011], or
deciding how to pool studies optimally when sample size varies
drastically from study to study [Pearl, 2012].
Conclusions
The do-calculus, which originated as a syntactic tool for
identification problems, was shown to benefit three new areas of
investigation. Its main power lies in reducing to syntactic
manipulations complex problems concerning the estimability of causal
relations under a variety of conditions.
In Section 2 we demonstrated that going beyond standard adjustment for
covariates, and unleashing the full power of do-calculus, can lead to
improved identification power of natural direct and indirect effects.
Section 3 demonstrated how questions of transportability can be reduced
to symbolic derivations in the do-calculus, yielding graph-based
procedures for deciding whether causal effects in the target population
can be inferred from experimental findings in the study population.
When the answer is affirmative, the procedures further identify what
experimental and observational findings need be obtained from the two
populations, and how they can be combined to ensure bias-free
transport.
In a related problem, Bareinboim and Pearl (2012a) show how the
do-calculus can be used to decide whether the effect of X on Y can be
estimated from experiments on a different set, Z, that is more
accessible to manipulations. Here the aim is to transform
do-expressions to sentences that invoke only do(z) symbols.
Finally, in Section 4, we tackled the problem of data fusion and showed
that principled fusion (which we call meta-synthesis) can be reduced to
a sequence of syntactic operations each involving a local
transportability exercise. This task leaves many questions unsettled,
because of the multiple ways a given relation can be decomposed.
Acknowledgments
This research was supported in part by grants from NIH #1R01
LM009961-01, NSF #IIS-1018922, and ONR #N000-14-09-1-0665 and
#N00014-10-1-0933.
References
[Bareinboim and Pearl, 2012a] Bareinboim, E., and Pearl, J. 2012a.
Causal inference by surrogate experiments: z-identifiability. Technical
Report R-397, Cognitive Systems Laboratory, Department of Computer
Science, UCLA. To appear in Proceedings of the Twenty-Eighth Conference
on Uncertainty in Artificial Intelligence (UAI), 2012.

[Bareinboim and Pearl, 2012b] Bareinboim, E., and Pearl, J. 2012b.
Transportability of causal effects: Completeness results. Technical
Report R-390, Cognitive Systems Laboratory, Department of Computer
Science, UCLA. To appear in Proceedings of the Twenty-Sixth AAAI
Conference on Artificial Intelligence (AAAI), 2012.

[Huang and Valtorta, 2006] Huang, Y., and Valtorta, M. 2006. Pearl’s
calculus of intervention is complete. In Dechter, R., and Richardson,
T., eds., Proceedings of the Twenty-Second Conference on Uncertainty in
Artificial Intelligence. Corvallis, OR: AUAI Press. 217–224.

[Imai, Jo, and Stuart, 2011] Imai, K.; Jo, B.; and Stuart, E. A. 2011.
Commentary: Using potential outcomes to understand causal mediation
analysis. Multivariate Behavioral Research 46:842–854.

[Imai, Keele, and Yamamoto, 2010] Imai, K.; Keele, L.; and Yamamoto, T.
2010. Identification, inference, and sensitivity analysis for causal
mediation effects. Statistical Science 25(1):51–71.

[Pearl and Bareinboim, 2011] Pearl, J., and Bareinboim, E. 2011.
Transportability of causal and statistical relations: A formal
approach. In Proceedings of the Twenty-Fifth Conference on Artificial
Intelligence (AAAI-11), 247–254. Menlo Park, CA: AAAI Press.
<http://ftp.cs.ucla.edu/pub/stat_ser/r372a.pdf>.

[Pearl, 1995] Pearl, J. 1995. Causal diagrams for empirical research.
Biometrika 82(4):669–710.

[Pearl, 2000] Pearl, J. 2000. Causality: Models, Reasoning, and
Inference. New York: Cambridge University Press. Second ed., 2009.

[Pearl, 2001] Pearl, J. 2001. Direct and indirect effects. In
Proceedings of the Seventeenth Conference on Uncertainty in Artificial
Intelligence. San Francisco, CA: Morgan Kaufmann. 411–420.

[Pearl, 2009] Pearl, J. 2009. Causality: Models, Reasoning, and
Inference. New York: Cambridge University Press, second edition.

[Pearl, 2012] Pearl, J. 2012. Some thoughts concerning transfer
learning, with applications to meta-analysis and data-sharing
estimation. Technical Report R-387, Cognitive Systems Laboratory,
Department of Computer Science, UCLA.

[Robins and Greenland, 1992] Robins, J., and Greenland, S. 1992.
Identifiability and exchangeability for direct and indirect effects.
Epidemiology 3(2):143–155.

[Shpitser and Pearl, 2006] Shpitser, I., and Pearl, J. 2006.
Identification of joint interventional distributions in recursive
semi-Markovian causal models. In Proceedings of the Twenty-First
National Conference on Artificial Intelligence. Menlo Park, CA: AAAI
Press. 1219–1226.

[Shpitser, 2012] Shpitser, I. 2012. Counterfactual graphical models for
mediation analysis via path-specific effects. Technical report, Harvard
University, MA.

[Tian and Pearl, 2002] Tian, J., and Pearl, J. 2002. A general
identification condition for causal effects. In Proceedings of the
Eighteenth National Conference on Artificial Intelligence. Menlo Park,
CA: AAAI Press/The MIT Press. 567–573.

[Tian and Shpitser, 2010] Tian, J., and Shpitser, I. 2010. On
identifying causal effects. In Dechter, R.; Geffner, H.; and Halpern,
J., eds., Heuristics, Probability and Causality: A Tribute to Judea
Pearl. UK: College Publications. 415–444.