arXiv:1803.04585v4 [cs.AI] 24 Feb 2019
Categorizing Variants of Goodhart’s Law
David Manheim (davidmanheim@gmail.com) and Scott Garrabrant (scott@intelligence.org)
February 26, 2019
There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is used to such an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart’s Law.¹ This class of failure is often poorly understood, partly because terminology for discussing these failures is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type.

This paper expands on an earlier discussion by Garrabrant [2], which notes there are (at least) four different “mechanisms” that relate to Goodhart’s Law. This paper is intended to explore these mechanisms further, and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in artificial intelligence alignment [4]. The importance of Goodhart effects depends on the amount of power directed towards optimizing the proxy, and so the increased optimization power offered by artificial intelligence makes it especially critical for that field.
Varieties of Goodhart-like Phenomena
As used in this paper, a Goodhart effect is when optimization causes a collapse of the statistical relationship between a goal which the optimizer intends and the proxy used for that goal. The four categories of Goodhart effects introduced by Garrabrant are 1) Regressional, where selection for an imperfect proxy necessarily also selects for noise, 2) Extremal, where selection for the metric pushes the state distribution into a region where old relationships no longer hold, 3) Causal, where an action on the part of the regulator causes the collapse, and 4) Adversarial, where an agent with different goals than the regulator causes the collapse. These varied forms often occur together, but defining them individually is useful. In doing so, this paper introduces and explains several subcategories which differ in important ways.

¹ As a historical note, Goodhart’s Law [1] as originally formulated states that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” This has been interpreted and explained more widely, perhaps to the point where it is ambiguous what the term means. Other closely related formulations, such as Campbell’s law (which arguably has scholarly precedence [3]) and the Lucas critique, were also initially specific, and their interpretation has also been expanded greatly. Lastly, the Cobra Effect and perverse incentives are often closely related to these failures, and the different effects interact. Because none of the terms were laid out formally, the categories proposed do not match what was originally discussed. A separate forthcoming paper intends to address the relationship between those formulations and the categories more formally explained here.
To formalize the intuitive description, we consider a system S with a set of possible states s ∈ S. For the initial discussion we focus on a single actor, the regulator, who influences the system by selecting a permissible region of the state-space, A ⊆ S. For this discussion, we will use Goal to refer to the true goal of the regulator, which is a mapping from states, G(s) ∈ ℝ for s ∈ S. Because regulators have incomplete knowledge, they cannot act based on G(s) and instead act only on a proxy M(s) ∈ ℝ for s ∈ S.² For simplicity’s sake, we will consider actions where the regulator chooses some threshold c and the permissible states are defined such that s ∈ A if M(s) ≥ c. This creates a selection pressure that allows the first two Goodhart-like effects to occur.³
1 Regressional Goodhart
Regressional Goodhart - When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal. This is also known as “Tails come apart.” [5]
Simple Model:

M = G + normal(μ, σ²)    (1)
Due to the noise, a point with a large M value will likely have a large G value, but also a large noise value. Thus, when M is large, you can expect G to be predictably smaller than M. Despite the lack of bias, for large values of c the values of G when M > c are expected to be higher than otherwise. While this is the simplest Goodhart effect, it is also the most fundamental: it cannot be avoided. No matter what measure is chosen for optimization, an inexact metric necessarily leads to a divergence between the goal and the metric in the tail.
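To make this concrete, here is a minimal simulation (an illustration added here, not taken from the paper) assuming G and the noise are independent standard normals and a threshold of c = 2: among the states selected because M > c, the goal falls predictably short of the proxy.

```python
import numpy as np

# Minimal regressional-Goodhart sketch: M = G + noise, both standard normal.
rng = np.random.default_rng(0)
n = 1_000_000
G = rng.normal(0.0, 1.0, n)          # true goal value of each state
M = G + rng.normal(0.0, 1.0, n)      # unbiased but noisy proxy

c = 2.0                              # regulator's selection threshold
selected = M > c                     # states admitted by selecting on the proxy

print("E[M | M > c] =", M[selected].mean())   # large by construction
print("E[G | M > c] =", G[selected].mean())   # above E[G] = 0, but well below E[M | M > c]
```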
2 Extremal Goodhart
Extremal Goodhart - Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the relationship between the proxy and the goal was observed. A form of this occurs in statistics and machine learning as “out of sample prediction.”

² In general, a mapping from s ∈ S to ℝ is a measure, and if used for decision-making, it is known as a metric. The current presentation assumes a single-dimensional case. Use of multiple metrics and restrictions follows similar dynamics, but for discussing Goodhart effects the single-dimensional case is ideal.

³ This restriction of the available states is one form of selection pressure. There are other forms of selection pressure which can apply, but these are unnecessary for presenting the basic dynamics. For example, we often find that the states are chosen probabilistically and the distribution can be influenced to prefer certain regions. One important such case is evolutionary selection, where the most likely states are generated based on a set of states selected in a previous generation.
Extremal Goodhart can occur in two ways. First, the proxy may have been simplified, based either on an insufficient number of observations to uncover the true relationship, or because a simpler measure was preferred. In either case, model insufficiency can lead to extremal Goodhart. Secondly, the measure may be a correct estimator only in certain regions of the space. In regions where the generating process differs, the use of the measure leads to extremal Goodhart.
Extremal Goodhart - Model Insufficiency - The metric of interest is based on a learned relationship between the goal and the metric which is approximately accurate in the initial region. Selection pressure moves the metric away from the region in which the relationship is most accurate, so that the relationship collapses.
Simple Model:

M = G(s_i) + ΔG(s_i)    (2)
In the case of model insufficiency, the error is induced by model simplification and inaccuracy, and not a fundamental inability to build a sufficiently accurate estimator. In the simplest case, model insufficiency involves incomplete understanding of an empirically learned relationship. Even though we assume the relationship is the same within and outside the observed range, it is still often more difficult to detect the exact functional relationship based on a limited range. More commonly, model insufficiency occurs because complex systems are approximated using observations of the non-optimized system space.

An example of this phenomenon in public policy is when a relationship is simplified for use as a measure without recognizing how selection pressure applied to the metric makes the simplification problematic. In machine learning this often happens due to underfitting, such as when a relationship is assumed to be a low-degree polynomial because higher-order polynomial terms are small in the observed region. Selection on the basis of the approximated metric moves towards regions where the higher-order terms are more important, so that use of the machine learning system creates a Goodhart effect.
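The following sketch illustrates this underfitting failure under assumed toy values (a goal with a small cubic term, and observations confined to |s| < 1); neither the functional form nor the constants come from the paper. A linear proxy fit on the observed region recommends states where the ignored higher-order term dominates and the true goal is actually poor.

```python
import numpy as np

# Model-insufficiency sketch: the true goal has a small higher-order term that is
# negligible in the observed region but dominates once selection pushes into
# extreme states.
rng = np.random.default_rng(1)

def true_goal(s):
    return s - 0.05 * s**3          # nearly linear for |s| < 1

# Learn the proxy from observations restricted to the ordinary region.
s_obs = rng.uniform(-1.0, 1.0, 500)
g_obs = true_goal(s_obs) + rng.normal(0.0, 0.05, 500)
coeffs = np.polyfit(s_obs, g_obs, 1)        # underfit: assume a linear relationship

# Optimize the learned metric over a much wider range of candidate states.
s_candidates = rng.uniform(-10.0, 10.0, 100_000)
metric = np.polyval(coeffs, s_candidates)
best = s_candidates[np.argmax(metric)]       # state the proxy says is best

print("state chosen by the proxy:", best)                 # near s = +10
print("goal value there:         ", true_goal(best))      # strongly negative
print("goal at the in-sample optimum s = 1:", true_goal(1.0))
```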
Extremal Goodhart - Change in Regime - The proxy M may be related to G differently in different regions. Even if the correct relationship is learned for the observed region, in the region where the proxy takes an extreme value the relationship to the goal may be fundamentally different. Selection on the basis of the proxy moves into such a region.
Simple Model:

G = M + x, where M ≤ a
G = M + y, where M > a    (3)
The case of regime change can be due to measurement error, so that errors are systematically incorrect in a given region. Alternatively, it can be due to a generating process that differs in different regions. An example of the first case is where wind-speed measurements may be systematically biased downwards when the wind speed exceeds the design tolerances of the instruments. In this case, we expect that the relationship between measured wind speed and wind damage will systematically differ for measured wind speeds at and above the maximum tolerance. Any selection pressure for the second case would cause extremal Goodhart to occur when a boundary is reached. For example, the relationship between wind speed and height undergoes a change above the atmospheric boundary layer. Even when the correct relationship is learned for this lower altitude, the relationship changes above it.
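A small simulation of the piecewise model in (3), with illustrative constants that are assumptions rather than values from the paper, shows how selection on the proxy carries the system across the boundary into the regime where the learned relationship fails.

```python
import numpy as np

# Regime-change sketch following the piecewise model in (3), with assumed
# values a = 2, x = +1, y = -5.
rng = np.random.default_rng(2)
a, x, y = 2.0, 1.0, -5.0

M = rng.normal(0.0, 1.5, 100_000)
G = np.where(M <= a, M + x, M + y)     # the proxy-goal relationship differs by regime

# The relationship learned in the ordinary regime (M <= a) suggests "larger M is better".
ordinary = M <= a
print("corr(M, G) in the observed regime:", np.corrcoef(M[ordinary], G[ordinary])[0, 1])

# Selecting the states with the most extreme proxy values crosses the boundary.
top = M > np.quantile(M, 0.99)
print("mean G among top-1% proxy states:", G[top].mean())
print("mean G just below the boundary  :", G[(M > a - 1) & (M <= a)].mean())
```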
In the above cases of regressional and extremal Goodhart, there are issues with selection pressure for even simple systems when proxies are used. These two classes require selection pressure, but do not involve an intervention on the part of the regulator. When a regulator intervenes in the causal sense, it changes the state-space and allows an additional type of error that depends on the causal structure of the intervention.
3 Causal Goodhart
Causal Goodhart - When the causal path between the proxy and the goal is indirect, intervening can change the relationship between the proxy and the goal. If a regulator intervenes to maximize a metric, the causal pathway can change such that the proxy no longer tracks the goal. In such cases, extreme interventions can be less effective than more moderate ones, and further selection or intervention can be counterproductive.
In the earlier cases, the correlational issues found in Goodhart-like effects exist regardless of causal structure. Here, however, a model that ignores causal structure is problematic when the regulator’s actions themselves change the relationships. These take the form of Goodhart-like problems, but can be better discussed in terms of the effect of regulator actions on the causal structure. As a simple example, a regulator attempting to use a wind-speed model to build windmills must be careful not to ignore the effect the windmills themselves have on wind speeds. Building many windmills in a given area can alter the wind speed in the region enough to invalidate the earlier relationship.

Interestingly, this does not require any uncertainty, nor does it require an incorrect understanding of the relationships, unlike the earlier cases. Instead, the effect is induced by the regulator’s action. The exact way in which this occurs can vary, as discussed below.
Three Causal Goodhart Effects
There are three general cases which lead to causal Goodhart, as well as several others that may appear similar, but are actually regressional or extremal Goodhart effects. The below diagram illustrates three classes of causal Goodhart effects. In each, the regulator attempts to act based on a correlation between the measure and the goal by intervening. The node chosen for intervention is shaded.
[Figure: three causal diagrams, labeled Shared Cause, Intermediary, and Metric Manipulation, each containing nodes X, Goal, and Metric; the node chosen for intervention is shaded.]
Shared Cause Intervention - The regulator intervenes on a shared cause of the metric and the goal.

Simple Model: The regulator sets the shared cause X to a maximal value. The Metric no longer has a causal relationship to the Goal. The relationship between Metric and Goal now consists entirely of the combination of the error terms between each and X.

Shared cause intervention may restrict both the Goal and the Metric to a region favored by the regulator, at the cost of changing the metric’s relationship to the goal in a way that may be counterproductive for further optimization. Suppose, for example, that there are two tests administered to students which are correlated. If a teacher trains skills related to test taking generally, one of the traits which make the students likely to do well on both tests changes. Because of this, the remaining relationship between scores on the two tests is due to other factors, and the new correlation between scores is likely to be lower than the earlier correlation. This does not imply that the overall outcome is worse than without the optimization, but more extreme interventions can be less effective than less extreme ones. For this reason, further selection or intervention on the basis of the metric can be counterproductive.
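The following sketch mirrors the two-test example under assumed parameters (not from the paper): before the intervention the metric and goal are strongly correlated through the shared cause; pinning the shared cause at a high value raises the goal but leaves the metric nearly uninformative for any further optimization.

```python
import numpy as np

# Shared-cause-intervention sketch: X is a shared cause of both Metric and Goal,
# as in the first diagram above.
rng = np.random.default_rng(3)
n = 100_000

# Before intervention: X varies across states.
X = rng.normal(0.0, 1.0, n)
goal = X + rng.normal(0.0, 0.5, n)
metric = X + rng.normal(0.0, 0.5, n)
print("corr(metric, goal) before intervening:", np.corrcoef(metric, goal)[0, 1])

# The regulator intervenes, pinning the shared cause X at a high value.
X_fixed = np.full(n, 3.0)
goal_after = X_fixed + rng.normal(0.0, 0.5, n)
metric_after = X_fixed + rng.normal(0.0, 0.5, n)
print("mean goal after intervening:", goal_after.mean())      # higher than before
print("corr(metric, goal) after intervening:",
      np.corrcoef(metric_after, goal_after)[0, 1])             # approximately zero
```

As in the text, the intervention itself improves the goal here; the problem is that further selection on the metric among the intervened states no longer tracks the goal.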
Intermediary Intervention - The regulator intervenes on a variable in the causal chain connecting Goal to Metric.

Simple Model: The regulator sets X to a specific value. The relationship between the Goal and Metric now no longer exists; they are independent variables. This does not necessarily affect the Goal at all, but can serve to increase the value of the metric.
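A minimal sketch of this case, with assumed parameters (not from the paper): setting the intermediary inflates the metric and severs its relationship to the goal, which is left unchanged.

```python
import numpy as np

# Intermediary-intervention sketch: the causal chain is Goal -> X -> Metric.
rng = np.random.default_rng(4)
n = 100_000

goal = rng.normal(0.0, 1.0, n)
X_natural = goal + rng.normal(0.0, 0.3, n)           # intermediary driven by the goal
metric_natural = X_natural + rng.normal(0.0, 0.3, n)
print("corr before intervention:", np.corrcoef(metric_natural, goal)[0, 1])

# The regulator sets the intermediary directly to a favourable value.
X_set = np.full(n, 5.0)
metric_after = X_set + rng.normal(0.0, 0.3, n)        # metric rises...
print("mean metric after:", metric_after.mean())
print("mean goal after:  ", goal.mean())              # ...but the goal is untouched
print("corr after intervention:", np.corrcoef(metric_after, goal)[0, 1])
```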
Metric Manipulation - The regulator intervenes to set the Metric, without affecting other nodes. (In this case, it does not matter how the goal was related to X, as indicated in the figure.)

Simple Model: The regulator sets M to a specific value. The relationship between Metric and Goal now no longer holds.

Imagine, to give another example, that a teacher changes test scores or grades.⁴ This doesn’t contribute to the goal of learning; it simply changes M so that it is useless (or, if only some scores are changed, less useful) in measuring G.
Causal Goodhart effects can be due either to a mistaken or missing understanding of causal effects, or to simple lack of foresight. As with extremal Goodhart effects, mistakes about the causal connection can be due to insufficient data, or a change in regime. Importantly, the same mistakes can also lead to extremal or (worsened) regressional Goodhart effects if there is selection instead of intervention.
Non-Causal Goodhart Effects in Causal Systems
In the below three cases, the figures show where the regulator mistakes the form of the causal relationship. The dashed lines are the true causal paths, and the dotted lines are the assumed ones. If the regulator attempts to select on the basis of the assumed model, it can lead to worsened regressional Goodhart effects or extremal Goodhart effects, but not causal Goodhart effects.
[Figure: two causal diagrams, labeled Ignored Shared Cause and Ignored Intermediary, each containing nodes X, Goal, and Metric; dashed lines mark the true causal paths and dotted lines the assumed ones.]
Ignored Shared Cause - The regulator assumes the relationship between the Goal and Metric is a causal chain, or is direct, but there is instead a shared cause. When ignoring X, the metric is valid as a proxy for the goal.
Simple Model:

Metric ∼ normal(X, σ_m²)
Goal ∼ normal(X, σ_g²)    (4)
However, because of the conditional independence of Goal and Metric, for any fixed value of X they are uncorrelated. This means that our metric is less valid if there is selection. With selection of a fixed X, the case resembles extremal Goodhart, since the regime changed.
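The following simulation of equation (4), with assumed error variances and a wider-spread shared cause (values chosen only for illustration), shows the point numerically: the metric tracks the goal across all states, but once X is (effectively) fixed the two are nearly uncorrelated.

```python
import numpy as np

# Ignored-shared-cause sketch following (4): Metric ~ normal(X, 1) and
# Goal ~ normal(X, 1) for a shared cause X.
rng = np.random.default_rng(5)
n = 500_000

X = rng.normal(0.0, 2.0, n)
metric = rng.normal(X, 1.0)
goal = rng.normal(X, 1.0)

print("corr over all states:", np.corrcoef(metric, goal)[0, 1])

# Restrict attention to states where X is (nearly) fixed: the correlation vanishes,
# so selection that effectively pins X makes the metric uninformative about the goal.
band = np.abs(X - 2.0) < 0.05
print("corr with X held (almost) fixed:", np.corrcoef(metric[band], goal[band])[0, 1])
```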
⁴ This is assuming the teacher is the regulator; if the teacher is an agent, this is a different case, explored below.
Ignored Intermediary - The regulator assumes the relationship between the Goal and Metric is direct, but an intermediary exists which creates an additional source of noise.

Without intervention, the causal mistake here adds a term to the error, leading to the earlier cases of regressional and extremal Goodhart effects.
[Figure: two causal diagrams, labeled Ignored Metric Cause and Ignored Goal Cause, each containing nodes X, Goal, and Metric.]
Ignored Additional Cause - The metric is caused by multiple factors, of which the goal relates to only some. Alternatively, the goal is caused by multiple factors, of which the metric relates to only some. (No figure provided.)
Simple Model:

Metric ∼ normal(X + Goal, σ_m²)    (5)

or

Goal ∼ normal(X + Metric, σ_g²)    (6)
If X has a distribution that matches the assumed distribution of error in the relationship between the goal and the metric, this leads to worsened regressional Goodhart effects as defined above, because additional noise is due to X. In this simple model case the goal is caused by the metric, for example height causing wind speed. More commonly, the metric would be caused by some Y, which also causes the Goal. In such a case, the metric might be measured height, where the measurement itself can have error. On the other hand, if X is distributed differently than the assumed distribution of error in the relationship between Goal and Metric, there is a model insufficiency issue. If X contains some nonlinearity, it is a regime change. The earlier examples of extremal Goodhart effects are of these types. (More complex errors can lead to similar mistakes.)
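A short simulation of equation (5), using assumed unit variances for the goal and for the ignored cause X (illustrative values only), shows the worsened regressional effect: selection on the actual metric recovers noticeably less of the goal than the regulator’s assumed model predicts.

```python
import numpy as np

# Ignored-additional-cause sketch following (5): the ignored cause X acts as
# extra noise in the metric, worsening regressional Goodhart.
rng = np.random.default_rng(8)
n = 1_000_000

goal = rng.normal(0.0, 1.0, n)
X = rng.normal(0.0, 1.0, n)                     # additional cause the regulator ignores
metric_assumed = rng.normal(goal, 0.5)          # the relationship the regulator believes
metric_actual = rng.normal(X + goal, 0.5)       # the true generating process, per (5)

c = 2.0
print("E[Goal | assumed metric > c]:", goal[metric_assumed > c].mean())
print("E[Goal | actual metric  > c]:", goal[metric_actual > c].mean())
# Selecting on the actual metric yields a noticeably smaller goal value,
# because part of what is being selected for is the ignored cause X.
```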
4 Adversarial Goodhart
The discussion of causal relationships is important for a different class of situations where there are multiple actors. There are numerous such dynamics, but they can be classified as happening in one of two cases. First, the actor may have goals which the regulator is unaware of (or insufficiently wary about), and the agent can act independently of the regulator in a way that adversely affects the regulator’s goal. We refer to such cases as adversarial misalignment. Second, the regulator can use incentives to align the agent’s goals and use the agent’s actions as a way to optimize, while not acting themselves. We refer to these as “Cobra effects.” In all of these cases, which were called “adversarial Goodhart” in the previous work, other agents react in different ways to create Goodhart-like effects following any of the earlier paradigms. We introduce and formulate several specific cases of each.
Adversarial Misalignment Goodhart - The agent applies selection pressure knowing the regulator will apply different selection pressure on the basis of the metric.⁵

The adversarial misalignment failure can occur due to the agent creating extremal Goodhart effects, or by exacerbating always-present regressional Goodhart, or due to causal intervention by the agent which changes the effect of the regulator optimization.
Campbell’s Law - Agents select a metric knowing the choice of regulator metric. Agents can correlate their metric with the regulator’s metric, and select on their metric.⁶ This further reduces the usefulness of selection using the metric for achieving the original goal.
Simple Model: G_R and M_R are the regulator’s goal and metric. G_A and M_A are the agent’s goal and metric. (The agent selects M_A after seeing the regulator’s choice of metric.)

M_R = G_R + X
M_A = G_A · X    (7)
Here, the agent selects for values with high M_A, and the regulator’s later selection then creates a relationship between X and their goal, especially at the extremes. The agent does this by selecting for a metric such that even weak selection on M_A hijacks the regulator’s selection on M_R to achieve their goal. The agent’s choice of metric need not be a useful proxy for their goal absent the regulator’s action. In the example given, if X ∼ normal(μ, σ²), the correlation between G_A and M_A is zero over the full set of states, but becomes positive on the subspace selected by the regulator.
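The following sketch of equation (7) takes X, G_R, and G_A to be independent standard normals (the zero mean of X is the assumption under which the unconditional correlation vanishes): over all states, corr(G_A, M_A) is approximately zero, but on the subspace the regulator retains it is clearly positive.

```python
import numpy as np

# Adversarial / Campbell's-law sketch following (7). Assumes X, G_R, G_A are
# independent standard normals; a zero-mean X is what makes the overall
# correlation between G_A and M_A vanish.
rng = np.random.default_rng(6)
n = 1_000_000

X = rng.normal(0.0, 1.0, n)
G_R = rng.normal(0.0, 1.0, n)
G_A = rng.normal(0.0, 1.0, n)

M_R = G_R + X          # the regulator's proxy for its goal
M_A = G_A * X          # the metric the agent chooses after seeing M_R

print("corr(G_A, M_A) over all states:", np.corrcoef(G_A, M_A)[0, 1])

# The regulator selects states with a high value of its own metric,
# which implicitly selects for large positive X.
kept = M_R > 2.0
print("corr(G_A, M_A) on the regulator-selected subspace:",
      np.corrcoef(G_A[kept], M_A[kept])[0, 1])
```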
There are clearly further dynamics worth exploring, but this case serves to introduce the issues involved in adversarial conflict over metrics without incentives. When the regulator attempts to regulate via incentives, a new set of cases can occur, which are generally related to what is known as the “Cobra Effect” [7]. This is named after a supposed situation in colonial India where British authorities offered a reward for dead cobras. Instead of hunting cobras, however, some people bred their own cobras to kill in order to receive rewards. This not only failed to achieve the goal, but led to more cobras than before the reward was offered. Because of the (supposedly) historical meaning and popular usage, we differentiate between cobra effects that fit this model, which we will call normal cobra effects, and ones where the agent applies pressure in a non-causal fashion to create Goodhart effects, which we call non-causal cobra effects.

⁵ This is the case most closely related to Campbell’s law [6], which was originally stated as “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” In this case, the choice of measurement affects the outcome because agents attempt to corrupt the measure.

⁶ They could instead alter the causal structure to create the same effect. It seems unclear that this difference is critical to the dynamics considered.
Normal Cobra Effect - The regulator modifies the agent goal, usually via an incentive, to correlate it with the regulator metric. The agent then acts by changing the observed causal structure, due to incompletely aligned goals, in a way that creates a Goodhart effect.

Simple Model: The agent finds cause Y (an “Ignored Additional Cause”) which is unobserved or constant in the structure the regulator initially considered, and the agent changes the value of Y in order to maximize the metric. (This is a form of causal Goodhart using metric manipulation.)
[Figure: causal diagram for the Cobra Effect, containing nodes Goal, Metric, and Y.]
G_A,R = G_A,0 + M_R    (8)
The Cobra effect can also occur via an agent action that creates any of the above-mentioned causal Goodhart effects, namely shared cause, intermediary, or metric manipulation.
Non-Causal Cobra Effect - The regulator modifies the agent goal to make agent actions aligned with the regulator’s metric. Under selection pressure from the agent, extremal Goodhart effects occur or regressional Goodhart effects are worsened.

Simple Model: G_R and M_R are the regulator’s goal and metric. G_A,0 and G_A,R are the agent’s goal before and after regulator modification.
G_A,R = G_A,0 + M_R    (9)
When the agent applies selection pressure, it creates a Goodhart effect on the
regulator metric.
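To illustrate, the following sketch of equation (9) assumes the regulator’s metric is itself a noisy proxy (M_R = G_R + noise) and that the agent selects the states scoring highest on its modified goal; the distributions are illustrative assumptions, not values from the paper. The metric looks excellent on the chosen states while the regulator’s true goal improves by far less.

```python
import numpy as np

# Non-causal cobra-effect sketch following (9). Assumes G_R, G_A0, and the
# proxy noise are independent standard normals.
rng = np.random.default_rng(7)
n = 1_000_000

G_R = rng.normal(0.0, 1.0, n)          # regulator's true goal
M_R = G_R + rng.normal(0.0, 1.0, n)    # regulator's metric (imperfect proxy)
G_A0 = rng.normal(0.0, 1.0, n)         # agent's original, unrelated goal

G_AR = G_A0 + M_R                      # agent's goal after the incentive, per (9)

# The agent, not the regulator, now applies the selection pressure.
chosen = G_AR > np.quantile(G_AR, 0.99)

print("mean M_R among agent-chosen states:", M_R[chosen].mean())
print("mean G_R among agent-chosen states:", G_R[chosen].mean())
# The metric looks excellent on the chosen states, while the regulator's true
# goal improves by much less: the agent's optimization selects for the proxy's
# noise (and for its own goal) along with the regulator's goal.
```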
Conclusion
This paper represents an attempt to categorize a class of the simple statistical misalignments that occur both in any algorithmic system used for optimization, and in many human systems that rely on metrics for optimization. The dynamics highlighted are hopefully useful to explain many situations of interest in policy design, in machine learning, and in specific questions about AI alignment. In policy, these dynamics are commonly encountered but too rarely discussed clearly. In machine learning, these errors include extremal Goodhart effects due to using limited data and choosing overly parsimonious models, errors that occur due to myopic consideration of goals, and mistakes that occur when ignoring causality in a system. Finally, in AI alignment, these issues are fundamental to both aligning systems towards a goal, and assuring that the system’s metrics do not have perverse effects once the system begins optimizing for them.
References
[1] Charles E. Goodhart. Problems of Monetary Management: The U.K. Experience. Papers in Monetary Economics, Reserve Bank of Australia, 1975.
[2] Scott Garrabrant. Goodhart Taxonomy. LessWrong, December 30, 2017. https://www.lesserwrong.com/posts/EbFABnst8LsidYs5Y/goodhart-taxonomy
[3] Jeffery Rodamar (U.S. Department of Education). There ought to be a law! Campbell v. Goodhart. Significance, Vol. 15, 2018. doi:10.1111/j.1740-9713.2018.01205.x
[4] Nick Bostrom. Superintelligence: Paths, Dangers, Strategies. 2014.
[5] Gregory Lewis (Thrasymachus). Why the tails come apart. LessWrong.com, 2014. http://lesswrong.com/lw/km6/why_the_tails_come_apart/ ; also at The Polemical Medic, 2015. http://www.thepolemicalmedic.com/why-the-tails-come-apart/
[6] Donald T. Campbell. Assessing the impact of planned social change. Evaluation and Program Planning, 2(1): 67–90. doi:10.1016/0149-7189(79)90048-X
[7] Horst Siebert. Der Kobra-Effekt. Wie man Irrwege der Wirtschaftspolitik vermeidet. Munich: Deutsche Verlags-Anstalt, 2001. ISBN 3-421-05562-9.

Discussion

This is a nice summary of this paper in layman's terms: https://www.holistics.io/blog/four-types-goodharts-law/

**Goodhart's law** is often stated as, "When a measure becomes a target, it ceases to be a good measure", and it is named after the British economist Charles Goodhart, who is credited with expressing it in a 1975 article on monetary policy in the United Kingdom. At the time, "Goodhart's law was used to criticize the British Thatcher government for trying to conduct monetary policy on the basis of targets for broad and narrow money, but the law reflects a much more general phenomenon." Source: https://en.wikipedia.org/wiki/Goodhart%27s_law

If you like this paper, also check out this work by David Manheim titled Building Less Flawed Metrics: https://mpra.ub.uni-muenchen.de/90649/1/MPRA_paper_90649.pdf

>> "The four categories of Goodhart effects introduced by Garrabrant are 1) Regressional, where selection for an imperfect proxy necessarily also selects for noise, 2) Extremal, where selection for the metric pushes the state distribution into a region where old relationships no longer hold, 3) Causal, where an action on the part of the regulator causes the collapse, and 4) Adversarial, where an agent with different goals than the regulator causes the collapse. These varied forms often occur together, but defining them individually is useful. In doing so, this paper introduces and explains several subcategories which differ in important ways."

Goodhart effects indeed occur in many settings, and understanding these common failure modes can be extremely helpful:

>> "The dynamics highlighted are hopefully useful to explain many situations of interest in policy design, in machine learning, and in specific questions about AI alignment. In policy, these dynamics are commonly encountered but too-rarely discussed clearly. In machine learning, these errors include extremal Goodhart effects due to using limited data and choosing overly parsimonious models, errors that occur due to myopic consideration of goals, and mistakes that occur when ignoring causality in a system. Finally, in AI alignment, these issues are fundamental to both aligning systems towards a goal, and assuring that the system's metrics do not have perverse effects once the system begins optimizing for them."