arXiv:1803.04585v4 [cs.AI] 24 Feb 2019
Categorizing Variants of Goodhart’s Law
David Manheim (davidmanheim@gmail.com) and Scott Garrabrant (scott@intelligence.org)
February 26, 2019
There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is used to such an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart’s Law.¹ This class of failure is often poorly understood, partly because terminology for discussing these failures is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type.

This paper expands on an earlier discussion by Garrabrant [2], which notes there are (at least) four different “mechanisms” that relate to Goodhart’s Law. This paper is intended to explore these mechanisms further, and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in artificial intelligence alignment [4]. The importance of Goodhart effects depends on the amount of power directed towards optimizing the proxy, and so the increased optimization power offered by artificial intelligence makes it especially critical for that field.
Varieties of Goodhart-like Phenomena
As used in this paper, a Goodhart effect is when optimization causes a collapse of the statistical relationship between a goal which the optimizer intends and the proxy used for that goal. The four categories of Goodhart effects introduced by Garrabrant are 1) Regressional, where selection for an imperfect proxy necessarily also selects for noise, 2) Extremal, where selection for the metric pushes the state distribution into a region where old relationships no longer hold, 3) Causal, where an action on the part of the regulator causes the collapse, and 4) Adversarial, where an agent with different goals than the regulator causes the collapse. These varied forms often occur together, but defining them individually is useful. In doing so, this paper introduces and explains several subcategories which differ in important ways.

¹ As a historical note, Goodhart’s Law [1] as originally formulated states that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” This has been interpreted and explained more widely, perhaps to the point where it is ambiguous what the term means. Other closely related formulations, such as Campbell’s law (which arguably has scholarly precedence [3]) and the Lucas critique, were also initially specific, and their interpretation has also been expanded greatly. Lastly, the Cobra Effect and perverse incentives are often closely related to these failures, and the different effects interact. Because none of the terms were laid out formally, the categories proposed do not match what was originally discussed. A separate forthcoming paper intends to address the relationship between those formulations and the categories more formally explained here.
To formalize the intuitive description, we consider a system S with a set of possible states s ∈ S. For the initial discussion we focus on a single actor, the regulator, who influences the system by selecting a permissible region of the state-space, A ⊆ S. For this discussion, we will use Goal to refer to the true goal of the regulator, which is a mapping from states, G(s) ∈ ℝ for s ∈ S. Because regulators have incomplete knowledge, they cannot act based on G(s) and instead act only on a proxy M(s) ∈ ℝ for s ∈ S.² For simplicity’s sake, we will consider actions where the regulator chooses some threshold c and the permissible states are defined such that s ∈ A if M(s) ≥ c. This creates a selection pressure that allows the first two Goodhart-like effects to occur.³
1 Regressional Goodhart
Regressional Goodhart - When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal. This is also known as “Tails come apart.” [5]
Simple Model:

M = G + normal(μ, σ²)    (1)
Due to the noise, a point with a large M value will likely have a large G value, but also a large noise value. Thus, when M is large, you can expect G to be predictably smaller than M. Despite the lack of bias, for large values of c the values of G when M > c are expected to be higher than otherwise. While this is the simplest Goodhart effect, it is also the most fundamental: it cannot be avoided. No matter what measure is chosen for optimization, an inexact metric necessarily leads to a divergence between the goal and the metric in the tail.
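To make this concrete, here is a minimal simulation (an illustration added here, not taken from the paper) assuming G and the noise are independent standard normals and a threshold of c = 2: among the states selected because M > c, the goal falls predictably short of the proxy.

```python
import numpy as np

# Minimal regressional-Goodhart sketch: M = G + noise, both standard normal.
rng = np.random.default_rng(0)
n = 1_000_000
G = rng.normal(0.0, 1.0, n)          # true goal value of each state
M = G + rng.normal(0.0, 1.0, n)      # unbiased but noisy proxy

c = 2.0                              # regulator's selection threshold
selected = M > c                     # states admitted by selecting on the proxy

print("E[M | M > c] =", M[selected].mean())   # large by construction
print("E[G | M > c] =", G[selected].mean())   # above E[G] = 0, but well below E[M | M > c]
```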
2 Extremal Goodhart
Extremal Goodhart - Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the relationship between the proxy and the goal was observed. A form of this occurs in statistics and machine learning as “out of sample prediction.”

² In general, a mapping from s ∈ S to ℝ is a measure, and if used for decision-making, it is known as a metric. The current presentation assumes a single-dimensional case. Use of multiple metrics and restrictions follows similar dynamics, but for discussing Goodhart effects the single-dimensional case is ideal.

³ This restriction of the available states is one form of selection pressure. There are other forms of selection pressure which can apply, but these are unnecessary for presenting the basic dynamics. For example, we often find that the states are chosen probabilistically and the distribution can be influenced to prefer certain regions. One important such case is evolutionary selection, where the most likely states are generated based on a set of states selected in a previous generation.
Extremal Goodhart can occur in two ways. First, the proxy may have been simplified, based either on an insufficient number of observations to uncover the true relationship, or because a simpler measure was preferred. In either case, model insufficiency can lead to extremal Goodhart. Secondly, the measure may be a correct estimator only in certain regions of the space. In regions where the generating process differs, the use of the measure leads to extremal Goodhart.
Extremal Goodhart - Model Insufficiency - The metric of interest is based on a learned relationship between the goal and the metric which is approximately accurate in the initial region. Selection pressure moves the metric away from the region in which the relationship is most accurate, so that the relationship collapses.
Simple Model:

M = G(s_i) + ΔG(s_i)    (2)
In the case of model insufficiency, the error is induced by model simplification and inaccuracy, and not a fundamental inability to build a sufficiently accurate estimator. In the simplest case, model insufficiency involves incomplete understanding of an empirically learned relationship. Even though we assume the relationship is the same within and outside the observed range, it is still often more difficult to detect the exact functional relationship based on a limited range. More commonly, model insufficiency occurs because complex systems are approximated using observations of the non-optimized system space.

An example of this phenomenon in public policy is when a relationship is simplified for use as a measure without recognizing how selection pressure applied to the metric makes the simplification problematic. In machine learning this often happens due to underfitting, such as when a relationship is assumed to be a low-degree polynomial because higher-order polynomial terms are small in the observed region. Selection on the basis of the approximated metric moves towards regions where the higher-order terms are more important, so that use of the machine learning system creates a Goodhart effect.
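The following sketch illustrates this underfitting failure under assumed toy values (a goal with a small cubic term, and observations confined to |s| < 1); neither the functional form nor the constants come from the paper. A linear proxy fit on the observed region recommends states where the ignored higher-order term dominates and the true goal is actually poor.

```python
import numpy as np

# Model-insufficiency sketch: the true goal has a small higher-order term that is
# negligible in the observed region but dominates once selection pushes into
# extreme states.
rng = np.random.default_rng(1)

def true_goal(s):
    return s - 0.05 * s**3          # nearly linear for |s| < 1

# Learn the proxy from observations restricted to the ordinary region.
s_obs = rng.uniform(-1.0, 1.0, 500)
g_obs = true_goal(s_obs) + rng.normal(0.0, 0.05, 500)
coeffs = np.polyfit(s_obs, g_obs, 1)        # underfit: assume a linear relationship

# Optimize the learned metric over a much wider range of candidate states.
s_candidates = rng.uniform(-10.0, 10.0, 100_000)
metric = np.polyval(coeffs, s_candidates)
best = s_candidates[np.argmax(metric)]       # state the proxy says is best

print("state chosen by the proxy:", best)                 # near s = +10
print("goal value there:         ", true_goal(best))      # strongly negative
print("goal at the in-sample optimum s = 1:", true_goal(1.0))
```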
Extremal Goodhart - Change in Regime - The proxy M may be related to G differently in different regions. Even if the correct relationship is learned for the observed region, in the region where the proxy takes an extreme value the relationship to the goal may be fundamentally different. Selection on the basis of the proxy moves into such a region.
Simple Model:

G = M + x, where M ≤ a
G = M + y, where M > a    (3)
The case of regime change can be due to measurement error, so that errors are systematically incorrect in a given region. Alternatively, it can be due to a generating process that differs in different regions. An example of the first case is where wind-speed measurements may be systematically biased downwards when the wind speed exceeds the design tolerances of the instruments. In this case, we expect that the relationship between measured wind speed and wind damage will systematically differ for measured wind speeds at and above the maximum tolerance. Any selection pressure for the second case would cause extremal Goodhart to occur when a boundary is reached. For example, the relationship between wind speed and height undergoes a change above the atmospheric boundary layer. Even when the correct relationship is learned for this lower altitude, the relationship changes above it.
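A small simulation of the piecewise model in (3), with illustrative constants that are assumptions rather than values from the paper, shows how selection on the proxy carries the system across the boundary into the regime where the learned relationship fails.

```python
import numpy as np

# Regime-change sketch following the piecewise model in (3), with assumed
# values a = 2, x = +1, y = -5.
rng = np.random.default_rng(2)
a, x, y = 2.0, 1.0, -5.0

M = rng.normal(0.0, 1.5, 100_000)
G = np.where(M <= a, M + x, M + y)     # the proxy-goal relationship differs by regime

# The relationship learned in the ordinary regime (M <= a) suggests "larger M is better".
ordinary = M <= a
print("corr(M, G) in the observed regime:", np.corrcoef(M[ordinary], G[ordinary])[0, 1])

# Selecting the states with the most extreme proxy values crosses the boundary.
top = M > np.quantile(M, 0.99)
print("mean G among top-1% proxy states:", G[top].mean())
print("mean G just below the boundary  :", G[(M > a - 1) & (M <= a)].mean())
```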
In the above cases of regressional and extremal Goodhart, there are issues with selection pressure for even simple systems when proxies are used. These two classes require selection pressure, but do not involve an intervention on the part of the regulator. When a regulator intervenes in the causal sense, it changes the state-space and allows an additional type of error that depends on the causal structure of the intervention.
3 Causal Goodhart
Causal Goodhart - When the causal path between the proxy and the goal is indirect, intervening can change the relationship between the proxy and the goal. If a regulator intervenes to maximize a metric, the causal pathway can change such that the proxy no longer tracks the goal. In such cases, extreme interventions can be less effective than more moderate ones, and further selection or intervention can be counterproductive.
In the earlier cases, the correlational issues found in Goodhart-like effects exist regardless of causal structure. Here, however, a model that ignores causal structure is problematic when the regulator’s actions themselves change the relationships. These take the form of Goodhart-like problems, but can be better discussed in terms of the effect of regulator actions on the causal structure. As a simple example, a regulator attempting to use a wind-speed model to build windmills must be careful not to ignore the effect the windmills themselves have on wind speeds. Building many windmills in a given area can alter the wind speed in the region enough to invalidate the earlier relationship.

Interestingly, this does not require any uncertainty, nor does it require an incorrect understanding of the relationships, unlike the earlier cases. Instead, the effect is induced by the regulator’s action. The exact way in which this occurs can vary, as discussed below.
Three Causal Goodhart Effects
There are three general cases which lead to causal Goodhart, as well as several others that may appear similar, but are actually regressional or extremal Goodhart effects. The below diagram illustrates three classes of causal Goodhart effects. In each, the regulator attempts to act based on a correlation between the measure and the goal by intervening. The node chosen for intervention is shaded.
[Figure: three causal diagrams, labeled Shared Cause, Intermediary, and Metric Manipulation, each containing nodes X, Goal, and Metric; the node chosen for intervention is shaded.]
Shared Cause Intervention - The regulator intervenes on a shared cause of the metric and the goal.

Simple Model: The regulator sets the shared cause X to a maximal value. The Metric no longer has a causal relationship to the Goal. The relationship between Metric and Goal now consists entirely of the combination of the error terms between each and X.

Shared cause intervention may restrict both the Goal and the Metric to a region favored by the regulator, at the cost of changing the metric’s relationship to the goal in a way that may be counterproductive for further optimization. Suppose, for example, that there are two tests administered to students which are correlated. If a teacher trains skills related to test taking generally, one of the traits which make the students likely to do well on both tests changes. Because of this, the remaining relationship between scores on the two tests is due to other factors, and the new correlation between scores is likely to be lower than the earlier correlation. This does not imply that the overall outcome is worse than without the optimization, but more extreme interventions can be less effective than less extreme ones. For this reason, further selection or intervention on the basis of the metric can be counterproductive.
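The following sketch mirrors the two-test example under assumed parameters (not from the paper): before the intervention the metric and goal are strongly correlated through the shared cause; pinning the shared cause at a high value raises the goal but leaves the metric nearly uninformative for any further optimization.

```python
import numpy as np

# Shared-cause-intervention sketch: X is a shared cause of both Metric and Goal,
# as in the first diagram above.
rng = np.random.default_rng(3)
n = 100_000

# Before intervention: X varies across states.
X = rng.normal(0.0, 1.0, n)
goal = X + rng.normal(0.0, 0.5, n)
metric = X + rng.normal(0.0, 0.5, n)
print("corr(metric, goal) before intervening:", np.corrcoef(metric, goal)[0, 1])

# The regulator intervenes, pinning the shared cause X at a high value.
X_fixed = np.full(n, 3.0)
goal_after = X_fixed + rng.normal(0.0, 0.5, n)
metric_after = X_fixed + rng.normal(0.0, 0.5, n)
print("mean goal after intervening:", goal_after.mean())      # higher than before
print("corr(metric, goal) after intervening:",
      np.corrcoef(metric_after, goal_after)[0, 1])             # approximately zero
```

As in the text, the intervention itself improves the goal here; the problem is that further selection on the metric among the intervened states no longer tracks the goal.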
Intermediary Intervention - The regulator intervenes on a variable in the causal chain connecting Goal to Metric.

Simple Model: The regulator sets X to a specific value. The relationship between the Goal and Metric now no longer exists; they are independent variables. This does not necessarily affect the Goal at all, but can serve to increase the value of the metric.
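A minimal sketch of this case, with assumed parameters (not from the paper): setting the intermediary inflates the metric and severs its relationship to the goal, which is left unchanged.

```python
import numpy as np

# Intermediary-intervention sketch: the causal chain is Goal -> X -> Metric.
rng = np.random.default_rng(4)
n = 100_000

goal = rng.normal(0.0, 1.0, n)
X_natural = goal + rng.normal(0.0, 0.3, n)           # intermediary driven by the goal
metric_natural = X_natural + rng.normal(0.0, 0.3, n)
print("corr before intervention:", np.corrcoef(metric_natural, goal)[0, 1])

# The regulator sets the intermediary directly to a favourable value.
X_set = np.full(n, 5.0)
metric_after = X_set + rng.normal(0.0, 0.3, n)        # metric rises...
print("mean metric after:", metric_after.mean())
print("mean goal after:  ", goal.mean())              # ...but the goal is untouched
print("corr after intervention:", np.corrcoef(metric_after, goal)[0, 1])
```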
Metric Manipulation - The regulator intervenes to set the Metric, without affecting other nodes. (In this case, it does not matter how the goal was related to X, as indicated in the figure.)

Simple Model: The regulator sets M to a specific value. The relationship between Metric and Goal now no longer holds.

Imagine, to give another example, that a teacher changes test scores or grades.⁴ This doesn’t contribute to the goal of learning; it simply changes M so that it is useless (or, if only some scores are changed, less useful) in measuring G.
Causal Goodhart effects can be due either to a mistaken or missing understanding of causal effects, or to simple lack of foresight. As with extremal Goodhart effects, mistakes about the causal connection can be due to insufficient data, or a change in regime. Importantly, the same mistakes can also lead to extremal or (worsened) regressional Goodhart effects if there is selection instead of intervention.
Non-Causal Goodhart Effects in Causal Systems
In the below three cases, the figures show where the regulator mistakes the form of the causal relationship. The dashed lines are the true causal paths, and the dotted lines are the assumed ones. If the regulator attempts to select on the basis of the assumed model, it can lead to worsened regressional Goodhart effects or extremal Goodhart effects, but not causal Goodhart effects.
[Figure: two causal diagrams, labeled Ignored Shared Cause and Ignored Intermediary, each containing nodes X, Goal, and Metric; dashed lines mark the true causal paths and dotted lines the assumed ones.]
Ignored Shared Cause - The regulator assumes the relationship between the Goal and Metric is a causal chain, or is direct, but there is instead a shared cause. When ignoring X, the metric is valid as a proxy for the goal.
Simple Model:

Metric ∼ normal(X, σ_m²)
Goal ∼ normal(X, σ_g²)    (4)
However, because of the conditional independence of Goal and Metric, for any fixed value of X they are uncorrelated. This means that our metric is less valid if there is selection. With selection of a fixed X, the case resembles extremal Goodhart, since the regime changed.
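The following simulation of equation (4), with assumed error variances and a wider-spread shared cause (values chosen only for illustration), shows the point numerically: the metric tracks the goal across all states, but once X is (effectively) fixed the two are nearly uncorrelated.

```python
import numpy as np

# Ignored-shared-cause sketch following (4): Metric ~ normal(X, 1) and
# Goal ~ normal(X, 1) for a shared cause X.
rng = np.random.default_rng(5)
n = 500_000

X = rng.normal(0.0, 2.0, n)
metric = rng.normal(X, 1.0)
goal = rng.normal(X, 1.0)

print("corr over all states:", np.corrcoef(metric, goal)[0, 1])

# Restrict attention to states where X is (nearly) fixed: the correlation vanishes,
# so selection that effectively pins X makes the metric uninformative about the goal.
band = np.abs(X - 2.0) < 0.05
print("corr with X held (almost) fixed:", np.corrcoef(metric[band], goal[band])[0, 1])
```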
⁴ This is assuming the teacher is the regulator; if the teacher is an agent, this is a different case, explored below.
Ignored Intermediary - The regulator assumes the relationship between the Goal and Metric is direct, but an intermediary exists which creates an additional source of noise.

Without intervention, the causal mistake here adds a term to the error, leading to the earlier cases of regressional and extremal Goodhart effects.
[Figure: two causal diagrams, labeled Ignored Metric Cause and Ignored Goal Cause, each containing nodes X, Goal, and Metric.]
Ignored Additional Cause - The metric is caused by multiple factors, of which the goal relates to only some. Alternatively, the goal is caused by multiple factors, of which the metric relates to only some. (No figure provided.)
Simple Model:

Metric ∼ normal(X + Goal, σ_m²)    (5)

or

Goal ∼ normal(X + Metric, σ_g²)    (6)
If X has a distribution that matches the assumed distribution of error in the relationship between the goal and the metric, this leads to worsened regressional Goodhart effects as defined above, because additional noise is due to X. In this simple model case the goal is caused by the metric, for example height causing wind speed. More commonly, the metric would be caused by some Y, which also causes the Goal. In such a case, the metric might be measured height, where the measurement itself can have error. On the other hand, if X is distributed differently than the assumed distribution of error in the relationship between Goal and Metric, there is a model insufficiency issue. If X contains some nonlinearity, it is a regime change. The earlier examples of extremal Goodhart effects are of these types. (More complex errors can lead to similar mistakes.)
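A short simulation of equation (5), using assumed unit variances for the goal and for the ignored cause X (illustrative values only), shows the worsened regressional effect: selection on the actual metric recovers noticeably less of the goal than the regulator’s assumed model predicts.

```python
import numpy as np

# Ignored-additional-cause sketch following (5): the ignored cause X acts as
# extra noise in the metric, worsening regressional Goodhart.
rng = np.random.default_rng(8)
n = 1_000_000

goal = rng.normal(0.0, 1.0, n)
X = rng.normal(0.0, 1.0, n)                     # additional cause the regulator ignores
metric_assumed = rng.normal(goal, 0.5)          # the relationship the regulator believes
metric_actual = rng.normal(X + goal, 0.5)       # the true generating process, per (5)

c = 2.0
print("E[Goal | assumed metric > c]:", goal[metric_assumed > c].mean())
print("E[Goal | actual metric  > c]:", goal[metric_actual > c].mean())
# Selecting on the actual metric yields a noticeably smaller goal value,
# because part of what is being selected for is the ignored cause X.
```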
4 Adversarial Goodhart
The discussion of causal relationships is important for a different class of situations where there are multiple actors. There are numerous such dynamics, but they can be classified as happening in one of two cases. First, the actor may have goals which the regulator is unaware of (or insufficiently wary about), and the agent can act independently of the regulator in a way that adversely affects the regulator’s goal. We refer to such cases as adversarial misalignment. Second, the regulator can use incentives to align the agent’s goals and use the agent’s actions as a way to optimize, while not acting themselves. We refer to these as “Cobra effects.” In all of these cases, which were called “adversarial Goodhart” in the previous work, other agents react in different ways to create Goodhart-like effects following any of the earlier paradigms. We introduce and formulate several specific cases of each.
Adversarial Misalignment Goodhart - The agent applies selection pressure knowing the regulator will apply different selection pressure on the basis of the metric.⁵

The adversarial misalignment failure can occur due to the agent creating extremal Goodhart effects, or by exacerbating always-present regressional Goodhart, or due to causal intervention by the agent which changes the effect of the regulator optimization.
Campbell’s Law - Agents select a metric knowing the choice of regulator metric. Agents can correlate their metric with the regulator’s metric, and select on their metric.⁶ This further reduces the usefulness of selection using the metric for achieving the original goal.
Simple Model: G_R and M_R are the regulator’s goal and metric. G_A and M_A are the agent’s goal and metric. (The agent selects M_A after seeing the regulator’s choice of metric.)

M_R = G_R + X
M_A = G_A · X    (7)
Here, the agent selects for values with high M_A, and the regulator’s later selection then creates a relationship between X and their goal, especially at the extremes. The agent does this by selecting for a metric such that even weak selection on M_A hijacks the regulator’s selection on M_R to achieve their goal. The agent’s choice of metric need not be a useful proxy for their goal absent the regulator’s action. In the example given, if X ∼ normal(μ, σ²), the correlation between G_A and M_A is zero over the full set of states, but becomes positive on the subspace selected by the regulator.
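The following sketch of equation (7) takes X, G_R, and G_A to be independent standard normals (the zero mean of X is the assumption under which the unconditional correlation vanishes): over all states, corr(G_A, M_A) is approximately zero, but on the subspace the regulator retains it is clearly positive.

```python
import numpy as np

# Adversarial / Campbell's-law sketch following (7). Assumes X, G_R, G_A are
# independent standard normals; a zero-mean X is what makes the overall
# correlation between G_A and M_A vanish.
rng = np.random.default_rng(6)
n = 1_000_000

X = rng.normal(0.0, 1.0, n)
G_R = rng.normal(0.0, 1.0, n)
G_A = rng.normal(0.0, 1.0, n)

M_R = G_R + X          # the regulator's proxy for its goal
M_A = G_A * X          # the metric the agent chooses after seeing M_R

print("corr(G_A, M_A) over all states:", np.corrcoef(G_A, M_A)[0, 1])

# The regulator selects states with a high value of its own metric,
# which implicitly selects for large positive X.
kept = M_R > 2.0
print("corr(G_A, M_A) on the regulator-selected subspace:",
      np.corrcoef(G_A[kept], M_A[kept])[0, 1])
```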
There are clearly further dynamics worth exploring, but this case serves to introduce the issues involved in adversarial conflict over metrics without incentives. When the regulator attempts to regulate via incentives, a new set of cases can occur, which are generally related to what is known as the “Cobra Effect” [7]. This is named after a supposed situation in colonial India where British authorities offered a reward for dead cobras. Instead of hunting cobras, however, some people bred their own cobras to kill in order to receive rewards. This not only failed to achieve the goal, but led to more cobras than before the reward was offered. Because of the (supposedly) historical meaning and popular usage, we differentiate between cobra effects that fit this model, which we will call normal cobra effects, and ones where the agent applies pressure in a non-causal fashion to create Goodhart effects, which we call non-causal cobra effects.

⁵ This is the case most closely related to Campbell’s law [6], which was originally stated as “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” In this case, the choice of measurement affects the outcome because agents attempt to corrupt the measure.

⁶ They could instead alter the causal structure to create the same effect. It seems unclear that this difference is critical to the dynamics considered.
Normal Cobra Effect - The regulator modifies the agent goal, usually via an incentive, to correlate it with the regulator metric. The agent then acts by changing the observed causal structure, due to incompletely aligned goals, in a way that creates a Goodhart effect.

Simple Model: The agent finds cause Y (an “Ignored Additional Cause”) which is unobserved or constant in the structure the regulator initially considered, and the agent changes the value of Y in order to maximize the metric. (This is a form of causal Goodhart using metric manipulation.)
[Figure: causal diagram for the Cobra Effect, containing nodes Goal, Metric, and Y.]
G_A,R = G_A,0 + M_R    (8)
The Cobra effect can also occur via an agent action that creates any of the above-mentioned causal Goodhart effects, namely shared cause, intermediary, or metric manipulation.
Non-Causal Cobra Effect - The regulator modifies the agent goal to make agent actions aligned with the regulator’s metric. Under selection pressure from the agent, extremal Goodhart effects occur or regressional Goodhart effects are worsened.

Simple Model: G_R and M_R are the regulator’s goal and metric. G_A,0 and G_A,R are the agent’s goal before and after regulator modification.
G_A,R = G_A,0 + M_R    (9)
When the agent applies selection pressure, it creates a Goodhart effect on the
regulator metric.
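To illustrate, the following sketch of equation (9) assumes the regulator’s metric is itself a noisy proxy (M_R = G_R + noise) and that the agent selects the states scoring highest on its modified goal; the distributions are illustrative assumptions, not values from the paper. The metric looks excellent on the chosen states while the regulator’s true goal improves by far less.

```python
import numpy as np

# Non-causal cobra-effect sketch following (9). Assumes G_R, G_A0, and the
# proxy noise are independent standard normals.
rng = np.random.default_rng(7)
n = 1_000_000

G_R = rng.normal(0.0, 1.0, n)          # regulator's true goal
M_R = G_R + rng.normal(0.0, 1.0, n)    # regulator's metric (imperfect proxy)
G_A0 = rng.normal(0.0, 1.0, n)         # agent's original, unrelated goal

G_AR = G_A0 + M_R                      # agent's goal after the incentive, per (9)

# The agent, not the regulator, now applies the selection pressure.
chosen = G_AR > np.quantile(G_AR, 0.99)

print("mean M_R among agent-chosen states:", M_R[chosen].mean())
print("mean G_R among agent-chosen states:", G_R[chosen].mean())
# The metric looks excellent on the chosen states, while the regulator's true
# goal improves by much less: the agent's optimization selects for the proxy's
# noise (and for its own goal) along with the regulator's goal.
```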
Conclusion
This paper represents an attempt to categorize a class of the simple statistical misalignments that occur both in any algorithmic system used for optimization, and in many human systems that rely on metrics for optimization. The dynamics highlighted are hopefully useful to explain many situations of interest in policy design, in machine learning, and in specific questions about AI alignment. In policy, these dynamics are commonly encountered but too rarely discussed clearly. In machine learning, these errors include extremal Goodhart effects due to using limited data and choosing overly parsimonious models, errors that occur due to myopic consideration of goals, and mistakes that occur when ignoring causality in a system. Finally, in AI alignment, these issues are fundamental to both aligning systems towards a goal, and assuring that the system’s metrics do not have perverse effects once the system begins optimizing for them.
References
[1] Charles E. Goodhart. Problems of Monetary Management: The U.K. Experience. Papers in Monetary Economics, Reserve Bank of Australia, 1975.
[2] Scott Garrabrant. Goodhart Taxonomy. LessWrong, December 30, 2017. https://www.lesserwrong.com/posts/EbFABnst8LsidYs5Y/goodhart-taxonomy
[3] Jeffery Rodamar (U.S. Department of Education). There ought to be a law! Campbell v. Goodhart. Significance, Vol. 15, 2018. doi:10.1111/j.1740-9713.2018.01205.x
[4] Nick Bostrom. Superintelligence: Paths, Dangers, Strategies. 2014.
[5] Gregory Lewis (Thrasymachus). Why the tails come apart. LessWrong.com, 2014. http://lesswrong.com/lw/km6/why_the_tails_come_apart/ ; also at The Polemical Medic, 2015. http://www.thepolemicalmedic.com/why-the-tails-come-apart/
[6] Donald T. Campbell. Assessing the impact of planned social change. Evaluation and Program Planning, 2(1): 67–90. doi:10.1016/0149-7189(79)90048-X
[7] Horst Siebert. Der Kobra-Effekt. Wie man Irrwege der Wirtschaftspolitik vermeidet. Munich: Deutsche Verlags-Anstalt, 2001. ISBN 3-421-05562-9.

Discussion

This is a nice summary of this paper in layman's terms: https://www.holistics.io/blog/four-types-goodharts-law/

**Goodhart's law** is often stated as, "When a measure becomes a target, it ceases to be a good measure", and it is named after the British economist Charles Goodhart, who is credited with expressing it in a 1975 article on monetary policy in the United Kingdom. At the time, "Goodhart's law was used to criticize the British Thatcher government for trying to conduct monetary policy on the basis of targets for broad and narrow money, but the law reflects a much more general phenomenon." Source: https://en.wikipedia.org/wiki/Goodhart%27s_law

If you like this paper, also check out this work by David Manheim titled Building Less Flawed Metrics: https://mpra.ub.uni-muenchen.de/90649/1/MPRA_paper_90649.pdf

>> "The four categories of Goodhart effects introduced by Garrabrant are 1) Regressional, where selection for an imperfect proxy necessarily also selects for noise, 2) Extremal, where selection for the metric pushes the state distribution into a region where old relationships no longer hold, 3) Causal, where an action on the part of the regulator causes the collapse, and 4) Adversarial, where an agent with different goals than the regulator causes the collapse. These varied forms often occur together, but defining them individually is useful. In doing so, this paper introduces and explains several subcategories which differ in important ways."

Goodhart effects indeed occur in many settings, and understanding these common failure modes can be extremely helpful:

>> "The dynamics highlighted are hopefully useful to explain many situations of interest in policy design, in machine learning, and in specific questions about AI alignment. In policy, these dynamics are commonly encountered but too-rarely discussed clearly. In machine learning, these errors include extremal Goodhart effects due to using limited data and choosing overly parsimonious models, errors that occur due to myopic consideration of goals, and mistakes that occur when ignoring causality in a system. Finally, in AI alignment, these issues are fundamental to both aligning systems towards a goal, and assuring that the system's metrics do not have perverse effects once the system begins optimizing for them."