http://journals.cambridge.orgDownloaded: 21 Apr 2014 IP address: 163.1.72.155
WHY WE NEED FRIENDLY AI
Luke Muehlhauser and Nick Bostrom
Humans will not always be the most intelligent
agents on Earth, the ones steering the future. What
will happen to us when we no longer play that role,
and how can we prepare for this transition?
The human level of intelligence is an evolutionary accident,
a small basecamp on a vast mountain side, far
below the highest ridges of intelligence allowed by physics.
If we were visited by extraterrestrials, these beings would
almost certainly be very much more intelligent and techno-
logically advanced than we are, and thus our future would
depend entirely on the content of their goals and desires.
But aliens are unlikely to make contact anytime soon. In
the near term, it seems more likely we will create our intel-
lectual successors. Computers far outperform humans in
many narrow niches (e.g. arithmetic and chess), and there
is reason to believe that similar large improvements over
human performance are possible for general reasoning and
technological development.
Though some doubt that machines can possess certain
mental properties like consciousness, the absence of such
mental properties would not prevent machines from becom-
ing vastly more able than humans to efficiently steer the
future in pursuit of their goals. As Alan Turing wrote, ‘...it
seems probable that once the machine thinking method
has started, it would not take long to outstrip our feeble
powers... At some stage therefore we should have to
expect the machines to take control...’
There is, of course, a risk in passing control of the future
to machines, for they may not share our values. This risk is
increased by two factors that may cause the transition from
human control to machine control to be quite sudden and
rapid: the possibilities of computing overhang and recursive
self-improvement.
doi:10.1017/S1477175613000316
© The Royal Institute of Philosophy, 2014
Think 36, Vol. 13 (Spring 2014)
What is computing overhang? Suppose that computing
power continues to double according to Moore’s law, but
figuring out the algorithms for human-like general intelli-
gence proves to be fiendishly difficult. When the software
for general intelligence is finally realized, there could exist
a ‘computing overhang’: tremendous amounts of cheap
computing power available to run human-level artificial intel-
ligences (AIs). AIs could be copied across the hardware
base, causing the AI population to quickly surpass the
human population. These digital minds might run thou-
sands or millions of times faster than human minds. AIs
might have further advantages, such as superior communi-
cation speed, transparency and self-editability, goal coordi-
nation, and improved rationality.
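The arithmetic behind a computing overhang can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `overhang_factor`, the two-year doubling period, and the delays tried below are hypothetical choices, not figures from the article.

```python
def overhang_factor(delay_years: float, doubling_period_years: float = 2.0) -> float:
    """Factor by which available computing power grows while the software lags.

    Assumes steady exponential growth (a Moore's-law-style doubling); the
    default doubling period of 2 years is an illustrative assumption.
    """
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (delay_years / doubling_period_years)


# Hypothetical delays between "enough hardware" and "working software".
for delay in (10, 20, 30):
    factor = overhang_factor(delay)
    print(f"{delay} years of delay: about {factor:,.0f}x more computing power")
```

Under these assumptions, every additional doubling period of delay doubles the hardware that is waiting when the software finally arrives, which is why a late software breakthrough could be followed by a very abrupt change.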
And what is recursive self-improvement? We can predict
that advanced AIs will have instrumental goals to preserve
themselves, acquire resources, and self-improve, because
those goals are useful intermediaries to the achievement of
almost any set of final goals. Thus, when we build an AI
that is as skilled as we are at the task of designing AI systems,
we may thereby initiate a rapid, AI-motivated cascade of self-
improvement cycles. Now when the AI improves itself, it
improves the intelligence that does the improving, quickly
leaving the human level of intelligence far behind.
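The difference between ordinary improvement and recursive self-improvement can be illustrated with a toy model. It is a sketch, not a forecast: the function names, the notion of a scalar "capability", and the rates used are all assumptions introduced here for illustration.

```python
def external_improvement(capability: float, gain_per_cycle: float, cycles: int) -> float:
    """Designers of fixed ability add the same absolute gain every cycle."""
    for _ in range(cycles):
        capability += gain_per_cycle
    return capability


def recursive_improvement(capability: float, gain_rate: float, cycles: int) -> float:
    """The system improves itself, so each gain scales with current capability."""
    for _ in range(cycles):
        capability += gain_rate * capability
    return capability


# Hypothetical starting point: capability 1.0 stands for "human level".
for cycles in (5, 10, 20):
    linear = external_improvement(1.0, 0.5, cycles)
    compounding = recursive_improvement(1.0, 0.5, cycles)
    print(f"{cycles} cycles: external {linear:.1f}, recursive {compounding:.1f}")
```

In the first function the improver stays the same, so progress is linear. In the second, the thing being improved is also the thing doing the improving, so progress compounds. Real systems need not follow either curve; the sketch only shows why the article expects the second regime to leave the starting level behind quickly.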
A superintelligent AI might thus quickly become superior
to humanity in harvesting resources, manufacturing, scienti-
fic discovery, social aptitude, and strategic action, among
other abilities. We might not be in a position to negotiate
with it or its descendants, just as chimpanzees are not in a
position to negotiate with humans.
At the same time, the convergent instrumental goal of
acquiring resources poses a threat to humanity, for it
means that a superintelligent machine with almost any final
goal (say, of solving the Riemann hypothesis) would want
to take the resources we depend on for its own use. Such
an AI ‘does not love you, nor does it hate you, but you are
made of atoms it can use for something else’.[1] Moreover,
the AI would correctly recognize that humans do not
want their resources used for the AI’s purposes, and that
humans therefore pose a threat to the fulfillment of its
goals: a threat to be mitigated however possible.
But because we will create our own successors, we may
have the ability to influence their goals and make them
friendly to our concerns. The problem of encoding human
(or at least humane) values into an AI’s utility function is a
challenging one, but it may be possible. If we can build
such a ‘Friendly AI,’ we may not only avert catastrophe, but
also use the powers of machine superintelligence to do
enormous good.
Many scientific naturalists accept that machines can be
far more intelligent and powerful than humans, and that this
could pose a danger for the things we value. Still, they may
have objections to the line of thought we have developed
so far. Philosopher David Chalmers has responded to
many of these objections;[2] we will respond to only a few of
them here.
First: why not just keep potentially dangerous AIs safely
confined, e.g. without access to the internet? This may
sound promising, but there are many complications.[3] In
general, such solutions would pit human intelligence
against superhuman intelligence, and we shouldn’t be confident
the former would prevail. Moreover, such methods
may only delay AI risk without preventing it. If one AI devel-
opment team has built a human-level or superintelligent AI
and successfully confined it, then other AI development
teams are probably not far behind them, and these other
teams may not be as cautious. Governments will recognize
that human-level AI is a powerful tool, and the race to be
the first nation with such a great advantage may incentivize
development speed over development safety. (Confinement
measures may, however, be useful as an extra precaution
during the development phase of safe AI.)
Second: some have suggested that advanced AIs’ greater
intelligence will cause them to be more moral than we are; in
that case, who are we to protest when they do not respect
our primitive values? That would be downright immoral!
Intelligent search for instrumentally optimal plans,
however, can be performed in the service of any goal.
Intelligence and motivation are in this sense logically
orthogonal axes along which possible artificial intellects
can vary freely. The imputed connection between intelli-
gence and morality is therefore sheer anthropomorphism.
(It is an anthropomorphism that does not even hold true for
humans: it is easy to find humans who are quite intelligent
but immoral, or who are unintelligent but thoroughly decent.)
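The claim that search for optimal plans can serve any goal can be made concrete with a small sketch. Everything named here (`best_plan`, the two candidate plans, and the two goal functions) is hypothetical and chosen only to illustrate the point.

```python
from typing import Callable, Dict, Iterable

# Hypothetical outcome of a plan: quantity name -> amount produced.
Plan = Dict[str, int]


def best_plan(plans: Iterable[Plan], utility: Callable[[Plan], float]) -> Plan:
    """Return the plan that scores highest under whatever goal is supplied."""
    return max(plans, key=utility)


def paperclip_goal(plan: Plan) -> float:
    """An arbitrary final goal that is easy to specify."""
    return plan["paperclips"]


def welfare_goal(plan: Plan) -> float:
    """A stand-in for a goal humans might endorse."""
    return plan["human_welfare"]


plans = [
    {"paperclips": 1000, "human_welfare": 0},
    {"paperclips": 10, "human_welfare": 50},
]

print(best_plan(plans, paperclip_goal))
print(best_plan(plans, welfare_goal))
```

The search procedure `best_plan` is identical in both calls; only the goal passed to it changes. Making the search more capable would make it better at finding high-scoring plans, but nothing in the search itself selects which goal is being scored.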
Economist Robin Hanson suggests that inter-generational
conflicts analogous to the ones that could arise between
humans and machines are common. Generations old and new
compete for resources, and the older generation often wants
to control the values of the younger generation. The values of
the younger generation end up dominating as the older gener-
ation passes away. Must we be so selfish as to insist that the
values of Homo sapiens dominate the solar system forever?
Along a similar line, the roboticist Hans Moravec once
suggested that while we should expect that future robotic
corporations will eventually overrun humanity and expropriate
our resources, we should think of these robotic descendants
as our ‘mind children.’ Framed in this way, Moravec
thought, the prospect might seem more attractive.
It must be said that a scenario in which the children kill
and cannibalize their parents is not everyone’s idea of a
happy family dynamic. But even if we were willing to sacri-
fice ourselves (and our fellow human beings?) for the sake
of some ‘greater good’, we would still have to put in hard
work to ensure that the result would be something more
worthwhile than masses of computer hardware used only
to evaluate the Riemann hypothesis (or to calculate the
decimals of pi, or to manufacture as many paperclips as
possible, or some other arbitrary goal that might be easier
to specify than what humans value).
There is, however, one good reason not to insist that
superhuman machines be made to share all our current
values. Suppose that the ancient Greeks had been the
ones to face the transition from human to machine control,
and they coded their own values as the machines’ final
goal. From our perspective, this would have resulted in
tragedy, for we tend to believe we have seen moral progress
since the ancient Greeks (e.g. the prohibition of
slavery). But presumably we are still far from perfection.
We therefore need to allow for continued moral progress.
One proposed solution is to give machines an algorithm
for figuring out what our values would be if we knew more,
were wiser, were more the people we wished to be, and so
on. Philosophers have wrestled with this approach to the
theory of values for decades, and it may be a productive
solution for machine ethics.
Third: others object that we are too far from the transition
from human to machine control to work on the problem
now. But we must remember that economic incentives
favor development speed over development safety.
Moreover, our scientific curiosity can sometimes overwhelm
other considerations such as safety. To quote J. Robert
Oppenheimer, the physicist who headed the Manhattan
Project: ‘When you see something that is technically sweet,
you go ahead and do it and you argue about what to do
about it only after you have had your technical success.
That is the way it was with the atomic bomb.’[4]
Still, one might ask: What can we do about the problem
of AI risk when we know so little about the design of future
AIs? For a start, we can do the kind of work currently per-
formed by the two research institutes currently working
most directly on this difficult problem: the Machine
Intelligence Research Institute in Berkeley and the Future
of Humanity Institute at Oxford University. This includes:
1. Strategic research. Which types of technological
development are risk-increasing or risk-
decreasing, and how can we encourage
governments and corporations to shift funding
from the former to the latter? What is the
expected value of certain kinds of research, or
of certain kinds of engagement with
governments and the public? What can we do
to reduce the risk of an AI arms race? How
does AI risk compare to risks from nuclear
weapons, biotechnology, near earth objects,
etc.? Can economic models predict anything
about the impact of AI technologies? Can we
develop technological forecasting methods
capable of giving advance warning of the
invention of AI?
2. Technical research. Can we develop safe
confinement methods for powerful AIs? How
can an agent with a desirable-to-humans utility
function maintain its desirable goals during
updates to the ontology over which it has
preferences? How can we extract a coherent
utility function from inconsistent human
behavior, and use it to inform an AI’s own
utility function? Can we develop an advanced
AI that will answer questions but not manifest
the dangerous capacities of a superintelligent
agent?
3. Raising awareness. Outreach to researchers,
philanthropists, and the public can attract more
monetary and human capital with which to
attack the problem.
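One reason the technical question about extracting a coherent utility function from inconsistent human behavior is hard can be shown with a short sketch. If observed choices contain a preference cycle (A over B, B over C, C over A), no assignment of numbers u(A) > u(B) > u(C) > u(A) can exist, so no utility function reproduces them exactly. The function name `intransitive_triples` and the example choices are hypothetical.

```python
from itertools import permutations
from typing import Iterable, List, Set, Tuple

# One observed choice: (preferred option, rejected option).
Choice = Tuple[str, str]


def intransitive_triples(choices: Iterable[Choice]) -> List[Tuple[str, str, str]]:
    """Find three-option preference cycles in a set of pairwise choices."""
    prefers: Set[Choice] = set(choices)
    options = sorted({option for pair in prefers for option in pair})
    cycles = []
    for a, b, c in permutations(options, 3):
        # Report each cycle once, using the rotation that starts at the
        # alphabetically smallest option.
        if a < b and a < c and (a, b) in prefers and (b, c) in prefers and (c, a) in prefers:
            cycles.append((a, b, c))
    return cycles


# Hypothetical observations of one person's choices.
observed = [("tea", "coffee"), ("coffee", "water"), ("water", "tea")]
print(intransitive_triples(observed))
```

Detecting such cycles is only a first step. Deciding which of the conflicting choices should inform an AI's utility function is the open problem the list above points to, and this sketch does not attempt to solve it.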
The quest for AI has come a long way. Computer scientists
and other researchers must begin to take the implications
of AI more seriously.
Luke Muehlhauser is Executive Director of the Machine
Intelligence Research Institute. Nick Bostrom is Director of
the Future of Humanity Institute at the University of Oxford.
luke@intelligence.org
Notes
[1] E. Yudkowsky, ‘AI as a positive and a negative factor in
global risk’, Global Catastrophic Risks (eds) N. Bostrom and
M. Cirkovic (New York: Oxford University Press, 2008).
[2] D. Chalmers, ‘The Singularity: a reply to commentators’,
Journal of Consciousness Studies, vol. 19, nos. 7–8 (2012),
141–167.
[3] S. Armstrong, A. Sandberg, N. Bostrom, ‘Thinking inside
the box: Using and controlling Oracle AI’, Minds and
Machines, vol. 22, no. 4 (2012), 299–324.
[4] Robert Jungk, Brighter than a Thousand Suns: A Personal
History of the Atomic Scientists, trans. James Cleugh
(New York: Harcourt Harvest, 1958), 296.

Discussion

Elon Musk, Sam Altman, and a group of computer scientists founded OpenAI with the goal of creating **friendly AIs**. OpenAI is a non-profit artificial intelligence research company whose mission is to build safe and friendly AIs. Its focus is on creating a positive long-term impact for humanity and ensuring that AI's benefits are as widely and evenly distributed as possible. You can learn more here: [About OpenAI](https://openai.com/about/)

> **TL;DR:**
> 1. The transition to an AI-controlled era could be more abrupt than we expect
> 2. Human-level AIs could have goals of self-improvement independent of human values
> 3. Confinement of potentially dangerous AIs does not solve the problem
> 4. We need to code morals and ethics into AIs but leave degrees of freedom for morals to evolve
> 5. Prepare the transition as soon as possible by raising awareness and focusing research on friendly AIs

We should raise awareness about AI and the threats it poses to humanity, focus on developing friendly AIs, and prepare the transition as soon as possible. This type of work has already started at research institutions such as the [Machine Intelligence Research Institute in Berkeley](https://intelligence.org/) and the [Future of Humanity Institute at Oxford University](https://www.fhi.ox.ac.uk/), and at companies like [OpenAI](http://openai.com).

**One proposed solution for a better transition to human-level AIs is confinement, i.e. not giving AIs access to the internet.** Some arguments against the confinement of AIs are:

1. It depends on all AI research groups being careful to keep their AIs confined
2. When governments understand the power of AIs and the advantage they represent, they will incentivize development speed over safety

I'd like to leave a comment regarding the use of a *utility function* as the exclusive guideline to what a machine will do.
To me this seems to be compatible only with a strictly utilitarian world view, and I feel as though the all-consuming general AI is just Nozick's [utility monster](https://en.wikipedia.org/wiki/Utility_monster) thought experiment made real. Which is to say that a utility function is probably not the ideal way to build *general AI* (although it works very well for specific AI applications, such as classification and learning algorithms).

Human-level AIs might not share human values. The risks associated with passing control to human-level AIs are aggravated by the following two factors:

- **Computing Overhang:** Assume that computing power continues to increase according to Moore's Law and that it takes us more time than expected to create human-level AIs. When we finally develop human-level AIs, there could be an excess of computing power available that would allow AIs to quickly replicate themselves and thus surpass the human population very abruptly.
- **Recursive Self-improvement:** Assume that AIs have goals to preserve themselves and self-improve. When AIs start replicating and improving themselves, their intelligence level could quickly surpass human-level intelligence.

Their level of intelligence could be so far superior to ours that it would be impossible for us to negotiate with them, just as it is impossible for chimpanzees to negotiate with humans. Superintelligent AIs will want to pursue their own goals independently of humans.

**Go, considered the most complex board game ever invented and famously difficult for computers to crack, has already been mastered by computers.** In 2015 AlphaGo first defeated a professional human player, and in March 2016 it went on to beat one of the Go world's top players, Lee Sedol.
You can read more about it here:

- [Google reveals secret test of AI bot to beat top Go players](http://www.nature.com/news/google-reveals-secret-test-of-ai-bot-to-beat-top-go-players-1.21253)
- [Google’s AI Wins Fifth And Final Game Against Go Genius Lee Sedol](https://www.wired.com/2016/03/googles-ai-wins-fifth-final-game-go-genius-lee-sedol/)
- [AlphaGo](https://en.wikipedia.org/wiki/AlphaGo)

Renowned scientists, such as Stephen Hawking and Stuart Russell, believe that AI poses a significant threat to the human species. If human-level AIs have the ability to redesign and improve themselves at an ever-increasing rate, then we could witness an **unstoppable "intelligence explosion" that could lead to human extinction.**

AIs could have no definition of morality, or a completely different one. We need to code some of our values into AIs, but we do not want to dictate all of their values, because morals tend to evolve and improve with time. A possible solution to this problem would be to give machines an algorithm for figuring out what human values would be if we were wiser. This is an interesting philosophical problem, and it could be a good solution for machine morals and ethics.

I think that the formulation of this risk is often made a bit too strong. Yes, the risk exists, but the development and deployment of a general AI would not automatically imply all of this. That assumes that several well-known challenges and limitations in distributed systems are solved by general AI. Assuming that such general AI is deployed, there is no guarantee that a population of (small) independent AIs is as "good" as a single super-intelligent and highly distributed AI (running everywhere at the same time, à la Skynet).
A highly distributed AI would need to deal with limitations such as the [CAP Theorem](https://en.wikipedia.org/wiki/CAP_theorem), while a population of AIs would need to deal with potentially malicious instances of the "AI population" (formally speaking, they'd need things like [Byzantine Fault Tolerance](https://en.wikipedia.org/wiki/Byzantine_fault_tolerance)). I'm curious to know whether there is any conceptual work in this direction (what are the fundamental computational limitations of general AI?). Is there actual research happening in this area? I'd be very interested to read more about that.

*“Let us now assume, for the sake of argument, that these machines are a genuine possibility, and look at the consequences of constructing them. To do so would of course meet with great opposition, unless we have advanced greatly in religious toleration from the days of Galileo. There would be great opposition from the intellectuals who were afraid of being put out of a job. It is probable though that the intellectuals would be mistaken about this. There would be plenty to do, trying to understand what the machines were trying to say, i.e. in trying to keep one’s intelligence up to the standard set by the machines, for it seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. There would be no question of the machines dying, and they would be able to converse with each other to sharpen their wits. At some stage therefore we should have to expect the machines to take control, in the way that is mentioned in Samuel Butler’s ‘Erewhon’.”*

**– Alan Turing, “Intelligent Machinery, A heretical theory”**

You can read Turing's paper here: [Intelligent Machinery, A heretical theory](http://fermatslibrary.com/s/intelligent-machinery-a-heretical-theory)