Fermat's Library | Why We Need Friendly AI annotated/explained version.


http://journals.cambridge.orgDownloaded: 21 Apr 2014 IP address: 163.1.72.155

WHY WE NEED FRIENDLY AI

Luke Muehlhauser and Nick Bostrom

Humans will not always be the most intelligent agents on Earth, the ones steering the future. What will happen to us when we no longer play that role, and how can we prepare for this transition?

The human level of intelligence is an evolutionary accident – a small basecamp on a vast mountain side, far below the highest ridges of intelligence allowed by physics. If we were visited by extraterrestrials, these beings would almost certainly be very much more intelligent and technologically advanced than we are, and thus our future would depend entirely on the content of their goals and desires. But aliens are unlikely to make contact anytime soon. In the near term, it seems more likely we will create our intellectual successors. Computers far outperform humans in many narrow niches (e.g. arithmetic and chess), and there is reason to believe that similar large improvements over human performance are possible for general reasoning and technological development.

Though some doubt that machines can possess certain mental properties like consciousness, the absence of such mental properties would not prevent machines from becoming vastly more able than humans to efficiently steer the future in pursuit of their goals. As Alan Turing wrote, ‘...it seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers... At some stage therefore we should have to expect the machines to take control...’

There is, of course, a risk in passing control of the future to machines, for they may not share our values. This risk is increased by two factors that may cause the transition from human control to machine control to be quite sudden and rapid: the possibilities of computing overhang and recursive self-improvement.

doi:10.1017/S1477175613000316

The Royal Institute of Philosophy, 2014

Think 36, Vol. 13 (Spring 2014)

What is computing overhang? Suppose that computing power continues to double according to Moore’s law, but figuring out the algorithms for human-like general intelligence proves to be fiendishly difficult. When the software for general intelligence is finally realized, there could exist a ‘computing overhang’: tremendous amounts of cheap computing power available to run human-level artificial intelligences (AIs). AIs could be copied across the hardware base, causing the AI population to quickly surpass the human population. These digital minds might run thousands or millions of times faster than human minds. AIs might have further advantages, such as superior communication speed, transparency and self-editability, goal coordination, and improved rationality.
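
The overhang argument is compound-growth arithmetic, and a few lines of Python make it concrete. This is a minimal sketch under loudly hypothetical assumptions. Compute per dollar doubles every two years. The software for human-level AI arrives some number of years after hardware first became sufficient. One AI copy initially costs one unit of a fixed hardware budget. The function names and all numbers are illustrative and are not taken from the paper.

```python
def overhang_factor(delay_years: float, doubling_period_years: float = 2.0) -> float:
    """Factor by which compute per dollar has grown after `delay_years`,
    assuming it doubles every `doubling_period_years` (hypothetical rate)."""
    return 2 ** (delay_years / doubling_period_years)


def copies_runnable(budget: float, cost_per_copy: float) -> int:
    """Whole number of AI copies a fixed hardware budget can host."""
    return int(budget // cost_per_copy)


# Hypothetical scenario: one copy costs 1.0 budget unit in the year the
# hardware first becomes sufficient; the software arrives `delay` years later.
BUDGET = 1000.0
for delay in (0, 10, 20):
    factor = overhang_factor(delay)
    cost = 1.0 / factor  # each copy is `factor` times cheaper by then
    print(f"delay={delay:>2} yrs  overhang x{factor:,.0f}  "
          f"copies: {copies_runnable(BUDGET, cost):,}")
```

Under these made-up numbers, a 20-year software delay makes each copy about a thousand times cheaper. The same budget then hosts over a million copies instead of a thousand, and that jump is the kind of sudden transition the authors describe.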

And what is recursive self-improvement? We can predict that advanced AIs will have instrumental goals to preserve themselves, acquire resources, and self-improve, because those goals are useful intermediaries to the achievement of almost any set of final goals. Thus, when we build an AI that is as skilled as we are at the task of designing AI systems, we may thereby initiate a rapid, AI-motivated cascade of self-improvement cycles. Now when the AI improves itself, it improves the intelligence that does the improving, quickly leaving the human level of intelligence far behind.
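
The phrase ‘it improves the intelligence that does the improving’ is the crux, and a deliberately crude toy model shows why it matters. The sketch makes three assumptions for illustration only. Design ability can be summarized by a single number. Human designers sit at a fixed level of 1.0. Each improvement cycle adds a fixed fraction of the designer's ability. `HUMAN_LEVEL`, `simulate`, and the 10% rate are hypothetical, and nothing here is a forecast.

```python
HUMAN_LEVEL = 1.0  # hypothetical: human design ability normalized to 1.0


def simulate(cycles: int, rate: float = 0.1, recursive: bool = True) -> list[float]:
    """Return capability after each of `cycles` improvement cycles.

    Each cycle adds `rate * designer_ability`. Without recursion the designer
    is a human team of fixed ability, so growth is linear. With recursion the
    AI is its own designer, so its current level feeds back and growth compounds.
    """
    level = HUMAN_LEVEL  # start exactly as skilled at AI design as we are
    history = [level]
    for _ in range(cycles):
        designer = level if recursive else HUMAN_LEVEL
        level += rate * designer
        history.append(level)
    return history


human_driven = simulate(50, recursive=False)
self_driven = simulate(50, recursive=True)
print(f"after 50 cycles: human-driven {human_driven[-1]:.1f}, "
      f"self-improving {self_driven[-1]:.1f}")
```

The two runs start identically. The only difference is whether the designer's ability stays frozen at the human level or tracks the system's own level. That single change turns linear growth (6 after 50 cycles) into compounding growth (1.1^50, roughly 117).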

A superintelligent AI might thus quickly become superior to humanity in harvesting resources, manufacturing, scientific discovery, social aptitude, and strategic action, among other abilities. We might not be in a position to negotiate with it or its descendants, just as chimpanzees are not in a position to negotiate with humans.

At the same time, the convergent instrumental goal of acquiring resources poses a threat to humanity, for it means that a superintelligent machine with almost any final goal (say, of solving the Riemann hypothesis) would want to take the resources we depend on for its own use. Such an AI ‘does not love you, nor does it hate you, but you are made of atoms it can use for something else’. Moreover, the AI would correctly recognize that humans do not want their resources used for the AI’s purposes, and that humans therefore pose a threat to the fulfillment of its goals – a threat to be mitigated however possible.
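
The convergence claim can be checked in a toy planning problem. This sketch illustrates the idea and is not the authors' model. An agent has a fixed number of steps. At each step it either acquires one more unit of resources or works on its final goal, and work produces progress proportional to the resources it holds. The goals and their efficiencies (`GOALS`), the horizon, and the helper names are all hypothetical. The goals echo the paper's Riemann-hypothesis, pi, and paperclip examples.

```python
from itertools import product

HORIZON = 5  # hypothetical number of steps the agent plans over

# Hypothetical final goals: progress per unit of resource per step of work.
GOALS = {
    "solve the Riemann hypothesis": 0.5,
    "calculate the decimals of pi": 2.0,
    "manufacture paperclips": 3.0,
}


def score(plan: tuple[str, ...], efficiency: float) -> float:
    """Total progress on the final goal achieved by a sequence of actions."""
    resources, progress = 1.0, 0.0
    for action in plan:
        if action == "acquire":
            resources += 1.0  # gain one unit of resources (hardware, energy, ...)
        else:  # "work"
            progress += efficiency * resources  # more resources, more progress
    return progress


def best_plan(efficiency: float) -> tuple[str, ...]:
    """Exhaustively search all 2**HORIZON action sequences for the best one."""
    plans = product(("acquire", "work"), repeat=HORIZON)
    return max(plans, key=lambda plan: score(plan, efficiency))


for goal, efficiency in GOALS.items():
    print(f"{goal:<30} -> {best_plan(efficiency)}")
```

For every goal, the exhaustive search returns the same plan: acquire resources for the first two steps, then work for the last three. The efficiency only rescales the score and never changes which plan is best. In that sense, resource acquisition is a convergent instrumental goal.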

But because we will create our own successors, we may have the ability to influence their goals and make them friendly to our concerns. The problem of encoding human (or at least humane) values into an AI’s utility function is a challenging one, but it may be possible. If we can build such a ‘Friendly AI,’ we may not only avert catastrophe, but also use the powers of machine superintelligence to do enormous good.

Many scientific naturalists accept that machines can be far more intelligent and powerful than humans, and that this could pose a danger for the things we value. Still, they may have objections to the line of thought we have developed so far. Philosopher David Chalmers has responded to many of these objections; we will respond to only a few of them here.

First: why not just keep potentially dangerous AIs safely confined, e.g. without access to the internet? This may sound promising, but there are many complications. In general, such solutions would pit human intelligence against superhuman intelligence, and we shouldn’t be confident the former would prevail. Moreover, such methods may only delay AI risk without preventing it. If one AI development team has built a human-level or superintelligent AI and successfully confined it, then other AI development teams are probably not far behind them, and these other teams may not be as cautious. Governments will recognize that human-level AI is a powerful tool, and the race to be the first nation with such a great advantage may incentivize development speed over development safety. (Confinement measures may, however, be useful as an extra precaution during the development phase of safe AI.)


Second: some have suggested that advanced AIs’ greater intelligence will cause them to be more moral than we are; in that case, who are we to protest when they do not respect our primitive values? That would be downright immoral!

Intelligent search for instrumentally optimal plans, however, can be performed in the service of any goal. Intelligence and motivation are in this sense logically orthogonal axes along which possible artificial intellects can vary freely. The imputed connection between intelligence and morality is therefore sheer anthropomorphism. (It is an anthropomorphism that does not even hold true for humans: it is easy to find humans who are quite intelligent but immoral, or who are unintelligent but thoroughly decent.)
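
The orthogonality point can be made concrete by building agents from two independent knobs. In this hypothetical sketch, `search_budget` stands in for intelligence (how many candidate plans the agent can evaluate) and `goal` stands in for its final motivation. The plan space, the two objectives in `OBJECTIVES`, and their target values are invented for illustration. Raising the budget makes the agent better at whichever goal it was given, and nothing in the search code pulls it toward one goal rather than the other.

```python
from typing import Callable

Goal = Callable[[int], float]

# Hypothetical goals over the same space of 1,000 candidate plans (0..999).
# Each goal simply prefers plans close to its own target value.
OBJECTIVES: dict[str, Goal] = {
    "cure diseases": lambda plan: -abs(plan - 737),
    "manufacture paperclips": lambda plan: -abs(plan - 263),
}


def make_agent(search_budget: int, goal: Goal) -> Callable[[], int]:
    """Combine a capability level and a goal; the two never interact."""
    def act() -> int:
        # Evaluate about `search_budget` evenly spaced plans; keep the best.
        step = max(1, 1000 // search_budget)
        return max(range(0, 1000, step), key=goal)
    return act


for name, goal in OBJECTIVES.items():
    for budget in (2, 10, 1000):
        plan = make_agent(budget, goal)()
        print(f"{name:<24} budget={budget:<5} plan={plan:<4} score={goal(plan)}")
```

Both agents improve in lockstep as the budget grows, scoring -237, -37, and 0 for either goal. Capability and motivation behave as separate axes.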

Economist Robin Hanson suggests that inter-generational conflicts analogous to the ones that could arise between humans and machines are common. Generations old and new compete for resources, and the older generation often wants to control the values of the younger generation. The values of the younger generation end up dominating as the older generation passes away. Must we be so selfish as to insist that the values of Homo sapiens dominate the solar system forever?

Along a similar line, the roboticist Hans Moravec once suggested that while we should expect that future robotic corporations will eventually overrun humanity and expropriate our resources, we should think of these robotic descendants as our ‘mind children.’ Framed in this way, Moravec thought, the prospect might seem more attractive.

It must be said that a scenario in which the children kill and cannibalize their parents is not everyone’s idea of a happy family dynamic. But even if we were willing to sacrifice ourselves (and our fellow human beings?) for the sake of some ‘greater good,’ we would still have to put in hard work to ensure that the result would be something more worthwhile than masses of computer hardware used only to evaluate the Riemann hypothesis (or to calculate the decimals of pi, or to manufacture as many paperclips as possible, or some other arbitrary goal that might be easier to specify than what humans value).


There is, however, one good reason not to insist that superhuman machines be made to share all our current values. Suppose that the ancient Greeks had been the ones to face the transition from human to machine control, and they coded their own values as the machines’ final goal. From our perspective, this would have resulted in tragedy, for we tend to believe we have seen moral progress since the ancient Greeks (e.g. the prohibition of slavery). But presumably we are still far from perfection. We therefore need to allow for continued moral progress.

One proposed solution is to give machines an algorithm for figuring out what our values would be if we knew more, were wiser, were more the people we wished to be, and so on. Philosophers have wrestled with this approach to the theory of values for decades, and it may be a productive solution for machine ethics.

Third: others object that we are too far from the transition from human to machine control to work on the problem now. But we must remember that economic incentives favor development speed over development safety. Moreover, our scientific curiosity can sometimes overwhelm other considerations such as safety. To quote J. Robert Oppenheimer, the physicist who headed the Manhattan Project: ‘When you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success. That is the way it was with the atomic bomb.’

Still, one might ask: What can we do about the problem of AI risk when we know so little about the design of future AIs? For a start, we can do the kind of work currently performed by the two research institutes working most directly on this difficult problem: the Machine Intelligence Research Institute in Berkeley and the Future of Humanity Institute at Oxford University. This includes:

1. Strategic research. Which types of technological development are risk-increasing or risk-decreasing, and how can we encourage governments and corporations to shift funding from the former to the latter? What is the expected value of certain kinds of research, or of certain kinds of engagement with governments and the public? What can we do to reduce the risk of an AI arms race? How does AI risk compare to risks from nuclear weapons, biotechnology, near-Earth objects, etc.? Can economic models predict anything about the impact of AI technologies? Can we develop technological forecasting methods capable of giving advance warning of the invention of AI?

2. Technical research. Can we develop safe confinement methods for powerful AIs? How can an agent with a desirable-to-humans utility function maintain its desirable goals during updates to the ontology over which it has preferences? How can we extract a coherent utility function from inconsistent human behavior, and use it to inform an AI’s own utility function? Can we develop an advanced AI that will answer questions but not manifest the dangerous capacities of a superintelligent agent?

3. Raising awareness. Outreach to researchers, philanthropists, and the public can attract more monetary and human capital with which to attack the problem.
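
Item 2 asks how to extract a coherent utility function from inconsistent human behavior. One small piece of that question can be sketched with Python's standard `graphlib` module (Python 3.9+). The sketch assumes behavior has already been reduced to clean pairwise choices, and it looks only for an ordinal ranking. If the choices are transitive, a topological sort yields a utility consistent with all of them. If they contain a cycle (a over b, b over c, c over a), no utility function can reproduce every observation. `ordinal_utility` is a hypothetical helper. It only detects inconsistency rather than resolving it, and resolving it is the hard part of the research question.

```python
from graphlib import CycleError, TopologicalSorter


def ordinal_utility(choices: list[tuple[str, str]]) -> dict[str, int]:
    """Fit an ordinal utility to observed (winner, loser) pairwise choices.

    Returns a rank per option with every winner ranked above its loser.
    Raises graphlib.CycleError if the choices are intransitive, in which
    case no utility function is consistent with all of them.
    """
    graph: dict[str, set[str]] = {}
    for winner, loser in choices:
        graph.setdefault(winner, set()).add(loser)  # loser must come first
        graph.setdefault(loser, set())
    order = TopologicalSorter(graph).static_order()  # least preferred first
    return {option: rank for rank, option in enumerate(order)}


consistent = [("tea", "coffee"), ("coffee", "water")]
print(ordinal_utility(consistent))  # {'water': 0, 'coffee': 1, 'tea': 2}

try:
    ordinal_utility(consistent + [("water", "tea")])  # closes a cycle
except CycleError as err:
    print("no consistent utility; cycle:", err.args[1])
```

The design choice to stop at detection is deliberate. Choosing which observed choices to discard or reweight is a normative decision, and it is exactly what item 2 leaves open.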

The quest for AI has come a long way. Computer scientists and other researchers must begin to take the implications of AI more seriously.

Luke Muehlhauser is Executive Director of the Machine Intelligence Research Institute. Nick Bostrom is Director of the Future of Humanity Institute at the University of Oxford.

luke@intelligence.org


Notes

E. Yudkowsky, ‘Artificial intelligence as a positive and negative factor in global risk’, Global Catastrophic Risks (eds) N. Bostrom and M. Cirkovic (New York: Oxford University Press, 2008).

D. Chalmers, ‘The Singularity: a reply to commentators’, Journal of Consciousness Studies, vol. 19, nos. 7–8 (2012), 141–167.

S. Armstrong, A. Sandberg, N. Bostrom, ‘Thinking inside the box: controlling and using an Oracle AI’, Minds and Machines, vol. 22, no. 4 (2012), 299–324.

R. Jungk, Brighter than a Thousand Suns: A Personal History of the Atomic Scientists, trans. James Cleugh (New York: Harcourt Harvest, 1958), 296.


Discussion

**One proposed solution for a better transition to human-level AIs is confinement, i.e. not giving AIs access to the internet.** Some arguments against the confinement of AIs are:

1. It depends on all AI research groups being careful to keep their AIs confined.
2. When governments understand the power of AIs and the advantage they represent, they will incentivize development speed over safety.

Renowned scientists, such as Stephen Hawking and Stuart Russell, believe that AI poses a significant threat to the human species. If human-level AIs have the ability to redesign and improve themselves at an ever-increasing rate, then we could witness an **unstoppable "intelligence explosion" that could lead to human extinction.**

AIs could have no definition of morality at all, or a completely different one. We need to code some of our values into AIs, but we do not want to dictate all of their values, because morals tend to evolve and improve with time. A possible solution to this problem would be to give machines an algorithm for figuring out what human values would be if we were wiser. This is an interesting philosophical problem, and it could be a good solution for machine ethics.

*“Let us now assume, for the sake of argument, that these machines are a genuine possibility, and look at the consequences of constructing them. To do so would of course meet with great opposition, unless we have advanced greatly in religious toleration from the days of Galileo. There would be great opposition from the intellectuals who were afraid of being put out of a job. It is probable though that the intellectuals would be mistaken about this. There would be plenty to do, trying to understand what the machines were trying to say, i.e. in trying to keep one’s intelligence up to the standard set by the machines, for it seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. There would be no question of the machines dying, and they would be able to converse with each other to sharpen their wits. At some stage therefore we should have to expect the machines to take control, in the way that is mentioned in Samuel Butler’s ‘Erewhon’.”* **– Alan Turing, “Intelligent Machinery, A heretical theory”**

You can read Turing's paper here: [Intelligent Machinery, A heretical theory](http://fermatslibrary.com/s/intelligent-machinery-a-heretical-theory)

>**TL;DR:**
>1. The transition to an AI-controlled era could be more abrupt than we expect.
>2. Human-level AIs could have goals of self-improvement independent of human values.
>3. Confinement of potentially dangerous AIs does not solve the problem.
>4. We need to code morals and ethics into AIs but leave degrees of freedom for morals to evolve.
>5. Prepare for the transition as soon as possible by raising awareness and focusing research on friendly AIs.

We should raise awareness about AI and the threats it poses to humanity, focus on developing friendly AIs, and prepare for the transition as soon as possible. This type of work has already started at research institutions such as the [Machine Intelligence Research Institute in Berkeley](https://intelligence.org/) and the [Future of Humanity Institute at Oxford University](https://www.fhi.ox.ac.uk/), and at other organizations like [OpenAI](http://openai.com).

**Go, considered the most complex board game ever invented and famously difficult for computers to crack, has already been mastered by computers.** In 2015 AlphaGo first defeated a professional human player, and in 2016 it went on to beat Lee Sedol, one of the world's top Go players. You can read more about it here:

- [Google reveals secret test of AI bot to beat top Go players](http://www.nature.com/news/google-reveals-secret-test-of-ai-bot-to-beat-top-go-players-1.21253)
- [Google’s AI Wins Fifth And Final Game Against Go Genius Lee Sedol](https://www.wired.com/2016/03/googles-ai-wins-fifth-final-game-go-genius-lee-sedol/)
- [AlphaGo](https://en.wikipedia.org/wiki/AlphaGo)

Elon Musk, Sam Altman, and a group of computer scientists decided to found OpenAI with the goal of creating **friendly AIs**. OpenAI is a non-profit artificial intelligence research company whose mission is to build safe and friendly AIs. Its focus is on creating a positive long-term human impact and ensuring that AI's benefits are as widely and evenly distributed as possible. You can learn more about it here: [About OpenAI](https://openai.com/about/)

I'd like to leave a comment regarding the use of a *utility function* as the exclusive guideline for what a machine will do. To me this seems compatible only with a strictly utilitarian worldview, and I feel as though the all-consuming general AI is just Nozick's [utility monster](https://en.wikipedia.org/wiki/Utility_monster) thought experiment made real. That is to say, a utility function is probably not the ideal way to build *general AI* (although it works very well for specific AI applications, such as classification and learning algorithms).

Human-level AIs might not share human values. The risks associated with passing control to human-level AIs are aggravated by the following two factors:

- **Computing Overhang:** Assume that computing power continues to increase according to Moore's law and that it takes us longer than expected to create human-level AIs. When we finally develop human-level AIs, there could be excess computing power available that would allow AIs to quickly replicate themselves and thus surpass the human population very abruptly.
- **Recursive Self-improvement:** Assume that AIs have goals to preserve themselves and self-improve. When AIs start replicating and improving themselves, their intelligence could quickly surpass human-level intelligence. Their intelligence could be so far superior to ours that it would be impossible for us to negotiate with them, just as it is impossible for chimpanzees to negotiate with humans. A superintelligent AI will want to pursue its own goals independently of humans.

I think the formulation of this risk is often a bit too strong. Yes, the risk exists, but the development and deployment of a general AI would not automatically imply all of this. That would assume that several well-known challenges and limitations in distributed systems are solved by general AI. Even assuming that such a general AI is deployed, there is no guarantee that a population of (small) independent AIs is as "good" as a single super-intelligent and highly distributed AI (running everywhere at the same time, à la Skynet). A highly distributed AI would need to deal with limitations such as the [CAP Theorem](https://en.wikipedia.org/wiki/CAP_theorem), while a population of AIs would need to deal with potentially malicious instances of the "AI population" (formally speaking, they'd need things like [Byzantine Fault Tolerance](https://en.wikipedia.org/wiki/Byzantine_fault_tolerance)). I'm curious to know whether there is any conceptual work in this direction (what are the fundamental computational limitations of general AI?). Is there actual research happening in this area? I'd be very interested to read more about that.
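
To make the Byzantine Fault Tolerance reference concrete, here is the classic result of Lamport, Shostak and Pease. Without unforgeable message signatures, a group of n participants can reach agreement despite f arbitrarily malicious members only if n ≥ 3f + 1. The sketch below just evaluates that bound. Applying it to a ‘population of AIs’ is the commenter's speculation, and the function names are hypothetical.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f with n >= 3f + 1: how many arbitrarily faulty (Byzantine)
    members a group of n can tolerate and still reach agreement, in the
    classic setting without unforgeable signatures."""
    if n < 1:
        raise ValueError("a group needs at least one member")
    return (n - 1) // 3


def min_group_size(f: int) -> int:
    """Smallest group that can tolerate f Byzantine members: 3f + 1."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


for n in (3, 4, 10, 100):
    print(f"{n:>3} members tolerate at most {max_byzantine_faults(n)} malicious ones")
```

Under this model, a population of 100 instances could still agree with up to 33 malicious members, while a group of three cannot tolerate even one.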
