Concrete Problems in AI Safety (Paper) – Computerphile

Today I thought I’d talk about a fairly recent paper, from last year, called “Concrete Problems in AI Safety”, which relates to the things I was talking about before with the “Stop Button”. It’s got a bunch of authors, mostly from Google Brain, which is Google’s AI research group, I guess — well, a lot of Google is AI research, but specifically Google Brain — along with people from Stanford, Berkeley and OpenAI. It’s a collaboration between a lot of different authors.

The idea of the paper is to lay out a set of problems that we can currently make progress on. If we’re concerned about this far-off superintelligence stuff, sure, it seems important, and it’s interesting and difficult, but it’s quite hard to sit down and actually do anything about it, because we don’t know very much about what a superintelligence would be like or how it would be implemented. This paper lays out some problems that we can tackle now, which will be helpful now and which I think will also be helpful later on in making more advanced AI systems safe. It lists five research problems.
The first is avoiding negative side effects, which is quite closely related to what we’ve been talking about before with the stop button or the stamp collector. A lot of the problems there can be framed as negative side effects: the system does the thing you ask it to, but in the process it also does a lot of things you don’t want it to. These are like the robot running over the baby, right? Yeah, anything where it does the thing you wanted, like making you a cup of tea or collecting you stamps, but in the process also does things you don’t want it to do. Those are your negative side effects. So the first research area is: how do we avoid them?
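One family of approaches the paper discusses for this is penalizing the agent’s impact on its environment, so that “get the tea” stops meaning “get the tea at any cost”. As a rough illustration only, here is a minimal sketch of a reward with a hypothetical impact penalty bolted on; the dictionary world states, the count_changed_cells helper and the weight are invented for the example, not taken from the paper.

```python
# Toy sketch of an "impact penalty": the agent earns the task reward, but loses
# reward for every cell of the world it changed along the way. All names here
# are illustrative assumptions.

LAMBDA = 0.5  # how strongly we penalize side effects


def count_changed_cells(before: dict, after: dict) -> int:
    """Number of cells whose contents differ between two world snapshots."""
    return sum(1 for cell in before if before[cell] != after[cell])


def shaped_reward(task_reward: float, before: dict, after: dict) -> float:
    """Task reward minus a penalty for everything the agent disturbed."""
    return task_reward - LAMBDA * count_changed_cells(before, after)


# The robot made the tea (reward 10) but also broke the vase and startled the
# cat. Note the naive count also penalizes the kettle it *was* meant to touch,
# which is part of why defining "impact" well is still an open problem.
before = {"kettle": "empty", "vase": "upright", "cat": "sleeping"}
after = {"kettle": "boiled", "vase": "broken", "cat": "startled"}
print(shaped_reward(10.0, before, after))  # 10 - 0.5 * 3 = 8.5
```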
Then there’s avoiding reward hacking, which is about systems gaming their reward function: doing something that technically counts, but isn’t what you really intended the reward function to reward. There are a lot of different ways that can manifest, and it’s already a common problem in machine learning systems. You come up with your evaluation function, or your reward function, or whatever your objective function is, and the system very carefully optimizes exactly what you wrote, and then you realize that what you wrote isn’t what you meant.
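A toy way to see this, purely as an illustration: suppose a cleaning robot is rewarded for how little mess its camera can see. The few lines below are not from the paper; they just show an optimizer maximizing the literal proxy (“mess I can see”) rather than the intended objective (“mess that exists”), with all the names made up.

```python
# Toy reward hacking: the proxy reward is "visible mess", so the best-scoring
# action is not to clean but to stop looking.

actions = ["clean the room", "do nothing", "cover the camera"]


def mess_present(action: str) -> int:
    """How much mess actually exists afterwards (what we meant)."""
    return {"clean the room": 1, "do nothing": 10, "cover the camera": 10}[action]


def mess_visible(action: str) -> int:
    """How much mess the robot's camera reports (what we wrote)."""
    return 0 if action == "cover the camera" else mess_present(action)


def proxy_reward(action: str) -> int:
    return -mess_visible(action)


best = max(actions, key=proxy_reward)
print(best, mess_present(best))  # "cover the camera" 10: top reward, dirty room
```

The system hasn’t malfunctioned; it has optimized exactly the function it was given.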
beings have all the time, anytime you’ve started a new job. You don’t know what to
do and you have someone who does who’s supervising you. The question is what questions do you ask and how many questions do you ask
because current machine learning systems can learn pretty well if you give them a
million examples but you don’t want your robot to ask you a million questions, you
know. You want it to only ask a few questions and use that information
efficiently to learn from you. Safe exploration is the next one which is
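One common way to frame “ask only a few, well-chosen questions” is active learning, where the system queries the human only on the examples it is least sure about. The loop below is a generic uncertainty-sampling sketch under assumed names (model, unlabelled, ask_human), not a method proposed in the paper.

```python
# Minimal uncertainty-sampling loop: rather than asking the supervisor to label
# everything, ask only about the inputs the current model is least sure of.
# `model`, `unlabelled` and `ask_human` are stand-ins for illustration.

def most_uncertain(model, unlabelled, k):
    """Pick the k examples whose top predicted probability is lowest."""
    return sorted(unlabelled, key=lambda x: max(model.predict_proba(x)))[:k]


def train_with_few_questions(model, unlabelled, ask_human, budget=10, per_round=2):
    labelled = []
    while budget > 0 and unlabelled:
        for x in most_uncertain(model, unlabelled, min(per_round, budget)):
            labelled.append((x, ask_human(x)))  # the expensive step we ration
            unlabelled.remove(x)
            budget -= 1
        model.fit(labelled)  # retrain on everything answered so far
    return model
```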
Safe exploration is the next one, which is about safely exploring the range of possible actions. You want the system to experiment, to try different things and different approaches; that’s the only way it’s going to find what works. But there are some things you don’t want it to try even once, like the baby. Right, right. Yeah, you don’t want it to say “What happens if I run over this baby?” You want certain things that it might consider trying to not be tried at all, because you can’t afford to have them happen even once in the real world. Like a thermonuclear war option: “what happens if I do this?” You don’t want it to try that. Is that the sort of thing… I’m thinking of WarGames. Yes, exactly: Global Thermonuclear War. It runs through a simulation of every possible type of nuclear war, right? But it does it in simulation. You want your system not to run through every possible type of thermonuclear war in real life to find out that it doesn’t work, because it’s too unsafe to do that even once.
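The crudest version of this idea is to simply never sample actions that are known to be catastrophic, and to do any genuinely risky trial and error in simulation instead of the real world. Here is a tiny epsilon-greedy exploration loop with a hard blocklist; the actions, values and blocklist are all hypothetical.

```python
import random

# Epsilon-greedy exploration with a hard constraint: blocklisted actions are
# never sampled, no matter how curious the agent is. Names are invented.

ACTIONS = ["mop floor", "empty bin", "drive over baby", "launch missiles"]
NEVER_TRY = {"drive over baby", "launch missiles"}  # not even once
SAFE_ACTIONS = [a for a in ACTIONS if a not in NEVER_TRY]

q_values = {a: 0.0 for a in SAFE_ACTIONS}  # estimated value of each safe action


def choose_action(epsilon: float = 0.1) -> str:
    if random.random() < epsilon:
        return random.choice(SAFE_ACTIONS)  # explore, but only among safe actions
    return max(q_values, key=q_values.get)  # otherwise exploit the best estimate


for step in range(100):
    assert choose_action() not in NEVER_TRY  # the constraint holds at every step
```

The hard part, and the reason this is a research problem rather than a solved one, is that in the real world you cannot enumerate every catastrophic action in advance.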
The last area is robustness to distributional shift. It’s a complicated term, but the concept is not: the situation can change over time. You may build something and train it, it performs well, and then things change to be different from the training scenario, and that is inherently very difficult. It’s something humans struggle with too: you find yourself in a situation you’ve never been in before. But one of the useful things humans do is notice that there’s a problem. A lot of current machine learning systems, if something changes underneath them and their training is no longer useful, have no way of knowing that. So they continue being just as confident in answers that now make no sense, because they haven’t noticed there’s been a change. So even if we can’t make systems that can react well to completely unforeseen circumstances, we may be able to make systems that at least recognize that they’re in unforeseen circumstances and ask for help, and then maybe we have a scalable supervision situation there, where they recognize the problem and that’s when they ask for help. I suppose a simplistic example of this is when you have an out-of-date satnav and it doesn’t seem to realize that you happen to be doing 70 miles an hour over a plowed field because somebody, you know, built a road there. Yeah, exactly: the general tendency, unless you program them specifically not to, is to just plow on with what they think they should be doing. In this case it’s your satnav, so it’s not too big a deal, because it’s not actually driving the car, and you know what’s wrong and you can ignore it. But as AI systems become more important and more integrated into everything, that kind of thing can become a real problem. Although you would hope the car doesn’t take you into the plowed field in the first place. Yeah.
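A very simple way to get that “notice you are out of your depth and ask for help” behavior is to put a confidence threshold in front of the model’s answers and route low-confidence cases to a human. The wrapper below assumes a generic classifier with a predict_proba method; the threshold and names are illustrative, not something the paper prescribes.

```python
# Abstain-and-escalate wrapper: if the model's confidence on an input is below
# a threshold, treat it as unforeseen circumstances and defer to a human
# rather than plowing on. `model` and `ask_human` are stand-ins.

CONFIDENCE_THRESHOLD = 0.8


def predict_or_defer(model, x, ask_human):
    probs = model.predict_proba(x)   # e.g. [0.10, 0.85, 0.05]
    confidence = max(probs)
    if confidence < CONFIDENCE_THRESHOLD:
        return ask_human(x)          # don't bluff: escalate to the supervisor
    return probs.index(confidence)   # otherwise act on the model's best guess
```

The catch, and part of why the paper treats this as an open problem, is that a model’s reported confidence often stays high even on inputs unlike anything it was trained on, so a naive threshold like this is not sufficient on its own.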
Is it an open paper, or does it leave us with any answers? Yeah. The way it handles all of these is that for each one it gives a quick outline of what the problem is. The example they usually use is a cleaning robot: we’ve made a robot, it’s in an office or something and it’s cleaning up, and they frame the different problems as things that could go wrong in that scenario. So it’s pretty similar to the “get me a cup of tea and don’t run over the baby” type of setup; it’s “clean the office and don’t knock anything over or destroy anything”. And then, for each one, the paper talks about possible approaches and things we can work on, basically: things we don’t know how to do yet, but which seem like they might be doable with a year or two of work and some careful thought. Is this one for people to read? Yeah, it’s really good.
It doesn’t cover anything like the full range of problems in AI safety, but it focuses on the problems specifically about avoiding accidents, because all of these are possible causes of accidents, right? There are all kinds of other problems in AI that don’t fall under accidents, but within that area I think it covers everything, and it’s quite readable. Because it’s an overview paper, it doesn’t require a high-level understanding of AI for the most part. Anyone can read it, and it’s on arXiv, so it’s freely available. Are these people now working on AI safety, or did they write the paper, hang up their hats and hope someone else will sort it all out? These people are working on AI safety right now, but they’re not the only ones. This paper was released in the summer of 2016, so it’s been about a year since it came out, and since then there have been more advances. Some of the problems posed have had really interesting… well, not solutions, but early work that looks like it could become a solution, and interesting new ideas about ways to tackle these problems. So I think as a paper it’s already been successful in spurring new research and giving people a focus to build their AI safety research on top of. So we just need to watch this space, right? Yeah, exactly.

Comments (100)

  1. So….. how far away are we from "true" AI?

  2. I'm not sure if the camera person intentionally shifted focus to the bookshelf periodically or if it's an autofocus thing, but it's interesting to see what's there.

  3. Paul Christiano??? Ohhh, from UC Berkeley. He shares the same name as the award-winning, prodigal dancer of Chicago, Illinois. That one suicided primarily due to the scorn he dealt with for being sexually attracted to children. He never had sexual contact with children, and he vowed that he never would. Nonetheless, people knew he had that attraction, and it had unbearable consequences for him. It's quite tragic because he had a lot to offer society in arts and education.

  4. Rob Miles' brony confirmed

  5. Bit of focal point issue on this one. Not that it's a big deal on a video like this

  6. 5:40 reminds me of the office lol. Dwight the machine knows. No, there's no road here, this is the lake, this is the lake!!!!

  7. The concept that there exists AI safety researchers seems interesting. What do said researchers do? Do they just sit around all day like philosophers thinking about hypothetical situations? Or are they just normal AI researchers that happen to also dabble in the safety aspect?

  8. How are these radically different from human ethical questions? If a human wouldn't do something, then AI shouldn't either.

  9. 'Not the Robots' also starts with AI cleaning. It doesn't end well…

  10. the camera is not in focus on the speaker. So utterly…. unprofessional. Come on people! You are computerphile!

  11. Nice Ikea shelves

  12. I see fluttershy.

    I like.

    yay.

  13. Dat framed fluttershy sketch. <3

  14. I'm watching this while my iRobot Braava is cleaning the floor…

  15. Why don't they just create an AI to solve these problems?

  16. Yes, let's not focus on the speaker but let's focus on the my little pony picture in the background

  17. one possible ai problem: autofocus 😉

  18. ~6:00 instead of satnav I would compare it to self-driving cars, and not with a newly built road but a destroyed one. Granted, self-driving cars do not navigate via satellite, but if they did, the car would have to notice there is no longer a road there and find a different path.

  19. … because we have obviously solved these problems for people and corporations. Reward hacking seems quite common among people in business, law, and politics already.

  20. Rob Miles has the most fascinating insight, especially on AI. I do want to hear him talk about some of the books in the
    'parapsychology woo' section behind him.

  21. Finally talk of AI safety without all the alarmism.

  22. the videos linked at the very end have the wrong title text below them

  23. I like that he said, that testing every possible thermonuclear war is unsafe 😀

  24. Is that a wood burner near a bookshelf full of books?

  25. There IS a windows book on his shelf!

  26. There is a my little pony picture in the background. Just wanted to point that out for no reason.

  27. Why does a robot or an AI need a reward system??? It's a machine, not an intelligence that has needs, and if it does then whoever designed it needs to be sacked.

  28. Sir, you are fuzzy.

  29. Camera work is sloppy in this one… White balance is not fixed prior to interview, and for the long section of the camera was focused on Fluttershy pony in the background… She's judging me… Oh, no, not that STARE!!!

  30. Pony spotted next to his left shoulder.

  31. That's one very focused book shelf

  32. Funny how a smart person says "Summer of 2016".

  33. Just a little feedback: I feel like things get slightly blurry at 4-5 minutes or so (maybe more but I'm lazy). It seems the camera really likes your bookshelf! Thanks for the video as always.

  34. Keep Summer safe

  35. Anyone thought it was Justin Timberlake?

  36. When I saw the thumbnail on my wall, with Rob and the title "AI" I went Aaaaaaah finally.
    Couldn't wait for a video like this again

  37. google, stanford and berkeley and… whatever

  38. I'm a simple man. I see Rob Miles, I watch the video.

  39. Another concrète problem in AI is that it might decide to make "music" by banging together everyday objects.

  40. Is that a hikki on his neck

  41. Do more videos about papers

  42. Background Fluttershy!

  43. A lot of these problems are just the same as problems that occur in managing people. Especially gaming the system.

  44. manual focus would have been better. For the rest, interesting.

  45. More of this guy plz ^^

  46. the reward hacking issue sounds a lot like what leprechauns and genies do; they do "exactly what you said", but in such a way that it screws you over or at least not what you really wanted.

    I hereby suggest we call it the "Leprechaun Behavior"

  47. Has anybody else noticed the huge stack of Jeremy Clarkson books on the bottom right?

  48. 04:28 – I spy with my little eye a little pony

  49. What if your AI controlled vacuum cleaner learns that humans are the major source of dust?

  50. I think "unsafe" may just slightly understate the problems with nuclear war.

  51. -1 because

    https://youtu.be/AjyM-f8rDpg?t=5m4s bs starts in 3…2…1…go … why shouldn't we be able to build systems which can notice these things? Certainly current BS ML systems fail because the scientics are ignorant of some self properties which are required for self reflection and self awareness…it's typical mainstream AI.

    Another reason for the downvote is the terrible lighting in this video.

  52. At least I can read all the book titles on the shelves!

  53. Read the paper, pretty good stuff

  54. This dude rocks.

  55. For some reason I was randomly unsubscribed from this channel. Youtube, why you do this?

  56. What is Rob is himself a super-intelligent AI and his goal is to prevent future AI's from getting out of hand? 🤔

  57. I could listen to this guy talk all day. I just find the things he talks about fascinating, the way he delivers it is very relatable too. 🙂

  58. That moment you realize that Fluttershy is on the bookcase.

  59. Just make a machine that can give a million answers to the robot's million question. Let them talk when you first make a robot and done !

  60. Robs computerphile videos are always my favorite.

  61. Those Linux tomes look real sharp in 4K =P

  62. What's wrong with stepping on the baby though? Some animal species have evolved to snack on their young and they're doing fine!

  63. re "gaming the reward function// common problem in machine learning": Yeah, it's a common problem with regular squishy humans too.

  64. Where does this appalling insistence on things being "safe" come from? Safe is the opposite of interesting – it's precisely facing and exploring the unknown – the unsafe that gives meaning to our existence.
    About the step-on-a-baby issue – how many families have cats or dogs at home? They are absolutely not safe – yet no-one is objecting to that. And the children themselves are anything but safe – they have an incredible capacity to wreak havoc. Are you suggesting that all human life should be forced into a carefully controlled, unchanging mold where any divergent behavior is instantly killed – all in the name of safety.

  65. Is that a hickey on the right side of his neck?

  66. >AI Safety isn't just Rob Miles' hobby horse,
    No, but Worst Pony is.

  67. So it's unsafe to go even once through all the possible types of nuclear war in the real world… Didn't know that.

  68. what if someone purposely wrote an AI to destroy humanity? Seems plausible and terrifying.

  69. Why not just use a camera stand or something? This will definitively help improving the quality of your videos (shakes, lack of stability, focus…).
    Great content though!

  70. Do you have a bunch of Jeremy Clarkson books filed under History?

  71. He goes out of focus but it still looks good

  72. ask Martha and Jonathan Kent… they raised superman =D

  73. Program some laziness into it. I want a Bender, not a HAL.

  74. Can we have more videos like this introducing open research problems/topics?

  75. Is it just me or has he acquired a hickie on his neck after the cut scene at 3:42 ?

  76. Huh, there's a Fluttershy in the background. Neat, wouldn't have expected that.

  77. I'm writing a novel in the comments lol

  78. I love the strange mix of books, dvds, wood stove, childrens toys, etc. in the background, but maybe focus on the guy speaking?

  79. Does he have hickie?

  80. I want an AI with access to youtube to generate content and count views, subs, likes etc. as "reward".

  81. Look at the Book categorys in the background (whooo) 🙂

  82. Jaysus, grandpa. Get on with it.

  83. "Gaming the reward function."
    Humans do this all the time. Addictive drugs, candy, and self-pleasure are all these.

  84. One simple way to make AI behave somewhat like people is to make its training data consist of human behavior. Even very simple neural networks will begin mimicking the general way humans act, to the extent its behavioral complexity allows.

  85. The only way I can think of getting an A.I. to be on our side is not containing it and trying to use it for our cause, but installing some kind of compassion emotion into the machine. It is simply too smart, and you would be crazy naive to think we will be able to use this at our will…

  86. Spot the pony, whoever spotted it earliest wins.
    5:03

  87. The most important thing seems to me:

    Don't let the AI actually do anything dangerous. If you've got a robot which should get you a cup of tea, don't give it the power to do anything dangerous. It isn't necessary to give the robot a way to do that.

    Just build it with a motor that doesn't have the force to damage anything. This robot doesn't need those capabilities. It gets only the computing power needed to get you a cup of tea in your room. There is no reason to give it a motor capable of anything else, or more computing power than the task needs.

    And if you use AI for war, it's the same. Why build a "Skynet" with control over everything? There is no reason to do so. Build an AI for a UAV: it can control that single UAV but doesn't have the computing power to do anything else.

    In reality, nobody will implement a Skynet-like network capable of starting a doomsday device. Why would anyone do that? Anyone who is able to start a doomsday device has no wish to delegate that power to a machine they don't understand and make themselves powerless.

  88. You realise these are all problems SOCIETY hasnt solved yet, and here we are a group of narrow minded AI techs not learning from the persistent the big red flags. You are obsessed with continuing down this money and time sink which is hilarious because you are trapped within all of these points yourselves. YOU DONT EVEN NOTICE THAT THE PROBLEMS THAT ARE THERE AND ARE STILL CONFIDENT IN YOUR OWN HUBRIS

  89. 5:00 makes me think about the movies where the machine/computer will ask an annoying question or give a response over and over again, until it eerily shuts down 💀💀💀😰😰😱😱🤫

  90. Is that a stove in the background? Where no stove should be

  91. Radical differential or random differential are why humans practice games and try to exploit patterns in them. Very important observation.

  92. once the brick walls known, it becomes a concrete problem, but these noone knows they are there for sure because they havent thought enough.

  93. Build one AI whose positive feed back is stopping AI from being bad. Put them in the same room together when doing experiments. Profit

  94. A single action is really made up of multiple actions needed to reach the goal. When the robot makes the cup of tea it isn't doing just one final action but a sequence of actions; likewise, when the robot walks to a point it isn't doing one action but a series of steps, and when it runs into a baby the AI simply reacts to the baby and prevents the impact, because that is part of the walking-action information, just like a self-driving car prevents collisions with pedestrians and other vehicles.
    The AI has to learn what it needs to do to reach the goal and how to split and order the sequence of actions. It shouldn't be a single action "make the tea" that contains all the steps, but only the goal, and the AI has to ask itself "what do I have to do to make tea, out of the actions I have learned, and in what order?", and eventually we have to teach it the sequence:
    walking, open the box, take the tea, take the kettle, fill it with water, etc.
    And the problem is, the developers want a "new" AI brain that isn't like a baby's brain and doesn't behave like a baby?!
    So they assume an AI should be able to predict what it has to do without having learned it, unlike natural intelligence?

  95. Ahhh! 3d studio max 2 book in the background <3

  96. One problem they should definitely add to the paper that currently is a problem is the amount of power given to an ai and how a human might take advantage of that power.

  97. I read the paper. Those problems are based on a projection of AI. They are not actually problems for AGI. Waste of a video.

  98. Thermonuclear war: Not even once.

  99. 4:05 How to stop your AI voting for Donald Trump.

  100. "Never try killing the baby" I think we have to teach the AI common sense. Which can be adapted. And common sense is allways negative so positive assumption about killing the baby will never occur and common sense will be updated and never complete. That's what i observed about my common sense.
