

Human Compatible: Artificial Intelligence and the Problem of Control [Russell, Stuart]



| Best Sellers Rank | #31,796 in Books; #9 in Robotics & Automation (Books); #33 in Computers & Technology Industry; #50 in Artificial Intelligence & Semantics |
| Customer Reviews | 4.5 out of 5 stars (843 reviews) |
W**D
An important book, everyone should read it
Stuart Russell's new book, Human Compatible: Artificial Intelligence and the Problem of Control (HC2019), is great and everyone should read it. And I am proud that the ideas in my AGI-12 paper, Avoiding Unintended AI Behaviors (AGI2012), are very similar to ideas in HC2019. AGI2012 had its moment of glory, winning the Singularity Institute's (now called MIRI) Turing Prize for the Best AGI Safety Paper at AGI-12, but has since been largely forgotten. I see agreement with Stuart Russell as a form of vindication for my ideas. This article will explore the relation between HC2019 and AGI2012.

Chapters 7-10 of HC2019 "suggest a new way to think about AI and to ensure that machines remain beneficial to humans, forever." Chapter 7 opens with three principles for beneficial machines, which are elaborated over Chapters 7-10:
1. The machine's only objective is to maximize the realization of human preferences.
2. The machine is initially uncertain about what those preferences are.
3. The ultimate source of information about human preferences is human behavior.

AGI2012 defines an AI agent that is similar to Marcus Hutter's Universal AI (UAI2004). However, whereas the UAI2004 agent learns a model of its environment as a distribution over programs for a universal Turing machine, the AGI2012 agent learns a model of its environment as a single stochastic, finite-state program. The AGI2012 agent is finitely computable (assuming a finite time horizon for possible futures), although not practically computable.

The ideas of AGI2012 correspond quite closely with the HC2019 principles:
1. The objective of the AGI2012 agent is to maximize human preferences as expressed by a sum of modeled utility values for each human (utility functions are a way to express preferences, as long as the set of preferences is complete and transitive). These modeled utility values are not static; rather, the AGI2012 agent relearns its environment model and its models of human utility values periodically, perhaps at each time step.
2. The AGI2012 agent knows nothing about human preferences until it learns an environment model, so AGI2012 proposes a "two-stage agent architecture." The first-stage agent learns an environment model but does not act in the world. The second-stage agent, which acts in the world, takes over from the first-stage agent only after it has learned a model of the preferences of each human.
3. The AGI2012 agent learns its environment model, including its models of human preferences, from its interactions with its environment, which include its interactions with humans.

Subject to the length limits for AGI-12 papers, AGI2012 is terse. My online book, Ethical Artificial Intelligence (EAI2014), combines some of my papers into a (hopefully) coherent and expanded narrative; Chapter 7 of EAI2014 provides an expanded narrative for AGI2012.

On page 178, HC2019 says, "In principle, the machine can learn billions of different predictive preference models, one for each of the billions of people on Earth." The AGI2012 agent does this, in principle. On pages 26, 173, and 237, HC2019 suggests that humans could watch movies of possible future lives and express their preferences. The AGI2012 agent connects models of current humans to interactive visualizations of possible futures (see Figure 7.4 in EAI2014) and asks the modeled humans to assign utility values to those futures (a weakness of AGI2012 is that it did not reference research on inverse reinforcement learning algorithms). As an author of Interactivity is the Key (VIS1989), I prefer interactive visualizations to movies.

As HC2019 and AGI2012 both acknowledge, there are difficult issues in expressing human preferences as utility values and in combining utility values across different humans. AGI2012 argues that constraining utility values to the fixed range [0.0, 1.0] provides a sort of normalization. Regarding the issues of the tyranny of the majority and evil human intentions, AGI2012 proposes applying a function with positive first derivative and negative second derivative to utility values, giving the AI agent greater total utility for actions that help more dissatisfied humans (justified in Section 7.5 of EAI2014 on the basis of Rawls's Theory of Justice; a toy numerical sketch of this concave aggregation follows the references below). This is a hack, but there seem to be no good theoretical answers for human utility values. HC2019 and AGI2012 both address the issue of the agent changing the size of the human population.

On page 201, HC2019 says, "Always allocate some probability, however small, to preferences that are logically possible." The AGI2012 agent does this using Bayesian logic.

On page 245, HC2019 warns against the temptation to use the power of AI to engineer the preferences of humans. I wholeheartedly agree, as reflected in my recent writings and talks. Given an AI agent that acts to create futures valued by (models of) current humans, it is an interesting question how current humans would value futures in which their values are changed. On pages 254-256, HC2019 warns of possible futures in which humans are so reliant on AI that they become enfeebled. Again, it is an interesting question how current humans would value futures in which they must overcome challenges versus futures in which they face no challenges.

On page 252, HC2019 says, "Regulation of any kind is strenuously opposed in the [Silicon] Valley," and on page 249 it says that "three hundred separate efforts to develop ethical principles for AI" have been identified. I believe one goal of these AI ethics efforts is to substitute voluntary standards for mandatory ones. Humanity needs mandatory standards. Most importantly, humanity needs developers to be transparent about how their AI systems work and what they are used for.

(VIS1989) Hibbard, W., and Santek, D. 1989. Interactivity is the Key. Proc. Chapel Hill Workshop on Volume Visualization, pp. 39-43.
(AGI2012) Hibbard, B. 2012. Avoiding Unintended AI Behaviors. In: Bach, J., and Iklé, M. (eds) AGI 2012. LNCS (LNAI), vol. 7716, pp. 107-116. Springer.
(EAI2014) Hibbard, B. 2014. Ethical Artificial Intelligence. arXiv:1411.1373.
(UAI2004) Hutter, M. 2004. Universal Artificial Intelligence: Sequential Decisions Based On Algorithmic Probability. Springer.
(HC2019) Russell, S. 2019. Human Compatible: Artificial Intelligence and the Problem of Control. Viking.
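The concave-transform idea mentioned in the review is easy to illustrate. The sketch below is a toy illustration rather than the AGI2012 algorithm itself: per-human utilities are assumed to be already normalized to [0.0, 1.0], and the square root stands in for any function with positive first derivative and negative second derivative. It shows why an agent maximizing the sum of transformed utilities gains more by helping a dissatisfied person than by giving the same improvement to an already satisfied one.

```python
import math

def social_utility(utilities, transform=math.sqrt):
    """Aggregate per-human utilities (each assumed to lie in [0.0, 1.0])
    with a concave, increasing transform; sqrt is used here as a stand-in
    for any function with positive first and negative second derivative."""
    return sum(transform(min(max(u, 0.0), 1.0)) for u in utilities)

# Two hypothetical actions, each raising one person's utility by 0.2:
baseline = [0.1, 0.8]   # person A is dissatisfied, person B is satisfied
help_a   = [0.3, 0.8]   # this action helps the dissatisfied person
help_b   = [0.1, 1.0]   # this action helps the already satisfied person

print(social_utility(baseline))  # ~1.211
print(social_utility(help_a))    # ~1.442  <- larger gain
print(social_utility(help_b))    # ~1.316
# The concave transform gives the agent more total utility for helping the
# more dissatisfied human, mitigating tyranny-of-the-majority effects.
```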
P**S
AI does not only serve!
AI poses societal problems that are not digital. The book brings them out and discusses them very readably.
D**T
This is a "read thrice" book...
Stuart Russell's Human Compatible is a primer for interested readers who want (and need) to know what current AI means now and in the future. While writing a tale of a mad artificial person, I discovered Dr. Russell's book. It is informative, giving the reader the vocabulary for understanding what current AI is and what the important issues are for AI researchers, ethicists, and consumers. Read it, take notes, and then read it again. RGP
T**M
Approachable and insightful
Very well written. I like how the author presents a complex topic in an approachable way. I'd love to see a follow-up given the recent growth in generative AI. Does that change Russell's thoughts on the timing of advancements?
G**G
An insightful book
This is a great book. Although it is written by an AI expert, there is not much computer jargon and it is relatively easy to understand. Too bad the AI companies did not take his advice; I am not sure it is still possible to build AI that is human compatible.
T**N
Questionable Extrapolations
Professor Russell's book starts out with an entertaining journey through the history of AI and automation, as well as cautionary thinking about them. This discussion is well informed: he is a renowned AI academic and co-author of a comprehensive and widely used AI textbook. Having provided historical background, the remainder of the book argues two main points: (1) the current approach to AI development is having dangerous side effects, and it could get much worse; and (2) what we need to do is build AIs that can learn to satisfy human preferences.

Concerning the dangers of AI, the author first addresses current perils: misuse of surveillance, persuasion, and control; lethal autonomous weapons; eliminating work as we know it; and usurping other human roles. I found this part of the book an informative and well-reasoned analysis.

Beyond AI's current perils, the author next addresses the possibility of AIs acquiring superhuman intelligence and eventually ruling and perhaps exterminating humankind. The author believes this is a definite possibility, placing him in basic agreement with works such as Bostrom's Superintelligence and Tegmark's Life 3.0. AI's existential threat is the subject of continuing debate in the AI community, and Russell attempts to refute the arguments made against his position.

Russell bases his case for AI's existential threat on two basic premises. The first is that, in spite of all the scientific breakthroughs required to initiate superintelligence (well documented by Russell), you cannot rule out humans achieving these breakthroughs. While I appreciate this respect for science and engineering, clearly some human achievements are more within reach than others. Humans understanding human intelligence, let alone creating human-level machine intelligence, seems to me too distant to speculate about except in science fiction.

Russell's second premise is that unless we change course, superintelligence will be achieved using what he calls the standard model, which creates AIs by optimizing them to meet explicit objectives. This would pose a threat to humanity, because a powerful intellect pursuing explicitly defined objectives can easily spell trouble, for example if an AI decides to fix global warming by killing all the people. I don't follow this reasoning. I find it contradictory that an AI would somehow be both superintelligent and bound by fixed, concrete objectives. In fact, in the last part of the book, Russell goes to great pains to illustrate how human behavior, and presumably human-level intelligence, is far more complicated than sequences of explicit objectives.

In the last part of the book Russell advocates developing provably beneficial AI, a new approach that would build AIs that learn to satisfy human preferences instead of optimizing explicit objectives. While I can see how this would be an improvement over homicidal overlords, I don't think Russell makes the case that this approach would be even remotely feasible. To show how we might grapple with provably beneficial AI, he spends a good deal of time reviewing mathematical frameworks that address human behavior, such as utility theory and game theory, giving very elementary examples of their application. I believe these examples are intended to make this math accessible to a general audience, which I applaud. However, what they mainly illustrate is how much more complicated real life is compared to these trivial examples. Perhaps this is another illustration of Russell's faith that human ingenuity can reach almost any goal as long as it knows where to start, such as scaling up a two-person game to billions of interacting people.

I was very pleased to read Russell's perspective on the future of AI. He is immersed in the game, and he is definitely worth listening to. However, I have real difficulty following his extrapolations from where we are today to either superintelligence or provably beneficial AI.
C**E
How to save the world: a modern approach
There are not many books I can accurately describe as existentially terrifying. Human Compatible, however, is one of them. This is the type of book that, unless you are able to live with an incredible amount of cognitive dissonance, will radically upset your view of the future. The reason this book is so terrifying is that it shows our current path in AI is headed straight off a cliff with sharp rocks at the bottom; the entire human race is in a car barreling toward the cliff, and no one knows where the steering wheel is or how to operate the brake pedal. There are several loud passengers in the back seat trying to convince everyone in the car that the cliff is not a cliff and SOMEONE will eventually grab the wheel, and therefore there's no need to panic. In this metaphor, Russell is the diligent co-pilot pointing out that yes, there is indeed a cliff coming up, the brakes are not strong enough to bring the car to a halt, and someone had better figure out where the damn steering wheel is and turn this car around before everyone is dashed to bits.

Russell spends the first quarter of the book simply describing all the problems with the "standard model" of AI and why the people who don't think there's a problem are wrong. The most famous of these flawed proposals to control powerful AI can be summarized as "just turn the computer off." Russell points out that any simple objective given to a machine will be less likely to be achieved if the machine is turned off, and such machines will therefore resist attempts to turn them off in order to accomplish the objective they are given (a toy numerical version of this argument appears after this review). The problem is best summed up by Russell's one-line rebuttal: "You can't fetch the coffee if you're dead."

Russell has come up with a pretty clever solution to the problem of how to prevent powerful AI from destroying us, which is summarized in his three principles for creating aligned AI:
1. The robot's only objective is to maximize the realization of human values.
2. The robot is initially uncertain about what those values are.
3. Human behavior provides information about human values.

The second half of the book is mostly about exploring all the challenges involved in implementing these principles and all the ways they could go wrong if we're not smart about it. While this book doesn't contain a comprehensive plan for creating aligned AI, it provides the best high-level overview I've read. The neatest part of his plan is how easily other ideas for creating aligned AI can slot into it. It's not too hard to imagine how most proposals, from iterated amplification to attainable utility preservation, could be slotted into this framework to address various problems.

Overall, I think this book is my new go-to recommendation for anyone trying to understand why powerful AI will be a threat to all sentient life in our light cone and what we can do to prevent it from destroying everything we've ever cared about.
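Here is a toy expected-value sketch of the off-switch argument summarized in the review above, under assumptions added purely for illustration (it is not Russell's formal model): the robot is unsure whether its action would help (+1) or harm (-1) the human. With a fixed, certain objective, deferring to the human adds nothing, so the robot has no positive incentive to keep the off switch usable; with uncertainty about human preferences, letting the human decide is strictly better, because the human blocks exactly the harmful cases.

```python
# Toy off-switch calculation (illustrative assumptions, not Russell's exact model).
# The robot can ACT immediately or DEFER to a human who may switch it off.
# Unknown human utility of the robot's action: +1 with prob p, -1 with prob 1 - p.

def expected_value_act(p):
    # Act without consulting the human: collect the utility, whatever it is.
    return p * 1.0 + (1.0 - p) * (-1.0)

def expected_value_defer(p):
    # Defer: the human allows the action only when it is beneficial (+1)
    # and switches the robot off (utility 0) when it would be harmful (-1).
    return p * 1.0 + (1.0 - p) * 0.0

p = 0.6  # robot's belief that its action helps
print(expected_value_act(p))    # 0.2
print(expected_value_defer(p))  # 0.6 -> deferring is better under uncertainty

# With a fixed, certain objective (p = 1.0), deferring adds nothing (1.0 vs 1.0),
# so such an agent gains nothing from leaving the off switch operable.
print(expected_value_act(1.0), expected_value_defer(1.0))
```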
A**A
Accessibly written for non-computer science types.
The topic is highly relevant and very current; the writing is solid, and the contents are well organized. However, as a reader who is new to the subject of AI, I am finding the material dense and overwhelming. I ordered this right before the COVID-19 pandemic ensnared the USA. My book arrived days before a "Societal Pause," and orders to remain at home came down. I was looking forward to curling up and reading this book. I am having a hard time staying with the material, but I will revisit the book and attempt to read and review it later in the summer of 2020.