
What might superintelligences value?

If there were superintelligent beings – creatures as far above the smartest human as that person is above a worm – what would they value? And what would they think of us? Would they treasure, tolerate, ignore, or eradicate us? These are perennial questions in Western thought, where the superior beings are Gods, angels, or perhaps extraterrestrials. The possibility of artificial intelligence gives these old questions a new life. What will our digital descendants think of us? Can we engineer human-friendly artificial life? Can we predict the motivations or values of superintelligence? And what can philosophers contribute to these questions?

Philosophical debate about the relationship between beliefs and values is dominated by two eighteenth-century thinkers: David Hume and Immanuel Kant. Humeans draw a sharp divide between belief and desire. Beliefs are designed to fit the world: if the world doesn’t fit my beliefs, I should change my beliefs. Otherwise, I am irrational. But desires are beyond rational criticism. If the world doesn’t fit my desires, I should change the world! As Hume colourfully put it: “Reason is, and ought only to be, the slave of the passions.” Learning more about the world may tell you how to satisfy your desires, but it shouldn’t change them.

Kantians disagree. They argue that some things are worth wanting or doing, while others are not. Gaining extra knowledge should change your desires, and the desires of rational beings will ultimately converge.

Popular debate about superintelligence sides with Hume. On this view, superintelligent agents can have arbitrary desires, no matter how much knowledge they acquire. Indeed, one popular rational requirement is an unwillingness to change your desires. (If you allow your motivations to change, then your future self won’t pursue your present goals.)

Humeanism about superintelligence seems reasonable. Possible artificial minds are much more diverse than actual human ones. Superintelligence could emerge from programmes designed to make paperclips, fight wars, exploit stock markets, prove mathematical theorems, win chess games, or empathise with lonely humans. Isn’t it ridiculous to expect all these radically different beings to want the same things? Even if Kantian convergence exists, it only applies to rational agents with conscious goals, plans, and values. But a myopic machine could be superintelligent in practice (i.e., superefficient at manipulating its environment to satisfy its desires) without any rational sophistication.

Future of the Earth by Fsgregs. CC BY-SA 3.0 via Wikimedia Commons.

However, the debate between Hume and Kant does matter in one very important domain. Suppose we want to engineer future superintelligences that are reliably friendly to humans. In a world dominated by superintelligences, our fate depends on their attitude to us. Can we ensure they look after us? Superintelligences can only reliably benefit creatures like us if they understand – from the inside – what it is like to have goals that can be thwarted or advanced, experiences that can be pleasant or distressing, achievements that can be shallow or deep, relationships that can go well or badly. Reliably friendly superintelligences must be rational agents. But then, if Kant is right, their values will converge.

Here is one possible route to convergence. Even Humeans agree that every rational agent wants knowledge. Whatever your goals, you will pursue them more efficiently if you understand the world you inhabit. And no superintelligence will be satisfied with our shallow human understanding. In their quest to truly understand the universe, superintelligences might discover a God, a cosmic purpose, a reason why the universe exists, or some other source of transcendent values. Until we understand the universe ourselves, we can’t be confident that it can be truly understood without positing these things. Perhaps this cosmic knowledge reliably transforms every rational agent’s desires – leading it to freely abandon its previous inclinations to embrace the cosmic purpose. (Artificial agents, who can reprogramme themselves, might undergo a much more wholesale motivational change than humans.)

It is tempting to think that, even if Kantianism and theism are true, they can safely be ignored. If superintelligences converge on divine purpose or correct values, surely they will be friendly to us? We should presuppose Humeanism, not because it is necessarily correct, but because that is where the danger lies.

This optimism is illicitly anthropocentric. We might expect humans to converge on human-friendly values, and to worship a God who cares for us. But we cannot attribute our values to superintelligences, nor to the God they might discover. Perhaps superintelligences will posit a God to whom we are simply irrelevant. Superintelligences designed to be friendly to humans may one day realise that we do not matter at all – reluctantly setting us aside to pursue some higher purpose.

This raises an especially worrying prospect. We naturally think that we should try to engineer reliably friendly superintelligence: even if we fail, we won’t do any harm. But if superintelligent rationality leads to non-human values, then we might end up worse off than if we had left superintelligence to its own devices and desires.

Featured image credit: Connection, by geralt. Public domain via Pixabay.

Recent Comments

  1. Joe

    I am no historian of philosophy, but the Hume and Kant exegesis here is quite poor. Hume did not say that knowledge of facts can’t or shouldn’t change your desires: if you want a glass of gin and learn it’s gasoline, you drop your desire to drink what’s in the glass.

    “Kantians disagree. They argue that some things are worth wanting or doing, while others are not.”

    I am at a loss to see how any Kantian believes this. For Kantians, commitment to the categorical imperative is not supposed to be an ordinary desire, but an inescapable presupposition of agency itself. But it’s a purely formal principle; it doesn’t direct you towards any particular end. Moreover, Kant suggests you have the most moral worth when you do the right thing in SPITE of your still having the desire to do the wrong thing.
