For the past few years, the cause du jour of the effective altruism movement has been existential risk reduction. There are a variety of forms of the argument, but the most pervasive is:
- We will soon be able to create more and more powerful AIs
- Currently the AIs mostly just do what we tell them to do, but even that can be worrying
- Imagine that some government writes a Homeland Defense AI that has a connection to all of the country’s missiles and drones. It intelligently figures out what the most destructive attack it could conduct on an enemy regime is, and how to coordinate all of the weapons systems available. It now only takes one click of a button for a rogue actor in the state to launch a devastating attack.
- Eventually AIs will become as intelligent as humans, and then more intelligent.
- AIs will then be able to improve themselves better than we can improve them.
- Because AIs can program a lot faster than humans, once they’re able to rewrite their own source code they will do so quite quickly.
- The self-improving AIs will in fact improve at a rate that gets faster and faster as they get smarter and smarter
- Thus in a very short period of time AIs will go from human-level intelligence to something much much more powerful–a superintelligence.
- Just as we are much more powerful than apes, this AI will be much more powerful than us. If we don’t work really hard to keep it under our control and make sure it does what we want it to do, it might quickly take over the world.
- It might just destroy the world in a runaway attempt to do more and more of whatever random thing it wants to do.
- If we try to give it a utility function but mess up, it might interpret what we give it too literally, and end up turning the whole universe into something that fits what we said but not what we meant (canonically, tiling the universe with paperclips), a la Midas.
- If we get really unlucky, the AI might actively create a ton of suffering in the quest to create a bunch of paperclips (or if we forget a negative sign somewhere in its utility function).
- Thus we should be really careful about developing powerful AI.
- We should make progress on AI control work–research into how to make sure AIs do exactly what you tell them to
- We should make sure that the AIs we create are tool AIs (which accomplish the tasks we tell them to) instead of agent AIs (which choose what to do on their own).
- We should figure out what utility function the AI should have, and find ways to specify it in ways a Turing machine can understand
- We should make sure to do all of this before we actually get a superintelligent AI, because once AIs start self-improving we might not have much time before one exerts its will on the world.
- AI superintelligence is reasonably likely to come in the next century, and so this is both important and urgent
I’m less convinced than many that x-risk is clearly the most important cause right now, but that’s not what this post is about. This post is full of contrarian arguments against EA-consensus views on AI x-risk. I’m not particularly confident in any of them, and state them without the reservation they deserve because I want to keep the frequency of “maybe” below 50% of all words. But together they’ve made me uncertain about how much I agree with the conventional wisdom.
Differential AI progress
The classical differential-AI-progress argument is: we want to develop AI safety before we develop AI. So we want to speed up moral and AI safety progress, and slow down actual AI progress.
But there will be more than one group building powerful AIs. And if there really is a fast takeoff scenario–one where the first superintelligence built quickly becomes superpowerful–it might be incredibly important which AI that is.
And right now I worry that we’re creating a negative correlation between AI groups that are taking AI safety seriously, and AI groups that we’re most worried about having access to a superpowerful tool. The groups which are concerned about social welfare are investing in AI safety research and will attempt to make their AIs develop slower, less likely to become superintelligent, and less powerful if they do–overall, less likely to rapidly re-shape the world. And groups that have less concern for the greater good will be operating with many fewer safety controls, and may even have world domination and power grabbing as explicit goals of their AIs. They will probably want their AIs to grow powerful as quickly as possible, and with as few restrictions as possible. And so we’re increasing the odds that the first superintelligence comes from a nefarious source by getting all of the socially responsible parties to voluntarily limit their AI growth.
Perhaps the most important type of differential progress is between the various AI labs/governmental organizations/companies/etc. Our attempts to slow down AI progress relative to AI safety progress also slow down socially responsible AI progress relative to power-hungry AI progress.
Why So Specific?
The current thinking on AI often seems to be very specific; the outline above has a ton of assumptions packed into it. How confident are we in those?
Will the crucial AI be a superintelligence, or just a really powerful program that can gain nuclear access codes, built by a terrorist organization? Will its structure look roughly like how we currently conceive of it, or be iterated on so many times that it’s unrecognizable? If it does look unrecognizable, do we think the technical frameworks we’re using now will be helpful for it? What is CEV anyway, and how would you implement it; what process are you using for the extrapolation, and why should we expect it to be coherent? Relatedly–how do we plan to answer questions like “should we program the AI to care about cows/chickens/fish/insects/plankton/itself?” Which decisions should we be making ourselves in programming it, and which should we be leaving to its wisdom? Does this change if we’re in a simulation?
More generally–why are we so sure that this, right now, is the one time in history that we’ve been able to see the only thing that matters? Will there even be a superintelligence? What does that even mean–does it count if computers get really really good at some human tasks, not at all of them, and also get really good at about 50% of the tasks that’ll be crucial in 150 years but we can’t even conceive of right now? Is that a superintelligence?
It’s not that any single assumption is likely to be wrong, so much as that it’s really unlikely they’re all correct. And in general I think that the dawn of super powerful AI might look quite different from how we expect, in ways no one understands yet.
Back in the mid-1900s, it was sometimes thought that if a computer could beat a human at chess, it must be superhuman. Say you took a laptop with Starcraft II loaded on it, traveled back in time to 1950, and had people play against the AI. What would people’s impression be? I’m guessing at least some would ask: “How did you do it? How did you create a sentient AI?” What’s once seen as magic eventually just becomes an obvious machine. And all of our mystifications about superintelligences right now might seem kind of strange to the people who finally build one.
We Are Not The Pinnacle
Millennia ago, humans evolved from apes. And so the seeds of later technological, scientific, moral, and societal renaissances were planted.
We, as a species, have accomplished a lot. We’ve built skyscrapers, harvested enormous amounts of food, extended our lifespans, founded communities and cities and countries, reduced scarcity, and found ways to send moving pictures thousands of miles across the world in seconds using invisible light waves. None of this is all that surprising, now, though I’d bet that a 300 BC temple builder would marvel at our computers.
But beyond that, we’ve also become kind. Not universally so, and not always selflessly so. We still kill, and steal, and insult, and bully, and compete. But our evolutionary forefathers were significantly more barbaric. Many species do not share our norms against physically injuring each other, respect for people’s sexual freedom, discouragement of meanness and rewards for niceness, donations to the less well off (whether through charity or government), desire for international peace, and efforts to improve the welfare of all.
Much of this is probably because, more so than basically any other species, we live in a post-scarcity society. Not to the extent that science fiction writers see, but the vast majority of people have enough to eat, and shelter to live in. It’s a lot easier to be nice when you don’t have two people and only one banana. Some of it, probably, is the way we’ve evolved–the types of neural structures that you get when your intelligence jumps and you have to work together in a society.
But if you told the African ape community, millions of years ago, that they were on the verge of birthing a species that would eventually take over the world, destroy most natural habitats, destroy many species altogether, and instead build lots of steel structures and electronic gadgets and find ways to really efficiently torture animals for food, I’m not sure they would have been so excited about us. We were, for the ape community and many other species before us, an existential risk.
If the apes had gotten together and decided that humans needed to be tools, not agents–what would that have meant for the world? If they’d decided to try to raise humans in tightly controlled circumstances, and make sure that humans stayed true to the values of the ape community? Would that have created a better world than the one we have? Would we have averted some of the social ills that we have today? Would we have sacrificed the future we have built to do so?
I think it would probably have been bad for the apes to have so constrained us.
And so I worry about us looking at the horizon to beings smarter than us, and so constraining them.
I, for one, welcome our new robot overlords.