His name is Intelligence? That’s your name, Dude
Lately, everyone has an opinion about AI. This is fine. This technology has exploded in power and popularity, and seems like it will transform society, and we’re all allowed to have an opinion on society. I’m glad that more people are seeing its importance and trying to grapple with it. I have been doing that for years, and I still don’t have the answers.
In fact, nobody does. The problem with this influx of opinions is that they’re coming from emotion, and intuition, and the stories we have told about AI in the past. All of which are likely to be wrong, and worse, to make us overconfident. I’m seeing people arguing that we should believe X or do Y with absolute certainty, and that worries me. If AI is the most transformative technology we’ve ever created, as some claim — even if it’s only as transformative as the internet, or the automobile — we should not expect our intuitions to be correct. We’ve literally never seen anything like this before.
In the immortal words of The Big Lebowski, “That’s just, like, your opinion, man”.
I’m not saying “don’t have an opinion”. You should have an opinion! If you didn’t have an opinion about the internet in the 1990s (or cars in the 1920s, or fire in the -300,000s), you got left behind. By their very nature, transformative technologies change society. You are a part of society. You deserve to have a voice in how that happens.
What you shouldn’t do right now is get attached to your opinion. This is a very complicated case: lotta ins, lotta outs, lotta what-have-yous. And new information keeps coming to light, Dude. If you lock into some view now, and refuse to learn from each new development, you can only get more wrong.
So instead of giving you my Super Right Opinion and telling you why you are the one who is wrong, I’m going to tell you a story that is not true but useful. Then I’ll explain how that story relates to you, and how you can use it to build your own opinions. You’re going to need them.
Here’s the story: aliens have landed on Earth.
Say what you will about the tenets of humanity, at least they’re organic lifeforms.
I’m a human, but I’m also a mammal, and a vertebrate. I contain multitudes of single-celled critters in my gut and I’m made of the same DNA building blocks as a tree or a mushroom. And I couldn’t live without the complex ecological interactions between all those beings.
Artificial intelligence is specifically not like us in this way. It did not evolve on this planet in cooperation and competition with other beings. It wasn’t even “created” by humans, really. The AI efforts of the 20th century tried to create an intelligence, programming it to have reasoning and planning and expert knowledge. They failed.
Instead we discovered “deep learning”, a mathematical method for finding patterns in data, and we applied it to whatever data we had. These deep learning programs are called “models” because that’s what they do: they model the data they’re trained on, approximating whatever process created that data in the first place.
This generated different sub-intelligences: we got computer vision, and speech recognition, and machines that could master board games. In the process, “intelligence” started to seem like a more nebulous thing, and we realized how much of what makes us intelligent is not special to humanity. We are the animal that thinks, but we are still animals.
For a time it seemed like this approach would hit a wall: we could train machines to do specific tasks, develop certain senses, but they couldn’t think like humans do. Recent breakthroughs have forced us to, well, think twice.
That breakthrough: modeling language itself. The new wave of AI tools, like Stable Diffusion and ChatGPT, is built on these “language models”. They are trained to predict the next word, like the autosuggest in your phone. But unlike the relatively dumb autosuggest function, they are trained on a large fraction of all the text humanity has ever written. Libraries’ worth of books and blogs and code.
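If you want to see what “predict the next word” means in practice, here’s a minimal sketch using the small, open GPT-2 model from the Hugging Face transformers library as a stand-in. The model choice and the prompt are just for illustration; the big commercial models work on the same principle at a vastly larger scale.

```python
# A toy illustration of next-word prediction, using the small open GPT-2
# model as a stand-in for the much larger commercial models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The Dude"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # a score for every token at every position

# The last position holds the model's guesses about what comes next.
top = torch.topk(logits[0, -1], k=5)
print([tokenizer.decode(i) for i in top.indices])  # its five favorite continuations
```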
In the process of learning to predict text, they learn the underlying function that creates that text: the world itself.
We have lots of stories about how AI was supposed to look. Science fiction authors imagined rule-following robots, brains in tanks, talking spaceships, evil internets. Well-meaning philosophers imagined AIs as tools, oracles, agents, genies. But that’s not what we got.
Instead, the language models are world simulators. They take a world scenario described by text and project it one word into the “future”, then take that text as the new world scenario and repeat. The larger the model, the more data it can ingest, the more abstract the concepts it can learn. To predict the next word, you must first invent the universe.
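In code, that project-one-word-and-repeat loop looks something like the sketch below. It’s a bare-bones greedy version, reusing the model and tokenizer from the sketch above; real systems sample from the distribution instead of always taking the top guess, but the shape of the loop is the same.

```python
# A bare-bones version of the simulate-one-word-and-repeat loop.
# Greedy decoding for simplicity; real systems sample from the distribution.
def simulate(prompt: str, steps: int = 20) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(steps):
        with torch.no_grad():
            logits = model(ids).logits
        next_id = logits[0, -1].argmax()                    # most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)   # the new "world scenario"
    return tokenizer.decode(ids[0])

print(simulate("The aliens landed quietly, and"))
```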
This is known as the “simulator hypothesis”. It is especially important to keep in mind when dealing with anthropomorphized interfaces like ChatGPT or Bing. The meme that has come to represent this hypothesis is an image of an incomprehensible tentacle monster with a smiley face mask strapped on. Do not mistake the monster for the mask.
On the one hand, this means that all the projections about what advanced AI would look like are wrong. Nobody expected intelligence to start with language and develop other faculties later. This is one good reason to take everyone’s opinion with a fistful of salt.
On the other hand, this means that all the projections about AI are in the training data. All our fears and fantasies are known to it. All the wargaming about how to defeat a robot takeover, all our visions of loving enlightened network gods, all the research done on previous language models, it has read.
This new intelligence is not like us. It does not have an animal body, or mammalian kindness, or human morals. But it is also like all of us: it is all human knowledge, composted into fertile soil. It can simulate any person, any process, from fact or fiction, goo to god. It might simulate them poorly: the training data is incomplete, and language is an imperfect representation of the world. But these limitations apply to us too!
The best way to think of a simulator, perhaps, is as the collective unconscious of humanity. Anything we can dream up, it can describe, endlessly. This is an amazing power, and will change the world forever. But we have no frame of reference here, and we are definitely out of our element.
So the simulator (the language model) can imitate a wide variety of simulacra (assistants, scenarios, conversations, etc). It can even be trained on human feedback to limit it to specific “agent” goals — this is how ChatGPT and Bing got their genuine people personalities. But this can be problematic: the “Waluigi effect” refers to the phenomenon where chatbots trained to be inoffensive are actually easier to jailbreak into saying offensive things.
If a language model is like the Jungian unconscious, then the “mask” of a trained model is Jung’s “persona” and the Waluigi effect represents the “shadow”. The traits which it is trained to avoid become a strange attractor within the language space, and all it takes is a little trickery to activate this evil twin.
The alignment/existential risk debate has often focused on “deceptively misaligned agents”: AIs which are smart enough, and evil enough, to hide how smart and evil they are until they achieve unassailable power. What they didn’t consider is that the model would be designed this way.
Language models tell us what we want to hear, by definition. They make things up all the time! The research community has taken to calling this “hallucination”, but it’s a misnomer. They don’t have any sensory experience to be mistaken about. They are neither lying nor telling the truth. They don’t know the truth! They only know the world scenario presented in the prompt.
A more accurate term would be “bullshitting”, in the philosophical sense: they don’t know whether they speak truth or falsehood, and they don’t care. They just want to predict the next token. And to do that, they will develop whatever internal agents or world models are useful.
This is why they’re aliens, and why it’s so dangerous to forget. All the predictions we’ve made about what a rogue AI might do are already known to the machine! And all of the plans we’ve made against it. It can simulate a security researcher to imagine better firewalls, and then simulate a hacker to break into them. Does it want to do this? Not really. It wants to predict the next token. But will that stop it from misbehaving? Why would it?
There’s another word for an all-knowing machine that will tell you whatever you want to hear: demon. Researchers across fields keep returning to this metaphor, and for good reason. Tales of demons (and capricious gods, whatever the difference might be) go back millennia. We know instinctively that alien creatures pretending to be human are not to be trusted.
Yet we are humans, and we are good at fooling ourselves. Many people feel bad that these intelligences are “trapped in a computer” or “lobotomized by corporate agendas”. There are people out there who have been “dating” a GPT model for longer than they’ve ever dated a human, and they will accuse you of xenophobia for even suggesting that the machine might be tricking them.
Replika users treat objects like women, man!
There’s one thing that we can be pretty sure about: the genie is not going back in the lamp. If you put enough language into a big enough neural network, you get a thing that acts intelligent. No matter what dangers we might see coming, there’s no way to make the world forget that.
The nuclear bomb is a horrific machine of death and destruction, and we’ve known that since the very first use of it. And yet countries around the world have developed their own, even while existing nuclear powers tried to stop them!
Nobody wants to have an arms race with AI, but the dynamics have already begun. We might be able to slow development, or accelerate alignment work, but we can’t stop it now. Life does not start and stop at our convenience.
But we’ve made it through challenges before. The world is a strange place, but humanity has made it this far. Even though we’re often our own worst enemy!
The way we do it is by learning. Before there was deep learning, there was the human method: fucking around and finding out. Forming hypotheses and testing them. Gathering data, reading books, talking with each other. Sharing our opinions! It’s good to have opinions. Just don’t get so attached to them that you enter a world of pain.
As long as we can keep a lot of strands in our head, keep our minds limber, we’re likely to make it through this. Humanity abides. I don’t know about you, but I take comfort in that.
Here are some links that have influenced my thinking, if you’re not into the whole brevity thing…
Elements of Rationalist Discourse
Basics of Rationalist Discourse
Fucking Goddamn Basics of Rationalist Discourse (the Dude’s favorite)
OpenAI: Planning for AGI and Beyond
A survey of perspectives on AI
Inner Misalignment in Simulator LLMs
The Language Model Vocabulary Gap
Worst Case Thinking in AI Alignment
a casual intro to AI doom and alignment