Longer context windows for language models, and the age of agents begins
Eric Schmidt, the former Google CEO and now AI investor, has given a fascinating interview. He told NOEMA magazine about two groundbreaking developments: first, the "context windows" of language models are getting longer, and second, the age of "agents" is beginning.
Understanding both concepts is central to getting a sense of what lies ahead in the coming years. Here is the interview to watch. I have also attached the transcript, lightly edited by Claude.
To read:
The key thing that's going on now is we're moving very quickly through the capability ladder steps. And I think there are roughly three things going on now that are going to profoundly change the world very quickly. And when I say very quickly, the cycle is roughly a new model every year to 18 months.
The first is basically this question of context window. And for non-technical people, the context window is the prompt that you ask. So you know, "study John F Kennedy" or something, right? But in fact, that context window can have a million words in it. And this year, people are inventing a context window that is infinitely long. And this is very important because it means that you can take the answer from the system and feed it in and ask it another question.
So I want a recipe, let's say I want a recipe to make a drug or something. So I say, "What's the first step?" And it says, "Buy these materials." So then you say, "Okay, I've bought these materials. Now, what's my next step?" And then it says, "Buy a mixing pan." And then the next step is, "How long do I mix it for?" You see, it's a recipe. That's called Chain of Thought reasoning. And it generalizes really well.
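The feed-the-answer-back-in loop Schmidt describes can be sketched in a few lines of Python. This is purely illustrative: `ask_model` is a hypothetical stand-in for a real LLM API call, here returning canned recipe steps so the loop is runnable.

```python
def ask_model(context: str) -> str:
    """Stub standing in for a real LLM call on the full context window."""
    canned = {
        0: "Buy these materials.",
        1: "Buy a mixing pan.",
        2: "Mix for ten minutes.",
    }
    # Crude progress marker: count how many answers are already in the context.
    step = context.count("Step result:")
    return canned.get(step, "Done.")

def chain_of_thought(task: str, max_steps: int = 10) -> list[str]:
    """Feed each answer back into the context and ask for the next step."""
    context = f"Task: {task}\n"
    steps = []
    for _ in range(max_steps):
        answer = ask_model(context)
        steps.append(answer)
        if answer == "Done.":
            break
        # The previous answer becomes part of the next prompt:
        context += f"Step result: {answer}\nWhat is the next step?\n"
    return steps

print(chain_of_thought("make a recipe"))
```

The point of the sketch is the growing `context` string: a longer context window means more previous steps fit into each new prompt, which is why longer windows make longer recipes feasible.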
We should be able, in five years for example, to produce thousand-step recipes to solve really important problems in science, in medicine, in materials science, in climate change, that sort of thing. That's the first one.
Second one is agents. An agent can be understood as a large language model that knows something new or has learned something. So an example would be: read all of chemistry, learn something about chemistry, have a bunch of hypotheses about chemistry, run some tests in a lab about chemistry, and then add that to your agent. These agents are going to be really powerful. And it's reasonable to expect that not only will there be a lot of them, and I mean millions, but there'll be the equivalent of GitHub for agents. There'll be lots and lots of agents running around and available to you.
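One minimal way to picture the "model plus learned knowledge" idea is a wrapper that accumulates facts, here as a toy sketch only; the class name, the keyword matching, and the example facts are all made up for illustration, standing in for real model training and inference.

```python
class Agent:
    """Toy sketch of an 'agent': a model plus accumulated domain knowledge."""

    def __init__(self, name: str):
        self.name = name
        self.knowledge: list[str] = []  # facts this agent has "learned"

    def learn(self, fact: str) -> None:
        # e.g. the result of a lab experiment gets added to the agent
        self.knowledge.append(fact)

    def answer(self, question: str) -> str:
        # Crude keyword retrieval standing in for real model inference.
        for fact in self.knowledge:
            if any(word in fact.lower()
                   for word in question.lower().split() if len(word) > 3):
                return fact
        return "I don't know yet."

chemist = Agent("chemistry-agent")
chemist.learn("compound x is stable at room temperature")
print(chemist.answer("is compound x stable?"))
```

A "GitHub for agents" in this picture would simply be a public registry of such specialized, pre-trained objects that anyone can pull down and combine.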
And the third one, which to me is the most profound, which is already beginning to happen, is text to action. And what that is is, "Write me a piece of software to do something," right? You just say it. And can you imagine having programmers that actually do what you say you want? And it does it 24 hours a day? And strangely, these systems are good at writing code in languages like Python.
You put all that together and you've got infinite context window, the ability for agents, and then the ability to do this programming. Now, this is very interesting. What then happens? There's a lot of questions here. And now we get into the questions of science fiction.
I'm sure the three things I've named are happening because that work is happening now. But at some point, these systems will get powerful enough that you'll be able to take the agents and they'll start to work together, right? So your agent and my agent and her agent and his agent will all combine to solve a new problem.
At some point, people believe that these agents will develop their own language. And that's the point when we don't understand what we're doing. You know what we should do? Pull the plug. Literally unplug the computer. So it's really a problem when agents start to communicate in ways and doing things that we, as humans, do not understand. That's the limit, in my view.
And you think again, how far off in the future? Well, there have been many, many predictions. Clearly, agents and these things will occur in the next few years. And there won't be one day where everybody says, "Oh my God!" It's more a question of capabilities every month, every six months, and so forth.
A reasonable expectation is we'll be in this new world within 5 years. Wow, not 10. And the reason is there's so much money. And there are also so many ways in which people are trying to accomplish this. You have the big guys, the three large so-called frontier models. But you have a very large number of players who are programming at one level lower, at much lower cost, who are iterating very quickly.
Plus, you have a great deal of research. I think there's every reason to think that some version of what I'm saying will occur within 5 years, and maybe sooner.
Well now, so you say pull the plug. So two questions. So how do you pull the plug? But even before you pull the plug, if you know you're already in Chain of Thought reasoning and you're headed to what you fear, don't you need to regulate at some point that it doesn't get there? Or is that beyond the scope of regulation?
Well, a group of us have been working very closely with the governments in the West. And we've started talking to the Chinese, which of course is complicated and takes time, about these issues. And at the moment, the governments, with the exception of Europe which is always kind of slightly confused, have been doing the right thing, which is they've set up trust and safety institutes. They're beginning to learn how to measure things and check things.
And the right approach is for the governments to watch us and make sure we don't get confused on what the goal is, right? So as long as the companies are well-run Western companies with shareholders and lawsuits and all that, we'll be fine. There's a great deal of concern in these Western companies about liability, doing bad things. Nobody wants to hurt people. They're not, they don't wake up in the morning saying, "Let's hurt somebody," right?
Now, of course, there's the proliferation problem. But in terms of the core research, the researchers are trying to be honest.
Okay, so that's the West. So by saying the West, you're implying that proliferation outside the West is where the danger is. The bad guys are out there somewhere.
Well, one of the things that we know, and it's always useful to remind the techno-optimists in my world, there are evil people. And they will use your tools to hurt people. My favorite example is that the face recognition stuff was invented not to constrain the Uyghurs, you know? They didn't say, "We're going to invent face recognition in order to constrain this minority in China called the Uyghurs," right? But it's happening. All technology is dual use. All of these inventions can be misused. And it's important for the inventors to be honest with that.
So in open source, for those of you who don't follow it: open source is where the source code (in models, the weights, that is, the numbers that have been calculated) is released to the public. Those immediately go throughout the world. And who do they go to? They go to China, of course. They go to Russia. They go to Iran. They go to Bulgaria. They go to North Korea.
When I was most recently in China, essentially all of the work I saw started with open-source models from the West, which were then amplified. So it sure looks to me like these leading firms, the ones I'm talking about, the ones that are putting a billion, eventually ten billion dollars into this, will be tightly regulated. I worry that the rest will not.
I'll give you another example. Look at this problem of misinformation. I think it's largely unsolvable. And the reason is the code to generate misinformation is essentially free. Anyone, a good person or a bad person, has access to it. It doesn't cost anything. And it produces very, very good images. There are regulatory solutions to that. But the important point is that that cat is out of the bag, or whatever metaphor you want.
It's important that these more powerful systems, especially as they get closer to general intelligence, have some limits on proliferation. And that problem is not yet solved.
Yet to follow up on your point about the funding, Fei-Fei Li at Stanford argues that's the biggest problem, is that there's so much money going into the private sector. And who's their competition to look at what the red lines are or whatever? It's the universities, which don't have a lot of money.
So you really trust these companies to be transparent enough to be regulated by government that doesn't know what they're talking about, really?
The correct answer is always trust but verify. And the truth is, you should trust and you should also verify. And at least in the West, the best way to verify is to use private companies that are set up as verifiers. Because they can employ the right people and so forth.
So in all of our industry conversations, it's pretty clear that the way it will really work is you'll end up with AI checking AI. It's too hard. Think about it. You build a new model. It's been trained on new data. You worked really hard on it. How do you know what it knows? Now, you can ask it all the previous questions, but what if it's discovered something completely new and you don't think about it, right? And the systems can't regurgitate everything they know. You have to ask them chunk by chunk by chunk.
So it makes perfect sense that an AI would be the only way to police that. People are working on that.
With Fei-Fei's argument, she's completely correct. We have the rich private industry companies and we have the poor universities who have incredible talent. It should be a major national priority in all of the Western countries to get research funding for the hardware.
If you were a physicist 50 years ago, you had to move to where the cyclotrons were, because they were really hard and expensive. And by the way, they still are really hard and expensive. You need to be near a cyclotron to do your work as a physicist.
We never had that in software. Our stuff was capital cheap, not capital expensive. The arrival of heavy-duty training in our industry is a huge economic change. And what's happening is that companies are figuring this out. And the really rich companies, I'm thinking of Microsoft and Google as an example, are planning to spend billions of dollars because they have the cash. They have big businesses. The money's coming in. That's good.
But where does the innovation come from? The universities don't have that kind of hardware, and yet they need access to it.
Okay, let's go to China. On Kissinger's last trip to China, you went with him, and he had a discussion with Wang Yi on exactly this set of issues. Your idea was to set up a high-level group to discuss the potential and catastrophic possibilities of AI.
Where do the Chinese fit in on this? On the one hand, I've heard you say, and not only you, that we need to go all out to compete with the Chinese for some of the reasons you just said, because there could be bad players or bad intentions. But where is it appropriate to cooperate and why?
Well, first place, the Chinese should be pretty worried about generative AI. And the reason is that they don't have free speech. And so what do you do when the system generates something that's not permitted in their country? Who do you jail? The computer? The user? The developer? The training data? It's not at all obvious.
And the Chinese regulators so far have been relatively intelligent about this. But it's obvious, if you think about it, that the spread of these things will be highly restricted in China because it fundamentally addresses their information monopoly. That makes sense.
So in our conversations with China, both when Dr. Kissinger and I were there together (he has unfortunately since passed away) and in the subsequent meetings that were set up as a result of his inspiration, everyone agrees that there's a problem. But at the moment with China, we're speaking in generalities. There is not a proposal in front of either side that's actionable.
And that's okay because it's complicated. Because of the stakes involved, it's actually good to take your time to explain what you view as the problem.
So many Western computer scientists are visiting with their Chinese counterparts and trying to say, "If you allow this stuff to proliferate, you could end up with a terrorist act." The misuse of these for biological weapons, the misuse of these for cyber.
The long-term worry is much more existential. But at the moment, I think the Chinese conversations are largely very constrained by concerns about bio-threats and cyber threats.
The long-term threat goes something like this. When I talk about AI, I talk about it as human-generated. So you or I give it, at least in theory, a command. And it may be a very long command, and it may be recursive in some sense, but it starts with a human judgment.
There is something technically called recursive self-improvement, where the model actually runs on its own and it just learns and gets smarter and smarter. When that occurs, or when agent-to-agent interaction that's heterogeneous occurs, we have a very different set of threats, which we're not ready to talk to anybody about because we don't understand them. But they're coming.
Do you see, I guess I'm trying to think about what a kind of dialogue with the Chinese could mean. Would it be something like nuclear proliferation? I mean, where if they understand the existential threat, to start at that level, maybe an IAEA-type of thing for proliferation. Do you think that's possible on the political horizon?
It's going to be very difficult to get any actual treaties with China. What I'm engaged with is called a Track Two dialogue, which means that it's informal, not official. It's educational. It's interesting.
It's very hard to predict, by the time we get to real negotiations between the US and China, what the political situation will be, what the threat situation would be.
A simple requirement would be that if you're going to do training for something that's completely new, you have to tell the other side that you're doing it. So that you don't surprise them. So it's like the Open Skies during the Cold War.
So an example would be a "no surprises" rule. When a missile is launched anywhere in the world, all the countries acknowledge that they know it's coming. That way they don't jump to a conclusion and think it's targeted at them. That strikes me as a basic rule.
Furthermore, that if you're doing powerful training, there needs to be some agreements around safety. In biology, there's a broadly accepted set of layers, BSL-1 to 4, for biosafety containment, which makes perfect sense because these things are dangerous.
Eventually, there will be a small number of extremely powerful computers that I want you to think about. They'll be in an army base and they'll be powered by some nuclear power source in the army base. And they'll be surrounded by even more barbed wire and machine guns, because their capability for invention, for power and so forth, exceeds what we want as a nation to give either to our own citizens without permission, as well as to our competitors.
It makes sense to me that there will be a few of those. And there'll be a lot of other systems that are more broadly available.
But you're saying that you would notify the Chinese that those systems exist?
Again, it's possible that that would be an answer. And vice versa; all of these things are mutual. But you want to avoid a situation where a runaway agent in China ultimately gets access to a weapon and launches it, foolishly thinking that it's some game. Because remember, these are not humans. They don't necessarily understand the consequences.
These systems are all based on a simple principle of predicting the next word. So we're not talking about high intelligence here. We're certainly not talking about the kind of emotional understanding and history that humans have and human values.
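The "predicting the next word" principle can be made concrete with a toy bigram model, which just counts which word follows which in a tiny corpus. Real LLMs predict over tokens with neural networks, but the training objective is the same in spirit; the corpus and names here are invented for illustration.

```python
from collections import Counter, defaultdict

# A tiny corpus to "train" on.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count, for each word, which words follow it and how often.
following: defaultdict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most frequently seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" twice, more than any other word
```

A language model is, at this level of abstraction, just a vastly better version of `predict_next`: conditioned on the whole context window rather than one word, and learned rather than counted.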
So when you're dealing with a non-human intelligence that does not have the benefit of human experience, what bounds do you put on it? And maybe we can come to some agreements on what those are.
Are they moving as exponentially as we are in the West, with the billions going into generative AI? Is China having the commensurate billions coming in from government or companies?
It's not at the same level in China for reasons I don't fully understand. My estimate, having now reviewed it at some length, is that they're about two years behind. Two years is not very much, by the way. But they're definitely behind.
There are at least four companies that are attempting to do large-scale model training, which is similar to what I've been talking about. And they're the obvious big tech companies in China.
They're hobbled because they don't have access to the very best hardware, which is restricted from export by the Trump and now Biden administrations. Those restrictions are likely to get tougher, not easier. And so as Nvidia's and their competitors' chips go up in value, China will be struggling to stay relevant, because their stuff won't move at the same pace.
Do you agree with not letting those chips flow to China?
The chips are important because they enable this kind of learning. It's always possible to do it with slower chips. You just need more of them. And so it's effectively a cost tax for Chinese development. That's a way to think about it.
And is it ultimately dispositive? Does it mean that China can't get there? No, but it makes it harder and means that it takes them longer to do so. And we should do that as the West.
Well, the West has agreed to do it. I think it's fine. It's a fine strategy. I'm much more concerned about the proliferation of open source. And the reason is, and I'm sure the Chinese would have the same concern. So again, these are the kinds of things that we'll be talking to them about. Do you understand that these things can be misused against your government as well as ours?
So the scenario is, open-source folks build something called guardrails. And they fine-tune, and they use a technique called RLHF, reinforcement learning from human feedback, to eliminate some of the bad answers.
There's plenty of evidence that if I gave you all of the weights, all of the stuff, it'd be relatively easy for you to back those guardrails out, reverse-engineer the model, and see its raw power. And that's a great concern. That problem has not been solved yet, engineering-wise.