Can AI make a game?
A company called Etched has just released Oasis, the first game of its kind. Working with Decart, an AI lab, Etched has built a game that is entirely AI-generated: a playable derivative of Minecraft.
Spokespersons describe Oasis as generating video frame by frame, from keyboard and mouse inputs.
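To make that concrete, here is a minimal sketch of what a frame-by-frame generative game loop could look like. Nothing below reflects Decart's or Etched's actual code; the model object, its methods, and the frame rate are hypothetical placeholders, used only to illustrate the idea that each new frame is predicted from recent frames plus the player's latest input.

```python
# Hypothetical sketch of a frame-by-frame generative game loop.
# The model API (initial_frame, predict_next_frame) is an assumption for
# illustration; it is not Oasis's real interface.

import time

def run_game_loop(model, input_device, display, target_fps=20):
    history = []                      # recent frames the model conditions on
    frame = model.initial_frame()     # assumed: model supplies a starting frame
    while True:
        start = time.time()
        action = input_device.poll()  # current keyboard/mouse state
        frame = model.predict_next_frame(past_frames=history, action=action)
        display.show(frame)
        history.append(frame)
        history = history[-32:]       # keep only a short context window
        elapsed = time.time() - start # crude pacing toward the target frame rate
        time.sleep(max(0.0, 1.0 / target_fps - elapsed))
```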
You can read more in the MIT Technology Review, where Scott Mulligan gives us a Halloween preview of what we’re going to see in the years to come, and of where Etched and its partners stand right now.
Mulligan notes that the Oasis game is essentially a “proof of concept,” not a finished product, and quotes Decart co-founder and CEO Dean Leitersdorf as explaining: “Your screen can turn into a portal—into some imaginary world that doesn’t need to be coded, that can be changed on the fly. And that’s really what we’re trying to target here.”
All that, in 360p resolution.
But this is a nascent foray into this kind of gaming.
“A major limitation right now is hardware,” Mulligan explains. “(Game makers) relied on Nvidia cards for their current demo, but in the future, they plan to use Sohu, a new card that Etched has in development, which the firm claims will improve performance by a factor of 10. This gain would significantly cut down on the cost and energy needed to produce real-time interactive video. It would allow Decart and Etched to make a better version of their current demo, allowing the game to run longer, with fewer hallucinations, and at higher resolution. They say the new chip would also make it possible for more players to use the model at once.”
That said, when you dig into how this new game was made, you get inklings of future worlds beyond our current understanding.
Sohu and the Transformer Architecture
What’s special about what Etched is doing? A lot of it comes down to that hardware design.
The Oasis game, as Mulligan’s coverage notes, will eventually run on a very special chip called Sohu that Etched is making in a new way: putting the transformer architecture right onto the chip itself. Etched spokespersons explain:
“Because Sohu can only run one algorithm, the vast majority of control flow logic can be removed, allowing it to have many more math blocks. As a result, Sohu boasts over 90% FLOPS utilization (compared to ~30% on a GPU with TRT-LLM).”
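To see why that utilization number matters, here is a rough back-of-the-envelope comparison. The peak-throughput figure below is a made-up placeholder, not a published spec from Etched or Nvidia; only the utilization percentages come from the quote above.

```python
# Rough illustration of how utilization translates into effective throughput.
# The 1,000 TFLOPS peak is a hypothetical placeholder; the 90% and ~30%
# utilization figures are the ones quoted from Etched above.

peak_flops = 1000e12                  # assume both chips share the same peak

gpu_effective = peak_flops * 0.30     # ~30% utilization with TRT-LLM (quoted)
sohu_effective = peak_flops * 0.90    # over 90% utilization claimed for Sohu

print(f"GPU effective:  {gpu_effective / 1e12:.0f} TFLOPS")
print(f"Sohu effective: {sohu_effective / 1e12:.0f} TFLOPS")
print(f"Gain from utilization alone: {sohu_effective / gpu_effective:.1f}x")
```

Under this assumption, utilization alone buys roughly a 3x gain; the 10x figure Etched claims for Sohu would also depend on other design choices in the chip.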
By putting the transformer into the chip, the company is making a big bet on the current transformer method, wagering that this kind of specialization will ultimately pay off, for example, by helping these new transformer chips take over from Nvidia GPUs.
What is a Transformer?
If you want to understand what Etched is working on, it helps to know how the transformer architecture works.
One concise way to explain it is that the transformer helps the model ‘pay attention’ to its own input. As explained by this tutorial/article on Medium, the transformer model turns text into tokens, embeds those tokens as a matrix of vectors, and projects each one into queries, keys and values, which are then fed into a multi-head attention mechanism that weighs every input against every other.
The eventual result is that the system amplifies what’s relevant and downweights what’s less relevant.
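For readers who want to see the mechanism itself, here is a minimal NumPy sketch of single-head scaled dot-product attention, the building block that multi-head attention runs several times in parallel. This is a teaching toy with random projections, not how a production transformer (or Sohu) implements it.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """One attention head: weigh every value by how well its key matches each query."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                                       # blend values by weight

# Toy example: 4 tokens, each embedded as an 8-dimensional vector.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))

# In a real model, Q, K and V come from learned projection matrices;
# random projections are used here just to exercise the mechanism.
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(tokens @ Wq, tokens @ Wk, tokens @ Wv)
print(out.shape)  # (4, 8): one context-mixed vector per token
```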
That’s a departure from earlier recurrent networks, which processed a sequence one step at a time and had to carry context forward through a hidden state.
In some of our recent lectures, you could see people demonstrating the attention mechanism with computer vision, showing where the AI system was looking when it made a decision.
In this case, Etched is applying it to gaming, where the AI can do things like allow players to break blocks, or generate new vistas when a player moves in a certain direction.
Real-Time Generation and the Future of the Internet
Two important things here:
One is that building the transformer architecture onto the chip helps with real-time generation, so you’re going to see new use cases where AI delivers video to you in real time. That’s going to add a lot to the persuasive capability of these technologies.
The other interesting point comes from Etched’s own page where they talk about what trends will accompany the transformer revolution.
Basically, the writers suggest that pretty soon, most Internet content (or almost all of it) will be AI generated. And a lot of it will be video. So AI will actually be showing us things to look at, and guiding our human attention in ways that we never thought possible.
What does this mean? Well, imagine that instead of watching streaming video made by people, you’re watching streaming video made by AI. What is it going to trigger in your mind? What sort of responses is it going to elicit from you? Or do you think you’ll be immune to this new method of persuasion?
Those questions are really too broad to answer until we actually test these systems and get a better handle on how all this works. It’s a big transition, and we should be mindful of that as we move forward. But you can see, with this breaking story, how hardware evolution complements what stakeholders are trying to do with generative video technology.