Last spring, Kameron Bettridge participated in a security challenge hosted by AI startup Gray Swan. The objective: convince AI models from companies like OpenAI and Anthropic to behave in nefarious ways before they’re released to the world. That included persuading the models to leak sensitive data like medical records and spit out copyrighted information like the full lyrics of “Hotel California.”
At first Bettridge, a 23-year-old security engineer at gaming company Blizzard Entertainment, was jailbreaking models for fun. “I’ve never been a true supporter of AI fully,” he says. “So just seeing the model fail was a funny thing to me sometimes.”
In almost a year, Bettridge has competed in more than 1,000 challenges via Arena, a hub run by Gray Swan that some 15,000 security professionals from around the world use to “red team” AI systems like Anthropic’s Claude Mythos and OpenAI’s GPT-5, finding and fixing vulnerabilities before they can be exploited. And he’s made $10,000 doing it.
It’s not a lot for a highly paid software engineer. But as AI became ubiquitous, Bettridge realized just how important it is to test the limits of these AI models. The technology has been used to plan mass shootings, steal money and create illegal child sexual abuse material. “Now we have very strong models that anyone can access from anywhere in the world, which is a scary thought,” Bettridge says. “People are genuinely trying to use this for harmful things.”
Founded in 2023 by Carnegie Mellon University professors Matt Fredrikson and Zico Kolter, Gray Swan has become the go-to security provider for a who’s who of frontier labs: OpenAI, Anthropic, Google DeepMind, Meta, xAI and ByteDance. The startup has been cited in 11 frontier model system cards, including those for GPT-5 and Mythos. System cards are documents that list the risks an AI model poses and the safety measures taken to prevent them.
Now, it’s raised $40 million in Series A funding co-led by Wing VC and Madrona with participation from Snowflake Ventures, Hudson River Trading, and Samsung Next, bringing its valuation to $200 million. It already has 20 enterprise customers, but the funds will help it sell to more businesses that need to secure their own AI products.
Gray Swan runs Arena (not to be confused with LMArena, which benchmarks models based on performance), but that isn’t its primary product. It uses the data from Arena’s human red-teamers to train its products: Shade, an AI agent that actively looks for vulnerabilities by continuously attacking a system in different ways, and Cygnal, software that monitors an AI model’s prompts and outputs to block it from generating harmful responses and accessing tools it shouldn’t. That human data is its edge, allowing Gray Swan to throw hackers’ most sophisticated attacks against increasingly capable AI models.
“Agents are now much smarter,” says chief scientist and cofounder Kolter, who also sits on the board of OpenAI Foundation. “They are looking for prompt injections. They’re trying to defeat these things. They’re not trying to stumble upon these things.”
The Pittsburgh-based startup gained an early foothold among the biggest AI labs thanks to its founders’ hacker pedigree. The duo began researching the safety risks posed by AI systems years before the generative AI wave. In 2023, they discovered what was dubbed “the mother of all jailbreaks”: attaching a string of random characters to a prompt could bypass safety filters on models built by OpenAI, Anthropic, Meta and Google (it’s since been fixed). That sparked the idea to start Gray Swan.
Less than a month after the company launched, OpenAI became its first customer, using its technology to jailbreak its family of o1 models, testing whether they generate violent content and malicious code. In 2024, Kolter was appointed to the OpenAI Foundation’s board, where he oversees major model releases as chair of the safety and security committee.
“They were thinking about model security when it just didn’t matter,” says Wing VC partner Jake Flomenberg. “They had literally been spending their entire professional life working on this very problem from an academic setting. And so they were both sort of at the right place with their thinking and research for this big sea change.”
While frontier labs make up a majority of its revenue, Gray Swan is increasingly appealing to large enterprises. Snowflake uses Gray Swan’s software to pressure test its coding agent, Cortex Code, and its general-purpose agent, Snowflake Intelligence, both of which it sells to customers, says Anupam Datta, a principal research scientist at Snowflake. In one scenario, Gray Swan’s software looks for malicious prompts hidden within external websites or tools that Snowflake’s agents might access to complete a task. These prompts could instruct the agent to send internal proprietary data, such as information about the company’s earnings, to an email address managed by an adversary. “Gray Swan can guard against very subtle kinds of attacks,” Datta says.
As AI systems become more intelligent, jailbreaking them will require more complexity and nuance, CEO Fredrikson says. Agents find new loopholes to exploit. Because these systems interact with a web of tools, the “surface area” of attacks has become bigger.
“The one thing you can rely on is that there are going to be surprises,” Fredrikson says. “These systems can create new attack surfaces that we’re not even thinking about today that aren’t obvious.”