Ai2 says its new AI model beats one of DeepSeek's best

Move over, DeepSeek. There’s a new AI champion in town — and they’re American.

On Thursday, Ai2, a nonprofit AI research institute based in Seattle, released a model that it claims outperforms DeepSeek V3, one of Chinese AI company DeepSeek’s leading systems.

Ai2’s model, called Tulu 3 405B, also beats OpenAI’s GPT-4o on certain AI benchmarks, according to Ai2’s internal testing. Moreover, unlike GPT-4o (and even DeepSeek V3), Tulu 3 405B is open source, which means all of the components necessary to replicate it from scratch are freely available and permissively licensed.

A spokesperson for Ai2 told TechCrunch that the lab believes Tulu 3 405B “underscores the U.S.’ potential to lead the global development of best-in-class generative AI models.”

“This milestone is a key moment for the future of open AI, reinforcing the U.S.’ position as a leader in competitive, open source models,” the spokesperson said. “With this launch, Ai2 is introducing a powerful, U.S.-developed alternative to DeepSeek’s models — marking a pivotal moment not just in AI development, but in showcasing that the U.S. can lead with competitive, open source AI independent of the tech giants.”

Tulu 3 405B is a rather large model. Containing 405 billion parameters, it required 256 GPUs running in parallel to train, according to Ai2. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.

Ai2 tested Tulu 3 405B on popular benchmarks. Image Credits: Ai2

According to Ai2, one of the keys to attaining competitive performance with Tulu 3 405B was a technique called reinforcement learning with verifiable rewards. Reinforcement learning with verifiable rewards, or RLVR, trains models on tasks with “verifiable” outcomes, like math problem solving and following instructions.
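To make the idea of a "verifiable" outcome concrete, here is a minimal Python sketch of the kind of reward check such training could rely on. It is an illustration only, not Ai2's implementation: the function names `math_reward` and `word_limit_reward`, the last-number extraction rule, and the binary 1.0/0.0 scoring are all assumptions made for this example.

```python
import re


def math_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical verifiable reward for a math task.

    Assumption: the model's final answer is the last number that
    appears in its completion. Returns 1.0 on a match, else 0.0.
    """
    # Drop thousands separators so "1,234" is read as 1234.
    cleaned = completion.replace(",", "")
    numbers = re.findall(r"-?\d+(?:\.\d+)?", cleaned)
    if not numbers:
        return 0.0  # Nothing to verify, so no reward.
    return 1.0 if float(numbers[-1]) == float(reference_answer) else 0.0


def word_limit_reward(completion: str, max_words: int) -> float:
    """Hypothetical verifiable reward for an instruction-following task.

    Checks a constraint that can be tested mechanically: the response
    must contain at most `max_words` whitespace-separated words.
    """
    return 1.0 if len(completion.split()) <= max_words else 0.0


if __name__ == "__main__":
    print(math_reward("So the total is 42.", "42"))
    print(word_limit_reward("one two three four", 3))
```

The point of the sketch is that both checks are programmatic: the reward comes from a rule that either passes or fails, rather than from a learned judgment of quality.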

Ai2 claims that on the benchmark PopQA, a set of 14,000 specialized knowledge questions sourced from Wikipedia, Tulu 3 405B beat not only DeepSeek V3 and GPT-4o, but also Meta’s Llama 3.1 405B model. Tulu 3 405B also had the highest performance of any model in its class on GSM8K, a test containing grade school-level math word problems.

Tulu 3 405B is available to test via Ai2’s chatbot web app, and the code to train the model is on GitHub and the AI dev platform Hugging Face. Get it while it’s hot — and before the next benchmark-beating flagship AI model comes along.

Kyle Wiggers

AI Editor

Kyle Wiggers was TechCrunch’s AI Editor until June 2025. His writing has appeared in VentureBeat and Digital Trends, as well as a range of gadget blogs including Android Police, Android Authority, Droid-Life, and XDA-Developers. He lives in Manhattan with his partner, a music therapist.
