Announcing GPT-NeoX-20B
Announcing GPT-NeoX-20B, a 20 billion parameter model trained in collaboration with CoreWeave.
As of February 9, 2022, GPT-NeoX-20B checkpoints are available for download from The Eye under Apache 2.0. More in-depth information on GPT-NeoX-20B can be found in the associated technical report on arXiv.
Looking for a demo? Try GPT-NeoX-20B via CoreWeave and Anlatan's inference service, GooseAI!
After a year-long odyssey through months of chip shortage-induced shipping delays, technical trials and tribulations, and aggressively boring debugging, we are happy to finally announce EleutherAI's latest open-source language model: GPT-NeoX-20B, a 20 billion parameter model trained using our GPT-NeoX framework on GPUs generously provided by our friends at CoreWeave.
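As a rough sanity check on the "20 billion" figure, the headline parameter count can be recovered from the architecture hyperparameters in the technical report (44 transformer layers, hidden size 6144, a vocabulary of roughly 50k with untied input/output embeddings). The sketch below is a back-of-envelope estimate that ignores biases and layernorm parameters; the exact vocabulary size is taken from the released configuration and should be treated as approximate:

```python
# Back-of-envelope parameter count for GPT-NeoX-20B.
# Hyperparameters from the technical report (treat exact values as approximate):
n_layers = 44     # transformer layers
d_model = 6144    # hidden size
vocab = 50432     # padded vocabulary size used in training

# Each layer: ~4*d^2 for attention (Q, K, V, output projections)
# plus ~8*d^2 for the MLP (two 4x-expansion matrices);
# biases and layernorms are negligible at this scale.
per_layer = 12 * d_model ** 2

# Input and (untied) output embedding matrices.
embeddings = 2 * vocab * d_model

total = n_layers * per_layer + embeddings
print(f"{total / 1e9:.1f}B parameters")  # prints 20.6B parameters
```

The estimate lands within a few percent of the advertised 20B, which is about as close as this kind of arithmetic gets.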
GPT-NeoX-20B is, to our knowledge, the largest publicly accessible pretrained general-purpose autoregressive language model, and we expect it to perform well on many tasks.
We hope that the increased accessibility of models of this size will aid in research towards the safe use of AI systems, and encourage anyone interested in working in this direction to reach out to us.
As a thank you to our generous compute donors, we are delaying the public downloadable release of the model by 7 days. On February 9, 2022, the full model weights will be downloadable for free under a permissive Apache 2.0 license from The Eye.
There will be a #20b channel set up in our Discord for discussions of this model.

Please note that, much like our other language models and codebases, GPT-NeoX and GPT-NeoX-20B are very much research artifacts, and we do not recommend deploying either in a production setting without careful consideration. In particular, we strongly encourage those looking to use GPT-NeoX-20B to read the paper and datasheet on our training data. There are still bugs to be ironed out and many inefficiencies that could be addressed---but hey, we do this in our free time, give us a break lol
| Task | Category | Babbage | Curie | GPT-J-6B | FairSeq-13B | GPT-NeoX-20B | DaVinci |
|---|---|---|---|---|---|---|---|
| LAMBADA | Sentence Completion | 62.49% | 69.51% | 68.29% | 70.95% | 72.00% | 75.16% |
| ANLI R1 | Natural Language Inference | 32.40% | 32.80% | 32.40% | 34.00% | 34.00% | 36.30% |
| ANLI R2 | Natural Language Inference | 30.90% | 33.50% | 34.00% | 33.00% | 34.40% | 37.00% |
| ANLI R3 | Natural Language Inference | 33.75% | 35.50% | 35.50% | 34.75% | 35.40% | 36.83% |
| WSC | Coreference Resolution | 54.54% | 49.54% | 49.54% | 55.44% | 50.00% | 59.18% |
| WinoGrande | Coreference Resolution | 59.51% | 64.56% | 64.01% | 67.40% | 66.10% | 69.93% |
| HellaSwag | Sentence Completion | 40.38% | 54.81% | 36.53% | 57.69% | 53.50% | 63.46% |
| Average | | 44.85% | 48.60% | 45.75% | 50.43% | 49.34% | 53.98% |
Accuracy on standard language modeling tasks.
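The Average row is the simple, unweighted mean of the seven task accuracies above. As a quick illustration, reproducing the GPT-NeoX-20B column average from the table:

```python
# GPT-NeoX-20B accuracies from the table above, in row order:
# LAMBADA, ANLI R1, ANLI R2, ANLI R3, WSC, WinoGrande, HellaSwag.
neox_scores = [72.00, 34.00, 34.40, 35.40, 50.00, 66.10, 53.50]

average = sum(neox_scores) / len(neox_scores)
print(f"{average:.2f}%")  # prints 49.34%
```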
| Subject Group | Babbage | Curie | GPT-J-6B | FairSeq-13B | GPT-NeoX-20B | DaVinci |
|---|---|---|---|---|---|---|
| Humanities | 27.01% | 26.48% | 28.07% | 27.27% | 28.70% | 32.30% |
| Social Science | 27.94% | 29.24% | 28.73% | 27.94% | 30.80% | 35.87% |
| STEM | 25.83% | 24.25% | 25.71% | 24.63% | 27.20% | 28.60% |
| Other | 26.86% | 28.84% | 27.95% | 27.33% | 29.20% | 36.85% |
| Average | 26.91% | 27.20% | 27.62% | 26.79% | 28.98% | 33.41% |
Zero-shot accuracy of factual knowledge by subject group, as measured by the HendrycksTest evaluation.