Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessNavigating the Challenges of Cross-functional Teams: the Role of Governance and Common GoalsDEV Community[Side B] Pursuing OSS Quality Assurance with AI: Achieving 369 Tests, 97% Coverage, and GIL-Free CompatibilityDEV Community[Side A] Completely Defending Python from OOM Kills: The BytesIO Trap and D-MemFS 'Hard Quota' Design PhilosophyDEV CommunityFrom Attention Economy to Thinking Economy: The AI ChallengeDEV CommunityHow We're Approaching a County-Level Education Data System EngagementDEV CommunityI Built a Portable Text Editor for Windows — One .exe File, No Installation, Forever FreeDEV CommunityBuilding Global Crisis Monitor: A Real-Time Geopolitical Intelligence DashboardDEV CommunityGoogle's TurboQuant saves memory, but won't save us from DRAM-pricing hellThe Register AI/MLWriting Better RFCs and Design DocsDEV CommunityAnthropic took down thousands of Github repos trying to yank its leaked source code — a move the company says was an accidentTechCrunchIntroducing The Screwtape LaddersLessWrong AIA Very Fine UntuningTowards AIBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessNavigating the Challenges of Cross-functional Teams: the Role of Governance and Common GoalsDEV Community[Side B] Pursuing OSS Quality Assurance with AI: Achieving 369 Tests, 97% Coverage, and GIL-Free CompatibilityDEV Community[Side A] Completely Defending Python from OOM Kills: The BytesIO Trap and D-MemFS 'Hard Quota' Design PhilosophyDEV CommunityFrom Attention Economy to Thinking Economy: The AI ChallengeDEV CommunityHow We're Approaching a County-Level Education Data System EngagementDEV CommunityI Built a Portable Text Editor for Windows — One .exe File, No Installation, Forever FreeDEV CommunityBuilding Global Crisis Monitor: A Real-Time Geopolitical Intelligence DashboardDEV CommunityGoogle's TurboQuant saves memory, but won't save us from DRAM-pricing hellThe Register AI/MLWriting Better RFCs and Design DocsDEV CommunityAnthropic took down thousands of Github repos trying to yank its leaked source code — a move the company says was an accidentTechCrunchIntroducing The Screwtape LaddersLessWrong AIA Very Fine UntuningTowards AI

Evaluating Different Fewshot Description Prompts on GPT-3

EleutherAI BlogMay 24, 20211 min read0 views
Source Quiz

We evaluate different fewshot prompts on GPT-3 to see how it changes performance.

Adam Shimi suggested the idea of trying different fewshot prompts on GPT-3, and hopefully observing something that evidenced larger models being able to handle a wider variety of prompting. He also wrote up a bunch of prompts to try on SST.

Unfortunately, the results were kinda mixed: the GPT-2 models all did absolutely terrible and their results were basically useless; the performance wasn't monotonic with model size (1.3B did better than 2.7B, and babbage did better than curie). Also, the variance increased with performance in general.

Mean Accuracy Standard Deviation in Accuracy

gpt3-ada 51.9 0.0368

gpt3-babbage 69.4 0.0840

gpt3-curie 67.4 0.0807

neo-1.3B 63.0 0.0522

neo-2.7B 56.5 0.0684

However, there was one interesting and unexpected result: there's basically no correlation between different models on which prompts do the best. This is highly unexpected because I'd expect a priori that models trained on the same/similar data should have similar preferences for what kinds of prompts work well, and that surely some prompts must be better than other prompts in general.

Here's what that looks like plotted out. Each point in these plots is one prompt, and the axes are different models. The values are SST accuracy:

The code for the experiment is here.

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Evaluating …EleutherAI …

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 188 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Models