Measuring AI Ability to Complete Long Tasks

METR Blog · March 19, 2025 · 16 min read