Measuring AI Ability to Complete Long Tasks
Source: METR Blog