Claude 3.7 Sonnet Sets New Benchmark in Reasoning and Code Generation

Anthropic | by Anthropic Team | March 26, 2026 | 5 min read | 15,427 views

Anthropic releases Claude 3.7 Sonnet with extended thinking capabilities, achieving state-of-the-art results on SWE-bench and GPQA Diamond. The model introduces hybrid reasoning that can switch between fast and deliberate thought modes.

Anthropic has unveiled Claude 3.7 Sonnet, which it presents as a significant step forward in AI reasoning. The model uses a hybrid design that lets it switch between rapid response generation and deep, deliberate reasoning chains.

In benchmark evaluations, Claude 3.7 Sonnet achieved 70.3% on SWE-bench Verified, surpassing all previous models on software engineering tasks. On GPQA Diamond, a graduate-level science reasoning benchmark, it scored 84.8%, demonstrating exceptional scientific reasoning.

The extended thinking feature lets the model work through complex problems step by step, and users can observe the reasoning process in real time. Researchers and developers building complex agentic systems have particularly valued this transparency.

Anthropic's CEO Dario Amodei noted that this release represents "a meaningful step toward AI systems that can engage in genuine scientific and engineering work," while maintaining the company's commitment to safety and interpretability research.

Original source

Anthropic



