Claude 3.7 Sonnet Sets New Benchmark in Reasoning and Code Generation
Anthropic releases Claude 3.7 Sonnet with extended thinking capabilities, achieving state-of-the-art results on SWE-bench and GPQA Diamond. The model introduces hybrid reasoning that can switch between fast and deliberate thought modes.
Anthropic has unveiled Claude 3.7 Sonnet, marking a significant leap in AI reasoning capabilities. The model introduces a novel hybrid architecture that allows it to seamlessly switch between rapid response generation and deep, deliberate reasoning chains.
In benchmark evaluations, Claude 3.7 Sonnet achieved 70.3% on SWE-bench Verified, ahead of all previous models on this software engineering benchmark. On GPQA Diamond, a graduate-level science reasoning benchmark, it scored 84.8%.
The extended thinking feature lets the model work through complex problems step by step, and users can watch the reasoning unfold in real time. Researchers and developers building complex agentic systems have particularly valued this transparency.
Anthropic's CEO Dario Amodei noted that this release represents "a meaningful step toward AI systems that can engage in genuine scientific and engineering work," while maintaining the company's commitment to safety and interpretability research.