Gemini Ultra 2.0 Achieves Human-Level Performance on Medical Licensing Exams
Google DeepMind's Gemini Ultra 2.0 scores above 90% on USMLE Steps 1, 2, and 3, demonstrating expert-level medical knowledge. The model also shows strong performance in radiology image interpretation.
Google DeepMind has published results showing Gemini Ultra 2.0 achieving unprecedented performance on medical licensing examinations. The model scored above 90% on all three steps of the United States Medical Licensing Examination (USMLE), a benchmark that typically requires years of medical education and clinical training.
Beyond written examinations, the model demonstrated strong capabilities in interpreting medical imaging, correctly identifying pathologies in chest X-rays, CT scans, and MRI images at rates comparable to those of board-certified radiologists.
The research team emphasized that these results represent a significant milestone in AI's potential to assist healthcare professionals, particularly in resource-limited settings where specialist access is constrained. The model can provide differential diagnoses, suggest treatment protocols, and flag urgent findings.
However, the team was careful to note that Gemini Ultra 2.0 is intended as a clinical decision support tool rather than a replacement for human physicians. Regulatory approval for medical use cases would require extensive clinical validation studies.