Gemini Ultra 2.0 Achieves Human-Level Performance on Medical Licensing Exams
Google DeepMind's Gemini Ultra 2.0 scores above 90% on USMLE Steps 1, 2, and 3, demonstrating expert-level medical knowledge. The model also shows strong performance in radiology image interpretation.
Google DeepMind has published results showing Gemini Ultra 2.0 achieving unprecedented performance on medical licensing examinations. The model scored above 90% on all three steps of the United States Medical Licensing Examination (USMLE), a benchmark that typically requires years of medical education and clinical training.
Beyond written examinations, the model demonstrated strong capabilities in interpreting medical imaging, correctly identifying pathologies in chest X-rays, CT scans, and MRI images at rates comparable to those of board-certified radiologists.
The research team emphasized that these results represent a significant milestone in AI's potential to assist healthcare professionals, particularly in resource-limited settings where specialist access is constrained. The model can provide differential diagnoses, suggest treatment protocols, and flag urgent findings.
However, the team was careful to note that Gemini Ultra 2.0 is intended as a clinical decision support tool rather than a replacement for human physicians. Regulatory approval for medical use cases would require extensive clinical validation studies.