
[P] PhAIL (phail.ai) – an open benchmark for robot AI on real hardware. Best model: 5% of human throughput, needs help every 4 minutes.

Reddit r/MachineLearning, by /u/svertix (https://www.reddit.com/user/svertix), April 2, 2026

I spent the last year trying to answer a simple question: how good are VLA models on real commercial tasks? Not demos, not simulation, not success rates on 10 tries. Actual production metrics on real hardware. I couldn't find honest numbers anywhere, so I built a benchmark.

Setup: DROID platform, bin-to-bin order picking – one of the most common warehouse and industrial operations. Four models fine-tuned on the same real-robot dataset, evaluated blind (the operator doesn't know which model is running). We measure Units Per Hour (UPH) and Mean Time Between Failures (MTBF) – the metrics operations people actually use.

Results (full data with video and telemetry for every run at phail.ai):

| Model | UPH | MTBF |
|---|---|---|
| OpenPI (pi0.5) | 65 | 4.0 min |
| GR00T | 60 | 3.5 min |
| ACT | 44 | 2.8 min |
| SmolVLA | 18 | 1.2 min |

Teleop / Fi…
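To make the two metrics concrete, here is a minimal Python sketch of how UPH and MTBF could be computed from a single run. The post does not spell out PhAIL's exact definitions, so the `RunLog` fields, the helper names, and the rule that every operator intervention counts as one failure are all hypothetical. The sketch also derives two figures from the reported results: interventions per hour (60 / MTBF) and units picked per intervention (UPH × MTBF / 60), the latter assuming both metrics are measured over the same clock.

```python
from dataclasses import dataclass


@dataclass
class RunLog:
    """Hypothetical summary of one evaluation run (field names are illustrative)."""
    units_picked: int   # items successfully moved bin-to-bin
    duration_s: float   # wall-clock length of the run, in seconds
    interventions: int  # times the operator had to step in


def units_per_hour(run: RunLog) -> float:
    # Throughput normalized to a one-hour window.
    return run.units_picked / (run.duration_s / 3600.0)


def mtbf_minutes(run: RunLog) -> float:
    # Mean time between failures: run time divided by the number of failures.
    # Assumption: each intervention counts as one failure. A run with no
    # interventions has no observed failure, so MTBF is reported as infinite.
    if run.interventions == 0:
        return float("inf")
    return (run.duration_s / 60.0) / run.interventions


def units_per_failure(uph: float, mtbf_min: float) -> float:
    # Expected units picked between two interventions, assuming UPH and
    # MTBF are measured over the same operating time.
    return uph * mtbf_min / 60.0


# Headline numbers from the post: (UPH, MTBF in minutes).
RESULTS = {
    "OpenPI (pi0.5)": (65, 4.0),
    "GR00T": (60, 3.5),
    "ACT": (44, 2.8),
    "SmolVLA": (18, 1.2),
}

if __name__ == "__main__":
    # A made-up two-hour run that happens to match the top model's numbers.
    run = RunLog(units_picked=130, duration_s=2 * 3600, interventions=30)
    print(units_per_hour(run), mtbf_minutes(run))  # 65.0 4.0
    for name, (uph, mtbf) in RESULTS.items():
        print(f"{name:15s} interventions/h={60 / mtbf:5.1f} "
              f"units/intervention={units_per_failure(uph, mtbf):.2f}")
```

Under these assumptions, the best model picks roughly four units between interventions, while SmolVLA needs help before completing a single pick. Combining the two metrics this way is one option for comparing throughput and reliability together; it is not a figure the benchmark itself reports.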

Could not retrieve the full article text.


Original source: Reddit r/MachineLearning, https://www.reddit.com/r/MachineLearning/comments/1sajdwr/p_phail_phailai_an_open_benchmark_for_robot_ai_on/


