MIT created duplicate AI workers to tackle thousands of different tasks. The verdict? Most of the time AI is still just minimally sufficient

Fortune Tech | By Tristan Bove | April 3, 2026 | 4 min read
AI is improving fast, but it has a long way to go before it can outperform humans in demanding tasks.

The growing number of American office workers who have experimented with artificial intelligence in their day-to-day work have likely had a few moments of doubt about their long-term job stability.

But for all the improvements in AI over the past few years, the technology is still only able to hit low bars in specific workplace tasks, according to recent data published by MIT. Even then, it might still be making some big mistakes.

Workers concerned they might soon be replaced by AI will likely be reassured by the new research coming out of MIT, which frames the AI-driven jobs takeover narrative not so much as a fast-paced action movie, but more like a slow-burn think piece.

AI is gradually improving at accomplishing a variety of tasks across a number of professions, according to preliminary findings from the study released on Thursday. But in most cases, the performance of currently available models is similar to that of a disenchanted intern: hitting minimum benchmarks but struggling overall to produce quality work without a human hand to refine the output.

Clearing the bar

MIT researchers used 41 different LLMs—including versions of Claude, Gemini, and ChatGPT—to analyze performance on more than 11,000 primarily text-based tasks for various job roles listed by the Labor Department. Their outputs were then scored by humans with actual on-the-job experience in those fields. The goal was to see how often an AI worker replacement could produce an output that a manager would find acceptable without any human edits, and then to evaluate its quality.

The researchers found AI has become more reliable over the years for many types of work, but still falls short whenever the stakes or standards are raised. The MIT study utilized a 1–9 scoring scale to judge AI’s performance, in which a 7 was defined as “minimally sufficient,” meaning the work is useful as is and requires no edits. As of late 2025, AI models scored a 7 in roughly 65% of tasks.

Most importantly for companies considering replacing patches of their workforce with AI, the MIT data suggests AI struggles to perform more complicated tasks. Regardless of how much time an AI model had to complete a task, the probability of success when graded against a 9 or “superior” quality score never exceeded 50%. In other words, when a job requires multiple steps, creativity, or precision, AI replacements are more likely to fail than succeed.

The research matches some aspects of corporate America’s current AI adoption narrative. Companies that use AI are more likely to automate routine tasks and roles once left for entry-level positions, while some highly technical skills, particularly digital ones, have actually been associated with wage premiums.

That was reflected in MIT’s data, which found average success rates lower for skilled roles in legal and IT jobs, while AI models generally had an easier time tackling the text-based tasks associated with construction and maintenance professions.

Companies that have experimented with fully automating certain parts of their workload have dealt with teething problems. Last year, Deloitte produced two reports for government clients in Australia and Canada that were both found to be riddled with fabrications. Media outlets including CNET and Sports Illustrated have been caught using AI to generate inaccurate stories under made-up bylines. Lawyers have also relied on AI to prepare their briefs, with one law firm publicly apologizing last year after it emerged that fake AI-generated citations had made their way into a bankruptcy filing in one of its cases.

The anecdotal evidence and MIT’s data suggest AI still requires a human hand to maximize its upside, though the technology is still rapidly improving. The MIT researchers estimated AI’s success rate at the tasks analyzed increased by up to 11 percentage points each year due to more capable models.

By 2029, the authors estimate most AI models will be able to accomplish between 80% and 95% of text-based tasks at the minimally sufficient benchmark.

Whether AI will ever be able to scale toward excellent or even perfect performance remains unknown.

“Widespread automation, particularly in domains with low tolerance for errors, may still be some distance away,” the researchers wrote.

AI might be able to do the bare-minimum work that comes with drafting, emailing, and number-crunching, but it has yet to hit the superior performance territory where humans can still stand out.
