MIT created duplicate AI workers to tackle thousands of different tasks. The verdict? Most of the time AI is still just minimally sufficient

Fortune Tech · By Tristan Bove · April 3, 2026 · 4 min read

AI is improving fast, but it has a long way to go before it can outperform humans in demanding tasks.

The growing share of American office workers who have experimented with artificial intelligence in their day-to-day work have likely had a few moments of doubt as to their long-term job stability.

But for all the improvements in AI over the past few years, the technology is still only able to hit low bars in specific workplace tasks, according to recent data published by MIT. Even then, it might still be making some big mistakes.

Workers concerned they might soon be replaced by AI will likely be reassured by the new research coming out of MIT, which frames the AI-driven jobs takeover narrative not so much as a fast-paced action movie, but more like a slow-burn think piece.

AI is gradually improving at accomplishing a variety of tasks across a number of professions, according to preliminary findings released on Thursday. But in most cases, the performance of currently available models is similar to that of a disenchanted intern: hitting minimum benchmarks but overall struggling to produce quality work without a human hand to refine the output.

Clearing the bar

MIT researchers used 41 different LLMs—including versions of Claude, Gemini, and ChatGPT—to analyze performance on more than 11,000 primarily text-based tasks for various job roles listed by the Labor Department. Their outputs were then scored by humans with actual on-the-job experience in those fields. The goal was to see how often an AI worker replacement could produce an output that a manager would find acceptable without any human edits, and then to evaluate its quality.

The researchers found AI has become more reliable over the years for many types of work, but still falls short whenever the stakes or standards are raised. The MIT study utilized a 1–9 scoring scale to judge AI’s performance, in which a 7 was defined as “minimally sufficient,” meaning the work is useful as is and requires no edits. As of late 2025, AI models scored a 7 in roughly 65% of tasks.

Most importantly for companies considering replacing patches of their workforce with AI, the MIT data suggests AI struggles to perform more complicated tasks. Regardless of how much time an AI model had to complete a task, the probability of success when graded against a 9 or “superior” quality score never exceeded 50%. In other words, when a job requires multiple steps, creativity or precision, AI replacements are more likely to fail than succeed.

The research matches some aspects of corporate America’s current AI adoption narrative. Companies that use AI are more likely to automate routine tasks and roles once left for entry-level positions, while some highly technical skills, particularly digital ones, have actually been associated with wage premiums.

That was reflected in MIT’s data, which found average success rates lower for skilled roles in legal and IT jobs, while AI models generally had an easier time tackling the text-based tasks associated with construction and maintenance professions.

Companies that have experimented with fully automating certain parts of their workload have run into teething problems. Last year, Deloitte produced two reports for government clients in Australia and Canada that were both found to be riddled with fabrications. Media outlets including CNET and Sports Illustrated have been caught using AI to generate inaccurate stories under made-up bylines. Lawyers, too, have relied on AI to prepare their briefs, with one law firm publicly apologizing last year after it emerged that fake AI-generated citations had appeared in a bankruptcy filing in one of its cases.

The anecdotal evidence and MIT’s data suggest AI still requires a human hand to maximize its upside, though the technology is improving rapidly. The MIT researchers estimated that AI’s success rate at the tasks analyzed increased by up to 11 percentage points each year due to more capable models.

By 2029, the authors estimate most AI models will be able to accomplish between 80% and 95% of text-based tasks at the minimally sufficient benchmark.

Whether AI will ever be able to scale toward excellent or even perfect performance remains unknown.

“Widespread automation, particularly in domains with low tolerance for errors, may still be some distance away,” the researchers wrote.

AI might be able to do the bare-minimum work that comes with drafting, emailing, and number-crunching, but it has yet to hit the superior performance territory where humans can still stand out.

Original source

Fortune Tech

https://fortune.com/2026/04/03/mit-finds-ai-mostly-produces-minimally-sufficient-work/

