Microsoft Goes Beyond LLMs With New Voice, Image Models
The new AI models signal a stronger push toward Microsoft-developed AI systems.
Microsoft on Thursday unveiled three new AI models, marking an expansion beyond typical large language models to multimodal, in-house capabilities.
The models were introduced under the Microsoft AI (MAI) division.
The release includes MAI-Transcribe-1, a new speech-to-text system, as well as voice generation and image models MAI-Voice-1 and MAI-Image-2. All three were developed in-house and are available on Microsoft Foundry and the MAI Playground.
MAI-Transcribe-1 is Microsoft’s first dedicated transcription model, designed to convert audio into text across 25 languages. Potential applications include video captioning, meeting transcription and voice-enabled agents.
According to Microsoft, the model can operate at speeds up to 2.5 times faster than its existing Azure Fast transcription model.
MAI-Voice-1, meanwhile, is designed for high-quality speech generation.
The model can generate up to a minute of audio in a single second, with an emphasis on natural, emotional tone and speaker personality.
The third release, MAI-Image-2, represents the second generation of Microsoft’s in-house image model. The company says it offers at least twice the generation speed of its predecessor while providing more realistic details, such as skin tone, lighting and textures.
The model is aimed at the creative industries and is already rolling out across Microsoft products, with integrations planned for the Bing search engine and PowerPoint.
Early customers include marketing and communications firm WPP, Microsoft said.
“MAI-Image-2 is a genuine game-changer,” Rob Reilly, global chief creative officer at WPP, said in an MAI blog post on the launch. “It’s a platform that not only responds to the intricate nuance of creative direction, but deeply respects the sheer craft involved in generating real-world, campaign-ready images.”
In the post, Microsoft said the updates come as it pursues a more "humanist" AI.
“We have a distinct view when creating our AI models -- putting humans at the center, optimizing for how people actually communicate, training for practical use,” the company said.
The launches also reflect a broader strategic shift as Microsoft looks to diversify its AI portfolio and reduce reliance on external partners such as OpenAI. It is also aiming to strengthen its competitive standing against rivals such as Google and Amazon, both of which have been investing heavily in proprietary AI stacks.
About the Author
Contributing Writer
Scarlett Evans is a freelance writer with a focus on emerging technologies and the minerals industry. Previously, she served as assistant editor at IoT World Today, where she specialized in robotics and smart city technologies. Scarlett also has a background in the mining and resources sector, with experience at Mine Australia, Mine Technology and Power Technology. She joined Informa in April 2022 before transitioning to freelance work.