Microsoft Goes Beyond LLMs With New Voice, Image Models

AI Business | By Scarlett Evans | April 2, 2026 | 3 min read

The new AI models signal a stronger push toward Microsoft-developed AI systems.

Microsoft on Thursday unveiled three new AI models, marking an expansion beyond typical large language models to multimodal, in-house capabilities.

The models were introduced under the Microsoft AI (MAI) division.

The release includes MAI-Transcribe-1, a new speech-to-text system, as well as voice generation and image models MAI-Voice-1 and MAI-Image-2. All three are the first models of their kind for Microsoft and are available on Microsoft Foundry and the MAI Playground.

MAI-Transcribe-1 is Microsoft’s first dedicated transcription model, designed to convert audio into text across 25 languages. Potential applications include video captioning, meeting transcriptions and voice-enabled agents.

According to Microsoft, the model runs up to 2.5 times faster than its existing Azure Fast transcription model.

MAI-Voice-1, meanwhile, is designed for high-quality speech generation.

The model can generate up to a minute of audio in a single second, with an emphasis on natural, emotional tone and speaker personality.

The third release, MAI-Image-2, represents the second generation of Microsoft’s in-house image model. The company says it offers at least twice the generation speed of its predecessor while providing more realistic details, such as skin tone, lighting and textures.

The model is aimed at the creative industries and is already rolling out across Microsoft products, with integrations planned for the Bing search engine and PowerPoint.

Early customers include marketing and communications firm WPP, Microsoft said.

“MAI-Image-2 is a genuine game-changer,” Rob Reilly, global chief creative officer at WPP, said in an MAI blog post on the launch. “It’s a platform that not only responds to the intricate nuance of creative direction, but deeply respects the sheer craft involved in generating real-world, campaign-ready images.”

In the post, Microsoft said the updates come as it pursues a more "humanist" AI.

“We have a distinct view when creating our AI models -- putting humans at the center, optimizing for how people actually communicate, training for practical use,” the company said.

The launches also reflect a broader strategic shift as Microsoft looks to diversify its AI portfolio and reduce reliance on external partners such as OpenAI. It is also aiming to strengthen its competitive standing against rivals such as Google and Amazon, both of which have been investing heavily in proprietary AI stacks.

About the Author

Contributing Writer

Scarlett Evans is a freelance writer with a focus on emerging technologies and the minerals industry. Previously, she served as assistant editor at IoT World Today, where she specialized in robotics and smart city technologies. Scarlett also has a background in the mining and resources sector, with experience at Mine Australia, Mine Technology and Power Technology. She joined Informa in April 2022 before transitioning to freelance work.

Original source

AI Business

https://aibusiness.com/generative-ai/microsoft-goes-beyond-llms-new-voice-image-models
