Run MiniMax Speech-02 models with an API
MiniMax's Speech-02 series are text-to-speech models that produce natural-sounding voices with voice cloning, emotional expression, and support for more than 30 languages.
According to the Artificial Analysis Speech Arena, Speech-02-HD is the best text-to-speech model available today, while Speech-02-Turbo comes in third.
With Replicate, you can run these models with one line of code.
Listen to MiniMax Speech-02
Here’s a sample of the Speech-02-HD model reading an adapted version of this blog post, and the prediction that generated it.
Try MiniMax Speech-02
You can choose between two models: Speech-02-HD for high-quality voiceovers and audiobooks, and Speech-02-Turbo, a cheaper model that’s faster and best suited for real-time applications.
Both models can be used with a cloned voice. Voice cloning needs at least 10 seconds of audio and takes about 30 seconds to train. Each voice can be adjusted for pitch, speed, and volume to make it sound natural.
Try the models in our playground:
- Speech-02-HD - For high-quality voiceovers and audiobooks
- Speech-02-Turbo - For real-time applications
- Voice Cloning - For creating custom voices
What you can build
These models can help you create:
- Virtual assistants that sound natural
- Audiobooks and voiceovers with studio-quality sound
- Language learning tools with native pronunciation
- Customer service bots that speak multiple languages
- Content that’s accessible to people who prefer audio
Emotion control
MiniMax’s emotion control system has two ways to add feeling to voices. The auto-detect mode figures out the emotional tone from your text, while manual controls let you set the exact emotion you want. This helps your voices sound natural and engaging, whether you’re making content for entertainment, education, or business.
Language support
The models work with more than 30 languages and accents. You can use different English variants (US, UK, Australian, and Indian), Asian languages (Mandarin, Cantonese, Japanese, Korean, Vietnamese, and Indonesian), and European languages (French, German, Spanish, Portuguese, Turkish, Russian, and Ukrainian).
Voice cloning and text-to-speech with JavaScript
You can run the models with our JavaScript client. First, install the Node.js client library:
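```shell
npm install replicate
```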
Set your API token as an environment variable:
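```shell
export REPLICATE_API_TOKEN=<your-token>
```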
(You can get an API token from your account. Keep it private.)
Import and set up the client:
First, clone a voice. You’ll need an audio file in MP3, M4A, or WAV format. The file should be between 10 seconds and 5 minutes long and less than 20MB in size:
Now use the cloned voice for text-to-speech. You can add pauses between words using <#x#> where x is the pause duration in seconds (0.01-99.99):
Voice cloning and text-to-speech with Python
You can run the models with our Python client. First, install the client and set your API token:
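```shell
pip install replicate
export REPLICATE_API_TOKEN=<your-token>
```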
Here’s how to clone a voice and use it for text-to-speech:
Pricing
The text-to-speech models charge per character of input text: the turbo model costs $30 per million characters, and the HD model costs $50 per million characters.
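As a quick sanity check on those rates, here's what a 10,000-character script would cost with each model:

```python
# Price per million characters, from the rates above
PRICE_PER_MILLION = {"speech-02-turbo": 30.00, "speech-02-hd": 50.00}


def tts_cost(model: str, num_chars: int) -> float:
    """Cost in dollars to synthesize num_chars characters of text."""
    return PRICE_PER_MILLION[model] * num_chars / 1_000_000


print(tts_cost("speech-02-hd", 10_000))     # 0.5
print(tts_cost("speech-02-turbo", 10_000))  # 0.3
```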
Voice cloning costs $3 per voice.
Keep up to speed
Connect with our community by following us on X and joining our Discord for updates and discussions.
Happy hacking! 🎙️