Show HN: SpeechSDK – free, open-source SDK that unifies all AI voice models
Article URL: https://www.speechsdk.dev/ Comments URL: https://news.ycombinator.com/item?id=47633441 Points: 3 # Comments: 0
The Unified Text-to-Speech SDK
The SpeechSDK is a free, open-source toolkit for building AI audio applications with multiple voice providers.
12 Providers · 25+ Models · Built for Production · Open Source (MIT License)
Multi-Provider
One interface across OpenAI, ElevenLabs, Deepgram, Cartesia, Google, Mistral, Hume, and more. Unified model strings, consistent response format, BYO API keys.
Cross-Platform
Runs everywhere — Node.js, Edge runtimes, and the browser. Same API, zero platform-specific code.
Minimal Dependencies
Lightweight by design. Built-in retries, typed errors, and lazy base64 encoding. No heavy frameworks.
AI Engineering
For Production Voice Applications
Lazy base64 conversion
Only computes the format you access — uint8Array or base64 — and caches it. No unnecessary encoding or wasted memory.
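The caching pattern described above can be sketched as follows. This is an illustrative sketch, not the SpeechSDK source; the class and property names are assumptions based on the description (`uint8Array`, `base64`).

```typescript
// Hypothetical sketch of lazy, cached base64 conversion: the raw bytes are
// returned as-is, and base64 is only computed on first access, then cached.
class AudioData {
  #bytes: Uint8Array;
  #base64?: string;

  constructor(bytes: Uint8Array) {
    this.#bytes = bytes;
  }

  get uint8Array(): Uint8Array {
    return this.#bytes; // no conversion needed
  }

  get base64(): string {
    // Encoded on first access, cached for every access after that.
    this.#base64 ??= Buffer.from(this.#bytes).toString("base64");
    return this.#base64;
  }
}

const audio = new AudioData(new Uint8Array([72, 105])); // bytes for "Hi"
console.log(audio.base64); // "SGk="
```

If your code only ever touches `uint8Array`, the base64 string is never built, so no memory is spent on an encoding you don't use.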
Content-type awareness
The mediaType is read directly from each provider's response headers. You always know the actual audio format — MP3 from OpenAI, WAV from Cartesia, etc.
Custom fetch & Base URL
Every provider accepts a custom fetch and baseURL. Point at OpenAI-compatible proxies, Azure OpenAI, LiteLLM, or local models. Swap in undici, a proxy-aware fetch, or a mock.
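The pattern behind the two features above (header-derived `mediaType`, injectable `fetch` and `baseURL`) can be sketched like this. The option names and the `synthesize` function are illustrative assumptions, not the SDK's real API; the point is the shape of the technique.

```typescript
// Sketch only: inject a custom fetch and baseURL so requests can be routed
// to a proxy, an OpenAI-compatible server, or a mock in tests.
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

interface ProviderOptions {
  baseURL?: string;
  fetch?: FetchLike;
}

async function synthesize(text: string, opts: ProviderOptions = {}) {
  const baseURL = opts.baseURL ?? "https://api.example-provider.com/v1"; // placeholder URL
  const doFetch = opts.fetch ?? fetch;
  const res = await doFetch(`${baseURL}/audio/speech`, {
    method: "POST",
    body: JSON.stringify({ input: text }),
  });
  // mediaType comes straight from the response headers, so the caller
  // always knows the real audio format the provider returned.
  const mediaType = res.headers.get("content-type") ?? "application/octet-stream";
  return { mediaType, bytes: new Uint8Array(await res.arrayBuffer()) };
}

// Usage with a mock fetch -- no network, no API key:
const mockFetch: FetchLike = async () =>
  new Response(new Uint8Array([1, 2, 3]), {
    headers: { "content-type": "audio/mpeg" },
  });
const out = await synthesize("hello", {
  baseURL: "http://localhost:9999",
  fetch: mockFetch,
});
console.log(out.mediaType); // "audio/mpeg"
```

Because the transport is just a `fetch`-shaped function, the same code works against a local model server, an Azure-hosted endpoint, or a unit-test mock.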
Smart retries
Built-in retry with exponential backoff via p-retry. Retries 5xx and network errors automatically. 4xx errors (auth failures, bad requests) abort immediately — no wasted time.
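The retry policy above can be sketched by hand (the SDK itself delegates to p-retry; this standalone version just mirrors the described behavior, and the names are illustrative):

```typescript
// Retry 5xx and network errors with exponential backoff; abort immediately
// on 4xx (auth failures, bad requests), since retrying cannot fix those.
class HttpError extends Error {
  constructor(public status: number) {
    super(`HTTP ${status}`);
  }
}

async function withRetries<T>(
  fn: () => Promise<T>,
  { retries = 3, baseDelayMs = 100 } = {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Network errors (not HttpError) and 5xx responses are retryable.
      const retryable = !(err instanceof HttpError) || err.status >= 500;
      if (!retryable || attempt >= retries) throw err; // 4xx aborts at once
      const delay = baseDelayMs * 2 ** attempt; // exponential backoff
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A thrown `HttpError(401)` surfaces on the first attempt, while a flaky `HttpError(503)` is retried with growing delays until it succeeds or the retry budget is spent.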
Minimal runtime dependencies
The only runtime dependency is p-retry. The SDK uses raw fetch and Uint8Array — no heavy audio libraries, no provider SDK wrappers. It works anywhere fetch works.
Works seamlessly with Speech Gateway
Speech Gateway adds production infrastructure — queuing, quality processing, voice management, and analytics. One config change to connect. Coming Soon.
| Provider | Model String | Default |
| --- | --- | --- |
| OpenAI | openai/gpt-4o-mini-tts | Yes |
| OpenAI | openai/tts-1 | — |
| OpenAI | openai/tts-1-hd | — |
| ElevenLabs | elevenlabs/eleven_multilingual_v2 | Yes |
| ElevenLabs | elevenlabs/eleven_v3 | — |
| ElevenLabs | elevenlabs/eleven_flash_v2_5 | — |
| ElevenLabs | elevenlabs/eleven_flash_v2 | — |
| Deepgram | deepgram/aura-2 | Yes |
| Cartesia | cartesia/sonic-3 | Yes |
| Hume | hume/octave-2 | Yes |
| Google | google/gemini-2.5-flash-preview-tts | Yes |
| Google | google/gemini-2.5-pro-preview-tts | — |
| Fish Audio | fish-audio/s2-pro | Yes |
| Unreal Speech | unreal-speech/default | Yes |
| Murf | murf/GEN2 | Yes |
| Resemble | resemble/default | Yes |
| fal | fal-ai/ | — |
| Mistral | mistral/voxtral-mini-tts-2603 | Yes |
- Pass just the provider name to use its default model — e.g. model: 'openai' resolves to openai/gpt-4o-mini-tts.
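The resolution rule above can be sketched as a small lookup. The `resolveModel` function and the `DEFAULT_MODELS` map are illustrative (not the SDK's internals), and only a few of the defaults from the table are shown:

```typescript
// "provider/model" selects a specific model; a bare provider name resolves
// to that provider's default model from the table above.
const DEFAULT_MODELS: Record<string, string> = {
  openai: "openai/gpt-4o-mini-tts",
  elevenlabs: "elevenlabs/eleven_multilingual_v2",
  deepgram: "deepgram/aura-2",
  cartesia: "cartesia/sonic-3",
};

function resolveModel(model: string): { provider: string; model: string } {
  const slash = model.indexOf("/");
  if (slash === -1) {
    const full = DEFAULT_MODELS[model];
    if (!full) throw new Error(`Unknown provider: ${model}`);
    return resolveModel(full); // expand bare provider name to its default
  }
  return { provider: model.slice(0, slash), model };
}

console.log(resolveModel("openai")); // provider "openai", model "openai/gpt-4o-mini-tts"
```

Switching providers is then a one-string change: `resolveModel("cartesia")` routes the same call to Cartesia's default model instead.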
Frequently asked questions
Each provider has its own SDK, request format, auth pattern, and response shape. SpeechSDK gives you one interface for all of them — same function call, same result type, same error handling. Switch providers by simply changing a model string.
One SDK, every provider. Add text-to-speech to your app in minutes with a unified, open-source interface.