An unofficial MiniMax Speech Synthesis (Text-to-Speech / T2A) SDK for Node.js, JavaScript, and TypeScript. Convert text to natural-sounding speech using MiniMax's TTS API with full streaming, voice cloning, and voice design support.
API Reference | npm | GitHub
- Streaming synthesis via `ReadableStream<Buffer>` for low-latency audio
- Ships as both ESM (`import`) and CommonJS (`require`)

Install:

```sh
npm install minimax-speech-ts
```
Requires Node.js >= 18.
```ts
import fs from 'node:fs'
import { MiniMaxSpeech } from 'minimax-speech-ts'

const client = new MiniMaxSpeech({
  apiKey: process.env.MINIMAX_API_KEY!,
  groupId: process.env.MINIMAX_GROUP_ID, // optional
})

// Text to speech
const result = await client.synthesize({
  text: 'Hello, world!',
  model: 'speech-02-hd',
  voiceSetting: { voiceId: 'English_expressive_narrator' },
})

// result.audio is a Buffer containing the audio data
await fs.promises.writeFile('output.mp3', result.audio)
```
```ts
new MiniMaxSpeech({
  apiKey: string,   // Required. MiniMax API key.
  groupId?: string, // Optional. MiniMax group ID, appended as ?GroupId= query param.
  apiHost?: string, // Optional. Defaults to 'https://api.minimaxi.chat'.
})
```
### `synthesize(request): Promise<SynthesizeResult>`

Synchronous text-to-speech. Returns decoded audio as a `Buffer`.
```ts
const result = await client.synthesize({
  text: 'Hello!',
  model: 'speech-02-hd', // optional, defaults to 'speech-02-hd'
  voiceSetting: {
    voiceId: 'English_expressive_narrator',
    speed: 1.0,
    vol: 1.0,
    pitch: 0,
    emotion: 'happy', // speech-02-*/speech-2.6-*/speech-2.8-* only
  },
  audioSetting: {
    format: 'mp3', // 'mp3' | 'pcm' | 'flac' | 'wav'
    sampleRate: 32000,
    bitrate: 128000,
    channel: 1,
  },
  languageBoost: 'English',
  voiceModify: {
    pitch: 0,     // -100 to 100
    intensity: 0, // -100 to 100
    timbre: 0,    // -100 to 100
    soundEffects: 'robotic', // optional
  },
  timbreWeights: [ // mix multiple voices
    { voiceId: 'voice-1', weight: 0.5 },
    { voiceId: 'voice-2', weight: 0.5 },
  ],
  subtitleEnable: false,
  pronunciationDict: { tone: ['处理/(chǔ lǐ)'] },
})

result.audio        // Buffer
result.extraInfo    // { audioLength, audioSampleRate, audioSize, bitrate, wordCount, usageCharacters, ... }
result.traceId      // string
result.subtitleFile // string | undefined
```
Pass `outputFormat: 'url'` to receive a URL string instead of a decoded buffer:

```ts
const result = await client.synthesize({
  text: 'Hello!',
  outputFormat: 'url',
})

result.audio // string (URL)
```
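With `outputFormat: 'url'` the audio still has to be downloaded separately. A minimal sketch using Node's built-in `fetch` (the `downloadAudio` helper is ours, not an SDK export):

```typescript
// Sketch (not part of the SDK): fetch the audio behind the returned URL
// and decode the response body into a Buffer.
async function downloadAudio(url: string): Promise<Buffer> {
  const res = await fetch(url)
  if (!res.ok) throw new Error(`Download failed: ${res.status}`)
  return Buffer.from(await res.arrayBuffer())
}

// e.g. await fs.promises.writeFile('output.mp3', await downloadAudio(result.audio))
```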
### `synthesizeStream(request): Promise<ReadableStream<Buffer>>`

Streaming text-to-speech via SSE. Returns a `ReadableStream` of audio `Buffer` chunks.
WAV format is not supported in streaming mode.
```ts
import fs from 'node:fs'

const stream = await client.synthesizeStream({
  text: 'Hello, streaming world!',
  voiceSetting: { voiceId: 'English_expressive_narrator' },
  audioSetting: { format: 'mp3' },
  streamOptions: { excludeAggregatedAudio: true },
})

const writer = fs.createWriteStream('output.mp3')
for await (const chunk of stream) {
  writer.write(chunk)
}
writer.end()
```
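If you want streaming latency on the wire but one contiguous clip in memory, the chunks can be drained into a single `Buffer`. A sketch (the `collectStream` helper is ours, not an SDK export):

```typescript
import { ReadableStream } from 'node:stream/web'

// Sketch (not part of the SDK): drain a streaming response into one Buffer,
// e.g. to post-process the whole clip after it has streamed in.
async function collectStream(stream: ReadableStream<Uint8Array>): Promise<Buffer> {
  const chunks: Buffer[] = []
  for await (const chunk of stream) {
    chunks.push(Buffer.from(chunk))
  }
  return Buffer.concat(chunks)
}
```

Node's web streams are async-iterable, so the same `for await` pattern works here.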
### `synthesizeAsync(request): Promise<AsyncSynthesizeResult>`

Async text-to-speech for long-form content. Submit a task, then poll for completion.
Provide either text or textFileId (mutually exclusive). WAV format is not supported.
```ts
const task = await client.synthesizeAsync({
  text: 'A very long article...',
  voiceSetting: { voiceId: 'English_expressive_narrator' },
})

task.taskId          // string
task.fileId          // number
task.taskToken       // string
task.usageCharacters // number
```
### `querySynthesizeAsync(taskId): Promise<AsyncSynthesizeQueryResult>`

Poll the status of an async synthesis task.
```ts
const status = await client.querySynthesizeAsync(task.taskId)

status.status // 'processing' | 'success' | 'failed' | 'expired'
status.fileId // number (download file ID when status is 'success')
```
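A polling loop is left to the caller. One possible sketch (the `waitForTask` helper and its interval are our choices, not part of the SDK):

```typescript
// Sketch (not part of the SDK): poll a task until it leaves 'processing',
// resolving on 'success' and throwing on 'failed' or 'expired'.
async function waitForTask(
  query: (taskId: string) => Promise<{ status: string; fileId?: number }>,
  taskId: string,
  intervalMs = 5000,
): Promise<{ status: string; fileId?: number }> {
  for (;;) {
    const s = await query(taskId)
    if (s.status === 'success') return s
    if (s.status === 'failed' || s.status === 'expired') {
      throw new Error(`Task ${taskId} ended with status ${s.status}`)
    }
    await new Promise((r) => setTimeout(r, intervalMs))
  }
}

// Usage:
// const done = await waitForTask((id) => client.querySynthesizeAsync(id), task.taskId)
```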
### `uploadFile(file, purpose): Promise<FileUploadResult>`

Upload an audio file for voice cloning.
```ts
import fs from 'node:fs'

const audioBlob = new Blob([await fs.promises.readFile('voice.mp3')], { type: 'audio/mp3' })
const upload = await client.uploadFile(audioBlob, 'voice_clone')

upload.file.fileId   // number
upload.file.bytes    // number
upload.file.filename // string
```
### `cloneVoice(request): Promise<VoiceCloneResult>`

Clone a voice from an uploaded audio file.
```ts
const result = await client.cloneVoice({
  fileId: upload.file.fileId,
  voiceId: 'my-custom-voice', // 8-256 chars, must start with a letter
  text: 'Preview text',       // optional preview
  model: 'speech-02-hd',      // required if text is provided
  needNoiseReduction: true,
  needVolumeNormalization: true,
  clonePrompt: { // optional prompt-based cloning
    promptAudio: promptFileId, // file ID of a previously uploaded prompt clip
    promptText: 'Transcript of the prompt audio',
  },
})

result.demoAudio      // hex-encoded preview audio (empty if no text provided)
result.inputSensitive // { type: number }
```
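Since `demoAudio` is hex-encoded, it needs decoding before it can be played or saved. A sketch (the helper name is ours, not an SDK export):

```typescript
// Sketch (not part of the SDK): decode the hex-encoded preview audio
// returned by cloneVoice / designVoice into raw audio bytes.
function hexAudioToBuffer(hexAudio: string): Buffer {
  return Buffer.from(hexAudio, 'hex')
}

// e.g. await fs.promises.writeFile('preview.mp3', hexAudioToBuffer(result.demoAudio))
```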
### `designVoice(request): Promise<VoiceDesignResult>`

Design a new voice from a text description.
```ts
const result = await client.designVoice({
  prompt: 'A warm female voice with a slight British accent',
  previewText: 'Hello, this is a preview of the designed voice.',
  voiceId: 'my-designed-voice', // optional, auto-generated if omitted
})

result.voiceId    // string
result.trialAudio // hex-encoded preview audio
```
### `getVoices(request): Promise<GetVoiceResult>`

List available voices.
```ts
const voices = await client.getVoices({
  voiceType: 'all', // 'system' | 'voice_cloning' | 'voice_generation' | 'all'
})

voices.systemVoice     // SystemVoiceInfo[] — built-in voices
voices.voiceCloning    // VoiceCloningInfo[] — your cloned voices
voices.voiceGeneration // VoiceGenerationInfo[] — your designed voices
```
### `deleteVoice(request): Promise<DeleteVoiceResult>`

Delete a cloned or designed voice.
```ts
const result = await client.deleteVoice({
  voiceType: 'voice_cloning', // 'voice_cloning' | 'voice_generation'
  voiceId: 'my-custom-voice',
})
```
The library provides a typed error hierarchy:
```ts
import {
  MiniMaxClientError,     // Client-side validation (bad params, before request is sent)
  MiniMaxError,           // Base class for all API errors
  MiniMaxAuthError,       // Authentication failures (codes 1004, 2049)
  MiniMaxRateLimitError,  // Rate limiting (codes 1002, 1039, 1041, 2045)
  MiniMaxValidationError, // Server-side validation (codes 2013, 1042, 2037, 2039, 2048, 20132)
} from 'minimax-speech-ts'

try {
  await client.synthesize({ text: 'Hello' })
} catch (e) {
  if (e instanceof MiniMaxClientError) {
    // Bad parameters — fix your request
    console.error(e.message)
  } else if (e instanceof MiniMaxAuthError) {
    // Invalid API key
  } else if (e instanceof MiniMaxRateLimitError) {
    // Back off and retry
  } else if (e instanceof MiniMaxValidationError) {
    // Server rejected the request parameters
    console.error(e.statusCode, e.statusMsg, e.traceId)
  } else if (e instanceof MiniMaxError) {
    // Other API error
    console.error(e.statusCode, e.statusMsg)
  }
}
```
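For `MiniMaxRateLimitError`, "back off and retry" can be wrapped once and reused. A sketch of exponential backoff (the `withRetry` helper and its defaults are our choices, not part of the SDK):

```typescript
// Sketch (not part of the SDK): retry a call with exponential backoff
// when the error is classified as retryable (e.g. a rate-limit error).
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (e: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn()
    } catch (e) {
      if (attempt >= maxAttempts || !isRetryable(e)) throw e
      // Delay doubles each attempt: base, 2x base, 4x base, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)))
    }
  }
}

// Usage:
// const result = await withRetry(
//   () => client.synthesize({ text: 'Hello' }),
//   (e) => e instanceof MiniMaxRateLimitError,
// )
```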
Client-side validation catches common mistakes before making a request:
- Missing required fields (`text`, `voiceId`, etc.)
- Emotion set on a model that doesn't support it (`speech-01-*` doesn't support emotions)
- `fluent`/`whisper` emotions with non-`speech-2.6-*` models
- `text` and `textFileId` both provided (mutually exclusive)
- `text` provided without `model` in voice cloning

| Model | Emotions | Notes |
|---|---|---|
| `speech-2.8-hd` | All except `fluent`, `whisper` | Latest HD |
| `speech-2.8-turbo` | All except `fluent`, `whisper` | Latest Turbo |
| `speech-2.6-hd` | All, including `fluent`, `whisper` | |
| `speech-2.6-turbo` | All, including `fluent`, `whisper` | |
| `speech-02-hd` | All except `fluent`, `whisper` | Default |
| `speech-02-turbo` | All except `fluent`, `whisper` | |
| `speech-01-hd` | None | |
| `speech-01-turbo` | None | |
| `speech-01` | None | Legacy |
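The emotion support in the table above can be encoded as a small lookup. This is a sketch derived purely from the table (not an SDK export), and it assumes unknown model names behave like the `speech-02`/`speech-2.8` families:

```typescript
// Sketch (not part of the SDK): which emotion set a model accepts,
// per the compatibility table: 'none', 'standard' (all except
// fluent/whisper), or 'extended' (all, including fluent/whisper).
type EmotionSupport = 'none' | 'standard' | 'extended'

function emotionSupport(model: string): EmotionSupport {
  if (model.startsWith('speech-01')) return 'none'
  if (model.startsWith('speech-2.6')) return 'extended'
  return 'standard' // speech-02-*, speech-2.8-*
}
```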
Requires Node.js >= 18 (uses the built-in `fetch` and `ReadableStream`).

License: MIT