
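Integrate emotional text-to-speech into your application through a simple REST API. It supports speech generation, real-time audio streaming, and voice clone management; all you need to get started is an API key.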
Generate speech in one API call
Base URL: https://voxtral.com/api/tts
curl -X POST https://voxtral.com/api/tts/generate \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"input": "Hello world!", "voice_id": "82c99ee6-f932-423f-a4a3-d403c8914b8d"}'
Two ways to authenticate API requests
API key: pass your API key as a Bearer token in the Authorization header.
Authorization: Bearer YOUR_API_KEY
Get your key from Dashboard → API Keys.
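For scripts, a common pattern is to read the key from an environment variable instead of pasting it into every command. The following is a minimal bash sketch, not an official client: `VOXTRAL_API_KEY`, `voxtral_auth_header`, and `voxtral_curl` are hypothetical names chosen for this example, and only the base URL and the Bearer header format come from this page.

```shell
#!/usr/bin/env bash
# Sketch: read the API key from an environment variable instead of
# hard-coding it. VOXTRAL_API_KEY is a hypothetical name for this example.
VOXTRAL_BASE_URL="https://voxtral.com/api/tts"

# Print the Authorization header, or fail if no key is configured.
voxtral_auth_header() {
  if [ -z "${VOXTRAL_API_KEY:-}" ]; then
    echo "VOXTRAL_API_KEY is not set" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s\n' "$VOXTRAL_API_KEY"
}

# voxtral_curl METHOD PATH [extra curl arguments...]
# Example: voxtral_curl GET "/voices/mine?page=1&limit=50"
voxtral_curl() {
  local method=$1 path=$2 header
  shift 2
  header=$(voxtral_auth_header) || return 1
  curl -sS -X "$method" "${VOXTRAL_BASE_URL}${path}" -H "$header" "$@"
}
```

Quoting the path matters whenever it contains a query string, because an unquoted `&` would send the command to the background.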
Session cookie: when using the web app, authentication happens automatically via your session cookie. No extra headers are needed.
Cookie: better_auth.session=...
This is handled automatically when you are signed in to the dashboard.
All available API endpoints
POST /api/tts/generate: convert text to speech with voice selection, output format control, and speed adjustment. Texts over 1,000 characters are automatically split into chunks at sentence boundaries.
| Parameter | Type | Required | Description |
|---|---|---|---|
| input | string | Yes | Text to convert (max 5,000 chars). Long text is auto-chunked at sentence boundaries. |
| voice_id | string | Conditional | Preset voice UUID or saved clone voice ID. Required if ref_audio is not provided. |
| ref_audio | string | Conditional | Base64-encoded reference audio for instant voice cloning. Required if voice_id is not provided. |
| response_format | string | No | Output format: "mp3" (default), "wav", "flac", "opus", "pcm" |
| speed | number | No | Playback speed 0.25–4.0 (default 1.0) |
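A client can reject most invalid requests before making a round trip by mirroring the limits in the table above, and the sample response carries the audio as a base64 string in `data.audio_data`. The bash sketch below does both. `validate_tts_request`, `save_audio_from_response`, `generate_to_file`, and `VOXTRAL_API_KEY` are hypothetical names. The JSON extraction uses `sed` as a dependency-free stand-in for a real parser such as `jq`, and `base64 -d` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical client-side helpers for POST /api/tts/generate.
# The server remains the source of truth; these checks only mirror the table.

# validate_tts_request INPUT VOICE_ID REF_AUDIO [FORMAT] [SPEED]
# Pass "" for VOICE_ID or REF_AUDIO when that parameter is not used.
validate_tts_request() {
  local input=$1 voice_id=$2 ref_audio=$3 format=${4:-mp3} speed=${5:-1.0}
  if [ -z "$input" ] || [ "${#input}" -gt 5000 ]; then
    echo "input must be 1 to 5000 characters" >&2
    return 1
  fi
  if [ -z "$voice_id" ] && [ -z "$ref_audio" ]; then
    echo "provide voice_id or ref_audio" >&2
    return 1
  fi
  case $format in
    mp3|wav|flac|opus|pcm) ;;
    *) echo "unsupported response_format: $format" >&2; return 1 ;;
  esac
  # Bash has no floating point, so awk performs the range check.
  if ! awk -v s="$speed" 'BEGIN { exit !(s >= 0.25 && s <= 4.0) }'; then
    echo "speed must be between 0.25 and 4.0" >&2
    return 1
  fi
}

# Read a /generate JSON response on stdin and decode data.audio_data into FILE.
save_audio_from_response() {
  local out=$1 b64
  b64=$(sed -n 's/.*"audio_data"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
  if [ -z "$b64" ]; then
    echo "no audio_data in response" >&2
    return 1
  fi
  printf '%s' "$b64" | base64 -d > "$out"   # -d: GNU coreutils decode flag
}

# generate_to_file JSON_BODY FILE: call the endpoint and save the audio.
generate_to_file() {
  local body=$1 out=$2
  if [ -z "${VOXTRAL_API_KEY:-}" ]; then
    echo "VOXTRAL_API_KEY is not set" >&2
    return 1
  fi
  curl -sS -X POST https://voxtral.com/api/tts/generate \
    -H "Authorization: Bearer $VOXTRAL_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary "$body" | save_audio_from_response "$out"
}
```

For example, call `validate_tts_request "Hello world!" "82c99ee6-f932-423f-a4a3-d403c8914b8d" "" mp3 0.9` first, and only pass the same values, serialized as JSON, to `generate_to_file` if it succeeds.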
{
"code": 0,
"data": {
"audio_data": "UklGRi4AAABXQVZFZm10...",
"format": "mp3",
"generation_id": "f47ac10b-58cc-4372-...",
"audio_url": "https://cdn.voxtral.com/tts/abc123.mp3",
"chunks": 1,
"credits_used": 42
}
}
curl -X POST https://voxtral.com/api/tts/generate \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input": "I am so sorry for your loss. Words cannot express how deeply I feel for you.",
"voice_id": "82c99ee6-f932-423f-a4a3-d403c8914b8d",
"response_format": "mp3",
"speed": 0.9
}'
POST /api/tts/stream: real-time Server-Sent Events (SSE) streaming for low-latency applications. Audio chunks are returned as they are generated, which makes this endpoint a good fit for conversational interfaces and live playback.
| Parameter | Type | Required | Description |
|---|---|---|---|
| input | string | Yes | Text to convert (max 5,000 chars). |
| voice_id | string | Conditional | Preset voice UUID or saved clone voice ID. Required if ref_audio is not provided. |
| ref_audio | string | Conditional | Base64-encoded reference audio for cloning. Required if voice_id is not provided. |
| response_format | string | No | Output format: "pcm" (default), "mp3", "wav", "flac", "opus" |
| speed | number | No | Playback speed 0.25–4.0 (default 1.0) |
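In the sample stream on this page, each `speech.audio.delta` event carries a base64 chunk in `audio_data`, and `speech.audio.done` marks the end. The bash sketch below collects such a stream into one file. It assumes that every `data:` line holds one complete base64 chunk, which matches the sample but is not stated as a guarantee. `sse_to_audio` is a hypothetical name. With the default `pcm` format, the result is raw PCM, and this page does not state its sample rate or sample format.

```shell
#!/usr/bin/env bash
# Sketch: collect the /stream SSE body into a single audio file.
# Assumes each "data:" line holds one complete base64 chunk in "audio_data"
# and that speech.audio.done ends the stream, as in the sample events.
sse_to_audio() {
  local out=$1 line b64
  : > "$out"                      # start from an empty output file
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%$'\r'}            # tolerate CRLF line endings
    case $line in
      'event: speech.audio.done')
        break
        ;;
      'data: '*'"audio_data"'*)
        b64=$(printf '%s\n' "$line" |
          sed -n 's/.*"audio_data"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
        # Decode chunk by chunk: if each chunk carries its own "=" padding,
        # joining the base64 text first would produce invalid input.
        printf '%s' "$b64" | base64 -d >> "$out" || return 1
        ;;
    esac
  done
}

# Usage (needs a real key):
# curl -N -sS -X POST https://voxtral.com/api/tts/stream \
#   -H "Authorization: Bearer $VOXTRAL_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d '{"input": "Hello! This audio streams in real time.", "voice_id": "82c99ee6-f932-423f-a4a3-d403c8914b8d"}' |
#   sse_to_audio speech.pcm
```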
event: speech.audio.delta
data: {"audio_data": "UklGRi4AAABXQVZFZm10..."}

event: speech.audio.delta
data: {"audio_data": "AAAAAAAAAAAAAAAA..."}

event: speech.audio.done
data: {}

curl -N -X POST https://voxtral.com/api/tts/stream \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input": "Hello! This audio streams in real time.",
"voice_id": "82c99ee6-f932-423f-a4a3-d403c8914b8d"
}'
POST /api/tts/voices/clone: create a persistent voice from reference audio. The voice is stored on Mistral's infrastructure and can be reused in later requests through its voice_id. Each user can save up to 20 voices.
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Display name for the voice (e.g. "My Podcast Voice"). |
| ref_audio | string | Yes | Base64-encoded reference audio sample (WAV preferred, 5–30 seconds). |
| language | string | No | Primary language: en, fr, es, pt, it, nl, de, hi, ar |
| gender | string | No | Voice gender: "male" or "female" |
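The table asks for 5 to 30 seconds of reference audio, preferably WAV, so a script can check a clip's length before uploading it. This bash sketch assumes a canonical PCM WAV with a 44-byte header, so files with extra chunks (such as LIST) will be slightly mismeasured. `wav_duration_seconds` and `check_clone_sample` are hypothetical names, and the API itself may measure duration differently.

```shell
#!/usr/bin/env bash
# Sketch: estimate a WAV clip's duration and check the documented 5-30 s window.
# Assumes a canonical PCM WAV: "RIFF" at offset 0, "WAVE" at 8, byte rate at 28,
# and audio data starting right after a 44-byte header.
wav_duration_seconds() {
  local f=$1 b0 b1 b2 b3 byte_rate size
  if [ "$(head -c 4 "$f")" != "RIFF" ] ||
     [ "$(dd if="$f" bs=1 skip=8 count=4 2>/dev/null)" != "WAVE" ]; then
    echo "not a RIFF/WAVE file: $f" >&2
    return 1
  fi
  # Byte rate is a little-endian uint32; reading it byte by byte keeps the
  # result independent of the host's endianness.
  read -r b0 b1 b2 b3 <<< "$(od -An -t u1 -j 28 -N 4 "$f")"
  byte_rate=$(( b0 + (b1 << 8) + (b2 << 16) + (b3 << 24) ))
  if [ "$byte_rate" -le 0 ]; then
    echo "invalid byte rate in $f" >&2
    return 1
  fi
  size=$(wc -c < "$f")
  awk -v s="$size" -v r="$byte_rate" 'BEGIN { printf "%.2f\n", (s - 44) / r }'
}

# check_clone_sample FILE: succeed only if the clip lasts 5 to 30 seconds.
check_clone_sample() {
  local d
  d=$(wav_duration_seconds "$1") || return 1
  if ! awk -v d="$d" 'BEGIN { exit !(d >= 5 && d <= 30) }'; then
    echo "clip is ${d}s; the API documents 5 to 30 seconds" >&2
    return 1
  fi
}
```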
{
"code": 0,
"data": {
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"name": "My Podcast Voice",
"mistral_voice_id": "vx_abc123def456",
"language": "en",
"gender": "male",
"created_at": "2025-01-15T10:30:00.000Z"
}
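}
# Write the body to a file: a 5 to 30 second clip is often over 128 KiB once
# base64-encoded, beyond Linux's limit for a single curl argument. printf is a
# shell builtin, so it avoids that limit. "base64 < file | tr -d '\n'" works
# with both GNU (which wraps lines) and macOS base64.
printf '{"name": "My Podcast Voice", "ref_audio": "%s", "language": "en", "gender": "male"}' \
"$(base64 < sample.wav | tr -d '\n')" > clone.json
curl -X POST https://voxtral.com/api/tts/voices/clone \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
--data-binary @clone.json
GET /api/tts/voices/mine: retrieve all saved voice clones for the authenticated user, with pagination support.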
| Parameter | Type | Required | Description |
|---|---|---|---|
| page | number | No | Page number (default: 1) |
| limit | number | No | Results per page (default: 50) |
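Each page of the response reports `total` alongside `page` and `limit`, so a script can keep requesting pages until it has covered `total` voices. With the 20-voice cap on saved clones, the default limit of 50 returns everything in one page, and the loop only matters for smaller `limit` values. This is a bash sketch. `list_all_voices`, `extract_total`, `page_count`, and `remaining_clone_slots` are hypothetical names, and the `sed` extraction assumes `total` appears once per response.

```shell
#!/usr/bin/env bash
# Sketch: page through GET /api/tts/voices/mine using page, limit, and total.

extract_total() {   # extract_total < json
  sed -n 's/.*"total"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

page_count() {      # page_count TOTAL LIMIT -> ceil(TOTAL / LIMIT)
  echo $(( ($1 + $2 - 1) / $2 ))
}

remaining_clone_slots() {   # remaining_clone_slots TOTAL, given the 20-voice cap
  echo $(( 20 - $1 ))
}

# list_all_voices [LIMIT]: print each page's raw JSON response.
list_all_voices() {
  local limit=${1:-50} page=1 body total pages
  if [ -z "${VOXTRAL_API_KEY:-}" ]; then
    echo "VOXTRAL_API_KEY is not set" >&2
    return 1
  fi
  while :; do
    # The URL is quoted so the shell does not treat "&" as a background operator.
    body=$(curl -sS "https://voxtral.com/api/tts/voices/mine?page=${page}&limit=${limit}" \
      -H "Authorization: Bearer $VOXTRAL_API_KEY") || return 1
    printf '%s\n' "$body"
    total=$(printf '%s\n' "$body" | extract_total)
    pages=$(page_count "${total:-0}" "$limit")
    if [ "$page" -ge "$pages" ]; then
      break
    fi
    page=$((page + 1))
  done
}
```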
{
"code": 0,
"data": {
"voices": [
{
"id": "a1b2c3d4-...",
"name": "My Podcast Voice",
"mistral_voice_id": "vx_abc123",
"ref_audio_url": "https://cdn.voxtral.com/...",
"language": "en",
"gender": "male",
"tags": [],
"created_at": "2025-01-15T10:30:00.000Z"
}
],
"total": 3,
"page": 1,
"limit": 50
}
}
curl "https://voxtral.com/api/tts/voices/mine?page=1&limit=50" \
-H "Authorization: Bearer YOUR_API_KEY"
PATCH /api/tts/voices/{id}: update metadata for a saved voice clone. Only the fields you provide are changed.
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | No | New display name |
| language | string | No | New language code |
| gender | string | No | New gender |
| tags | string[] | No | Array of tags for organization |
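Because only the fields you send are changed, a client should build a body that contains exactly those fields. This is a minimal bash sketch. `build_patch_body` and `json_escape` are hypothetical helpers. `json_escape` handles only backslashes and double quotes (not control characters such as newlines), and tags are passed as a comma-separated list.

```shell
#!/usr/bin/env bash
# Sketch: build a PATCH body containing only the fields being changed.

json_escape() {
  local s=$1
  s=${s//\\/\\\\}   # backslash -> \\
  s=${s//\"/\\\"}   # double quote -> \"
  printf '%s' "$s"
}

# build_patch_body field=value ...   (fields: name, language, gender, tags)
# tags takes a comma-separated list; "tags=" sends an empty array.
build_patch_body() {
  local out="" kv key val item tags
  local -a items
  for kv in "$@"; do
    key=${kv%%=*}
    val=${kv#*=}
    case $key in
      name|language|gender)
        out+="${out:+,}\"$key\":\"$(json_escape "$val")\""
        ;;
      tags)
        tags=""
        IFS=',' read -r -a items <<< "$val"
        for item in "${items[@]}"; do
          tags+="${tags:+,}\"$(json_escape "$item")\""
        done
        out+="${out:+,}\"tags\":[${tags}]"
        ;;
      *)
        echo "unsupported field: $key" >&2
        return 1
        ;;
    esac
  done
  printf '{%s}\n' "$out"
}
```

For instance, `build_patch_body name='Updated Voice Name' language=fr` yields the same body as the curl example, and the result can be passed to curl with `-d "$(build_patch_body ...)"`.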
{
"code": 0,
"data": {
"id": "a1b2c3d4-...",
"name": "Updated Voice Name",
"language": "fr",
"gender": "female"
}
}
curl -X PATCH https://voxtral.com/api/tts/voices/a1b2c3d4-... \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"name": "Updated Voice Name", "language": "fr"}'
DELETE /api/tts/voices/{id}: permanently delete a saved voice clone. This removes the voice from both Voxtral and the upstream Mistral provider.
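Every sample response on this page wraps its payload as `{"code": 0, "data": ...}`, and the delete response echoes the removed voice's `id`. Assuming that `"code": 0` signals success (error responses are not documented here), a script can confirm a deletion as in the bash sketch below. `delete_voice`, `verify_delete_response`, `json_field`, and `response_ok` are hypothetical names, and the `grep`/`sed` matching stands in for a real JSON parser.

```shell
#!/usr/bin/env bash
# Sketch: delete a voice and confirm the response envelope.

json_field() {   # json_field NAME < json: first string value of "NAME"
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

response_ok() {  # succeeds if the body contains "code": 0 (assumed success)
  grep -Eq '"code"[[:space:]]*:[[:space:]]*0([^0-9]|$)'
}

# verify_delete_response EXPECTED_ID < json
verify_delete_response() {
  local expected=$1 body got
  body=$(cat)
  if ! printf '%s\n' "$body" | response_ok; then
    echo "unexpected response: $body" >&2
    return 1
  fi
  got=$(printf '%s\n' "$body" | json_field id)
  [ "$got" = "$expected" ]
}

delete_voice() {  # delete_voice ID
  local id=$1 body
  if [ -z "${VOXTRAL_API_KEY:-}" ]; then
    echo "VOXTRAL_API_KEY is not set" >&2
    return 1
  fi
  body=$(curl -sS -X DELETE "https://voxtral.com/api/tts/voices/${id}" \
    -H "Authorization: Bearer $VOXTRAL_API_KEY") || return 1
  verify_delete_response "$id" <<< "$body"
}
```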
{
"code": 0,
"data": {
"id": "a1b2c3d4-..."
}
}
curl -X DELETE https://voxtral.com/api/tts/voices/a1b2c3d4-... \
-H "Authorization: Bearer YOUR_API_KEY"
Preset voices: 17 built-in voices with emotional variants.
Usage limits and credit consumption
1 character = 1 credit. Credits are consumed on successful generation. Failed requests don't consume credits.
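Since 1 character = 1 credit, the cost of a request can be estimated from the length of its input before sending it. Because `input` is capped at 5,000 characters, a single request costs at most 5,000 credits. This bash sketch assumes the service counts characters the way bash does: `${#var}` counts characters in a UTF-8 locale and bytes in the C locale. `estimate_credits` and `estimate_credits_file` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: estimate credits with the documented rule "1 character = 1 credit".
# How the service counts special cases (e.g. emoji) is not specified here.
estimate_credits() {
  local text=$1
  echo "${#text}"
}

# Estimate for a text file. Command substitution drops trailing newlines;
# whether those would be billed is an open assumption.
estimate_credits_file() {
  local text
  text=$(cat "$1") || return 1
  estimate_credits "$text"
}
```

For example, `estimate_credits 'Hello world!'` prints 12.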
Supported languages: en (English), fr (French), es (Spanish), pt (Portuguese), it (Italian), nl (Dutch), de (German), hi (Hindi), ar (Arabic)
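These nine codes are the documented set, so a script can reject any other `language` value before calling the clone or update endpoints. This is a minimal bash sketch, and `is_supported_language` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: accept only the language codes listed on this page.
is_supported_language() {
  case $1 in
    en|fr|es|pt|it|nl|de|hi|ar) return 0 ;;
    *) return 1 ;;
  esac
}
```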