API REFERENCE

Build Applications with the Voxtral API

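Integrate emotional text-to-speech into your application through a simple REST API. It supports speech generation, real-time streaming audio, and voice clone management, and you only need an API key to get started.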

Quick Start

Generate speech in one API call

Base URL: https://voxtral.com/api/tts

curl -X POST https://voxtral.com/api/tts/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello world!", "voice_id": "82c99ee6-f932-423f-a4a3-d403c8914b8d"}'

Authentication

Two ways to authenticate API requests

API Key (Recommended)

Pass your API key as a Bearer token in the Authorization header.

Authorization: Bearer YOUR_API_KEY

Get your key from Dashboard → API Keys
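
For readers calling the API from code rather than curl, the Bearer-token scheme above might be sketched in Python as follows. This is a minimal illustration using only the standard library. The helper name `build_request` is hypothetical, and the request is constructed but not sent, so the snippet runs offline.

```python
import json
import urllib.request

# Base URL taken from the Quick Start section.
BASE_URL = "https://voxtral.com/api/tts"


def build_request(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request that carries the API key as a Bearer token.

    `build_request` is a hypothetical helper, not part of any official SDK.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "/generate",
    "YOUR_API_KEY",
    {"input": "Hello world!", "voice_id": "82c99ee6-f932-423f-a4a3-d403c8914b8d"},
)
# Passing `req` to urllib.request.urlopen() would send it; that step is
# left out here so the sketch does not need network access or a real key.
```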

Session Cookie

When using the web app, authentication happens automatically via your session cookie. No extra headers needed.

Cookie: better_auth.session=...

Handled automatically when signed in to the dashboard.

Endpoints

All available API endpoints

POST /api/tts/generate

Convert text to speech with voice selection, format control, and speed adjustment. Texts over 1,000 characters are automatically chunked at sentence boundaries.
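
The service does not document how it chunks text, so the following is only one plausible reading of "chunked at sentence boundaries": whole sentences are packed greedily into chunks up to a character limit. The function name, the regular expression, and the greedy strategy are all assumptions. The same approach could also be used client-side to pre-split text that exceeds the 5,000-character request cap.

```python
import re


def chunk_at_sentences(text: str, limit: int = 1000) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `limit` characters.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    A single sentence longer than `limit` is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > limit:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```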

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| input | string | Yes | Text to convert (max 5,000 chars). Auto-chunked at sentence boundaries for long text. |
| voice_id | string | Conditional | Preset voice UUID or saved clone voice ID. Required if ref_audio is not provided. |
| ref_audio | string | Conditional | Base64-encoded reference audio for instant voice cloning. Required if voice_id is not provided. |
| response_format | string | No | Output format: "mp3" (default), "wav", "flac", "opus", "pcm" |
| speed | number | No | Playback speed 0.25–4.0 (default 1.0) |
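
The constraints in the parameter table can be checked on the client before a request is sent. The sketch below is an assumption about how one might do that. `build_generate_payload` is a hypothetical helper, and the behavior when both `voice_id` and `ref_audio` are supplied is not documented, so the code simply passes both through.

```python
ALLOWED_FORMATS = {"mp3", "wav", "flac", "opus", "pcm"}
MAX_CHARS = 5000  # per-request limit from the parameter table


def build_generate_payload(text, voice_id=None, ref_audio=None,
                           response_format="mp3", speed=1.0) -> dict:
    """Validate the documented constraints and return a JSON-ready dict."""
    if not text or len(text) > MAX_CHARS:
        raise ValueError(f"input must be 1 to {MAX_CHARS} characters")
    if voice_id is None and ref_audio is None:
        raise ValueError("either voice_id or ref_audio is required")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")

    payload = {"input": text, "response_format": response_format, "speed": speed}
    # Only include the voice fields the caller actually supplied.
    if voice_id is not None:
        payload["voice_id"] = voice_id
    if ref_audio is not None:
        payload["ref_audio"] = ref_audio
    return payload
```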

Response

200 OK
{
  "code": 0,
  "data": {
    "audio_data": "UklGRi4AAABXQVZFZm10...",
    "format": "mp3",
    "generation_id": "f47ac10b-58cc-4372-...",
    "audio_url": "https://cdn.voxtral.com/tts/abc123.mp3",
    "chunks": 1,
    "credits_used": 42
  }
}
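
A response like the one above could be unpacked as follows. Two assumptions are made here that the reference does not state outright: that `code` equal to 0 signals success (it is the only value shown), and that `audio_data` is base64-encoded audio, as its sample value suggests. `extract_audio` is a hypothetical helper name.

```python
import base64
import json


def extract_audio(response_body: str) -> tuple[bytes, str]:
    """Return (raw audio bytes, format) from a /generate response body."""
    body = json.loads(response_body)
    # Assumption: a non-zero code means the request failed.
    if body.get("code") != 0:
        raise RuntimeError(f"API returned code {body.get('code')}")
    data = body["data"]
    # Assumption: audio_data is standard base64.
    return base64.b64decode(data["audio_data"]), data["format"]
```

The returned bytes can then be written to a file whose extension matches the returned format, for example `open(f"speech.{fmt}", "wb").write(audio)`.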

Code Example

curl -X POST https://voxtral.com/api/tts/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "I am so sorry for your loss. Words cannot express how deeply I feel for you.",
    "voice_id": "82c99ee6-f932-423f-a4a3-d403c8914b8d",
    "response_format": "mp3",
    "speed": 0.9
  }'

POST /api/tts/stream

Real-time Server-Sent Events (SSE) streaming for low-latency applications. Audio chunks are returned as they are generated, which suits conversational interfaces and live playback.

Returns text/event-stream with speech.audio.delta (audio chunks) and speech.audio.done events.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| input | string | Yes | Text to convert (max 5,000 chars). |
| voice_id | string | Conditional | Preset voice UUID or saved clone voice ID. |
| ref_audio | string | Conditional | Base64-encoded reference audio for cloning. |
| response_format | string | No | Output format, default: "pcm". Also supports mp3, wav, flac, opus. |
| speed | number | No | Playback speed 0.25–4.0 (default 1.0) |

Response

200 OK
event: speech.audio.delta
data: {"audio_data": "UklGRi4AAABXQVZFZm10..."}

event: speech.audio.delta
data: {"audio_data": "AAAAAAAAAAAAAAAA..."}

event: speech.audio.done
data: {}
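
An event stream shaped like the sample above could be consumed with a small parser such as the one below. It is a simplified sketch, not a full SSE implementation. It assumes each event has a single `data:` line, that `audio_data` is base64, and that the stream ends at `speech.audio.done`. The name `iter_audio_chunks` is hypothetical.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes for each speech.audio.delta event."""
    event = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
            if event == "speech.audio.done":
                return  # end of stream
        elif line.startswith("data:") and event == "speech.audio.delta":
            payload = json.loads(line[len("data:"):].strip())
            # Assumption: audio_data is standard base64.
            yield base64.b64decode(payload["audio_data"])
        elif line == "":
            event = None  # a blank line terminates the current event
```

With a live connection, the lines would come from the HTTP response, for example by decoding each line of a `urllib.request.urlopen(...)` response as UTF-8 and passing the resulting iterator to `iter_audio_chunks`.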

Code Example

curl -N -X POST https://voxtral.com/api/tts/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello! This audio streams in real time.",
    "voice_id": "82c99ee6-f932-423f-a4a3-d403c8914b8d"
  }'

POST /api/tts/voices/clone

Create a persistent voice from reference audio. The voice is stored on Mistral's infrastructure and can be reused in future requests through its voice_id. Each user can save at most 20 voices.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Display name for the voice (e.g. "My Podcast Voice"). |
| ref_audio | string | Yes | Base64-encoded reference audio sample (WAV preferred, 5–30 seconds). |
| language | string | No | Primary language: en, fr, es, pt, it, nl, de, hi, ar |
| gender | string | No | Voice gender: "male" or "female" |

Response

200 OK
{
  "code": 0,
  "data": {
    "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "name": "My Podcast Voice",
    "mistral_voice_id": "vx_abc123def456",
    "language": "en",
    "gender": "male",
    "created_at": "2025-01-15T10:30:00.000Z"
  }
}

Code Example

# First, base64-encode your audio file onto a single line:
#   macOS: base64 -i sample.wav > sample.b64
#   Linux: base64 -w 0 sample.wav > sample.b64

curl -X POST https://voxtral.com/api/tts/voices/clone \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Podcast Voice",
    "ref_audio": "'"$(cat sample.b64)"'",
    "language": "en",
    "gender": "male"
  }'
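
The same clone request body can be assembled in Python, which avoids shell quoting around the base64 string. The sketch below only builds the payload. `build_clone_payload` is a hypothetical helper, and the checks mirror the parameter table rather than any documented server-side validation.

```python
import base64

SUPPORTED_LANGUAGES = {"en", "fr", "es", "pt", "it", "nl", "de", "hi", "ar"}
GENDERS = {"male", "female"}


def build_clone_payload(name: str, audio_bytes: bytes,
                        language=None, gender=None) -> dict:
    """Base64-encode the reference audio and assemble the clone request body."""
    if not name:
        raise ValueError("name is required")
    if not audio_bytes:
        raise ValueError("reference audio is required")
    if language is not None and language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    if gender is not None and gender not in GENDERS:
        raise ValueError(f"unsupported gender: {gender}")

    payload = {
        "name": name,
        # b64encode returns bytes; JSON needs a str.
        "ref_audio": base64.b64encode(audio_bytes).decode("ascii"),
    }
    if language is not None:
        payload["language"] = language
    if gender is not None:
        payload["gender"] = gender
    return payload
```

In practice `audio_bytes` would come from reading the sample in binary mode, for example `open("sample.wav", "rb").read()`.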

GET /api/tts/voices/mine

Retrieve all saved voice clones for the authenticated user, with pagination support.

Query Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| page | number | No | Page number (default: 1) |
| limit | number | No | Results per page (default: 50) |

Response

200 OK
{
  "code": 0,
  "data": {
    "voices": [
      {
        "id": "a1b2c3d4-...",
        "name": "My Podcast Voice",
        "mistral_voice_id": "vx_abc123",
        "ref_audio_url": "https://cdn.voxtral.com/...",
        "language": "en",
        "gender": "male",
        "tags": [],
        "created_at": "2025-01-15T10:30:00.000Z"
      }
    ],
    "total": 3,
    "page": 1,
    "limit": 50
  }
}
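
The pagination fields in this response can be used to work out which further pages to request. The sketch below assumes `total` counts all saved voices across pages, which the field name suggests but the reference does not confirm. `remaining_pages` is a hypothetical helper.

```python
import math


def remaining_pages(first_page: dict) -> list[int]:
    """Return the page numbers still to fetch after the given response."""
    data = first_page["data"]
    if data["limit"] <= 0:
        raise ValueError("limit must be positive")
    # Assumption: `total` is the number of voices across all pages.
    last_page = math.ceil(data["total"] / data["limit"])
    return list(range(data["page"] + 1, last_page + 1))
```

Given the 20-voice cap per user and the default limit of 50, a single request with default parameters should already return every voice; paging only matters when a smaller `limit` is used.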

Code Example

curl "https://voxtral.com/api/tts/voices/mine?page=1&limit=50" \
  -H "Authorization: Bearer YOUR_API_KEY"

PATCH /api/tts/voices/{id}

Update metadata for a saved voice clone. Only the fields you provide will be changed.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | No | New display name |
| language | string | No | Update language code |
| gender | string | No | Update gender |
| tags | string[] | No | Array of tags for organization |

Response

200 OK
{
  "code": 0,
  "data": {
    "id": "a1b2c3d4-...",
    "name": "Updated Voice Name",
    "language": "fr",
    "gender": "female"
  }
}

Code Example

curl -X PATCH https://voxtral.com/api/tts/voices/a1b2c3d4-... \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Updated Voice Name", "language": "fr"}'
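
Because only the supplied fields are changed, a client should leave untouched fields out of the body rather than send them as null. One way to do that, with `build_update_body` as a hypothetical helper name:

```python
def build_update_body(name=None, language=None, gender=None, tags=None) -> dict:
    """Return a PATCH body containing only the fields the caller set.

    An empty list for `tags` is kept, since it is not None. Whether the
    server treats that as "clear all tags" is an assumption.
    """
    fields = {"name": name, "language": language, "gender": gender, "tags": tags}
    body = {key: value for key, value in fields.items() if value is not None}
    if not body:
        raise ValueError("at least one field must be provided")
    return body
```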

DELETE /api/tts/voices/{id}

Permanently delete a saved voice clone. This removes the voice from both Voxtral and the upstream Mistral provider.

Response

200 OK
{
  "code": 0,
  "data": {
    "id": "a1b2c3d4-..."
  }
}

Code Example

curl -X DELETE https://voxtral.com/api/tts/voices/a1b2c3d4-... \
  -H "Authorization: Bearer YOUR_API_KEY"

Preset Voices

17 built-in voices with emotional variants.

  • Jane: English · Female · 9 variants
  • Paul: English · Male · 8 variants

Limits & Credits

Usage limits and credit consumption

Credits

1 character = 1 credit. Credits are consumed on successful generation. Failed requests don't consume credits.
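
Under the one-credit-per-character rule, the cost of a batch can be estimated before sending it. The sketch below counts characters with Python's `len`, which counts Unicode code points; how the service counts characters (for example for emoji or combining marks) is not documented, so treat the result as an estimate. `estimate_credits` is a hypothetical helper.

```python
def estimate_credits(texts: list[str]) -> int:
    """Estimate credits for a batch of inputs at 1 credit per character."""
    # Assumption: the service counts characters the way len() does.
    return sum(len(text) for text in texts)


def fits_budget(texts: list[str], credits_available: int) -> bool:
    """Check whether the whole batch fits within the remaining credits."""
    return estimate_credits(texts) <= credits_available
```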

Formats

MP3, WAV, FLAC, Opus, PCM

Limits

  • Max 5,000 chars per request
  • Max 20 saved voices
  • Speed range 0.25x–4.0x
  • 9 languages supported

Supported Languages

en (English), fr (French), es (Spanish), pt (Portuguese), it (Italian), nl (Dutch), de (German), hi (Hindi), ar (Arabic)