
API REFERENCE

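Build with the Voxtral API

Integrate emotional text-to-speech into your app through a simple REST API. It supports speech generation, real-time streaming audio, and voice clone management. All you need to get started is an API key.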

Quick Start

Generate speech in one API call

Base URL: https://voxtral.com/api/tts

curl -X POST https://voxtral.com/api/tts/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello world!", "voice_id": "82c99ee6-f932-423f-a4a3-d403c8914b8d"}'

Authentication

Two ways to authenticate API requests

API Key (Recommended)

Pass your API key as a Bearer token in the Authorization header.

Authorization: Bearer YOUR_API_KEY

Get your key from Dashboard → API Keys
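
As a rough illustration of the Bearer-token scheme above, the following Python sketch builds an authenticated request without sending it. The helper name `build_request` is hypothetical and not part of any Voxtral SDK; only the base URL, the headers, and the request body come from the Quick Start example.

```python
import json
import urllib.request

BASE_URL = "https://voxtral.com/api/tts"  # base URL from the Quick Start


def build_request(api_key: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request with Bearer authentication.

    Hypothetical helper for illustration only.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "YOUR_API_KEY",
    "/generate",
    {"input": "Hello world!", "voice_id": "82c99ee6-f932-423f-a4a3-d403c8914b8d"},
)
# With a real key in place, the request could be sent with urllib.request.urlopen(req).
```

Keeping construction separate from sending makes it easy to inspect the headers before any credits are spent.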

Session Cookie

When using the web app, authentication happens automatically via your session cookie. No extra headers needed.

Cookie: better_auth.session=...

Handled automatically when signed in to the dashboard.

Endpoints

All available API endpoints

POST /api/tts/generate

Convert text to speech with voice selection, format control, and speed adjustment. Texts over 1,000 characters are automatically chunked at sentence boundaries.
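
To give a feel for what chunking at sentence boundaries can look like, here is a minimal Python sketch. It is not the service's actual algorithm, which is not documented here. The function name `chunk_text`, the punctuation rule, and the greedy packing strategy are all assumptions; only the 1,000-character threshold comes from the description above.

```python
import re
from typing import List


def chunk_text(text: str, limit: int = 1000) -> List[str]:
    """Greedily pack whole sentences into chunks of at most `limit` characters.

    Illustrative only: the real service may split differently. A single
    sentence longer than `limit` is kept as one oversized chunk.
    """
    # Split at whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > limit:
            chunks.append(current)  # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


short = chunk_text("Hello world! How are you?")
long_text = "A" * 600 + ". " + "B" * 600 + "."
parts = chunk_text(long_text)
```

A client does not need to do this itself, since chunking happens automatically on the server. The sketch only helps when reasoning about why the `chunks` field in a response may be greater than 1.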

Parameters

Parameter	Type	Required	Description
`input`	string	Yes	Text to convert (max 5,000 chars). Auto-chunked at sentence boundaries for long text.
`voice_id`	string	Conditional	Preset voice UUID or saved clone voice ID. Required if ref_audio not provided.
`ref_audio`	string	Conditional	Base64-encoded reference audio for instant voice cloning. Required if voice_id not provided.
`response_format`	string	No	Output format: "mp3" (default), "wav", "flac", "opus", "pcm"
`speed`	number	No	Playback speed 0.25–4.0 (default 1.0)
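
The constraints in the table can be checked on the client before a request is sent. The sketch below is one possible way to do that in Python. The function name `build_generate_payload` is hypothetical, and the behavior when both `voice_id` and `ref_audio` are supplied is not documented, so this sketch simply passes both through.

```python
from typing import Optional

MAX_INPUT_CHARS = 5000
FORMATS = {"mp3", "wav", "flac", "opus", "pcm"}


def build_generate_payload(
    text: str,
    voice_id: Optional[str] = None,
    ref_audio: Optional[str] = None,
    response_format: str = "mp3",
    speed: float = 1.0,
) -> dict:
    """Validate parameters against the documented limits and build the JSON body."""
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError("input must be 1 to 5,000 characters")
    if not voice_id and not ref_audio:
        raise ValueError("either voice_id or ref_audio is required")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")

    payload = {"input": text, "response_format": response_format, "speed": speed}
    # Assumption: if both are given, both are sent; precedence is undocumented.
    if voice_id:
        payload["voice_id"] = voice_id
    if ref_audio:
        payload["ref_audio"] = ref_audio
    return payload


payload = build_generate_payload(
    "Hello world!", voice_id="82c99ee6-f932-423f-a4a3-d403c8914b8d", speed=0.9
)
```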

Response

200 OK

{
  "code": 0,
  "data": {
    "audio_data": "UklGRi4AAABXQVZFZm10...",
    "format": "mp3",
    "generation_id": "f47ac10b-58cc-4372-...",
    "audio_url": "https://cdn.voxtral.com/tts/abc123.mp3",
    "chunks": 1,
    "credits_used": 42
  }
}
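
A response of this shape might be unpacked as in the following Python sketch. It assumes that `audio_data` is standard base64 and that `code` equal to 0 signals success; the sample shows `code: 0` for a 200 OK, but the meaning of other values is not documented here. The function name `extract_audio` is hypothetical.

```python
import base64
import json


def extract_audio(response_body: str) -> bytes:
    """Return the decoded audio bytes from a generate response body."""
    doc = json.loads(response_body)
    # Assumption: a non-zero code indicates an error.
    if doc.get("code") != 0:
        raise RuntimeError(f"request failed with code {doc.get('code')}")
    return base64.b64decode(doc["data"]["audio_data"])


# Stand-in response carrying four placeholder bytes instead of real audio.
sample_body = json.dumps(
    {
        "code": 0,
        "data": {
            "audio_data": base64.b64encode(b"RIFF").decode("ascii"),
            "format": "mp3",
        },
    }
)
audio = extract_audio(sample_body)
# The bytes could then be written to disk, e.g. open("out.mp3", "wb").write(audio)
```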

Code Example

curl -X POST https://voxtral.com/api/tts/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "I am so sorry for your loss. Words cannot express how deeply I feel for you.",
    "voice_id": "82c99ee6-f932-423f-a4a3-d403c8914b8d",
    "response_format": "mp3",
    "speed": 0.9
  }'

POST /api/tts/stream

Real-time Server-Sent Events (SSE) streaming for low-latency applications. Audio chunks are returned as they are generated, which suits conversational interfaces and live playback.

Returns text/event-stream with speech.audio.delta (audio chunks) and speech.audio.done events.

Parameters

Parameter	Type	Required	Description
`input`	string	Yes	Text to convert (max 5,000 chars).
`voice_id`	string	Conditional	Preset voice UUID or saved clone voice ID.
`ref_audio`	string	Conditional	Base64-encoded reference audio for cloning.
`response_format`	string	No	Output format, default: "pcm". Also supports mp3, wav, flac, opus.
`speed`	number	No	Playback speed 0.25–4.0 (default 1.0)

Response

200 OK

event: speech.audio.delta
data: {"audio_data": "UklGRi4AAABXQVZFZm10..."}

event: speech.audio.delta
data: {"audio_data": "AAAAAAAAAAAAAAAA..."}

event: speech.audio.done
data: {}
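
The event stream above can be consumed line by line. The Python sketch below parses `event:` and `data:` fields and concatenates the decoded audio until `speech.audio.done` arrives. It assumes each `audio_data` value is an independently decodable base64 string and that concatenating the decoded chunks yields the full audio; neither is stated explicitly above. The function names are hypothetical.

```python
import base64
import json
from typing import Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from the lines of a text/event-stream body."""
    event, data = "message", []  # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates the current event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:  # flush a final event that lacks a trailing blank line
        yield event, "\n".join(data)


def collect_audio(lines: Iterable[str]) -> bytes:
    """Concatenate decoded audio chunks until speech.audio.done is seen."""
    audio = bytearray()
    for event, data in iter_sse_events(lines):
        if event == "speech.audio.delta":
            audio += base64.b64decode(json.loads(data)["audio_data"])
        elif event == "speech.audio.done":
            break
    return bytes(audio)


def _delta(chunk: bytes) -> List[str]:
    """Build the lines of one sample delta event (test data only)."""
    body = json.dumps({"audio_data": base64.b64encode(chunk).decode("ascii")})
    return ["event: speech.audio.delta", f"data: {body}", ""]


sample_lines = _delta(b"ab") + _delta(b"cd") + ["event: speech.audio.done", "data: {}", ""]
streamed = collect_audio(sample_lines)
```

For live playback, each decoded chunk would be handed to the audio output as it arrives rather than buffered in full.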

Code Example

curl -N -X POST https://voxtral.com/api/tts/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello! This audio streams in real time.",
    "voice_id": "82c99ee6-f932-423f-a4a3-d403c8914b8d"
  }'

POST /api/tts/voices/clone

Create a persistent voice from reference audio. The voice is stored on Mistral's infrastructure and can be reused across future requests with its voice_id. Maximum 20 saved voices per user.

Parameters

Parameter	Type	Required	Description
`name`	string	Yes	Display name for the voice (e.g. "My Podcast Voice").
`ref_audio`	string	Yes	Base64-encoded reference audio sample (WAV preferred, 5–30 seconds).
`language`	string	No	Primary language: en, fr, es, pt, it, nl, de, hi, ar
`gender`	string	No	Voice gender: "male" or "female"

Response

200 OK

{
  "code": 0,
  "data": {
    "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "name": "My Podcast Voice",
    "mistral_voice_id": "vx_abc123def456",
    "language": "en",
    "gender": "male",
    "created_at": "2025-01-15T10:30:00.000Z"
  }
}

Code Example

# First, base64-encode your audio file onto a single line (no wrapping):
# base64 < sample.wav | tr -d '\n' > sample.b64

curl -X POST https://voxtral.com/api/tts/voices/clone \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Podcast Voice",
    "ref_audio": "'"$(cat sample.b64)"'",
    "language": "en",
    "gender": "male"
  }'

GET /api/tts/voices/mine

Retrieve all saved voice clones for the authenticated user, with pagination support.

Query Parameters

Parameter	Type	Required	Description
`page`	number	No	Page number (default: 1)
`limit`	number	No	Results per page (default: 50)

Response

200 OK

{
  "code": 0,
  "data": {
    "voices": [
      {
        "id": "a1b2c3d4-...",
        "name": "My Podcast Voice",
        "mistral_voice_id": "vx_abc123",
        "ref_audio_url": "https://cdn.voxtral.com/...",
        "language": "en",
        "gender": "male",
        "tags": [],
        "created_at": "2025-01-15T10:30:00.000Z"
      }
    ],
    "total": 3,
    "page": 1,
    "limit": 50
  }
}
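
The `total`, `page`, and `limit` fields are enough to work out whether more pages remain. The Python sketch below assumes that `total` counts all saved voices across pages, which the sample suggests but does not state. The function name `remaining_pages` is hypothetical.

```python
import math
from typing import List


def remaining_pages(data: dict) -> List[int]:
    """Return the page numbers still to fetch after the current page."""
    last_page = math.ceil(data["total"] / data["limit"])
    return list(range(data["page"] + 1, last_page + 1))


# Values from the sample response above: everything fits on page 1.
sample = remaining_pages({"total": 3, "page": 1, "limit": 50})

# Hypothetical larger listing to show the multi-page case.
larger = remaining_pages({"total": 120, "page": 1, "limit": 50})
```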

Code Example

curl "https://voxtral.com/api/tts/voices/mine?page=1&limit=50" \
  -H "Authorization: Bearer YOUR_API_KEY"

PATCH /api/tts/voices/{id}

Update metadata for a saved voice clone. Only the fields you provide will be changed.

Parameters

Parameter	Type	Required	Description
`name`	string	No	New display name
`language`	string	No	Update language code
`gender`	string	No	Update gender
`tags`	string[]	No	Array of tags for organization
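
Because only the supplied fields are changed, a client can build the PATCH body by dropping anything left unset. The Python sketch below shows one way to do this. The function name `build_update_body` is hypothetical, and treating `None` as "leave unchanged" is a client-side convention chosen for this sketch.

```python
from typing import Any, Dict

UPDATABLE_FIELDS = {"name", "language", "gender", "tags"}


def build_update_body(**fields: Any) -> Dict[str, Any]:
    """Keep only the fields that were actually supplied (non-None)."""
    unknown = set(fields) - UPDATABLE_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    body = {key: value for key, value in fields.items() if value is not None}
    if not body:
        raise ValueError("provide at least one field to update")
    return body


update_body = build_update_body(name="Updated Voice Name", language="fr", gender=None)
```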

Response

200 OK

{
  "code": 0,
  "data": {
    "id": "a1b2c3d4-...",
    "name": "Updated Voice Name",
    "language": "fr",
    "gender": "female"
  }
}

Code Example

curl -X PATCH https://voxtral.com/api/tts/voices/a1b2c3d4-... \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Updated Voice Name", "language": "fr"}'

DELETE /api/tts/voices/{id}

Permanently delete a saved voice clone. This removes the voice from both Voxtral and the upstream Mistral provider.

Response

200 OK

{
  "code": 0,
  "data": {
    "id": "a1b2c3d4-..."
  }
}

Code Example

curl -X DELETE https://voxtral.com/api/tts/voices/a1b2c3d4-... \
  -H "Authorization: Bearer YOUR_API_KEY"

Preset Voices

17 built-in voices with emotional variants.

Jane: English · Female · 9 variants

Paul: English · Male · 8 variants

Limits & Credits

Usage limits and credit consumption

Credits

1 character = 1 credit. Credits are consumed on successful generation. Failed requests don't consume credits.
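
Given the one-credit-per-character rule, the cost of a request can be estimated before sending it. The Python sketch below counts characters with `len`, which counts Unicode code points. How the service counts characters (for example, whether whitespace or multi-byte characters are treated differently) is not specified here, so treat the result as an estimate. The function names are hypothetical.

```python
MAX_CHARS_PER_REQUEST = 5000


def estimate_credits(text: str) -> int:
    """Estimate credits for one request at 1 credit per character."""
    if len(text) > MAX_CHARS_PER_REQUEST:
        raise ValueError("text exceeds the 5,000 character request limit")
    return len(text)


def can_afford(balance: int, text: str) -> bool:
    """Check whether a credit balance covers the estimated cost."""
    return estimate_credits(text) <= balance


cost = estimate_credits("Hello world!")
```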

Formats

MP3, WAV, FLAC, Opus, PCM

Limits

• Max 5,000 chars per request
• Max 20 saved voices
• Speed range 0.25x–4.0x
• 9 languages supported

Supported Languages

• en: English
• fr: French
• es: Spanish
• pt: Portuguese
• it: Italian
• nl: Dutch
• de: German
• hi: Hindi
• ar: Arabic
