Introduction
Text to Speech (TTS) API is based on advanced generative AI models that can convert input text into realistic speech audio. It supports multiple uses:- Voice blog articles
- Generate speech audio in multiple languages
- Provide real-time audio output stream
- gpt-4o-audio-preview - The latest audio generation model from OpenAI, supports conversational audio generation
- gpt-4o-mini-tts - The preferred model for smart real-time applications, supports advanced voice control, and can control multiple voice characteristics through prompts:
- Accent
- Emotional range
- Intonation
- Impressions
- Speed of speech
- Tone
- Whispering
- tts-1-hd - High-quality TTS model
- tts-1 - Standard TTS model, balance quality and speed
Model calling method
Standard TTS model (tts-1, tts-1-hd)
Use the/v1/audio/speech endpoint, and call the client.audio.speech.create() method.
gpt-4o-mini-tts
Use the/v1/audio/speech endpoint, and support the instructions parameter for advanced voice control.
gpt-4o-audio-preview
Use the/v1/chat/completions endpoint, and set the modalities: ["text", "audio"] and audio configuration.
Request parameters
Standard TTS parameters
applicable to tts-1, tts-1-hd, gpt-4o-mini-ttsThe model ID to use. Optional values:
tts-1, tts-1-hd, gpt-4o-mini-ttsThe text to generate audio, with a maximum length of 4096 characters
The voice to use for synthesis. Optional values:
alloy, echo, fable, onyx, nova, shimmerThe audio output format. Supported formats:
mp3, opus, aac, flac, wav, pcm. Default is mp3The speed of generating audio. The range is 0.25 to 4.0. Default is 1.0. Note:
gpt-4o-mini-tts does not support this parameter, but you can control the speed through natural language descriptionVoice generation instructions (only applicable to
gpt-4o-mini-tts model), can specify voice style, tone, emotion, etc.gpt-4o-audio-preview parameters
Set to
gpt-4o-audio-previewSet to
["text", "audio"] to enable audio outputAudio configuration object, containing
voice and format fieldsChat message array, same as standard chat format