Introduction
Text to Speech (TTS) API is based on advanced generative AI models that can convert input text into realistic speech audio. It supports multiple uses:- Voice blog articles
- Generate speech audio in multiple languages
- Provide real-time audio output stream
- gpt-4o-audio-preview - The latest audio generation model from OpenAI, supports conversational audio generation
- gpt-4o-mini-tts - The preferred model for smart real-time applications, supports advanced voice control, and can control multiple voice characteristics through prompts:
- Accent
- Emotional range
- Intonation
- Impressions
- Speed of speech
- Tone
- Whispering
- tts-1-hd - High-quality TTS model
- tts-1 - Standard TTS model, balance quality and speed
Performance suggestions: For the fastest response time, it is recommended to use
wav
or pcm
as the response format. For high-quality audio, it is recommended to use tts-1-hd
; for faster generation speed, use tts-1
; for smart voice applications, it is recommended to use gpt-4o-mini-tts
.Voice preview: You can listen to different voice effects on OpenAI.fm.Model calling method
Standard TTS model (tts-1, tts-1-hd)
Use the/v1/audio/speech
endpoint, and call the client.audio.speech.create()
method.
gpt-4o-mini-tts
Use the/v1/audio/speech
endpoint, and support the instructions
parameter for advanced voice control.
gpt-4o-audio-preview
Use the/v1/chat/completions
endpoint, and set the modalities: ["text", "audio"]
and audio
configuration.
Request parameters
Standard TTS parameters
applicable to tts-1, tts-1-hd, gpt-4o-mini-ttsThe model ID to use. Optional values:
tts-1
, tts-1-hd
, gpt-4o-mini-tts
The text to generate audio, with a maximum length of 4096 characters
The voice to use for synthesis. Optional values:
alloy
, echo
, fable
, onyx
, nova
, shimmer
The audio output format. Supported formats:
mp3
, opus
, aac
, flac
, wav
, pcm
. Default is mp3
The speed of generating audio. The range is 0.25 to 4.0. Default is 1.0. Note:
gpt-4o-mini-tts
does not support this parameter, but you can control the speed through natural language descriptionVoice generation instructions (only applicable to
gpt-4o-mini-tts
model), can specify voice style, tone, emotion, etc.gpt-4o-audio-preview parameters
Set to
gpt-4o-audio-preview
Set to
["text", "audio"]
to enable audio outputAudio configuration object, containing
voice
and format
fieldsChat message array, same as standard chat format