Use AI models to convert text into natural speech, supporting multiple speech styles and output formats
wav
or pcm
as the response format. For high-quality audio, it is recommended to use tts-1-hd
; for faster generation speed, use tts-1
; for smart voice applications, it is recommended to use gpt-4o-mini-tts
.Voice preview: You can listen to different voice effects on OpenAI.fm./v1/audio/speech
endpoint, and call the client.audio.speech.create()
method.
/v1/audio/speech
endpoint, and support the instructions
parameter for advanced voice control.
/v1/chat/completions
endpoint, and set the modalities: ["text", "audio"]
and audio
configuration.
tts-1
, tts-1-hd
, gpt-4o-mini-tts
alloy
, echo
, fable
, onyx
, nova
, shimmer
mp3
, opus
, aac
, flac
, wav
, pcm
. Default is mp3
gpt-4o-mini-tts
does not support this parameter, but you can control the speed through natural language descriptiongpt-4o-mini-tts
model), can specify voice style, tone, emotion, etc.gpt-4o-audio-preview
["text", "audio"]
to enable audio outputvoice
and format
fields