AiHubMix Documentation Hub

Introduction

Text to Speech (TTS) API is based on advanced generative AI models that can convert input text into realistic speech audio. It supports multiple uses:

Voice blog articles
Generate speech audio in multiple languages
Provide real-time audio output stream

Available model list:

gpt-4o-audio-preview - The latest audio generation model from OpenAI, supports conversational audio generation
gpt-4o-mini-tts - The preferred model for smart real-time applications, supports advanced voice control, and can control multiple voice characteristics through prompts:
- Accent
- Emotional range
- Intonation
- Impressions
- Speed of speech
- Tone
- Whispering
tts-1-hd - High-quality TTS model
tts-1 - Standard TTS model, balance quality and speed

Performance suggestions: For the fastest response time, it is recommended to use wav or pcm as the response format. For high-quality audio, it is recommended to use tts-1-hd; for faster generation speed, use tts-1; for smart voice applications, it is recommended to use gpt-4o-mini-tts.Voice preview: You can listen to different voice effects on OpenAI.fm.

Model calling method

Standard TTS model (tts-1, tts-1-hd)

Use the /v1/audio/speech endpoint, and call the client.audio.speech.create() method.

gpt-4o-mini-tts

Use the /v1/audio/speech endpoint, and support the instructions parameter for advanced voice control.

gpt-4o-audio-preview

Use the /v1/chat/completions endpoint, and set the modalities: ["text", "audio"] and audio configuration.

Request parameters

Standard TTS parameters

applicable to tts-1, tts-1-hd, gpt-4o-mini-tts

model

string

required

The model ID to use. Optional values: tts-1, tts-1-hd, gpt-4o-mini-tts

input

string

required

The text to generate audio, with a maximum length of 4096 characters

voice

string

required

The voice to use for synthesis. Optional values: alloy, echo, fable, onyx, nova, shimmer

response_format

string

The audio output format. Supported formats: mp3, opus, aac, flac, wav, pcm. Default is mp3

speed

number

The speed of generating audio. The range is 0.25 to 4.0. Default is 1.0. Note: gpt-4o-mini-tts does not support this parameter, but you can control the speed through natural language description

instructions

string

Voice generation instructions (only applicable to gpt-4o-mini-tts model), can specify voice style, tone, emotion, etc.

gpt-4o-audio-preview parameters

model

string

required

Set to gpt-4o-audio-preview

modalities

array

required

Set to ["text", "audio"] to enable audio output

audio

object

required

Audio configuration object, containing voice and format fields

messages

array

required

Chat message array, same as standard chat format

Usage

curl https://aihubmix.com/v1/audio/speech \
  -H "Authorization: Bearer $AIHUBMIX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "The quick brown 🦊 jumped over the lazy dog.",
    "voice": "alloy"
  }' \
  --output speech.mp3

Basics

API

Platform Management

Terms and Privacy

Text to Speech

Introduction

Model calling method

Standard TTS model (tts-1, tts-1-hd)

gpt-4o-mini-tts

gpt-4o-audio-preview

Request parameters

Standard TTS parameters

gpt-4o-audio-preview parameters

Usage

Basics

API

Platform Management

Terms and Privacy

​Introduction

​Model calling method

​Standard TTS model (tts-1, tts-1-hd)

​gpt-4o-mini-tts

​gpt-4o-audio-preview

​Request parameters

​Standard TTS parameters

​gpt-4o-audio-preview parameters

​Usage

Introduction

Model calling method

Standard TTS model (tts-1, tts-1-hd)

gpt-4o-mini-tts

gpt-4o-audio-preview

Request parameters

Standard TTS parameters

gpt-4o-audio-preview parameters

Usage