Create a cloned voice from audio files. This endpoint creates a new voice from a generated preview sample and the reference audio used to produce it.
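Pass the audio generated by the voice preview API as `voice_file` and the original audio file as `reference_audio_file` when cloning the voice.

Authenticate each request by sending your API key in the `x-api-key` header.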
Content-Type: multipart/form-data (set automatically when the body is sent as FormData or as file uploads)
| Field | Type | Required | Description |
|---|---|---|---|
| `voice_file` | File | Yes | The preview audio file generated by the voice preview API |
| `reference_audio_file` | File | Yes | The original audio file used to generate the preview |
| `text` | string | Yes | The text used to generate the preview |
| `stability` | number | Yes | Voice stability, from 0.0 to 1.0 inclusive. Higher values produce more consistent output |
| `name` | string | Yes | Name for the cloned voice |
| `model` | string | Yes | Identifier of the model to use for voice cloning |
| `description` | string | No | Description of the voice |
| `gender` | string | No | Gender of the voice (e.g., “male”, “female”) |
| `age` | string | No | Age category of the voice (e.g., “middle”, “elderly”) |
| `languages` | string | No | Comma-separated list of language codes (e.g., “ar,en”) |
| `dialects` | string | No | Comma-separated list of dialects (e.g., “najdi,hijazi”) |
| `avatar_url` | string | No | URL of an avatar image for the voice |
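The request can be assembled with the Python standard library alone. The sketch below is a minimal illustration under stated assumptions, not an official client: `CLONE_URL` is a hypothetical placeholder (the real endpoint path is not given here), and the helpers `build_fields`, `encode_multipart`, and `build_clone_request` are made-up names for this example. It checks the required text fields, enforces the 0.0 to 1.0 range on `stability`, joins list values for `languages` and `dialects` into comma-separated strings, and sends the API key in the `x-api-key` header.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Hypothetical placeholder: replace with the actual clone endpoint URL.
CLONE_URL = "https://api.example.com/voices/clone"


def _csv(value):
    # Accept a ready-made "ar,en" string or a list such as ["ar", "en"].
    if value is None or isinstance(value, str):
        return value
    return ",".join(value)


def build_fields(text, stability, name, model, *, description=None, gender=None,
                 age=None, languages=None, dialects=None, avatar_url=None):
    """Collect the text fields of the form, checking required values and ranges."""
    for key, value in (("text", text), ("name", name), ("model", model)):
        if not value:
            raise ValueError(f"{key} is required")
    if not 0.0 <= float(stability) <= 1.0:
        raise ValueError("stability must be between 0.0 and 1.0")
    fields = {"text": text, "stability": str(stability), "name": name, "model": model}
    optional = {
        "description": description,
        "gender": gender,
        "age": age,
        "languages": _csv(languages),
        "dialects": _csv(dialects),
        "avatar_url": avatar_url,
    }
    # Omit optional fields that were not supplied.
    fields.update({k: v for k, v in optional.items() if v is not None})
    return fields


def encode_multipart(fields, files):
    """Encode text fields and {name: (filename, bytes)} files as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for key, value in fields.items():
        part = (f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{key}"\r\n\r\n'
                f"{value}\r\n")
        parts.append(part.encode("utf-8"))
    for key, (filename, data) in files.items():
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{key}"; filename="{filename}"\r\n'
                  f"Content-Type: {ctype}\r\n\r\n")
        parts.append(header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_clone_request(api_key, voice_path, reference_path, fields, url=CLONE_URL):
    """Return an unsent POST request carrying both audio files and the text fields."""
    files = {
        "voice_file": (Path(voice_path).name, Path(voice_path).read_bytes()),
        "reference_audio_file": (Path(reference_path).name,
                                 Path(reference_path).read_bytes()),
    }
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": content_type},
    )
```

Pass the returned request to `urllib.request.urlopen` to send it; error statuses raise `urllib.error.HTTPError`. Clients that send a FormData object get the `multipart/form-data` header and boundary set for them, whereas this hand-rolled encoder has to set `Content-Type` explicitly.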
- `voice_file` should be the audio file generated by the voice preview API.
- `reference_audio_file` should be the original audio file you used when generating the preview.
- `text` should match the text you used when generating the preview.
- A `voice_id` is assigned to the new voice automatically upon creation.

200 OK: Voice created successfully
Content-Type: application/json
| Field | Type | Description |
|---|---|---|
| `id` | string (UUID) | Unique identifier for the voice record |
| `voice_id` | string | Voice identifier used in API calls |
| `name` | string | Name of the cloned voice |
| `description` | string or null | Description of the voice |
| `gender` | string or null | Gender of the voice |
| `age` | string or null | Age category of the voice |
| `languages` | string[] | List of language codes supported by the voice |
| `dialect` | string[] | List of dialects supported by the voice |
| `type` | string or null | Voice type |
| `sample_url` | string | URL to the sample audio file |
| `avatar_url` | string or null | URL to the avatar image |
| `stability` | number | Voice stability value |
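On success, the JSON body can be mapped onto a small typed record. This is a sketch that assumes the body contains the fields in the table above; `ClonedVoice` and `parse_clone_response` are hypothetical names. Nullable fields default to `None`, keys not in the table are ignored, and a missing required key raises `TypeError`.

```python
import json
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ClonedVoice:
    id: str                 # UUID of the voice record
    voice_id: str           # identifier to pass to other API calls
    name: str
    languages: List[str]
    dialect: List[str]
    sample_url: str
    stability: float
    description: Optional[str] = None
    gender: Optional[str] = None
    age: Optional[str] = None
    type: Optional[str] = None
    avatar_url: Optional[str] = None


def parse_clone_response(raw: bytes) -> ClonedVoice:
    """Parse a 200 OK JSON body into ClonedVoice, ignoring fields not listed in the table."""
    data = json.loads(raw)
    known = {f.name for f in fields(ClonedVoice)}
    return ClonedVoice(**{k: v for k, v in data.items() if k in known})
```

Keeping `id` and `voice_id` as separate attributes makes it harder to pass the record UUID where the API expects the voice identifier.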
Use the returned `voice_id` in text-to-speech generation endpoints. The voice is available immediately after successful creation.