Faseeh’s Text-to-Speech API converts Arabic text into natural-sounding speech using advanced AI models. It supports multiple Arabic dialects, from Abu Dhabi to Rabat.

Overview

The Text-to-Speech API provides three methods for generating speech:
  1. Streaming Audio Output: Receive PCM16 audio chunks as they’re generated
  2. Complete Audio File: Receive a full WAV file after generation completes
  3. WebSocket Streaming: Real-time bidirectional communication for streaming audio

Endpoints

POST /text-to-speech/:model_id (Complete Audio)

Generate speech from text and receive a complete WAV file.


POST /text-to-speech/:model_id (Streaming)

Generate speech from text with streaming PCM16 audio chunks.


WSS /text-to-speech

Real-time streaming via WebSocket connection.

Connection:

wss://api.faseeh.com/api/v1/text-to-speech?x-api-key=YOUR_API_KEY

Messages

Initialize Connection:
{
  "type": "initConnection",
  "model_id": "MODEL_ID",
  "voice_id": "VOICE_ID",
  "voice_settings": {
    "stability": 0.5
  }
}
Send Text:
{
  "type": "text",
  "text": "Your Arabic text here",
  "try_trigger_generation": true
}
Close Connection:
{
  "type": "closeConnection"
}
Responses

Connection Initialized:
{
  "type": "connectionInitialized"
}
Audio Chunk:
{
  "audio": "base64_encoded_audio",
  "sampleRate": 24000,
  "isFinal": false
}
Final Chunk:
{
  "audio": "",
  "sampleRate": 24000,
  "isFinal": true
}
Error:
{
  "type": "error",
  "message": "Error description"
}
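Audio chunks arrive base64-encoded as raw PCM16. As a minimal sketch of assembling them into a playable WAV file (the message shapes follow the examples above; pcm16_chunks_to_wav is an illustrative helper, not part of the API):

```python
import base64
import io
import wave

def pcm16_chunks_to_wav(messages, channels=1):
    """Assemble base64-encoded PCM16 chunk messages into WAV bytes."""
    pcm = bytearray()
    sample_rate = 24000  # default; overridden by the sampleRate field
    for msg in messages:
        if msg.get("type") == "error":
            raise RuntimeError(msg["message"])
        sample_rate = msg.get("sampleRate", sample_rate)
        if msg.get("audio"):
            pcm.extend(base64.b64decode(msg["audio"]))
        if msg.get("isFinal"):
            break  # final chunk carries empty audio and ends the stream

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # PCM16 = 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))
    return buf.getvalue()
```

For lower latency you can instead play each decoded chunk as it arrives, rather than waiting for the final message.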

Example Usage

JavaScript/TypeScript

async function generateSpeech(modelId: string, voiceId: string, text: string) {
  const response = await fetch(`https://api.faseeh.ai/api/v1/text-to-speech/${modelId}`, {
    method: 'POST',
    headers: {
      'x-api-key': 'YOUR_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      voice_id: voiceId,
      text: text,
      stability: 0.5,
      streaming: false, // request a complete WAV file rather than PCM16 chunks
    }),
  });

  if (!response.ok) {
    throw new Error(`Text-to-speech request failed: ${response.status}`);
  }

  // Wrap the WAV bytes in an object URL suitable for an <audio> element
  const audioBlob = await response.blob();
  return URL.createObjectURL(audioBlob);
}

Python

import requests

def generate_speech(model_id, voice_id, text, api_key):
    """Request a complete WAV file for the given text and voice."""
    url = f"https://api.faseeh.ai/api/v1/text-to-speech/{model_id}"
    headers = {
        "x-api-key": api_key,
        "Content-Type": "application/json"
    }
    data = {
        "voice_id": voice_id,
        "text": text,
        "stability": 0.5,
        "streaming": False  # request a complete WAV file rather than PCM16 chunks
    }

    response = requests.post(url, json=data, headers=headers)
    response.raise_for_status()  # surface HTTP errors (e.g. insufficient balance)
    return response.content  # WAV audio bytes
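For long texts, the same endpoint can be consumed incrementally by setting streaming to true. A minimal sketch, assuming the streaming variant returns raw PCM16 chunks in the response body (build_payload and stream_speech are illustrative names, not part of the API):

```python
import requests

def build_payload(voice_id, text, stability=0.5):
    """Request body for the streaming variant of the endpoint."""
    return {
        "voice_id": voice_id,
        "text": text,
        "stability": stability,
        "streaming": True,  # ask for PCM16 chunks instead of a complete WAV file
    }

def stream_speech(model_id, voice_id, text, api_key, chunk_size=4096):
    """Yield raw PCM16 audio chunks as they are generated."""
    url = f"https://api.faseeh.ai/api/v1/text-to-speech/{model_id}"
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    with requests.post(url, json=build_payload(voice_id, text),
                       headers=headers, stream=True) as response:
        response.raise_for_status()
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                yield chunk
```

Feeding each yielded chunk straight to an audio sink lets playback begin before the full generation finishes.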

Voice Settings

  • stability: Controls the stability of the voice (0.0 to 1.0). Higher values produce more consistent output.

Best Practices

  1. Choose the right model: Select a model based on your latency and quality requirements
  2. Use streaming for long texts: Streaming reduces perceived latency for longer generations
  3. Handle errors gracefully: Always check for error responses and handle insufficient balance scenarios
  4. Cache models: Model information doesn’t change frequently; cache model lists to reduce API calls