Overview
The Text-to-Speech API provides three methods for generating speech:
- Streaming Audio Output: Receive PCM16 audio chunks as they’re generated
- Complete Audio File: Receive a full WAV file after generation completes
- WebSocket Streaming: Real-time bidirectional communication for streaming audio
Endpoints
POST /text-to-speech/:model_id (Complete Audio)
Generate speech from text and receive a complete WAV file.
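A minimal sketch of assembling a complete-audio request. The base URL, model ID, and `voice_settings` field name are assumptions for illustration; only the `/text-to-speech/:model_id` path and the stability parameter come from this document.

```python
API_BASE = "https://api.example.com"  # hypothetical base URL

def build_tts_request(model_id: str, text: str, stability: float = 0.5) -> tuple[str, dict]:
    """Assemble the endpoint URL and JSON body for a complete-audio request."""
    url = f"{API_BASE}/text-to-speech/{model_id}"
    payload = {"text": text, "voice_settings": {"stability": stability}}
    return url, payload

url, payload = build_tts_request("my-model", "Hello, world!")
# POST with your HTTP client of choice, e.g. with requests:
# resp = requests.post(url, json=payload, headers={"Authorization": f"Bearer {API_KEY}"})
# open("speech.wav", "wb").write(resp.content)  # response body is a complete WAV file
```

Separating payload construction from the network call keeps the request shape easy to test without hitting the API.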
POST /text-to-speech/:model_id (Streaming)
Generate speech from text with streaming PCM16 audio chunks.
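The streaming endpoint returns raw PCM16 audio, so each chunk must be decoded into signed 16-bit samples before playback. A minimal decoder sketch, assuming little-endian byte order (the endianness is not stated in this document):

```python
import struct

def pcm16_to_samples(chunk: bytes) -> list[int]:
    """Decode a raw little-endian PCM16 chunk into signed 16-bit samples."""
    count = len(chunk) // 2  # two bytes per sample; ignore a trailing odd byte
    return list(struct.unpack(f"<{count}h", chunk[: count * 2]))

# With a streaming HTTP client you would iterate response chunks, e.g.:
# for chunk in resp.iter_content(chunk_size=4096):
#     samples = pcm16_to_samples(chunk)
#     sink.write(samples)  # hand off to your audio sink as chunks arrive
```

Decoding per chunk lets playback begin before generation finishes, which is the point of the streaming endpoint.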
WSS /text-to-speech
Real-time streaming via WebSocket connection.
Example Usage
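A sketch of the WebSocket flow in Python using the `websockets` package. The host and the message field names (`text`, `model_id`, `flush`) are assumptions; only the `wss /text-to-speech` path comes from this document.

```python
import json

WS_URL = "wss://api.example.com/text-to-speech"  # hypothetical host

def make_ws_message(text: str, model_id: str, flush: bool = False) -> str:
    """Serialize one outbound text frame for the TTS WebSocket (field names assumed)."""
    return json.dumps({"text": text, "model_id": model_id, "flush": flush})

# With the `websockets` package the send/receive loop might look like:
# async with websockets.connect(WS_URL) as ws:
#     await ws.send(make_ws_message("Hello!", "my-model", flush=True))
#     async for frame in ws:
#         handle_audio(frame)  # PCM16 bytes arriving as they are generated
```

Because the connection is bidirectional, text can be sent incrementally while audio frames stream back on the same socket.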
Voice Settings
- stability: Controls how consistent the generated voice sounds (0.0 to 1.0). Higher values produce more consistent output.
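Since stability is documented as a 0.0 to 1.0 range, it is worth clamping user-supplied values before building the request. A small sketch (the `voice_settings` field name is an assumption):

```python
def clamp_stability(value: float) -> float:
    """Clamp a stability setting to the documented 0.0-1.0 range."""
    return max(0.0, min(1.0, value))

# Include it in the request body, e.g.:
# payload = {"text": "Hello", "voice_settings": {"stability": clamp_stability(0.7)}}
```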
Best Practices
- Choose the right model: Select a model based on your latency and quality requirements
- Use streaming for long texts: Streaming reduces perceived latency for longer generations
- Handle errors gracefully: Always check for error responses and handle insufficient balance scenarios
- Cache models: Model information doesn’t change frequently; cache model lists to reduce API calls
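The model-caching advice above can be sketched as a small TTL cache. The TTL value and the `fetch` callable are assumptions; in practice `fetch` would wrap whatever endpoint returns the model list.

```python
import time

CACHE_TTL = 300.0  # seconds; assumed refresh interval

_cache: dict[str, tuple[float, list]] = {}

def get_models(fetch) -> list:
    """Return the cached model list, calling `fetch` only after CACHE_TTL expires."""
    now = time.monotonic()
    entry = _cache.get("models")
    if entry and now - entry[0] < CACHE_TTL:
        return entry[1]  # still fresh: serve from cache
    models = fetch()  # e.g. a GET to the model-list endpoint via your HTTP client
    _cache["models"] = (now, models)
    return models
```

Repeated calls within the TTL window return the cached list without touching the network, which is exactly the reduction in API calls the best practice describes.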