Text-to-Speech (TTS)

The Text-to-Speech capability converts written text in Amharic or Afan Oromo to natural-sounding speech using the /audio endpoint. This feature enables applications to communicate audibly in Ethiopian languages.

Overview and Use Cases

The Text-to-Speech API is powered by our specialized audio models (አሌፍ-Audio-AM for Amharic and አሌፍ-Audio-OM for Afan Oromo) that have been trained to produce natural, high-quality speech with proper pronunciation and intonation. This capability enables:

Voice Interfaces: Create voice-driven applications with Ethiopian language support
Accessibility Solutions: Make content accessible to visually impaired users
Content Narration: Generate audio versions of written content (articles, books, news)
Educational Tools: Help with language learning and pronunciation
IVR Systems: Build interactive voice response systems in local languages
Voice Messaging: Enable voice communication in messaging applications

Request Format

Basic Request

Endpoint: POST /audio

{
  "text": "ሰላም፡ ይህ የአዲስ ኤአይ የድምፅ ቴክኖሎጂ ናት።",
  "language": "am"
}

Full Request Parameters

{
  "text": "Your text to convert to speech",
  "language": "am" | "om",
  "stream": false
}

Required Parameters

| Parameter | Type   | Description                                    |
| --------- | ------ | ---------------------------------------------- |
| text      | string | The text content to convert to speech          |
| language  | string | Language code: am (Amharic) or om (Afan Oromo) |

Optional Parameters

| Parameter | Type    | Default | Description                     |
| --------- | ------- | ------- | ------------------------------- |
| stream    | boolean | false   | Enable streaming audio delivery |
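
Validating a request on the client can catch malformed input before it reaches the API. The sketch below checks only the parameters documented above; buildTtsRequest is a hypothetical helper name and is not part of any official SDK.

```javascript
// Hypothetical helper (not part of an official SDK): validates the documented
// parameters and returns a request body ready for JSON.stringify.
const SUPPORTED_LANGUAGES = ["am", "om"];

function buildTtsRequest(text, language, stream = false) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new TypeError("text must be a non-empty string");
  }
  if (!SUPPORTED_LANGUAGES.includes(language)) {
    throw new RangeError('language must be "am" or "om"');
  }
  if (typeof stream !== "boolean") {
    throw new TypeError("stream must be a boolean");
  }
  return { text, language, stream };
}

// Example usage
const body = buildTtsRequest("Akkam jirta?", "om");
console.log(JSON.stringify(body));
```

Throwing early keeps invalid language codes from costing a network round trip.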

Response Format

Non-Streaming Response

{
  "audio": "data:audio/wav;base64,UklGRiQgAABXQV..."
}

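The audio field contains a data URI that wraps a base64-encoded WAV file. It can be used directly as the source of an HTML audio element. To save it to a file, strip the data:audio/wav;base64, prefix and decode the remaining base64 string.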
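
As an illustration of handling the audio field outside the browser, the sketch below splits a data URI into its MIME type and raw bytes. It assumes a Node.js runtime (for Buffer), and decodeAudioDataUri is a hypothetical helper name.

```javascript
// Hypothetical helper: splits a data URI such as the documented audio field
// into its MIME type and decoded bytes. Assumes a Node.js runtime (Buffer).
function decodeAudioDataUri(dataUri) {
  const match = /^data:([^;,]+);base64,(.+)$/s.exec(dataUri);
  if (!match) {
    throw new Error("Expected a base64 data URI");
  }
  const [, mimeType, base64] = match;
  const bytes = Buffer.from(base64, "base64");
  return { mimeType, bytes };
}

// Example usage with a tiny stand-in payload (the four bytes "RIFF" in base64)
const { mimeType, bytes } = decodeAudioDataUri("data:audio/wav;base64,UklGRg==");
console.log(mimeType, bytes.toString("ascii"));
```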

Streaming Response

With streaming enabled, each chunk is sent as a line-delimited JSON object:

{"audio_chunk": "UklGRiQ...", "index": 0}
{"audio_chunk": "AAAEkSRJ...", "index": 1}
{"audio_chunk": "QJElSZI...", "index": 2}
...

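
Because a network read can end in the middle of a line, a client should buffer partial lines before parsing them. The following is a minimal sketch of such a parser; createNdjsonParser is a hypothetical name, and the sample payloads are shortened placeholders rather than real audio.

```javascript
// Hypothetical helper: incremental parser for line-delimited JSON.
// Keeps any trailing partial line in a buffer until the rest arrives.
function createNdjsonParser() {
  let buffer = "";
  return {
    // Feed a decoded text fragment; returns the objects completed by it.
    push(fragment) {
      buffer += fragment;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // last element is an incomplete line (or "")
      return lines.filter((l) => l.trim()).map((l) => JSON.parse(l));
    },
    // Call once at end of stream to parse a final line without a newline.
    flush() {
      const rest = buffer.trim();
      buffer = "";
      return rest ? [JSON.parse(rest)] : [];
    },
  };
}

// Example: the second object arrives split across two network reads
const parser = createNdjsonParser();
const received = [
  ...parser.push('{"audio_chunk": "UklG", "index": 0}\n{"audio_ch'),
  ...parser.push('unk": "AAAE", "index": 1}\n'),
  ...parser.flush(),
];
// Sort by the index field in case chunks are handled out of order
received.sort((a, b) => a.index - b.index);
console.log(received.map((c) => c.index));
```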

Streaming vs. Non-streaming

Non-streaming (Default)

The default mode returns a complete audio file when generation is finished:

Pros: Simple to implement, single audio file to handle
Cons: Higher latency for longer texts
Best for: Short texts (under 100 characters)

Streaming

Streaming mode delivers audio chunks as they're generated:

Pros: Reduced perceived latency, immediate playback for long texts
Cons: Requires managing multiple audio chunks
Best for: Longer texts, real-time applications
Usage: Set "stream": true in the request

Unlike chat API streaming, audio streaming is stable and recommended for production use with long texts.
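
The choice between the two modes can be automated from the text length. The sketch below applies the 100-character guideline from this section; shouldStream and STREAMING_THRESHOLD are hypothetical names, and the threshold is a rule of thumb rather than a limit enforced by the API.

```javascript
// Hypothetical helper: applies the 100-character guideline from this section.
// Texts under the threshold use the default non-streaming mode.
const STREAMING_THRESHOLD = 100;

function shouldStream(text) {
  // Array.from counts Unicode code points rather than UTF-16 code units.
  return Array.from(text).length >= STREAMING_THRESHOLD;
}

// Example usage: build a request body with the mode chosen automatically
const greeting = "ሰላም፡ እንደምን አለህ?";
const requestBody = {
  text: greeting,
  language: "am",
  stream: shouldStream(greeting),
};
console.log(requestBody.stream);
```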

Supported Languages

The TTS capability supports two Ethiopian languages:

Amharic (am): Uses the አሌፍ-Audio-AM model with native Ge'ez script processing
Afan Oromo (om): Uses the አሌፍ-Audio-OM model with Latin script processing
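
Since each language expects a specific script, a client can run a quick sanity check before sending text. The sketch below is a heuristic, not an API feature: scriptMatchesLanguage and EXPECTED_SCRIPT are hypothetical names, and the check relies on Unicode script properties in regular expressions.

```javascript
// Heuristic sketch: checks whether the letters in `text` belong to the script
// the selected language expects. Not an API feature; names are hypothetical.
const EXPECTED_SCRIPT = {
  am: /\p{Script=Ethiopic}/u, // Ge'ez (Ethiopic) script
  om: /\p{Script=Latin}/u, // Latin script
};

function scriptMatchesLanguage(text, language) {
  const expected = EXPECTED_SCRIPT[language];
  if (!expected) {
    throw new RangeError(`Unsupported language code: ${language}`);
  }
  // Only letters are checked, so digits, spaces and punctuation are ignored.
  const letters = text.match(/\p{L}/gu) || [];
  return letters.length > 0 && letters.every((ch) => expected.test(ch));
}

// Example usage
console.log(scriptMatchesLanguage("ሰላም ዓለም።", "am"));
console.log(scriptMatchesLanguage("Hello", "am"));
```

A mismatch may simply mean the text mixes scripts on purpose (for example, a brand name), so treating the result as a warning is likely safer than rejecting the request.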

Audio Format and Quality

Output format is base64-encoded WAV audio
Audio sample rate: 24 kHz (high quality)
Audio characteristics:
- Natural prosody and intonation
- Clear pronunciation of Ethiopian language phonemes
- Balanced tone suitable for various applications
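
To confirm the format of a decoded file, the WAV header can be inspected directly. The sketch below reads the format fields from a standard RIFF/WAVE header in a Node.js Buffer. readWavFormat is a hypothetical helper, and the mono, 16-bit values in the example header are assumptions for illustration; this section states only the 24 kHz sample rate.

```javascript
// Sketch: reads format fields from a WAV (RIFF) header held in a Node.js Buffer.
function readWavFormat(buffer) {
  if (
    buffer.length < 12 ||
    buffer.toString("ascii", 0, 4) !== "RIFF" ||
    buffer.toString("ascii", 8, 12) !== "WAVE"
  ) {
    throw new Error("Not a RIFF/WAVE file");
  }
  // Walk the chunk list until the "fmt " chunk is found.
  let offset = 12;
  while (offset + 8 <= buffer.length) {
    const id = buffer.toString("ascii", offset, offset + 4);
    const size = buffer.readUInt32LE(offset + 4);
    if (id === "fmt ") {
      if (offset + 24 > buffer.length) break; // need 16 bytes of fmt body
      return {
        channels: buffer.readUInt16LE(offset + 10),
        sampleRate: buffer.readUInt32LE(offset + 12),
        bitsPerSample: buffer.readUInt16LE(offset + 22),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to even length
  }
  throw new Error('No "fmt " chunk found');
}

// Example: a hand-built 44-byte header (24 kHz; mono and 16-bit are assumed)
const header = Buffer.alloc(44);
header.write("RIFF", 0, "ascii");
header.writeUInt32LE(36, 4); // file size minus 8
header.write("WAVE", 8, "ascii");
header.write("fmt ", 12, "ascii");
header.writeUInt32LE(16, 16); // fmt chunk size
header.writeUInt16LE(1, 20); // audio format: PCM
header.writeUInt16LE(1, 22); // channels
header.writeUInt32LE(24000, 24); // sample rate
header.writeUInt32LE(48000, 28); // byte rate
header.writeUInt16LE(2, 32); // block align
header.writeUInt16LE(16, 34); // bits per sample
header.write("data", 36, "ascii");
header.writeUInt32LE(0, 40); // empty data chunk
console.log(readWavFormat(header));
```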

Length Limitations and Chunking

For long texts, the API automatically handles chunking:

Maximum recommended input length: 2000 characters per request
For longer content, consider:
1. Using streaming mode for better user experience
2. Splitting text into natural segments (paragraphs, sentences)
3. Making multiple requests and concatenating the results
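
The splitting step can be sketched as follows. This is one possible approach under stated assumptions: splitIntoSegments is a hypothetical helper, sentence boundaries are detected only by the Ethiopic full stop (።) and the Latin ".", "!" and "?" marks, and lengths are measured in UTF-16 code units.

```javascript
// Hypothetical helper: splits text into segments of at most maxLength
// characters (UTF-16 code units), preferring to break after sentences.
function splitIntoSegments(text, maxLength = 2000) {
  // A sentence is text up to and including its end marks and trailing
  // whitespace; the second alternative catches a final unterminated sentence.
  const sentences = text.match(/[^።.!?]*[።.!?]+\s*|[^።.!?]+$/gu) || [];
  const segments = [];
  let current = "";
  for (const sentence of sentences) {
    // Close the current segment if adding this sentence would overflow it.
    if (current && current.length + sentence.length > maxLength) {
      segments.push(current.trim());
      current = "";
    }
    current += sentence;
    // A single sentence longer than maxLength is hard-split.
    while (current.length > maxLength) {
      segments.push(current.slice(0, maxLength));
      current = current.slice(maxLength);
    }
  }
  if (current.trim()) segments.push(current.trim());
  return segments;
}

// Example usage with a small limit to make the splitting visible
console.log(splitIntoSegments("ሰላም። እንደምን አለህ? ደህና ነኝ።", 12));
```

Each segment can then be sent as its own request, with the resulting audio played or concatenated in order.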

Code Examples

Basic Audio Playback (JavaScript)

async function getAndPlayAudio() {
  try {
    const response = await fetch(
      "https://api.addisassistant.com/api/v1/audio",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "X-API-Key": "YOUR_API_KEY",
        },
        body: JSON.stringify({
          text: "ሰላም፡ እንደምን አለህ?",
          language: "am",
        }),
      },
    );

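    if (!response.ok) {
      throw new Error(`TTS request failed with status ${response.status}`);
    }

    const data = await response.json();

    // Play the audio (browsers may block playback not triggered by a user gesture)
    const audio = new Audio(data.audio);
    await audio.play();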

    // Or save it (in browser environments that support it)
    // const link = document.createElement('a');
    // link.href = data.audio;
    // link.download = 'addis-ai-speech.wav';
    // link.click();
  } catch (error) {
    console.error("Error fetching audio:", error);
  }
}

Streaming Audio Playback (JavaScript)

async function streamAudio() {
  const audioQueue = [];
  let isPlaying = false;

  try {
    const response = await fetch(
      "https://api.addisassistant.com/api/v1/audio",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "X-API-Key": "YOUR_API_KEY",
        },
        body: JSON.stringify({
          text: "ይህ ረጅም ጽሑፍ ነው። የአዲስ ኤአይ የድምጽ ቴክኖሎጂ ትልልቅ ጽሑፎችን በቀላሉ አንብቦ ድምጽ ሊያወጣ ይችላል።",
          language: "am",
          stream: true,
        }),
      },
    );

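    if (!response.ok) {
      throw new Error(`TTS request failed with status ${response.status}`);
    }

    // Process the streaming response
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = ""; // holds a partial line until the rest of it arrives

    // Play the next chunk in the queue.
    // Assumes each chunk is independently playable WAV audio.
    function playNext() {
      if (audioQueue.length === 0) {
        isPlaying = false;
        return;
      }

      isPlaying = true;
      const nextChunk = audioQueue.shift();
      const audio = new Audio("data:audio/wav;base64," + nextChunk);

      audio.onended = playNext;
      audio.play().catch((e) => {
        console.error("Error playing chunk:", e);
        playNext();
      });
    }

    // Parse one line of the response and queue its audio
    function handleLine(line) {
      if (!line.trim()) return;
      try {
        const data = JSON.parse(line);
        audioQueue.push(data.audio_chunk);

        // Start playing if not already
        if (!isPlaying) {
          playNext();
        }
      } catch (e) {
        console.error("Error parsing chunk:", e);
      }
    }

    // Read chunks from the stream
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // A network chunk can end mid-line, so only complete lines are parsed
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split("\n");
      buffer = lines.pop();
      lines.forEach(handleLine);
    }

    // Parse any final line that was not newline-terminated
    handleLine(buffer + decoder.decode());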
  } catch (error) {
    console.error("Error streaming audio:", error);
  }
}

Server-side Processing (Node.js)

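const fs = require("fs");
// Requires Node.js 18 or later, where fetch is available globally.
// On older versions, install node-fetch@2 and add: const fetch = require("node-fetch");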

async function textToSpeechAndSave(text, language, outputPath) {
  try {
    const response = await fetch(
      "https://api.addisassistant.com/api/v1/audio",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "X-API-Key": "YOUR_API_KEY",
        },
        body: JSON.stringify({
          text,
          language,
        }),
      },
    );

    if (!response.ok) {
      throw new Error(`TTS request failed with status ${response.status}`);
    }

    const data = await response.json();

    // Extract base64 data (remove the data:audio/wav;base64, prefix)
    const base64Data = data.audio.split(",")[1];

    // Save to file
    fs.writeFileSync(outputPath, Buffer.from(base64Data, "base64"));
    console.log(`Audio saved to ${outputPath}`);
  } catch (error) {
    console.error("Error generating or saving audio:", error);
  }
}

// Example usage
textToSpeechAndSave("ሰላም ዓለም", "am", "hello_world_amharic.wav");

Best Practices

Content Optimization
- Use natural language and punctuation
- Break long content into meaningful segments
- Avoid unusual abbreviations or symbols

Performance Considerations
- Use streaming for texts longer than 100 characters
- Implement audio caching for frequently used phrases
- Pre-generate audio for static content

Error Handling
- Implement fallback mechanisms for TTS failures
- Handle network interruptions gracefully in streaming mode
- Consider retry logic with exponential backoff

User Experience
- Provide visual feedback while audio is loading
- Allow users to pause, stop, or restart audio playback
- Consider offering speed control for playback

Language-Specific Considerations
- For Amharic: Use proper Ge'ez script with correct word breaks
- For Afan Oromo: Use standard Latin script orthography
- Test pronunciation of specialized terminology
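
Two of these practices, retrying with exponential backoff and caching frequently used phrases, can be combined in a thin wrapper. The following is a minimal sketch rather than a definitive implementation: withRetry, createCachedTts, and their default values are hypothetical, and synthesize stands in for any function that performs the POST /audio request and returns a promise.

```javascript
// Sketch of two practices from the list above: retry with exponential backoff
// and an in-memory cache. Helper names and defaults are hypothetical.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fn until it resolves, waiting baseDelayMs, then 2x, 4x, ... between
// attempts. Rethrows the last error after `retries` failed retries.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= retries) throw error;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Wraps a synthesize(text, language) function so repeated phrases are
// requested once. Failed requests are evicted so they can be retried later.
function createCachedTts(synthesize, retryOptions) {
  const cache = new Map();
  return function cachedSynthesize(text, language) {
    const key = `${language}:${text}`;
    if (!cache.has(key)) {
      const pending = withRetry(() => synthesize(text, language), retryOptions);
      pending.catch(() => cache.delete(key));
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}
```

Caching the promise rather than the resolved value means concurrent requests for the same phrase share a single API call. A production version would probably also bound the cache size and retry only on errors that are likely to be transient.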

Use Case Examples

Reading Assistant: Application that reads text content aloud for visually impaired users
Language Learning: Tool to demonstrate proper pronunciation of Amharic or Afan Oromo words
News Reader: Service that converts news articles to audio format
Navigation System: Voice directions in local languages
Voice Messaging: Platform enabling users to convert text messages to voice messages
