Text-to-Speech (TTS)

The Text-to-Speech capability converts written text in Amharic or Afan Oromo to natural-sounding speech using the /audio endpoint. This feature enables applications to communicate audibly in Ethiopian languages.

Overview and Use Cases

The Text-to-Speech API is powered by our specialized audio models (አሌፍ-Audio-AM for Amharic and አሌፍ-Audio-OM for Afan Oromo) that have been trained to produce natural, high-quality speech with proper pronunciation and intonation. This capability enables:
  • Voice Interfaces: Create voice-driven applications with Ethiopian language support
  • Accessibility Solutions: Make content accessible to visually impaired users
  • Content Narration: Generate audio versions of written content (articles, books, news)
  • Educational Tools: Help with language learning and pronunciation
  • IVR Systems: Build interactive voice response systems in local languages
  • Voice Messaging: Enable voice communication in messaging applications

Request Format

Basic Request

Endpoint: POST /audio
{
  "text": "ሰላም፡ ይህ የአዲስ ኤአይ የድምፅ ቴክኖሎጂ ናት።",
  "language": "am"
}

Full Request Parameters

{
  "text": "Your text to convert to speech",
  "language": "am" | "om",
  "stream": false
}

Required Parameters

| Parameter | Type   | Description                                    |
| --------- | ------ | ---------------------------------------------- |
| text      | string | The text content to convert to speech          |
| language  | string | Language code: am (Amharic) or om (Afan Oromo) |

Optional Parameters

| Parameter | Type    | Default | Description                     |
| --------- | ------- | ------- | ------------------------------- |
| stream    | boolean | false   | Enable streaming audio delivery |
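
A client may want to check these parameters before sending a request. The following is a minimal sketch under that assumption; the helper name `buildTtsRequest` is hypothetical and not part of the API, and the checks only mirror the parameters listed above.

```javascript
// Hypothetical helper (not part of the API): validates the documented
// parameters and returns the JSON body for POST /audio.
const SUPPORTED_LANGUAGES = ["am", "om"];

function buildTtsRequest(text, language, stream = false) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new TypeError("text must be a non-empty string");
  }
  if (!SUPPORTED_LANGUAGES.includes(language)) {
    throw new RangeError('language must be "am" or "om"');
  }
  if (typeof stream !== "boolean") {
    throw new TypeError("stream must be a boolean");
  }
  return JSON.stringify({ text, language, stream });
}
```

The returned string can be passed directly as the `body` of a `fetch` call.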

Response Format

Non-Streaming Response

{
  "audio": "data:audio/wav;base64,UklGRiQgAABXQV..."
}

The audio field contains a base64-encoded WAV file that can be directly used in HTML audio elements or saved to a file.
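
When the raw bytes are needed rather than the data URI itself, the value can be unpacked by hand. This is a minimal sketch; `decodeAudioDataUri` is a hypothetical helper, and it assumes the field always has the `data:<mime>;base64,<payload>` shape shown above.

```javascript
// Hypothetical helper: splits a data URI such as
// "data:audio/wav;base64,UklGR..." into its MIME type and raw bytes.
function decodeAudioDataUri(dataUri) {
  const match = /^data:([^;,]+);base64,([A-Za-z0-9+/=]*)$/.exec(dataUri);
  if (!match) {
    throw new Error("Expected a base64 data URI");
  }
  const [, mimeType, base64] = match;
  // atob is available in browsers and in Node.js 16+.
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return { mimeType, bytes };
}
```

In a browser, the result can be wrapped as `new Blob([bytes], { type: mimeType })` and played through `URL.createObjectURL`, which avoids keeping a long data URI in the DOM.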

Streaming Response

With streaming enabled, each chunk is sent as a line-delimited JSON object:
{"audio_chunk": "UklGRiQ...", "index": 0}
{"audio_chunk": "AAAEkSRJ...", "index": 1}
{"audio_chunk": "QJElSZI...", "index": 2}
...
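
Network reads do not necessarily end on a line boundary, so a client usually has to buffer partial lines before parsing. The sketch below shows one way to do that; `createNdjsonParser` is a hypothetical helper, and it assumes each JSON object is terminated by a newline character.

```javascript
// Hypothetical helper: accumulates decoded text and returns one parsed
// object per complete line, keeping any partial line for the next call.
function createNdjsonParser() {
  let buffer = "";
  return {
    push(textChunk) {
      buffer += textChunk;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // last element is an incomplete line (or "")
      return lines
        .filter((line) => line.trim() !== "")
        .map((line) => JSON.parse(line));
    },
    // Call once the stream has ended to parse a final unterminated line.
    flush() {
      const rest = buffer.trim();
      buffer = "";
      return rest ? [JSON.parse(rest)] : [];
    },
  };
}
```

Each returned object carries the `audio_chunk` and `index` fields shown above, so chunks can be ordered by `index` before playback if needed.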

Streaming vs. Non-streaming

Non-streaming (Default)

The default mode returns a complete audio file when generation is finished:
  • Pros: Simple to implement, single audio file to handle
  • Cons: Higher latency for longer texts
  • Best for: Short texts (under 100 characters)

Streaming

Streaming mode delivers audio chunks as they're generated:
  • Pros: Reduced perceived latency, immediate playback for long texts
  • Cons: Requires managing multiple audio chunks
  • Best for: Longer texts, real-time applications
  • Usage: Set "stream": true in the request

Unlike streaming in the chat API, audio streaming is stable and recommended for production use with long texts.
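
The guidance above can be turned into a simple rule for picking a mode. This is only a sketch: `shouldStream` is a hypothetical helper, how the API counts characters is not documented here, and treating a text of exactly 100 characters as "long" is a judgment call.

```javascript
// Threshold taken from the guidance above. Counting Unicode code points
// (rather than bytes) is an assumption.
const STREAMING_THRESHOLD = 100;

function shouldStream(text) {
  return Array.from(text).length >= STREAMING_THRESHOLD;
}
```

The result can be passed as the `stream` field of the request body.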

Supported Languages

The TTS capability supports two Ethiopian languages:
  • Amharic (am): Uses the አሌፍ-Audio-AM model with native Ge'ez script processing
  • Afan Oromo (om): Uses the አሌፍ-Audio-OM model with Latin script processing
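
Because each language is tied to one script, a client can run a cheap sanity check before sending text. The sketch below is a heuristic only; `matchesExpectedScript` is a hypothetical helper, and it checks the script rather than the language (Ge'ez script is also used for other languages).

```javascript
// Hypothetical pre-flight check: does the text contain at least one
// letter from the script that the chosen language code implies?
function matchesExpectedScript(text, language) {
  const scripts = {
    am: /\p{Script=Ethiopic}/u, // Ge'ez script
    om: /\p{Script=Latin}/u,
  };
  const pattern = scripts[language];
  if (!pattern) {
    throw new RangeError(`Unsupported language: ${language}`);
  }
  return pattern.test(text);
}
```

A `false` result most often means the wrong language code was paired with the text.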

Audio Format and Quality

  • Output format is base64-encoded WAV audio
  • Audio sample rate: 24 kHz (high quality)
  • Audio characteristics:
    • Natural prosody and intonation
    • Clear pronunciation of Ethiopian language phonemes
    • Balanced tone suitable for various applications
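
To confirm the format of a decoded response, the WAV header can be inspected directly. The sketch below assumes the API returns a standard RIFF/WAVE file with a PCM-style "fmt " chunk, which this document does not state explicitly; `readWavFormat` is a hypothetical helper.

```javascript
// Hypothetical helper: reads channel count, sample rate, and bit depth
// from a RIFF/WAVE header by locating the "fmt " chunk.
function readWavFormat(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (o) =>
    String.fromCharCode(bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]);

  if (bytes.length < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }
  // Walk the chunk list: 4-byte id, 4-byte little-endian size, payload.
  let offset = 12;
  while (offset + 8 <= bytes.length) {
    const size = view.getUint32(offset + 4, true);
    if (tag(offset) === "fmt ") {
      if (offset + 24 > bytes.length) break; // truncated fmt chunk
      return {
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to even length
  }
  throw new Error('No complete "fmt " chunk found');
}
```

For audio matching the description above, `sampleRate` would be expected to read 24000.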

Length Limitations and Chunking

For long texts, the API automatically handles chunking:
  • Maximum recommended input length: 2000 characters per request
  • For longer content, consider:
    1. Using streaming mode for better user experience
    2. Splitting text into natural segments (paragraphs, sentences)
    3. Making multiple requests and concatenating the results
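
Splitting into natural segments can be sketched as follows. `splitForTts` is a hypothetical helper; the 2000-character default comes from the recommendation above, while the set of sentence-ending characters (። . ! ?) is an assumption about typical input.

```javascript
// Hypothetical helper: groups sentences into segments no longer than
// maxChars, splitting after Ethiopic (።) and Latin (. ! ?) sentence enders.
const MAX_CHARS = 2000;

function splitForTts(text, maxChars = MAX_CHARS) {
  // Either a sentence with its terminators and trailing whitespace,
  // or trailing text that has no terminator.
  const sentences = text.match(/[^።.!?]*[።.!?]+\s*|[^።.!?]+$/g) || [];
  const segments = [];
  let current = "";
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > maxChars) {
      segments.push(current.trim());
      current = "";
    }
    // A single sentence longer than the limit is hard-split.
    let rest = sentence;
    while (rest.length > maxChars) {
      segments.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current += rest;
  }
  if (current.trim()) segments.push(current.trim());
  return segments;
}
```

Each segment can then be sent as its own request. Note that concatenating the results is not a plain byte append, since every response is a complete WAV file with its own header.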

Code Examples

Basic Audio Playback (JavaScript)

async function getAndPlayAudio() {
  try {
    const response = await fetch(
      "https://api.addisassistant.com/api/v1/audio",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "X-API-Key": "YOUR_API_KEY",
        },
        body: JSON.stringify({
          text: "ሰላም፡ እንደምን አለህ?",
          language: "am",
        }),
      },
    );
    // fetch only rejects on network errors, so check the HTTP status too
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    const data = await response.json();

    // Play the audio (data.audio is a data URI, usable as an audio source)
    const audio = new Audio(data.audio);
    await audio.play();

    // Or save it (in browser environments that support it)
    // const link = document.createElement('a');
    // link.href = data.audio;
    // link.download = 'addis-ai-speech.wav';
    // link.click();
  } catch (error) {
    console.error("Error fetching audio:", error);
  }
}

Streaming Audio Playback (JavaScript)

async function streamAudio() {
  const audioQueue = [];
  let isPlaying = false;

  // Play the next audio chunk in the queue
  function playNext() {
    if (audioQueue.length === 0) {
      isPlaying = false;
      return;
    }
    isPlaying = true;
    const nextChunk = audioQueue.shift();
    const audio = new Audio("data:audio/wav;base64," + nextChunk);
    audio.onended = playNext;
    audio.play();
  }

  // Parse one line of the response and queue its audio
  function handleLine(line) {
    if (!line.trim()) return;
    try {
      const data = JSON.parse(line);
      audioQueue.push(data.audio_chunk);
      // Start playing if not already
      if (!isPlaying) {
        playNext();
      }
    } catch (e) {
      console.error("Error parsing chunk:", e);
    }
  }

  try {
    const response = await fetch(
      "https://api.addisassistant.com/api/v1/audio",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "X-API-Key": "YOUR_API_KEY",
        },
        body: JSON.stringify({
          text: "ይህ ረጅም ጽሑፍ ነው። የአዲስ ኤአይ የድምጽ ቴክኖሎጂ ትልልቅ ጽሑፎችን በቀላሉ አንብቦ ድምጽ ሊያወጣ ይችላል።",
          language: "am",
          stream: true,
        }),
      },
    );
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }

    // Process the streaming response
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";

    // Read chunks from the stream
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters split across reads intact
      buffer += decoder.decode(value, { stream: true });
      // A read may end mid-line, so keep the incomplete tail for the next pass
      const lines = buffer.split("\n");
      buffer = lines.pop();
      lines.forEach(handleLine);
    }
    // Handle a final line that has no trailing newline
    handleLine(buffer + decoder.decode());
  } catch (error) {
    console.error("Error streaming audio:", error);
  }
}

Server-side Processing (Node.js)

const fs = require("fs");
// require() works with node-fetch v2; on Node.js 18+ the built-in fetch can be used instead
const fetch = require("node-fetch");

async function textToSpeechAndSave(text, language, outputPath) {
  try {
    const response = await fetch(
      "https://api.addisassistant.com/api/v1/audio",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "X-API-Key": "YOUR_API_KEY",
        },
        body: JSON.stringify({
          text,
          language,
        }),
      },
    );
    // fetch only rejects on network errors, so check the HTTP status too
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    const data = await response.json();

    // Extract base64 data (remove the data:audio/wav;base64, prefix)
    const base64Data = data.audio.split(",")[1];

    // Save to file
    fs.writeFileSync(outputPath, Buffer.from(base64Data, "base64"));
    console.log(`Audio saved to ${outputPath}`);
  } catch (error) {
    console.error("Error generating or saving audio:", error);
  }
}

// Example usage
textToSpeechAndSave("ሰላም ዓለም", "am", "hello_world_amharic.wav");

Best Practices

  1. Content Optimization
    • Use natural language and punctuation
    • Break long content into meaningful segments
    • Avoid unusual abbreviations or symbols
  2. Performance Considerations
    • Use streaming for texts longer than 100 characters
    • Implement audio caching for frequently used phrases
    • Pre-generate audio for static content
  3. Error Handling
    • Implement fallback mechanisms for TTS failures
    • Handle network interruptions gracefully in streaming mode
    • Consider retry logic with exponential backoff
  4. User Experience
    • Provide visual feedback while audio is loading
    • Allow users to pause, stop, or restart audio playback
    • Consider offering speed control for playback
  5. Language-Specific Considerations
    • For Amharic: Use proper Ge'ez script with correct word breaks
    • For Afan Oromo: Use standard Latin script orthography
    • Test pronunciation of specialized terminology
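
Two of the practices above, retry with exponential backoff and caching of frequently used phrases, can be sketched as small wrappers around whatever function performs the request. All names here are hypothetical, and the delay values (500 ms base, 8 s cap, 3 retries) are illustrative defaults rather than API recommendations.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Calls `fn` until it resolves or `maxRetries` retries have been used.
async function withRetry(fn, { maxRetries = 3, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries) throw error;
      await sleep(backoffDelay(attempt));
    }
  }
}

// In-memory cache keyed by language and text. It stores the pending
// promise so concurrent requests for the same phrase share one call.
function createTtsCache(synthesize) {
  const cache = new Map();
  return (text, language) => {
    const key = `${language}\u0000${text}`;
    if (!cache.has(key)) {
      const pending = synthesize(text, language).catch((error) => {
        cache.delete(key); // do not cache failures
        throw error;
      });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}
```

The two compose naturally: `createTtsCache((text, language) => withRetry(() => requestAudio(text, language)))`, where `requestAudio` stands for any function that calls the /audio endpoint and returns a promise.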

Use Case Examples

  • Reading Assistant: Application that reads text content aloud for visually impaired users
  • Language Learning: Tool to demonstrate proper pronunciation of Amharic or Afan Oromo words
  • News Reader: Service that converts news articles to audio format
  • Navigation System: Voice directions in local languages
  • Voice Messaging: Platform enabling users to convert text messages to voice messages