# Voice Interface Implementation

This tutorial demonstrates how to implement a basic voice interface using Addis AI's Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities.

## Overview

Voice interfaces allow users to interact with your applications using spoken language instead of traditional graphical user interfaces. With Addis AI, you can easily add voice capabilities to your applications in Amharic and Afan Oromo.

## Prerequisites

  • An Addis AI API key (see the Authentication guide for details)
  • Basic knowledge of JavaScript/TypeScript
  • A web server or application with microphone access (a quick feature check is sketched below)
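
Browser support for audio capture varies, so it helps to confirm that the MediaDevices API is available before wiring up the recorder. The snippet below is a minimal sketch using only standard browser APIs.

```javascript
// Quick feature check: getUserMedia is only available in secure contexts
// (HTTPS or localhost) and in browsers that implement the MediaDevices API.
if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
  console.warn("Audio recording is not supported in this browser or context.");
}
```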

## Setting Up Your Project

### 1. Install Dependencies

First, set up a basic web project and install the necessary dependencies. (The code in this tutorial uses only built-in browser APIs and `fetch`, so these packages are optional for the steps that follow.)

```bash
# Create a new project directory
mkdir addis-voice-app
cd addis-voice-app

# Initialize a new project
npm init -y

# Install dependencies
npm install addis-ai-client microphone-stream
```

### 2. Create Basic HTML Structure

Create an index.html file:

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Addis AI Voice Interface</title>
    <link rel="stylesheet" href="styles.css" />
  </head>
  <body>
    <div class="container">
      <h1>Addis AI Voice Assistant</h1>
      <div class="voice-controls">
        <button id="start-recording">Start Recording</button>
        <button id="stop-recording" disabled>Stop Recording</button>
      </div>
      <div class="conversation">
        <div id="conversation-history"></div>
      </div>
    </div>
    <script src="app.js"></script>
  </body>
</html>
```

## Implementing Voice Recording and Processing

### 3. Set Up Audio Recording

Create an app.js file:

```javascript
// API key configuration
// Note: for production use, avoid exposing your API key in client-side code;
// route requests through your own backend instead.
const ADDIS_AI_API_KEY = "your_api_key_here";
const API_ENDPOINT = "https://api.addis-ai.com";

// DOM elements
const startButton = document.getElementById("start-recording");
const stopButton = document.getElementById("stop-recording");
const conversationHistory = document.getElementById("conversation-history");

// Audio recording variables
let mediaRecorder;
let audioChunks = [];
let stream;

// Event listeners
startButton.addEventListener("click", startRecording);
stopButton.addEventListener("click", stopRecording);

// Start recording function
async function startRecording() {
  try {
    stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    mediaRecorder = new MediaRecorder(stream);

    mediaRecorder.ondataavailable = (event) => {
      audioChunks.push(event.data);
    };
    mediaRecorder.onstop = processAudio;

    audioChunks = [];
    mediaRecorder.start();

    // Update UI
    startButton.disabled = true;
    stopButton.disabled = false;
    addMessageToHistory("Listening...", "system");
  } catch (error) {
    console.error("Error accessing microphone:", error);
    addMessageToHistory(
      "Error accessing microphone. Please check permissions.",
      "error",
    );
  }
}

// Stop recording function
function stopRecording() {
  if (mediaRecorder && mediaRecorder.state !== "inactive") {
    mediaRecorder.stop();
    stream.getTracks().forEach((track) => track.stop());

    // Update UI
    startButton.disabled = false;
    stopButton.disabled = true;
    addMessageToHistory("Processing audio...", "system");
  }
}

// Process recorded audio
async function processAudio() {
  // MediaRecorder produces compressed audio (usually WebM/Opus), so use the
  // recorder's actual MIME type rather than assuming WAV.
  const audioBlob = new Blob(audioChunks, {
    type: mediaRecorder.mimeType || "audio/webm",
  });

  try {
    // Call Addis AI Speech-to-Text API
    const formData = new FormData();
    formData.append("audio", audioBlob);
    formData.append("language", "am"); // Use 'am' for Amharic or 'om' for Afan Oromo

    const response = await fetch(`${API_ENDPOINT}/v1/audio/transcribe`, {
      method: "POST",
      headers: {
        "X-API-Key": ADDIS_AI_API_KEY,
      },
      body: formData,
    });

    const data = await response.json();

    if (data.text) {
      // Display transcribed text
      addMessageToHistory(data.text, "user");

      // Process the transcribed text with the chat API
      await processTextWithChat(data.text);
    } else {
      addMessageToHistory(
        "Could not transcribe audio. Please try again.",
        "error",
      );
    }
  } catch (error) {
    console.error("Error processing audio:", error);
    addMessageToHistory("Error processing audio. Please try again.", "error");
  }
}

// Process text with chat API
async function processTextWithChat(text) {
  try {
    const response = await fetch(`${API_ENDPOINT}/v1/chat/completions`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-API-Key": ADDIS_AI_API_KEY,
      },
      body: JSON.stringify({
        model: "addis-1-alef",
        messages: [{ role: "user", content: text }],
        language: "am", // Use 'am' for Amharic or 'om' for Afan Oromo
      }),
    });

    const data = await response.json();

    if (data.choices && data.choices[0] && data.choices[0].message) {
      const assistantResponse = data.choices[0].message.content;

      // Display assistant response
      addMessageToHistory(assistantResponse, "assistant");

      // Convert assistant response to speech
      await convertTextToSpeech(assistantResponse);
    } else {
      addMessageToHistory(
        "Error processing your request. Please try again.",
        "error",
      );
    }
  } catch (error) {
    console.error("Error calling chat API:", error);
    addMessageToHistory(
      "Error processing your request. Please try again.",
      "error",
    );
  }
}

// Convert text to speech
async function convertTextToSpeech(text) {
  try {
    const response = await fetch(`${API_ENDPOINT}/v1/audio/speech`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-API-Key": ADDIS_AI_API_KEY,
      },
      body: JSON.stringify({
        text: text,
        language: "am", // Use 'am' for Amharic or 'om' for Afan Oromo
        voice_id: "female-1", // Available voice options depend on the language
      }),
    });

    if (response.ok) {
      const audioBlob = await response.blob();
      playAudio(audioBlob);
    } else {
      console.error("Error generating speech");
    }
  } catch (error) {
    console.error("Error calling TTS API:", error);
  }
}

// Play audio
function playAudio(audioBlob) {
  const audioUrl = URL.createObjectURL(audioBlob);
  const audio = new Audio(audioUrl);
  audio.play();
}

// Add message to conversation history
function addMessageToHistory(message, role) {
  const messageDiv = document.createElement("div");
  messageDiv.className = `message ${role}`;
  messageDiv.textContent = message;
  conversationHistory.appendChild(messageDiv);
  conversationHistory.scrollTop = conversationHistory.scrollHeight;
}
```
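
The example above hard-codes the language as Amharic ("am"). To let users switch to Afan Oromo ("om") at runtime, one option is a small language selector; the snippet below is a minimal sketch that assumes you add a `<select id="language">` element with "am" and "om" options to index.html and replace the hard-coded "am" values in app.js with `getSelectedLanguage()`.

```javascript
// Hypothetical helper: read the chosen language from a <select id="language">
// element (assumed to be added to index.html) and fall back to Amharic.
function getSelectedLanguage() {
  const selector = document.getElementById("language");
  return selector ? selector.value : "am";
}
```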

### 4. Add Basic Styling

Create a styles.css file:

```css
body {
  font-family: "Segoe UI", Tahoma, Geneva, Verdana, sans-serif;
  margin: 0;
  padding: 0;
  background-color: #f5f5f5;
}

.container {
  max-width: 800px;
  margin: 0 auto;
  padding: 20px;
}

h1 {
  text-align: center;
  color: #333;
}

.voice-controls {
  display: flex;
  justify-content: center;
  margin-bottom: 20px;
}

button {
  padding: 12px 24px;
  margin: 0 10px;
  border: none;
  border-radius: 4px;
  cursor: pointer;
  font-size: 16px;
  transition: background-color 0.3s;
}

#start-recording {
  background-color: #4caf50;
  color: white;
}

#stop-recording {
  background-color: #f44336;
  color: white;
}

button:disabled {
  background-color: #cccccc;
  cursor: not-allowed;
}

.conversation {
  background-color: white;
  border-radius: 8px;
  padding: 20px;
  box-shadow: 0 2px 5px rgba(0, 0, 0, 0.1);
  height: 400px;
  overflow-y: auto;
}

.message {
  margin-bottom: 12px;
  padding: 12px;
  border-radius: 6px;
  max-width: 80%;
}

.user {
  background-color: #e3f2fd;
  margin-left: auto;
}

.assistant {
  background-color: #f1f1f1;
  margin-right: auto;
}

.system,
.error {
  background-color: #f8f9fa;
  font-style: italic;
  color: #6c757d;
  margin: 0 auto;
  text-align: center;
}

.error {
  color: #dc3545;
}
```

## Testing the Voice Interface

To test your voice interface, you need to run it on a web server. You can use a simple development server:

```bash
# If you have Node.js installed
npx serve

# Or with Python 3
python -m http.server
```

Open your browser and navigate to the address the server prints (typically http://localhost:5000 or http://localhost:8000), then grant microphone permissions when prompted. Note that browsers only allow microphone access in a secure context, so serve the page from localhost or over HTTPS.

## Further Enhancements

This basic implementation can be enhanced in several ways:
  1. Add conversation history storage (a minimal localStorage sketch follows this list)
  2. Implement voice activity detection for automatic recording
  3. Add voice preferences (gender, tone, speed)
  4. Implement a more sophisticated UI with visual feedback
  5. Add authentication and user profiles
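
For the first item, a lightweight starting point is the browser's localStorage. The sketch below assumes a hypothetical saveMessage helper that you would call from addMessageToHistory; the key name "addis-voice-history" is arbitrary.

```javascript
// Minimal sketch: persist each message to localStorage so the conversation
// survives a page reload. Call saveMessage(message, role) from addMessageToHistory.
const HISTORY_KEY = "addis-voice-history";

function saveMessage(message, role) {
  const history = JSON.parse(localStorage.getItem(HISTORY_KEY) || "[]");
  history.push({ message, role, timestamp: Date.now() });
  localStorage.setItem(HISTORY_KEY, JSON.stringify(history));
}

function loadHistory() {
  return JSON.parse(localStorage.getItem(HISTORY_KEY) || "[]");
}
```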

## Conclusion

You've now built a basic voice interface using Addis AI's Text-to-Speech and Speech-to-Text capabilities. This implementation demonstrates the fundamental workflow for voice applications:
  1. Record user audio
  2. Transcribe audio to text
  3. Process text with AI
  4. Convert AI response to speech
  5. Play the speech response

For more advanced implementations, see our API Reference and explore our other Examples.

Note: This tutorial uses basic browser APIs for audio recording. For production applications, consider using more robust audio libraries and implementing proper error handling.