Introduction
ElevenLabs offers one of the most capable text-to-speech APIs on the market. In 2026, advanced integrations need careful handling of audio streams, rate limits, and instant voice cloning. This tutorial walks step by step through building a robust, production-oriented TypeScript layer: project setup, API key configuration, a shared Axios client, basic speech generation, and chunk-by-chunk streaming.
Prerequisites
- Node.js 20+
- ElevenLabs account with Pro API key
- Advanced knowledge of TypeScript and of Node.js stream handling
- Axios or native fetch
Project Initialization
npm init -y
npm install axios dotenv
npm install --save-dev @types/node typescript
npx tsc --init
These commands initialize a TypeScript project and install the dependencies needed to consume the ElevenLabs API in a typed manner.
API Key Configuration
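import dotenv from 'dotenv';
dotenv.config();
// Fail fast at startup rather than sending requests with an undefined key.
const apiKey = process.env.ELEVENLABS_API_KEY;
if (!apiKey) throw new Error('ELEVENLABS_API_KEY is not set');
export const ELEVENLABS_API_KEY: string = apiKey;
export const BASE_URL = 'https://api.elevenlabs.io/v1';
Loads the API key from environment variables and stops immediately if it is missing. Always store secrets outside the source code.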
Typed API Client
import axios from 'axios';
import { ELEVENLABS_API_KEY, BASE_URL } from '../config/elevenlabs';
const client = axios.create({
  baseURL: BASE_URL,
  headers: {
    'xi-api-key': ELEVENLABS_API_KEY,
    'Content-Type': 'application/json',
  },
});
export default client;
Creates a reusable and typed Axios client for all requests to ElevenLabs.
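The client above types the transport but not the request payload. As a minimal sketch, the body sent to the text-to-speech endpoint could be described with an interface and built by a small helper. The names `VoiceSettings`, `TextToSpeechRequest`, and `buildTtsRequest` are hypothetical, and the `voice_settings` fields (`stability`, `similarity_boost`) and their 0 to 1 range reflect the public ElevenLabs API reference as understood here; check them against the current documentation before relying on them.

```typescript
// Hypothetical payload types; field names are assumed from the ElevenLabs API reference.
export interface VoiceSettings {
  stability: number;
  similarity_boost: number;
}

export interface TextToSpeechRequest {
  text: string;
  model_id: string;
  voice_settings?: VoiceSettings;
}

// Throws if a setting falls outside the assumed 0..1 range.
function assertUnitRange(name: string, value: number): void {
  if (Number.isNaN(value) || value < 0 || value > 1) {
    throw new RangeError(`${name} must be between 0 and 1`);
  }
}

// Builds a request body, rejecting empty text and out-of-range settings
// before any network call is made.
export function buildTtsRequest(
  text: string,
  modelId: string = 'eleven_multilingual_v2',
  voiceSettings?: VoiceSettings,
): TextToSpeechRequest {
  if (text.trim().length === 0) {
    throw new Error('text must not be empty');
  }
  if (voiceSettings === undefined) {
    return { text, model_id: modelId };
  }
  assertUnitRange('stability', voiceSettings.stability);
  assertUnitRange('similarity_boost', voiceSettings.similarity_boost);
  return { text, model_id: modelId, voice_settings: voiceSettings };
}
```

The returned object can be passed as the second argument of `client.post`, so that a malformed payload is caught locally instead of consuming a request.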
Basic TTS Generation
import client from '../lib/elevenlabsClient';
export async function generateSpeech(text: string, voiceId: string): Promise<Buffer> {
  const response = await client.post(
    `/text-to-speech/${voiceId}`,
    { text, model_id: 'eleven_multilingual_v2' },
    { responseType: 'arraybuffer' }
  );
  return Buffer.from(response.data);
}
Complete function that returns an MP3 audio Buffer. Uses the multilingual v2 model for optimal quality.
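Because `generateSpeech` sends the whole text in a single request, long inputs may exceed the per-request character limit. The sketch below shows one way to split text on sentence boundaries before calling it. The helper name `splitText` and the `maxChars` parameter are hypothetical; the actual limit depends on the model and plan and is not stated in this tutorial, so the caller has to supply it.

```typescript
// Splits text into chunks of at most maxChars characters, preferring sentence
// boundaries. A single sentence longer than maxChars is cut at fixed offsets.
export function splitText(text: string, maxChars: number): string[] {
  if (!Number.isInteger(maxChars) || maxChars <= 0) {
    throw new RangeError('maxChars must be a positive integer');
  }
  // Each match is either a sentence with its trailing punctuation and
  // whitespace, or a final fragment without punctuation.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    let rest = sentence;
    // Oversized sentence: flush what has accumulated, then cut it up.
    while (rest.length > maxChars) {
      if (current.length > 0) {
        chunks.push(current);
        current = '';
      }
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    // Start a new chunk when adding this sentence would exceed the limit.
    if (current.length + rest.length > maxChars) {
      chunks.push(current);
      current = '';
    }
    current += rest;
  }
  if (current.length > 0) {
    chunks.push(current);
  }
  // Whitespace at chunk edges carries no speech content.
  return chunks.map((chunk) => chunk.trim()).filter((chunk) => chunk.length > 0);
}
```

Each chunk can then be passed to `generateSpeech` in turn and the resulting buffers concatenated. Note that splitting mid-document can affect prosody at chunk boundaries, which this sketch does not address.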
Real-Time Audio Streaming
import client from '../lib/elevenlabsClient';
export async function* streamSpeech(text: string, voiceId: string): AsyncGenerator<Buffer> {
  const response = await client.post(
    `/text-to-speech/${voiceId}/stream`,
    { text, model_id: 'eleven_multilingual_v2' },
    { responseType: 'stream' }
  );
  // In Node.js, response.data is a Readable stream that can be iterated asynchronously.
  for await (const chunk of response.data) {
    yield chunk;
  }
}
Implements chunk-by-chunk streaming to reduce perceived latency in real-time applications: the caller can start playing or forwarding audio as soon as the first chunk arrives.
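To show how such a generator might be consumed, here is a minimal sketch that forwards chunks to any Node.js `Writable` (a file stream or an HTTP response, for example) while respecting backpressure. The function name `writeChunks` is hypothetical; it accepts any async iterable of byte chunks, so it is not tied to the streaming function above.

```typescript
import { once } from 'node:events';
import { Writable } from 'node:stream';

// Writes every chunk from the source to the sink, pausing whenever the sink's
// internal buffer is full, and resolves with the total number of bytes written.
export async function writeChunks(
  source: AsyncIterable<Uint8Array>,
  sink: Writable,
): Promise<number> {
  let bytes = 0;
  for await (const chunk of source) {
    bytes += chunk.length;
    // write() returns false when the sink wants the producer to slow down.
    if (!sink.write(chunk)) {
      await once(sink, 'drain');
    }
  }
  // Signal end of data and wait until everything has been flushed.
  sink.end();
  await once(sink, 'finish');
  return bytes;
}
```

A typical call would look like `await writeChunks(streamSpeech(text, voiceId), createWriteStream('out.mp3'))`, with `createWriteStream` imported from `node:fs`.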
Best Practices
- Always validate character limits before making calls
- Implement a retry system with exponential backoff
- Store voice_ids in the database rather than in code
- Use webhooks for long-running tasks
- Monitor credit consumption via the /user endpoint
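The retry recommendation above can be sketched as a small generic wrapper. The names `withRetry`, `backoffDelay`, and `RetryOptions`, as well as the default delays, are illustrative assumptions rather than anything prescribed by ElevenLabs. Random jitter, which is often added to avoid synchronized retries, is left out for brevity.

```typescript
export interface RetryOptions {
  retries: number;     // maximum number of retries after the first attempt
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay doubles with each attempt (base, 2x, 4x, ...) and is capped at maxDelayMs.
export function backoffDelay(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs the operation, retrying with exponential backoff while shouldRetry
// accepts the error and the retry budget is not exhausted.
export async function withRetry<T>(
  operation: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  options: RetryOptions = { retries: 3, baseDelayMs: 500, maxDelayMs: 8000 },
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= options.retries || !shouldRetry(error)) {
        throw error;
      }
      await sleep(backoffDelay(attempt, options.baseDelayMs, options.maxDelayMs));
    }
  }
}
```

With Axios, `shouldRetry` could be something like `(e) => axios.isAxiosError(e) && e.response?.status === 429`, so that only rate-limit responses trigger a retry while other failures surface immediately.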
Common Errors to Avoid
- Forgetting to handle 429 status code (rate limit) from ElevenLabs
- Not properly closing audio streams during streaming
- Using cloned voices without explicit consent
- Ignoring latency differences between regions
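On the point about closing audio streams, one defensive pattern is to release the source in a `finally` block, so it is closed whether the consumer finishes, stops early, or throws. The sketch below assumes a Node.js `Readable` such as the one Axios returns with `responseType: 'stream'`; the helper name `readAtLeast` and the byte threshold are hypothetical and only serve to demonstrate an early exit.

```typescript
import { Readable } from 'node:stream';

// Reads chunks until at least minBytes have been collected (or the stream
// ends), then destroys the source so the underlying connection is released.
export async function readAtLeast(source: Readable, minBytes: number): Promise<Buffer> {
  const chunks: Buffer[] = [];
  let total = 0;
  try {
    for await (const chunk of source) {
      const buffer = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
      chunks.push(buffer);
      total += buffer.length;
      if (total >= minBytes) {
        break; // stop early; the remaining audio is not needed
      }
    }
  } finally {
    // destroy() is safe to call more than once, so this covers normal
    // completion, early exit, and errors thrown inside the loop.
    source.destroy();
  }
  return Buffer.concat(chunks);
}
```

Node.js already destroys a `Readable` when a `for await` loop over it exits early, so the `finally` block is mainly a safeguard that keeps the cleanup explicit and also covers code paths added later outside the loop.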
To Go Further
Discover our advanced training on voice AI integration: https://learni-group.com/formations