Introduction
Resemble AI is a platform for text-to-speech (TTS) synthesis and voice cloning that exposes realistic voices through a REST API. As of 2026, its SDK and advanced endpoints cover expert use cases such as real-time streaming, batch processing of thousands of clips, and custom cloned voices. This tutorial walks step by step through integrating Resemble AI into a Node.js/TypeScript app, from voice cloning to webhook handling for asynchronous clips. Why it matters: AI voices can cut audio production costs by as much as 90% while boosting engagement in podcasts, virtual assistants, and games, and streaming latency under 200 ms makes them usable in production. The steps below build a scalable setup with error handling and caching.
Prerequisites
- Resemble AI account with API key (create one at app.resemble.ai)
- Node.js 20+ and npm/yarn
- Advanced knowledge of TypeScript, fetch API, and async/await
- WAV/MP3 audio file (10-30s) for voice cloning
- Tools: FFmpeg for audio post-processing (optional)
Initialize the Node.js Project
mkdir resemble-ai-expert && cd resemble-ai-expert
npm init -y
npm install axios form-data dotenv uuid express
npm install -D typescript ts-node @types/node @types/express
npx tsc --init
mkdir -p src assets output
echo 'API_KEY=your_resemble_api_key_here
PROJECT_UUID=your_project_uuid_here' > .env
echo '.env' >> .gitignore
This script sets up a TypeScript project with Axios for HTTP requests and form-data for multipart uploads; uuid and express are installed now because later steps import them. The .env file stores your credentials: replace the placeholders with the real values from the Resemble dashboard. The last command adds .env to .gitignore so it is never committed to Git.
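Because every later script reads API_KEY and PROJECT_UUID from the environment, it can help to fail fast when either is missing. The following is a minimal sketch; `loadConfig` and `ResembleConfig` are hypothetical names introduced here, not part of any Resemble SDK.

```typescript
// Hypothetical helper: validates the two variables written to .env above.
interface ResembleConfig {
  apiKey: string;
  projectUuid: string;
}

function loadConfig(env: Record<string, string | undefined>): ResembleConfig {
  // Collect every missing or blank variable so the error lists them all at once.
  const missing = ["API_KEY", "PROJECT_UUID"].filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return { apiKey: env.API_KEY!.trim(), projectUuid: env.PROJECT_UUID!.trim() };
}
```

After `dotenv.config()`, calling `loadConfig(process.env)` would replace the non-null assertions (`process.env.API_KEY!`) with an explicit startup error.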
Configure Credentials
Create a project on Resemble AI to get the PROJECT_UUID, then generate an API key under Account > API Keys. Place your clean source audio file (noise-free, ideally 44.1 kHz) at ./assets/voice_sample.wav, the path the cloning script reads. Test the API with curl: curl -H "Authorization: Bearer $API_KEY" https://app.resemble.ai/api/v2/projects/$PROJECT_UUID. Use double quotes around the header so the shell expands $API_KEY. Analogy: as in a professional vocal studio, cloning needs a high-quality 'master' recording.
Clone a Custom Voice
import axios from 'axios';
import FormData from 'form-data';
import fs from 'fs';
import dotenv from 'dotenv';

dotenv.config();

const API_KEY = process.env.API_KEY!;
const PROJECT_UUID = process.env.PROJECT_UUID!;
const VOICE_SAMPLE_PATH = './assets/voice_sample.wav';

async function cloneVoice() {
  // Build the multipart body: metadata fields plus the audio sample
  const form = new FormData();
  form.append('name', 'MaVoixClonee');
  form.append('gender', 'male');
  form.append('accent', 'fr-FR');
  form.append('description', 'Cloned expert French voice');
  form.append('audio', fs.createReadStream(VOICE_SAMPLE_PATH));

  try {
    const response = await axios.post(
      `https://app.resemble.ai/api/v2/projects/${PROJECT_UUID}/voices`,
      form,
      {
        headers: {
          'Authorization': `Bearer ${API_KEY}`,
          ...form.getHeaders(), // sets the multipart boundary
        },
      }
    );
    console.log('Voice cloned:', response.data.uuid);
    return response.data.uuid;
  } catch (error: any) {
    console.error('Cloning error:', error.response?.data || error.message);
    throw error;
  }
}

// The error is already logged above; just set a failing exit code
cloneVoice().catch(() => { process.exitCode = 1; });
This script uploads the audio to create a cloned voice asynchronously (training takes about 10 minutes). FormData handles the multipart upload; monitor training status in the dashboard. Pitfall: audio longer than 60 s or noisy audio is rejected, so check the sample before uploading. The function returns the voice UUID used later for TTS.
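Since an over-long sample is rejected, a local check before uploading can save a round trip. The sketch below reads the duration and sample rate from a PCM WAV header; `inspectWav` and `isUsableSample` are hypothetical helpers, the 10-30 s window is taken from the prerequisites, and MP3 input is not handled.

```typescript
import { Buffer } from "node:buffer";

interface WavInfo {
  sampleRate: number;
  channels: number;
  durationSeconds: number;
}

// Hypothetical helper: walks the RIFF chunk list of a WAV file held in memory.
function inspectWav(buf: Buffer): WavInfo {
  if (buf.length < 12 || buf.toString("ascii", 0, 4) !== "RIFF" || buf.toString("ascii", 8, 12) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }
  let sampleRate = 0;
  let channels = 0;
  let byteRate = 0;
  let dataSize = -1;
  let offset = 12;
  // Each chunk is: 4-byte id, 4-byte little-endian size, then the payload.
  while (offset + 8 <= buf.length) {
    const id = buf.toString("ascii", offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === "fmt ") {
      channels = buf.readUInt16LE(offset + 10);
      sampleRate = buf.readUInt32LE(offset + 12);
      byteRate = buf.readUInt32LE(offset + 16);
    } else if (id === "data") {
      dataSize = size; // only the declared size is needed, not the samples
      break;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  if (byteRate === 0 || dataSize < 0) throw new Error("Missing fmt or data chunk");
  return { sampleRate, channels, durationSeconds: dataSize / byteRate };
}

// Assumption: 10-30 s is the target window stated in the prerequisites.
function isUsableSample(info: WavInfo): boolean {
  return info.durationSeconds >= 10 && info.durationSeconds <= 30;
}
```

A call such as `inspectWav(fs.readFileSync(VOICE_SAMPLE_PATH))` before building the form would surface a bad sample without spending an API request.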
Generate Basic TTS with Cloned Voice
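Once the voice is cloned and you have its UUID, you can generate clips. Resemble supports SSML for prosody control (emphasis, pauses). For experts: word and punctuate your input text consistently, since that affects how consistent the voice sounds across clips. Retrieve the finished audio via polling or a webhook.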
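When text is wrapped in `<speak>` tags, characters such as `&` or `<` in the input can break the SSML document. The sketch below escapes the text and joins sentences with pauses; `escapeXml` and `toSsml` are hypothetical helpers, and `<break>` is standard SSML whose exact handling by a given engine should be verified.

```typescript
// Hypothetical helper: escapes the five XML special characters.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: wraps escaped sentences in <speak>, separated by pauses.
function toSsml(sentences: string[], pauseMs = 300): string {
  const body = sentences.map((s) => escapeXml(s)).join(`<break time="${pauseMs}ms"/>`);
  return `<speak>${body}</speak>`;
}
```

For example, `toSsml(["Tom & Jerry", "a < b"], 500)` yields `<speak>Tom &amp; Jerry<break time="500ms"/>a &lt; b</speak>`.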
Generate and Download a TTS Clip
import axios from 'axios';
import fs from 'fs';
import path from 'path';
import dotenv from 'dotenv';
import { v4 as uuidv4 } from 'uuid';

dotenv.config();

const API_KEY = process.env.API_KEY!;
const PROJECT_UUID = process.env.PROJECT_UUID!;
const VOICE_UUID = 'your_cloned_voice_uuid_here'; // Replace with the UUID returned by the cloning step
const MAX_POLL_ATTEMPTS = 150; // about 5 minutes at one request every 2 s

async function generateTTS(text: string, outputPath: string) {
  const clipUuid = uuidv4();

  // Create the clip
  const createResponse = await axios.post(
    `https://app.resemble.ai/api/v2/projects/${PROJECT_UUID}/clips`,
    {
      voice_uuid: VOICE_UUID,
      ssml: `<speak>${text}</speak>`,
      name: `clip-${clipUuid}`,
      description: 'TTS expert',
    },
    {
      headers: { 'Authorization': `Bearer ${API_KEY}` },
    }
  );

  // Poll until the clip is finished, waiting 2 s between requests
  let status = 'generating';
  for (let attempt = 0; status !== 'finished'; attempt++) {
    if (attempt >= MAX_POLL_ATTEMPTS) throw new Error('Timed out waiting for clip');
    if (attempt > 0) await new Promise(r => setTimeout(r, 2000));
    const statusRes = await axios.get(
      `https://app.resemble.ai/api/v2/projects/${PROJECT_UUID}/clips/${createResponse.data.uuid}`,
      { headers: { 'Authorization': `Bearer ${API_KEY}` } }
    );
    status = statusRes.data.status;
    if (status === 'error') throw new Error('Clip failed');
  }

  // Download the audio
  const audioRes = await axios.get(
    `https://app.resemble.ai/api/v2/projects/${PROJECT_UUID}/clips/${createResponse.data.uuid}/audio`,
    {
      headers: { 'Authorization': `Bearer ${API_KEY}` },
      responseType: 'arraybuffer',
    }
  );

  // Make sure the output directory exists before writing
  fs.mkdirSync(path.dirname(outputPath), { recursive: true });
  fs.writeFileSync(outputPath, audioRes.data);
  console.log(`Clip saved: ${outputPath}`);
}

generateTTS('Bonjour, ceci est une voix clonée experte en français.', './output/tts.wav')
  .catch((error) => {
    console.error('TTS error:', error.response?.data || error.message);
    process.exitCode = 1;
  });
This script generates an SSML clip, polls its status until it is finished, and downloads the WAV file. The random UUID in the clip name prevents name collisions. Pitfall: clips are generated asynchronously, so requesting the audio before the status is 'finished' fails; keep each clip under 10 minutes. Add Redis caching to reuse clips you have already generated.
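The polling step can be factored into a reusable helper with a hard attempt limit, so a stuck clip cannot block the process forever. This is a sketch only: `pollUntilFinished` is a hypothetical name, and the 'finished' and 'error' values simply mirror the statuses used in the code above.

```typescript
// Hypothetical helper: calls getStatus() until it reports "finished".
// Resolves with the number of requests made; rejects on "error" or when attempts run out.
async function pollUntilFinished(
  getStatus: () => Promise<string>,
  intervalMs = 2000,
  maxAttempts = 150,
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === "finished") return attempt;
    if (status === "error") throw new Error("Clip generation failed");
    // Wait between requests, but not after the final attempt.
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Clip not finished after ${maxAttempts} attempts`);
}
```

Passing the status request as a function keeps the helper independent of axios and easy to exercise with a fake status sequence.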
Implement TTS Streaming
import axios from 'axios';
import dotenv from 'dotenv';
import { Readable } from 'stream';

dotenv.config();

const API_KEY = process.env.API_KEY!;
const PROJECT_UUID = process.env.PROJECT_UUID!;
const VOICE_UUID = 'your_cloned_voice_uuid_here';

async function streamTTS(text: string) {
  const response = await axios.post(
    `https://app.resemble.ai/api/v2/projects/${PROJECT_UUID}/stream`,
    {
      voice_uuid: VOICE_UUID,
      ssml: `<speak>${text}</speak>`,
    },
    {
      headers: {
        'Authorization': `Bearer ${API_KEY}`,
        'Content-Type': 'application/json',
      },
      responseType: 'stream',
    }
  );

  const stream = response.data as Readable;
  // Writes raw audio bytes to stdout; redirect stdout to a file,
  // or pipe into fs.createWriteStream('./stream.wav') instead
  stream.pipe(process.stdout);

  // Resolve once the whole response has been received
  return new Promise<void>((resolve, reject) => {
    stream.on('end', resolve);
    stream.on('error', reject);
  });
}

streamTTS('Texte streamé en temps réel pour latence faible.')
  .catch((error) => {
    console.error('Streaming error:', error.message);
    process.exitCode = 1;
  });
Streaming targets latency under 200 ms, which suits chatbots and games. Setting responseType to 'stream' lets you pipe the audio directly as it arrives. Pitfall: avoid complex SSML when streaming, and test your bandwidth. To scale to browser clients, relay the audio over WebSockets.
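To check the latency figure against your own network, you can time how long the first audio chunk takes to arrive. The sketch below works on any Node Readable, such as the axios response stream above; `measureStream` and `StreamStats` are hypothetical names. It consumes the stream, so use it in place of `pipe` or forward the chunks yourself.

```typescript
import { Buffer } from "node:buffer";
import { performance } from "node:perf_hooks";
import { Readable } from "node:stream";

interface StreamStats {
  firstChunkMs: number; // time until the first chunk arrived, or -1 if the stream was empty
  totalBytes: number;
}

// Hypothetical helper: measures time-to-first-chunk and total size of a stream.
function measureStream(stream: Readable): Promise<StreamStats> {
  return new Promise((resolve, reject) => {
    const start = performance.now();
    let firstChunkMs = -1;
    let totalBytes = 0;
    stream.on("data", (chunk: Buffer | string) => {
      if (firstChunkMs < 0) firstChunkMs = performance.now() - start;
      totalBytes += Buffer.byteLength(chunk);
    });
    stream.on("end", () => resolve({ firstChunkMs, totalBytes }));
    stream.on("error", reject);
  });
}
```

Note that the clock here starts when the helper is called, not when the HTTP request was sent; for end-to-end latency, record the start time before `axios.post`.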
Handle Batches and Webhooks
For >100 clips, use the batch API. Set up webhooks (Project Settings > Webhooks) to notify https://yourapp.com/webhook/resemble. Analogy: like a render farm orchestrator.
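A large job usually has to be split into several bulk requests. The tutorial does not state how many clips one bulk request accepts, so the batch size below is an assumption to adjust; `chunk` is a hypothetical helper.

```typescript
// Hypothetical helper: splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Assumed batch size; check the actual per-request limit before relying on it.
const ASSUMED_BATCH_SIZE = 100;
```

Each batch returned by `chunk(texts, ASSUMED_BATCH_SIZE)` can then be sent as one bulk request, with webhooks reporting completion per clip.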
Batch Process Clips
import axios from 'axios';
import dotenv from 'dotenv';

dotenv.config();

const API_KEY = process.env.API_KEY!;
const PROJECT_UUID = process.env.PROJECT_UUID!;
const VOICE_UUID = 'your_cloned_voice_uuid_here';

const texts = [
  'Premier clip batch.',
  'Deuxième clip avec SSML: <emphasis>important</emphasis>.',
  'Troisième en français expert.'
];

async function batchGenerate() {
  // One clip definition per text, all using the same cloned voice
  const batch = texts.map((text, i) => ({
    voice_uuid: VOICE_UUID,
    ssml: `<speak>${text}</speak>`,
    name: `batch-${i}`,
  }));

  const response = await axios.post(
    `https://app.resemble.ai/api/v2/projects/${PROJECT_UUID}/clips/bulk`,
    { clips: batch },
    { headers: { 'Authorization': `Bearer ${API_KEY}` } }
  );
  console.log('Batch UUIDs:', response.data.map((c: any) => c.uuid));
}

batchGenerate().catch((error) => {
  console.error('Batch error:', error.response?.data || error.message);
  process.exitCode = 1;
});
This script creates multiple clips in one call (quota: 1000/day). Track the returned UUIDs and wait for the webhook notifications to learn when each clip is ready. Pitfall: exceeding the quota returns HTTP 429, so retry with exponential backoff. Cost: ~$0.01/clip.
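The exponential retry mentioned above can be sketched as follows. `backoffDelay` and `withRetry` are hypothetical helpers, and the base delay, cap, and retry count are illustrative defaults rather than values recommended by Resemble.

```typescript
// Hypothetical helper: delay doubles with each attempt (0-based) up to a cap.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical helper: retries fn() while isRetryable(error) is true,
// up to maxRetries additional attempts, sleeping backoffDelay() in between.
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxRetries = 3,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up on non-retryable errors or once the retry budget is spent.
      if (attempt >= maxRetries || !isRetryable(error)) throw error;
      await new Promise<void>((resolve) => setTimeout(resolve, backoffDelay(attempt, baseMs)));
    }
  }
}
```

With axios, a predicate such as `(e) => axios.isAxiosError(e) && (e.response?.status === 429 || (e.response?.status ?? 0) >= 500)` would limit retries to rate-limit and server errors. Adding random jitter to the delay is a common refinement when many workers retry at once.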
Webhook Handler for Ready Clips
import express from 'express';
import dotenv from 'dotenv';

dotenv.config();

const app = express();
app.use(express.json());

app.post('/webhook/resemble', (req, res) => {
  const { project_uuid, clip_uuid, status } = req.body;

  // Acknowledge right away so the sender does not retry
  res.status(200).send('OK');

  if (status === 'finished') {
    console.log(`Clip ${clip_uuid} ready in project ${project_uuid}`);
    // Trigger the download or notify the user
    downloadClip(project_uuid, clip_uuid).catch((err) =>
      console.error(`Download failed for ${clip_uuid}:`, err)
    );
  } else if (status === 'failed') {
    console.error(`Clip ${clip_uuid} failed`);
  }
});

async function downloadClip(project_uuid: string, clip_uuid: string) {
  // Same download logic as in generateTTS
  console.log(`Downloading ${clip_uuid}`);
}

const PORT = 3000;
app.listen(PORT, () => console.log(`Webhook listening on port ${PORT}`));
This Express handler acknowledges the event and dispatches on the clip status. As written it does not verify the payload: add signature validation (for example an HMAC check) according to Resemble's webhook documentation. Respond with 200 quickly. Pitfall: without an acknowledgement the sender may keep retrying; during development, expose your local server through a tunnel such as ngrok.
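Payload validation could look like the sketch below, which checks an HMAC-SHA256 signature over the raw request body using Node's crypto module. The signing scheme is an assumption: the tutorial does not say how Resemble signs webhooks, so the hex encoding, the shared secret, and the `verifySignature` helper are all hypothetical.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper. ASSUMPTION: the sender transmits hex(HMAC-SHA256(secret, rawBody)).
function verifySignature(rawBody: string | Buffer, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The check has to run on the exact bytes received, not on re-serialized JSON. With Express, the `verify` option of `express.json()` gives access to the raw buffer before parsing; the header that carries the signature is likewise something to confirm in the provider's documentation.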
Best Practices
- Cache clips: Key a Redis/Memcached entry on hash(voice+text) to avoid regenerating identical clips (can save up to 80% of quota).
- Optimized SSML: adjust the markup for narrations; test phonemes such as /fʁɑ̃.sɛ/ for French.
- Retry & Circuit Breaker: Retry Axios requests up to 3 times with backoff on 5xx responses.
- Monitoring: Track latency and usage with Prometheus; set daily quota alerts.
- Security: Rotate keys monthly and never expose them client-side.
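The hash(voice+text) cache key from the first bullet can be sketched with Node's crypto module. `clipCacheKey` and the `tts:` prefix are hypothetical choices; any stable hash would do.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derives a deterministic cache key for a (voice, SSML) pair.
function clipCacheKey(voiceUuid: string, ssml: string): string {
  // Length-prefix the voice id so ("ab", "c") and ("a", "bc") never produce the same input.
  const digest = createHash("sha256")
    .update(`${voiceUuid.length}:${voiceUuid}:${ssml}`)
    .digest("hex");
  return `tts:${digest}`;
}
```

Before creating a clip, look the key up in Redis or Memcached and only call the API on a miss, storing the audio (or its path) under the same key afterwards.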
Common Errors to Avoid
- Aggressive polling: More than 1 request/s triggers rate limiting; poll every 2-5 s.
- Degraded source audio: Noise above -40 dB makes cloning fail; preprocess with FFmpeg:
  ffmpeg -i input.wav -af silenceremove=1:0:-50dB output.wav
- Invalid SSML: Avoid excessive nesting; validate in the Resemble playground.
- Forgetting webhooks: Polling doesn't scale beyond 100 clips; prefer callbacks.
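A rough local check for the SSML pitfall is to verify that tags are balanced and to measure how deeply they nest. The sketch below is a regex-based scan, not a full XML parser; `ssmlMaxDepth` is a hypothetical helper, and since the tutorial gives no nesting limit, any threshold you compare against is your own assumption.

```typescript
// Hypothetical helper: returns the maximum tag nesting depth, or throws if tags are unbalanced.
function ssmlMaxDepth(ssml: string): number {
  // Captures: optional leading "/", tag name, optional trailing "/" (self-closing).
  const tag = /<(\/?)([A-Za-z][\w:-]*)[^>]*?(\/?)>/g;
  const stack: string[] = [];
  let max = 0;
  let m: RegExpExecArray | null;
  while ((m = tag.exec(ssml)) !== null) {
    const closing = m[1] === "/";
    const name = m[2] ?? "";
    const selfClosing = m[3] === "/";
    if (selfClosing) continue; // e.g. <break time="1s"/> adds no depth
    if (closing) {
      if (stack.pop() !== name) throw new Error(`Unbalanced closing tag </${name}>`);
    } else {
      stack.push(name);
      max = Math.max(max, stack.length);
    }
  }
  if (stack.length > 0) throw new Error(`Unclosed tag <${stack[stack.length - 1]}>`);
  return max;
}
```

For `<speak>a<emphasis>b<break time="1s"/></emphasis></speak>` the function returns 2. It complements, rather than replaces, validation in the Resemble playground.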
Next Steps
- Official docs: Resemble AI API
- Node SDK: npm i @resembleai/sdk
- Advanced: TSX (time-stretch), multi-language fusion.