Introduction
In 2026, OpenRouter has become a standard gateway to AI models: a unified, OpenAI-compatible API that routes to over 100 providers (Anthropic, OpenAI, Google, Mistral, and more). Unlike direct provider APIs, OpenRouter automatically handles fallbacks when a model fails, optimizes costs via price/latency leaderboards, and supports server-sent events (SSE) streaming for real-time chat.
Why this expert tutorial? For senior devs, it covers custom routing (e.g., prioritizing Claude 3.5 Sonnet when it costs under $0.50/M tokens), model scouting (auto-testing for the best model), monitoring via X-Usage headers, and resilience with exponential retries. The goal: AI APIs approaching 99.99% uptime with roughly 30% cost savings. Ideal for SaaS, AI agents, or scalable RAG. We start with a Next.js 15 project and build a production-ready chat API.
Prerequisites
- Node.js 20+ and npm/yarn/pnpm
- Next.js 15+ with TypeScript (npx create-next-app@latest)
- OpenRouter account: dashboard.openrouter.ai for free API key (initial credits)
- Advanced knowledge: OpenAI SDK, async/await, Zod validation, Vercel deployment
- Tools: .env.local, curl for testing
Initialize the Next.js Project
npx create-next-app@latest openrouter-app --typescript --tailwind --eslint --app --src-dir --import-alias "@/*"
cd openrouter-app
npm install openai@latest zod@latest @types/node
npm install -D @types/cors
cp .env.example .env.local
Creates a Next.js 15 App Router project with TypeScript and Tailwind. Installs the OpenAI SDK (OpenRouter-compatible via baseURL), Zod for input validation, and Node types. Copies the env template for secrets. Pitfall: use --app for the modern App Router, not Pages.
Configure .env and OpenRouter Client
import OpenAI from 'openai';
// Next.js loads .env.local automatically; dotenv/config reads only .env and is not needed here.
const openrouter = new OpenAI({
apiKey: process.env.OPENROUTER_API_KEY ?? '',
baseURL: 'https://openrouter.ai/api/v1',
});
export default openrouter;
export type OpenRouterMessage = OpenAI.Chat.ChatCompletionMessageParam;
Sets up an OpenAI client customized for OpenRouter: baseURL points to the proxy API, apiKey comes from the environment. The reusable type alias covers messages. Pitfall: verify OPENROUTER_API_KEY in .env.local (e.g., OPENROUTER_API_KEY=sk-or-...). No default provider is set, for flexibility.
First Call: Basic Chat Completion
Let's test a simple call to Claude 3.5 Sonnet, a 2026 leaderboard model for reasoning. OpenRouter adds traceability headers (X-Provider, X-Requested-Model). Use curl to validate the route before wiring up a frontend.
Implement Basic Chat API Route
import { NextRequest, NextResponse } from 'next/server';
import openrouter from '@/lib/openrouter';
import { z } from 'zod';
const schema = z.object({
model: z.string().default('anthropic/claude-3.5-sonnet'),
messages: z.array(z.object({ role: z.enum(['user', 'system']), content: z.string() })),
});
export async function POST(req: NextRequest) {
try {
const { model, messages } = schema.parse(await req.json());
const completion = await openrouter.chat.completions.create({
model,
messages,
temperature: 0.7,
});
return NextResponse.json({ result: completion.choices[0].message });
} catch (error) {
if (error instanceof z.ZodError) {
return NextResponse.json({ error: 'Invalid request body' }, { status: 400 });
}
return NextResponse.json({ error: 'Upstream API call failed' }, { status: 502 });
}
}
The POST /api/chat route validates input with Zod before calling chat.completions.create and returns the AI message. Analogy: Zod as a gatekeeper against malformed payloads and prompt injection. Pitfall: without try/catch, a Zod parse error surfaces as an unhandled 500; test with curl -X POST http://localhost:3000/api/chat -H 'Content-Type: application/json' -d '{"messages":[{"role":"user","content":"Hello"}]}'.
Add Server-Sent Events Streaming
import { NextRequest, NextResponse } from 'next/server';
import openrouter from '@/lib/openrouter';
import { z } from 'zod';
const schema = z.object({
model: z.string().default('anthropic/claude-3.5-sonnet'),
messages: z.array(z.object({ role: z.enum(['user', 'system']), content: z.string() })),
});
export async function POST(req: NextRequest) {
try {
const { model, messages } = schema.parse(await req.json());
const stream = await openrouter.chat.completions.create({
model,
messages,
stream: true,
temperature: 0.7,
});
const encoder = new TextEncoder();
const streamResponse = new ReadableStream({
async start(controller) {
try {
for await (const chunk of stream) {
const data = chunk.choices[0]?.delta?.content || '';
controller.enqueue(encoder.encode(`data: ${data}\n\n`));
}
controller.enqueue(encoder.encode('data: [DONE]\n\n'));
controller.close();
} catch (err) {
// close() after error() would throw; error() already terminates the stream.
controller.error(err);
}
},
});
return new Response(streamResponse, {
headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' },
});
} catch (error) {
return NextResponse.json({ error: 'Stream failed' }, { status: 500 });
}
}
stream: true switches the completion to SSE; the ReadableStream forwards deltas in real time. The text/event-stream and no-cache headers are essential for browsers. Pitfall: forgetting the [DONE] sentinel leaves clients hanging; handle stream errors so the controller is not leaked in production.
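On the client side, the frames emitted above have to be split back into deltas. A minimal sketch of a parser for this route's framing (not a spec-complete SSE parser; it assumes the exact `data: ...\n\n` format and `[DONE]` sentinel used above):

```typescript
// Splits an SSE buffer on blank lines, strips the `data: ` prefix,
// and stops at the [DONE] sentinel the route emits.
function parseSseChunk(buffer: string): { events: string[]; done: boolean } {
  const events: string[] = [];
  let done = false;
  for (const frame of buffer.split('\n\n')) {
    if (!frame.startsWith('data: ')) continue;
    const data = frame.slice('data: '.length);
    if (data === '[DONE]') { done = true; break; }
    events.push(data);
  }
  return { events, done };
}

// Example: two deltas followed by the terminator.
const { events, done } = parseSseChunk('data: Hel\n\ndata: lo\n\ndata: [DONE]\n\n');
console.log(events.join('')); // 'Hello'
console.log(done); // true
```

In a real frontend you would feed each decoded fetch() chunk through a parser like this and append the deltas to the visible message.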
Advanced Routing and Fallbacks
OpenRouter excels at intelligent routing: specify provider or models array for automatic fallbacks (e.g., fallback to Gemini if Sonnet is down). Use sort: 'price' to optimize costs.
API with Fallbacks and Custom Routing
import { NextRequest, NextResponse } from 'next/server';
import openrouter from '@/lib/openrouter';
import { z } from 'zod';
const schema = z.object({
messages: z.array(z.object({ role: z.enum(['user', 'system']), content: z.string() })),
});
export async function POST(req: NextRequest) {
try {
const { messages } = schema.parse(await req.json());
const { data: completion, response } = await openrouter.chat.completions.create({
model: 'anthropic/claude-3.5-sonnet',
messages,
provider: { // Fall back if the primary provider is down
allow_fallbacks: true,
order: ['anthropic', 'openai'],
sort: 'price',
},
temperature: 0.7,
} as any, // `provider` is an OpenRouter extension unknown to the OpenAI SDK types
{ headers: { 'X-Title': 'My AI App' } } // per-request headers go in the request options
).withResponse();
const usage = completion.usage;
return NextResponse.json({
result: completion.choices[0].message,
usage,
provider: response.headers.get('x-provider'),
});
} catch (error: any) {
return NextResponse.json({ error: error.message }, { status: 500 });
}
}
The provider object enables ordered fallbacks, with sort: 'price' preferring the cheapest provider. The X-Title request header aids traceability in the OpenRouter dashboard. usage is returned for billing. Pitfall: without allow_fallbacks: true there is no resilience; log the x-provider response header for debugging.
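Resilience can also be added at the application level. A hand-rolled exponential backoff sketch (p-retry offers the same pattern with more options); the flaky stub below stands in for an OpenRouter call:

```typescript
// Retries `fn` up to `retries` times, doubling the delay between attempts.
async function withRetries<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      // baseDelayMs, 2x, 4x, ... between attempts.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Stub that fails twice before succeeding, simulating provider hiccups.
let calls = 0;
const flaky = async () => {
  calls++;
  if (calls < 3) throw new Error('503 from provider');
  return 'ok';
};

const result = await withRetries(flaky, 3, 1);
console.log(result, calls); // 'ok' 3
```

Combined with OpenRouter's own fallbacks, this covers transient network errors that never reach a second provider.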
Advanced Monitoring and Model Scouting
import { NextResponse } from 'next/server';
import openrouter from '@/lib/openrouter';
export async function GET() {
try {
// The models endpoint does not take a sort parameter; filter/sort client-side.
const models = await openrouter.models.list();
const filtered = models.data.filter(m =>
m.id.includes('claude') || m.id.includes('gpt-4o')
);
return NextResponse.json({ models: filtered.slice(0, 10) });
} catch (error) {
return NextResponse.json({ error: 'Models fetch failed' }, { status: 500 });
}
}
export async function POST() {
// Model scouting: auto-test candidates on the same prompt
const testPrompt = [{ role: 'user' as const, content: 'Summarize this JSON' }];
try {
const results = await Promise.all([
openrouter.chat.completions.create({ model: 'anthropic/claude-3.5-sonnet', messages: testPrompt }),
openrouter.chat.completions.create({ model: 'openai/gpt-4o', messages: testPrompt }),
]);
const best = results.reduce((best, res) =>
(res.usage?.total_tokens ?? Infinity) < (best.usage?.total_tokens ?? Infinity) ? res : best
);
return NextResponse.json({ bestModel: best.model });
} catch (error) {
return NextResponse.json({ error: 'Scouting failed' }, { status: 500 });
}
}
GET lists available models and filters to the current leaders client-side. POST scouts models by running the same prompt through each candidate and comparing token usage. Analogy: a leaderboard, like the F1 top 10 drivers. Pitfall: rate limits (60 req/min on the free tier); cache results with Redis in production.
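The token-comparison step in the scouting route can be isolated as a pure helper, shown here with made-up usage numbers (the ScoutResult shape is illustrative, not an OpenRouter type):

```typescript
// (model, usage) pair produced by one scouting call.
interface ScoutResult {
  model: string;
  totalTokens: number;
}

// Pick the result with the fewest total tokens -- a rough proxy for cost
// when all candidates answered the same prompt.
function pickCheapest(results: ScoutResult[]): ScoutResult {
  return results.reduce((best, r) =>
    r.totalTokens < best.totalTokens ? r : best,
  );
}

const winner = pickCheapest([
  { model: 'anthropic/claude-3.5-sonnet', totalTokens: 420 },
  { model: 'openai/gpt-4o', totalTokens: 380 },
]);
console.log(winner.model); // 'openai/gpt-4o'
```

Keeping the selection logic pure makes it trivial to unit-test without burning credits on real completions.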
Best Practices
- Always validate with Zod: Protects against injections and API abuse.
- Implement exponential retries with p-retry for 99.99% uptime.
- Cache responses (Upstash Redis) for repeated prompts to save up to 50% on costs.
- Monitor headers X-Usage, X-Provider via Vercel Logs/Prometheus.
- Secure your key: Vercel env vars, never commit .env; rotate keys monthly.
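The caching advice can be sketched as a tiny in-memory TTL cache (illustrative only; in production a shared store such as Upstash Redis survives serverless cold starts, while a plain Map does not):

```typescript
// Maps a prompt string to a cached response with an expiry timestamp.
class PromptCache {
  private store = new Map<string, { value: string; expiresAt: number }>();
  constructor(private ttlMs: number) {}

  get(prompt: string): string | undefined {
    const hit = this.store.get(prompt);
    if (!hit || Date.now() > hit.expiresAt) {
      this.store.delete(prompt); // evict stale entries lazily
      return undefined;
    }
    return hit.value;
  }

  set(prompt: string, value: string): void {
    this.store.set(prompt, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

const cache = new PromptCache(60_000); // 60s TTL
cache.set('Hello', 'Hi there!');
console.log(cache.get('Hello')); // 'Hi there!'
console.log(cache.get('Unseen prompt')); // undefined
```

In the chat route, check the cache before calling OpenRouter and store the completion after; identical prompts then cost nothing until the TTL expires.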
Common Errors to Avoid
- Forgetting baseURL: calls go directly to OpenAI, causing double billing.
- Ignoring rate limits: 20k req/day on the free tier; implement queues (BullMQ).
- No stream: true parsing: frontend clients hang on empty deltas.
- Fallbacks without allow_fallbacks: 100% downtime when a model goes down (e.g., Anthropic outages).
Next Steps
Dive deeper with OpenRouter docs. Implement RAG with Pinecone + OpenRouter. Check out our advanced AI trainings at Learni for autonomous agents and multi-model fine-tuning.