Introduction
Fireworks.ai is a high-speed AI inference platform hosting open-source models such as Llama 3.1, Mistral, and Qwen behind an OpenAI-compatible API, tuned for low-latency serving. Compared with closed-model providers like OpenAI, it leans toward cost-efficiency and speed for real-time apps: chatbots, AI agents, and RAG pipelines.
This advanced tutorial guides you through integrating Fireworks.ai into a Next.js 15 app (App Router). We'll build a chatbot with SSE streaming, function/tool calls, robust error handling, and a reactive UI. Why 2026? The API has evolved, with native multimodal model support and serverless fine-tuning.
By the end, you'll have a production-ready, scalable prototype deployed on Vercel. Key gains: markedly lower latency and per-token cost than typical closed-model endpoints such as GPT-4o-mini. Ready to supercharge your AI apps?
Prerequisites
- Free account on fireworks.ai with an API key (credits included on signup)
- Node.js 20+ and npm/yarn/pnpm
- Next.js 15+ (App Router)
- Advanced knowledge of TypeScript, React Server Components, and ReadableStreams
- Editor like VS Code with TypeScript extension
- Testing tools: curl or Postman
Initialize the Next.js Project
npx create-next-app@latest fireworks-chatbot --typescript --tailwind --eslint --app --src-dir --import-alias "@/*"
cd fireworks-chatbot
npm install
npm install @types/node
This command creates an optimized Next.js 15 project with TypeScript and Tailwind. The --app flag enables the App Router, which is what gives us Server Actions and native streaming support. Decline any optional starter template so you keep full control over dependencies.
Get and Configure the API Key
- Sign up at app.fireworks.ai.
- Go to Account Settings > API Keys and generate a key (format fwk-...).
- In Explore Models, test Llama-3.1-70B-Instruct to validate your setup (the Playground offers an intuitive interface with live streaming).
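You can also verify the key from a terminal with a short Node script before touching Next.js. A minimal sketch, assuming FIREWORKS_API_KEY is exported in your shell and using the chat completions endpoint that the rest of this tutorial relies on:
// check-key.ts — one-off smoke test (run with: npx tsx check-key.ts)
async function main() {
  const res = await fetch('https://api.fireworks.ai/inference/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.FIREWORKS_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'accounts/fireworks/models/llama-v3p1-70b-instruct',
      messages: [{ role: 'user', content: 'Say hello in one word.' }],
      max_tokens: 16,
    }),
  });

  // 200 means the key and model ID are valid; 401/403 points to a bad key.
  console.log(res.status, await res.text());
}

main();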
Environment Configuration
FIREWORKS_API_KEY=fwk-YOUR_KEY_HERE
NEXT_PUBLIC_APP_URL=http://localhost:3000
MODEL_ID=accounts/fireworks/models/llama-v3p1-70b-instruct
MAX_TOKENS=4096
TEMPERATURE=0.7
Create this file as .env.local at the project root. MODEL_ID points to a specific model slug (copy it from the model's page in the Fireworks interface). Make sure .env.local is listed in .gitignore. Read secrets with process.env on the server only; never expose the API key to the client (only NEXT_PUBLIC_-prefixed variables are bundled client-side, and the key must not be one of them).
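To fail fast on missing configuration, you can centralize environment access in a small helper instead of sprinkling non-null assertions. A minimal sketch; the lib/env.ts path and getEnv name are illustrative choices, not part of Next.js or Fireworks:
// lib/env.ts — read and validate server-side environment variables once
function getEnv(name: string, fallback?: string): string {
  const value = process.env[name] ?? fallback;
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

export const env = {
  apiKey: getEnv('FIREWORKS_API_KEY'),
  modelId: getEnv('MODEL_ID', 'accounts/fireworks/models/llama-v3p1-70b-instruct'),
  maxTokens: parseInt(getEnv('MAX_TOKENS', '4096'), 10),
  temperature: parseFloat(getEnv('TEMPERATURE', '0.7')),
};

The routes below keep using process.env directly so they stay copy-paste friendly, but importing a single env object is the tidier pattern for a real codebase.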
Basic Chat API Route
import { NextRequest, NextResponse } from 'next/server';

type Message = { role: 'user' | 'assistant'; content: string };

export async function POST(req: NextRequest) {
  try {
    const { messages } = await req.json() as { messages: Message[] };

    // Reject empty conversations early instead of forwarding a bad request.
    if (!Array.isArray(messages) || messages.length === 0) {
      return NextResponse.json({ error: 'messages must be a non-empty array' }, { status: 400 });
    }

    const apiKey = process.env.FIREWORKS_API_KEY!;
    const model = process.env.MODEL_ID!;

    // Fireworks exposes an OpenAI-compatible chat completions endpoint.
    const response = await fetch('https://api.fireworks.ai/inference/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model,
        messages,
        max_tokens: parseInt(process.env.MAX_TOKENS || '4096', 10),
        temperature: parseFloat(process.env.TEMPERATURE || '0.7'),
      }),
    });

    if (!response.ok) throw new Error(`API error: ${response.status}`);

    const data = await response.json();
    return NextResponse.json({ content: data.choices[0].message.content });
  } catch (error) {
    console.error('Chat completion failed:', error);
    return NextResponse.json({ error: 'Generation error' }, { status: 500 });
  }
}
This POST route handles a simple chat turn with full message history (OpenAI-compatible format) and uses native fetch for granular control. Test it with a curl -X POST against the route, or with the small script below. Pitfalls: always validate that messages is non-empty (done above with a 400 response) and be prepared for rate limits (429).
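For a quick end-to-end check without a UI, a tiny Node script does the job. A minimal sketch, assuming the dev server runs on http://localhost:3000 and the handler lives at app/api/chat/route.ts (both are assumptions about your local setup):
// scripts/test-chat.ts — hit the local route once (run with: npx tsx scripts/test-chat.ts)
async function main() {
  const res = await fetch('http://localhost:3000/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      messages: [{ role: 'user', content: 'Give me one fun fact about the moon.' }],
    }),
  });

  // Expect { content: "..." } on success, or { error: "..." } with a 4xx/5xx status.
  console.log(res.status, await res.json());
}

main();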
Upgrade to SSE Streaming
For a smooth ChatGPT-like UX, implement Server-Sent Events (SSE). Fireworks natively supports stream: true. The UI reads tokens in real time by consuming the response body with fetch and a ReadableStream reader (EventSource only supports GET, so it isn't a fit for this POST route). Benefit: the first tokens appear almost immediately instead of after the full completion.
API Route with SSE Streaming
import { NextRequest, NextResponse } from 'next/server';

type Message = { role: 'user' | 'assistant'; content: string };

export async function POST(req: NextRequest) {
  try {
    const { messages } = await req.json() as { messages: Message[] };
    const apiKey = process.env.FIREWORKS_API_KEY!;
    const model = process.env.MODEL_ID!;

    const response = await fetch('https://api.fireworks.ai/inference/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model,
        messages,
        max_tokens: parseInt(process.env.MAX_TOKENS || '4096', 10),
        temperature: parseFloat(process.env.TEMPERATURE || '0.7'),
        stream: true, // ask Fireworks for SSE-formatted incremental chunks
      }),
    });

    if (!response.ok) throw new Error(`API error: ${response.status}`);

    // The upstream body is already a web ReadableStream of SSE events;
    // proxy it straight through instead of converting to a Node stream.
    return new NextResponse(response.body, {
      headers: {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive',
        'Access-Control-Allow-Origin': '*',
      },
    });
  } catch (error) {
    console.error('Stream setup failed:', error);
    return NextResponse.json({ error: 'Stream error' }, { status: 500 });
  }
}
This route streams tokens via SSE by proxying Fireworks' response body (already a web ReadableStream of SSE events) straight to the client, which also keeps it compatible with the Edge runtime on Vercel. The text/event-stream and no-cache headers are what let browsers and proxies treat the response as a live stream. Pitfall: check response.ok before returning the stream (done above), since once streaming starts you can no longer change the status code; test with curl --no-buffer.
Chat UI Component with Streaming
'use client';

import { useState, useRef, useEffect } from 'react';

export default function ChatPage() {
  const [messages, setMessages] = useState<{ role: string; content: string }[]>([]);
  const [input, setInput] = useState('');
  const [loading, setLoading] = useState(false);
  const messagesEndRef = useRef<HTMLDivElement>(null);

  const scrollToBottom = () => {
    messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
  };
  useEffect(scrollToBottom, [messages]);

  const sendMessage = async () => {
    if (!input.trim() || loading) return;
    const userMsg = { role: 'user', content: input };
    const history = [...messages, userMsg];

    // Append the user message plus an empty assistant placeholder that the
    // stream fills in token by token (avoids overwriting the user bubble).
    setMessages([...history, { role: 'assistant', content: '' }]);
    setLoading(true);
    setInput('');

    try {
      const res = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ messages: history }),
      });

      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      let assistantContent = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // stream: true keeps multi-byte characters intact across chunk boundaries
        const chunk = decoder.decode(value, { stream: true });

        for (const line of chunk.split('\n')) {
          if (!line.startsWith('data: ')) continue;
          const data = line.slice(6);
          if (data === '[DONE]') continue;
          try {
            const parsed = JSON.parse(data);
            assistantContent += parsed.choices[0]?.delta?.content || '';
          } catch {
            continue; // skip SSE events whose JSON was split across chunks
          }
          setMessages((prev) => {
            const newMsgs = [...prev];
            newMsgs[newMsgs.length - 1] = { role: 'assistant', content: assistantContent };
            return newMsgs;
          });
        }
      }
    } finally {
      setLoading(false);
    }
  };

  return (
    <div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
      <div className="flex-1 overflow-y-auto space-y-4 mb-4">
        {messages.map((msg, i) => (
          <div key={i} className={`flex ${msg.role === 'user' ? 'justify-end' : 'justify-start'}`}>
            <div className={`max-w-xs p-3 rounded-lg ${msg.role === 'user' ? 'bg-blue-500 text-white' : 'bg-gray-200'}`}>
              {msg.content}
            </div>
          </div>
        ))}
        {loading && (
          <div className="flex justify-start">
            <div className="max-w-xs p-3 bg-gray-200 rounded-lg">...</div>
          </div>
        )}
        <div ref={messagesEndRef} />
      </div>
      <div className="flex space-x-2">
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          className="flex-1 p-3 border rounded-lg"
          onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
          placeholder="Type your message..."
        />
        <button
          onClick={sendMessage}
          disabled={loading}
          className="px-6 py-3 bg-blue-500 text-white rounded-lg disabled:opacity-50"
        >
          Send
        </button>
      </div>
    </div>
  );
}
Client-side component using useState for the message list and consuming the response via a ReadableStream reader. An empty assistant message is appended before reading, then updated in place as SSE chunks (OpenAI delta format) arrive, with smooth auto-scroll. Pitfalls: delta.content can be null on role or finish chunks, and a network read can split an SSE event in half, so the naive line split above simply skips events whose JSON doesn't parse; the buffered parser sketch below handles that case more gracefully.
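If you want to harden the parsing, a small buffering helper can accumulate partial lines between reads instead of relying on each chunk ending on an event boundary. A minimal sketch; createSSEParser and the lib/sse.ts path are illustrative, not a library API:
// lib/sse.ts — buffered SSE line parser (illustrative helper)
export function createSSEParser(onDelta: (text: string) => void) {
  let buffer = '';
  return (chunk: string) => {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep the last, possibly incomplete line for the next chunk

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice(6).trim();
      if (data === '[DONE]') continue;
      try {
        const parsed = JSON.parse(data);
        const delta = parsed.choices?.[0]?.delta?.content;
        if (delta) onDelta(delta);
      } catch {
        // Malformed or truncated event; skip it rather than crash the stream.
      }
    }
  };
}

In the component, you would create it once per request with const push = createSSEParser((t) => { /* append t to the assistant message */ }); and call push(decoder.decode(value, { stream: true })) inside the read loop.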
Add Support for Tools/Functions
import { NextRequest, NextResponse } from 'next/server';

type Message = {
  role: 'user' | 'assistant' | 'tool';
  content: string;
  tool_calls?: any[];
  tool_call_id?: string;
};

export async function POST(req: NextRequest) {
  const { messages } = await req.json() as { messages: Message[] };
  const apiKey = process.env.FIREWORKS_API_KEY!;
  const model = process.env.MODEL_ID!;

  // Example tool: calculator
  const tools = [
    {
      type: 'function',
      function: {
        name: 'calculate',
        description: 'Calculate a mathematical operation',
        parameters: {
          type: 'object',
          properties: {
            expression: { type: 'string', description: 'e.g., 2+2*3' },
          },
          required: ['expression'],
        },
      },
    },
  ];

  const finalMessages = [...messages];

  // First pass: let the model decide whether it needs a tool.
  const response = await fetch('https://api.fireworks.ai/inference/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model,
      messages: finalMessages,
      tools,
      tool_choice: 'auto',
      temperature: 0.1,
    }),
  });

  const data = await response.json();
  const choice = data.choices[0];

  if (choice.message.tool_calls) {
    // Keep the assistant message that requested the tools, so the follow-up
    // request contains a valid assistant -> tool message sequence.
    finalMessages.push(choice.message);

    for (const toolCall of choice.message.tool_calls) {
      const funcName = toolCall.function.name;
      const args = JSON.parse(toolCall.function.arguments);
      let toolResult = '';
      if (funcName === 'calculate') {
        try {
          toolResult = eval(args.expression).toString(); // DANGER: in prod use mathjs
        } catch {
          toolResult = 'Calculation error';
        }
      }
      finalMessages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: toolResult,
      });
    }

    // Second pass: send the tool results back for the final natural-language answer.
    const finalRes = await fetch('https://api.fireworks.ai/inference/v1/chat/completions', {
      method: 'POST',
      headers: { 'Authorization': `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, messages: finalMessages }),
    });
    const finalData = await finalRes.json();
    return NextResponse.json({ content: finalData.choices[0].message.content });
  }

  return NextResponse.json({ content: choice.message.content });
}
Adds function calling for simple agents (here, a calculator). The first call lets the model request tools; the route then executes every tool call, appends the results as role: 'tool' messages (after the assistant message that requested them), and makes a second call for the final answer. Pitfalls: eval() is insecure (swap in mathjs in production, as sketched below), and if you extend this into a loop of repeated tool rounds, cap the depth to avoid runaway chains.
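A drop-in replacement for the eval() line, using the evaluate function from mathjs (after npm install mathjs); the safeCalculate wrapper name is illustrative:
import { evaluate } from 'mathjs';

// Safely evaluate an arithmetic expression coming from a tool call.
// mathjs parses math expressions only, so arbitrary JavaScript cannot run.
function safeCalculate(expression: string): string {
  try {
    return String(evaluate(expression));
  } catch {
    return 'Calculation error';
  }
}

Inside the tool loop, toolResult = safeCalculate(args.expression); replaces the eval-based branch.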
Run and Test the App
Run npm run dev and visit /chat (assuming the component above lives at app/chat/page.tsx). Test streaming, then tools: "Calculate 15 * 3 + 7". Check the server logs for debugging. Deploy to Vercel with vercel --prod, and add FIREWORKS_API_KEY and the other variables in the project's Environment Variables settings (.env.local is never uploaded).
Best Practices
- Choose the right model: Llama-3.1-8B for latency, 70B for accuracy (test via Fireworks interface).
- Rate limiting: Use Upstash Redis or Next.js middleware (e.g., max 60 req/min); see the middleware sketch after this list.
- Caching: Cache system prompts with Vercel KV; reuse long contexts.
- Security: Validate inputs with Zod, secrets in env, no client-side keys.
- Monitoring: Integrate OpenTelemetry for latency tracing (Fireworks metrics dashboard).
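As referenced in the rate-limiting item, here is a minimal sketch of per-IP limiting in Next.js middleware. The in-memory Map only survives within a single long-lived instance, so treat it as a local-dev illustration; on Vercel's serverless/Edge runtime you would back it with Upstash Redis or a similar shared store:
// middleware.ts — naive per-IP rate limit (illustrative; use a shared store in production)
import { NextRequest, NextResponse } from 'next/server';

const WINDOW_MS = 60_000; // 1 minute
const MAX_REQUESTS = 60;  // per IP per window
const hits = new Map<string, { count: number; windowStart: number }>();

export function middleware(req: NextRequest) {
  const ip = req.headers.get('x-forwarded-for') ?? 'unknown';
  const now = Date.now();
  const entry = hits.get(ip);

  if (!entry || now - entry.windowStart > WINDOW_MS) {
    hits.set(ip, { count: 1, windowStart: now });
    return NextResponse.next();
  }

  entry.count += 1;
  if (entry.count > MAX_REQUESTS) {
    return NextResponse.json({ error: 'Too many requests' }, { status: 429 });
  }
  return NextResponse.next();
}

// Only guard the API routes, not static assets or pages.
export const config = { matcher: ['/api/:path*'] };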
Common Errors to Avoid
- Forgetting stream: true: batched responses and a slow UX.
- No null delta handling: the UI chokes on chunks that carry no content.
- Ignoring rate limits: 429 errors; implement exponential backoff retries (a sketch follows this list).
- Non-instruct model: Incoherent responses; stick to *-Instruct variants.
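As mentioned in the rate-limit item, a small wrapper around fetch can retry 429 (and transient 5xx) responses with exponential backoff. A minimal sketch; fetchWithBackoff and lib/retry.ts are illustrative names:
// lib/retry.ts — retry 429/5xx responses with exponential backoff (illustrative helper)
export async function fetchWithBackoff(
  url: string,
  init: RequestInit,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt >= maxRetries) return res;

    // 500ms, 1s, 2s, ... plus jitter so concurrent clients don't retry in lockstep
    const delay = 500 * 2 ** attempt + Math.random() * 250;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}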
Next Steps
- Serverless fine-tuning on Fireworks (see the official docs).
- RAG with Fireworks + Pinecone.
- Multimodal (vision) with Llava-1.6.