Introduction
The Mistral API, developed by Mistral AI, provides a powerful interface to cutting-edge language models such as Mistral Nemo and Mistral Large. In 2026, it's a popular choice for developers seeking a performant alternative to OpenAI built on open-weight models, with competitive costs and low latency. This tutorial walks you through integrating it into a Node.js app, from your first API call to real-time streaming.
Why Mistral? It excels in multilingual tasks (including French), supports complex reasoning, and offers embeddings for semantic search. Imagine building a conversational chatbot or code assistant: in just a few lines, you can generate smart responses. We cover the basics for beginners with 100% working code. By the end, you'll be ready to scale your AI projects seamlessly.
Prerequisites
- Node.js 20+ installed
- A free account on console.mistral.ai to get an API key
- Basic JavaScript knowledge (async/await)
- A code editor like VS Code
Initialize the project and install dependencies
mkdir mistral-api-demo
cd mistral-api-demo
npm init -y
npm install @mistralai/mistralai dotenv
npm install -D nodemon
These commands create an empty Node.js project, install the official Mistral SDK for simplified API calls, and dotenv for secure API key management. nodemon is optional for development (auto-restarts). Since the examples below use ES module syntax (import), also add "type": "module" to the generated package.json. Run them in your terminal for a ready setup in 30 seconds.
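If you opted for nodemon, npm scripts save some typing. Here is a minimal package.json sketch after these edits (the script names and the chat-simple.js entry file are illustrative choices, not requirements):
{
  "name": "mistral-api-demo",
  "type": "module",
  "scripts": {
    "start": "node chat-simple.js",
    "dev": "nodemon chat-simple.js"
  }
}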
Configure the API key
Create a .env file in the root directory:
MISTRAL_API_KEY=your_api_key_here
Replace your_api_key_here with the key from console.mistral.ai. Add .env to .gitignore so it is never committed. Note that the SDK does not read .env on its own: loading dotenv at the top of each script (import 'dotenv/config') is what puts the key on process.env.
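To fail fast when the key is missing, you can add a quick guard before creating the client (a minimal sketch; the error message is illustrative):
import 'dotenv/config';

// Abort early if the key was not loaded (missing .env file, typo in the variable name...).
if (!process.env.MISTRAL_API_KEY) {
  throw new Error('MISTRAL_API_KEY is not set. Check your .env file.');
}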
First call: Simple chat completion
import 'dotenv/config';
import { Mistral } from '@mistralai/mistralai';

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

async function chatSimple() {
  const response = await client.chat.complete({
    model: 'mistral-small-latest',
    messages: [
      {
        role: 'user',
        content: "Explique-moi l'IA en 3 phrases simples."
      }
    ]
  });
  console.log(response.choices[0].message.content);
}

chatSimple().catch(console.error);
This complete script sends a user message to the fast, cost-effective 'mistral-small-latest' model via client.chat.complete and prints the response. Use async/await to handle the promise. Run it with node chat-simple.js. Common pitfall: make sure dotenv is loaded before the client is created, otherwise the API key is undefined.
Add a system prompt
import 'dotenv/config';
import { Mistral } from '@mistralai/mistralai';

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

async function chatAvecSystem() {
  const response = await client.chat.complete({
    model: 'mistral-small-latest',
    messages: [
      {
        role: 'system',
        content: 'Tu es un assistant expert en programmation. Réponds de manière concise et technique.'
      },
      {
        role: 'user',
        content: 'Comment optimiser une boucle for en JavaScript ?'
      }
    ],
    maxTokens: 200
  });
  console.log(response.choices[0].message.content);
}

chatAvecSystem().catch(console.error);
The 'system' role sets the model's behavior, like a technical coach here. Add 'maxTokens' to cap response length and control costs. Think of it as giving instructions to an employee before a task. Test with node chat-system.js.
Managing conversation history
For a realistic chatbot, pass an array of messages with alternating roles (system, user, assistant). This maintains context, improving response coherence.
Conversation with history
import 'dotenv/config';
import { Mistral } from '@mistralai/mistralai';

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

async function conversation() {
  const messages = [
    {
      role: 'system',
      content: 'Tu es un guide touristique français.'
    },
    {
      role: 'user',
      content: 'Quels sont les meilleurs sites à Paris ?'
    },
    {
      role: 'assistant',
      content: 'La Tour Eiffel, le Louvre et Notre-Dame sont incontournables.'
    },
    {
      role: 'user',
      content: 'Plus de détails sur le Louvre ?'
    }
  ];
  const response = await client.chat.complete({
    model: 'open-mistral-nemo',
    messages: messages
  });
  console.log('Assistant:', response.choices[0].message.content);
}

conversation().catch(console.error);
History simulates a real conversation: the model 'sees' the previous exchanges. Use 'open-mistral-nemo' for more power. In a real app, update 'messages' dynamically, as in the sketch below. Avoid overly long histories (over roughly 10k tokens) to stay within limits and keep costs down.
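Here is a minimal sketch of that dynamic pattern, with a hypothetical ask() helper of our own (not an SDK feature) that you call once per user turn:
import 'dotenv/config';
import { Mistral } from '@mistralai/mistralai';

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

// Shared history: the system prompt plus every exchange so far.
const messages = [
  { role: 'system', content: 'Tu es un guide touristique français.' }
];

async function ask(question) {
  // Record the user's turn, send the whole history, record the reply.
  messages.push({ role: 'user', content: question });
  const response = await client.chat.complete({
    model: 'open-mistral-nemo',
    messages
  });
  const answer = response.choices[0].message.content;
  messages.push({ role: 'assistant', content: answer });
  return answer;
}

// Each call builds on the previous ones (top-level await requires ESM).
console.log(await ask('Quels sont les meilleurs sites à Paris ?'));
console.log(await ask('Plus de détails sur le Louvre ?'));
In production, remember to trim or summarize old turns once the history grows, since every call resends the full array.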
Stream responses in real time
import 'dotenv/config';
import { Mistral } from '@mistralai/mistralai';

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

async function chatStream() {
  const stream = await client.chat.stream({
    model: 'mistral-small-latest',
    messages: [{ role: 'user', content: 'Raconte une histoire courte et effrayante.' }]
  });
  for await (const chunk of stream) {
    process.stdout.write(chunk.data.choices[0]?.delta?.content || '');
  }
  console.log('\n');
}

chatStream().catch(console.error);
Use client.chat.stream to receive tokens as they are generated, perfect for reactive UIs. Iterate over the result with a 'for await' loop; each event exposes the next text fragment under chunk.data.choices[0].delta.content. It's like watching a movie load token by token. Run node chat-stream.js and see the magic.
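To push those fragments to a browser, one common pattern is Server-Sent Events. A minimal sketch, assuming you also run npm install express (the /chat route and port 3000 are illustrative choices):
import 'dotenv/config';
import express from 'express';
import { Mistral } from '@mistralai/mistralai';

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });
const app = express();

app.get('/chat', async (req, res) => {
  // SSE headers: keep the connection open and send incremental text events.
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  const stream = await client.chat.stream({
    model: 'mistral-small-latest',
    messages: [{ role: 'user', content: req.query.q || 'Bonjour !' }]
  });
  for await (const chunk of stream) {
    const text = chunk.data.choices[0]?.delta?.content || '';
    res.write(`data: ${JSON.stringify(text)}\n\n`);
  }
  res.end();
});

app.listen(3000, () => console.log('http://localhost:3000/chat?q=...'));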
Generate embeddings
import 'dotenv/config';
import { Mistral } from '@mistralai/mistralai';

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

async function genererEmbedding() {
  const response = await client.embeddings.create({
    model: 'mistral-embed',
    inputs: ['Recherche sémantique pour IA en français.']
  });
  const embedding = response.data[0].embedding;
  console.log('Embedding (premiers 10 dims):', embedding.slice(0, 10));
  console.log('Dimension totale:', embedding.length);
}

genererEmbedding().catch(console.error);
Embeddings convert text into vectors for similarity tasks (e.g., search). 'mistral-embed' produces 1024-dimensional vectors, and 'inputs' accepts an array so you can embed several texts in one call. Great for RAG or clustering. Keep each input within the model's token limit (check the current limit in the Mistral docs).
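Once you have vectors, compare them with cosine similarity. A minimal sketch (cosineSimilarity is our own helper, not part of the SDK; the sample sentences are illustrative):
import 'dotenv/config';
import { Mistral } from '@mistralai/mistralai';

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

// Cosine similarity: close to 1 means similar meaning, close to 0 means unrelated.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const { data } = await client.embeddings.create({
  model: 'mistral-embed',
  inputs: [
    'Le Louvre est un musée à Paris.',
    'The Louvre is a museum in Paris.',
    'La recette de la tarte aux pommes.'
  ]
});
const [fr, en, autre] = data.map((d) => d.embedding);
console.log('FR vs EN (même sens):', cosineSimilarity(fr, en)); // expected: high
console.log('FR vs hors sujet:', cosineSimilarity(fr, autre)); // expected: lower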
Best practices
- Always use environment variables for the API key: never hardcode it.
- Limit tokens with maxTokens and monitor quotas in the Mistral console.
- Handle errors with try/catch and retries for rate limits (e.g., exponential backoff), as sketched after this list.
- Choose the right model: 'mistral-small-latest' for quick tests, 'mistral-large-latest' for precision.
- Cache repetitive responses with Redis to scale.
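Here is a minimal retry sketch with exponential backoff (completeWithRetry is our own wrapper; the attempt count and delays are illustrative, so tune them to your quotas):
import 'dotenv/config';
import { Mistral } from '@mistralai/mistralai';

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function completeWithRetry(params, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await client.chat.complete(params);
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      // Back off 1s, 2s, 4s... before retrying (rate limits, transient errors).
      const delay = 1000 * 2 ** attempt;
      console.warn(`Attempt ${attempt + 1} failed, retrying in ${delay} ms`);
      await sleep(delay);
    }
  }
}
Call completeWithRetry({ model, messages }) anywhere you would otherwise call client.chat.complete directly.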
Common errors to avoid
- Forgetting to load dotenv (import 'dotenv/config'): the API key is undefined and the call fails with a 401 error.
- Overly large history: exceeds the context window (up to 128k tokens depending on the model) and the response gets truncated.
- Ignoring rate limits: they vary by account tier (check the console); implement a request queue, as sketched after this list.
- Not streaming for interactive apps: high perceived latency.
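For the rate-limit point above, here is a minimal serial queue sketch (enqueue is our own helper, not an SDK feature; it runs requests one at a time):
import 'dotenv/config';
import { Mistral } from '@mistralai/mistralai';

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

// Chain each task onto the previous one so requests never overlap.
let queue = Promise.resolve();

function enqueue(task) {
  const result = queue.then(task);
  queue = result.catch(() => {}); // keep the chain alive after a failure
  return result;
}

// Usage: both requests run sequentially, not in parallel.
enqueue(() => client.chat.complete({
  model: 'mistral-small-latest',
  messages: [{ role: 'user', content: 'Première question' }]
}));
enqueue(() => client.chat.complete({
  model: 'mistral-small-latest',
  messages: [{ role: 'user', content: 'Deuxième question' }]
}));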
Next steps
Master fine-tuning and agents with our Learni trainings. Check the official Mistral docs, experiment in the Mistral playground, or explore LangChain for complex chains.