
How to Integrate the Groq API for Fast LLMs in 2026

14 min · Intermediate

Introduction

Groq runs LLM inference on its custom LPU (Language Processing Unit) hardware, which delivers high token throughput and low latency. For intermediate developers, mastering its API makes it possible to build responsive applications such as chatbots, agents, or RAG pipelines. This tutorial covers installing the official SDK, basic calls, real-time streaming, tool (function) calling, and error handling. Every example is complete and runnable, and can serve as a starting point for production code. The focus is on TypeScript for type safety and maintainability.

Prerequisites

  • Node.js 20+
  • Groq account with an API key
  • Basic knowledge of TypeScript and async/await
  • npm or pnpm

Installation and Configuration

terminal
npm install groq-sdk dotenv

.env
GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The official groq-sdk handles authentication and automatic retries. dotenv loads the API key from .env into process.env, so the key never appears in source code. Add .env to .gitignore so it is not committed either.

Client Initialization

src/groqClient.ts
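import Groq from 'groq-sdk';
import 'dotenv/config'; // loads .env into process.env

const apiKey = process.env.GROQ_API_KEY;
if (!apiKey) {
  throw new Error('GROQ_API_KEY is not set (check your .env file)');
}

// Module-level instance: Node caches modules, so every import shares this client
export const groq = new Groq({ apiKey });

Create the client once and import it everywhere. A TypeScript non-null assertion (`process.env.GROQ_API_KEY!`) would only silence the compiler and perform no runtime check. The key is therefore validated explicitly, so the app fails fast with a clear message.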
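You may also want SDK settings such as the request timeout or the retry count to come from configuration. One option is to build the client options in a pure function, which can be unit-tested without network access. This is a minimal sketch. GROQ_TIMEOUT_MS and GROQ_MAX_RETRIES are hypothetical variable names chosen for this example. They are mapped to the SDK's `timeout` (milliseconds) and `maxRetries` client options.

```typescript
// Minimal sketch: validate configuration at startup with explicit runtime checks.
// GROQ_TIMEOUT_MS and GROQ_MAX_RETRIES are hypothetical variable names.

interface GroqClientOptions {
  apiKey: string;
  timeout?: number; // request timeout in milliseconds
  maxRetries?: number; // automatic retries performed by the SDK
}

type Env = Record<string, string | undefined>;

// Returns undefined when the variable is absent, throws when it is malformed.
function parseNonNegativeInt(value: string | undefined, name: string): number | undefined {
  if (value === undefined || value.trim() === '') return undefined;
  const n = Number(value);
  if (!Number.isInteger(n) || n < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${value}"`);
  }
  return n;
}

function buildClientOptions(env: Env): GroqClientOptions {
  const apiKey = env.GROQ_API_KEY?.trim();
  if (!apiKey) {
    throw new Error('GROQ_API_KEY is missing: add it to .env or the environment');
  }
  const options: GroqClientOptions = { apiKey };
  const timeout = parseNonNegativeInt(env.GROQ_TIMEOUT_MS, 'GROQ_TIMEOUT_MS');
  const maxRetries = parseNonNegativeInt(env.GROQ_MAX_RETRIES, 'GROQ_MAX_RETRIES');
  if (timeout !== undefined) options.timeout = timeout;
  if (maxRetries !== undefined) options.maxRetries = maxRetries;
  return options;
}
```

Once exported from a module, the function can feed the constructor directly: `new Groq(buildClientOptions(process.env))`.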

Simple Completion Call

src/simpleChat.ts
import { groq } from './groqClient';

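async function simpleChat() {
  const completion = await groq.chat.completions.create({
    model: 'llama-3.3-70b-versatile',
    messages: [{ role: 'user', content: 'Explain Groq in one sentence' }],
    temperature: 0.7, // moderate randomness
    max_tokens: 200, // caps the length (and cost) of the answer
  });
  // content can be null, so fall back to an empty string
  console.log(completion.choices[0]?.message.content ?? '');
}

// Surface errors (auth, rate limits, network) instead of leaving the promise unhandled
simpleChat().catch(console.error);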

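This is a basic non-streaming call to Llama 3.3 70B: the promise resolves only once the full answer has been generated. Choose the model per use case. Smaller models are generally faster and cheaper, while larger ones are more capable.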
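`content` can be `null`, and each response reports its token counts in `usage`. A small helper can therefore extract the text and log usage in one place. This is a minimal sketch. `CompletionLike` is a hand-written, simplified type covering only the fields read here, not the SDK's own type, and the log format is an arbitrary choice.

```typescript
// Simplified local types (assumption): only the fields this helper reads.
interface CompletionLike {
  model: string;
  choices: { message: { content: string | null } }[];
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}

interface CompletionSummary {
  text: string;
  totalTokens: number | null;
}

function summarizeCompletion(completion: CompletionLike): CompletionSummary {
  // choices can be empty and content can be null, so avoid blind indexing
  const text = completion.choices[0]?.message.content ?? '';
  const usage = completion.usage;
  if (usage) {
    // One log line per call makes token usage (and therefore cost) easy to aggregate
    console.log(
      `[groq] model=${completion.model} prompt=${usage.prompt_tokens} ` +
        `completion=${usage.completion_tokens} total=${usage.total_tokens}`,
    );
  }
  return { text, totalTokens: usage?.total_tokens ?? null };
}
```

Inside simpleChat, `const { text } = summarizeCompletion(completion);` would replace the direct indexing. The SDK's response object should satisfy this structural type, but that is an assumption worth checking against your SDK version.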

Streaming Responses

src/streamingChat.ts
import { groq } from './groqClient';

async function streamChat() {
  const stream = await groq.chat.completions.create({
    model: 'llama-3.3-70b-versatile',
    messages: [{ role: 'user', content: 'Tell a short story' }],
    stream: true, // returns an async iterable of chunks instead of one response
  });

  for await (const chunk of stream) {
    // Some chunks (e.g. the last one) carry no text, so default to ''
    const content = chunk.choices[0]?.delta?.content ?? '';
    process.stdout.write(content);
  }
}

streamChat().catch(console.error);

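Streaming displays tokens as they are generated, which makes the application feel more responsive. Some chunks carry no text content, so default to an empty string rather than printing `undefined`.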
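Sometimes the complete answer is also needed after streaming, for example to store it or to append it to the conversation history. In that case the deltas can be accumulated while they are displayed. This is a minimal sketch that assumes a simplified `StreamChunkLike` shape for chunks. `fakeStream` is a hypothetical helper that lets you test the logic without calling the API.

```typescript
// Simplified chunk shape (assumption): mirrors what the streaming loop reads.
interface StreamChunkLike {
  choices: { delta?: { content?: string | null } }[];
}

// Forwards each text delta to `onText` (e.g. process.stdout.write) and
// resolves with the full concatenated answer once the stream ends.
async function collectStream(
  stream: AsyncIterable<StreamChunkLike>,
  onText: (text: string) => void = () => {},
): Promise<string> {
  const parts: string[] = [];
  for await (const chunk of stream) {
    const text = chunk.choices[0]?.delta?.content ?? '';
    if (text === '') continue; // chunks without text are skipped
    parts.push(text);
    onText(text);
  }
  return parts.join('');
}

// Hypothetical fake stream for local testing without network access.
async function* fakeStream(pieces: string[]): AsyncGenerator<StreamChunkLike> {
  for (const p of pieces) yield { choices: [{ delta: { content: p } }] };
  yield { choices: [{ delta: {} }] }; // simulates a final chunk with no content
}
```

With the real stream, this becomes `const full = await collectStream(stream, (t) => process.stdout.write(t));`.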

Advanced Function Calling

src/toolsChat.ts
import { groq } from './groqClient';

async function toolsChat() {
  const tools = [{
    type: 'function' as const,
    function: {
      name: 'getWeather',
      description: 'Get the current weather for a city',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  }];

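  const completion = await groq.chat.completions.create({
    model: 'llama-3.3-70b-versatile',
    messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
    tools,
    tool_choice: 'auto', // the model decides whether to call a tool
  });

  // Each tool call holds the function name and its arguments as a JSON string
  console.log(completion.choices[0]?.message.tool_calls);
}

toolsChat().catch(console.error);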

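Groq supports OpenAI-compatible tool calling. The model returns the function name and its arguments as a JSON string but executes nothing itself. Your code runs the function, so always parse and validate the arguments before executing it.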
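The validation step can be sketched in four moves. Parse the JSON-encoded `arguments`, check them against the schema declared in `tools`, run the function, then build the `role: 'tool'` message that goes back to the model with the matching `tool_call_id`. To stay dependency-free, this sketch uses a hand-written type guard where a schema library such as Zod would normally be used. `ToolCallLike` is a simplified type, and `getWeather` here is a hypothetical stub.

```typescript
// Simplified tool call shape (assumption): OpenAI-compatible, arguments as a JSON string.
interface ToolCallLike {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

interface WeatherArgs {
  city: string;
}

// Type guard: checks that the parsed JSON matches the schema declared in `tools`.
function isWeatherArgs(value: unknown): value is WeatherArgs {
  if (typeof value !== 'object' || value === null) return false;
  const city = (value as { city?: unknown }).city;
  return typeof city === 'string' && city.trim().length > 0;
}

// Hypothetical stub: a real app would query a weather service here.
function getWeather({ city }: WeatherArgs): string {
  return JSON.stringify({ city, forecast: 'unknown (stub)' });
}

// Executes one tool call and returns the `tool` message to append to the conversation.
function handleToolCall(call: ToolCallLike): { role: 'tool'; tool_call_id: string; content: string } {
  let content: string;
  if (call.function.name !== 'getWeather') {
    content = JSON.stringify({ error: `Unknown tool: ${call.function.name}` });
  } else {
    let parsed: unknown;
    try {
      parsed = JSON.parse(call.function.arguments);
    } catch {
      parsed = undefined; // malformed JSON from the model
    }
    content = isWeatherArgs(parsed)
      ? getWeather(parsed)
      : JSON.stringify({ error: 'Invalid arguments for getWeather' });
  }
  return { role: 'tool', tool_call_id: call.id, content };
}
```

Returning an error payload instead of throwing can let the model see what went wrong and try again with corrected arguments.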

Best Practices

  • Use current, non-deprecated models and measure latency on your own workload
  • Implement retry logic with exponential backoff
  • Cap max_tokens and log token usage (completion.usage) to track costs
  • Validate tool-call arguments with Zod
  • Cache frequent responses with Redis
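The retry item can be sketched as a generic wrapper using exponential backoff with full jitter. The delay ceiling doubles on each attempt up to a cap, and the actual wait is random within that ceiling so concurrent clients do not retry in lockstep. This assumes thrown errors carry a numeric `status` property, as groq-sdk's API errors do. The status list and default delays are illustrative choices. groq-sdk already retries some failures itself (see its `maxRetries` option), so stacking both layers multiplies the number of attempts.

```typescript
// Minimal sketch: retry with exponential backoff and full jitter.
// Assumption: errors expose a numeric `status` (HTTP code), as groq-sdk API errors do.

const RETRYABLE_STATUS = new Set([429, 500, 502, 503, 504]);

function isRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === 'number' && RETRYABLE_STATUS.has(status);
}

// Delay before retry number `attempt` (0-based): random in [0, min(maxMs, baseMs * 2^attempt)).
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000, random = Math.random): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 4, baseMs = 500): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Give up immediately on non-retryable errors or after the last attempt
      if (!isRetryable(err) || attempt === maxAttempts - 1) break;
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
  throw lastError;
}
```

For example: `withRetry(() => groq.chat.completions.create(params))`, where `params` is a request object like those in the earlier examples.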

Common Errors to Avoid

  • Forgetting to handle rate limits (HTTP 429) and timeouts
  • Leaving tool_calls untyped in TypeScript
  • Sending overly long prompts or chat histories without truncation
  • Ignoring errors while parsing streamed responses
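For truncation, a simple approach keeps the system messages plus only as many recent turns as fit within a budget. This is a minimal sketch. It uses character counts as a rough stand-in for tokens, where a tokenizer would give an accurate budget, and a simplified `ChatMessage` type.

```typescript
// Simplified message shape (assumption), matching the tutorial's messages.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Keeps all system messages (placed first) plus the most recent turns fitting in `maxChars`.
// Characters only approximate tokens; use a tokenizer for a precise budget.
function truncateHistory(messages: ChatMessage[], maxChars: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  let budget = maxChars - system.reduce((sum, m) => sum + m.content.length, 0);
  const kept: ChatMessage[] = [];
  // Walk backwards so the newest turns are kept first; stop at the first that does not fit
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === 'system') continue;
    if (m.content.length > budget) break;
    budget -= m.content.length;
    kept.unshift(m);
  }
  return [...system, ...kept];
}
```

If the newest message alone exceeds the budget, no turn is kept. Callers should therefore check that the result still ends with the user's latest message.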

Going Further

Explore our advanced training on LLM agents and inference optimization: https://learni-group.com/formations