
How to Implement Agentic RAG in 2026


Introduction

Agentic RAG (Retrieval-Augmented Generation with agents) revolutionizes generative AI question-answering systems. Unlike classic RAG, where retrieval is always triggered, agents intelligently decide when and how to fetch information: query routing, multi-tool selection, fallback to pure generation, or web search.

Why adopt it in 2026? LLMs like GPT-4o or Llama 3 already push past static-knowledge limits, and Agentic RAG handles complex queries (multi-hop, ambiguous) markedly better, as measured with evaluation frameworks such as RAGAS. Picture an assistant that first asks itself "Is this a factual question?" and retrieves only when needed, saving tokens and cost.

This intermediate tutorial guides you through a complete system: FAISS vectorstore, LangChain tools, ReAct agent. The result is an autonomous agent over your documents. Estimated time: 30 min. Ready to supercharge your AI apps?

Prerequisites

  • Python 3.11+
  • OpenAI API key (or Grok/HuggingFace)
  • Basic knowledge of LangChain and embeddings
  • pip installed
  • Virtual environment recommended (python -m venv env)

Install Dependencies

terminal
python -m venv agentic-rag-env
source agentic-rag-env/bin/activate  # Linux/Mac
# agentic-rag-env\Scripts\activate  # Windows

pip install langchain langchain-openai langchain-community faiss-cpu
pip install langchain-experimental langchainhub tiktoken

export OPENAI_API_KEY="your-api-key-here"

python --version  # Check 3.11+

This script creates an isolated virtual environment and installs LangChain with OpenAI support, FAISS for a fast local vectorstore, and the agent tooling. Set OPENAI_API_KEY so the embedding and LLM calls can authenticate. Installing inside the virtual environment avoids conflicts with globally installed packages.

Understanding the Foundations: RAG vs Agentic RAG

Before the agent, let's recap the flow:

  • Classic RAG: Embed > Retrieve > Augment prompt > Generate. Problem: unnecessary retrieval on creative queries.
  • Agentic RAG: A ReAct agent (Reason + Act) inspects the query, calls tools (retriever, web search, calculator), or generates directly.

Analogy: like an expert librarian who decides whether to search the shelves or answer from memory. Benefits: measurable accuracy gains on hybrid-retrieval benchmarks, plus better scalability.
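The "retrieve or generate" decision can be sketched without any framework. The toy router below is a hypothetical heuristic standing in for the LLM's reasoning step, just to make the routing idea concrete:

```python
# Toy sketch of the agentic routing decision (hypothetical heuristic;
# a real agent delegates this classification to the LLM).

def needs_retrieval(query: str) -> bool:
    """Crude stand-in for the agent's 'is this factual?' reasoning."""
    factual_markers = ("who", "what", "when", "where", "capital", "date")
    return any(marker in query.lower() for marker in factual_markers)

def answer(query: str) -> str:
    if needs_retrieval(query):
        return f"[retrieve] fetching documents for: {query}"
    return f"[generate] answering from parametric memory: {query}"

print(answer("What is the capital of France?"))  # routed to retrieval
print(answer("Write me a haiku about spring"))   # answered directly
```

The real system replaces `needs_retrieval` with the LLM's own reasoning, but the control flow is the same.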

Next step: Prepare the vectorstore with your docs.

Prepare FAISS Vectorstore

setup_vectorstore.py
from langchain_core.documents import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
import os

os.environ["OPENAI_API_KEY"] = "your-api-key-here"

# Example documents (replace with your PDFs/web pages)
docs_raw = """
The capital of France is Paris. Paris has the Eiffel Tower.
Generative AI has exploded since 2023 with GPT-4.
Agentic RAG uses agents to optimize retrieval.
""".split("\n")

documents = [Document(page_content=doc) for doc in docs_raw if doc.strip()]

# Splitter: 500 char chunks, 50 overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(documents)

# OpenAI Embeddings + FAISS
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(docs, embeddings)

# Save locally
vectorstore.save_local("faiss_index")
print("Vectorstore ready:", len(vectorstore.index_to_docstore_id))

This code wraps the example strings as Documents, splits them into chunks on natural separators (avoiding cuts mid-word), embeds them with text-embedding-3-small (low cost, good performance), and creates a persistent local FAISS index. Replace docs_raw with a real loader such as PyPDFLoader for your own files. Pitfall: omitting overlap loses context at chunk boundaries; check len(docs) to validate the split.
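To see why overlap preserves context, here is a standalone sketch in plain Python. It is a fixed-offset simplification, not LangChain's separator-aware RecursiveCharacterTextSplitter:

```python
# Standalone illustration of chunk overlap (fixed offsets; LangChain's
# splitter additionally respects separators like newlines and spaces).

def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "Agentic RAG uses agents to decide when retrieval is necessary."
chunks = split_with_overlap(text, chunk_size=30, overlap=10)
for c in chunks:
    print(repr(c))
# Each chunk repeats the last 10 characters of the previous one, so a
# phrase cut at a boundary still appears intact in at least one chunk.
```

With overlap=0, a sentence sliced mid-phrase would exist in no chunk as a whole, which is exactly the recall loss the pitfall above describes.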

Create Retriever and Tools

Retriever: Interface for top-k relevant docs.

In Agentic RAG, wrap it as a tool: create_retriever_tool so the agent can call it like retrieve_documents(query).

The agent will also have a pure LLM fallback tool. Flow: Query → Agent reasons → Tool? → Observe → Repeat → Final Answer.
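That Reason/Act/Observe cycle can be sketched in plain Python. The `fake_llm` and tool below are hypothetical stand-ins; LangChain's AgentExecutor implements this same loop internally:

```python
# Schematic ReAct loop with hypothetical stand-ins for the LLM and tools.

def fake_llm(query: str, observations: list[str]) -> dict:
    """Stands in for the LLM's reasoning step."""
    if not observations:
        return {"action": "search_documents", "input": query}
    return {"action": "final_answer", "output": f"Based on: {observations[0]}"}

def search_documents(query: str) -> str:
    """Stands in for the retriever tool."""
    return "doc: Paris is the capital of France"

tools = {"search_documents": search_documents}

def run_agent(query: str, max_iterations: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_iterations):
        decision = fake_llm(query, observations)        # Reason
        if decision["action"] == "final_answer":
            return decision["output"]                   # Final Answer
        tool = tools[decision["action"]]                # Act
        observations.append(tool(decision["input"]))    # Observe
    return "Stopped: iteration limit reached"

print(run_agent("What is the capital of France?"))
```

Note the iteration cap: without it a confused agent can loop forever, which is why the real executor later sets max_iterations.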

Configure Retriever and Tools

tools_retriever.py
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.tools.retriever import create_retriever_tool

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Load vectorstore
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Retriever tool
retriever_tool = create_retriever_tool(
    retriever,
    "search_documents",
    "Searches the document base to answer factual questions about AI and France. Always use it for specific facts.",
)

print(retriever_tool.invoke("What is the capital of France?"))
# Output: relevant documents

Loads the persistent vectorstore (allow_dangerous_deserialization is acceptable here because the pickle was created locally), builds a top-3 retriever, and wraps it in a descriptive tool to guide the agent. The description is key: the agent relies on it for routing. Test with invoke(); adjust k to trade precision against speed.

Assemble the ReAct Agent

The agent combines LLM + tools + ReAct prompt (hub.pull("hwchase17/react")): Reason (think), Act (tool), Observe (result), until Final Answer.

Custom prompt: Forces retrieval vs generation decision. Bind tools to the LLM.

Create the Agentic RAG Agent

agent_rag.py
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain import hub
from langchain.tools.retriever import create_retriever_tool
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAIEmbeddings

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Vectorstore & retriever (as before)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

retriever_tool = create_retriever_tool(
    retriever,
    "search_agentic_rag",
    "Useful for questions about the docs: AI, France. Input: a concise question.",
)
tools = [retriever_tool]

# ReAct prompt
prompt = hub.pull("hwchase17/react")

agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)

# Test
result = agent_executor.invoke({"input": "What is the capital of France?"})
print(result["output"])

Creates the ReAct agent with one tool (expandable to web/math), the standard LangChain Hub prompt, and an executor that runs the reasoning loop. verbose=True prints debug traces. handle_parsing_errors recovers from malformed agent output instead of crashing. Result: automatic retrieval plus synthesis.

Improvement: Multi-Tools and Advanced Routing

Add tools: TavilySearch (web), PythonREPL (calc). Router: Agent classifies query (factual/creative/math) → appropriate tool.

Custom prompt for optimization.

Multi-Tool Agent with Routing

agent_multi_tools.py
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Vectorstore
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
retriever_tool = create_retriever_tool(retriever, "rag_search", "Internal docs on AI/France.")

# Web tool (requires: pip install tavily-python)
web_search = TavilySearchResults(max_results=2)

tools = [retriever_tool, web_search]

# Custom prompt for routing (a ReAct prompt must expose {tools}, {tool_names}, {agent_scratchpad})
custom_prompt = PromptTemplate.from_template(
    """Answer the following question: {input}
You have access to these tools:
{tools}
Use rag_search for internal facts, and the web search tool for recent news.
Use the format: Thought / Action (one of [{tool_names}]) / Action Input / Observation, then Final Answer.
{agent_scratchpad}"""
)

agent = create_react_agent(llm, tools, custom_prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)

# Tests
print(agent_executor.invoke({"input": "AI events in 2026?"})["output"])  # Web
print(agent_executor.invoke({"input": "Eiffel Tower?"})["output"])  # RAG

Adds Tavily web search (free limited API), custom prompt for explicit routing. Agent decides: internal→RAG, external→web. Extend to 5+ tools. Pitfall: Too many tools → confusion; limit with precise descriptions.

Complete Agentic RAG Execution Script

run_agentic_rag.py
import os

from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain import hub
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.vectorstores import FAISS

# Config
os.environ["OPENAI_API_KEY"] = "your-api-key-here"
os.environ["TAVILY_API_KEY"] = "your-tavily-key"  # Free tier: tavily.com

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Ensure vectorstore exists (run setup_vectorstore.py first)
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
retriever_tool = create_retriever_tool(retriever, "rag_search", "Searches internal docs on AI and geography.")
web_search = TavilySearchResults(max_results=2)
tools = [retriever_tool, web_search]

prompt = hub.pull("hwchase17/react")
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True, max_iterations=5)

# Simple interface
queries = [
    "What is the capital of France?",
    "Best practices for Agentic RAG?",
    "2+2?"  # No tool needed, pure LLM
]
for q in queries:
    result = agent_executor.invoke({"input": q})
    print(f"Q: {q}\nA: {result['output']}\n---")

All-in-one script: it loads everything, handles multiple queries, and max_iterations=5 prevents infinite loops. Add TAVILY_API_KEY for web search. Copy, paste, and run: the verbose traces show the agent's reasoning. Scale up with Streamlit for a UI.

Best Practices

  • Prompt engineering: Precise tool descriptions + few-shot examples for 95%+ routing.
  • Hybrid retrieval: Combine keyword search (BM25) with vector search via EnsembleRetriever.
  • Evaluation: RAGAS or DeepEval for metrics (faithfulness, context_recall).
  • Security: PromptGuard against injections; rate-limit tools.
  • Scaling: Migrate FAISS to Pinecone/Qdrant; hierarchical agents for >1M docs.
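The hybrid-retrieval practice above can be illustrated with a plain-Python sketch of reciprocal rank fusion, the kind of rank-merging strategy used by LangChain's EnsembleRetriever (simplified and unweighted here):

```python
# Sketch of reciprocal rank fusion (RRF): merge several ranked result
# lists into one. Simplified, unweighted version for illustration.

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each doc by sum of 1/(k + rank + 1) across all rankings."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["doc_a", "doc_c", "doc_b"]    # keyword (BM25) ranking
vector_results = ["doc_b", "doc_a", "doc_d"]  # embedding ranking
print(reciprocal_rank_fusion([bm25_results, vector_results]))
# doc_a wins: it appears near the top of both lists.
```

Documents ranked well by both retrievers rise to the top, which is why hybrid retrieval recovers keyword-exact matches that pure vector search can miss.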

Common Errors to Avoid

  • No max_iterations: The agent can loop indefinitely; set a limit (e.g. 5) or a timeout.
  • Embeddings mismatch: Use the same model for indexing and querying, or recall collapses.
  • Leaving verbose=True in prod: Exposes sensitive logs; disable it in production.
  • Non-persistent vectorstore: The index is lost on every run; always save_local().
  • Temperature >0: Non-deterministic agent behavior; set it to 0 for production.

Next Steps