How to Implement Advanced Agentic RAG in 2026


Introduction

Agentic RAG (Retrieval-Augmented Generation with agents) goes beyond classic RAG by incorporating autonomous LLM agents that dynamically decide on retrieval, route to specialized tools, and manage multi-step queries. Unlike linear RAG, where a simple embed + retrieve works, Agentic RAG shines on complex tasks like multi-source financial analysis or iterative diagnostics.

Why is it essential in 2026? Agents like these let LLMs such as GPT-4o or Llama 3 curb hallucinations, reportedly boosting accuracy by 30-50% on benchmarks like GAIA. This advanced tutorial walks you through building a complete system with LangChain, ChromaDB as the vector store, and custom tools (retriever, calculator, simulated web search). The result: an agent that reasons, retrieves, and acts like an expert. Ready to bookmark this actionable guide?

Prerequisites

  • Python 3.11+
  • OpenAI API key (or an alternative provider such as Grok, or a local Hugging Face model)
  • Advanced knowledge of LangChain, embeddings, and vector stores
  • Minimum 4 GB RAM for local testing
  • pip install langchain chromadb openai langchain-openai langchain-community langchain-experimental

Installing Dependencies

terminal
pip install langchain langchain-openai langchain-community langchain-experimental chromadb pypdf tiktoken

mkdir agentic_rag_project
cd agentic_rag_project

# Create a .env file for the keys
cat > .env << EOF
OPENAI_API_KEY=sk-your-key-here
EOF

This command installs LangChain and its agent extensions, the OpenAI integration, ChromaDB for persistent vector stores, and PyPDF for loading documents. The .env file keeps your API key out of source control; always load it with python-dotenv so it never leaks into code or logs in production.

Preparing Documents and Vector Store

Before building the agent, create a robust vector store. Think of it as an indexed library: the agent pulls from it intelligently. We'll use OpenAIEmbeddings for quality and Chroma for persistence.

Creating the Vector Store with Documents

vector_store.py
import os
from dotenv import load_dotenv
import chromadb
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.docstore.document import Document

load_dotenv()  # reads OPENAI_API_KEY from .env into the environment

# Simulated documents; replace with your real docs (e.g. via PyPDFLoader)
fake_docs = [
    Document(page_content="Agentic RAG uses agents to dynamically route queries to retrieval or tools.", metadata={"source": "rag101.pdf"}),
    Document(page_content="Example: financial query -> retrieve SEC docs + compute ratios.", metadata={"source": "finance.pdf"}),
    Document(page_content="Multi-hop: question 1 retrieves, question 2 refines with a web tool.", metadata={"source": "advanced.pdf"})
]

# Split the docs (chunk_size=1000 for balance of precision/speed)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(fake_docs)

# Embeddings and vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
# With persist_directory set, Chroma (chromadb >= 0.4) persists automatically;
# on older versions, call vectorstore.persist() explicitly
print("Vector store created and persisted.")

This script loads and splits documents (simulated here; use PyPDFLoader for real PDFs), applies OpenAI embeddings, and persists to ChromaDB. The 200-character overlap (the splitter counts characters, not tokens, by default) prevents context loss at chunk boundaries; adjust chunk_size to your docs (too small = fragmented noise, too large = LLM context overload).
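To see why the overlap matters, here is a minimal, framework-free sketch of sliding-window chunking. It is illustrative only: RecursiveCharacterTextSplitter is more sophisticated, recursively splitting on separators like paragraphs and sentences before falling back to fixed windows.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive sliding-window splitter: each chunk repeats the last
    `overlap` characters of the previous one, so a sentence cut at a
    boundary still appears whole in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap  # advance by chunk_size minus overlap
    return chunks

chunks = chunk_text("x" * 2500, chunk_size=1000, overlap=200)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

Each window advances by 800 characters, so every boundary region is covered twice; that redundancy is the price paid for not losing cross-boundary context.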

Defining the Retriever as an Agent Tool

The agent treats the retriever as a bindable tool. Analogy: a detective (agent) who chooses to query the database (retriever) or other tools. Use create_retriever_tool to integrate it seamlessly.

Retriever Tool and Complementary Tools

tools.py
from langchain.tools.retriever import create_retriever_tool
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.tools import DuckDuckGoSearchRun

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Retriever tool from the persisted vector store; the embedding model
# must match the one used at index time (text-embedding-3-small)
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-small")
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
rag_tool = create_retriever_tool(
    retriever,
    "rag_search",
    "Semantic search over the Agentic RAG knowledge base. Useful for precise facts about RAG and agents."
)

@tool
def calculator(expression: str) -> str:
    """Evaluates arithmetic expressions."""
    try:
        # eval is unsafe on arbitrary input; stripping builtins is a minimal guard
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception:
        return "Calculation error."

# Web search fallback (DuckDuckGoSearchRun is already a LangChain tool)
search = DuckDuckGoSearchRun()

print("Tools created: rag_search, calculator, search.")

We create a retriever tool that returns the k=4 most relevant chunks (note that the embedding model must match the one used to build the index), a calculator for math reasoning, and DuckDuckGo as a web fallback. The agent routes automatically: RAG for internal knowledge, the other tools for external lookups and calculations.
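The routing decision itself is made by the LLM reading the tool descriptions, but conceptually it resembles this toy dispatcher (illustrative keyword rules, not part of LangChain):

```python
def route(query: str) -> str:
    """Toy router: the real agent lets the LLM pick a tool from the
    descriptions, but the dispatch it performs is roughly this."""
    q = query.lower()
    # Math operators or explicit calculation requests -> calculator
    if any(op in q for op in ("+", "-", "*", "/", "%", "calculate")):
        return "calculator"
    # Domain keywords covered by the knowledge base -> internal RAG
    if any(kw in q for kw in ("rag", "agent", "retrieval", "embedding")):
        return "rag_search"
    return "search"  # everything else falls back to web search

print(route("Explain Agentic RAG"))      # rag_search
print(route("calculate 15% of 250000"))  # calculator
print(route("weather in Paris today"))   # search
```

The real advantage of LLM-driven routing over rules like these is that it generalizes to phrasings no keyword list anticipates, at the cost of an extra model call.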

Assembling the Basic Agent

The main agent uses create_react_agent for ReAct (Reason + Act): it observes, thinks, and acts in a loop until the query is resolved. A custom prompt guides it to prioritize RAG.

Basic ReAct Agent with RAG

basic_agent.py
from langchain.agents import create_react_agent, AgentExecutor
from langchain_core.prompts import PromptTemplate

from tools import llm, rag_tool, calculator, search  # Import from tools.py

tools = [rag_tool, calculator, search]

# ReAct prompt tuned for Agentic RAG; create_react_agent requires the
# {tools}, {tool_names}, {input} and {agent_scratchpad} variables
prompt = PromptTemplate.from_template("""
You are an expert RAG agent. For any query about RAG/agents, use rag_search first.
If a calculation is needed, use calculator. For external information, use search.

You have access to the following tools:
{tools}

Use the following format:

Question: the input question
Thought: reason step by step about what to do
Action: one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (Thought/Action/Observation can repeat)
Thought: I now know the final answer
Final Answer: the answer to the original question

Question: {input}
Thought: {agent_scratchpad}
""")

agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)

# Test
result = agent_executor.invoke({"input": "Explain Agentic RAG and calculate 15% of 250000."})
print(result["output"])

The ReAct agent is created with a custom prompt prioritizing RAG. AgentExecutor handles loop execution with verbose logging for debugging. handle_parsing_errors=True tolerates LLM parsing issues; the test combines retrieval + calculation to validate routing.
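Under the hood, AgentExecutor runs a loop along these lines. This is a simplified sketch with a scripted stand-in for the LLM, not LangChain's actual implementation:

```python
import re

def react_loop(llm_step, tools, question: str, max_steps: int = 5) -> str:
    """Minimal ReAct loop: ask the model, parse Action / Final Answer,
    run the tool, feed the observation back, and repeat."""
    scratchpad = ""
    for _ in range(max_steps):
        reply = llm_step(question, scratchpad)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        action = re.search(r"Action: (\w+)\nAction Input: (.+)", reply)
        if not action:
            return "Parsing error"
        name, arg = action.group(1), action.group(2)
        observation = tools[name](arg)  # act, then observe
        scratchpad += f"{reply}\nObservation: {observation}\n"
    return "Max steps reached"

# Scripted model: first call requests the calculator, second concludes
replies = iter([
    "Thought: need math\nAction: calculator\nAction Input: 0.15 * 250000",
    "Thought: done\nFinal Answer: 37500.0",
])
toy_tools = {"calculator": lambda e: str(eval(e, {"__builtins__": {}}, {}))}
print(react_loop(lambda q, s: next(replies), toy_tools, "15% of 250k?"))  # 37500.0
```

The scratchpad is the key idea: every Thought/Action/Observation triple is appended and re-sent, so the model reasons over its own history on each step.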

Evolving to Multi-Hop and Reflective Agent

For advanced use: multi-hop (chained queries) and self-reflection (the agent critiques its own retrievals and retries). LangChain ships no turnkey Reflexion agent, so we build the pattern by hand around the ReAct executor: answer, self-critique, retry with the critique in context.

Advanced Agent with Reflexivity

advanced_agent.py
from langchain_core.prompts import ChatPromptTemplate

from basic_agent import agent_executor  # ReAct executor from basic_agent.py
from tools import llm

# Critic prompt: judge whether an answer is relevant and complete
critique_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a reflective critic. Review the answer: is it relevant? "
               "Is it complete? Reply 'OK' if satisfactory, otherwise give a short critique."),
    ("human", "Question: {question}\nAnswer: {answer}")
])

def reflexive_run(question: str, max_reflections: int = 2) -> str:
    """Reflexion-style loop: answer, self-critique, retry with the critique."""
    current_input = question
    answer = ""
    for _ in range(max_reflections + 1):  # cap iterations to avoid runaway costs
        answer = agent_executor.invoke({"input": current_input})["output"]
        critique = llm.invoke(
            critique_prompt.format_messages(question=question, answer=answer)
        ).content
        if critique.strip().startswith("OK"):
            break
        # Re-ask with the previous attempt and critique so the agent can improve
        current_input = f"{question}\n\nPrevious attempt: {answer}\nCritique: {critique}"
    return answer

# Execution
result = reflexive_run(
    "Multi-step analysis: what is the impact of Agentic RAG on LLM costs? "
    "Estimate for 1M queries (embedding cost = $0.0001 / 1k tokens)."
)
print(result)

This reflection loop adds self-critique: after each answer, a critic LLM judges relevance and completeness, and the agent retries with the critique in context. max_reflections=2 caps the extra cost, which keeps the pattern practical for analytical queries combining RAG, calculation, and search.
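The cost estimate that the test query asks for can be sanity-checked by hand. Assuming an average of 500 tokens embedded per query (a made-up figure for illustration; only the $0.0001/1k price comes from the query):

```python
# Hypothetical figures: 1M queries, 500 tokens embedded per query on average
queries = 1_000_000
tokens_per_query = 500          # assumption, not from the tutorial
price_per_1k_tokens = 0.0001    # $ per 1k tokens (given in the query)

total_cost = queries * tokens_per_query / 1_000 * price_per_1k_tokens
print(f"Embedding cost: ${total_cost:.2f}")  # Embedding cost: $50.00
```

A correct agent run should produce the same order of magnitude once it states its own token assumption, which is a quick way to check whether the calculator tool was actually used.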

Deployment and Monitoring

Make it production-ready with LangServe for APIs or Streamlit for UI. Monitor with LangSmith for agent tracing.

Full Test Script and Streamlit Deployment

app.py
import streamlit as st
import os
from dotenv import load_dotenv
from basic_agent import agent_executor  # Or advanced_agent

load_dotenv()

st.title("🧠 Agentic RAG Demo")
query = st.text_input("Ask your complex question:")

if st.button("Run") and query:
    with st.spinner("The agent is reasoning..."):
        result = agent_executor.invoke({"input": query})
    st.success("Answer:")
    st.write(result["output"])

# Run: streamlit run app.py

A Streamlit UI for interactive testing: it imports the agent and wraps execution in a spinner. Deploy on Streamlit Cloud for sharing, and integrate LangSmith (via LANGCHAIN_API_KEY) for detailed traces in production.
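LangSmith tracing is enabled through environment variables, with no code changes. A minimal sketch of the additions to the same .env file (the key value is a placeholder; the project name is an arbitrary label of your choosing):

```shell
# Enable LangSmith tracing for every agent run
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your-langsmith-key-here
LANGCHAIN_PROJECT=agentic-rag-demo   # optional: groups traces by project
```

Once set, every AgentExecutor invocation is traced automatically, including each tool call and intermediate LLM step.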

Best Practices

  • Prioritize RAG early: Prompt engineering to force internal retrieval before external tools (cuts costs).
  • Limit iterations: set max_iterations=5-8 on AgentExecutor, and stop early once the agent reaches a confident final answer.
  • Hybrid search: Combine BM25 + semantic in retriever for +20% recall.
  • Tool caching: Redis to memoize identical retrievals.
  • Evaluate with RAGAS: Metrics like faithfulness/context_recall for iteration.
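The hybrid-search bullet can be sketched framework-free with reciprocal rank fusion (RRF), a common way to merge BM25 and semantic rankings. Toy data below; in LangChain, EnsembleRetriever implements this kind of weighted fusion over real retrievers:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists: each doc scores sum(1 / (k + rank)).
    k=60 is the constant commonly used with RRF."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc_a", "doc_c", "doc_b"]      # keyword relevance
semantic_ranking = ["doc_b", "doc_a", "doc_d"]  # embedding similarity
print(reciprocal_rank_fusion([bm25_ranking, semantic_ranking]))
```

Documents ranked highly by both signals float to the top, which is why hybrid retrieval tends to beat either method alone on recall.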

Common Errors to Avoid

  • Forgetting persist_directory: without it, ChromaDB keeps the index in memory and loses it on every run; always persist.
  • High temperature: >0.2 causes non-determinism; set to 0 for production.
  • No parsing error handling: Buggy LLMs crash; add handle_parsing_errors=True.
  • Chunks too large: >2000 tokens overload LLM context; strict split + overlap.

Next Steps

Dive into our advanced AI training at Learni on LangGraph for multi-agent systems. Resources: the LangChain Agents docs, the RAGAS framework, and the 'ReAct' and 'Reflexion' papers. Contribute Agentic RAG benchmarks on GitHub.