Introduction
Agentic RAG (Retrieval-Augmented Generation with agents) revolutionizes generative AI question-answering systems. Unlike classic RAG, where retrieval is always triggered, agents intelligently decide when and how to fetch information: query routing, multi-tool selection, fallback to pure generation, or web search.
Why adopt it in 2026? LLMs like GPT-4o or Llama 3 remain bound by their static training knowledge, and classic RAG struggles with complex queries (multi-hop, ambiguous); Agentic RAG addresses both and scores strongly on RAGAS-style metrics. Picture an assistant that first asks "Is this a factual question?" and retrieves only when needed, saving tokens and costs.
This intermediate tutorial guides you through a complete system: FAISS vectorstore, LangChain tools, ReAct agent. The result: an autonomous agent over your documents. Estimated time: 30 minutes. Ready to supercharge your AI apps?
Prerequisites
- Python 3.11+
- OpenAI API key (or Grok/HuggingFace)
- Basic knowledge of LangChain and embeddings
- pip installed
- Virtual environment recommended (python -m venv env)
Install Dependencies
python -m venv agentic-rag-env
source agentic-rag-env/bin/activate # Linux/Mac
# agentic-rag-env\Scripts\activate # Windows
pip install langchain langchain-openai langchain-community faiss-cpu
pip install langchain-experimental langchainhub tiktoken
export OPENAI_API_KEY="your-api-key-here"
python --version # Check 3.11+

This creates an isolated virtual environment and installs LangChain with OpenAI support, FAISS for a fast local vectorstore, and agent tooling. Set OPENAI_API_KEY so the embeddings and the LLM can authenticate. Install inside the environment rather than with global pip to avoid dependency conflicts.
Understanding the Foundations: RAG vs Agentic RAG
Before the agent, let's recap the flow:
- Classic RAG: Embed > Retrieve > Augment prompt > Generate. Problem: unnecessary retrieval on creative queries.
- Agentic RAG: A ReAct agent (Reason + Act) inspects the query, calls tools (retriever, web search, calculator), or generates directly.
Analogy: like an expert librarian who decides whether to search the shelves or answer from memory. Benefits: measurable accuracy gains on hybrid-RAG benchmarks, plus scalability.
Next step: Prepare the vectorstore with your docs.
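The decision step above can be sketched without an LLM. Here a keyword heuristic stands in for the model's reasoning — purely illustrative, since in a real agent the LLM itself makes this call:

```python
# Toy router: decide whether a query needs retrieval or can be answered
# directly. In Agentic RAG the LLM performs this classification; the
# keyword list below is an illustrative stand-in.
def route_query(query: str) -> str:
    factual_markers = ("who", "what", "when", "where", "capital", "date")
    if any(marker in query.lower() for marker in factual_markers):
        return "retrieve"   # fetch supporting documents first
    return "generate"       # answer directly from the LLM's weights

print(route_query("What is the capital of France?"))  # retrieve
print(route_query("Write me a poem about Paris"))     # generate
```

The real payoff of this routing is exactly what the analogy describes: creative or conversational queries skip the vectorstore entirely.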
Prepare FAISS Vectorstore
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
import os
os.environ["OPENAI_API_KEY"] = "your-api-key-here"
# Example documents (replace with your PDFs/Web)
docs_raw = """
The capital of France is Paris. Paris has the Eiffel Tower.
Generative AI has exploded since 2023 with GPT-4.
Agentic RAG uses agents to optimize retrieval.
""".split("\n")
# TextLoader reads files from disk; for in-memory strings, build Document objects directly
from langchain_core.documents import Document
documents = [Document(page_content=doc) for doc in docs_raw if doc.strip()]
# Splitter: 500 char chunks, 50 overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(documents)
# OpenAI Embeddings + FAISS
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(docs, embeddings)
# Save locally
vectorstore.save_local("faiss_index")
print("Vectorstore ready:", len(vectorstore.index_to_docstore_id))

This code loads the example docs, splits them into chunks that avoid cutting mid-word, embeds them with text-embedding-3-small (low cost, strong performance), and creates a persistent local FAISS index. Replace docs_raw with a document loader such as PyPDFLoader for real files. Pitfall: forgetting the overlap loses cross-chunk context; check len() to validate the index size.
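The splitter's size-500/overlap-50 windowing can be illustrated with a naive character chunker. RecursiveCharacterTextSplitter is smarter — it prefers paragraph and sentence boundaries — but the overlap principle is the same:

```python
# Naive fixed-size chunker: consecutive chunks share `overlap` characters,
# so a fact straddling a chunk boundary survives intact in at least one chunk.
def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

print(chunk("abcdefghij", size=4, overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With overlap=0 the string "tour Eiffel" could be split as "tour Ei" / "ffel", and neither chunk would match a query about the Eiffel Tower.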
Create Retriever and Tools
Retriever: Interface for top-k relevant docs.
In Agentic RAG, wrap it as a tool: create_retriever_tool so the agent can call it like retrieve_documents(query).
The agent will also have a pure LLM fallback tool. Flow: Query → Agent reasons → Tool? → Observe → Repeat → Final Answer.
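The Query → reason → tool → observe loop can be sketched in plain Python. Everything here is an illustrative stand-in: llm_decide replaces the model's reasoning step and the tools dict replaces real retrievers:

```python
# Toy ReAct-style loop: the "LLM" picks an action, tools produce
# observations, and the loop stops at a final answer or the iteration cap.
def agent_loop(query, tools, llm_decide, max_iterations=5):
    scratchpad = []                           # (action, input, observation) history
    for _ in range(max_iterations):
        action, arg = llm_decide(query, scratchpad)
        if action == "final_answer":
            return arg
        observation = tools[action](arg)      # e.g. retrieve_documents(query)
        scratchpad.append((action, arg, observation))
    return "Stopped: iteration limit reached"

# Toy decision function: retrieve once, then answer from the observation.
def llm_decide(query, scratchpad):
    if not scratchpad:
        return "retrieve", query
    return "final_answer", scratchpad[-1][2]

tools = {"retrieve": lambda q: "Paris is the capital of France."}
print(agent_loop("Capital of France?", tools, llm_decide))
# Paris is the capital of France.
```

AgentExecutor implements this same loop for real, with the LLM producing Thought/Action/Observation steps; max_iterations plays the same role as the cap above.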
Configure Retriever and Tools
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.tools.retriever import create_retriever_tool
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Load vectorstore
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
# Retriever tool
retriever_tool = create_retriever_tool(
retriever,
"search_documents",
"Searches the document base to answer factual questions about AI and France. Always use it for specific facts.",
)
print(retriever_tool.invoke("Capital of France?"))
# Output: relevant docs

This loads the persistent vectorstore (allow_dangerous_deserialization is required for the local pickle format), creates a top-3 retriever, and wraps it in a descriptive tool to guide the agent. The description is key: the agent relies on it for routing. Test with invoke(); adjust k to balance precision and speed.
Assemble the ReAct Agent
The agent combines LLM + tools + ReAct prompt (hub.pull("hwchase17/react")): Reason (think), Act (tool), Observe (result), until Final Answer.
Custom prompt: Forces retrieval vs generation decision. Bind tools to the LLM.
Create the Agentic RAG Agent
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain import hub
from langchain.tools.retriever import create_retriever_tool
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAIEmbeddings
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Vectorstore & retriever (as before)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
retriever_tool = create_retriever_tool(
retriever,
"search_agentic_rag",
"Useful for questions about the docs: AI, France. Input: a concise question.",
)
tools = [retriever_tool]
# ReAct prompt
prompt = hub.pull("hwchase17/react")
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)
# Test
result = agent_executor.invoke({"input": "What is the capital of France?"})
print(result["output"])

This creates the ReAct agent with one tool (expandable to web search or math), the standard LangChain Hub prompt, and an executor that runs the reasoning loop. verbose=True prints debug traces; handle_parsing_errors recovers from malformed LLM output. Result: automatic retrieval plus synthesis.
Improvement: Multi-Tools and Advanced Routing
Add tools: TavilySearch (web), PythonREPL (calc). Router: Agent classifies query (factual/creative/math) → appropriate tool.
Custom prompt for optimization.
Multi-Tool Agent with Routing
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain import hub
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Vectorstore
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
retriever_tool = create_retriever_tool(retriever, "rag_search", "Internal docs on AI/France.")
# Web tool (requires: pip install tavily-python)
web_search = TavilySearchResults(max_results=2)
tools = [retriever_tool, web_search]
# Custom prompt for routing (a ReAct prompt must expose {tools}, {tool_names} and {agent_scratchpad})
from langchain_core.prompts import PromptTemplate
custom_prompt = PromptTemplate.from_template(
"""Answer the question: {input}
Use rag_search for internal facts and tavily_search_results_json for recent news.
You have access to the following tools:
{tools}
Follow the ReAct format: Thought, then Action (one of [{tool_names}]) and Action Input, read the Observation, repeat as needed, then give the Final Answer.
{agent_scratchpad}"""
)
agent = create_react_agent(llm, tools, custom_prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)
# Tests
print(agent_executor.invoke({"input": "AI events in 2026?"})["output"]) # Web
print(agent_executor.invoke({"input": "Eiffel Tower?"})["output"]) # RAG

This adds Tavily web search (a free tier is available) and a custom prompt for explicit routing. The agent decides: internal facts go to RAG, recent external news goes to web search. You can extend this to five or more tools. Pitfall: too many tools confuse the agent; keep tool descriptions precise.
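As a safer alternative to a full PythonREPL tool for the math route, a restricted calculator works well. The arithmetic core below is plain Python; wrapping it as a LangChain tool (for example with Tool.from_function under a name like "calculator" — names here are illustrative) would let the agent route math queries to it:

```python
import ast
import operator

# Supported binary operators for the restricted calculator
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_calc(expression: str):
    """Evaluate a basic arithmetic expression without eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval").body)

print(safe_calc("2+2"))    # 4
print(safe_calc("3*7-1"))  # 20
```

Unlike a REPL tool, this cannot execute arbitrary code the LLM generates, which matters once agents run unattended.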
Complete Agentic RAG Execution Script
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain import hub
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
# Config
import os
os.environ["OPENAI_API_KEY"] = "your-api-key-here"
os.environ["TAVILY_API_KEY"] = "your-tavily-key" # Free tier: tavily.com
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Ensure vectorstore exists (run setup_vectorstore.py first)
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
retriever_tool = create_retriever_tool(retriever, "rag_search", "Search internal docs on AI and geography.")
web_search = TavilySearchResults(max_results=2)
tools = [retriever_tool, web_search]
prompt = hub.pull("hwchase17/react")
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True, max_iterations=5)
# Simple interface
queries = [
"Capital of France?",
"Best practices for Agentic RAG?",
"2+2?" # No tool needed, pure LLM
]
for q in queries:
result = agent_executor.invoke({"input": q})
print(f"Q: {q}\nA: {result['output']}\n---")

This all-in-one script loads everything, handles multiple queries, and caps the loop with max_iterations=5 to prevent infinite runs. Add TAVILY_API_KEY for web search. Copy, paste, and run: the verbose traces show the agent's reasoning. Scale it up with a Streamlit UI.
Best Practices
- Prompt engineering: Precise tool descriptions + few-shot examples for 95%+ routing.
- Hybrid retrieval: Combine keyword (BM25) + vector search via EnsembleRetriever.
- Evaluation: RAGAS or DeepEval for metrics (faithfulness, context_recall).
- Security: PromptGuard against injections; rate-limit tools.
- Scaling: Migrate FAISS to Pinecone/Qdrant; hierarchical agents for >1M docs.
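The hybrid-retrieval bullet above works because EnsembleRetriever merges the two rankings with weighted Reciprocal Rank Fusion. A minimal sketch of that fusion over toy rankings (c=60 is the common RRF default):

```python
# Weighted Reciprocal Rank Fusion: each retriever contributes
# weight / (c + rank) per document; merged docs are sorted by total score.
def rrf_merge(rankings, weights, c=60):
    scores = {}
    for ranked_docs, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranked_docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # keyword order
vector_ranking = ["doc_b", "doc_c", "doc_a"]  # embedding order
print(rrf_merge([bm25_ranking, vector_ranking], weights=[0.5, 0.5]))
# ['doc_b', 'doc_a', 'doc_c']
```

Rank fusion rewards documents that both retrievers place near the top, which is why keyword + vector hybrids beat either alone on mixed query sets.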
Common Errors to Avoid
- Missing max_iterations: the agent can loop forever → set max_iterations=5 or add a timeout.
- Embeddings mismatch: use the same embedding model for indexing and querying, or recall collapses.
- Leaving verbose=True in prod: exposes sensitive logs; disable it.
- Non-persistent vectorstore: the index is lost on every run; always save_local().
- Temperature > 0: makes the agent non-deterministic; set it to 0 for prod.
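One way to guard against the embeddings-mismatch and persistence pitfalls is to store the embedding model's name next to the saved index and verify it on load. The meta.json layout below is illustrative, not a LangChain convention:

```python
import json
import os
import tempfile

def save_index_meta(index_dir: str, model_name: str) -> None:
    """Record which embedding model built the index."""
    os.makedirs(index_dir, exist_ok=True)
    with open(os.path.join(index_dir, "meta.json"), "w") as f:
        json.dump({"embedding_model": model_name}, f)

def check_index_meta(index_dir: str, model_name: str) -> None:
    """Fail fast if the query-time model differs from the index-time model."""
    with open(os.path.join(index_dir, "meta.json")) as f:
        stored = json.load(f)["embedding_model"]
    if stored != model_name:
        raise ValueError(f"Index built with {stored}, queried with {model_name}")

with tempfile.TemporaryDirectory() as d:
    save_index_meta(d, "text-embedding-3-small")
    check_index_meta(d, "text-embedding-3-small")  # passes
    print("metadata check OK")
```

Call save_index_meta right after vectorstore.save_local("faiss_index") and check_index_meta right before FAISS.load_local, so a model swap fails loudly instead of silently degrading recall.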
Next Steps
- LangChain Agents Docs: langchain-ai/langchain
- Benchmarks: RAGAS framework
- Advanced: LlamaIndex for Agentic RAG, or AutoGen multi-agents.
- Learni Dev Training: Certified Generative AI & Agents courses (2026 sessions).