How to Implement FAISS for Vector Search in 2026

Introduction

FAISS (Facebook AI Similarity Search) is a widely used library for similarity search on dense vectors. In 2026, it remains a common building block for RAG systems, recommendation engines, and large-scale duplicate detection. This tutorial covers installation, IVF and HNSW index creation, GPU acceleration, persistence, and production practices. The code snippets build on one another (later ones reuse d, index, xb, and xq), so run them in order in a single Python session.

Prerequisites

  • Python 3.10+
  • CUDA 12.4+ and recent NVIDIA drivers (for GPU)
  • Solid knowledge of numpy and vector embeddings
  • pip and build essentials

Optimized Installation

terminal
pip install numpy faiss-gpu-cu12
python -c "import faiss; print(faiss.__version__)"

The faiss-gpu-cu12 wheel ships a FAISS build compiled against CUDA 12. Check that your NVIDIA driver supports that CUDA version before installing. On machines without a GPU, install faiss-cpu instead.
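
Before building any index, it can help to confirm which FAISS build was actually picked up. The check below is a minimal sketch: faiss_status is a hypothetical helper name, and the hasattr guard is there in case get_num_gpus is not exposed by a CPU-only build.

```python
import importlib.util


def faiss_status() -> str:
    """Report whether faiss is importable and how many GPUs it can see."""
    if importlib.util.find_spec("faiss") is None:
        return "faiss is not installed"
    import faiss

    # Guarded lookup: treat a missing get_num_gpus as "no GPU support".
    n_gpus = faiss.get_num_gpus() if hasattr(faiss, "get_num_gpus") else 0
    return f"faiss {faiss.__version__}, GPUs visible: {n_gpus}"


print(faiss_status())
```

If the report shows zero GPUs on a machine that has one, the usual suspects are a CPU-only wheel or a driver that does not match the CUDA version of the wheel.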

Creating an IVF Index

create_ivf_index.py
import faiss
import numpy as np

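d = 768  # embedding dimension
nlist = 4096  # number of IVF clusters (inverted lists)
quantizer = faiss.IndexFlatIP(d)  # coarse quantizer that assigns vectors to clusters
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
index.nprobe = 32  # number of clusters scanned per query
print("IVF index created with", index.nlist, "centroids")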

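IndexIVFFlat offers a good speed/accuracy tradeoff: vectors are stored uncompressed, but each query only scans a few clusters. With nprobe=32, a query scans 32 of the 4096 clusters. Raising nprobe improves recall at the cost of latency, so tune it against your own data.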
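
To get a feel for what a given nprobe costs, the sketch below estimates how much of the index a query touches. expected_scan is a hypothetical helper, and it assumes vectors are spread evenly across clusters, which real data rarely satisfies, so treat the result as an order of magnitude.

```python
def expected_scan(n_vectors: int, nlist: int, nprobe: int) -> tuple[float, float]:
    """Return (fraction of clusters probed, approximate vectors compared per query).

    Assumes perfectly balanced clusters; skewed data will scan more or fewer.
    """
    fraction = nprobe / nlist
    return fraction, n_vectors * fraction


fraction, scanned = expected_scan(n_vectors=500_000, nlist=4096, nprobe=32)
print(f"{fraction:.2%} of clusters, about {scanned:.0f} vectors per query")
```

Doubling nprobe roughly doubles the scanned vectors, which is why recall and latency move together.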

Training and Adding Vectors

train_add_vectors.py
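np.random.seed(42)
xb = np.random.random((500000, d)).astype('float32')
faiss.normalize_L2(xb)  # in-place L2 normalization so inner product equals cosine similarity
index.train(xb[:200000])  # k-means wants at least 39 training points per centroid
index.add(xb)
print("Vectors added:", index.ntotal)

Training on a subset of the data is generally sufficient, provided the sample is large enough for nlist. FAISS warns when k-means gets fewer than 39 points per centroid, which for nlist=4096 means about 160,000 vectors, so this example trains on 200,000 (40% of the data). Normalize vectors whenever you want METRIC_INNER_PRODUCT to behave as cosine similarity.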
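
How large the training sample should be depends on nlist rather than on a fixed percentage. The sketch below computes a range from the default k-means clustering parameters in FAISS (39 and 256 points per centroid). training_sample_range is a hypothetical helper name.

```python
MIN_POINTS_PER_CENTROID = 39   # FAISS k-means default; below this it prints a warning
MAX_POINTS_PER_CENTROID = 256  # FAISS k-means default; above this it subsamples


def training_sample_range(nlist: int) -> tuple[int, int]:
    """Return (minimum, maximum) useful training set sizes for a given nlist."""
    return nlist * MIN_POINTS_PER_CENTROID, nlist * MAX_POINTS_PER_CENTROID


low, high = training_sample_range(4096)
print(f"train on between {low:,} and {high:,} vectors")
```

Training on more than the upper bound is not harmful, since the clustering step subsamples, but it does not improve the centroids either.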

GPU Search

gpu_search.py
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, index)

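xq = np.random.random((10, d)).astype('float32')
faiss.normalize_L2(xq)  # normalize queries the same way as the indexed vectors
D, I = gpu_index.search(xq, k=5)  # D: similarity scores, I: indices of the neighbors
print("Top-5 indices:", I[0])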

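index_cpu_to_gpu copies the index into GPU memory, so its cost grows with index size and the whole index must fit in VRAM. Once loaded, batched searches can be very fast. The figure of over 100,000 QPS on a single A100 depends on batch size, nprobe, and vector dimension, so benchmark on your own workload.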
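
Before moving an index to the GPU, a rough size estimate helps decide whether it will fit. The sketch below is an approximation under stated assumptions: ivf_flat_bytes and pick_device are hypothetical helpers, the estimate counts only float32 vectors plus one 64-bit id each, and the 0.8 headroom factor is an arbitrary safety margin rather than a FAISS recommendation.

```python
GIB = 1024 ** 3


def ivf_flat_bytes(n_vectors: int, d: int, id_bytes: int = 8) -> int:
    """Rough IVFFlat payload: float32 vectors plus one id per vector.

    Ignores centroids, allocator overhead, and the temporary memory
    that FAISS GPU resources reserve.
    """
    return n_vectors * (d * 4 + id_bytes)


def pick_device(required_bytes: int, free_vram_bytes: int, headroom: float = 0.8) -> str:
    """Return "gpu" only if the index fits within a safety fraction of free VRAM."""
    return "gpu" if required_bytes <= free_vram_bytes * headroom else "cpu"


need = ivf_flat_bytes(500_000, 768)
print(f"about {need / GIB:.2f} GiB needed ->", pick_device(need, free_vram_bytes=40 * GIB))
```

When the estimate does not fit, the options are to stay on CPU or to switch to a quantized index type that stores compressed vectors.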

Index Persistence

save_load_index.py
faiss.write_index(index, "faiss_ivf.index")
index_loaded = faiss.read_index("faiss_ivf.index")
print("Index loaded:", index_loaded.ntotal)

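write_index and read_index are the standard way to serialize an index. They operate on CPU indexes, so convert a GPU index back with faiss.index_gpu_to_cpu before saving. For IVF indexes too large for RAM, read_index accepts the faiss.IO_FLAG_MMAP flag to memory-map the file, and quantized index types such as IndexIVFPQ shrink the stored vectors. IndexBinary is a separate index family for binary vectors, not a storage format.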
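
One way to combine persistence with versioning is to write to a temporary file and rename it into place, so a crash never leaves a half-written index under the final name. This is a sketch, not a FAISS feature: save_versioned and the file naming scheme are assumptions, and the writer is passed in as a callable so the helper stays independent of FAISS.

```python
import os
from pathlib import Path


def save_versioned(write_fn, directory, version: int) -> Path:
    """Persist an index under a versioned name using write-then-rename.

    write_fn takes a path string and writes the index there, for example
    lambda p: faiss.write_index(index, p).
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    final = directory / f"faiss_ivf.v{version:04d}.index"
    tmp = final.with_suffix(".tmp")
    write_fn(str(tmp))
    os.replace(tmp, final)  # atomic when tmp and final are on the same filesystem
    return final
```

A call would look like save_versioned(lambda p: faiss.write_index(index, p), "indexes", 1). For a GPU index, pass faiss.index_gpu_to_cpu(gpu_index) to write_index instead.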

HNSW Index for Low Latency

hnsw_index.py
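index_hnsw = faiss.IndexHNSWFlat(d, 32)  # 32 = M, the number of neighbors per graph node
index_hnsw.hnsw.efConstruction = 200  # build-time search depth; set before add()
index_hnsw.add(xb[:100000])  # HNSW needs no training step
D, I = index_hnsw.search(xq, k=5)
print("HNSW search complete")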

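HNSW offers very low latency but consumes more memory, since it stores full vectors plus the graph links. efConstruction=200 trades a longer build for a better graph. At query time, hnsw.efSearch controls the recall/latency tradeoff. IndexHNSWFlat uses L2 distance by default, which ranks L2-normalized vectors in the same order as inner product.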
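
The memory cost of HNSW can be estimated with the rule of thumb from the FAISS guidelines, roughly d * 4 + M * 2 * 4 bytes per vector. The sketch below applies it to the example above. hnsw_flat_bytes and flat_bytes are hypothetical helper names, and the estimate ignores the upper graph levels.

```python
def flat_bytes(n_vectors: int, d: int) -> int:
    """Size of the raw float32 vectors alone."""
    return n_vectors * d * 4


def hnsw_flat_bytes(n_vectors: int, d: int, m: int) -> int:
    """Approximate IndexHNSWFlat size: float32 vectors plus 2*M 4-byte links each."""
    return n_vectors * (d * 4 + m * 2 * 4)


n, dim, m = 100_000, 768, 32
total = hnsw_flat_bytes(n, dim, m)
overhead = total / flat_bytes(n, dim) - 1
print(f"HNSW: {total / 1e6:.1f} MB, {overhead:.1%} more than the raw vectors")
```

At high dimensions the links are a small share of the total. The larger cost is that vectors are stored uncompressed, unlike in quantized IVF variants.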

Best Practices

  • Normalize vectors before training, adding, and searching when you want cosine similarity
  • Choose nprobe or efSearch based on your latency/recall budget
  • Use IndexIDMap to preserve business identifiers
  • Monitor GPU memory fragmentation
  • Regularly save indexes with versioning
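
The IndexIDMap recommendation can be sketched as follows. The ids and vectors are made up for illustration, idmap_demo is a hypothetical function name, and the import check only exists so the snippet returns None instead of failing where faiss or numpy is missing.

```python
import importlib.util


def idmap_demo():
    """Return the business id of the best match, or None if faiss/numpy are missing."""
    if not all(importlib.util.find_spec(mod) for mod in ("faiss", "numpy")):
        return None
    import faiss
    import numpy as np

    d = 4
    vectors = np.eye(3, d, dtype="float32")  # three orthogonal unit vectors
    business_ids = np.array([1001, 1042, 1077], dtype="int64")  # made-up ids

    flat = faiss.IndexFlatIP(d)       # flat indexes do not store custom ids themselves
    index = faiss.IndexIDMap(flat)    # wrapper that maps positions to custom ids
    index.add_with_ids(vectors, business_ids)

    _, ids = index.search(vectors[1:2], 1)  # query with the second vector
    return int(ids[0, 0])


print(idmap_demo())
```

IVF indexes accept add_with_ids directly, so the wrapper is mainly useful for flat and HNSW indexes.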

Common Mistakes to Avoid

  • Forgetting to call index.train before index.add on IVF indexes
  • Passing float64 arrays: FAISS operates on float32, so float64 input is either rejected or converted with an extra copy, depending on the version
  • Neglecting vector normalization with Inner Product
  • Exceeding GPU VRAM without switching to CPU or quantization
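
The normalization mistake is easy to see on a toy example. The sketch below uses plain Python and made-up two-dimensional vectors to show that raw inner product favors long vectors, while normalized inner product (cosine similarity) favors aligned ones.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 0.0]
docs = {
    "long_but_off_axis": [10.0, 10.0],
    "short_but_aligned": [0.9, 0.1],
}

raw_best = max(docs, key=lambda name: dot(query, docs[name]))
cos_best = max(docs, key=lambda name: dot(normalize(query), normalize(docs[name])))
print("raw inner product picks:", raw_best)
print("normalized inner product picks:", cos_best)
```

In FAISS, faiss.normalize_L2 performs the same scaling in place on a float32 array.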

Going Further

Discover our advanced training on vector systems and production RAG at learni-group.com/formations.
