Introduction
FAISS (Facebook AI Similarity Search) is the reference library for similarity search on dense vectors. In 2026, it remains essential for RAG systems, recommendation engines, and large-scale duplicate detection. This expert tutorial covers optimized installation, advanced index creation (IVF, HNSW), GPU acceleration, persistence, and production strategies. Each step includes complete, functional code you can copy directly.
Prerequisites
- Python 3.10+
- CUDA 12.4+ and recent NVIDIA drivers (for GPU)
- Solid knowledge of numpy and vector embeddings
- pip and build essentials
Optimized Installation
pip install numpy faiss-gpu-cu12
python -c "import faiss; print(faiss.__version__)"
The faiss-gpu-cu12 installation enables CUDA 12 acceleration. Always verify NVIDIA driver compatibility before installing.
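Beyond printing the version, it can help to confirm that the installed build actually sees a GPU. The following is a minimal sketch: count_faiss_gpus is a hypothetical helper name, and it assumes that CPU-only builds may not expose faiss.get_num_gpus, so it falls back to 0 instead of failing.

```python
def count_faiss_gpus():
    """Return the number of GPUs visible to FAISS, or 0 if none can be detected."""
    try:
        import faiss
    except ImportError:
        return 0  # FAISS is not installed in this environment
    # Assumption: GPU builds expose get_num_gpus; CPU-only builds may not.
    get_num_gpus = getattr(faiss, "get_num_gpus", None)
    return int(get_num_gpus()) if get_num_gpus is not None else 0


n_gpus = count_faiss_gpus()
print("GPUs visible to FAISS:", n_gpus)
```

A result of 0 on a machine with a GPU usually points to a CPU-only package or a driver mismatch.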
Creating an IVF Index
import faiss
import numpy as np
d = 768  # embedding dimension
nlist = 4096
quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
index.nprobe = 32
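print("IVF index created with", index.nlist, "centroids")
IndexIVFFlat partitions the vectors into nlist clusters and searches only the nprobe clusters closest to each query, which gives a good speed/accuracy tradeoff. With nprobe = 32 out of 4096 lists, each query scans under 1% of the lists; raise nprobe for higher recall at the cost of latency.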
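To make the nprobe/nlist tradeoff concrete, the sketch below estimates how much of the index a single query scans, and compares nlist = 4096 with a commonly cited rule of thumb (nlist between 4*sqrt(N) and 16*sqrt(N)). Both helper names are hypothetical, and the estimate assumes evenly filled lists, which real data rarely gives you exactly.

```python
import math


def ivf_scan_stats(n_vectors, nlist, nprobe):
    """Estimate the share of an IVF index one query scans (assumes balanced lists)."""
    fraction = nprobe / nlist
    return fraction, n_vectors * fraction


def nlist_rule_of_thumb(n_vectors):
    """Commonly cited nlist range: 4*sqrt(N) to 16*sqrt(N). A heuristic, not a guarantee."""
    root = math.sqrt(n_vectors)
    return int(4 * root), int(16 * root)


fraction, scanned = ivf_scan_stats(500_000, nlist=4096, nprobe=32)
low, high = nlist_rule_of_thumb(500_000)
print(f"lists scanned: {fraction:.2%}, vectors scanned per query: ~{scanned:.0f}")
print(f"suggested nlist range for 500,000 vectors: {low} to {high}")
```

For 500,000 vectors the value 4096 falls inside the suggested range, and each query compares against roughly 3,900 vectors instead of all 500,000.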
Training and Adding Vectors
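np.random.seed(42)
xb = np.random.random((500000, d)).astype('float32')
faiss.normalize_L2(xb)  # in-place L2 normalization so inner product equals cosine similarity
index.train(xb[:100000])  # IVF indexes must be trained before vectors are added
index.add(xb)
print("Vectors added:", index.ntotal)
Training on a 20% sample is generally sufficient as long as k-means sees enough points per centroid. FAISS warns by default below 39 training points per centroid, which for nlist = 4096 is about 160,000 vectors, so the 100,000-vector sample above trains successfully but emits that warning; enlarge the sample or lower nlist to avoid it. Normalize vectors when using METRIC_INNER_PRODUCT if scores should be cosine similarities.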
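How large the training sample should be depends on nlist rather than on a fixed percentage of the data. The sketch below computes the range under the assumption that FAISS's clustering defaults are 39 (minimum) and 256 (maximum) points per centroid; training_sample_bounds is a hypothetical helper, not a FAISS function.

```python
def training_sample_bounds(nlist, min_per_centroid=39, max_per_centroid=256):
    """Return (min, max) useful k-means training set sizes for nlist centroids.

    Assumption: the defaults mirror FAISS's clustering parameters.
    """
    return nlist * min_per_centroid, nlist * max_per_centroid


min_train, max_train = training_sample_bounds(4096)
sample_size = 100_000
print(f"recommended training size: {min_train:,} to {max_train:,} vectors")
if sample_size < min_train:
    print(f"a sample of {sample_size:,} is below the minimum and would trigger a warning")
```

Samples above the upper bound are not harmful; to my understanding FAISS subsamples them before clustering, so the extra vectors only cost time.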
GPU Search
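res = faiss.StandardGpuResources()  # holds GPU resources (temporary memory, streams)
gpu_index = faiss.index_cpu_to_gpu(res, 0, index)  # copy the index to GPU 0
xq = np.random.random((10, d)).astype('float32')
faiss.normalize_L2(xq)  # normalize queries the same way as the database vectors
D, I = gpu_index.search(xq, k=5)
print("Top-5 indices:", I[0])
index_cpu_to_gpu copies the index into GPU memory, a one-time transfer whose cost grows with index size. Searches can then exceed 100,000 QPS on a single A100, though the actual figure depends on batch size, vector dimension, and nprobe.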
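Before moving an index to the GPU, a quick size estimate shows whether it will fit in VRAM. The sketch below gives a lower bound for an IVFFlat index; ivf_flat_bytes is a hypothetical helper, the 8 bytes per id and the 80% headroom factor are assumptions, and FAISS's own GPU scratch memory is not counted.

```python
def ivf_flat_bytes(n_vectors, d, nlist, bytes_per_id=8):
    """Lower-bound size in bytes of an IVFFlat index holding float32 vectors."""
    vectors = n_vectors * d * 4       # raw float32 vectors
    ids = n_vectors * bytes_per_id    # one stored id per vector (assumed 64-bit)
    centroids = nlist * d * 4         # coarse quantizer centroids
    return vectors + ids + centroids


size = ivf_flat_bytes(500_000, d=768, nlist=4096)
vram_bytes = 40 * 2**30               # example: a 40 GiB card
fits = size <= 0.8 * vram_bytes       # keep ~20% headroom (assumed safety margin)
print(f"index size: ~{size / 2**30:.2f} GiB, fits with headroom: {fits}")
```

When the estimate exceeds the available memory, the usual options are to keep the index on CPU or to switch to a quantized index type.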
Index Persistence
faiss.write_index(index, "faiss_ivf.index")
index_loaded = faiss.read_index("faiss_ivf.index")
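print("Index loaded:", index_loaded.ntotal)
write_index/read_index is the most reliable method for serialization. A GPU index must first be copied back to the CPU with faiss.index_gpu_to_cpu before it can be written. For very large indexes, consider a compressed index type (for example IVF with product quantization) or memory-mapped loading with faiss.read_index(path, faiss.IO_FLAG_MMAP) where the index type supports it.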
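In production, overwriting a single index file in place risks leaving a half-written file if the process dies mid-write. One possible pattern, sketched below, is to write each snapshot under a temporary name and rename it once complete. save_versioned and the file naming scheme are hypothetical; the demo uses a stand-in writer so it runs without FAISS, and in real use write_fn would be something like lambda p: faiss.write_index(index, p).

```python
import os
import tempfile
import time


def save_versioned(write_fn, directory, prefix="faiss_ivf"):
    """Write a timestamped snapshot via write_fn(path) and return its final path."""
    os.makedirs(directory, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    final_path = os.path.join(directory, f"{prefix}-{stamp}.index")
    tmp_path = final_path + ".tmp"
    write_fn(tmp_path)                # write under a temporary name first
    os.replace(tmp_path, final_path)  # rename is atomic on the same filesystem
    return final_path


def fake_writer(path):
    """Stand-in for faiss.write_index so the sketch runs without FAISS."""
    with open(path, "wb") as f:
        f.write(b"index-bytes")


snapshot_dir = tempfile.mkdtemp()
saved = save_versioned(fake_writer, snapshot_dir)
print("saved snapshot:", os.path.basename(saved))
```

Readers of the index then only ever see complete files, and older snapshots remain available for rollback.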
HNSW Index for Low Latency
index_hnsw = faiss.IndexHNSWFlat(d, 32)
index_hnsw.hnsw.efConstruction = 200
index_hnsw.add(xb[:100000])
D, I = index_hnsw.search(xq, k=5)
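print("HNSW search complete")
HNSW offers very low latency but consumes more memory, since it stores a neighbor graph alongside the full vectors. efConstruction = 200 controls graph quality at build time and must be set before add; at query time, index_hnsw.hnsw.efSearch trades latency for recall. IndexHNSWFlat uses L2 distance by default, which on L2-normalized vectors yields the same ranking as inner product.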
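The memory cost of HNSW can be estimated with the per-vector approximation commonly given for IndexHNSWFlat: d * 4 bytes for the vector plus M * 2 * 4 bytes for graph links. The sketch below applies it to the index above; the helper names are hypothetical and the formula ignores upper-layer links and allocator overhead.

```python
def hnsw_flat_bytes(n_vectors, d, M):
    """Approximate IndexHNSWFlat size: float32 vector plus 2*M int32 links per vector."""
    per_vector = d * 4 + M * 2 * 4
    return n_vectors * per_vector


def flat_bytes(n_vectors, d):
    """Size of the raw float32 vectors alone."""
    return n_vectors * d * 4


hnsw = hnsw_flat_bytes(100_000, d=768, M=32)
flat = flat_bytes(100_000, d=768)
print(f"HNSW: {hnsw / 2**20:.0f} MiB, raw vectors: {flat / 2**20:.0f} MiB, "
      f"graph overhead: {hnsw / flat - 1:.1%}")
```

At d = 768 the graph adds relatively little on top of the vectors; the overhead weighs more heavily at low dimensions or when compared with quantized indexes.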
Best Practices
- Normalize both database and query vectors before training and search when you want cosine similarity
- Choose nprobe or efSearch based on your latency/recall budget
- Use IndexIDMap to preserve business identifiers
- Monitor GPU memory fragmentation
- Regularly save indexes with versioning
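Choosing nprobe or efSearch against a recall budget requires measuring recall. A minimal sketch, assuming you have the id arrays from an exact search (for example an IndexFlatIP) and from the approximate index for the same queries; recall_at_k is a hypothetical helper and the id lists below are toy data, not real results.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Average fraction of the exact top-k neighbors found by the approximate search."""
    total = 0.0
    for approx_row, exact_row in zip(approx_ids, exact_ids):
        hits = len(set(approx_row[:k]) & set(exact_row[:k]))
        total += hits / k
    return total / len(exact_ids)


# Toy id lists standing in for the I arrays returned by index.search
exact = [[1, 2, 3, 4, 5], [10, 11, 12, 13, 14]]
approx = [[1, 2, 3, 9, 5], [10, 11, 7, 8, 14]]
recall = recall_at_k(approx, exact, k=5)
print(f"recall@5 = {recall:.2f}")
```

Sweeping nprobe (or efSearch) and recording recall next to latency gives the curve from which to pick an operating point.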
Common Mistakes to Avoid
- Forgetting to call index.train before index.add on IVF indexes
- Passing float64 arrays: FAISS operates on float32, so float64 input is either rejected or converted with an extra copy, depending on the version
- Neglecting vector normalization with Inner Product
- Exceeding GPU VRAM without switching to CPU or quantization
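The normalization point is easy to verify by hand. The worked example below uses plain Python on two small vectors to show that the raw inner product depends on vector length, while the inner product of L2-normalized vectors equals the cosine similarity. The helper names are illustrative; on real data, faiss.normalize_L2 applies the same scaling in place to each row of a float32 matrix.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


a, b = [3.0, 4.0], [4.0, 3.0]
raw_ip = dot(a, b)  # grows with vector length
cosine = raw_ip / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
normalized_ip = dot(l2_normalize(a), l2_normalize(b))
print(raw_ip, round(cosine, 6), round(normalized_ip, 6))
```

Without normalization, longer vectors win inner product rankings regardless of direction, which is rarely what an embedding search intends.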
Going Further
Discover our advanced training on vector systems and production RAG at learni-group.com/formations.