Introduction
Stable Diffusion is a latent diffusion model that generates images from text descriptions. In 2026, running it locally gives you full control over privacy, cost, and customization. This intermediate tutorial walks you through setting up a working environment with Hugging Face's diffusers library.
Prerequisites
- Python 3.10 or higher
- NVIDIA GPU with CUDA 12+
- Minimum 8 GB VRAM
- Basic command line and Python knowledge
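The prerequisites above can be checked with a short script. This is a minimal sketch, not an official tool: the function name check_environment and the 7.5 GB default threshold are assumptions (cards sold as 8 GB may report slightly less than 8 GiB), and the GPU checks are skipped if PyTorch is not installed yet.

```python
import sys


def check_environment(min_python=(3, 10), min_vram_gb=7.5):
    """Return a dict describing whether this machine meets the prerequisites.

    min_vram_gb defaults to 7.5 (an assumption) because the total memory
    reported by the driver can be slightly below the nominal card size.
    """
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": False,
        "cuda_available": False,
        "vram_gb": None,
        "vram_ok": False,
    }
    try:
        import torch  # may not be installed yet at this stage
    except ImportError:
        return report

    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        # total_memory is reported in bytes for GPU index 0
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        report["vram_gb"] = round(total_bytes / 1024**3, 1)
        report["vram_ok"] = report["vram_gb"] >= min_vram_gb
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

If torch_installed is False, run the check again after the installation step.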
Installing Dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install diffusers transformers accelerate safetensors

These commands install PyTorch with CUDA support and the essential libraries to load and run Stable Diffusion efficiently.
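To confirm the installation worked, one option is to list the installed versions using only the standard library. The helper name installed_versions and the REQUIRED list are assumptions made for this sketch; the package names are the ones from the pip commands.

```python
from importlib import metadata

# Packages installed by the pip commands in this section
REQUIRED = ["torch", "diffusers", "transformers", "accelerate", "safetensors"]


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```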
Basic Generation Script
import torch
from diffusers import StableDiffusionPipeline

# float16 halves the memory needed for the weights compared with float32
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")  # move the whole pipeline to the GPU

prompt = "an astronaut cat on the moon, realistic style"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("output.png")

This script loads the Stable Diffusion v1.5 model in float16 precision to save VRAM and generates an image from the provided prompt.
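The VRAM saving from float16 follows from simple arithmetic: each parameter takes 2 bytes instead of 4. The sketch below uses one billion parameters as a round illustrative figure, not an exact count for this model, and it covers weights only, not the activations produced during inference.

```python
def weights_size_gib(n_params, bytes_per_param):
    """Memory needed to store n_params weights, in GiB."""
    return n_params * bytes_per_param / 1024**3


N_PARAMS = 1_000_000_000  # illustrative round figure, NOT an exact count

fp32 = weights_size_gib(N_PARAMS, 4)  # float32: 4 bytes per parameter
fp16 = weights_size_gib(N_PARAMS, 2)  # float16: 2 bytes per parameter
print(f"float32: {fp32:.2f} GiB, float16: {fp16:.2f} GiB")
# prints "float32: 3.73 GiB, float16: 1.86 GiB"
```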
Memory Optimization
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
)
# Pick ONE offload mode, and do not call pipe.to("cuda") afterwards
pipe.enable_model_cpu_offload()  # moves whole sub-models to the GPU on demand
# pipe.enable_sequential_cpu_offload()  # alternative: lowest VRAM, much slower

prompt = "mountain landscape at sunset"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("optimized.png")

Enabling CPU offloading lets the model run on GPUs with less VRAM. Model offload keeps performance acceptable; sequential offload saves more memory but is considerably slower. Use one or the other, not both.
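The choice between no offload, model offload, and sequential offload can be made from the available VRAM. The following is only a sketch: the function names and the 8 GB and 4 GB thresholds are assumptions for illustration, not values from the diffusers documentation, so adjust them after measuring on your own hardware.

```python
def choose_offload(vram_gb):
    """Pick an offload mode from available VRAM (thresholds are illustrative guesses)."""
    if vram_gb >= 8:
        return "none"        # enough memory to keep the whole pipeline on the GPU
    if vram_gb >= 4:
        return "model"       # moderate saving, small slowdown
    return "sequential"      # largest saving, much slower


def apply_offload(pipe, mode):
    """Apply exactly one placement strategy to a diffusers pipeline."""
    if mode == "none":
        pipe.to("cuda")
    elif mode == "model":
        pipe.enable_model_cpu_offload()
    elif mode == "sequential":
        pipe.enable_sequential_cpu_offload()
    else:
        raise ValueError(f"unknown offload mode: {mode!r}")
    return pipe


# Usage (assumes `pipe` was loaded with from_pretrained as above):
# apply_offload(pipe, choose_offload(vram_gb=6))
```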
Using a Custom Scheduler
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
# Build the new scheduler from the existing scheduler's configuration
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "portrait of a woman in cyberpunk style"
image = pipe(prompt, num_inference_steps=20).images[0]
image.save("dpm_output.png")

Replacing the default scheduler with DPMSolverMultistepScheduler lets you use fewer inference steps (20 here, versus 30 in the basic script) while maintaining high image quality.
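The same from_config pattern makes it easy to try several schedulers on one prompt. This is a sketch under assumptions: run_with_schedulers is a hypothetical helper, not part of diffusers, and it expects a pipeline object with a scheduler attribute as in the script above.

```python
def run_with_schedulers(pipe, prompt, scheduler_classes, **call_kwargs):
    """Generate one image per scheduler class and return {class name: image}.

    The pipeline's original scheduler is restored afterwards, even on error.
    """
    original = pipe.scheduler
    results = {}
    try:
        for scheduler_cls in scheduler_classes:
            # Rebuild each scheduler from the original scheduler's config
            pipe.scheduler = scheduler_cls.from_config(original.config)
            results[scheduler_cls.__name__] = pipe(prompt, **call_kwargs).images[0]
    finally:
        pipe.scheduler = original
    return results


# Usage (assumes `pipe` is already loaded and on the GPU):
# from diffusers import DPMSolverMultistepScheduler, EulerDiscreteScheduler
# images = run_with_schedulers(
#     pipe,
#     "portrait of a woman in cyberpunk style",
#     [DPMSolverMultistepScheduler, EulerDiscreteScheduler],
#     num_inference_steps=20,
# )
# for name, image in images.items():
#     image.save(f"{name}.png")
```

Passing the same generator settings through call_kwargs for each run makes the comparison fairer, since the images then start from the same noise.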
Simple Gradio Interface
import gradio as gr
import torch
from diffusers import StableDiffusionPipeline

# Load the pipeline once at startup, not on every request
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

def generate_image(prompt):
    image = pipe(prompt, num_inference_steps=30).images[0]
    return image

gr.Interface(fn=generate_image, inputs="text", outputs="image").launch()

This code creates a local web interface with Gradio so you can test different prompts without editing the Python script each time. Gradio is not installed by the commands in Installing Dependencies; add it with pip install gradio.
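The interface can also expose generation settings. The sketch below adds a negative prompt and sliders for steps and guidance scale; the helper names make_generate_fn and build_interface and the slider ranges are assumptions chosen for illustration. Gradio is imported inside build_interface so the generation function can be exercised without it.

```python
def make_generate_fn(pipe):
    """Wrap a loaded pipeline in a function suitable for a Gradio interface."""
    def generate(prompt, negative_prompt, steps, guidance_scale):
        result = pipe(
            prompt,
            negative_prompt=negative_prompt or None,  # empty textbox means no negative prompt
            num_inference_steps=int(steps),           # sliders may deliver floats
            guidance_scale=float(guidance_scale),
        )
        return result.images[0]
    return generate


def build_interface(generate):
    """Build (but do not launch) a Gradio interface around `generate`."""
    import gradio as gr  # lazy import: only needed when the UI is built

    return gr.Interface(
        fn=generate,
        inputs=[
            gr.Textbox(label="Prompt"),
            gr.Textbox(label="Negative prompt"),
            gr.Slider(minimum=10, maximum=50, value=30, step=1, label="Steps"),
            gr.Slider(minimum=1.0, maximum=15.0, value=7.5, step=0.5, label="Guidance scale"),
        ],
        outputs="image",
    )


# Usage (assumes `pipe` was created as in the script above):
# build_interface(make_generate_fn(pipe)).launch()
```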
Best Practices
- On GPU, load weights in torch.float16 or torch.bfloat16 to reduce memory usage
- Enable optimizations like xformers or torch.compile when available
- Store models in safetensors format for better security
- Test different schedulers for each prompt type
- Keep a history of seeds to reproduce good results
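Keeping a seed history can be as simple as appending one JSON record per generation to a file. This is a minimal sketch: the function names and the seed_history.jsonl file name are assumptions, and the commented usage shows how a fixed seed is typically passed to the pipeline through a torch.Generator.

```python
import json
from pathlib import Path


def log_generation(path, prompt, seed, **settings):
    """Append one generation record (prompt, seed, settings) as a JSON line."""
    record = {"prompt": prompt, "seed": seed, **settings}
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def load_history(path):
    """Read all records back as a list of dicts, oldest first."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Usage with the pipeline (assumes `pipe` and `prompt` exist as in the scripts above):
# seed = 1234
# generator = torch.Generator(device="cuda").manual_seed(seed)
# image = pipe(prompt, generator=generator, num_inference_steps=30).images[0]
# log_generation("seed_history.jsonl", prompt, seed, num_inference_steps=30)
```

To reproduce a result, read the record back with load_history and rerun the pipeline with the same prompt, seed, and settings.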
Common Mistakes to Avoid
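- Forgetting to move the pipeline to the GPU with .to("cuda") and unintentionally running on CPU
- Writing overly long prompts (the CLIP text encoder truncates input beyond 77 tokens) and omitting negative prompts
- Ignoring out-of-memory (insufficient VRAM) errors instead of enabling model offload
- Not keeping diffusers and its dependencies up to date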
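The out-of-memory mistake in the list above can be handled in code by retrying once with model offload enabled. This is a sketch under assumptions: generate_with_fallback is a hypothetical helper, and it detects the error by looking for "out of memory" in the message of a RuntimeError, which is how PyTorch CUDA out-of-memory errors usually present. Whether offload can be enabled cleanly on a pipeline already moved to the GPU may depend on your diffusers version.

```python
def generate_with_fallback(pipe, prompt, **call_kwargs):
    """Try a normal generation; on a CUDA out-of-memory error, retry with model offload."""
    try:
        return pipe(prompt, **call_kwargs).images[0]
    except RuntimeError as err:
        # Re-raise anything that is not an out-of-memory error
        if "out of memory" not in str(err).lower():
            raise
    # Retry outside the except block so the failed call's traceback
    # (and the tensors it references) can be released first
    pipe.enable_model_cpu_offload()
    return pipe(prompt, **call_kwargs).images[0]


# Usage (assumes `pipe` was loaded as in the scripts above):
# image = generate_with_fallback(pipe, "mountain landscape at sunset", num_inference_steps=25)
```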
Going Further
Check out our advanced generative AI courses to master fine-tuning and advanced control of Stable Diffusion.