How to Deploy AI Models on Vertex AI in 2026

Introduction

Vertex AI, Google Cloud's unified platform for AI and machine learning, streamlines production deployments in 2026. It integrates datasets, training (AutoML or custom), Kubeflow pipelines, prediction endpoints, and native MLOps monitoring. Why is it crucial? By common industry estimates, most ML models never reach production for lack of scalable infrastructure; Vertex AI provides end-to-end management, from fine-tuning Gemini to distributed jobs on TPUs. This expert tutorial guides you step by step: create a tabular dataset, train a classification model, deploy an endpoint, and monitor it in real time. Result: a production-ready system capable of handling 10k+ requests per second, with costs optimized via Spot VMs. Ideal for senior data scientists managing critical workloads.

Prerequisites

  • Google Cloud account with billing enabled and Vertex AI quota (at least 10 vCPU, 30 GB storage).
  • gcloud CLI installed (v450+).
  • Python 3.10+ with venv.
  • IAM roles: roles/aiplatform.admin, roles/storage.admin on the project.
  • A CSV dataset ready to upload (e.g., the Iris dataset for testing).
  • A supported region, such as us-central1 or europe-west4.

gcloud Installation and Authentication

setup-gcloud.sh
#!/bin/bash

# Install gcloud if missing (Linux/Mac)
if ! command -v gcloud &> /dev/null; then
    curl https://sdk.cloud.google.com | bash
    exec -l $SHELL
fi

# Initialize gcloud
PROJECT_ID="your-project-id"
REGION="us-central1"

gcloud auth login

gcloud config set project $PROJECT_ID
gcloud config set ai/region $REGION

gcloud services enable aiplatform.googleapis.com \
    artifactregistry.googleapis.com \
    cloudbuild.googleapis.com

# Verify
gcloud ai models list --limit=1

This script installs gcloud if it is missing, authenticates your user account, and enables the APIs Vertex AI depends on. It also sets the default project and region to avoid regional quota errors. Run it once; a common pitfall is forgetting cloudbuild.googleapis.com, which custom jobs need for container builds.

Python Environment Setup

Create an isolated virtual environment to avoid dependency conflicts. Install the Vertex AI SDK v1.52+ (stable in 2026), which handles REST calls under the hood with automatic retries; think of it as a Kubernetes-style orchestrator for ML that abstracts away GCP plumbing. Then upload your CSV dataset to a GCS bucket (e.g., gs://your-bucket/datasets/iris.csv) with gsutil cp, or from Python as sketched below.
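
If you prefer staying in Python, here is a minimal upload sketch using the google-cloud-storage client installed below; the project, bucket, and file paths are placeholders to adapt.

upload_dataset.py
from google.cloud import storage

# Placeholders: replace with your project, bucket, and local CSV path
client = storage.Client(project="your-project-id")
bucket = client.bucket("your-bucket")

# Upload the local CSV to gs://your-bucket/datasets/iris.csv
blob = bucket.blob("datasets/iris.csv")
blob.upload_from_filename("iris.csv")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")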

SDK Installation and Client Test

setup-python.sh
#!/bin/bash

python -m venv venv
source venv/bin/activate  # venv\Scripts\activate on Windows

pip install --upgrade pip
pip install google-cloud-aiplatform==1.52.0 \
    pandas==2.2.2 \
    scikit-learn==1.5.1 \
    google-cloud-storage==2.17.0

# Test auth (via ADC)
export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account-key.json"  # Optional if gcloud auth active

python -c "from google.cloud import aiplatform; print('SDK OK:', aiplatform.__version__)" 

Installs the core libraries for interacting with Vertex AI. The SDK uses Application Default Credentials (ADC) for seamless auth, so the service-account key export is only needed outside an active gcloud session. Pitfall: without google-cloud-storage, dataset uploads fail; always run the final test snippet to validate the environment.

Creating a Tabular Dataset

create_dataset.py
from google.cloud import aiplatform

project_id = "your-project-id"
display_name = "iris-dataset"
dataset_uri = "gs://your-bucket/datasets/iris.csv"
location = "us-central1"

aiplatform.init(project=project_id, location=location)

# Create the dataset from CSV; column types are auto-detected.
# The target column is declared later, when the training job runs.
dataset = aiplatform.TabularDataset.create(
    display_name=display_name,
    gcs_source=[dataset_uri],
)

print(f"Dataset created: {dataset.resource_name}")

This script creates a tabular dataset for multi-class classification (Iris). Note that for tabular data the target column is not part of the dataset resource; you declare it at training time via target_column. Creation typically takes 2-5 minutes; monitor progress in the Vertex AI Console.
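
The next script needs the dataset's numeric resource name. If you lose track of it, a small lookup sketch (filtering on the display name used above):

find_dataset.py
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# List datasets matching the display name and print their resource names
datasets = aiplatform.TabularDataset.list(filter='display_name="iris-dataset"')
for ds in datasets:
    print(ds.resource_name)  # use this as dataset_id in train_automl.py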

Training an AutoML Model

Progression: with the dataset ready, launch an AutoML training job for a quick baseline (ideal for experts who want to iterate fast). AutoML searches architectures and hyperparameters automatically and typically reaches 95%+ accuracy on Iris in under an hour. Compare with the custom training sketch after the pipeline section.

Launching AutoML Training Job

train_automl.py
from google.cloud import aiplatform

project_id = "your-project-id"
location = "us-central1"
dataset_id = "projects/your-project-id/locations/us-central1/datasets/123456789"  # Replace with output from create_dataset
model_display_name = "iris-automl-classifier"

aiplatform.init(project=project_id, location=location)

# Dataset reference
dataset = aiplatform.TabularDataset(dataset_id)

# Training job definition
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="iris-automl-train",
    optimization_prediction_type="classification",
)

# Budget: 1000 milli node-hours = 1 node-hour, enough for a test run
model = job.run(
    dataset=dataset,
    target_column="species",
    model_display_name=model_display_name,
    budget_milli_node_hours=1000,
    disable_early_stopping=False,
)

print(f"Model created: {model.display_name}")
print(f"Resource name: {model.resource_name}")

Launches an AutoML classification job with early stopping enabled, which can return part of the budget unused. budget_milli_node_hours=1000 corresponds to one node-hour; target_column names the label column in the CSV. Retrieve dataset_id from the previous script; evaluation metrics (AUC, log loss) are attached to the trained model, as shown below.
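
A sketch for pulling those metrics once the job finishes; the model resource name is a placeholder, and the metric keys (auRoc, logLoss) follow the tabular classification schema, so treat them as assumptions to verify against your evaluation payload.

evaluate_model.py
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

model = aiplatform.Model(
    "projects/your-project-id/locations/us-central1/models/123456789"  # from train_automl
)

# AutoML attaches at least one evaluation to the trained model
evaluation = model.list_model_evaluations()[0]
metrics = evaluation.to_dict().get("metrics", {})
print("AUC ROC:", metrics.get("auRoc"), "| log loss:", metrics.get("logLoss"))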

Deploying a Secure Endpoint

deploy_endpoint.py
from google.cloud import aiplatform

project_id = "your-project-id"
location = "us-central1"
model_id = "projects/your-project-id/locations/us-central1/models/123456789"  # Numeric model resource name from train_automl
endpoint_display_name = "iris-prod-endpoint"

aiplatform.init(project=project_id, location=location)

# Dedicated endpoint (pass network=... here for a private VPC endpoint)
endpoint = aiplatform.Endpoint.create(display_name=endpoint_display_name)

# Deploy model (scalable machine type)
model = aiplatform.Model(model_id)

endpoint.deploy(
    model=model,
    deployed_model_display_name="deployed-iris",
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=3,
    traffic_split={"0": 100},  # "0" routes all traffic to this new deployment
    enable_access_logging=True,
)

print(f"Endpoint: {endpoint.resource_name}")
print(f"Deployed model ID: {endpoint.list_models()[0].id}")

Creates an endpoint with autoscaling (1-3 replicas) and access logging written to Cloud Logging; enable_access_logging=True supports audit and GDPR-style compliance reviews. Note the endpoint is public unless you pass a VPC network at creation. Deployment takes 10-15 minutes; test via Console > Predictions.
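
To avoid paying for idle replicas after testing, a teardown sketch (destructive; run only when you are done with the endpoint):

cleanup_endpoint.py
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

endpoint = aiplatform.Endpoint(
    "projects/your-project-id/locations/us-central1/endpoints/123456789"  # from deploy_endpoint
)

# Undeploy every model, then delete the endpoint itself
endpoint.undeploy_all()
endpoint.delete()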

Inference and Monitoring

For production, call the endpoint from your apps via the SDK or REST. Enable Vertex AI Feature Store for online serving if you manage more than ~1M features. Monitoring: built-in dashboards cover latency (target p99 < 200 ms), drift detection, and SHAP-based explainability.
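
For the REST path, a minimal sketch that builds a bearer token from ADC; the endpoint ID and the Iris column names (sepal_length and friends) are placeholder assumptions matching a hypothetical CSV header.

rest_predict.py
import google.auth
import google.auth.transport.requests
import requests

# Obtain an access token from Application Default Credentials
creds, _ = google.auth.default()
creds.refresh(google.auth.transport.requests.Request())

endpoint = "projects/your-project-id/locations/us-central1/endpoints/123456789"
url = f"https://us-central1-aiplatform.googleapis.com/v1/{endpoint}:predict"

# AutoML tabular expects one dict per instance, keyed by column name
body = {"instances": [{
    "sepal_length": "5.1", "sepal_width": "3.5",
    "petal_length": "1.4", "petal_width": "0.2",
}]}

resp = requests.post(
    url, json=body, headers={"Authorization": f"Bearer {creds.token}"}
)
print(resp.json())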

Batch and Online Prediction

predict.py
from google.cloud import aiplatform

project_id = "your-project-id"
location = "us-central1"
endpoint_id = "projects/your-project-id/locations/us-central1/endpoints/123456789"  # From deploy_endpoint
model_id = "projects/your-project-id/locations/us-central1/models/123456789"  # From train_automl

aiplatform.init(project=project_id, location=location)

endpoint = aiplatform.Endpoint(endpoint_id)

# Online prediction: AutoML tabular expects dicts keyed by column name,
# with string values (column names must match the training CSV)
instances = [
    {"sepal_length": "5.1", "sepal_width": "3.5",
     "petal_length": "1.4", "petal_width": "0.2"},
    {"sepal_length": "7.0", "sepal_width": "3.2",
     "petal_length": "4.7", "petal_width": "1.4"},
]
predictions = endpoint.predict(instances=instances).predictions
for pred in predictions:
    best = pred["scores"].index(max(pred["scores"]))
    print("Online pred:", pred["classes"][best])

# Batch prediction runs against the Model, not the Endpoint
model = aiplatform.Model(model_id)
batch_job = model.batch_predict(
    job_display_name="iris-batch",
    gcs_source="gs://your-bucket/batch/iris_test.csv",
    gcs_destination_prefix="gs://your-bucket/preds/",
    instances_format="csv",
    predictions_format="jsonl",
    machine_type="n1-standard-2",
)
print(f"Batch job: {batch_job.resource_name}")

Dual inference: online for low latency, batch for large volumes (typically much cheaper, since no replicas stay hot). The instance format must match the training schema; monitor the job with batch_job.wait(). Pitfall: a schema mismatch on instances returns 400 BadRequest.
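
Once the batch job finishes, predictions land as JSONL files under the destination prefix; a sketch for listing them (the job resource name is a placeholder):

read_batch_results.py
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

batch_job = aiplatform.BatchPredictionJob(
    "projects/your-project-id/locations/us-central1/batchPredictionJobs/123456789"  # from predict.py
)

batch_job.wait()  # block until the job completes

# iter_outputs() yields the GCS blobs holding the prediction files
for blob in batch_job.iter_outputs():
    print(blob.name)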

Simple Vertex AI Pipeline (Kubeflow)

pipeline.py
from kfp import compiler, dsl
from google.cloud import aiplatform

@dsl.component(base_image="python:3.10")
def load_data_op(dataset_uri: str) -> str:
    # Simulate load (replace with real preprocessing)
    return "gs://fake-bucket/processed.csv"

@dsl.pipeline(name="iris-pipeline", pipeline_root="gs://your-bucket/pipelines/")
def training_pipeline(
    dataset_uri: str = "gs://your-bucket/datasets/iris.csv"
):
    load_task = load_data_op(dataset_uri=dataset_uri)  # KFP v2 requires keyword args
    # Add train/deploy components here

project_id = "your-project-id"
location = "us-central1"

aiplatform.init(project=project_id, location=location)

# Compile the pipeline to a local spec, then submit it
compiler.Compiler().compile(
    pipeline_func=training_pipeline,
    package_path="iris_pipeline.json",
)

aiplatform.PipelineJob(
    display_name="IrisPipeline",
    template_path="iris_pipeline.json",
    pipeline_root="gs://your-bucket/pipelines/",
    parameter_values={"dataset_uri": "gs://your-bucket/datasets/iris.csv"},
).run()

Basic Kubeflow pipeline for MLOps, scalable to 100+ steps. @dsl.component encapsulates reusable steps, and pipeline_root points to GCS for artifacts; note that PipelineJob consumes a compiled spec, hence the compile step. Run it from CI/CD, and extend it with custom containers for distributed TPU training.
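
For comparison with AutoML, here is a hedged sketch of a custom training job. train.py is a hypothetical local script, and the prebuilt container tags are illustrative; check the current container list in the Vertex AI docs.

train_custom.py
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Custom training: run your own script inside a prebuilt training container
job = aiplatform.CustomTrainingJob(
    display_name="iris-custom-train",
    script_path="train.py",  # hypothetical local training script
    container_uri="us-docker.pkg.dev/vertex-ai/training/sklearn-cpu.1-0:latest",
    requirements=["scikit-learn", "pandas"],
    model_serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
)

model = job.run(
    replica_count=1,
    machine_type="n1-standard-4",
    model_display_name="iris-custom-model",
)
print(f"Custom model: {model.resource_name}")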

Best Practices

  • Secure endpoints: Use VPC Service Controls plus least-privilege IAM (e.g., grant roles/aiplatform.user only to callers that need prediction access).
  • Optimize costs: Spot VMs for custom training, accelerators (GPU/TPU) only when data volume justifies them, and fewer replicas off-peak.
  • Advanced MLOps: Integrate Model Registry + Feature Store; enable Model Monitoring for drift (e.g., a 0.1 Jensen-Shannon distance threshold; see the sketch after this list).
  • Scalability: Shared, autoscaled prediction is fine below ~1k QPS; use dedicated high-replica deployments for high throughput.
  • Testing: CI/CD with Cloud Build, gating staging→prod promotion on accuracy > 95%.
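
A sketch of that monitoring setup using the SDK's model_monitoring helpers; the endpoint ID, email, and feature names are placeholders, and the exact config surface is worth verifying against your SDK version.

monitoring.py
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="your-project-id", location="us-central1")

endpoint = aiplatform.Endpoint(
    "projects/your-project-id/locations/us-central1/endpoints/123456789"
)

# Alert on per-feature prediction drift (Jensen-Shannon distance threshold)
drift_config = model_monitoring.DriftDetectionConfig(
    drift_thresholds={"sepal_length": 0.1, "petal_length": 0.1},
)
objective_config = model_monitoring.ObjectiveConfig(
    drift_detection_config=drift_config,
)

job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="iris-monitoring",
    endpoint=endpoint,
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.8),
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=1),
    alert_config=model_monitoring.EmailAlertConfig(user_emails=["you@example.com"]),
    objective_configs=objective_config,
)
print(f"Monitoring job: {job.resource_name}")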

Common Errors to Avoid

  • Auth failed: Verify ADC with gcloud auth application-default print-access-token; avoid service-account key files in prod (leak risk).
  • Quota exceeded: Request increases and monitor usage in the Console under IAM & Admin > Quotas before launching large jobs.
  • Schema mismatch: Pass the correct target_column at training time; column-type auto-detection can misfire on unbalanced or messy data.
  • Cold-start latency: Keep min_replica_count=1 and use traffic_split for gradual rollouts to hold p95 under ~500 ms.

Next Steps