How to Use Protobuf with Python in 2026

Introduction

Protocol Buffers, or Protobuf, is a binary serialization format developed by Google, designed to be more compact and faster to parse than JSON or XML. Where JSON is textual and verbose, Protobuf produces typed binary data. Commonly cited benchmarks report payloads 3 to 10 times smaller and parsing 5 to 20 times faster, though actual gains depend on the data, the language, and the runtime.

It's ideal for microservices, gRPC, IoT, or any system needing efficient communication. In 2026, with the rise of high-performance APIs and embedded AI, Protobuf remains an industry standard supported by all major languages.

This beginner tutorial guides you step by step: from installation to concrete Python examples. By the end, you'll know how to create, compile, and use Protobuf messages for real applications, like exchanging Person data between client and server. Every step includes actionable code.

Prerequisites

  • Python 3.10 or higher installed
  • pip (usually included with Python)
  • Unix-like system (Linux/macOS) or WSL on Windows for protoc
  • Basic Python knowledge (classes, imports)
  • A code editor like VS Code

Install protoc

install-protoc.sh
# macOS with Homebrew
brew install protobuf

# Ubuntu/Debian
sudo apt update
sudo apt install protobuf-compiler

# Check the installation
protoc --version
# Should print libprotoc 27.3 or later

protoc is the Protobuf schema compiler. Package managers install whichever version they currently ship, which may lag behind the latest release, so check the output of protoc --version and confirm it is compatible with your plugins. On Windows, download the binary from the GitHub releases page and add it to your PATH.
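If you would rather check the version from a script (for example in a build step), something like the following could work. It is only a sketch: the helper names parse_protoc_version and installed_protoc_version are made up for this tutorial, and it assumes protoc prints a line of the form "libprotoc X.Y" as in the listing above.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_protoc_version(output: str) -> tuple[int, ...]:
    """Extract the numeric version from text such as 'libprotoc 27.3'."""
    match = re.search(r"libprotoc\s+(\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"unrecognized protoc output: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def installed_protoc_version() -> Optional[tuple[int, ...]]:
    """Return the local protoc version, or None if protoc is not on PATH."""
    if shutil.which("protoc") is None:
        return None
    result = subprocess.run(
        ["protoc", "--version"], capture_output=True, text=True, check=True
    )
    return parse_protoc_version(result.stdout)
```

Calling installed_protoc_version() returns a tuple such as (27, 3) that you can compare against the minimum version your project expects, or None when protoc is missing.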

Define your first Protobuf schema

A .proto schema defines your data structure like a typed contract. Think of it as a JSON Schema but binary and evolvable.

We'll create a simple Person message with name, ID, and email. Each field carries a number that identifies it in the binary format, which is what makes schemas evolvable: changing an existing field's type or number breaks old messages, but adding a new field is safe as long as its number has never been used in that message.

First .proto file

person.proto
syntax = "proto3";

package tutorial;

message Person {
  int32 id = 1;
  string name = 2;
  string email = 3;
}

This schema uses proto3 (modern syntax, no 'required' fields). The package prevents collisions. Field numbers (1,2,3) are crucial for evolution: never reuse them. Copy this file as-is.
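To see why field numbers matter for evolution, here is a minimal, standard-library-only sketch of how a reader walks the binary format. It is not the real Protobuf runtime: the functions read_varint and decode_known_fields are illustrative names, only two wire types are handled, and the "newer" message with a field 4 is a hypothetical example.

```python
def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_known_fields(data: bytes, known: dict[int, str]) -> dict[str, object]:
    """Decode varint and length-delimited fields, skipping unknown numbers."""
    fields: dict[str, object] = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint (int32, bool, enum, ...)
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited (string, bytes, message)
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in known:
            fields[known[field_number]] = value
        # Unknown field numbers are skipped here, which is what lets an old
        # reader accept messages written with a newer schema.
    return fields


# Bytes as a hypothetical newer schema would write them: id = 1 (field 1),
# name = "Al" (field 2), plus a new string field 4 set to "hi".
newer_message = bytes([0x08, 0x01, 0x12, 0x02]) + b"Al" + bytes([0x22, 0x02]) + b"hi"

# An old reader that only knows fields 1 to 3 of Person.
old_schema = {1: "id", 2: "name", 3: "email"}
print(decode_known_fields(newer_message, old_schema))  # {'id': 1, 'name': b'Al'}
```

The old reader still recovers id and name because every field announces its own number and size. Reusing number 4 later for a different meaning would make the same bytes decode into the wrong field.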

Compile the schema for Python

compile.sh
pip install protobuf

protoc --python_out=. person.proto

# Check: the generated file should be listed
ls *_pb2.py

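This generates person_pb2.py, a Python module that exposes a Person class with typed fields. pip install protobuf provides the runtime that the generated code depends on, so keep the runtime and protoc versions reasonably in step. The --python_out=. flag places the file in the current directory. Run this in the same folder as person.proto.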

Use the message in Python

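Once compiled, import the generated module. Create a Person object, serialize it to bytes (like 'pack'), then deserialize it ('unpack'). For large volumes this is often faster than json.dumps/loads, but the gain varies with the data and the Protobuf backend in use, so measure it on your own workload.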

Simple encode/decode script

use_person.py
import person_pb2

# Create a message
person = person_pb2.Person()
person.id = 1234
person.name = "Alice"
person.email = "alice@example.com"

# Serialize to bytes
serialized = person.SerializeToString()
print(f"Serialized size: {len(serialized)} bytes")
print(f"Bytes: {serialized}")

# Deserialize
from_bytes = person_pb2.Person()
from_bytes.ParseFromString(serialized)
print(f"ID: {from_bytes.id}, Name: {from_bytes.name}, Email: {from_bytes.email}")

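This complete script is runnable after compilation (python use_person.py). SerializeToString() produces compact bytes: 29 bytes here, versus 54 for the same data as compact JSON. ParseFromString() reconstructs the object losslessly. Tip: HasField() only works on fields that track presence, meaning message-typed fields and fields declared optional. Calling it on a plain proto3 scalar such as id raises ValueError.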
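To make the size difference concrete, the sketch below hand-encodes the same Person using only the standard library and compares it with compact JSON. It is an illustration of the wire format rather than the library's implementation, and the helper names (encode_varint, encode_int_field, encode_string_field) are invented for this example.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Tag with wire type 0, followed by the value as a varint."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Tag with wire type 2, the byte length, then the UTF-8 payload."""
    payload = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


person_bytes = (
    encode_int_field(1, 1234)
    + encode_string_field(2, "Alice")
    + encode_string_field(3, "alice@example.com")
)
person_json = json.dumps(
    {"id": 1234, "name": "Alice", "email": "alice@example.com"},
    separators=(",", ":"),
)

print(len(person_bytes))                 # 29
print(len(person_json.encode("utf-8")))  # 54
```

Most of the saving comes from replacing the field names and punctuation with one-byte tags; the string contents themselves take the same space in both formats.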

Complex schemas with nested messages

Messages can contain other messages, much like Python objects that hold other objects. We'll add a repeated PhoneNumber field to Person, plus a PhoneType enum. Repeated fields behave like dynamic lists, which makes them the natural choice for arrays.

Advanced schema with nesting and enums

addressbook.proto
syntax = "proto3";

package tutorial;

enum PhoneType {
  MOBILE = 0;
  HOME = 1;
  WORK = 2;
}

message PhoneNumber {
  string number = 1;
  PhoneType type = 2;
}

message Person {
  int32 id = 1;
  string name = 2;
  string email = 3;
  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}

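Use enums for closed sets of values, repeated for 1-N relationships, and message-typed fields for composition. AddressBook is a container. The new phones field takes the next unused number (4), so messages written with the earlier Person schema remain readable. In proto3 the first enum value must be 0, and fields left at their default (0, empty string, first enum value) are handled automatically.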
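One practical consequence of proto3 defaults is that default values usually take no space on the wire. The following standard-library sketch hand-encodes a PhoneNumber to show the effect. It mimics the behavior for illustration only; encode_phone_number is a hypothetical helper, not part of the generated code.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_phone_number(number: str, phone_type: int) -> bytes:
    """Hand-encode PhoneNumber, leaving out fields that hold their default."""
    out = bytearray()
    if number:  # the empty string is the default, so it is omitted
        payload = number.encode("utf-8")
        out += encode_varint((1 << 3) | 2) + encode_varint(len(payload)) + payload
    if phone_type != 0:  # 0 (MOBILE) is the default, so it is omitted
        out += encode_varint((2 << 3) | 0) + encode_varint(phone_type)
    return bytes(out)


MOBILE, HOME, WORK = 0, 1, 2  # same numbers as the PhoneType enum

mobile = encode_phone_number("+33123456789", MOBILE)
work = encode_phone_number("+33198765432", WORK)
print(len(mobile), len(work))  # 14 16
```

This is why the most common enum value is often given number 0: messages that use it are slightly smaller. It also means a reader cannot tell "explicitly set to MOBILE" apart from "not set".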

Compile the advanced schema

compile_advanced.sh
protoc --python_out=. addressbook.proto

# Check
ls addressbook_pb2.py

Same process: this generates addressbook_pb2.py, which exposes the PhoneNumber, Person, and AddressBook classes along with the PhoneType enum. No extra plugins are needed for basic Python. If you get 'protoc: not found', go back to the installation step.

Full script with nested messages

use_addressbook.py
import addressbook_pb2

# Create AddressBook
book = addressbook_pb2.AddressBook()

# First person
person1 = book.people.add()
person1.id = 1234
person1.name = "Alice"
person1.email = "alice@example.com"
phone1 = person1.phones.add()
phone1.number = "+33123456789"
phone1.type = addressbook_pb2.PhoneType.MOBILE

# Second person
person2 = book.people.add()
person2.id = 5678
person2.name = "Bob"
person2.email = "bob@example.com"
phone2 = person2.phones.add()
phone2.number = "+33198765432"
phone2.type = addressbook_pb2.PhoneType.WORK

# Serialize
data = book.SerializeToString()
print(f"Total size: {len(data)} bytes")

# Deserialize
new_book = addressbook_pb2.AddressBook()
new_book.ParseFromString(data)
for person in new_book.people:
    print(f"Person: {person.name} ({person.email})")
    for phone in person.phones:
        # Map the enum number back to its name, e.g. 0 -> "MOBILE" -> "Mobile"
        type_str = addressbook_pb2.PhoneType.Name(phone.type).title()
        print(f"  Phone: {phone.number} ({type_str})")

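This script runs standalone after compilation. Use .add() to append to repeated message fields. Values of a top-level enum are available through the enum wrapper (addressbook_pb2.PhoneType.MOBILE) and as module-level constants (addressbook_pb2.MOBILE). The serialized address book is roughly 90 bytes, versus over 200 for equivalent compact JSON, which suits storage or networking. Serialized bytes are a snapshot: if you change the message afterwards, serialize it again.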

Best practices

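  • Number deliberately: keep 1-15 for frequent fields (their tag fits in one byte). Numbers 19000-19999 are reserved by Protobuf itself and cannot be used.
  • Don't base64-encode binary data into strings: use the bytes type for raw binary payloads.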
  • Version .proto files: use git tags and semantic versioning for schemas.
  • Inspect serialized data: protoc --decode=tutorial.Person person.proto < person.bin prints a binary message as text (person.bin being any file of serialized bytes).
  • Integrate with gRPC: move to RPC services after mastering messages.
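The advice about field numbers 1-15 follows from how tags are encoded: the field number and wire type are packed into a single varint. A small standard-library sketch (helper names are illustrative) shows where the tag grows:

```python
def varint_length(value: int) -> int:
    """Number of bytes a non-negative integer needs as a base-128 varint."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def tag_length(field_number: int, wire_type: int = 0) -> int:
    """Size in bytes of a field's tag: (field_number << 3) | wire_type."""
    return varint_length((field_number << 3) | wire_type)


for number in (1, 15, 16, 2047, 2048):
    print(number, tag_length(number))
# 1 1
# 15 1
# 16 2
# 2047 2
# 2048 3
```

One extra byte per field is negligible for a single message, but it adds up for fields that appear in every element of a large repeated list.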

Common errors to avoid

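  • Reusing field numbers: breaks backward compatibility because old data is decoded with the wrong meaning. Mark retired numbers with reserved instead.
  • Omitting the syntax line: without syntax = "proto3"; protoc falls back to proto2, which has different rules (including rigid 'required' fields).
  • Ignoring imports: for multi-file schemas, use import "other.proto"; and keep import public for re-exporting definitions from a file that has moved.
  • No error handling: wrap ParseFromString() in try/except and catch google.protobuf.message.DecodeError for corrupt data.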
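As an illustration of the last point, here is one possible shape for defensive parsing. To keep the snippet runnable without a generated module, it uses StringValue from google.protobuf.wrappers_pb2, a well-known type that ships with the protobuf package, as a stand-in; the same pattern should apply to person_pb2.Person. The function name parse_or_none is an assumption made for this example.

```python
from typing import Optional

from google.protobuf import wrappers_pb2
from google.protobuf.message import DecodeError


def parse_or_none(data: bytes) -> Optional[wrappers_pb2.StringValue]:
    """Return the parsed message, or None if the bytes are not valid Protobuf."""
    message = wrappers_pb2.StringValue()
    try:
        message.ParseFromString(data)
    except DecodeError as exc:
        print(f"Rejected corrupt payload: {exc}")
        return None
    return message


good = wrappers_pb2.StringValue(value="hello world").SerializeToString()
truncated = good[:-1]  # drop the last byte to simulate a cut-off transfer

print(parse_or_none(good) is not None)  # True
print(parse_or_none(truncated) is None)  # True
```

Note that a successful parse only means the bytes are structurally valid. Bytes produced from a different schema can still parse without error, so application-level validation remains useful.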

Next steps

  • Official docs: Protobuf Python
  • gRPC tutorial: integrate these messages into RPC services.
  • Advanced tools: Buf CLI for automated builds, or protobuf-es for web.
  • Learni Training: master gRPC and microservices in depth with our experts.