How to Ensure Data Quality with Great Expectations in 2026

Introduction

Data quality has become a strategic priority for any data-driven organization. With the explosion of data sources and the growing use of automated pipelines, silent errors can prove extremely costly. Great Expectations enables you to define explicit data contracts and validate them automatically. This expert tutorial walks you through a complete production-ready implementation, including checkpoints, actions, and CI/CD integration. You will learn how to move from manual checks to a reliable, maintainable governance system.

Prerequisites

  • Python 3.10+
  • Advanced knowledge of pandas and SQL
  • Access to a PostgreSQL or Snowflake database
  • Docker (optional for the test environment)

Initialize the GE Project

terminal
python -m venv venv
source venv/bin/activate
pip install great-expectations==0.18.12 pandas sqlalchemy psycopg2-binary
mkdir data_quality_tutorial && cd data_quality_tutorial
great_expectations init

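The install pins Great Expectations 0.18.12 because this tutorial uses the pre-1.0 API and the great_expectations command-line tool, both of which changed substantially in the 1.0 release. The init command creates the project directory structure, including the expectations, checkpoints, and plugins folders.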

Datasource Configuration

gx/great_expectations.yml
datasources:
  postgres_source:
    class_name: Datasource
    execution_engine:
      class_name: SqlAlchemyExecutionEngine
      connection_string: postgresql://user:password@localhost:5432/analytics
    data_connectors:
      default_runtime_data_connector:
        class_name: RuntimeDataConnector
        batch_identifiers:
          - batch_id

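Declare your PostgreSQL data source in the main configuration file, replacing the placeholder credentials in the connection string with your own. The RuntimeDataConnector lets you supply the query at validation time, which suits dynamic validations in production. Every batch request must then provide a value for the batch_id identifier declared here.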
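
Hard-coded credentials are easy to leak through version control. As a minimal sketch, the connection string can be assembled from environment variables instead. The variable names below (PGUSER, PGPASSWORD, PGHOST, PGPORT, PGDATABASE) and the helper name are assumptions made for this illustration, not part of the tutorial's setup.

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(env=os.environ):
    """Assemble a SQLAlchemy PostgreSQL URL from environment variables.

    The variable names are assumptions for this sketch; adapt them to
    whatever your deployment already exports.
    """
    # Percent-encode user and password so characters such as "@" or "/"
    # cannot break the URL structure.
    user = quote_plus(env["PGUSER"])
    password = quote_plus(env["PGPASSWORD"])
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = env.get("PGDATABASE", "analytics")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

Great Expectations also substitutes ${VARIABLE} references in its configuration files from the environment, so the resulting URL could be exported under a name of your choosing and referenced from the YAML rather than written out in full.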

Create the Expectation Suite

create_expectation_suite.py
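import great_expectations as gx
from great_expectations.core.batch import RuntimeBatchRequest

context = gx.get_context()

# Create the suite, replacing any existing suite with the same name.
context.add_or_update_expectation_suite(expectation_suite_name="raw_sales_suite")

# A RuntimeDataConnector needs a query plus a value for each declared batch identifier.
batch_request = RuntimeBatchRequest(
    datasource_name="postgres_source",
    data_connector_name="default_runtime_data_connector",
    data_asset_name="sales",
    runtime_parameters={"query": "SELECT * FROM raw.sales"},
    batch_identifiers={"batch_id": "suite_authoring"},
)
validator = context.get_validator(batch_request=batch_request, expectation_suite_name="raw_sales_suite")

# Rules on critical columns: the key must exist, be populated, and be unique;
# amounts must not be negative.
validator.expect_column_to_exist("order_id")
validator.expect_column_values_to_not_be_null("order_id")
validator.expect_column_values_to_be_unique("order_id")
validator.expect_column_values_to_be_between("amount", min_value=0)

# Keep every expectation in the saved suite, including any that fail on this batch.
validator.save_expectation_suite(discard_failed_expectations=False)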

Create a business expectation suite with strict rules on critical columns. Each expectation is persisted for reuse across pipelines.
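
To make the four rules concrete, here is a plain-Python sketch of what they check, applied to rows represented as dictionaries. It is only an illustration of the semantics, not how Great Expectations evaluates them; the function name and row format are assumptions. As in the library, the uniqueness and range checks skip null values, which are covered by the separate not-null rule.

```python
def check_sales_rows(rows):
    """Return {rule: passed} for a list of dict rows (illustrative only)."""
    order_ids = [row.get("order_id") for row in rows]
    non_null_ids = [value for value in order_ids if value is not None]
    # Null amounts are ignored by the range rule.
    amounts = [row["amount"] for row in rows if row.get("amount") is not None]
    return {
        "order_id exists": all("order_id" in row for row in rows),
        "order_id not null": len(non_null_ids) == len(order_ids),
        "order_id unique": len(set(non_null_ids)) == len(non_null_ids),
        "amount >= 0": all(amount >= 0 for amount in amounts),
    }


clean = [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": 0.0}]
dirty = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 1, "amount": -5.0},
    {"order_id": None, "amount": None},
]
clean_report = check_sales_rows(clean)
dirty_report = check_sales_rows(dirty)
```

Here every rule passes for the clean rows, while the dirty rows fail the not-null, uniqueness, and range rules but still pass the column-existence rule.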

Create the Advanced Checkpoint

create_checkpoint.py
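import great_expectations as gx

context = gx.get_context()

# Settings are passed as keyword arguments, and the call itself persists the
# checkpoint. The generic Checkpoint class is used so the explicit action list applies.
context.add_or_update_checkpoint(
    name="sales_quality_checkpoint",
    config_version=1.0,
    class_name="Checkpoint",
    expectation_suite_name="raw_sales_suite",
    action_list=[
        {"name": "store_validation_result", "action": {"class_name": "StoreValidationResultAction"}},
        {"name": "update_data_docs", "action": {"class_name": "UpdateDataDocsAction"}},
        {
            "name": "notify_slack",
            "action": {
                "class_name": "SlackNotificationAction",
                "slack_webhook": "${SLACK_WEBHOOK}",
                # The Slack action requires a renderer to format the message.
                "renderer": {
                    "module_name": "great_expectations.render.renderer.slack_renderer",
                    "class_name": "SlackRenderer",
                },
            },
        },
    ],
)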

Configure a checkpoint with multiple actions including Slack notifications. This enables immediate response when validation fails.
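
Slack notification actions in Great Expectations accept a notify_on setting with the values "all", "failure", or "success", which controls when a message is sent. The following sketch mirrors that decision in plain Python to show the effect of each value. The function is hypothetical and is not library code.

```python
VALID_NOTIFY_ON = {"all", "failure", "success"}


def should_notify(validation_success, notify_on="all"):
    """Decide whether a notification would be sent (illustrative only)."""
    if notify_on not in VALID_NOTIFY_ON:
        raise ValueError(f"notify_on must be one of {sorted(VALID_NOTIFY_ON)}")
    if notify_on == "all":
        return True
    if notify_on == "failure":
        return not validation_success
    # Remaining case: notify_on == "success".
    return validation_success
```

With "failure", a channel only receives messages for runs that need attention, which tends to keep alerts meaningful once a pipeline runs daily.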

Run the Validation

run_validation.py
from datetime import datetime

import great_expectations as gx

context = gx.get_context()
run_date = datetime.now().strftime("%Y%m%d")

result = context.run_checkpoint(
    checkpoint_name="sales_quality_checkpoint",
    batch_request={
        "datasource_name": "postgres_source",
        "data_connector_name": "default_runtime_data_connector",
        "data_asset_name": "sales",
        "runtime_parameters": {"query": "SELECT * FROM raw.sales WHERE date = CURRENT_DATE"},
        # Required by the RuntimeDataConnector: one value per declared batch identifier.
        "batch_identifiers": {"batch_id": run_date},
    },
    run_name=f"daily_validation_{run_date}",
)

# True only if every expectation in the suite passed.
print(result.success)

Run the checkpoint with a runtime query that selects only today's rows. The result contains the overall status in result.success as well as per-expectation details.
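
In a CI/CD job, the overall status usually needs to become a process exit code, and the failing expectations are worth printing. The sketch below works on a plain dictionary and assumes it has the shape produced by calling to_json_dict() on the checkpoint result (a top-level "success" flag and a "run_results" mapping whose entries hold a "validation_result"). Treat that shape, and both helper names, as assumptions to verify against your installed version.

```python
def failed_expectations(checkpoint_result):
    """List (expectation_type, column) pairs for every failed expectation."""
    failures = []
    for run in checkpoint_result.get("run_results", {}).values():
        for item in run["validation_result"]["results"]:
            if not item["success"]:
                config = item["expectation_config"]
                # Table-level expectations have no "column" kwarg, hence .get().
                failures.append((config["expectation_type"], config["kwargs"].get("column")))
    return failures


def exit_code(checkpoint_result):
    """Return 0 when the checkpoint passed and 1 otherwise."""
    return 0 if checkpoint_result.get("success") else 1


# Hand-built sample in the assumed shape, used only to exercise the helpers.
sample = {
    "success": False,
    "run_results": {
        "run-1": {
            "validation_result": {
                "success": False,
                "results": [
                    {
                        "success": True,
                        "expectation_config": {
                            "expectation_type": "expect_column_to_exist",
                            "kwargs": {"column": "order_id"},
                        },
                    },
                    {
                        "success": False,
                        "expectation_config": {
                            "expectation_type": "expect_column_values_to_be_unique",
                            "kwargs": {"column": "order_id"},
                        },
                    },
                ],
            }
        }
    },
}
sample_failures = failed_expectations(sample)
```

At the end of a validation script, passing exit_code(result.to_json_dict()) to sys.exit would make the pipeline step fail whenever the checkpoint does.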

Best Practices

  • Version your expectation suites in Git like source code
  • Use custom expectations for specific business rules
  • Start with tolerance thresholds (for example the mostly parameter) rather than blocking validations, then tighten them over time
  • Integrate validations into your Airflow or dbt pipelines
  • Document each expectation with clear business context
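
The tolerance-threshold practice above maps to the mostly parameter that column expectations accept: the expectation passes when at least that fraction of non-null values satisfies the rule. The sketch below reproduces that arithmetic in plain Python. It is an illustration with a hypothetical function name, not the library's implementation.

```python
def passes_with_tolerance(values, predicate, mostly=1.0):
    """Return True if at least `mostly` of the non-null values satisfy predicate."""
    non_null = [value for value in values if value is not None]
    if not non_null:
        # Nothing to evaluate, so the rule is trivially satisfied.
        return True
    passing = sum(1 for value in non_null if predicate(value))
    return passing / len(non_null) >= mostly


amounts = [10, 20, -1, 30, None]  # 3 of the 4 non-null values are non-negative


def is_non_negative(amount):
    return amount >= 0


lenient = passes_with_tolerance(amounts, is_non_negative, mostly=0.75)
strict = passes_with_tolerance(amounts, is_non_negative, mostly=1.0)
```

With these values the lenient check passes and the strict one fails. In a suite, the equivalent would be passing mostly=0.75 to an expectation such as expect_column_values_to_be_between, then raising the threshold as data quality improves.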

Common Mistakes to Avoid

  • Forgetting to handle large batches that crash validations
  • Not configuring notification actions in production
  • Using non-descriptive suite names
  • Ignoring deprecation warnings during Great Expectations updates
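
One way to keep batches small is to validate one day at a time rather than a whole date range in a single query. The generator below is a sketch of that idea: it yields one runtime batch request dictionary per day, reusing the datasource, connector, and asset names from this tutorial. The function name, the default table and column names, and the choice of daily slices are assumptions; the table and column must come from trusted code, since they are interpolated into the SQL text.

```python
from datetime import date, timedelta


def daily_batch_requests(start, end, table="raw.sales", date_column="date"):
    """Yield one runtime batch request dict per day in [start, end]."""
    day = start
    while day <= end:
        yield {
            "datasource_name": "postgres_source",
            "data_connector_name": "default_runtime_data_connector",
            "data_asset_name": "sales",
            "runtime_parameters": {
                "query": f"SELECT * FROM {table} WHERE {date_column} = DATE '{day.isoformat()}'"
            },
            # One identifier value per batch, matching the declared batch_id.
            "batch_identifiers": {"batch_id": day.strftime("%Y%m%d")},
        }
        day += timedelta(days=1)


requests = list(daily_batch_requests(date(2026, 1, 30), date(2026, 2, 1)))
```

Each dictionary could then be passed as the batch_request of a separate checkpoint run, so a failure or memory spike is confined to a single day's data.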
