Introduction
Data quality has become a strategic priority for any data-driven organization. With the explosion of data sources and the growing use of automated pipelines, silent errors can prove extremely costly. Great Expectations enables you to define explicit data contracts and validate them automatically. This expert tutorial walks you through a complete production-ready implementation, including checkpoints, actions, and CI/CD integration. You will learn how to move from manual checks to a reliable, maintainable governance system.
Prerequisites
- Python 3.10+
- Advanced knowledge of pandas and SQL
- Access to a PostgreSQL or Snowflake database
- Docker (optional for the test environment)
Initialize the GE Project
python -m venv venv
source venv/bin/activate
pip install great-expectations==0.18.12 pandas sqlalchemy psycopg2-binary
mkdir data_quality_tutorial && cd data_quality_tutorial
great_expectations init

Initialize a Great Expectations project, pinning version 0.18.12 for compatibility. The init command creates the expected directory structure, including expectations, checkpoints, and plugins.
Datasource Configuration
datasources:
  postgres_source:
    class_name: Datasource
    execution_engine:
      class_name: SqlAlchemyExecutionEngine
      connection_string: postgresql://user:password@localhost:5432/analytics
    data_connectors:
      default_runtime_data_connector:
        class_name: RuntimeDataConnector
        batch_identifiers:
          - batch_id

Declare your PostgreSQL data source in the main configuration file. Use RuntimeDataConnector for dynamic validations in production.
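The connection string above embeds the password in the configuration file. One way to avoid that is to assemble the URL from environment variables before handing it to the datasource. The following is a minimal sketch under that assumption; the helper name `build_postgres_url` and the variable names `PGUSER`, `PGPASSWORD`, `PGHOST`, `PGPORT`, and `PGDATABASE` are illustrative choices, not part of the original setup.

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(env=os.environ):
    """Assemble a SQLAlchemy-style PostgreSQL URL from environment variables.

    The variable names are assumptions made for this sketch.
    """
    user = quote_plus(env["PGUSER"])
    # Escape characters such as '@' or '/' that would otherwise break the URL
    password = quote_plus(env["PGPASSWORD"])
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = env.get("PGDATABASE", "analytics")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Demo with an in-memory mapping instead of the real environment
demo_env = {"PGUSER": "user", "PGPASSWORD": "p@ss/word"}
print(build_postgres_url(demo_env))
```

Passing the mapping as an argument keeps the function easy to test without touching the real environment.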
Create the Expectation Suite
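import great_expectations as gx
from great_expectations.core.batch import RuntimeBatchRequest

context = gx.get_context()
# Idempotent: creates the suite, or replaces an existing suite with the same name
context.add_or_update_expectation_suite(expectation_suite_name="raw_sales_suite")

# A RuntimeDataConnector needs the data itself (here a query) and a value
# for every batch identifier declared in the datasource configuration
batch_request = RuntimeBatchRequest(
    datasource_name="postgres_source",
    data_connector_name="default_runtime_data_connector",
    data_asset_name="sales",
    runtime_parameters={"query": "SELECT * FROM raw.sales"},
    batch_identifiers={"batch_id": "suite_authoring"},  # illustrative label
)
validator = context.get_validator(batch_request=batch_request, expectation_suite_name="raw_sales_suite")

validator.expect_column_to_exist("order_id")
validator.expect_column_values_to_not_be_null("order_id")
validator.expect_column_values_to_be_unique("order_id")
validator.expect_column_values_to_be_between("amount", min_value=0)
validator.save_expectation_suite(discard_failed_expectations=False)

Create a business expectation suite with strict rules on critical columns. Each expectation is persisted for reuse across pipelines.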
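Because the suite is persisted, it can be inspected outside Great Expectations, for instance when reviewing a change in Git. The sketch below assumes the default filesystem store, where a suite is a JSON document with an `expectations` list whose entries carry `expectation_type` and `kwargs`. The sample document and the helper `summarize_suite` are hypothetical and only illustrate that shape.

```python
import json

# Hand-written sample in the assumed shape of a stored suite
SAMPLE_SUITE_JSON = """
{
  "expectation_suite_name": "raw_sales_suite",
  "expectations": [
    {"expectation_type": "expect_column_to_exist", "kwargs": {"column": "order_id"}, "meta": {}},
    {"expectation_type": "expect_column_values_to_be_between", "kwargs": {"column": "amount", "min_value": 0}, "meta": {}}
  ]
}
"""


def summarize_suite(suite_json: str) -> list[str]:
    """Return one readable line per expectation in a suite JSON document."""
    suite = json.loads(suite_json)
    lines = []
    for expectation in suite.get("expectations", []):
        kwargs = expectation.get("kwargs", {})
        column = kwargs.get("column", "<table>")
        # Everything except the column name is shown as the rule's parameters
        extras = {k: v for k, v in kwargs.items() if k != "column"}
        line = f"{column}: {expectation['expectation_type']}"
        if extras:
            line += f" {json.dumps(extras, sort_keys=True)}"
        lines.append(line)
    return lines


for line in summarize_suite(SAMPLE_SUITE_JSON):
    print(line)
```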
Create the Advanced Checkpoint
import great_expectations as gx

context = gx.get_context()
# add_or_update_checkpoint persists the checkpoint, so no separate save call is needed
checkpoint = context.add_or_update_checkpoint(
    name="sales_quality_checkpoint",
    class_name="Checkpoint",
    expectation_suite_name="raw_sales_suite",
    action_list=[
        {"name": "store_validation_result", "action": {"class_name": "StoreValidationResultAction"}},
        {"name": "update_data_docs", "action": {"class_name": "UpdateDataDocsAction"}},
        {
            "name": "notify_slack",
            "action": {
                "class_name": "SlackNotificationAction",
                "slack_webhook": "${SLACK_WEBHOOK}",
                "notify_on": "all",  # or "failure" to alert only on failed runs
                "renderer": {
                    "module_name": "great_expectations.render.renderer.slack_renderer",
                    "class_name": "SlackRenderer",
                },
            },
        },
    ],
)

Configure a checkpoint with multiple actions, including Slack notifications. This enables an immediate response when validation fails.
Run the Validation
import great_expectations as gx
from datetime import datetime

context = gx.get_context()
run_date = datetime.now().strftime("%Y%m%d")
result = context.run_checkpoint(
    checkpoint_name="sales_quality_checkpoint",
    batch_request={
        "datasource_name": "postgres_source",
        "data_connector_name": "default_runtime_data_connector",
        "data_asset_name": "sales",
        "runtime_parameters": {"query": "SELECT * FROM raw.sales WHERE date = CURRENT_DATE"},
        # Required by the RuntimeDataConnector: one value per declared batch identifier
        "batch_identifiers": {"batch_id": run_date},
    },
    run_name=f"daily_validation_{run_date}",
)
print(result.success)

Run the validation with a runtime query that targets today's data. The result contains the overall status and per-expectation details.
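To act on the per-expectation details, for example in a CI job, the failed expectations can be pulled out of the result. The sketch below works on a plain dictionary and assumes the nesting `run_results` → `validation_result` → `results`, which is how a checkpoint result is expected to serialize in this version. Treat that shape, the sample data, and the helper `failed_expectations` as assumptions to verify against your own output.

```python
def failed_expectations(result_dict: dict) -> list[str]:
    """List the failed expectations found in a serialized checkpoint result."""
    failures = []
    for run in result_dict.get("run_results", {}).values():
        validation = run.get("validation_result", {})
        for item in validation.get("results", []):
            if not item.get("success", False):
                config = item.get("expectation_config", {})
                column = config.get("kwargs", {}).get("column", "<table>")
                failures.append(f"{config.get('expectation_type', 'unknown')} on {column}")
    return failures


# Hand-written sample in the assumed shape, not real output
sample_result = {
    "success": False,
    "run_results": {
        "run-1": {
            "validation_result": {
                "success": False,
                "results": [
                    {"success": True, "expectation_config": {
                        "expectation_type": "expect_column_to_exist",
                        "kwargs": {"column": "order_id"}}},
                    {"success": False, "expectation_config": {
                        "expectation_type": "expect_column_values_to_be_between",
                        "kwargs": {"column": "amount", "min_value": 0}}},
                ],
            }
        }
    },
}

for failure in failed_expectations(sample_result):
    print(failure)
```

In a pipeline, a non-empty list (or `result.success` being false) would typically translate into a non-zero exit code so that the job fails visibly.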
Best Practices
- Version your expectation suites in Git like source code
- Use custom expectations for specific business rules
- Start with tolerance thresholds rather than blocking validations, then tighten them over time
- Integrate validations into your Airflow or dbt pipelines
- Document each expectation with clear business context
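The tolerance-threshold advice can be illustrated in plain Python. Column-level expectations in Great Expectations accept a `mostly` argument that sets the fraction of values that must pass. The sketch below only mimics that idea and is not the library's implementation. The helper name `passes_with_tolerance`, the choice to ignore nulls, and the choice to pass on an empty column are assumptions of this sketch.

```python
def passes_with_tolerance(values, predicate, mostly=1.0):
    """Return True when at least `mostly` of the non-null values satisfy predicate."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        # Assumption: nothing to check counts as a pass
        return True
    passing = sum(1 for v in non_null if predicate(v))
    return passing / len(non_null) >= mostly


amounts = [10, 25, -3, 40, None, 15, 8, 12, 30, 5, 7]  # one negative, one null

# Nine of the ten non-null amounts are non-negative
print(passes_with_tolerance(amounts, lambda v: v >= 0, mostly=0.8))
print(passes_with_tolerance(amounts, lambda v: v >= 0, mostly=0.95))
```

With the real library, the equivalent would be passing `mostly` to an expectation such as `expect_column_values_to_be_between`, starting with a lenient value and raising it as data quality improves.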
Common Mistakes to Avoid
- Forgetting to handle large batches that crash validations
- Not configuring notification actions in production
- Using non-descriptive suite names
- Ignoring deprecation warnings during Great Expectations updates
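For the large-batch problem, one possible mitigation is to validate a long period as a series of smaller daily batches instead of a single query. The sketch below only generates the per-day queries. The helper `daily_queries` is hypothetical, and the table `raw.sales` and its `date` column are taken from the validation step above.

```python
from datetime import date, timedelta


def daily_queries(start: date, end: date, table: str = "raw.sales"):
    """Yield (batch_id, query) pairs, one per day from start to end inclusive."""
    current = start
    while current <= end:
        day = current.isoformat()
        yield day, f"SELECT * FROM {table} WHERE date = DATE '{day}'"
        current += timedelta(days=1)


for batch_id, query in daily_queries(date(2024, 1, 30), date(2024, 2, 1)):
    print(batch_id, query)
```

Each pair could then feed one checkpoint run, with the query in `runtime_parameters` and the day in `batch_identifiers`, so that a failure on one day does not require reloading the whole period.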