How to Create an ETL Job with AWS Glue in 2026

Introduction

AWS Glue is a serverless ETL (extract, transform, load) service from AWS. It lets you catalog your data and run transformation jobs without managing any infrastructure, which makes it a practical tool for data engineers who want to automate their pipelines. This tutorial walks you step by step through creating a working ETL job: an IAM role, a crawler that catalogs CSV files in S3, a Python script that converts them to Parquet, and the Glue job that runs it.

Prerequisites

AWS account with Glue and S3 permissions
AWS CLI installed and configured
Basic knowledge of Python
An S3 bucket containing CSV data

IAM Configuration

glue-role-policy.json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket",
        "glue:*"
      ],
      "Resource": "*"
    }
  ]
}

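This IAM policy lets the Glue role read and write objects in S3 and manage Data Catalog metadata. It is a broad starting point rather than a minimal policy: "glue:*" on every resource grants far more than a job needs, so in production restrict Resource to your bucket ARNs. The role also needs a trust policy that allows glue.amazonaws.com to assume it, plus CloudWatch Logs permissions if you want job logs; attaching the AWS managed policy AWSGlueServiceRole covers the logging part.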
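If you prefer to create the role from Python rather than the console, the sketch below uses boto3 (an assumption; the tutorial itself only uses the AWS CLI). It builds the trust policy that lets glue.amazonaws.com assume the role, attaches glue-role-policy.json as an inline policy, and attaches the AWS managed AWSGlueServiceRole policy for logging. The role name GlueServiceRole matches the ARN used in the CLI commands; the inline policy name glue-role-policy is hypothetical.

```python
import json

# Role name taken from the ARN used in this tutorial's CLI commands.
ROLE_NAME = "GlueServiceRole"

# AWS managed policy with Glue service permissions, including CloudWatch Logs.
MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"

# Trust policy: allows the AWS Glue service to assume this role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def load_policy(path="glue-role-policy.json"):
    """Read the permissions policy document from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def create_glue_role(permissions_policy, role_name=ROLE_NAME):
    """Create the role, attach the inline and managed policies, return the role ARN."""
    import boto3  # imported here so the policy helpers work without boto3 installed

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    # Hypothetical inline policy name; any name unique within the role works.
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="glue-role-policy",
        PolicyDocument=json.dumps(permissions_policy),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=MANAGED_POLICY_ARN)
    return role["Role"]["Arn"]
```

Calling create_glue_role(load_policy()) with valid credentials should return an ARN of the form arn:aws:iam::<account-id>:role/GlueServiceRole.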

Creating the Crawler

terminal

aws glue create-crawler \
  --name mon-premier-crawler \
  --role arn:aws:iam::123456789012:role/GlueServiceRole \
  --database-name glue-demo-db \
  --targets '{"S3Targets":[{"Path":"s3://mon-bucket-donnees/raw/"}]}' \
  --table-prefix demo_

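This CLI command creates a crawler that, when run, scans the CSV files under s3://mon-bucket-donnees/raw/, infers their schema, and writes a table definition (prefixed with demo_) to the glue-demo-db database in the Glue Data Catalog. Creating a crawler does not run it; start it with: aws glue start-crawler --name mon-premier-crawler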
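Because creating a crawler does not run it, a provisioning script has to start the crawler and wait for it. This sketch assumes boto3 and the names from the command above: it polls get_crawler until the crawler is back in the READY state, then lists the table names in glue-demo-db so you can confirm the exact name to use in the ETL script. The polling interval and timeout are arbitrary assumptions.

```python
import time


def wait_for_crawler(get_state, poll_seconds=15.0, timeout_seconds=1800.0, sleep=time.sleep):
    """Sleep, then check get_state(), until the crawler reports READY.

    Returns the number of seconds waited; raises TimeoutError past the limit.
    """
    waited = 0.0
    while waited < timeout_seconds:
        sleep(poll_seconds)
        waited += poll_seconds
        if get_state() == "READY":
            return waited
    raise TimeoutError(f"crawler not READY after {waited:.0f}s")


def run_crawler_and_list_tables(crawler="mon-premier-crawler", database="glue-demo-db"):
    """Start the crawler, wait for it, and return the table names it produced."""
    import boto3  # imported here so wait_for_crawler() can be tested without AWS

    glue = boto3.client("glue")
    glue.start_crawler(Name=crawler)
    wait_for_crawler(lambda: glue.get_crawler(Name=crawler)["Crawler"]["State"])
    # Only the first page of results; enough for a small demo database.
    tables = glue.get_tables(DatabaseName=database)["TableList"]
    return [table["Name"] for table in tables]
```

Passing the state lookup in as a function keeps the polling loop independent of AWS, so it can be exercised with a fake sequence of states.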

Python ETL Script

etl_job.py

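import sys

from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Glue passes --JOB_NAME (plus any default arguments) on the command line.
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the table created by the crawler. Crawlers name tables after the S3
# folder plus the "demo_" prefix, so check the actual name in the catalog.
# transformation_ctx lets job bookmarks track which data was already read.
datasource = glueContext.create_dynamic_frame.from_catalog(
    database="glue-demo-db",
    table_name="demo_raw_data",
    transformation_ctx="datasource"
)

# Each tuple is (source column, source type, target column, target type).
# Source types must match the catalog schema (crawlers often infer bigint
# for integer CSV columns). Columns not listed are dropped.
transformed = ApplyMapping.apply(
    frame=datasource,
    mappings=[("id", "int", "id", "int"), ("name", "string", "nom", "string")],
    transformation_ctx="transformed"
)

# Write the result to S3 as Parquet.
glueContext.write_dynamic_frame.from_options(
    frame=transformed,
    connection_type="s3",
    connection_options={"path": "s3://mon-bucket-donnees/processed/"},
    format="parquet",
    transformation_ctx="sink"
)

# Commit so that bookmarks record this run's progress.
job.commit()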

This script reads the crawled table through the Data Catalog, renames the name column to nom with ApplyMapping, and writes the result to S3 in Parquet format.
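To see what the mappings list does without a Glue environment, here is a rough pure-Python approximation of ApplyMapping's behavior (not Glue's implementation): each tuple is (source column, source type, target column, target type), listed columns are renamed and cast, and unlisted columns are dropped. The small type-to-cast table is a simplifying assumption; Glue's own type handling is richer.

```python
# Same mapping list as in etl_job.py: (source, source_type, target, target_type).
MAPPINGS = [("id", "int", "id", "int"), ("name", "string", "nom", "string")]

# Simplified casts for a few Glue type names (assumption, not Glue's rules).
CASTS = {"int": int, "long": int, "bigint": int, "double": float, "string": str}


def apply_mapping_locally(rows, mappings):
    """Rename and cast mapped columns; drop every column not in the mappings."""
    result = []
    for row in rows:
        mapped = {}
        for source, _source_type, target, target_type in mappings:
            value = row.get(source)
            mapped[target] = None if value is None else CASTS[target_type](value)
        result.append(mapped)
    return result


rows = [{"id": "1", "name": "Ada", "country": "UK"}]
print(apply_mapping_locally(rows, MAPPINGS))  # [{'id': 1, 'nom': 'Ada'}]
```

A helper like this is handy for unit-testing a long mapping list before paying for a Glue run.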

Creating the Glue Job

terminal

aws glue create-job \
  --name mon-job-etl \
  --role arn:aws:iam::123456789012:role/GlueServiceRole \
  --command Name=glueetl,ScriptLocation=s3://mon-bucket-scripts/etl_job.py,PythonVersion=3 \
  --glue-version 4.0 \
  --worker-type G.1X \
  --number-of-workers 2 \
  --default-arguments '{"--job-language":"python","--job-bookmark-option":"job-bookmark-enable"}'

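This command registers the ETL job in AWS Glue and points it at the Python script stored in S3. Upload the script there first, for example with: aws s3 cp etl_job.py s3://mon-bucket-scripts/etl_job.py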
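The upload and job registration can also be scripted with boto3 (an assumption; the tutorial uses the CLI). This sketch mirrors the create-job command: it uploads etl_job.py with the S3 upload_file call, then passes the same name, role, script location, and Glue version to create_job. The G.1X worker type with two workers and the --job-bookmark-option argument are added assumptions, chosen to keep a demo job small and to turn on bookmarks.

```python
# Values taken from the CLI commands in this tutorial.
JOB_NAME = "mon-job-etl"
ROLE_ARN = "arn:aws:iam::123456789012:role/GlueServiceRole"
SCRIPT_BUCKET = "mon-bucket-scripts"
SCRIPT_KEY = "etl_job.py"


def job_definition(job_name=JOB_NAME, role_arn=ROLE_ARN, bucket=SCRIPT_BUCKET, key=SCRIPT_KEY):
    """Keyword arguments for glue.create_job, mirroring the CLI command."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": f"s3://{bucket}/{key}",
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",
        # Assumption: a small fixed cluster is enough for a demo dataset.
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
        "DefaultArguments": {
            "--job-language": "python",
            # Assumption: enable bookmarks so reruns skip already-processed data.
            "--job-bookmark-option": "job-bookmark-enable",
        },
    }


def deploy(local_script="etl_job.py"):
    """Upload the script to S3, then register the job in Glue."""
    import boto3  # imported here so job_definition() works without boto3 installed

    boto3.client("s3").upload_file(local_script, SCRIPT_BUCKET, SCRIPT_KEY)
    return boto3.client("glue").create_job(**job_definition())
```

Keeping the job definition in a plain dictionary makes it easy to review or diff before anything is sent to AWS.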

Running the Job

terminal

aws glue start-job-run --job-name mon-job-etl

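This starts a job run and returns its JobRunId. Monitor progress in the AWS Glue console, with aws glue get-job-run --job-name mon-job-etl --run-id <JobRunId>, or in the CloudWatch Logs groups /aws-glue/jobs/output and /aws-glue/jobs/error.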
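To wait for a run from a script instead of watching the console, you can poll its state. This sketch assumes boto3 and the job name used above: it calls start_job_run, then get_job_run until JobRunState reaches a terminal value. The set of terminal states and the 30-second interval are assumptions based on the documented JobRunState values.

```python
import time

# JobRunState values after which a run no longer changes (assumption).
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def wait_for_job_run(get_state, poll_seconds=30.0, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state, then return that state."""
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)


def run_job(job_name="mon-job-etl"):
    """Start a run of the job and block until it finishes."""
    import boto3  # imported here so wait_for_job_run() can be tested without AWS

    glue = boto3.client("glue")
    run_id = glue.start_job_run(JobName=job_name)["JobRunId"]
    state = wait_for_job_run(
        lambda: glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]["JobRunState"]
    )
    return run_id, state
```

When a run ends in FAILED, the JobRun object returned by get_job_run usually includes an ErrorMessage field worth printing.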

Best Practices

Always use the data catalog to avoid hard-coded schemas
Prefer the Parquet format for transformed data
Enable job bookmarks (--job-bookmark-option job-bookmark-enable) so each run processes only new data
Monitor costs with resource tags and CloudWatch alarms
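For cost tracking, tags can be attached to the job after it is created. This sketch assumes boto3, builds the job ARN in the arn:aws:glue:<region>:<account-id>:job/<name> format, and calls tag_resource. The region eu-west-1 and the tag keys are hypothetical; the account ID is the placeholder used throughout this tutorial.

```python
def glue_job_arn(region, account_id, job_name):
    """Build the ARN that identifies a Glue job for tagging."""
    return f"arn:aws:glue:{region}:{account_id}:job/{job_name}"


def tag_job(job_name, tags, region="eu-west-1", account_id="123456789012"):
    """Attach tags to the job; region and account ID are placeholders."""
    import boto3  # imported here so glue_job_arn() works without boto3 installed

    glue = boto3.client("glue", region_name=region)
    glue.tag_resource(
        ResourceArn=glue_job_arn(region, account_id, job_name),
        TagsToAdd=tags,
    )


# Hypothetical tag keys and values:
# tag_job("mon-job-etl", {"project": "glue-demo", "cost-center": "data"})
```

User-defined tags only appear in Cost Explorer after they are activated as cost allocation tags in the Billing console.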

Common Errors to Avoid

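Forgetting to grant the Glue role the right IAM permissions, or to give it a trust policy for glue.amazonaws.com
Not enabling job bookmarks on incremental S3 sources, so every run reprocesses all files
Using outdated Glue versions (prefer 4.0 or later)
Ignoring CloudWatch logs when debugging failed runs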
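When a run fails, its stderr output goes by default to the /aws-glue/jobs/error log group. This sketch assumes boto3 and that the log stream is named after the JobRunId, which you should confirm in the CloudWatch console (continuous logging writes to a different group). It fetches the first page of events with get_log_events and keeps lines containing typical Python or Spark error markers; the marker list is an arbitrary assumption.

```python
# Substrings that usually flag the interesting lines (assumption).
ERROR_MARKERS = ("Traceback", "Exception", "Error")


def error_lines(messages):
    """Keep only log messages that contain one of the error markers."""
    return [m for m in messages if any(marker in m for marker in ERROR_MARKERS)]


def fetch_error_log(run_id, log_group="/aws-glue/jobs/error"):
    """Return the messages from the first page of the run's error log stream."""
    import boto3  # imported here so error_lines() can be tested without AWS

    logs = boto3.client("logs")
    response = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=run_id,  # assumption: stream named after the JobRunId
        startFromHead=True,
    )
    return [event["message"] for event in response["events"]]
```

A typical use is print("\n".join(error_lines(fetch_error_log(run_id)))) right after a run ends in FAILED.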

To Go Further

Deepen your ETL skills with our Learni training courses dedicated to AWS and modern data pipelines.

Recommended Learni Training Courses

AWS CLI Training - Automating Advanced Cloud Tasks

AWS Database Specialty DBS-C01 Training - Obtain Your Certification in 3 Days, May 2026

AWS Expert Training - Scalable Secure Cloud Architectures

AWS Intermediate Training - Manage and Scale Your Clouds Effectively

AWS Lambda Training - Master Serverless to Scale Effectively

AWS Machine Learning Specialty MLS-C01 Training - Obtain Your Certification in 3 Days April 2026

AWS Secrets Manager Training - Securing Secrets in Advanced Production

AWS Security Specialty SCS-C02 Training - Obtain Your Certification in 3 Days, April 2026

AWS Solutions Architect Professional SAP-C02 Training - Get Your Certification in 5 Days, April 2026
