Introduction
AWS Step Functions lets you orchestrate complex, distributed serverless workflows. In 2026, event-driven architectures require fine-grained state management, retries, and parallelism. This expert tutorial takes you from the ASL definition to a CDK deployment and integrates advanced patterns such as callbacks and compensation. Each step includes working code that you can adapt for production.
Prerequisites
- AWS account with IAM permissions to create Lambda functions, Step Functions state machines, SNS topics, and IAM roles
- Node.js 20+ and AWS CDK v2
- Strong knowledge of TypeScript and Python
- AWS CLI configured (v2)
Create the Lambda Functions
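import json

class InvalidOrder(Exception):
    """Step Functions matches ErrorEquals against this class name."""

def lambda_handler(event, context):
    order = event.get('order', {})
    if not order.get('id'):
        raise InvalidOrder('Order is missing an id')
    return {'status': 'validated', 'order': order}

This Lambda validates an order and raises a named exception, InvalidOrder, that Retry and Catch rules in Step Functions can match. The error name is the exception's class name, not its message, so a plain Exception('InvalidOrder') would be reported as "Exception".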
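Step Functions matches ErrorEquals against the error name that Lambda reports, and for Python handlers that name is the exception's class name rather than its message. The sketch below illustrates the difference locally. `reported_error_name` is a hypothetical helper that mimics how the name is derived; it is not part of any AWS SDK.

```python
class InvalidOrder(Exception):
    """Custom exception whose class name becomes the Step Functions error name."""


def reported_error_name(exc: BaseException) -> str:
    # Hypothetical helper: mimics the Python Lambda runtime, which reports
    # the exception class name as the error type.
    return type(exc).__name__


# A generic Exception carrying the text 'InvalidOrder' is still named "Exception".
generic = reported_error_name(Exception("InvalidOrder"))
# A dedicated subclass is named after the class itself.
custom = reported_error_name(InvalidOrder("order id is missing"))

print(generic)
print(custom)
```

Only the second form would be caught by a Catch rule listing "InvalidOrder" in ErrorEquals.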
Define the ASL State Machine
{
  "Comment": "Expert e-commerce workflow 2026",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "${ValidateOrderLambdaArn}",
      "Catch": [{
        "ErrorEquals": ["InvalidOrder"],
        "Next": "NotifyFailure"
      }],
      "Next": "ProcessPayment"
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "${PaymentLambdaArn}",
        "Payload.$": "$"
      },
      "End": true
    },
    "NotifyFailure": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "${FailureTopicArn}",
        "Message": "Order failed"
      },
      "End": true
    }
  }
}

Complete ASL definition with named error handling, a direct SNS integration, and ${...} placeholders that CDK replaces with ARNs at deployment. The "Payload.$": "$" entry forwards the validated order to the payment Lambda, which would otherwise receive no input.
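The state machine above catches InvalidOrder but defines no Retry block. As a sketch of how a Retry entry would behave, the Python below computes the wait before each retry from IntervalSeconds, MaxAttempts, and BackoffRate, which are standard ASL retry fields. The concrete values and the `retry_delays` helper are illustrative assumptions, not settings taken from the workflow.

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate, max_delay_seconds=None):
    """Return the wait in seconds before each retry attempt."""
    delays = []
    for attempt in range(max_attempts):
        # The first retry waits IntervalSeconds; each later one is multiplied
        # by BackoffRate again.
        delay = interval_seconds * (backoff_rate ** attempt)
        if max_delay_seconds is not None:
            delay = min(delay, max_delay_seconds)
        delays.append(delay)
    return delays


# Hypothetical Retry entry, written as a Python dict that mirrors the ASL JSON.
retry_policy = {
    "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
    "IntervalSeconds": 2,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}

schedule = retry_delays(
    retry_policy["IntervalSeconds"],
    retry_policy["MaxAttempts"],
    retry_policy["BackoffRate"],
)
print(schedule)
```

Retrying transient service errors while routing a business error such as InvalidOrder straight to Catch keeps a malformed order from being retried pointlessly.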
TypeScript CDK Stack
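import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as sns from 'aws-cdk-lib/aws-sns';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
export class StepFunctionsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    const validateFn = new lambda.Function(this, 'ValidateOrder', {
      runtime: lambda.Runtime.PYTHON_3_12,
      handler: 'validate_order.lambda_handler',
      code: lambda.Code.fromAsset('lambdas'),
    });
    // Assumption: the payment handler lives in lambdas/process_payment.py.
    const paymentFn = new lambda.Function(this, 'ProcessPayment', {
      runtime: lambda.Runtime.PYTHON_3_12,
      handler: 'process_payment.lambda_handler',
      code: lambda.Code.fromAsset('lambdas'),
    });
    const failureTopic = new sns.Topic(this, 'FailureTopic');
    const stateMachine = new sfn.StateMachine(this, 'OrderWorkflow', {
      definitionBody: sfn.DefinitionBody.fromFile('statemachine/workflow.asl.json'),
      // Every ${...} placeholder in the ASL file needs an entry here.
      definitionSubstitutions: {
        ValidateOrderLambdaArn: validateFn.functionArn,
        PaymentLambdaArn: paymentFn.functionArn,
        FailureTopicArn: failureTopic.topicArn,
      },
      stateMachineType: sfn.StateMachineType.STANDARD,
    });
    // A definition loaded from a file gets no inferred IAM grants; add them explicitly.
    validateFn.grantInvoke(stateMachine);
    paymentFn.grantInvoke(stateMachine);
    failureTopic.grantPublish(stateMachine);
  }
}

CDK stack that deploys the Lambdas, the SNS topic, and the state machine, replaces the ARN placeholders in the ASL file, and grants the state machine's role permission to call each integrated service.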
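Because definitionSubstitutions replaces ${...} placeholders by name, a placeholder with no matching key is not resolved at deployment. A minimal sketch of a pre-deployment check in Python follows. The `find_placeholders` and `missing_substitutions` helpers are hypothetical, the inline definition is a shortened stand-in for workflow.asl.json, and the ARN is a made-up example value.

```python
import re

# Matches ${Name} placeholders such as ${ValidateOrderLambdaArn}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_]+)\}")


def find_placeholders(definition_text):
    """Return the set of placeholder names used in an ASL definition string."""
    return set(PLACEHOLDER.findall(definition_text))


def missing_substitutions(definition_text, substitutions):
    """Return placeholder names that have no entry in the substitutions map."""
    return sorted(find_placeholders(definition_text) - set(substitutions))


# Shortened stand-in for statemachine/workflow.asl.json.
definition_text = """
{"States": {
  "ValidateOrder": {"Resource": "${ValidateOrderLambdaArn}"},
  "ProcessPayment": {"Parameters": {"FunctionName": "${PaymentLambdaArn}"}},
  "NotifyFailure": {"Parameters": {"TopicArn": "${FailureTopicArn}"}}
}}
"""

# Hypothetical substitutions map that covers only one of the three placeholders.
substitutions = {
    "ValidateOrderLambdaArn": "arn:aws:lambda:us-east-1:123456789012:function:validate",
}

missing = missing_substitutions(definition_text, substitutions)
print(missing)
```

Running a check like this in CI before `cdk deploy` surfaces an incomplete map before the state machine is created.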
Deploy via CDK
cdk bootstrap
cdk deploy StepFunctionsStack --require-approval never

Deployment commands that bootstrap the environment and deploy the stack without interactive confirmation.
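Once the stack is deployed, a quick smoke test is to start one execution with the input shape the validation Lambda reads (an "order" object with an "id"). The sketch below takes the Step Functions client as a parameter so that it runs without AWS access. With boto3 you would pass `boto3.client("stepfunctions")`, whose `start_execution` call accepts the same keyword arguments. The ARN, the order id, and the `RecordingClient` stand-in are assumptions for illustration.

```python
import json
import uuid


def start_order_execution(sfn_client, state_machine_arn, order):
    """Start one execution with the input shape the ValidateOrder Lambda expects."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        # A random suffix avoids reusing an execution name.
        name=f"order-{order['id']}-{uuid.uuid4().hex[:8]}",
        input=json.dumps({"order": order}),
    )
    return response["executionArn"]


class RecordingClient:
    """Dry-run stand-in for the real client; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def start_execution(self, **kwargs):
        self.calls.append(kwargs)
        return {"executionArn": kwargs["stateMachineArn"] + ":" + kwargs["name"]}


# Hypothetical ARN and order id, for illustration only.
client = RecordingClient()
execution_arn = start_order_execution(
    client,
    "arn:aws:states:us-east-1:123456789012:stateMachine:OrderWorkflow",
    {"id": "A-1001"},
)
print(execution_arn)
```

Sending an order without an id through the same helper exercises the InvalidOrder path and the failure notification.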
Advanced Callback Token Pattern
{
  "StartAt": "WaitForApproval",
  "States": {
    "WaitForApproval": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
      "Parameters": {
        "FunctionName": "${ApprovalLambdaArn}",
        "Payload": {
          "taskToken.$": "$$.Task.Token"
        }
      },
      "TimeoutSeconds": 3600,
      "End": true
    }
  }
}

Callback token pattern implementation for human approvals, including an explicit timeout.
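The state above pauses until something reports back with the task token that the approval Lambda received in its event under "taskToken". A sketch of that reporting step in Python follows. `send_task_success` and `send_task_failure` are the boto3 Step Functions client methods for this. The `complete_approval` helper, the "ApprovalRejected" error name, the reviewer value, and the `RecordingClient` stand-in (used so the example runs without AWS access) are assumptions.

```python
import json


def complete_approval(sfn_client, task_token, approved, reviewer):
    """Resume the paused execution with the reviewer's decision."""
    if approved:
        sfn_client.send_task_success(
            taskToken=task_token,
            output=json.dumps({"approved": True, "reviewer": reviewer}),
        )
    else:
        # The error name can be matched by a Catch rule on the waiting state.
        sfn_client.send_task_failure(
            taskToken=task_token,
            error="ApprovalRejected",
            cause=f"Rejected by {reviewer}",
        )


class RecordingClient:
    """Dry-run stand-in for boto3.client('stepfunctions'); records calls only."""

    def __init__(self):
        self.calls = []

    def send_task_success(self, **kwargs):
        self.calls.append(("success", kwargs))

    def send_task_failure(self, **kwargs):
        self.calls.append(("failure", kwargs))


# Hypothetical token values; real tokens come from the Lambda event.
client = RecordingClient()
complete_approval(client, "token-1", approved=True, reviewer="alice")
complete_approval(client, "token-2", approved=False, reviewer="bob")
print(client.calls)
```

If neither call arrives within TimeoutSeconds, the task fails with States.Timeout, so the token should be stored somewhere the reviewer-facing system can read it.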
Best Practices
- Always use custom error names for precise debugging
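- Prefer the EXPRESS state machine type for high-throughput, short-lived workloads; Express executions are capped at five minutes and do not support the callback (.waitForTaskToken) pattern
- Enable ALL-level logging, including execution data, during development (Step Functions log levels are ALL, ERROR, FATAL, and OFF; there is no DEBUG level)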
- Externalize configuration via Parameter Store
- Systematically test states with Step Functions Local
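As an illustration of externalizing configuration via Parameter Store, a Lambda can read a setting once and reuse it on later invocations. The sketch below takes the SSM client as a parameter so that it runs without AWS access. With boto3 you would pass `boto3.client("ssm")`, whose `get_parameter` call returns the value under `["Parameter"]["Value"]`. The parameter name, its value, and the `FakeSsmClient` stand-in are assumptions.

```python
def load_setting(ssm_client, name, cache):
    """Read a parameter once and serve later lookups from the cache."""
    if name not in cache:
        response = ssm_client.get_parameter(Name=name, WithDecryption=True)
        cache[name] = response["Parameter"]["Value"]
    return cache[name]


class FakeSsmClient:
    """Dry-run stand-in for boto3.client('ssm'); serves values from a dict."""

    def __init__(self, values):
        self.values = values
        self.call_count = 0

    def get_parameter(self, Name, WithDecryption=False):
        self.call_count += 1
        return {"Parameter": {"Name": Name, "Value": self.values[Name]}}


# Hypothetical parameter name and value, for illustration only.
ssm = FakeSsmClient({"/orders/payment/timeout-seconds": "30"})
cache = {}
first = load_setting(ssm, "/orders/payment/timeout-seconds", cache)
second = load_setting(ssm, "/orders/payment/timeout-seconds", cache)
print(first, second, ssm.call_count)
```

Keeping the cache at module level in a Lambda lets warm invocations skip the SSM call, at the cost of picking up changed values only on a cold start.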
Common Errors to Avoid
- Forgetting IAM permissions between Step Functions and integrated services
- Using overly short timeouts on long-running tasks
- Not versioning ASL definitions in production
- Ignoring the limit of 25,000 events in a Standard execution's history
Further Reading
Deepen these concepts with our advanced serverless architecture training: https://learni-group.com/formations