Learni

How to Orchestrate Complex Workflows with AWS Step Functions in 2026


Introduction

AWS Step Functions lets you orchestrate AWS services and microservices as state machines, which the console renders as visual workflows. It is a natural fit for complex, event-driven architectures. This expert tutorial takes you from a state machine defined with the AWS CDK (which synthesizes to Amazon States Language, ASL) through to deployment. Along the way it covers Choice and Parallel states, retries, catch handlers, and Lambda integration. You will also learn which anti-patterns to avoid so your workflows stay resilient in production.

Prerequisites

  • AWS CLI v2 configured with an admin profile
  • Node.js 20+ and AWS CDK v2
  • Strong knowledge of TypeScript and IAM
  • An AWS account with credits for testing

CDK Project Initialization

terminal
mkdir step-functions-workflow && cd step-functions-workflow
npx aws-cdk init app --language=typescript
npm install aws-cdk-lib constructs

Initializes a TypeScript CDK project and installs the essential dependencies for defining State Machines.

Basic State Machine Definition

lib/workflow-stack.ts
import * as cdk from 'aws-cdk-lib';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';

export class WorkflowStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Minimal inline function; returns the status field the Choice state checks later
    const processFn = new lambda.Function(this, 'ProcessFn', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'index.handler',
      code: lambda.Code.fromInline('exports.handler = async () => ({ status: "success" });'),
    });

    // LambdaInvoke is itself a Task state (CDK v2 has no sfn.Task wrapper)
    const processTask = new tasks.LambdaInvoke(this, 'ProcessTask', {
      lambdaFunction: processFn,
      outputPath: '$.Payload', // pass on only the function's return value
    });

    const definition = sfn.Chain.start(processTask);

    new sfn.StateMachine(this, 'ComplexWorkflow', {
      stateMachineName: 'ComplexWorkflow', // fixed name, so the ARN is predictable
      definitionBody: sfn.DefinitionBody.fromChainable(definition),
      stateMachineType: sfn.StateMachineType.STANDARD,
    });
  }
}

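This CDK code defines a STANDARD state machine with a single Lambda task. In CDK v2, LambdaInvoke lives in aws-cdk-lib/aws-stepfunctions-tasks and is a Task state on its own. The definitionBody property replaces the deprecated definition property. Because cdk init generates its own stack class, update the entry point in bin/ to instantiate WorkflowStack.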

Adding Choice and Parallel States

lib/advanced-workflow.ts
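import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';

// Inside the WorkflowStack constructor: fn1 and fn2 are additional lambda.Function
// instances, and processTask is the LambdaInvoke task with outputPath: '$.Payload'
const choice = new sfn.Choice(this, 'CheckStatus');
const parallel = new sfn.Parallel(this, 'ParallelTasks');
parallel.branch(new tasks.LambdaInvoke(this, 'Branch1', { lambdaFunction: fn1 }));
parallel.branch(new tasks.LambdaInvoke(this, 'Branch2', { lambdaFunction: fn2 }));

const definition = sfn.Chain
  .start(processTask)
  .next(choice
    // Route on the status field returned by processTask
    .when(sfn.Condition.stringEquals('$.status', 'success'), parallel)
    .otherwise(new sfn.Fail(this, 'Failed', { error: 'ValidationError' }))
  );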

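Adds a Choice for conditional routing and a Parallel state that runs both branches concurrently and waits for all of them to finish. The Choice reads $.status, so the preceding task must output the Lambda payload itself, for example with outputPath: '$.Payload'. Otherwise the path does not exist and the execution fails with a runtime error. If you add loops back to earlier states, give them a condition that is guaranteed to change so executions cannot cycle indefinitely.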
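The Choice state can only route on $.status if the processing function returns that field. Below is a sketch of a TypeScript handler. It assumes the handler is bundled and deployed separately, for example with NodejsFunction from aws-cdk-lib/aws-lambda-nodejs or with lambda.Code.fromAsset (neither is shown here). The ProcessInput and ProcessOutput types are illustrative, not part of any AWS package.

```typescript
// Hypothetical input shape: the state machine input passed to the Lambda
interface ProcessInput {
  status?: string;
}

// Hypothetical output shape: becomes $.Payload in LambdaInvoke's raw result
interface ProcessOutput {
  status: 'success' | 'invalid';
}

// Lambda entry point: normalizes the status so the Choice needs a single condition
export const handler = async (event: ProcessInput): Promise<ProcessOutput> => {
  // Anything other than an explicit "success" ends up in the otherwise (Fail) branch
  const status = event.status === 'success' ? 'success' : 'invalid';
  return { status };
};
```

Normalizing every other value to a single "invalid" status keeps the routing logic simple. The Choice needs one stringEquals test, and everything else falls through to otherwise.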

Advanced Error Handling Integration

lib/error-handling.ts
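import * as cdk from 'aws-cdk-lib';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';

// Inside the WorkflowStack constructor (processFn is defined there)

// Exponential backoff: waits 2 s, 4 s, then 8 s; errors defaults to States.ALL
const retryPolicy: sfn.RetryProps = {
  maxAttempts: 3,
  interval: cdk.Duration.seconds(2),
  backoffRate: 2,
};

// State that receives control once the retries are exhausted
const handleFailure = new sfn.Fail(this, 'HandleFailure', { error: 'ProcessingFailed' });

const taskWithRetry = new tasks.LambdaInvoke(this, 'ResilientTask', {
  lambdaFunction: processFn,
  outputPath: '$.Payload',
})
  .addRetry(retryPolicy)
  // addCatch takes the handler state first, then the catch options
  .addCatch(handleFailure, {
    errors: ['States.TaskFailed'],
    resultPath: '$.error', // keep the input and attach the error details
  });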

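Implements exponential backoff, with waits of 2, 4, and 8 seconds. Once retries are exhausted, a catch sends States.TaskFailed errors to a handler state and records the error under $.error. Note that addCatch takes the handler state as its first argument; there is no standalone Catch construct. Keeping this logic in the state machine avoids duplicating it in every Lambda function.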
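To see what the retry policy does in practice, the helper below reproduces the documented Retry timing. The first retry waits IntervalSeconds, and each later wait is multiplied by BackoffRate, optionally capped by MaxDelaySeconds. This is a sketch that ignores JitterStrategy, and retryDelays is a hypothetical name, not an AWS API.

```typescript
/**
 * Seconds waited before each retry attempt under a Step Functions Retry rule.
 * maxAttempts counts retries only, not the initial attempt. Jitter is ignored.
 */
export function retryDelays(
  intervalSeconds: number,
  backoffRate: number,
  maxAttempts: number,
  maxDelaySeconds?: number,
): number[] {
  const delays: number[] = [];
  let wait = intervalSeconds;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Apply the optional cap before recording the wait for this attempt
    delays.push(maxDelaySeconds === undefined ? wait : Math.min(wait, maxDelaySeconds));
    wait *= backoffRate;
  }
  return delays;
}

// The policy above: interval 2 s, backoff rate 2, 3 retry attempts
console.log(retryDelays(2, 2, 3)); // [ 2, 4, 8 ]
```

With maxAttempts: 3, the task runs at most four times in total: one initial attempt plus three retries. That adds up to 14 seconds of waiting before the catch takes over.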

Deployment and Testing

terminal
npx cdk bootstrap   # once per account and Region
npx cdk deploy
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:eu-west-1:123456789012:stateMachine:ComplexWorkflow \
  --input '{"status":"success"}'

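Deploys the stack and starts a test execution. Replace the account ID and Region in the ARN with your own. The ARN ends in ComplexWorkflow only if the stack sets stateMachineName; otherwise CDK generates a unique name, which you can look up with aws stepfunctions list-state-machines. Check the execution graph and each state's input and output in the Step Functions console.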
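A state machine ARN follows a fixed pattern. If you set the name yourself, you can therefore build the ARN from the Region and account ID instead of copying it from the console. This is a sketch: stateMachineArn and the StateMachineRef type are hypothetical helpers, and the account ID is the same placeholder used in the command above.

```typescript
// Components of a state machine ARN (placeholder values in the example call)
interface StateMachineRef {
  partition?: string; // 'aws' for commercial Regions
  region: string;
  accountId: string;
  name: string;
}

export function stateMachineArn({ partition = 'aws', region, accountId, name }: StateMachineRef): string {
  // AWS account IDs are always 12 digits
  if (!/^\d{12}$/.test(accountId)) {
    throw new Error(`accountId must be 12 digits, got "${accountId}"`);
  }
  return `arn:${partition}:states:${region}:${accountId}:stateMachine:${name}`;
}

console.log(stateMachineArn({ region: 'eu-west-1', accountId: '123456789012', name: 'ComplexWorkflow' }));
// arn:aws:states:eu-west-1:123456789012:stateMachine:ComplexWorkflow
```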

Best Practices

  • Always use DefinitionBody.fromChainable for readability
  • Prefer explicit timeouts on every task
  • Centralize error handling at the State Machine level
  • Version your ASL definitions with Git
  • Monitor with X-Ray and CloudWatch Logs
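The timeout, tracing, and logging recommendations map to a handful of CDK properties. The stack below is a sketch that assumes a recent aws-cdk-lib v2 release supporting the state-level taskTimeout property. The ObservableWorkflowStack name, the inline Lambda code, and all durations are illustrative.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as logs from 'aws-cdk-lib/aws-logs';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';
import { Construct } from 'constructs';

export class ObservableWorkflowStack extends cdk.Stack {
  public readonly stateMachine: sfn.StateMachine;

  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const fn = new lambda.Function(this, 'Fn', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'index.handler',
      code: lambda.Code.fromInline('exports.handler = async () => ({ status: "success" });'),
      timeout: cdk.Duration.seconds(10), // Lambda-side limit
    });

    const task = new tasks.LambdaInvoke(this, 'BoundedTask', {
      lambdaFunction: fn,
      outputPath: '$.Payload',
      // State-level timeout: a hung call fails into Retry/Catch instead of waiting
      taskTimeout: sfn.Timeout.duration(cdk.Duration.seconds(30)),
    });

    // Destination for execution history events
    const logGroup = new logs.LogGroup(this, 'WorkflowLogs', {
      retention: logs.RetentionDays.ONE_WEEK,
    });

    this.stateMachine = new sfn.StateMachine(this, 'ObservableWorkflow', {
      definitionBody: sfn.DefinitionBody.fromChainable(task),
      timeout: cdk.Duration.minutes(5), // cap on the whole execution
      tracingEnabled: true, // X-Ray traces for each execution
      logs: { destination: logGroup, level: sfn.LogLevel.ALL },
    });
  }
}
```

Setting both timeouts keeps failures bounded at two levels. The task timeout limits a single call, and the state machine timeout limits the whole run.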

Common Mistakes to Avoid

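  • Forgetting IAM permissions between Step Functions and Lambda (CDK grants lambda:InvokeFunction for LambdaInvoke tasks automatically, but hand-written ASL and cross-account calls need explicit policies)
  • Setting overly long timeouts, or none at all, so a hung task stalls the execution instead of failing fast into your retry and catch logic
  • Ignoring the 25,000-event history quota per Standard workflow execution, which long-running loops can exhaust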
  • Failing to test error branches in a staging environment

Going Further

Explore native integrations with EventBridge and DynamoDB. Discover our Learni training courses on advanced serverless architecture.
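As a starting point for the DynamoDB integration, here is a sketch of a stack in which Step Functions writes to a table directly through the DynamoPutItem task, with no Lambda in between. The PersistStack class, the RunsTable table, and its pk and status attributes are hypothetical. CDK adds the dynamodb:PutItem permission to the state machine role for you.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';
import { Construct } from 'constructs';

export class PersistStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Hypothetical table keyed by the execution name
    const table = new dynamodb.Table(this, 'RunsTable', {
      partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
    });

    // Direct service integration: Step Functions calls PutItem itself
    const saveRun = new tasks.DynamoPutItem(this, 'SaveRun', {
      table,
      item: {
        // $$ reads from the context object rather than the state input
        pk: tasks.DynamoAttributeValue.fromString(sfn.JsonPath.stringAt('$$.Execution.Name')),
        status: tasks.DynamoAttributeValue.fromString(sfn.JsonPath.stringAt('$.status')),
      },
      resultPath: sfn.JsonPath.DISCARD, // leave the state input unchanged
    });

    new sfn.StateMachine(this, 'PersistWorkflow', {
      definitionBody: sfn.DefinitionBody.fromChainable(saveRun),
    });
  }
}
```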