Introduction
Redacting PII (personally identifiable information) is a core requirement for any application that handles personal data. This expert tutorial guides you step by step through building a redaction system that supports GDPR compliance. You will learn to detect emails, phone numbers, and IBANs with regular expressions, and to see where semantic (NLP-based) detection fits for contextual data. The approach is designed for production use with low latency.
Prerequisites
- Node.js 20+ and TypeScript 5.4+
- Advanced knowledge of regex and text processing
- AWS or GCP account for optional NLP services
- Familiarity with Express/Fastify middleware
Project Initialization
npm init -y
npm install --save-dev typescript @types/node tsx
npm install fastify
npx tsc --init

Initialize a strict TypeScript project. tsx enables direct execution without a separate compilation step during development. fastify is only needed for the middleware shown later.
Solution Architecture
We will build a redaction module with a simple interface. Each PII type is handled by a dedicated detector, which keeps maintenance and unit testing straightforward.
PII Detector Definitions
export interface PIIDetector {
  name: string;
  regex: RegExp; // must use the global (g) flag so every occurrence is replaced
  replacement: string;
}

export const detectors: PIIDetector[] = [
  // Email addresses
  { name: 'email', regex: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, replacement: '[EMAIL_REDACTED]' },
  // French phone numbers: +33 or a leading 0, then nine digits with optional separators
  { name: 'phone', regex: /(?:\+33[\s.-]?|\b0)[1-9](?:[\s.-]?\d{2}){4}\b/g, replacement: '[PHONE_REDACTED]' },
  // IBANs (15 to 34 characters), with or without spaces between groups of four
  { name: 'iban', regex: /\b[A-Z]{2}\d{2}(?:\s?[A-Z0-9]{4}){2,7}(?:\s?[A-Z0-9]{1,3})?\b/g, replacement: '[IBAN_REDACTED]' }
];

The patterns target French phone numbers and European IBAN formats. Each detector is independent, which simplifies adding new PII types.
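To illustrate how an independent detector can be added, here is a minimal sketch. The `ipv4Detector` and the `applyDetectors` helper are hypothetical names introduced for this example, and the IPv4 pattern is deliberately simple (it does not check that each octet is at most 255).

```typescript
// Local copy of the tutorial's detector shape so the sketch runs on its own.
interface PIIDetector {
  name: string;
  regex: RegExp;
  replacement: string;
}

// Hypothetical extra detector: dotted IPv4 addresses (illustrative pattern only).
const ipv4Detector: PIIDetector = {
  name: 'ipv4',
  regex: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
  replacement: '[IP_REDACTED]',
};

// Minimal stand-in for the redaction loop: apply each detector in order.
function applyDetectors(text: string, list: PIIDetector[]): string {
  return list.reduce((acc, d) => acc.replace(d.regex, d.replacement), text);
}

const sample = 'Login from 192.168.1.20 at 10:42';
const redactedSample = applyDetectors(sample, [ipv4Detector]);
console.log(redactedSample); // Login from [IP_REDACTED] at 10:42
```

In the tutorial's own module, the same object could be passed as a custom detector rather than edited into the built-in list.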
Redaction Engine Implementation
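import { detectors, type PIIDetector } from './detectors';

// Apply every detector in order; custom detectors run after the built-in ones.
export function redactPII(text: string, customDetectors: PIIDetector[] = []): string {
  let result = text;
  const allDetectors = [...detectors, ...customDetectors];
  for (const detector of allDetectors) {
    result = result.replace(detector.regex, detector.replacement);
  }
  return result;
}

// Recursively redact strings inside arrays and plain objects.
export function redactObject(obj: unknown): unknown {
  if (typeof obj === 'string') return redactPII(obj);
  if (Array.isArray(obj)) return obj.map(redactObject);
  if (obj && typeof obj === 'object') {
    // Leave Date, Buffer, Error, and other class instances untouched.
    const proto = Object.getPrototypeOf(obj);
    if (proto !== Object.prototype && proto !== null) return obj;
    const newObj: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(obj)) {
      newObj[key] = redactObject(value);
    }
    return newObj;
  }
  return obj;
}

The redactObject function recursively processes JSON-like values: strings, arrays, and plain objects. Only values are redacted; keys are left as they are. This approach is useful for REST APIs and structured logs.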
Fastify Middleware for APIs
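import type { FastifyInstance, FastifyReply, FastifyRequest } from 'fastify';
import { redactObject } from './redactor';

// Redact inbound data before the route handler sees it.
export async function piiRedactionMiddleware(request: FastifyRequest, _reply: FastifyReply) {
  if (request.body) {
    request.body = redactObject(request.body);
  }
  if (request.query) {
    request.query = redactObject(request.query);
  }
}

// Register input and output redaction as hooks instead of patching reply.send.
// Call this before registering routes.
export function registerPiiRedaction(app: FastifyInstance): void {
  app.addHook('preHandler', piiRedactionMiddleware);
  // preSerialization receives object payloads before they are serialized.
  app.addHook('preSerialization', async (_request, _reply, payload) => redactObject(payload));
}

The preHandler hook redacts request bodies and query strings, and the preSerialization hook redacts object payloads before they are sent. Fastify does not call preSerialization for string, Buffer, stream, or null payloads, so verify those paths, and error responses, separately.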
Complete Unit Tests
import { describe, it } from 'node:test';
import assert from 'node:assert/strict';
import { redactPII, redactObject } from './redactor';

describe('PII Redaction', () => {
  it('redacts email correctly', () => {
    assert.equal(redactPII('Contact: test@example.com'), 'Contact: [EMAIL_REDACTED]');
  });

  it('redacts nested objects', () => {
    const input = { user: { email: 'john@doe.fr', phone: '0612345678' } };
    assert.deepEqual(redactObject(input), { user: { email: '[EMAIL_REDACTED]', phone: '[PHONE_REDACTED]' } });
  });
});

Tests cover a simple string and a nested structure, using Node's built-in test runner so that no extra test framework is required. Run with npx tsx src/redactor.test.ts.
Production Configuration
{
  "detectors": ["email", "phone", "iban"],
  "performance": {
    "maxTextLength": 100000,
    "timeoutMs": 50
  },
  "logging": {
    "redactionCount": true,
    "sampleRate": 0.01
  }
}

External configuration allows enabling or disabling detectors and tuning performance and logging settings without code changes. Applying a change without a restart requires the application to reload the file at runtime.
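As a rough sketch of how such a configuration might be consumed, the example below filters detectors by name and rejects oversized input. The `RedactionConfig` type and the `selectDetectors` and `redactWithConfig` helpers are assumptions made for illustration, not part of the original module. The sketch does not enforce `timeoutMs`, since a synchronous regex replacement cannot be interrupted from the same thread.

```typescript
interface PIIDetector { name: string; regex: RegExp; replacement: string; }

// Assumed shape of the JSON configuration.
interface RedactionConfig {
  detectors: string[];
  performance: { maxTextLength: number; timeoutMs: number };
  logging: { redactionCount: boolean; sampleRate: number };
}

// Hypothetical helper: keep only detectors whose name is enabled in the config.
function selectDetectors(all: PIIDetector[], config: RedactionConfig): PIIDetector[] {
  const enabled = new Set(config.detectors);
  return all.filter((d) => enabled.has(d.name));
}

// Hypothetical guard: refuse oversized input instead of risking a slow scan.
function redactWithConfig(text: string, all: PIIDetector[], config: RedactionConfig): string {
  if (text.length > config.performance.maxTextLength) {
    throw new RangeError(`Input exceeds maxTextLength (${config.performance.maxTextLength})`);
  }
  return selectDetectors(all, config).reduce(
    (acc, d) => acc.replace(d.regex, d.replacement),
    text,
  );
}

const demoDetectors: PIIDetector[] = [
  { name: 'email', regex: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, replacement: '[EMAIL_REDACTED]' },
  { name: 'phone', regex: /\b0[1-9](?:[\s.-]?\d{2}){4}\b/g, replacement: '[PHONE_REDACTED]' },
];

// Only the email detector is enabled in this demo configuration.
const demoConfig: RedactionConfig = JSON.parse(`{
  "detectors": ["email"],
  "performance": { "maxTextLength": 100000, "timeoutMs": 50 },
  "logging": { "redactionCount": true, "sampleRate": 0.01 }
}`);

const configured = redactWithConfig('a@b.fr / 0612345678', demoDetectors, demoConfig);
console.log(configured); // [EMAIL_REDACTED] / 0612345678
```

Rejecting long input is only one option; truncating or processing in a worker thread are alternatives with different trade-offs.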
Best Practices
- Always test regex patterns on real datasets before deployment
- Implement logging of redaction statistics (without storing the data)
- Provide a dry-run mode for audits
- Version detector sets for traceability
- Combine with NLP solutions (Presidio, AWS Comprehend) for complex cases
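The statistics and dry-run practices above can be sketched together. The following is one possible shape, assuming a hypothetical `redactWithReport` helper: it records how many matches each detector found, never the matched values, and returns the original text unchanged when `dryRun` is set.

```typescript
interface PIIDetector { name: string; regex: RegExp; replacement: string; }

interface RedactionReport {
  text: string;                   // redacted text, or the original in dry-run mode
  counts: Record<string, number>; // matches per detector name; no matched values are kept
}

// Hypothetical helper: count matches per detector, optionally without redacting.
function redactWithReport(text: string, list: PIIDetector[], dryRun = false): RedactionReport {
  const counts: Record<string, number> = {};
  let working = text;
  for (const d of list) {
    // With a global (g) regex, match() returns every occurrence or null.
    const matches = working.match(d.regex);
    counts[d.name] = matches ? matches.length : 0;
    // Always redact the working copy so counts are identical in both modes.
    working = working.replace(d.regex, d.replacement);
  }
  return { text: dryRun ? text : working, counts };
}

const reportDetectors: PIIDetector[] = [
  { name: 'email', regex: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, replacement: '[EMAIL_REDACTED]' },
];

const report = redactWithReport('a@b.fr wrote to c@d.fr', reportDetectors);
const audit = redactWithReport('a@b.fr wrote to c@d.fr', reportDetectors, true);
console.log(report.text);   // [EMAIL_REDACTED] wrote to [EMAIL_REDACTED]
console.log(report.counts); // { email: 2 }
console.log(audit.text);    // a@b.fr wrote to c@d.fr
```

Because only counts leave the function, the report can be logged or exported as a metric without reintroducing the data that was just removed.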
Common Mistakes to Avoid
- Forgetting to handle arrays and nested objects
- Using overly broad regex that breaks legitimate data
- Failing to manage timeouts on very large text volumes
- Storing logs before redaction
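For the last point, one way to avoid writing raw entries is to place redaction between the application and the log sink. This is a minimal sketch under stated assumptions: `createRedactingLogger` and `redactText` are hypothetical names, and `redactText` stands in for the tutorial's redaction function with only the email pattern included.

```typescript
type Sink = (line: string) => void;

// Stand-in for the tutorial's redaction function; email pattern only.
function redactText(text: string): string {
  return text.replace(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, '[EMAIL_REDACTED]');
}

// Hypothetical wrapper: serialize, redact, then hand off to the sink,
// so the raw entry is never written.
function createRedactingLogger(sink: Sink) {
  return (message: string, meta: Record<string, unknown> = {}): void => {
    const line = JSON.stringify({ message, ...meta });
    sink(redactText(line));
  };
}

// Demo sink that collects lines in memory instead of writing to disk.
const written: string[] = [];
const log = createRedactingLogger((line) => {
  written.push(line);
});

log('signup', { email: 'jane@example.org' });
console.log(written[0]); // {"message":"signup","email":"[EMAIL_REDACTED]"}
```

Redacting the serialized line works here because the replacement contains no quotes or backslashes; redacting the structure before serialization is an alternative when replacements could break the JSON.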
Going Further
Integrate advanced NLP models and explore our Learni training programs on data compliance and security.