Introduction
Data Loss Prevention (DLP) has become essential for any company handling sensitive information. In 2026, data breaches cost an average of 4.8 million dollars per incident. This tutorial shows you how to integrate a DLP solution directly into your applications by combining automated detection with business rules. You will learn to scan data in real time, block unauthorized transfers, and generate actionable alerts. The approach is progressive and relies on concrete examples in Node.js and Python environments.
Prerequisites
- Node.js 20+ and Python 3.11+
- Basic knowledge of TypeScript and regex
- Access to a terminal and code editor
- Understanding of REST API security concepts
Install Dependencies
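npm install next express zod validator
pip install presidio-analyzer presidio-anonymizer
python -m spacy download en_core_web_lg
We install Express and Zod for validation, Next.js because the API route shown later imports from next/server, and Presidio for detecting sensitive data such as credit card numbers and email addresses. Presidio's default configuration relies on the spaCy English model en_core_web_lg, which the last command downloads. These tools analyze data locally, without sending it to third-party services.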
Create the Sensitive Data Scanner
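from presidio_analyzer import AnalyzerEngine

# The default engine loads Presidio's built-in recognizers, which target English.
analyzer = AnalyzerEngine()

def scan_text(text: str) -> list[dict]:
    """Return each detected entity with its type, confidence score, and matched text."""
    results = analyzer.analyze(text=text, language="en")
    return [{"entity": r.entity_type, "score": r.score, "text": text[r.start:r.end]} for r in results]

if __name__ == "__main__":
    sample = "Customer Jean Dupont has card 4111-1111-1111-1111"
    print(scan_text(sample))
This script uses Microsoft Presidio to detect sensitive entities automatically and returns each detected type with its confidence score. The default AnalyzerEngine only supports English, so the sample is analyzed with language="en"; other languages such as French require configuring an additional NLP model first. The script runs as-is and forms the foundation of the DLP policy built in the following sections.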
Define DLP Rules in YAML
rules:
  - name: credit_card
    entity: CREDIT_CARD
    action: block
    threshold: 0.8
  - name: email
    entity: EMAIL_ADDRESS
    action: log
    threshold: 0.9
  - name: iban
    entity: IBAN_CODE
    action: block
    threshold: 0.85
This configuration file centralizes the business rules. Each rule defines the entity type, the action (block or log), and the minimum confidence threshold. The file is easy to version and, provided the application reloads it at runtime, can be modified without redeployment.
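To make the link between these rules and the scanner output concrete, here is a minimal sketch of a rule evaluator. It assumes the rules have already been loaded into a list of dictionaries (the RULES constant below is a hand-written copy of the YAML, not the result of parsing it), and the "allow" outcome and the evaluate function name are illustrative choices, not part of the configuration above.

```python
from typing import Iterable

# Hypothetical in-memory copy of the YAML rules; in a real project these
# would be loaded from the rules file with a YAML parser.
RULES = [
    {"name": "credit_card", "entity": "CREDIT_CARD", "action": "block", "threshold": 0.8},
    {"name": "email", "entity": "EMAIL_ADDRESS", "action": "log", "threshold": 0.9},
    {"name": "iban", "entity": "IBAN_CODE", "action": "block", "threshold": 0.85},
]


def evaluate(findings: Iterable[dict], rules: list[dict] = RULES) -> dict:
    """Return the strictest action triggered by the findings.

    `findings` uses the same shape as the scanner output:
    {"entity": str, "score": float, "text": str}.
    """
    by_entity = {rule["entity"]: rule for rule in rules}
    triggered = []
    for finding in findings:
        rule = by_entity.get(finding["entity"])
        # A rule fires only when the detection score reaches its threshold.
        if rule is not None and finding["score"] >= rule["threshold"]:
            triggered.append(rule)
    if any(rule["action"] == "block" for rule in triggered):
        action = "block"  # one blocking rule is enough to block the request
    elif triggered:
        action = "log"
    else:
        action = "allow"  # assumed default when no rule fires
    return {"action": action, "rules": sorted({rule["name"] for rule in triggered})}
```

Treating "block" as stronger than "log" means a payload that contains both an email address and a card number is blocked, while the returned rule names still record everything that fired.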
Integrate DLP into a TypeScript API
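import { NextRequest, NextResponse } from 'next/server';
import { z } from 'zod';

const dlpSchema = z.object({ content: z.string().min(1) });

export async function POST(req: NextRequest) {
  // Reject malformed JSON or invalid payloads with a 400 instead of an unhandled error
  const body = await req.json().catch(() => null);
  const parsed = dlpSchema.safeParse(body);
  if (!parsed.success) {
    return NextResponse.json({ error: 'Invalid payload' }, { status: 400 });
  }
  const { content } = parsed.data;

  // Call the Python scanner via an API or child_process
  if (content.includes('4111-1111')) {
    return NextResponse.json({ blocked: true, reason: 'CREDIT_CARD' }, { status: 403 });
  }
  return NextResponse.json({ blocked: false });
}
The route handler validates the input with Zod, then applies a simple placeholder DLP check. In production, replace the condition with a call to the Python script. Requests containing sensitive data are then rejected with a 403 before any further processing. For the test script to reach the handler at /api/dlp, it would live in app/api/dlp/route.ts under the Next.js App Router convention.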
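On the Python side, the call mentioned above could be served by a small JSON bridge. The following is a sketch under stated assumptions: the handle and main names, the request and response shapes, and the regex-based placeholder_scan detector are all hypothetical stand-ins so the example runs without Presidio; in the tutorial's setup the placeholder would be swapped for the Presidio-based scanner.

```python
import json
import re
import sys

# Placeholder detector so the sketch runs on its own (an assumption, not the
# tutorial's scanner): 13 to 19 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def placeholder_scan(text: str) -> list[dict]:
    # Report positions only, so the raw card number never leaves this process.
    return [
        {"entity": "CREDIT_CARD", "score": 1.0, "start": m.start(), "end": m.end()}
        for m in CARD_PATTERN.finditer(text)
    ]


def handle(raw_request: str) -> str:
    """Turn one JSON request ({"content": "..."}) into one JSON response."""
    try:
        content = json.loads(raw_request)["content"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return json.dumps({"error": "expected a JSON object with a 'content' field"})
    if not isinstance(content, str):
        return json.dumps({"error": "'content' must be a string"})
    findings = placeholder_scan(content)
    return json.dumps({"blocked": bool(findings), "findings": findings})


def main(stream_in=sys.stdin, stream_out=sys.stdout) -> None:
    # Read the whole request from stdin and answer on stdout, so a Node.js
    # process can drive this script through child_process.
    stream_out.write(handle(stream_in.read()) + "\n")
```

Adding an `if __name__ == "__main__": main()` guard at the bottom turns the module into a script that the Node.js process can spawn, writing the request to its stdin and parsing the single JSON line it prints.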
Complete Test Script
// Node.js 18+ ships a global fetch, so no extra package is needed
async function testDLP() {
  const res = await fetch('http://localhost:3000/api/dlp', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content: 'Card: 4111-1111-1111-1111' })
  });
  // Expected: status 403 with { blocked: true, reason: 'CREDIT_CARD' }
  console.log(res.status, await res.json());
}

testDLP().catch(console.error);
This test script sends a payload containing a credit card number and prints the status code and response body, so you can confirm that the request is blocked with a 403. It validates the DLP integration in seconds.
Best Practices
- Always encrypt DLP logs to prevent secondary leaks
- Use high confidence thresholds (≥ 0.85) in production
- Version YAML rules in Git with code review
- Test false positive scenarios with anonymized data
- Combine DLP with RBAC access controls
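As a complement to encrypting logs (not a replacement for it), audit entries can avoid storing the detected value in the first place. The sketch below is an assumption-laden illustration: the audit_record function, the record fields, and the AUDIT_KEY constant are hypothetical, and the keyed SHA-256 fingerprint is one possible way to let auditors correlate repeated values without reading them.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Hypothetical key for illustration only; a real deployment would load it
# from a secret manager rather than hard-coding it.
AUDIT_KEY = b"replace-with-a-secret-key"


def audit_record(finding: dict, action: str, key: bytes = AUDIT_KEY) -> str:
    """Build a JSON log line that never contains the raw detected value."""
    digest = hmac.new(key, finding["text"].encode("utf-8"), hashlib.sha256).hexdigest()
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "entity": finding["entity"],
        "score": finding["score"],
        "action": action,
        # Keyed fingerprint: identical values yield identical fingerprints,
        # but the original value cannot be read back from the log.
        "fingerprint": digest[:16],
    }
    return json.dumps(record)
```

Each finding uses the same shape as the scanner output, so the function can be called once per detected entity right after the block or log decision is made.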
Common Mistakes to Avoid
- Forgetting to handle special characters in regex (false negatives)
- Applying blocking on test environments without a whitelist
- Not logging DLP decisions for auditing
- Ignoring performance: scanning large volumes without caching
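The caching point can be illustrated with a small bounded cache keyed by a hash of the content, so identical payloads are scanned only once. This is a sketch, not a tuned implementation: the ScanCache class and its max_entries parameter are hypothetical, and the wrapped scan function is whatever scanner the application already uses.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class ScanCache:
    """Least-recently-used cache in front of a scan function."""

    def __init__(self, scan: Callable[[str], list], max_entries: int = 1024):
        self._scan = scan
        self._max_entries = max_entries
        self._entries: OrderedDict = OrderedDict()

    def __call__(self, text: str) -> list:
        # Key on a digest so the cache keys do not hold the raw payload.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        result = self._scan(text)
        self._entries[key] = result
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return result
```

Wrapping the scanner once, for example `cached_scan = ScanCache(scan_text)`, leaves call sites unchanged. Note that the cached results still contain whatever the scanner returns, so the cache itself should be treated as sensitive.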