Introduction
Neo4j, the leading graph database, excels at handling highly connected data such as social networks, fraud detection, or recommendation engines. In 2026, with Graph Data Science (GDS) 2.x and optimized Cypher, it can handle graphs with billions of nodes. This advanced tutorial takes you from Docker setup to Node.js integration, deep traversals, centrality algorithms, and performance tuning. Why does it matter? Relationship traversals avoid the repeated joins an RDBMS needs, which can be orders of magnitude faster on deep multi-hop queries (the often-quoted figure is up to 1000x), and Fabric lets you spread queries across databases. You'll learn clean modeling, index-backed lookups, and how to dodge pitfalls like unbounded traversals. By the end, you'll have a working fraud detection example to adapt for your own mission-critical projects.
Prerequisites
- Docker 24+ installed and running
- Node.js 20+ with npm
- Intermediate Cypher knowledge (MATCH, CREATE)
- Git to clone optional examples
- Minimum 8GB RAM for GDS tests
Installing Neo4j with Docker
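docker run \
  --name neo4j-adv \
  --publish=7474:7474 --publish=7687:7687 \
  --publish=7473:7473 \
  -v $HOME/neo4j/data:/data \
  -v $HOME/neo4j/logs:/logs \
  -v $HOME/neo4j/import:/import \
  -v $HOME/neo4j/plugins:/plugins \
  -v $HOME/neo4j/conf:/conf \
  -e NEO4J_AUTH=neo4j/Passw0rd2026 \
  -e NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
  -e NEO4J_PLUGINS='["graph-data-science","apoc"]' \
  -e NEO4J_server_memory_pagecache_size=2G \
  neo4j:5.21-enterprise
This command launches Neo4j Enterprise 5.21 with the GDS and APOC plugins, persistent volumes, and the credentials set in NEO4J_AUTH. Port 7474 serves the Browser over HTTP, 7473 over HTTPS, and 7687 is Bolt (drivers). The Enterprise image only starts once the license agreement is accepted, hence NEO4J_ACCEPT_LICENSE_AGREEMENT=yes. Neo4j 5 renamed the page cache setting to server.memory.pagecache.size, so the variable is NEO4J_server_memory_pagecache_size; raise it for datasets above 1M nodes. Use Enterprise rather than Community edition if you need the enterprise-only GDS features.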
First Connection and Setup
Head to http://localhost:7474 and log in with neo4j/Passw0rd2026. If the Browser is not connected yet, run :server connect and enter bolt://localhost:7687. Verify that GDS is loaded with CALL gds.debug.sysInfo() (or RETURN gds.version()) and that APOC is available with CALL apoc.help("path"). To experiment on a larger test dataset, place LDBC SNB files in the mounted /import directory. This sets up the foundation for advanced traversals.
Fraud Detection Graph Modeling
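CREATE CONSTRAINT account_id IF NOT EXISTS FOR (a:Account) REQUIRE a.id IS UNIQUE;
CREATE CONSTRAINT user_id IF NOT EXISTS FOR (u:User) REQUIRE u.id IS UNIQUE;
CREATE CONSTRAINT transaction_id IF NOT EXISTS FOR (t:Transaction) REQUIRE t.id IS UNIQUE;
// Base nodes (100 users and 100 accounts, linked by userId)
UNWIND range(1,100) AS i
CREATE (u:User {id: 'U' + toString(i), name: 'User' + toString(i), country: 'FR'})
CREATE (a:Account {id: 'A' + toString(i), balance: rand()*10000, userId: 'U' + toString(i)});
// Fraudulent transactions (suspicious links)
UNWIND [
  {from: 'A1', to: 'A50', amount: 50000, time: datetime()},
  {from: 'A50', to: 'A1', amount: 49900, time: datetime() + duration('PT1H')}
] AS tx
MATCH (from:Account {id: tx.from}), (to:Account {id: tx.to})
CREATE (t:Transaction {id: randomUUID(), amount: tx.amount, time: tx.time})
// Audit trail through the Transaction node
CREATE (from)-[:SENT]->(t)-[:TO]->(to)
// Direct Account-to-Account edge used by the traversals and the GDS projection
CREATE (from)-[:TRANSFERRED {amount: tx.amount, time: tx.time}]->(to);
// Mark the existing A50 as the suspect account instead of creating a duplicate
MATCH (a:Account {id: 'A50'})
SET a.balance = -1000, a.userId = 'Fraudster1'
MERGE (u:User {id: 'Fraudster1'})
  ON CREATE SET u.name = 'Suspect', u.country = 'XX'
MERGE (u)-[:OWNS]->(a);
This script sets up UNIQUE constraints, which also create the backing indexes used for id lookups, then builds a fraud graph with a suspicious cycle (A1<->A50). It creates 100 users and 100 accounts plus 2 circular transactions. Each transfer is stored twice: as a Transaction node for auditing and as a direct TRANSFERRED relationship between accounts, which is what the later traversals and the GDS projection rely on. The suspect account reuses the existing A50 node, because creating a second Account with the same id would violate the constraint. Use UNWIND for scalable bulk inserts and UUIDs for immutable IDs. On a dataset this small, the script completes almost instantly in the Browser.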
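When the rows come from an application rather than from range(), the UNWIND pattern is usually fed in fixed-size batches through a single parameter. The following is a minimal client-side sketch of that batching step in plain JavaScript, with no driver dependency. The helper names chunk and buildAccountBatches, the $rows parameter name, and the batch size of 1000 are illustrative assumptions, not something the tutorial prescribes.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(rows, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < rows.length; i += size) {
    batches.push(rows.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: pair one parameterized UNWIND statement with each batch.
// The Cypher text is identical for every batch; only the $rows parameter changes.
function buildAccountBatches(accounts, size = 1000) {
  const query =
    'UNWIND $rows AS row ' +
    'MERGE (a:Account {id: row.id}) ' +
    'SET a.balance = row.balance';
  return chunk(accounts, size).map(rows => ({ query, params: { rows } }));
}

// Example: 2,500 synthetic accounts become three statements.
const accounts = Array.from({ length: 2500 }, (_, i) => ({ id: `A${i + 1}`, balance: 0 }));
const statements = buildAccountBatches(accounts);
console.log(statements.map(s => s.params.rows.length)); // batch sizes: 1000, 1000, 500
```

Each { query, params } pair could then be passed to a write transaction one batch at a time, which keeps individual transactions small.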
Deep Traversal and Pattern Matching
// Find fraudulent cycles through A1 (2 to 3 hops)
MATCH path = (source:Account {id: 'A1'})-[:TRANSFERRED*2..3]->(source)
WHERE all(r IN relationships(path) WHERE r.amount > 40000)
RETURN path, reduce(total = 0.0, r IN relationships(path) | total + r.amount) AS totalAmount
ORDER BY length(path) ASC
LIMIT 5;
// Shortest path with weights (GDS 2.x needs a named in-memory graph)
CALL gds.graph.project('pathGraph', 'Account', 'TRANSFERRED', {relationshipProperties: 'amount'});
MATCH (source:Account {id: 'A1'}), (target:Account {id: 'A50'})
CALL gds.shortestPath.dijkstra.stream('pathGraph', {
  sourceNode: source, targetNode: target,
  relationshipWeightProperty: 'amount'
})
YIELD index, sourceNode, targetNode, totalCost, path
RETURN gds.util.asNode(sourceNode).id AS from,
  gds.util.asNode(targetNode).id AS to,
  totalCost, path
ORDER BY index;
CALL gds.graph.drop('pathGraph');
The first query detects cycles by anchoring both ends of a bounded variable-length path (*2..3) on the same node and filtering with an all() predicate; note that rels() was removed in Neo4j 4, so relationships() is used. The second runs GDS Dijkstra for the weighted shortest path (cost is the sum of amount) on a named projection, which GDS 2.x requires instead of an inline configuration. Bounds prevent combinatorial explosion, and the stream mode returns results without writing anything back to the database. On a graph this small, both queries return almost instantly.
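To make the effect of the hop bound concrete, here is a rough in-memory sketch of a bounded cycle search in plain JavaScript. It is not how Neo4j executes the query; it only mirrors the idea that each relationship is used at most once per path and that expansion stops at the upper bound. The function name findCycles and the sample transfers (including the extra A50 to A7 edge) are assumptions for illustration.

```javascript
// Enumerate cycles through `start` that use between minHops and maxHops edges.
// Each edge is used at most once per path, similar to Cypher's relationship uniqueness.
function findCycles(edges, start, minHops, maxHops) {
  // Adjacency list: node id -> outgoing edges, each tagged with its index in `edges`.
  const outgoing = new Map();
  edges.forEach((edge, index) => {
    if (!outgoing.has(edge.from)) outgoing.set(edge.from, []);
    outgoing.get(edge.from).push({ ...edge, index });
  });

  const cycles = [];
  const usedEdges = new Set();

  function walk(node, path) {
    if (path.length >= maxHops) return; // the upper bound stops the expansion
    for (const edge of outgoing.get(node) ?? []) {
      if (usedEdges.has(edge.index)) continue;
      usedEdges.add(edge.index);
      path.push(edge);
      if (edge.to === start && path.length >= minHops) {
        cycles.push(path.map(e => `${e.from}->${e.to}`));
      }
      walk(edge.to, path);
      path.pop();
      usedEdges.delete(edge.index);
    }
  }

  walk(start, []);
  return cycles;
}

// Illustrative data: the A1 <-> A50 round trip plus one unrelated transfer.
const transfers = [
  { from: 'A1', to: 'A50', amount: 50000 },
  { from: 'A50', to: 'A1', amount: 49900 },
  { from: 'A50', to: 'A7', amount: 120 },
];
console.log(findCycles(transfers, 'A1', 2, 3)); // one cycle: A1->A50, A50->A1
```

Without the maxHops check, the number of candidate paths grows with the branching factor at every level, which is the explosion the bounds are meant to prevent.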
Optimization with Indexes and GDS
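Without an index, finding the starting node of a traversal means scanning every node with that label (O(n)); with an index, the lookup is roughly O(log n). The traversal itself then grows with branching factor and depth, which is why path bounds matter as much as indexes. Add composite indexes for recurring filter patterns. GDS projects graphs into memory to run parallel algorithms (PageRank, Louvain). For fraud detection, compute betweenness centrality: nodes that sit on many shortest paths are prime suspects.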
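The gap between scanning and an indexed lookup can be sketched in a few lines of plain JavaScript. This is only an analogy: a hash Map stands in for the database index (real database indexes are typically tree-based), and the helper names findByScan and buildIdIndex are made up for the example.

```javascript
// Linear scan: inspects nodes one by one, like a label scan without an index.
function findByScan(nodes, id) {
  let inspected = 0;
  for (const node of nodes) {
    inspected += 1;
    if (node.id === id) return { node, inspected };
  }
  return { node: undefined, inspected };
}

// Index: built once up front, after which each lookup touches a single entry.
function buildIdIndex(nodes) {
  return new Map(nodes.map(node => [node.id, node]));
}

// 100,000 synthetic account nodes.
const nodes = Array.from({ length: 100000 }, (_, i) => ({ id: `A${i + 1}` }));
const idIndex = buildIdIndex(nodes);

// Worst case for the scan: the wanted node is the last one.
console.log(findByScan(nodes, 'A100000').inspected); // 100000 nodes inspected
console.log(idIndex.get('A100000').id);              // found directly via the index
```

The index costs memory and a one-time build, which is the same trade-off behind indexing only the properties you actually filter on.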
Indexes, Constraints, and GDS Centrality
// Composite indexes for performance
CREATE INDEX account_balance IF NOT EXISTS FOR (a:Account) ON (a.balance, a.id);
CREATE INDEX transaction_time IF NOT EXISTS FOR (t:Transaction) ON (t.time);
// Project GDS graph (in-memory)
CALL gds.graph.project(
  'fraudGraph',
  'Account',
  'TRANSFERRED',
  {relationshipProperties: 'amount'}
);
// Betweenness Centrality (fraud score)
CALL gds.betweenness.stream('fraudGraph')
YIELD nodeId, score
WHERE gds.util.asNode(nodeId).balance < 0
RETURN gds.util.asNode(nodeId).id AS account,
  score AS fraudScore
ORDER BY fraudScore DESC
LIMIT 10;
// Drop graph
CALL gds.graph.drop('fraudGraph');
Indexes on filtered properties turn label scans into index seeks. gds.graph.project() builds an in-memory copy of the selected nodes and relationships, so size it against the available RAM. Betweenness highlights accounts that broker many shortest paths. The procedure has no maxIterations or score-range option, so prune low scores in the WHERE clause (for example AND score > 0) and use the samplingSize option to approximate on large graphs. gds.graph.drop() frees the memory; in production, schedule the project, compute, and drop cycle as a recurring job.
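For intuition about what the betweenness score measures, the following plain JavaScript sketch implements Brandes' algorithm for an unweighted, directed graph. It is a teaching aid under simplifying assumptions (no weights, no sampling, no parallelism), not a description of GDS internals, and the function name betweenness and the three-account sample are illustrative.

```javascript
// Brandes' algorithm: for every node, count the fraction of shortest paths
// between other node pairs that pass through it (unweighted, directed).
function betweenness(nodeIds, edges) {
  const outgoing = new Map(nodeIds.map(id => [id, []]));
  for (const { from, to } of edges) outgoing.get(from).push(to);

  const score = new Map(nodeIds.map(id => [id, 0]));

  for (const source of nodeIds) {
    const order = [];                                   // nodes in BFS visiting order
    const preds = new Map(nodeIds.map(id => [id, []])); // predecessors on shortest paths
    const sigma = new Map(nodeIds.map(id => [id, 0]));  // number of shortest paths from source
    const dist = new Map(nodeIds.map(id => [id, -1]));  // -1 means not reached yet
    sigma.set(source, 1);
    dist.set(source, 0);

    // Breadth-first search from the source.
    const queue = [source];
    for (let head = 0; head < queue.length; head += 1) {
      const v = queue[head];
      order.push(v);
      for (const w of outgoing.get(v)) {
        if (dist.get(w) < 0) {
          dist.set(w, dist.get(v) + 1);
          queue.push(w);
        }
        if (dist.get(w) === dist.get(v) + 1) {
          sigma.set(w, sigma.get(w) + sigma.get(v));
          preds.get(w).push(v);
        }
      }
    }

    // Accumulate dependencies in reverse BFS order.
    const delta = new Map(nodeIds.map(id => [id, 0]));
    for (let i = order.length - 1; i >= 0; i -= 1) {
      const w = order[i];
      for (const v of preds.get(w)) {
        delta.set(v, delta.get(v) + (sigma.get(v) / sigma.get(w)) * (1 + delta.get(w)));
      }
      if (w !== source) score.set(w, score.get(w) + delta.get(w));
    }
  }
  return score;
}

// A50 relays the only route from A1 to A7, so it is the only node with a non-zero score.
const scores = betweenness(
  ['A1', 'A50', 'A7'],
  [{ from: 'A1', to: 'A50' }, { from: 'A50', to: 'A7' }]
);
console.log(Object.fromEntries(scores)); // { A1: 0, A50: 1, A7: 0 }
```

An account that merely sends and receives with one counterparty scores 0; only accounts that sit between other accounts accumulate a score, which is why brokers stand out.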
Node.js Integration with Official Driver
import neo4j from 'neo4j-driver';
const driver = neo4j.driver(
  'bolt://localhost:7687',
  neo4j.auth.basic('neo4j', 'Passw0rd2026')
);
const detectFraud = async () => {
  const session = driver.session({ database: 'neo4j' });
  try {
    const res = await session.executeRead(tx => tx.run(
      `MATCH path = (a1:Account {id: $id1})-[:TRANSFERRED*1..2]-(a2:Account)
       WHERE a2.balance < 0
       RETURN a1.id AS source, a2.id AS suspect, min(length(path)) AS hops`,
      { id1: 'A1' }
    ));
    res.records.forEach(record => {
      // hops arrives as a driver Integer; toNumber() converts it to a JS number
      console.log(`Fraud path: ${record.get('source')} -> ${record.get('suspect')} (${record.get('hops').toNumber()} hops)`);
    });
  } catch (error) {
    console.error('Query failed:', error);
  } finally {
    await session.close();
  }
};
detectFraud().finally(() => driver.close());
The official driver ships with TypeScript typings and manages a connection pool for you. The default export is the lowercase neo4j object, not a named Neo4j import. executeRead() runs the query in a managed read transaction and retries transient failures. The $id1 parameter prevents injection. The hop count comes from length(path), since count(*) would count matching paths rather than hops. Close sessions and the driver explicitly in production; the driver also offers a reactive (RxJS) session API for streaming results. Install with npm i neo4j-driver, then run the file as an ES module or via ts-node.
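One detail worth sketching: Cypher parameters can carry values such as the account id, but the hop bounds of a variable-length pattern have to be literals in the query text. A cautious way to make the depth configurable is to validate it as a small integer before inlining it. The helper below is a hypothetical illustration in plain JavaScript; buildTraversalQuery and the MAX_HOPS ceiling of 5 are assumptions, not part of the driver.

```javascript
// Hard ceiling for traversal depth; 5 is an illustrative choice, not a Neo4j limit.
const MAX_HOPS = 5;

// Build a bounded traversal query. The hop bound is validated and inlined,
// while the account id travels separately as the $id parameter.
function buildTraversalQuery(accountId, maxHops) {
  if (!Number.isInteger(maxHops) || maxHops < 1 || maxHops > MAX_HOPS) {
    throw new RangeError(`maxHops must be an integer between 1 and ${MAX_HOPS}`);
  }
  if (typeof accountId !== 'string' || accountId.length === 0) {
    throw new TypeError('accountId must be a non-empty string');
  }
  const text =
    `MATCH path = (a:Account {id: $id})-[:TRANSFERRED*1..${maxHops}]-(b:Account) ` +
    'WHERE b.balance < 0 ' +
    'RETURN b.id AS suspect, min(length(path)) AS hops';
  return { text, params: { id: accountId } };
}

const { text, params } = buildTraversalQuery('A1', 2);
console.log(text);   // the query text, containing the literal bound *1..2
console.log(params); // { id: 'A1' }
```

The resulting pair can be handed to tx.run(text, params) inside executeRead(). Because only a validated integer is ever interpolated, user input cannot alter the query structure.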
Advanced Write Transactions with APOC
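// Conditional updates with APOC (bounded path expansion)
MATCH path = (a:Account {id: 'A1'})-[:TRANSFERRED*1..3]->(suspicious:Account {balance: -1000})
WITH suspicious, min(length(path)) AS hops
OPTIONAL MATCH (owner:User)-[:OWNS]->(suspicious)
CALL apoc.do.case([
  hops > 1, 'SET suspicious.riskScore = 0.95 RETURN suspicious.id AS flagged',
  coalesce(owner.country, 'FR') <> 'FR', 'DETACH DELETE suspicious RETURN true AS deleted'
], '', {suspicious: suspicious}) YIELD value
RETURN value;
apoc.do.case() runs the first query whose condition is true, which keeps conditional writes compact. The path is bound to a variable and limited to *1..3, because pattern expressions inside length() are no longer accepted in Neo4j 5 and an unbounded * invites the explosions discussed earlier. The country check reads the owning User, since Account nodes carry no country property. riskScore can feed downstream ML. DETACH DELETE removes the node together with its relationships; the statement runs in a single write transaction, so the update is atomic.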
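The first-match-wins behavior of the conditional list can be pictured with a small plain JavaScript analogy. This is only a sketch of the control flow, not APOC code; doCase and assessAccount are hypothetical names, and the ownerCountry field is an assumption made for the example.

```javascript
// Evaluate [condition, action] pairs in order and run only the first action whose
// condition is true; fall back to elseAction when none matches.
function doCase(branches, elseAction = () => ({})) {
  for (const [condition, action] of branches) {
    if (condition) return action();
  }
  return elseAction();
}

// Hypothetical mirror of the two branches: flag distant suspects, drop foreign-owned ones.
function assessAccount(account, hops) {
  return doCase(
    [
      [hops > 1, () => ({ ...account, riskScore: 0.95 })],
      [account.ownerCountry !== 'FR', () => ({ ...account, deleted: true })],
    ],
    () => account
  );
}

const suspect = { id: 'A50', ownerCountry: 'XX' };
console.log(assessAccount(suspect, 2)); // both conditions hold, only the riskScore branch runs
console.log(assessAccount(suspect, 1)); // first condition fails, the delete branch runs
```

Ordering the branches therefore matters: put the most specific or least destructive condition first.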
Best Practices
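- Model directionally: store each relationship once, such as [:OWNS]->, instead of duplicating it in both directions; Cypher can traverse it either way.
- Always bound paths (*1..5); use EXPLAIN to inspect a plan without running the query and PROFILE to run it and see actual rows and db hits.
- Batch GDS: project() once, then stream/write multiple algorithms.
- Selective indexes: only on properties you filter by equality or range.
- Fabric for scaling: shard data across databases and query them together (in Neo4j 5, Fabric is exposed as composite databases).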
Common Mistakes to Avoid
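- Unbounded traversals: memory explosions (cap the upper bound, for example at ..5).
- Forgetting close() on driver/session: Bolt connection leaks.
- Filtering on unindexed properties: a label scan over more than 1M nodes is slow.
- Projecting the whole live graph: gds.graph.project() copies data into memory, so project only the labels, relationship types, and properties you need.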
Next Steps
Try Neo4j Sandbox for ready datasets. Read the Graph Algorithms Book. Master AuraDB for cloud. Check our Learni graph training and Neo4j certification.