Introduction
A data catalog is a centralized repository that inventories an organization's datasets, enriched with metadata such as descriptions, schemas, owners, and tags. It enables data discovery, governance, and collaboration, and helps break down data silos. As data volumes keep growing, industry analysts such as Gartner have reported substantial productivity gains for data teams that adopt a well-designed catalog.
This tutorial guides you through building a full-stack data catalog with Next.js 15 (App Router), Prisma for the ORM, PostgreSQL for a scalable DB, and a simple UI with Tailwind CSS. We'll implement:
- A Dataset model with JSON schema and tags.
- A REST API for CRUD and simple text search.
- Listing and add interface with search.
Result: a working prototype, deployable on Vercel. Estimated time: 30 min. Perfect for intermediate data engineers seeking a customizable open-source starting point.
Prerequisites
- Node.js 20+ installed
- Docker and Docker Compose for PostgreSQL
- Basic knowledge of TypeScript, Next.js, and SQL
- GitHub account for optional deployment
- Tools: VS Code with Prisma and Tailwind extensions
Initialize the Next.js Project
npx create-next-app@latest data-catalog --typescript --tailwind --eslint --app --src-dir --import-alias "@/*"
cd data-catalog
npm install prisma @prisma/client
npx prisma init --datasource-provider postgresql

These commands create a Next.js 15 project with TypeScript, Tailwind, ESLint, and the App Router, then install Prisma as the ORM and initialize it for PostgreSQL. Using the full create-next-app template saves you from configuring Tailwind and ESLint by hand.
PostgreSQL Database Setup
Launch a dedicated PostgreSQL container via Docker Compose to isolate your dev environment. Copy the code below into docker-compose.yml at the root, then update your Prisma connection URL.
Docker Compose for PostgreSQL
version: '3.8'
services:
  postgres:
    image: postgres:16
    restart: always
    environment:
      POSTGRES_DB: datacatalog
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    ports:
      - '5432:5432'
    volumes:
      - postgres_data:/var/lib/postgresql/data
volumes:
  postgres_data:

This file defines a persistent PostgreSQL service with the datacatalog database; the named volume prevents data loss across container restarts. Start it with docker compose up -d, then set DATABASE_URL=postgresql://user:password@localhost:5432/datacatalog in .env.
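Before handing the connection string to Prisma, it can help to sanity-check its parts. A small sketch using Node's built-in URL parser (the helper name is ours, not part of any library):

```typescript
// Parse a PostgreSQL connection string into its components so you can
// verify user, host, port, and database before Prisma ever connects.
// Uses only Node's built-in WHATWG URL implementation.
export function describeDatabaseUrl(url: string) {
  const parsed = new URL(url)
  return {
    user: parsed.username,
    host: parsed.hostname,
    port: Number(parsed.port),
    database: parsed.pathname.replace(/^\//, ''),
  }
}

const databaseUrl = 'postgresql://user:password@localhost:5432/datacatalog'
console.log(describeDatabaseUrl(databaseUrl))
// { user: 'user', host: 'localhost', port: 5432, database: 'datacatalog' }
```

If the printed host or database does not match what Docker Compose exposes, fix .env before moving on; Prisma's own error messages for a bad URL are less direct.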
Prisma Schema for Dataset Model
generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

model Dataset {
  id          String   @id @default(cuid())
  name        String   @unique
  description String?
  schema      Json?
  owner       String
  tags        String[]
  createdAt   DateTime @default(now())
  updatedAt   DateTime @updatedAt

  @@map("datasets")
}

This schema defines a Dataset model with the essential fields: a unique name, an optional description, a flexible JSON schema (for tables/columns), an owner, and a tags array. @@map sets a custom table name. Json allows dynamic schemas without complex migrations.
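The Json column is deliberately free-form. As an illustration, one possible convention (ours, not something Prisma prescribes) is to store a list of column descriptors:

```typescript
// One possible shape for the free-form `schema` Json column: a list of
// column descriptors. This convention is ours; Prisma imposes no structure.
interface ColumnDef {
  name: string
  type: string
  nullable?: boolean
}

interface DatasetSchema {
  columns: ColumnDef[]
}

const ordersSchema: DatasetSchema = {
  columns: [
    { name: 'order_id', type: 'bigint' },
    { name: 'customer_id', type: 'bigint' },
    { name: 'placed_at', type: 'timestamptz', nullable: true },
  ],
}

// Helper to pull column names out of a stored schema value
export function columnNames(s: DatasetSchema): string[] {
  return s.columns.map((c) => c.name)
}

console.log(columnNames(ordersSchema)) // ['order_id', 'customer_id', 'placed_at']
```

Keeping a TypeScript interface next to the Json column gives you editor support even though the database itself stays schemaless for this field.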
Prisma Migration and Client Generation
docker compose up -d
cp .env.example .env  # if not done already
# Edit .env: DATABASE_URL="postgresql://user:password@localhost:5432/datacatalog"
npx prisma db push
npx prisma generate
npm run dev

db push applies the schema to the database without manual SQL migrations (great for dev), and generate builds the typed TypeScript client. Run npm run dev to verify everything starts. Pitfall: skip generate and the types won't be available in your code.
Shared Prisma Client Implementation
Create a singleton utility for the Prisma client to avoid unnecessary reconnections in serverless environments (like Vercel). Place it in src/lib/prisma.ts.
Prisma Client Singleton
import { PrismaClient } from '@prisma/client'

const globalForPrisma = globalThis as unknown as {
  prisma: PrismaClient | undefined
}

export const prisma = globalForPrisma.prisma ?? new PrismaClient()

if (process.env.NODE_ENV !== 'production') globalForPrisma.prisma = prisma

The global singleton reuses one Prisma instance across dev hot reloads. This matters in Next.js serverless deployments, where naive instantiation can open far more database connections than the pool allows. In production, each cold start gets its own instance.
API Route for Dataset CRUD
import { NextRequest, NextResponse } from 'next/server'
import { prisma } from '@/lib/prisma'
import { z } from 'zod'

const createSchema = z.object({
  name: z.string().min(1),
  description: z.string().optional(),
  schema: z.object({}).passthrough().optional(),
  owner: z.string().min(1),
  tags: z.array(z.string()).optional().default([]),
})

export async function GET(request: NextRequest) {
  const { searchParams } = new URL(request.url)
  const q = searchParams.get('q') || ''
  const datasets = await prisma.dataset.findMany({
    where: {
      OR: [
        { name: { contains: q, mode: 'insensitive' } },
        { description: { contains: q, mode: 'insensitive' } },
        { tags: { has: q } },
      ],
    },
    orderBy: { createdAt: 'desc' },
  })
  return NextResponse.json(datasets)
}

export async function POST(request: NextRequest) {
  try {
    const body = await request.json()
    const validated = createSchema.parse(body)
    const dataset = await prisma.dataset.create({ data: validated })
    return NextResponse.json(dataset, { status: 201 })
  } catch (error) {
    return NextResponse.json({ error: 'Validation failed' }, { status: 400 })
  }
}

Place this in src/app/api/datasets/route.ts. GET runs a case-insensitive substring search on name and description, plus an exact tag match, via a Prisma OR filter; POST validates the body with Zod before creating. Install Zod first with npm i zod. Pitfall: without mode: 'insensitive', the search is case-sensitive.
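To make the search semantics concrete, the OR filter in the GET handler can be re-expressed as a plain TypeScript predicate. This is an illustrative sketch only (the real filtering runs inside PostgreSQL, and DatasetRow is a name we introduce here):

```typescript
// Plain-TypeScript mirror of the GET handler's Prisma OR filter,
// for illustration: the real filtering happens in PostgreSQL.
interface DatasetRow {
  name: string
  description: string | null
  tags: string[]
}

export function matchesQuery(ds: DatasetRow, q: string): boolean {
  const needle = q.toLowerCase()
  return (
    // `contains` with mode: 'insensitive' behaves like a lowercase substring match
    ds.name.toLowerCase().includes(needle) ||
    (ds.description ?? '').toLowerCase().includes(needle) ||
    // `has` on a String[] column is an exact, case-sensitive element match
    ds.tags.includes(q)
  )
}

const row: DatasetRow = { name: 'Orders', description: 'Daily orders', tags: ['sales'] }
console.log(matchesQuery(row, 'order')) // true: substring of name
console.log(matchesQuery(row, 'sales')) // true: exact tag match
console.log(matchesQuery(row, 'SALES')) // false: tag matching is case-sensitive
```

The last case shows why tags behave differently from name and description: Prisma's has operator on scalar lists has no insensitive mode, so 'SALES' misses the 'sales' tag.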
User Interface for Listing and Adding Datasets
Tailwind is preconfigured by create-next-app. Create the /datasets page at src/app/datasets/page.tsx with a search form and a dynamic list. Native fetch keeps it dependency-free.
Datasets UI Page with Search
'use client'

import { useState, useEffect } from 'react'

export default function DatasetsPage() {
  const [datasets, setDatasets] = useState<any[]>([])
  const [query, setQuery] = useState('')
  const [loading, setLoading] = useState(false)

  useEffect(() => {
    fetchDatasets()
  }, [query])

  const fetchDatasets = async () => {
    setLoading(true)
    const params = new URLSearchParams({ q: query })
    const res = await fetch(`/api/datasets?${params}`)
    const data = await res.json()
    setDatasets(data)
    setLoading(false)
  }

  const addDataset = async (e: React.FormEvent) => {
    e.preventDefault()
    const form = new FormData(e.target as HTMLFormElement)
    const values = Object.fromEntries(form) as Record<string, string>
    await fetch('/api/datasets', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        ...values,
        // The API expects String[], so split the comma-separated input
        tags: values.tags
          ? values.tags.split(',').map((t) => t.trim()).filter(Boolean)
          : [],
      }),
    })
    fetchDatasets()
  }

  return (
    <div className="container mx-auto p-8">
      <h1 className="text-3xl font-bold mb-8">Data Catalog</h1>
      <form onSubmit={addDataset} className="mb-8 p-4 border rounded">
        <input name="name" placeholder="Dataset name" className="border p-2 mr-2" required />
        <input name="description" placeholder="Description" className="border p-2 mr-2" />
        <input name="owner" placeholder="Owner" className="border p-2 mr-2" required />
        <input name="tags" placeholder="tags,comma,separated" className="border p-2 mr-2" />
        <button type="submit" className="bg-blue-500 text-white px-4 py-2 rounded">Add</button>
      </form>
      <input
        value={query}
        onChange={(e) => setQuery(e.target.value)}
        placeholder="Search..."
        className="w-full border p-4 mb-4 rounded"
      />
      {loading ? (
        <p>Loading...</p>
      ) : (
        <div className="grid gap-4">
          {datasets.map((ds) => (
            <div key={ds.id} className="border p-4 rounded shadow">
              <h3 className="font-bold">{ds.name}</h3>
              <p>{ds.description}</p>
              <p>Owner: {ds.owner} | Tags: {ds.tags.join(', ')}</p>
              <pre className="text-xs mt-2">Schema: {JSON.stringify(ds.schema, null, 2)}</pre>
            </div>
          ))}
        </div>
      )}
    </div>
  )
}

This is a client component ('use client' is required for the useState/useEffect hooks), with useEffect re-fetching on every keystroke for live search. The form POSTs JSON, splitting the comma-separated tags field into the String[] the API expects. Tailwind handles the responsive layout, and native fetch avoids extra dependencies. Test at http://localhost:3000/datasets.
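One refinement worth noting: the useEffect above fires a request on every keystroke. In production you would typically debounce the query. A minimal, framework-agnostic sketch (the helper is our own, not from any library):

```typescript
// Minimal debounce: delays `fn` until `waitMs` ms pass without another
// call, so the search input only hits /api/datasets once typing pauses.
export function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer)
    timer = setTimeout(() => fn(...args), waitMs)
  }
}

// Usage sketch: wrap the fetch trigger instead of calling it directly,
// e.g. const debouncedFetch = debounce(fetchDatasets, 300)
```

Libraries like lodash ship an equivalent, but for a single call site a hand-rolled version avoids a dependency.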
Best Practices
- Paginate queries: add take: 20, skip in Prisma queries for scalability.
- Secure the APIs: integrate NextAuth or Clerk for roles (admin/reader).
- Index the DB: use @@index([name]) and Postgres tsvector for true full-text search in prod.
- Logging and monitoring: Sentry for API errors, Prometheus for metrics.
- Version schemas: Git plus Prisma Migrate from dev to prod.
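The pagination bullet above can be sketched as a small helper that turns a 1-based page number into the take/skip arguments Prisma expects (the helper name is ours):

```typescript
// Convert a 1-based page number into Prisma `take`/`skip` arguments,
// clamping bad input to page 1. Spread the result into findMany().
export function toPageArgs(page: number, pageSize = 20): { take: number; skip: number } {
  const safePage = Number.isFinite(page) && page >= 1 ? Math.floor(page) : 1
  return { take: pageSize, skip: (safePage - 1) * pageSize }
}

console.log(toPageArgs(3)) // { take: 20, skip: 40 }
console.log(toPageArgs(0)) // { take: 20, skip: 0 }
```

In the GET handler this could read a ?page= search param and pass the result into prisma.dataset.findMany({ ...toPageArgs(page), orderBy: { createdAt: 'desc' } }).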
Common Errors to Avoid
- DB connection leaks: always use the Prisma singleton in serverless environments.
- No validation: without createSchema.parse(body), malformed payloads reach the database unchecked.
- Unoptimized search: without mode: 'insensitive', results are incomplete.
- Tag issues: String[] columns need tags: { has: q } in queries.
Next Steps
- Deploy on Vercel, with Prisma Accelerate for connection pooling and global caching.
- Add lineage with parent/child models.
- Explore open-source Amundsen or DataHub for advanced features.
- Check out our Learni training Advanced Data Engineering to scale your data catalog for the enterprise.