Skip to content
Learni
View all tutorials
Data Engineering

How to Build a Data Catalog with Next.js and Prisma in 2026

Lire en français

Introduction

A data catalog is a centralized repository that inventories an organization's datasets, enriched with metadata like descriptions, schemas, owners, and tags. It enables data discovery, governance, and collaboration, eliminating data silos. In a world of exploding data volumes, a well-designed data catalog boosts data team productivity by 30-50% according to Gartner.

This tutorial guides you through building a full-stack data catalog with Next.js 15 (App Router), Prisma for the ORM, PostgreSQL for a scalable DB, and a simple UI with Tailwind CSS. We'll implement:

  • A Dataset model with JSON schema and tags.
  • REST API for CRUD and full-text search.
  • Listing and add interface with search.

Result: a production-ready prototype, deployable on Vercel. Estimated time: 30 min. Perfect for intermediate data engineers seeking a customizable open-source solution.

Prerequisites

  • Node.js 20+ installed
  • Docker and Docker Compose for PostgreSQL
  • Basic knowledge of TypeScript, Next.js, and SQL
  • GitHub account for optional deployment
  • Tools: VS Code with Prisma and Tailwind extensions

Initialize the Next.js Project

terminal
npx create-next-app@latest data-catalog --typescript --tailwind --eslint --app --src-dir --import-alias "@/*"
cd data-catalog
npm install prisma @prisma/client
npx prisma init --datasource-provider postgresql

This script creates a Next.js 15 project with TypeScript, Tailwind, and App Router. Prisma is installed for the ORM and initialized for PostgreSQL. Skip minimal templates to save time on Tailwind and ESLint setup.

PostgreSQL Database Setup

Launch a dedicated PostgreSQL container via Docker Compose to isolate your dev environment. Copy the code below into docker-compose.yml at the root, then update your Prisma connection URL.

Docker Compose for PostgreSQL

docker-compose.yml
version: '3.8'
services:
  postgres:
    image: postgres:16
    restart: always
    environment:
      POSTGRES_DB: datacatalog
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    ports:
      - '5432:5432'
    volumes:
      - postgres_data:/var/lib/postgresql/data
volumes:
  postgres_data:

This file sets up a persistent PostgreSQL service with the datacatalog database. A volume prevents data loss. Run with docker compose up -d. Set DATABASE_URL=postgresql://user:password@localhost:5432/datacatalog in .env.

Prisma Schema for Dataset Model

prisma/schema.prisma
generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider  = "postgresql"
  url       = env("DATABASE_URL")
}

model Dataset {
  id          String   @id @default(cuid())
  name        String   @unique
  description String?
  schema      Json?
  owner       String
  tags        String[]
  createdAt   DateTime @default(now())
  updatedAt   DateTime @updatedAt

  @@map("datasets")
}

This schema defines a Dataset model with essential fields: unique name, description, flexible JSON schema (for tables/cols), owner, and tags array. @@map for custom table name. Use Json for dynamic schemas without complex migrations.

Prisma Migration and Client Generation

terminal
docker compose up -d
cp .env.example .env  # Si pas fait
# Éditez .env : DATABASE_URL="postgresql://user:password@localhost:5432/datacatalog"
npx prisma db push
npx prisma generate
npm run dev

Pushes the schema to the DB without manual SQL migrations (great for dev). Generates the typed TS client. Run npm run dev to verify. Pitfall: Skip generate and types won't be available in your code.

Shared Prisma Client Implementation

Create a singleton utility for the Prisma client to avoid unnecessary reconnections in serverless environments (like Vercel). Place it in src/lib/prisma.ts.

Prisma Client Singleton

src/lib/prisma.ts
import { PrismaClient } from '@prisma/client'

const globalForPrisma = globalThis as unknown as {
  prisma: PrismaClient | undefined
}

export const prisma = globalForPrisma.prisma ?? new PrismaClient()

if (process.env.NODE_ENV !== 'production') globalForPrisma.prisma = prisma

Global singleton reuses the Prisma instance during dev hot-reload. Essential in Next.js serverless to prevent 100+ DB connections. In prod, one instance per cold start request.

API Route for Dataset CRUD

src/app/api/datasets/route.ts
import { NextRequest, NextResponse } from 'next/server'
import { prisma } from '@/lib/prisma'
import { z } from 'zod'

const createSchema = z.object({
  name: z.string().min(1),
  description: z.string().optional(),
  schema: z.object({}).passthrough().optional(),
  owner: z.string().min(1),
  tags: z.array(z.string()).optional().default([]),
})

export async function GET(request: NextRequest) {
  const { searchParams } = new URL(request.url)
  const q = searchParams.get('q') || ''
  const datasets = await prisma.dataset.findMany({
    where: {
      OR: [
        { name: { contains: q, mode: 'insensitive' } },
        { description: { contains: q, mode: 'insensitive' } },
        { tags: { has: q } },
      ],
    },
    orderBy: { createdAt: 'desc' },
  })
  return NextResponse.json(datasets)
}

export async function POST(request: NextRequest) {
  try {
    const body = await request.json()
    const validated = createSchema.parse(body)
    const dataset = await prisma.dataset.create({ data: validated })
    return NextResponse.json(dataset, { status: 201})
  } catch (error) {
    return NextResponse.json({ error: 'Validation failed' }, { status: 400 })
  }
}

Dynamic GET/POST route with Zod validation. Full-text search on name/desc/tags via Prisma OR. POST creates with safe parsing. Install with npm i zod first. Pitfall: Without mode: 'insensitive', search is case-sensitive.

User Interface for Listing and Adding Datasets

Tailwind is preconfigured. Create the /datasets page with a search form and dynamic list. Use native fetch for simplicity.

Datasets UI Page with Search

src/app/datasets/page.tsx
import { useState, useEffect } from 'react'

export default function DatasetsPage() {
  const [datasets, setDatasets] = useState<any[]>([])
  const [query, setQuery] = useState('')
  const [loading, setLoading] = useState(false)

  useEffect(() => {
    fetchDatasets()
  }, [query])

  const fetchDatasets = async () => {
    setLoading(true)
    const params = new URLSearchParams({ q: query })
    const res = await fetch(`/api/datasets?${params}`)
    const data = await res.json()
    setDatasets(data)
    setLoading(false)
  }

  const addDataset = async (e: React.FormEvent) => {
    e.preventDefault()
    const form = new FormData(e.target as HTMLFormElement)
    await fetch('/api/datasets', {
      method: 'POST',
      body: JSON.stringify(Object.fromEntries(form)),
    })
    fetchDatasets()
  }

  return (
    <div className="container mx-auto p-8">
      <h1 className="text-3xl font-bold mb-8">Data Catalog</h1>
      <form onSubmit={addDataset} className="mb-8 p-4 border rounded">
        <input name="name" placeholder="Nom dataset" className="border p-2 mr-2" required />
        <input name="description" placeholder="Description" className="border p-2 mr-2" />
        <input name="owner" placeholder="Owner" className="border p-2 mr-2" required />
        <input name="tags" placeholder="tags,comma,separated" className="border p-2 mr-2" />
        <button type="submit" className="bg-blue-500 text-white px-4 py-2 rounded">Ajouter</button>
      </form>
      <input
        value={query}
        onChange={(e) => setQuery(e.target.value)}
        placeholder="Rechercher..."
        className="w-full border p-4 mb-4 rounded"
      />
      {loading ? (
        <p>Chargement...</p>
      ) : (
        <div className="grid gap-4">
          {datasets.map((ds) => (
            <div key={ds.id} className="border p-4 rounded shadow">
              <h3 className="font-bold">{ds.name}</h3>
              <p>{ds.description}</p>
              <p>Owner: {ds.owner} | Tags: {ds.tags.join(', ')}</p>
              <pre className="text-xs mt-2">Schema: {JSON.stringify(ds.schema, null, 2)}</pre>
            </div>
          ))}
        </div>
      )}
    </div>
  )
}

Hybrid React Server/Client page with useEffect for live search. Basic POST form (add hidden JSON schema input if needed). Tailwind for responsive UI. Native fetch avoids extra deps. Test at http://localhost:3000/datasets.

Best Practices

  • Paginate queries: Add take: 20, skip in Prisma for scalability.
  • Secure APIs: Integrate NextAuth or Clerk for roles (admin/reader).
  • Index the DB: Use @@index([name]) and Postgres tsvector for full-text in prod.
  • Logging and monitoring: Sentry for API errors, Prometheus for metrics.
  • Version schemas: Git + Prisma migrate from dev to prod.

Common Errors to Avoid

  • DB connection leaks: Always use Prisma singleton in serverless.
  • No validation: Zod/ZodError crashes without .parse(body).
  • Unoptimized search: Without mode: 'insensitive', results are incomplete.
  • Tag issues: String[] arrays need tags: { has: q } for queries.

Next Steps

  • Deploy on Vercel with Prisma Accelerate for global DB.
  • Add lineage with parent/child models.
  • Explore open-source Amundsen or DataHub for advanced features.
  • Check out our Learni trainings: Advanced Data Engineering to scale your data catalog for enterprise.