April 22, 2026 · 10 min read

Deploy Langfuse on Kubernetes: Production Self-Hosted Guide (2026)

Self-host Langfuse v3 on Kubernetes in production: reference architecture, Helm values, Postgres + ClickHouse + Redis HA setup, S3 media storage, OIDC auth, backups, and observability hooks for GCC data-sovereign deployments.

Langfuse became the default open-source LLM observability choice in 2025 after overtaking Helicone and Phoenix on GitHub stars and production adoption. The managed cloud offering is fine for side projects, but enterprise teams running regulated workloads or wanting full control over trace data need the self-hosted version - and that means running Langfuse on Kubernetes.

This guide covers the production deployment we recommend to clients: a split-tier Helm install, HA dependencies, OIDC authentication, KEDA-driven worker scaling, and the specific GCC data-sovereignty patterns we’ve used with UAE clients.

Why self-host Langfuse at all

Three reasons teams move off Langfuse Cloud:

  1. Data sovereignty. Every prompt, completion, tool call, and user identifier that flows through Langfuse is training data in someone’s eyes. Regulated industries (fintech, healthcare, government) cannot ship raw LLM traces to EU or US regions.
  2. Trace volume cost. Langfuse Cloud pricing scales linearly with ingested observations. A chatty agent platform at 50M events/month can end up with a cloud bill roughly 3-5x the cost of the equivalent self-hosted infrastructure.
  3. Custom integrations. Self-hosted Langfuse lets you join trace data with internal billing, feature-flag, and incident systems via direct database access - something the SaaS API cannot match.

The trade-off: you now operate four stateful services. That’s the part most teams underestimate.

Langfuse v3 architecture refresher

Langfuse v3 (released September 2024) split the original single-container design into an async, event-driven topology:

                             ┌──────────────┐
            OTLP/HTTP    ───▶│ Langfuse Web │◀─── Browser UI
            Traces           │  (Next.js)   │
                             └──────┬───────┘
                                    │ enqueue
                                    ▼
                             ┌──────────────┐
                             │    Redis     │◀─── Rate limits, cache
                             │   (BullMQ)   │
                             └──────┬───────┘
                                    │ consume
                                    ▼
                             ┌──────────────┐
                             │   Worker     │
                             │ (Node.js)    │
                             └──┬────────┬──┘
                                │        │
            ┌───────────────────┘        └───────────────────┐
            ▼                                                ▼
    ┌──────────────┐                                ┌──────────────┐
    │   Postgres   │  Metadata, users,              │  ClickHouse  │  Traces, observations,
    │              │  projects, prompts, keys       │              │  scores, sessions
    └──────────────┘                                └──────┬───────┘
                                                           │ cold data
                                                           ▼
                                                    ┌──────────────┐
                                                    │  S3 / MinIO  │  Media, attachments
                                                    └──────────────┘

Key invariants:

  • Web tier is stateless. Scale it for UI and ingestion HTTP traffic.
  • Worker tier is stateless but I/O-bound. Scale it on queue depth, not CPU.
  • Postgres is the control plane. Small footprint, but it must be HA.
  • ClickHouse is the data plane. Grows linearly with event volume; needs ZooKeeper or ClickHouse Keeper for replication.
  • Redis holds ephemeral queue state. Needs HA for availability, not durability.
  • Object storage holds large payloads (images, PDFs, long completions). Lifecycle-policy it.
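The lifecycle rule in that last bullet is a one-time setup worth doing early. A sketch via the AWS CLI - the bucket name, `media/` prefix, and day thresholds are illustrative assumptions, not Langfuse requirements:

```shell
# Transition media objects to infrequent-access after 30 days,
# expire them after 90. Adjust thresholds to your retention policy.
aws s3api put-bucket-lifecycle-configuration \
  --bucket langfuse-prod-me-central-1 \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "langfuse-media-expiry",
      "Status": "Enabled",
      "Filter": {"Prefix": "media/"},
      "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
      "Expiration": {"Days": 90}
    }]
  }'
```

MinIO and Azure Blob have equivalent lifecycle mechanisms (`mc ilm` and blob lifecycle management policies, respectively).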

Reference architecture for production

Here’s the topology we deploy for clients on EKS, AKS, and sovereign K8s platforms like Core42:

| Component | Runtime | Replicas | Notes |
| --- | --- | --- | --- |
| langfuse-web | Deployment | 3 | Behind HPA (CPU 60%), min 2 |
| langfuse-worker | Deployment | 2-20 | Scaled by KEDA on Redis queue depth |
| Postgres | CloudNativePG cluster or RDS/Aurora | 3 (primary + 2 replicas) | Outside the chart |
| ClickHouse | Altinity Operator cluster | 3 shards × 2 replicas | With ClickHouse Keeper, not bundled |
| Redis | Operator (OT Redis) or ElastiCache | Primary + 1 replica | Sentinel for failover |
| Object storage | S3 / Azure Blob / MinIO tenant | Managed | SSE + lifecycle rules |
| Ingress | ingress-nginx | 2+ | cert-manager for Let's Encrypt |
| Auth | OIDC via Entra ID / Okta / Auth0 | External | No local accounts in production |

Do not use the Postgres, ClickHouse, or Redis subcharts shipped with langfuse-k8s. They are single-replica and assume emptyDir volumes. They exist to let you smoke-test the chart, not to run production.

Prerequisites

Before starting the deploy, have these ready:

# Required tooling
kubectl version --client                      # 1.28+
helm version                                  # 3.14+

Cluster add-ons we assume are installed:

  • cert-manager for TLS
  • ingress-nginx or Gateway API-compatible controller
  • external-secrets-operator for secret sync from AWS Secrets Manager / Azure Key Vault
  • metrics-server (for HPA)
  • KEDA (for worker queue-depth scaling)
  • prometheus-operator (for monitoring)

And four external dependencies provisioned:

  • Postgres 15+ with a dedicated langfuse database
  • ClickHouse 24.3+ cluster with a dedicated langfuse database
  • Redis 7+ (single instance or Sentinel cluster)
  • S3-compatible bucket with a dedicated IAM principal
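Before touching Helm, confirm the cluster can actually reach those dependencies. A throwaway pod is enough - the service DNS names here are assumptions matching the values used later in this guide, so substitute your own:

```shell
# TCP reachability check for Postgres, ClickHouse HTTP, and Redis
kubectl run netcheck --rm -it --restart=Never --image=busybox:1.36 -- sh -c '
  nc -zv langfuse-pg-rw.data.svc.cluster.local 5432 &&
  nc -zv chi-langfuse-cluster-0-0.clickhouse.svc.cluster.local 8123 &&
  nc -zv langfuse-redis-master.data.svc.cluster.local 6379'
```

Ten seconds here saves an hour of staring at CrashLoopBackOff later.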

Helm install: production values

Add the repo and create the namespace:

helm repo add langfuse https://langfuse.github.io/langfuse-k8s
helm repo update

kubectl create namespace langfuse
kubectl label namespace langfuse \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted

The production values.yaml we use as a baseline:

# values.prod.yaml
langfuse:
  image:
    tag: "3.54.0"                    # pin version, never use :latest
  nextauth:
    url: "https://langfuse.example.ae"
    secret:
      existingSecret:
        name: langfuse-secrets
        key: nextauth-secret
  salt:
    existingSecret:
      name: langfuse-secrets
      key: salt
  encryptionKey:
    existingSecret:
      name: langfuse-secrets
      key: encryption-key

  web:
    replicas: 3
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        memory: "2Gi"
    podDisruptionBudget:
      enabled: true
      minAvailable: 2
    autoscaling:
      enabled: true
      minReplicas: 3
      maxReplicas: 10
      targetCPUUtilizationPercentage: 60

  worker:
    replicas: 2
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        memory: "4Gi"
    podDisruptionBudget:
      enabled: true
      minAvailable: 1
    # KEDA ScaledObject defined separately (see below)

  additionalEnv:
    - name: TELEMETRY_ENABLED
      value: "false"
    - name: LANGFUSE_ENABLE_EXPERIMENTAL_FEATURES
      value: "false"
    - name: AUTH_DISABLE_USERNAME_PASSWORD
      value: "true"              # force OIDC in prod
    - name: AUTH_AZURE_AD_CLIENT_ID
      valueFrom:
        secretKeyRef:
          name: langfuse-oidc
          key: client-id
    - name: AUTH_AZURE_AD_CLIENT_SECRET
      valueFrom:
        secretKeyRef:
          name: langfuse-oidc
          key: client-secret
    - name: AUTH_AZURE_AD_TENANT_ID
      valueFrom:
        secretKeyRef:
          name: langfuse-oidc
          key: tenant-id

# Disable bundled subcharts - we bring our own
postgresql:
  deploy: false
  host: "langfuse-pg-rw.data.svc.cluster.local"
  port: 5432
  auth:
    username: langfuse
    existingSecret: langfuse-pg-creds
    secretKeys:
      userPasswordKey: password
  database: langfuse

clickhouse:
  deploy: false
  host: "chi-langfuse-cluster-0-0.clickhouse.svc.cluster.local"
  httpPort: 8123
  nativePort: 9000
  user: langfuse
  auth:
    existingSecret: langfuse-ch-creds
    secretKey: password
  database: langfuse
  cluster: langfuse-cluster

redis:
  deploy: false
  host: "langfuse-redis-master.data.svc.cluster.local"
  port: 6379
  auth:
    existingSecret: langfuse-redis-creds
    existingSecretPasswordKey: password

s3:
  deploy: false
  bucket: "langfuse-prod-me-central-1"
  region: "me-central-1"
  endpoint: "https://s3.me-central-1.amazonaws.com"
  forcePathStyle: false
  accessKeyId:
    secretKeyRef:
      name: langfuse-s3-creds
      key: access-key-id
  secretAccessKey:
    secretKeyRef:
      name: langfuse-s3-creds
      key: secret-access-key

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  hosts:
    - host: langfuse.example.ae
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: langfuse-tls
      hosts:
        - langfuse.example.ae

serviceMonitor:
  enabled: true
  namespace: monitoring
  labels:
    release: kube-prometheus-stack

Apply it:

helm upgrade --install langfuse langfuse/langfuse \
  --namespace langfuse \
  --values values.prod.yaml \
  --version 1.2.0 \
  --wait --timeout 15m
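Then verify the rollout before pointing any SDKs at it. The deployment names assume the chart's defaults; adjust if your release name differs:

```shell
kubectl -n langfuse rollout status deployment/langfuse-web --timeout=5m
kubectl -n langfuse rollout status deployment/langfuse-worker --timeout=5m

# Health endpoint served by the web tier
curl -fsS https://langfuse.example.ae/api/public/health
```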

KEDA autoscaling for the worker tier

Langfuse workers consume BullMQ queues. The job mix is bursty: a batch eval run can dump 100k events in seconds. Plain CPU HPA lags badly. Scale on queue depth instead:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: langfuse-redis-auth
  namespace: langfuse
spec:
  secretTargetRef:
    - parameter: password
      name: langfuse-redis-creds
      key: password
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: langfuse-worker
  namespace: langfuse
spec:
  scaleTargetRef:
    name: langfuse-worker
  minReplicaCount: 2
  maxReplicaCount: 20
  pollingInterval: 15
  cooldownPeriod: 300
  triggers:
    - type: redis
      metadata:
        address: langfuse-redis-master.data.svc.cluster.local:6379
        listName: "bull:ingestion:wait"
        listLength: "1000"
        enableTLS: "false"
      authenticationRef:
        name: langfuse-redis-auth
    - type: redis
      metadata:
        address: langfuse-redis-master.data.svc.cluster.local:6379
        listName: "bull:evaluation:wait"
        listLength: "200"
      authenticationRef:
        name: langfuse-redis-auth

The two triggers handle different workload shapes: ingestion tolerates up to 1,000 queued messages before scaling (high throughput, cheap per message), evaluation scales at 200 (low volume but each job runs an LLM evaluation and is expensive to let wait).

Postgres with CloudNativePG

If you’re running Postgres in-cluster, CloudNativePG is the right operator. Minimal production cluster:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: langfuse-pg
  namespace: data
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:15.6
  primaryUpdateStrategy: unsupervised
  storage:
    size: 100Gi
    storageClass: gp3
  resources:
    requests:
      cpu: "1"
      memory: "4Gi"
    limits:
      memory: "8Gi"
  bootstrap:
    initdb:
      database: langfuse
      owner: langfuse
      secret:
        name: langfuse-pg-creds
  backup:
    barmanObjectStore:
      destinationPath: s3://langfuse-backups/pg
      s3Credentials:
        accessKeyId:
          name: pg-backup-creds
          key: access-key-id
        secretAccessKey:
          name: pg-backup-creds
          key: secret-access-key
      wal:
        retention: "7d"
      data:
        retention: "30d"
  monitoring:
    enablePodMonitor: true

That gives you point-in-time recovery to any second in the last seven days, daily full backups retained 30 days, and automatic failover on primary loss.
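Backups you have never restored are a hypothesis, not a capability. A hedged sketch of a PITR drill - recovering into a fresh cluster from the same barman object store, with a placeholder target time:

```yaml
# Restores langfuse-pg into a NEW cluster at a specific point in time.
# Run this in a scratch namespace during recovery drills.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: langfuse-pg-restore
  namespace: data
spec:
  instances: 1
  storage:
    size: 100Gi
  bootstrap:
    recovery:
      source: langfuse-pg
      recoveryTarget:
        targetTime: "2026-04-20 09:30:00+04"   # placeholder
  externalClusters:
    - name: langfuse-pg
      barmanObjectStore:
        destinationPath: s3://langfuse-backups/pg
        s3Credentials:
          accessKeyId:
            name: pg-backup-creds
            key: access-key-id
          secretAccessKey:
            name: pg-backup-creds
            key: secret-access-key
```

Once it comes up, point a scratch Langfuse install at it and confirm projects and API keys are intact.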

ClickHouse with the Altinity operator

ClickHouse is the component that actually breaks if you get it wrong. Use the Altinity operator with ClickHouse Keeper - it has replaced ZooKeeper as the recommended coordination service for new deployments:

apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: langfuse
  namespace: clickhouse
spec:
  configuration:
    clusters:
      - name: langfuse-cluster
        layout:
          shardsCount: 3
          replicasCount: 2
    users:
      langfuse/password:
        valueFrom:
          secretKeyRef:
            name: langfuse-ch-creds
            key: password
      langfuse/networks/ip:
        - "10.0.0.0/8"
    profiles:
      default/max_memory_usage: 10000000000
      default/max_bytes_before_external_group_by: 5000000000
  defaults:
    templates:
      podTemplate: clickhouse-pod
      dataVolumeClaimTemplate: data-volume
  templates:
    podTemplates:
      - name: clickhouse-pod
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:24.3
              resources:
                requests:
                  cpu: "2"
                  memory: "16Gi"
                limits:
                  memory: "16Gi"
    volumeClaimTemplates:
      - name: data-volume
        spec:
          storageClassName: gp3
          accessModes: [ReadWriteOnce]
          resources:
            requests:
              storage: 500Gi

Sizing note: budget ~20 GB ClickHouse storage per million observations as a starting heuristic. A team at 50M observations/month with 30-day retention holds ~50M observations at steady state, so ~1 TB of raw data - call it ~2 TB with 2× replication, spread across the 3 shards. Add headroom on top of that for merges and spikes.
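The heuristic is easy to keep honest with a small calculator. The 20 GB/million figure and 2× replication factor are the assumptions carried from this section, not official Langfuse guidance:

```python
def clickhouse_storage_gb(events_per_month: float,
                          retention_days: int = 30,
                          gb_per_million: float = 20,
                          replication: int = 2) -> float:
    """Rough ClickHouse capacity estimate for Langfuse trace storage."""
    # Events held at steady state under the retention window
    retained_events = events_per_month * retention_days / 30
    return retained_events / 1e6 * gb_per_million * replication

print(clickhouse_storage_gb(50e6))  # 50M obs/month -> 2000.0 GB (~2 TB)
```

Re-run it with your own measured GB-per-million once you have a month of real data; payload sizes vary wildly between chat and agent workloads.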

Langfuse will create its own tables on first connect, but you need to pre-create the langfuse database:

kubectl exec -n clickhouse chi-langfuse-langfuse-cluster-0-0-0 -- \
  clickhouse-client --user langfuse --password "$CH_PASSWORD" \
  --query "CREATE DATABASE IF NOT EXISTS langfuse ON CLUSTER langfuse-cluster"
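If the worker's migrations then fail with permission errors, the langfuse user likely lacks DDL rights. A hedged grant, run as an admin user - the exact privilege list may need tuning for your ClickHouse version and operator-managed user setup:

```shell
# Grant the DDL/DML rights the Langfuse migrations need on its database
kubectl exec -n clickhouse chi-langfuse-langfuse-cluster-0-0-0 -- \
  clickhouse-client --user default --password "$CH_ADMIN_PASSWORD" --query \
  "GRANT CREATE TABLE, CREATE VIEW, ALTER, INSERT, SELECT, DROP TABLE \
   ON langfuse.* TO langfuse"
```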

Network isolation

By default, anything in the cluster can hit ClickHouse and Postgres. Lock it down with a default-deny NetworkPolicy plus explicit allows:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-langfuse-to-clickhouse
  namespace: clickhouse
spec:
  podSelector:
    matchLabels:
      clickhouse.altinity.com/chi: langfuse
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: langfuse
          podSelector:
            matchLabels:
              app.kubernetes.io/name: langfuse
      ports:
        - protocol: TCP
          port: 8123
        - protocol: TCP
          port: 9000
  policyTypes: [Ingress]

Repeat for Postgres (port 5432) and Redis (6379). This is the first thing auditors ask about in a NESA or CBUAE review.
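The allow rule only bites once a default-deny baseline exists in each data namespace. The minimal version:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: clickhouse
spec:
  podSelector: {}          # selects every pod in the namespace
  policyTypes: [Ingress]   # no ingress rules listed = all inbound denied
```

Apply the same policy in the namespaces holding Postgres and Redis, then layer the explicit allows on top.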

Observability hooks

Langfuse web and worker expose Prometheus metrics on :3000/api/metrics and :3030/metrics. With the serviceMonitor.enabled: true flag above, Prometheus Operator scrapes them automatically. The dashboards worth building from day one:

  • Ingestion lag: redis_list_length{list="bull:ingestion:wait"} vs. bull:ingestion:active. Alert when lag exceeds 5 minutes.
  • ClickHouse insert rate: rate(clickhouse_InsertQuery_total[5m]). Sudden drops mean the worker is stuck.
  • Worker error rate: rate(langfuse_worker_jobs_failed_total[5m]) / rate(langfuse_worker_jobs_total[5m]). Page at 5% sustained.
  • Postgres connection pool saturation: pg_stat_activity_count / pg_settings_max_connections. Langfuse defaults to a per-pod pool of 10; under-provisioning this causes mysterious 502s.
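The ingestion-lag alert from that list can be codified as a PrometheusRule. The metric name assumes a Redis exporter exposing list lengths - treat the expression as a placeholder to adapt to your exporter:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: langfuse-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: langfuse
      rules:
        - alert: LangfuseIngestionBacklog
          # Metric name depends on your Redis exporter configuration
          expr: redis_list_length{list="bull:ingestion:wait"} > 5000
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Langfuse ingestion queue backing up"
```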

Ship Langfuse’s own application logs to your stack via OpenTelemetry:

additionalEnv:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector.observability.svc.cluster.local:4318"
  - name: OTEL_SERVICE_NAME
    value: "langfuse-web"

This closes the loop: you have an LLM observability platform that is itself observable.

Sizing tiers we recommend

| Tier | Events/mo | Web | Worker | Postgres | ClickHouse | Est. monthly cost (AED, EKS me-central-1) |
| --- | --- | --- | --- | --- | --- | --- |
| Small | <10M | 2 × 1 vCPU | 2 × 1 vCPU | 3 × db.t3.medium | 3 × r6i.large, 500 GB | ~12,000 |
| Medium | 10-100M | 4 × 2 vCPU | 5 × 2 vCPU | 3 × db.r6g.xlarge | 6 × r6i.xlarge, 4 TB | ~45,000 |
| Large | 100M-1B | 8 × 2 vCPU | 10-20 × 2 vCPU | 3 × db.r6g.2xlarge | 9 × r6i.4xlarge, 30 TB | ~185,000 |

Those are steady-state numbers. Add 20-30% buffer for spike capacity, CI/load-test traffic, and dev/staging environments.

GCC data sovereignty checklist

For UAE clients subject to NESA, CBUAE, or ADGM requirements:

  • All compute (EKS / AKS / Core42) in me-central-1, me-south-1, or in-country sovereign zone
  • S3 bucket in the same region, with bucket-level PublicAccessBlock and SSE-KMS with a customer-managed key
  • Postgres, ClickHouse, Redis all in-region; no cross-region replicas to EU/US
  • OIDC federated to the client’s Entra ID or internal IdP - no local usernames
  • ClickHouse s3() table functions disabled to prevent data exfiltration via SELECT
  • Audit log shipped to the client’s SIEM (Splunk, Sentinel, Chronicle)
  • DPA in place with whoever operates the cluster (the client, a sovereign MSP, or NomadX)

Common failure modes we’ve debugged

  • “ClickHouse tables missing” errors at boot - the worker runs migrations but needs write access to create tables. Grant CREATE, ALTER, INSERT, SELECT on the langfuse database.
  • Workers stuck in CrashLoopBackOff with MaxRetriesPerRequestError - Redis AUTH password has special characters. Base64-encode via the secret, don’t inline.
  • Ingestion latency spikes at hour boundaries - a batch eval job is saturating the queue. Raise the KEDA maxReplicaCount or give evals a separate worker pool.
  • UI pages return 500 after deploy but API works - NextAuth secret mismatch between replicas. Ensure nextauth.secret is set via the Helm values, not left to random generation per pod.
  • ClickHouse OOM-killed under load - max_memory_usage default is too high relative to pod limits. Set it to 60-70% of the pod memory limit, not the node’s.

What to deploy next in the same cluster

Langfuse is the observability layer. The stack is most useful when it sits next to:

  • A vector database (Qdrant or Milvus) for your RAG retrieval path - see our upcoming Qdrant-on-K8s guide
  • An LLM gateway (LiteLLM) to centralize API keys and rate limits feeding Langfuse traces
  • A feature store (Feast) if you’re evaluating model quality against ground-truth business data

Each of these gets its own production deployment guide in this series.

Getting help

We run Langfuse in production for AI-native teams across the GCC, including ones subject to NESA and CBUAE data sovereignty. If you want a second set of eyes on your deployment topology, a migration from Langfuse Cloud to self-hosted, or a security review before you go live, our AI/ML Infrastructure on K8s engagement covers exactly this. Typical scope is 2-3 weeks from kickoff to production cutover.

Frequently Asked Questions

Can Langfuse run on Kubernetes in production?

Yes. Langfuse publishes an official Helm chart (langfuse/langfuse-k8s) that deploys the web and worker tiers, but the chart assumes you bring your own Postgres, ClickHouse, Redis, and S3-compatible object storage. Production deployments should run each of those as HA services (CloudNativePG, Altinity ClickHouse Operator, Redis with Sentinel or a managed service) rather than using the in-chart subcharts, which are single-replica and meant only for smoke testing.

What are the minimum resources to run Langfuse on Kubernetes?

A realistic small-tier production footprint is roughly 8 vCPU and 16 GB RAM across the Langfuse web and worker deployments, plus a 3-node ClickHouse cluster (4 vCPU / 16 GB / 200 GB SSD per node), HA Postgres (2 vCPU / 8 GB / 100 GB SSD), and a Redis pair (1 vCPU / 2 GB each). That sizing handles roughly 10–20 million trace events per month. Scale ClickHouse storage linearly with event volume and worker replicas with queue depth.

Why is ClickHouse required for Langfuse?

Langfuse v3 moved high-volume trace, observation, and score storage from Postgres to ClickHouse because Postgres cannot handle the append-heavy, wide-row, analytical query patterns at scale. Postgres still holds metadata (users, projects, prompts, API keys). If you skip ClickHouse or run it single-node, trace ingestion and dashboard queries will become the bottleneck above roughly 100,000 events per day.

How do I scale Langfuse workers based on queue depth?

Langfuse workers consume BullMQ queues backed by Redis. Use KEDA with the redis-streams or redis scaler to watch queue length and scale worker replicas between a min and max. A typical policy: scale up when queue depth exceeds 1,000 messages, scale down when depth is below 100 for five minutes, cap at 20 replicas for small tiers. Plain HPA on CPU is insufficient because ingestion spikes are I/O-bound, not CPU-bound.

How do I back up a self-hosted Langfuse deployment?

Back up each stateful component separately: (1) Postgres via pg_dump on a schedule or streaming logical replication to a disaster-recovery cluster; (2) ClickHouse via the native BACKUP command to S3 (Altinity operator automates this); (3) S3 media bucket via cross-region replication or lifecycle rules; (4) Kubernetes manifests via Velero. Redis does not need backups - it only holds ephemeral queue state. Test restore quarterly with a real recovery drill, not just a dry run.

Is Langfuse compliant with UAE data sovereignty requirements?

Langfuse is open-source and self-hostable, so compliance depends on where you run it. For UAE data residency (NESA, CBUAE, ADGM requirements), deploy into an in-region Kubernetes cluster (Azure UAE North, AWS Middle East Bahrain, or a sovereign cloud like Core42) and pin Postgres, ClickHouse, Redis, and the S3 bucket to the same region. The Langfuse SaaS offering stores data in the EU or US, which is unsuitable for regulated GCC workloads.

Get Started for Free

We would be happy to speak with you and arrange a free consultation with our Kubernetes Expert in Dubai, UAE. 30-minute call, actionable results in days.

Talk to an Expert