Deploy Langfuse on Kubernetes: Production Self-Hosted Guide (2026)
Self-host Langfuse v3 on Kubernetes in production: reference architecture, Helm values, Postgres + ClickHouse + Redis HA setup, S3 media storage, OIDC auth, backups, and observability hooks for GCC data-sovereign deployments.
Langfuse became the default open-source choice for LLM observability in 2025 after overtaking Helicone and Phoenix in both GitHub stars and production adoption. The managed cloud offering is fine for side projects, but enterprise teams that run regulated workloads or want full control over their trace data need the self-hosted version - and that means running Langfuse on Kubernetes.
This guide covers the production deployment we recommend to clients: a split-tier Helm install, HA dependencies, OIDC authentication, KEDA-driven worker scaling, and the specific GCC data-sovereignty patterns we’ve used with UAE clients.
Why self-host Langfuse at all
Three reasons teams move off Langfuse Cloud:
- Data sovereignty. Every prompt, completion, tool call, and user identifier that flows through Langfuse is training data in someone’s eyes. Regulated industries (fintech, healthcare, government) cannot ship raw LLM traces to EU or US regions.
- Trace volume cost. Langfuse Cloud pricing scales linearly with ingested observations. A chatty agent platform at 50M events/month typically ends up paying roughly 3-5x more on the cloud plan than the equivalent self-hosted infrastructure costs to run.
- Custom integrations. Self-hosted Langfuse lets you join trace data with internal billing, feature-flag, and incident systems via direct database access - something the SaaS API cannot match.
The trade-off: you now operate four stateful services. That’s the part most teams underestimate.
Langfuse v3 architecture refresher
Langfuse v3 (released September 2024) split the original single-container design into an async, event-driven topology:
┌──────────────┐
OTLP/HTTP ───▶│ Langfuse Web │◀─── Browser UI
Traces │ (Next.js) │
└──────┬───────┘
│ enqueue
▼
┌──────────────┐
│ Redis │◀─── Rate limits, cache
│ (BullMQ) │
└──────┬───────┘
│ consume
▼
┌──────────────┐
│ Worker │
│ (Node.js) │
└──┬────────┬──┘
│ │
┌───────────────────┘ └───────────────────┐
▼ ▼
┌──────────────┐ ┌──────────────┐
│ Postgres │ Metadata, users, │ ClickHouse │ Traces, observations,
│ │ projects, prompts, keys │ │ scores, sessions
└──────────────┘ └──────┬───────┘
│ cold data
▼
┌──────────────┐
│ S3 / MinIO │ Media, attachments
└──────────────┘
Key invariants:
- Web tier is stateless. Scale it for UI and ingestion HTTP traffic.
- Worker tier is stateless but I/O-bound. Scale it on queue depth, not CPU.
- Postgres is the control plane. Small footprint, but it must be HA.
- ClickHouse is the data plane. Grows linearly with event volume; needs ZooKeeper or ClickHouse Keeper for replication.
- Redis holds ephemeral queue state. Needs HA for availability, not durability.
- Object storage holds large payloads (images, PDFs, long completions). Lifecycle-policy it.
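The last point translates into a concrete bucket rule. A minimal sketch using the AWS CLI - the bucket name, the media/ prefix, and the retention windows are assumptions to adapt to your own policy and to how your Langfuse version actually prefixes uploads:

# Transition media objects to infrequent access after 30 days, expire after 90
# (bucket name and media/ prefix are illustrative, not chart defaults)
aws s3api put-bucket-lifecycle-configuration \
  --bucket langfuse-prod-me-central-1 \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-langfuse-media",
      "Status": "Enabled",
      "Filter": { "Prefix": "media/" },
      "Transitions": [{ "Days": 30, "StorageClass": "STANDARD_IA" }],
      "Expiration": { "Days": 90 }
    }]
  }'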
Reference architecture for production
Here’s the topology we deploy for clients on EKS, AKS, and sovereign K8s platforms like Core42:
| Component | Runtime | Replicas | Notes |
|---|---|---|---|
| langfuse-web | Deployment | 3 | Behind HPA (CPU 60%), min 2 |
| langfuse-worker | Deployment | 2-20 | Scaled by KEDA on Redis queue depth |
| Postgres | CloudNativePG cluster or RDS/Aurora | 3 (primary + 2 replicas) | Outside the chart |
| ClickHouse | Altinity Operator cluster | 3 shards × 2 replicas | With ClickHouse Keeper, not bundled |
| Redis | Operator (OT Redis) or ElastiCache | Primary + 1 replica, Sentinel | |
| Object storage | S3 / Azure Blob / MinIO tenant | Managed | SSE + lifecycle rules |
| Ingress | ingress-nginx | 2+ | cert-manager for Let’s Encrypt |
| Auth | OIDC via Entra ID / Okta / Auth0 | External | No local accounts in production |
Do not use the Postgres, ClickHouse, or Redis subcharts shipped with langfuse-k8s. They are single-replica and assume emptyDir volumes. They exist to let you smoke-test the chart, not to run production.
Prerequisites
Before starting the deploy, have these ready:
# Required tooling
kubectl version --client # 1.28+
helm version # 3.14+
Cluster add-ons we assume are installed:
- cert-manager for TLS
- ingress-nginx or Gateway API-compatible controller
- external-secrets-operator for secret sync from AWS Secrets Manager / Azure Key Vault
- metrics-server (for HPA)
- KEDA (for worker queue-depth scaling)
- prometheus-operator (for monitoring)
And four external dependencies provisioned:
- Postgres 15+ with a dedicated langfuse database
- ClickHouse 24.3+ cluster with a dedicated langfuse database
- Redis 7+ (single instance or Sentinel cluster)
- S3-compatible bucket with a dedicated IAM principal
Helm install: production values
Add the repo and create the namespace:
helm repo add langfuse https://langfuse.github.io/langfuse-k8s
helm repo update
kubectl create namespace langfuse
kubectl label namespace langfuse \
pod-security.kubernetes.io/enforce=restricted \
pod-security.kubernetes.io/audit=restricted
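The values below reference several pre-created secrets (langfuse-secrets, langfuse-pg-creds, langfuse-ch-creds, langfuse-redis-creds, langfuse-s3-creds, langfuse-oidc). With external-secrets-operator from the prerequisites list, one way to materialize them is an ExternalSecret per secret. A sketch for the core application secrets - the ClusterSecretStore name and the Secrets Manager path are placeholders for whatever your store actually uses:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: langfuse-secrets
  namespace: langfuse
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager      # assumed store name
  target:
    name: langfuse-secrets         # the secret referenced in values.prod.yaml
  data:
    - secretKey: nextauth-secret
      remoteRef:
        key: langfuse/prod/core    # assumed Secrets Manager path
        property: nextauth-secret
    - secretKey: salt
      remoteRef:
        key: langfuse/prod/core
        property: salt
    - secretKey: encryption-key
      remoteRef:
        key: langfuse/prod/core
        property: encryption-key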
The production values.yaml we use as a baseline:
# values.prod.yaml
langfuse:
image:
tag: "3.54.0" # pin version, never use :latest
nextauth:
url: "https://langfuse.example.ae"
secret:
existingSecret:
name: langfuse-secrets
key: nextauth-secret
salt:
existingSecret:
name: langfuse-secrets
key: salt
encryptionKey:
existingSecret:
name: langfuse-secrets
key: encryption-key
web:
replicas: 3
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
memory: "2Gi"
podDisruptionBudget:
enabled: true
minAvailable: 2
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 10
targetCPUUtilizationPercentage: 60
worker:
replicas: 2
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
memory: "4Gi"
podDisruptionBudget:
enabled: true
minAvailable: 1
# KEDA ScaledObject defined separately (see below)
additionalEnv:
- name: TELEMETRY_ENABLED
value: "false"
- name: LANGFUSE_ENABLE_EXPERIMENTAL_FEATURES
value: "false"
- name: AUTH_DISABLE_USERNAME_PASSWORD
value: "true" # force OIDC in prod
- name: AUTH_AZURE_AD_CLIENT_ID
valueFrom:
secretKeyRef:
name: langfuse-oidc
key: client-id
- name: AUTH_AZURE_AD_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: langfuse-oidc
key: client-secret
- name: AUTH_AZURE_AD_TENANT_ID
valueFrom:
secretKeyRef:
name: langfuse-oidc
key: tenant-id
# Disable bundled subcharts - we bring our own
postgresql:
deploy: false
host: "langfuse-pg-rw.data.svc.cluster.local"
port: 5432
auth:
username: langfuse
existingSecret: langfuse-pg-creds
secretKeys:
userPasswordKey: password
database: langfuse
clickhouse:
deploy: false
host: "chi-langfuse-cluster-0-0.clickhouse.svc.cluster.local"
httpPort: 8123
nativePort: 9000
user: langfuse
auth:
existingSecret: langfuse-ch-creds
secretKey: password
database: langfuse
cluster: langfuse-cluster
redis:
deploy: false
host: "langfuse-redis-master.data.svc.cluster.local"
port: 6379
auth:
existingSecret: langfuse-redis-creds
existingSecretPasswordKey: password
s3:
deploy: false
bucket: "langfuse-prod-me-central-1"
region: "me-central-1"
endpoint: "https://s3.me-central-1.amazonaws.com"
forcePathStyle: false
accessKeyId:
secretKeyRef:
name: langfuse-s3-creds
key: access-key-id
secretAccessKey:
secretKeyRef:
name: langfuse-s3-creds
key: secret-access-key
ingress:
enabled: true
className: nginx
annotations:
cert-manager.io/cluster-issuer: "letsencrypt-prod"
nginx.ingress.kubernetes.io/proxy-body-size: "50m"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
hosts:
- host: langfuse.example.ae
paths:
- path: /
pathType: Prefix
tls:
- secretName: langfuse-tls
hosts:
- langfuse.example.ae
serviceMonitor:
enabled: true
namespace: monitoring
labels:
release: kube-prometheus-stack
Apply it:
helm upgrade --install langfuse langfuse/langfuse \
--namespace langfuse \
--values values.prod.yaml \
--version 1.2.0 \
--wait --timeout 15m
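Before pointing traffic at it, confirm both tiers rolled out cleanly. The deployment names below assume the chart defaults with the release name langfuse - check kubectl get deploy -n langfuse if yours differ:

kubectl -n langfuse rollout status deployment/langfuse-web --timeout=5m
kubectl -n langfuse rollout status deployment/langfuse-worker --timeout=5m
kubectl -n langfuse get pods -o wide

# Health endpoint via the ingress host configured above
curl -sf https://langfuse.example.ae/api/public/health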
KEDA autoscaling for the worker tier
Langfuse workers consume BullMQ queues. The job mix is bursty: a batch eval run can dump 100k events in seconds. Plain CPU HPA lags badly. Scale on queue depth instead:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: langfuse-redis-auth
namespace: langfuse
spec:
secretTargetRef:
- parameter: password
name: langfuse-redis-creds
key: password
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: langfuse-worker
namespace: langfuse
spec:
scaleTargetRef:
name: langfuse-worker
minReplicaCount: 2
maxReplicaCount: 20
pollingInterval: 15
cooldownPeriod: 300
triggers:
- type: redis
metadata:
address: langfuse-redis-master.data.svc.cluster.local:6379
listName: "bull:ingestion:wait"
listLength: "1000"
enableTLS: "false"
authenticationRef:
name: langfuse-redis-auth
- type: redis
metadata:
address: langfuse-redis-master.data.svc.cluster.local:6379
listName: "bull:evaluation:wait"
listLength: "200"
authenticationRef:
name: langfuse-redis-auth
The two triggers handle different workload shapes: ingestion tolerates up to 1,000 queued messages before scaling (high throughput, cheap per message), evaluation scales at 200 (low volume but each job runs an LLM evaluation and is expensive to let wait).
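To sanity-check those thresholds against reality, watch the queue depths directly while a batch eval or load test runs. A sketch - the Redis pod name follows a common operator convention and the BullMQ key names mirror the ScaledObject above, so verify both (for the keys, KEYS 'bull:*') before relying on them:

# KEDA's view of the worker scaling
kubectl -n langfuse get scaledobject langfuse-worker
kubectl -n langfuse get hpa

# Raw queue depths BullMQ maintains in Redis
kubectl -n data exec -it langfuse-redis-master-0 -- \
  redis-cli -a "$REDIS_PASSWORD" LLEN bull:ingestion:wait
kubectl -n data exec -it langfuse-redis-master-0 -- \
  redis-cli -a "$REDIS_PASSWORD" LLEN bull:evaluation:wait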
Postgres with CloudNativePG
If you’re running Postgres in-cluster, CloudNativePG is the right operator. Minimal production cluster:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: langfuse-pg
namespace: data
spec:
instances: 3
imageName: ghcr.io/cloudnative-pg/postgresql:15.6
primaryUpdateStrategy: unsupervised
storage:
size: 100Gi
storageClass: gp3
resources:
requests:
cpu: "1"
memory: "4Gi"
limits:
memory: "8Gi"
bootstrap:
initdb:
database: langfuse
owner: langfuse
secret:
name: langfuse-pg-creds
backup:
barmanObjectStore:
destinationPath: s3://langfuse-backups/pg
s3Credentials:
accessKeyId:
name: pg-backup-creds
key: access-key-id
secretAccessKey:
name: pg-backup-creds
key: secret-access-key
wal:
retention: "7d"
data:
retention: "30d"
monitoring:
enablePodMonitor: true
That gives you continuous WAL archiving to S3 for point-in-time recovery and automatic failover on primary loss. Two caveats: check the retention fields against the CloudNativePG version you deploy (recent releases expect spec.backup.retentionPolicy rather than per-WAL/data retention keys), and the scheduled base backups themselves come from a separate resource, shown next.
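A minimal ScheduledBackup to pair with the Cluster - nightly at 02:00, using CNPG's six-field cron format (the leading field is seconds):

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: langfuse-pg-nightly
  namespace: data
spec:
  schedule: "0 0 2 * * *"   # seconds minutes hours day-of-month month day-of-week
  backupOwnerReference: self
  cluster:
    name: langfuse-pg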
ClickHouse with the Altinity operator
ClickHouse is the component that actually breaks if you get it wrong. Use the Altinity operator with ClickHouse Keeper (drop ZooKeeper - it’s deprecated for new deployments):
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
name: langfuse
namespace: clickhouse
spec:
configuration:
clusters:
- name: langfuse-cluster
layout:
shardsCount: 3
replicasCount: 2
users:
langfuse/password:
valueFrom:
secretKeyRef:
name: langfuse-ch-creds
key: password
langfuse/networks/ip:
- "10.0.0.0/8"
profiles:
default/max_memory_usage: 10000000000
default/max_bytes_before_external_group_by: 5000000000
defaults:
templates:
podTemplate: clickhouse-pod
dataVolumeClaimTemplate: data-volume
templates:
podTemplates:
- name: clickhouse-pod
spec:
containers:
- name: clickhouse
image: clickhouse/clickhouse-server:24.3
resources:
requests:
cpu: "2"
memory: "16Gi"
limits:
memory: "16Gi"
volumeClaimTemplates:
- name: data-volume
spec:
storageClassName: gp3
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 500Gi
Sizing note: budget ~20 GB of ClickHouse storage per million observations as a starting heuristic. A team at 50M observations/month with 30-day retention therefore needs roughly 1 TB of raw data - about 2 TB once the second replica is counted - spread across the 3 shards, plus headroom for merges and spikes.
Langfuse will create its own tables on first connect, but you need to pre-create the langfuse database:
kubectl exec -n clickhouse chi-langfuse-langfuse-cluster-0-0-0 -- \
clickhouse-client --user langfuse --password "$CH_PASSWORD" \
--query "CREATE DATABASE IF NOT EXISTS langfuse ON CLUSTER langfuse-cluster"
Network isolation
By default, anything in the cluster can hit ClickHouse and Postgres. Lock it down with a default-deny NetworkPolicy plus explicit allows:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-langfuse-to-clickhouse
namespace: clickhouse
spec:
podSelector:
matchLabels:
clickhouse.altinity.com/chi: langfuse
ingress:
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: langfuse
podSelector:
matchLabels:
app.kubernetes.io/name: langfuse
ports:
- protocol: TCP
port: 8123
- protocol: TCP
port: 9000
policyTypes: [Ingress]
Repeat for Postgres (port 5432) and Redis (6379). This is the first thing auditors ask about in a NESA or CBUAE review.
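The allow rules only bite if a default-deny sits underneath them. A minimal sketch of the deny-all ingress policy to pair with each allow, shown here for the clickhouse namespace - apply the equivalent in the data namespace as well:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: clickhouse
spec:
  podSelector: {}          # selects every pod in the namespace
  policyTypes: [Ingress]   # no ingress rules listed, so all inbound traffic is denied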
Observability hooks
Langfuse web and worker expose Prometheus metrics on :3000/api/metrics and :3030/metrics. With the serviceMonitor.enabled: true flag above, Prometheus Operator scrapes them automatically. The dashboards worth building from day one:
- Ingestion lag: redis_list_length{list="bull:ingestion:wait"} vs. bull:ingestion:active. Alert when lag exceeds 5 minutes (alert rule sketched after this list).
- ClickHouse insert rate: rate(clickhouse_InsertQuery_total[5m]). Sudden drops mean the worker is stuck.
- Worker error rate: rate(langfuse_worker_jobs_failed_total[5m]) / rate(langfuse_worker_jobs_total[5m]). Page at 5% sustained.
- Postgres connection pool saturation: pg_stat_activity_count / pg_settings_max_connections. Langfuse defaults to a per-pod pool of 10; under-provisioning this causes mysterious 502s.
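A PrometheusRule for the first of those, as a starting point - the redis_list_length series assumes a redis-exporter-style exporter is watching the queue keys, so swap in whatever metric your exporter actually emits:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: langfuse-ingestion-lag
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: langfuse.ingestion
      rules:
        - alert: LangfuseIngestionBacklog
          expr: redis_list_length{list="bull:ingestion:wait"} > 5000
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Langfuse ingestion queue has been backed up for 5 minutes"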
Ship Langfuse’s own application logs to your stack via OpenTelemetry:
additionalEnv:
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://otel-collector.observability.svc.cluster.local:4318"
- name: OTEL_SERVICE_NAME
value: "langfuse-web"
This closes the loop: you have an LLM observability platform that is itself observable.
Sizing tiers we recommend
| Tier | Events/mo | Web | Worker | Postgres | ClickHouse | Est. monthly cost (AED, EKS me-central-1) |
|---|---|---|---|---|---|---|
| Small | <10M | 2 × 1 vCPU | 2 × 1 vCPU | 3 × db.t3.medium | 3 × r6i.large, 500 GB | ~12,000 |
| Medium | 10-100M | 4 × 2 vCPU | 5 × 2 vCPU | 3 × db.r6g.xlarge | 6 × r6i.xlarge, 4 TB | ~45,000 |
| Large | 100M-1B | 8 × 2 vCPU | 10-20 × 2 vCPU | 3 × db.r6g.2xlarge | 9 × r6i.4xlarge, 30 TB | ~185,000 |
Those are steady-state numbers. Add 20-30% buffer for spike capacity, CI/load-test traffic, and dev/staging environments.
GCC data sovereignty checklist
For UAE clients subject to NESA, CBUAE, or ADGM requirements:
- All compute (EKS / AKS / Core42) in me-central-1, me-south-1, or an in-country sovereign zone
- S3 bucket in the same region, with bucket-level PublicAccessBlock and SSE-KMS with a customer-managed key (example below)
- Postgres, ClickHouse, Redis all in-region; no cross-region replicas to EU/US
- OIDC federated to the client’s Entra ID or internal IdP - no local usernames
- ClickHouse s3() table functions disabled to prevent data exfiltration via SELECT
- Audit log shipped to the client’s SIEM (Splunk, Sentinel, Chronicle)
- DPA in place with whoever operates the cluster (the client, a sovereign MSP, or NomadX)
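The S3 item translates into two AWS CLI calls. A sketch - the bucket name, account ID, and KMS key ARN are placeholders:

aws s3api put-public-access-block \
  --bucket langfuse-prod-me-central-1 \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

aws s3api put-bucket-encryption \
  --bucket langfuse-prod-me-central-1 \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "arn:aws:kms:me-central-1:111122223333:key/REPLACE-ME"
      },
      "BucketKeyEnabled": true
    }]
  }'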
Common failure modes we’ve debugged
- “ClickHouse tables missing” errors at boot - the worker runs migrations but needs write access to create tables. Grant CREATE, ALTER, INSERT, SELECT on the langfuse database.
- Workers stuck in CrashLoopBackOff with MaxRetriesPerRequestError - the Redis AUTH password has special characters. Base64-encode via the secret, don’t inline.
- Ingestion latency spikes at hour boundaries - a batch eval job is saturating the queue. Raise the KEDA maxReplicaCount or give evals a separate worker pool.
- UI pages return 500 after deploy but the API works - NextAuth secret mismatch between replicas. Ensure nextauth.secret is set via the Helm values, not left to random generation per pod (see the secret-generation sketch after this list).
- ClickHouse OOM-killed under load - the max_memory_usage default is too high relative to pod limits. Set it to 60-70% of the pod memory limit, not the node’s.
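On the NextAuth point: generate the three core secrets once, keep them in your secret manager, and let every replica consume the same langfuse-secrets (as in the values above). A sketch - the 64-hex-character encryption key matches what the Langfuse docs require, the key names are the ones used in values.prod.yaml:

# Generate once. In production, push these into Secrets Manager / Key Vault and let
# external-secrets-operator sync them (see the ExternalSecret sketch earlier);
# the direct kubectl command below is for non-production environments.
NEXTAUTH_SECRET=$(openssl rand -base64 32)
SALT=$(openssl rand -base64 32)
ENCRYPTION_KEY=$(openssl rand -hex 32)   # Langfuse expects 256 bits as 64 hex chars

kubectl -n langfuse create secret generic langfuse-secrets \
  --from-literal=nextauth-secret="$NEXTAUTH_SECRET" \
  --from-literal=salt="$SALT" \
  --from-literal=encryption-key="$ENCRYPTION_KEY"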
What to deploy next in the same cluster
Langfuse is the observability layer. The stack is most useful when it sits next to:
- A vector database (Qdrant or Milvus) for your RAG retrieval path - see our upcoming Qdrant-on-K8s guide
- An LLM gateway (LiteLLM) to centralize API keys and rate limits feeding Langfuse traces
- A feature store (Feast) if you’re evaluating model quality against ground-truth business data
Each of these gets its own production deployment guide in this series.
Getting help
We run Langfuse in production for AI-native teams across the GCC, including ones subject to NESA and CBUAE data sovereignty. If you want a second set of eyes on your deployment topology, a migration from Langfuse Cloud to self-hosted, or a security review before you go live, our AI/ML Infrastructure on K8s engagement covers exactly this. Typical scope is 2-3 weeks from kickoff to production cutover.
Frequently Asked Questions
Can Langfuse run on Kubernetes in production?
Yes. Langfuse publishes an official Helm chart (langfuse/langfuse-k8s) that deploys the web and worker tiers, but the chart assumes you bring your own Postgres, ClickHouse, Redis, and S3-compatible object storage. Production deployments should run each of those as HA services (CloudNativePG, Altinity ClickHouse Operator, Redis with Sentinel or a managed service) rather than using the in-chart subcharts, which are single-replica and meant only for smoke testing.
What are the minimum resources to run Langfuse on Kubernetes?
A realistic small-tier production footprint is roughly 8 vCPU and 16 GB RAM across the Langfuse web and worker deployments, plus a 3-node ClickHouse cluster (4 vCPU / 16 GB / 200 GB SSD per node), HA Postgres (2 vCPU / 8 GB / 100 GB SSD), and a Redis pair (1 vCPU / 2 GB each). That sizing handles roughly 10–20 million trace events per month. Scale ClickHouse storage linearly with event volume and worker replicas with queue depth.
Why is ClickHouse required for Langfuse?
Langfuse v3 moved high-volume trace, observation, and score storage from Postgres to ClickHouse because Postgres cannot handle the append-heavy, wide-row, analytical query patterns at scale. Postgres still holds metadata (users, projects, prompts, API keys). If you skip ClickHouse or run it single-node, trace ingestion and dashboard queries will become the bottleneck above roughly 100,000 events per day.
How do I scale Langfuse workers based on queue depth?
Langfuse workers consume BullMQ queues backed by Redis. Use KEDA with the redis-streams or redis scaler to watch queue length and scale worker replicas between a min and max. A typical policy: scale up when queue depth exceeds 1,000 messages, scale down when depth is below 100 for five minutes, cap at 20 replicas for small tiers. Plain HPA on CPU is insufficient because ingestion spikes are I/O-bound, not CPU-bound.
How do I back up a self-hosted Langfuse deployment?
Back up each stateful component separately: (1) Postgres via pg_dump on a schedule or streaming logical replication to a disaster-recovery cluster; (2) ClickHouse via the native BACKUP command to S3 (Altinity operator automates this); (3) S3 media bucket via cross-region replication or lifecycle rules; (4) Kubernetes manifests via Velero. Redis does not need backups - it only holds ephemeral queue state. Test restore quarterly with a real recovery drill, not just a dry run.
Is Langfuse compliant with UAE data sovereignty requirements?
Langfuse is open-source and self-hostable, so compliance depends on where you run it. For UAE data residency (NESA, CBUAE, ADGM requirements), deploy into an in-region Kubernetes cluster (Azure UAE North, AWS me-central-1 in the UAE, or a sovereign cloud like Core42) and pin Postgres, ClickHouse, Redis, and the S3 bucket to the same region. The Langfuse SaaS offering stores data in the EU or US, which is unsuitable for regulated GCC workloads.
Get Started for Free
We would be happy to speak with you and arrange a free consultation with our Kubernetes Expert in Dubai, UAE. 30-minute call, actionable results in days.
Talk to an Expert