Deploy Dify on Kubernetes: Self-Hosted AI Application Platform Guide (2026)
Self-host Dify on Kubernetes in production: API, worker, web, and sandbox components, Postgres and Weaviate dependencies, secure sandbox isolation with gVisor, SSRF hardening, SSO, and GCC-sovereign deployment for agent and RAG workloads.
Dify has become the go-to self-hosted AI application platform for teams that want LangChain-level flexibility with a product on top. It’s adopted heavily in China and Southeast Asia, and increasingly in the Middle East for teams building Arabic-first AI products. This guide covers deploying Dify on Kubernetes the way we deploy it for clients - with every component replaced by a production-grade alternative and the sandbox tier locked down.
Architecture
Dify in production is a cluster of stateless services plus heavy stateful dependencies:
                ┌──────────────┐
Browser UI ───▶ │   Dify Web   │  Next.js frontend
                │ (stateless)  │
                └──────┬───────┘
                       │ REST
                       ▼
                ┌──────────────┐
                │   Dify API   │  Main application
                │   (Flask)    │
                └──────┬───────┘
           async tasks │ sync queries
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
  ┌──────────┐   ┌──────────┐   ┌──────────┐
  │  Worker  │   │ Sandbox  │   │  Plugin  │
  │ (Celery) │   │(isolated │   │  Daemon  │
  │          │   │ runtime) │   │          │
  └────┬─────┘   └─────┬────┘   └─────┬────┘
       │               │              │
       ▼               ▼              ▼
┌─────────────────────────────────────────────────┐
│                SSRF Proxy (Squid)                │
│  Filters outbound calls from user workflows      │
└──────────────────────┬──────────────────────────┘
                       ▼
             External provider APIs
                  (via LiteLLM)
    ┌──────────┬──────────┬──────────┬──────────┐
    ▼          ▼          ▼          ▼          ▼
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│Postgres│ │ Redis  │ │Weaviate│ │S3/Blob │ │ Plugin │
│ (app)  │ │(queue, │ │(vectors│ │(files) │ │Storage │
│        │ │ cache) │ │by dflt)│ │        │ │ (RWX)  │
└────────┘ └────────┘ └────────┘ └────────┘ └────────┘
Invariants to internalize:
- Web, API, Worker are stateless - scale horizontally.
- Sandbox is isolated - runs user code; needs its own node pool and runtime class.
- Plugin daemon requires RWX storage for installed plugin artifacts (EFS, Azure Files, or a filestore-class PVC).
- SSRF proxy is the only egress path from sandbox and worker to external URLs fetched by user code.
- Postgres, Redis, Weaviate are first-class dependencies - not in-chart subcharts.
Prerequisites
kubectl version --client # 1.28+
helm version # 3.14+
Cluster add-ons:
- cert-manager, ingress-nginx, external-secrets-operator, prometheus-operator
- A RuntimeClass for the sandbox pool. We use gVisor (runsc); Kata Containers is the alternative on bare metal.
- A ReadWriteMany storage class for the plugin daemon (EFS, Azure Files, Filestore, or Longhorn) - see the EKS sketch below
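On EKS the RWX class is usually the EFS CSI driver with dynamic provisioning. A minimal sketch, assuming the driver is already installed; the fileSystemId is a placeholder you replace with your own:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap              # dynamic provisioning via EFS access points
  fileSystemId: fs-0123456789abcdef0    # replace with your EFS filesystem ID
  directoryPerms: "700"
reclaimPolicy: Retain
volumeBindingMode: Immediate
The name efs-sc matches the storageClassName the plugin daemon uses later in this guide.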
External dependencies provisioned in advance:
- Postgres 14+ with databases dify and dify_plugin (see the example manifest after this list)
- Redis 6+ (primary/replica with Sentinel)
- Weaviate cluster, or Qdrant if you prefer - see our Qdrant guide
- S3-compatible bucket for file uploads
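The dify-pg-rw hostname used later in the values assumes a CloudNativePG cluster in the data namespace. A minimal sketch - instance count and storage size are illustrative, and dify_plugin still has to be created after bootstrap (manually or via CNPG's post-init SQL hooks):
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: dify-pg
  namespace: data
spec:
  instances: 3
  storage:
    size: 100Gi
  bootstrap:
    initdb:
      database: dify   # creates the dify database owned by the dify role
      owner: dify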
Namespace and isolation
Dify splits cleanly into two trust zones: control plane (web, API, worker) and execution plane (sandbox). Put them in separate namespaces and pin the sandbox to a dedicated node pool.
apiVersion: v1
kind: Namespace
metadata:
  name: dify
  labels:
    pod-security.kubernetes.io/enforce: restricted
---
apiVersion: v1
kind: Namespace
metadata:
  name: dify-sandbox
  labels:
    pod-security.kubernetes.io/enforce: restricted
    nomadx.io/isolation-tier: untrusted
Sandbox node pool setup (EKS example):
# A Karpenter NodePool for sandbox workloads
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: dify-sandbox
spec:
  template:
    metadata:
      labels:
        nomadx.io/workload: dify-sandbox
    spec:
      taints:
        - key: nomadx.io/workload
          value: dify-sandbox
          effect: NoSchedule
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: [on-demand]
        - key: kubernetes.io/arch
          operator: In
          values: [amd64]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: [m6i.large, m6i.xlarge]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: sandbox-class
These nodes run only sandbox pods, so a compromised sandboxed workflow can’t escalate onto the nodes hosting the main Dify API pods.
Install the gVisor runtime on the sandbox nodes via a DaemonSet or custom AMI, then register the runtime class:
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
scheduling:
  nodeSelector:
    nomadx.io/workload: dify-sandbox
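Before pointing the sandbox at it, confirm the handler is actually used. A throwaway pod on the sandbox pool whose logs should show gVisor's own boot banner ("Starting gVisor...") rather than the host kernel's ring buffer - pod name and image are arbitrary:
apiVersion: v1
kind: Pod
metadata:
  name: gvisor-check          # delete after reading `kubectl logs gvisor-check`
spec:
  runtimeClassName: gvisor    # the RuntimeClass scheduling block adds the nodeSelector
  restartPolicy: Never
  tolerations:
    - key: nomadx.io/workload
      operator: Equal
      value: dify-sandbox
      effect: NoSchedule
  containers:
    - name: check
      image: busybox:1.36
      command: ["dmesg"]      # under runsc this prints the sandboxed kernel's messages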
Helm install: production values
helm repo add dify https://langgenius.github.io/dify-helm
helm repo update
Production values.yaml:
# values.prod.yaml
image:
  api:
    repository: langgenius/dify-api
    tag: "1.1.0"
  web:
    repository: langgenius/dify-web
    tag: "1.1.0"
  worker:
    repository: langgenius/dify-api
    tag: "1.1.0"
  sandbox:
    repository: langgenius/dify-sandbox
    tag: "0.2.10"

# Stateless components
api:
  replicas: 3
  resources:
    requests: {cpu: 500m, memory: 1Gi}
    limits: {memory: 4Gi}
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
  podDisruptionBudget:
    enabled: true
    minAvailable: 2

web:
  replicas: 3
  resources:
    requests: {cpu: 250m, memory: 512Mi}
    limits: {memory: 1Gi}
  podDisruptionBudget:
    enabled: true
    minAvailable: 2

worker:
  replicas: 2
  resources:
    requests: {cpu: 500m, memory: 1Gi}
    limits: {memory: 4Gi}
  # Scale on Celery queue depth via KEDA (defined separately)
  podDisruptionBudget:
    enabled: true
    minAvailable: 1

# Sandbox isolation
sandbox:
  namespace: dify-sandbox
  replicas: 2
  runtimeClassName: gvisor
  nodeSelector:
    nomadx.io/workload: dify-sandbox
  tolerations:
    - key: nomadx.io/workload
      operator: Equal
      value: dify-sandbox
      effect: NoSchedule
  resources:
    requests: {cpu: 250m, memory: 512Mi}
    limits: {memory: 1Gi}
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    capabilities:
      drop: [ALL]
  env:
    - name: GIN_MODE
      value: "release"
    - name: SANDBOX_PORT
      value: "8194"
    - name: API_KEY
      valueFrom:
        secretKeyRef:
          name: dify-sandbox-key
          key: api-key

# SSRF proxy - the only egress path for sandbox/worker
ssrfProxy:
  enabled: true
  replicas: 2
  resources:
    requests: {cpu: 100m, memory: 128Mi}
    limits: {memory: 256Mi}

# Plugin daemon (Dify 1.0+)
pluginDaemon:
  enabled: true
  replicas: 2
  persistence:
    storageClassName: efs-sc   # must be RWX
    accessModes: [ReadWriteMany]
    size: 100Gi

# External dependencies
postgresql:
  embedded: false
  external:
    host: dify-pg-rw.data.svc.cluster.local
    port: 5432
    database: dify
    username: dify
    existingSecret: dify-pg-creds
    existingSecretKey: password

redis:
  embedded: false
  external:
    host: dify-redis-master.data.svc.cluster.local
    port: 6379
    existingSecret: dify-redis-creds
    existingSecretKey: password

vectorStore:
  type: weaviate
  weaviate:
    endpoint: http://weaviate.vectordb.svc.cluster.local
    existingSecret: dify-weaviate-creds
    existingSecretKey: api-key

storage:
  type: s3
  s3:
    bucket: dify-prod-me-central-1
    region: me-central-1
    endpoint: https://s3.me-central-1.amazonaws.com
    accessKeyId:
      existingSecret: dify-s3-creds
      key: access-key-id
    secretAccessKey:
      existingSecret: dify-s3-creds
      key: secret-access-key

# Global envs
env:
  - name: CONSOLE_WEB_URL
    value: "https://dify.example.ae"
  - name: APP_WEB_URL
    value: "https://dify.example.ae"
  - name: SERVICE_API_URL
    value: "https://dify.example.ae/api"
  - name: SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: dify-secret-key
        key: secret-key
  - name: INIT_PASSWORD
    valueFrom:
      secretKeyRef:
        name: dify-init
        key: password
  # Point Dify at LiteLLM instead of direct provider keys
  - name: OPENAI_API_BASE
    value: "http://litellm.llm-gateway.svc.cluster.local:4000"
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: litellm-virtual-key
        key: dify-key

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
  hosts:
    - host: dify.example.ae
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: dify-tls
      hosts: [dify.example.ae]
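The existingSecret references above must exist before the release installs. With external-secrets-operator from the prerequisites you can sync them from your secret manager - a sketch for the Postgres credential; the ClusterSecretStore name and remote key path are assumptions:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: dify-pg-creds
  namespace: dify
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager    # assumed store name
  target:
    name: dify-pg-creds          # matches postgresql.external.existingSecret
  data:
    - secretKey: password        # matches existingSecretKey
      remoteRef:
        key: prod/dify/postgres  # assumed path in the secret manager
        property: password
Repeat the pattern for dify-redis-creds, dify-weaviate-creds, dify-s3-creds, dify-secret-key, and dify-sandbox-key.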
Install:
helm upgrade --install dify dify/dify \
--namespace dify \
--values values.prod.yaml \
--version 0.23.0 \
--wait --timeout 15m
Sandbox hardening: the part everyone gets wrong
The sandbox component runs user-authored code. Dify’s default configuration is a reasonable starting point; here’s what must be in place before you expose Dify to untrusted users:
- RuntimeClass: gVisor (runsc) - syscalls are intercepted by a user-space kernel.
- Dedicated node pool with taints - sandbox pods can’t land on control-plane nodes.
- NetworkPolicy default-deny with allow only to SSRF proxy and kube-dns:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-default-deny
  namespace: dify-sandbox
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-allow-egress
  namespace: dify-sandbox
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: dify-sandbox
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: dify
          podSelector:
            matchLabels:
              app.kubernetes.io/name: ssrf-proxy
      ports:
        - protocol: TCP
          port: 3128
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: dify
          podSelector:
            matchLabels:
              app.kubernetes.io/component: worker
      ports:
        - protocol: TCP
          port: 8194
- Pod security: non-root, read-only root fs, drop all caps, no privilege escalation (already set in the values above).
- No mounted ServiceAccount token: set automountServiceAccountToken: false on the sandbox ServiceAccount, as shown below.
- Resource limits: memory limit == request to prevent noisy-neighbor DoS.
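A minimal sketch of that ServiceAccount - the name must match whatever serviceAccountName your chart version assigns to the sandbox Deployment:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dify-sandbox
  namespace: dify-sandbox
automountServiceAccountToken: false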
This is not optional if you let users write their own Python nodes.
SSRF proxy hardening
The SSRF proxy (Squid) blocks user-controlled workflow nodes from hitting internal services. The default Dify Squid config is permissive - tighten it:
# /etc/squid/squid.conf overrides mounted via ConfigMap
# Block private and link-local ranges explicitly
acl blocked_subnets dst 10.0.0.0/8
acl blocked_subnets dst 172.16.0.0/12
acl blocked_subnets dst 192.168.0.0/16
acl blocked_subnets dst 169.254.0.0/16
acl blocked_subnets dst 127.0.0.0/8
acl blocked_subnets dst 100.64.0.0/10
acl blocked_subnets dst fc00::/7
acl blocked_subnets dst fe80::/10
acl blocked_subnets dst ::1/128
# Block cloud metadata endpoints
acl metadata_ips dst 169.254.169.254
acl metadata_ips dst fd00:ec2::254
http_access deny blocked_subnets
http_access deny metadata_ips
# Explicit allowlist of provider hostnames via dstdomain
acl allowed_hosts dstdomain .openai.azure.com
acl allowed_hosts dstdomain .amazonaws.com
acl allowed_hosts dstdomain .anthropic.com
acl allowed_hosts dstdomain .googleapis.com
http_access allow allowed_hosts
http_access deny all
Mount it as a ConfigMap and override the chart’s default:
ssrfProxy:
  configMap:
    name: ssrf-proxy-hardened
    key: squid.conf
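The ConfigMap itself is just the full config above under one key - abridged here; paste the complete hardened config into squid.conf:
apiVersion: v1
kind: ConfigMap
metadata:
  name: ssrf-proxy-hardened
  namespace: dify
data:
  squid.conf: |
    acl blocked_subnets dst 10.0.0.0/8
    acl blocked_subnets dst 172.16.0.0/12
    # ... remaining rules from the hardened config above ...
    http_access deny all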
Test from inside a sandbox pod:
kubectl exec -n dify-sandbox deploy/dify-sandbox -- \
curl -x http://ssrf-proxy.dify.svc.cluster.local:3128 \
http://169.254.169.254/latest/meta-data/ -v
# expected: 403 from Squid
If this returns metadata, you have a production-breaking vulnerability. Don’t launch.
Connecting to the LLM gateway
Point Dify at your LiteLLM proxy instead of inline provider keys (already wired in the values above). In the Dify admin UI, under Settings → Model Provider, add OpenAI as a provider with:
- API base: http://litellm.llm-gateway.svc.cluster.local:4000
- API key: the virtual key issued by LiteLLM for Dify’s team
- Model list: whatever LiteLLM exposes (gpt-4o-uae-primary, claude-sonnet-bedrock-me, etc.) - see the config sketch below
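On the LiteLLM side those names are just aliases in its model_list. A sketch, assuming an Azure OpenAI UAE North deployment and Bedrock access; the endpoints, model IDs, and env var names are placeholders:
model_list:
  - model_name: gpt-4o-uae-primary
    litellm_params:
      model: azure/gpt-4o                               # Azure deployment name
      api_base: https://your-resource.openai.azure.com  # UAE North resource
      api_key: os.environ/AZURE_OPENAI_API_KEY
  - model_name: claude-sonnet-bedrock-me
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_region_name: me-central-1                     # check in-region model availability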
Benefits:
- Provider keys live in LiteLLM’s Postgres, not Dify’s
- Dify’s usage is attributed to the dify-team virtual key in LiteLLM’s spend log
- Provider fallback, rate limits, and caching apply to every Dify workflow
- You get Langfuse traces from every Dify LLM call via LiteLLM’s callbacks
Observability
Dify exposes Prometheus metrics on :5001/metrics. Worthwhile dashboards:
- Workflow success rate by app
- Sandbox execution time and failure rate - anomalies indicate malicious or broken user code
- Celery queue depth (celery_queue_length) - drives KEDA scaling (ScaledObject sketch after this list)
- LLM call latency and cost - already captured by LiteLLM + Langfuse if you wired them
- Plugin daemon errors - plugin installations failing usually points to RWX storage issues
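A sketch of that ScaledObject - it assumes Celery uses its default celery list in Redis, the worker Deployment is named dify-worker, and a TriggerAuthentication named dify-redis-auth holds the Redis password:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dify-worker
  namespace: dify
spec:
  scaleTargetRef:
    name: dify-worker            # worker Deployment name - chart-version dependent
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: redis
      metadata:
        address: dify-redis-master.data.svc.cluster.local:6379
        listName: celery         # Celery's default queue key in Redis
        listLength: "20"         # target backlog per worker replica
      authenticationRef:
        name: dify-redis-auth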
Use Langfuse for the actual LLM-call tracing; Dify’s internal tracing is less rich.
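To get the :5001/metrics endpoint into the prometheus-operator from the prerequisites, a ServiceMonitor along these lines works - the label selector and port name are assumptions about the chart's API Service and may need adjusting:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dify-api
  namespace: dify
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: dify-api   # adjust to the labels your chart applies
  endpoints:
    - port: http                         # the Service port fronting 5001
      path: /metrics
      interval: 30s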
SSO and multi-tenancy
Dify Enterprise adds SAML/OIDC SSO, multi-workspace management, audit logs, and RBAC roles. For GCC deployments subject to ISO 27001, NESA, or ADGM controls, this is usually required. The community edition supports basic OAuth (GitHub, Google) which isn’t acceptable for enterprise use.
If you need SSO on the community edition, front Dify with oauth2-proxy through ingress-nginx’s external-auth annotations:
annotations:
  nginx.ingress.kubernetes.io/auth-url: "https://oauth2.example.ae/oauth2/auth"
  nginx.ingress.kubernetes.io/auth-signin: "https://oauth2.example.ae/oauth2/start?rd=$scheme://$host$request_uri"
It’s a partial solution - Dify’s internal user model won’t map to SSO identities cleanly. The Enterprise license is worth the money if compliance matters.
Sizing tiers
| Tier | Users | API / Worker / Web | Sandbox | Postgres | Vector DB | Est. monthly cost (AED, EKS me-central-1) |
|---|---|---|---|---|---|---|
| Small | <50 | 3 / 2 / 3 × small | 2 × m6i.large | db.t3.medium | Weaviate 3 × r6i.large | ~20,000 |
| Medium | 50-500 | 6 / 5 / 6 × medium | 5 × m6i.large | db.r6g.xlarge | Weaviate 3 × r6i.xlarge | ~55,000 |
| Large | 500-5000 | 20 / 20 / 10 × medium | 20 × m6i.xlarge | db.r6g.2xlarge | Qdrant 6 × r6i.2xlarge | ~180,000 |
LLM token spend is separate and usually dwarfs infra cost.
Common failure modes we’ve debugged
- Sandbox pods OOMKilled running user code - limits are too low, or a user wrote an accidental fork bomb. Set reasonable limits, and put a ceiling on the sandbox node pool (Karpenter NodePool limits or autoscaler max size) so runaway code can’t scale the pool indefinitely.
- Plugin installs fail sporadically - RWX storage is underprovisioned (EFS burst credits exhausted, Azure Files low tier, Longhorn replica stuck). Move to a higher-performance RWX class.
- Workflows stuck in “running” - a Celery worker is wedged on a slow or frozen task. Celery imposes no task time limit by default; set CELERY_TASK_TIME_LIMIT=600 and CELERY_TASK_SOFT_TIME_LIMIT=540 (values snippet after this list).
- Dify UI can’t reach API after ingress upgrade - the chart uses relative paths; ensure ingress rewrite rules don’t mangle /api/.... Set the CONSOLE_WEB_URL/SERVICE_API_URL env vars to be explicit.
- “Request blocked by SSRF proxy” errors for legitimate URLs - your hardened Squid config is missing a hostname. Add it to allowed_hosts and reload. This is expected and fine - if the proxy never blocks anything, it’s probably misconfigured.
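In Helm values terms, the Celery timeout fix from the list above is just extra env on the worker - a sketch; the exact key for injecting worker env vars depends on the chart version:
worker:
  env:
    - name: CELERY_TASK_TIME_LIMIT
      value: "600"
    - name: CELERY_TASK_SOFT_TIME_LIMIT
      value: "540"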
Where Dify fits in the broader stack
Dify is the application platform for teams building LLM products. Pair it with:
- LiteLLM as the LLM gateway (guide)
- Langfuse for trace and evaluation observability (guide)
- Qdrant if you outgrow the bundled Weaviate (guide)
- Self-hosted vLLM for local model serving, exposed to Dify via LiteLLM
The full architecture is in our Production RAG Stack reference architecture.
Getting help
We deploy and operate Dify for enterprise AI teams across the GCC who want a self-hosted LLM application platform with sandbox isolation, SSO, and in-region data residency. AI/ML Infrastructure on Kubernetes is the engagement - typical deploy is 3-5 weeks from kickoff including sandbox hardening and integration with existing IdP.
Frequently Asked Questions
What is Dify and who is it for?
Dify is an open-source LLM application development platform combining a visual workflow builder, RAG pipeline, agent runtime, and a model gateway. It targets product teams and AI engineers who want to ship LLM-powered apps faster than building the full stack themselves, while keeping the option to self-host. Typical use cases: internal chatbots, customer-support assistants, RAG-over-documents apps, and multi-step agent workflows. Compared to LangChain or LlamaIndex, Dify is an end-to-end product; compared to LangFlow or Flowise, it's more feature-complete and production-oriented.
Is Dify production-ready on Kubernetes?
Yes, with caveats. The community Helm chart (langgenius/dify-helm) and docker-compose manifests work in production, but you need to replace the bundled Postgres/Redis/Weaviate with externally managed HA instances, configure the sandbox component correctly, and harden the SSRF proxy. Dify's enterprise edition adds SSO, multi-workspace management, and audit logging - worth it for organizations with compliance requirements.
What is the Dify sandbox and why does it matter?
The Dify sandbox runs user-authored Python and JavaScript code that's part of workflow nodes - e.g., data-transformation steps in an agent. Because it executes potentially arbitrary code, it must be sandboxed. In production on Kubernetes, run sandbox pods on a dedicated node pool with gVisor or Kata Containers runtime class, strict NetworkPolicy egress deny (only allow the SSRF proxy and DNS), non-root UID, read-only root filesystem, and no mounted secrets. This is the single most security-sensitive component in the stack.
What dependencies does Dify need on Kubernetes?
Dify requires Postgres 14+ for application data, Redis 6+ for queue and cache, a vector database (Weaviate by default, but Qdrant, Milvus, pgvector, and Chroma are supported), and S3-compatible object storage for uploaded files and generated artifacts. The plugin daemon in Dify 1.0+ also needs persistent storage for installed plugins. Treat each dependency as a first-class production service with its own HA setup - don't use the in-chart subcharts in production.
How do I connect Dify to provider LLMs in a GCC-sovereign deployment?
Dify has a model configuration UI where you add provider endpoints. For UAE-sovereign deployments, configure Azure OpenAI UAE North or Bedrock Middle East as the primary providers. Better: point Dify at a LiteLLM proxy running in the same cluster - the proxy handles provider routing, virtual keys, and fallbacks, while Dify sees only one OpenAI-compatible endpoint. This keeps provider credentials out of Dify's configuration database.
Can Dify and the RAG/observability stack share Kubernetes infrastructure?
Yes, and they should. A common pattern: Dify runs as the user-facing application platform; Qdrant serves as its vector DB; LiteLLM fronts provider calls and enforces budgets; Langfuse captures traces via Dify's webhook integration. All of them run on the same cluster under separate namespaces with explicit NetworkPolicy allow rules. This is the reference architecture we deploy for teams using Dify as their internal AI platform.
Get Started for Free
We would be happy to speak with you and arrange a free consultation with our Kubernetes Expert in Dubai, UAE. 30-minute call, actionable results in days.
Talk to an Expert