April 22, 2026 · 10 min read

Deploy Dify on Kubernetes: Self-Hosted AI Application Platform Guide (2026)

Self-host Dify on Kubernetes in production: API, worker, web, and sandbox components, Postgres and Weaviate dependencies, secure sandbox isolation with gVisor, SSRF hardening, SSO, and GCC-sovereign deployment for agent and RAG workloads.

Dify has become the go-to self-hosted AI application platform for teams that want LangChain-level flexibility with a product on top. It’s adopted heavily in China and Southeast Asia, and increasingly in the Middle East by teams building Arabic-first AI products. This guide covers deploying Dify on Kubernetes the way we deploy it for clients - with every bundled dependency replaced by a production-grade external service and the sandbox tier locked down.

Architecture

Dify in production is a cluster of stateless services plus heavy stateful dependencies:

                     ┌──────────────┐
    Browser UI  ───▶ │   Dify Web   │  Next.js frontend
                     │  (stateless) │
                     └──────┬───────┘
                            │ REST
                            ▼
                     ┌──────────────┐
                     │   Dify API   │  Main application
                      │   (Flask)    │
                     └──────┬───────┘
                     async  │  sync
                      tasks │  queries
         ┌──────────────────┼──────────────────┐
         ▼                  ▼                  ▼
  ┌──────────┐       ┌──────────┐       ┌──────────┐
  │  Worker  │       │ Sandbox  │       │  Plugin  │
  │ (Celery) │       │(isolated │       │  Daemon  │
  │          │       │ runtime) │       │          │
  └────┬─────┘       └─────┬────┘       └─────┬────┘
       │                   │                  │
       ▼                   ▼                  ▼
┌─────────────────────────────────────────────────┐
│           SSRF Proxy (Squid)                     │
│  Filters outbound calls from user workflows      │
└──────────────────────┬──────────────────────────┘
                       ▼
              External provider APIs
                  (via LiteLLM)

         ┌──────────┬──────────┬──────────┬──────────┐
         ▼          ▼          ▼          ▼          ▼
    ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
    │Postgres│ │ Redis  │ │Weaviate│ │S3/Blob │ │Plugin  │
    │ (app)  │ │(queue, │ │(vectors│ │(files) │ │Storage │
     │        │ │ cache) │ │by dflt)│ │        │ │(RWX)   │
    └────────┘ └────────┘ └────────┘ └────────┘ └────────┘

Invariants to internalize:

  • Web, API, Worker are stateless - scale horizontally.
  • Sandbox is isolated - runs user code; needs its own node pool and runtime class.
  • Plugin daemon requires RWX storage for installed plugin artifacts (EFS, Azure Files, or a filestore-class PVC).
  • SSRF proxy is the only egress path from sandbox and worker to external URLs fetched by user code.
  • Postgres, Redis, Weaviate are first-class dependencies - not in-chart subcharts.

Prerequisites

kubectl version --client    # 1.28+
helm version                # 3.14+

Cluster add-ons:

  • cert-manager, ingress-nginx, external-secrets-operator, prometheus-operator
  • A RuntimeClass for the sandbox pool. We use gVisor (runsc). Kata Containers is the alternative on bare metal.
  • A ReadWriteMany storage class for the plugin daemon (EFS, Azure Files, Filestore, or Longhorn)
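If you’re on EKS with the EFS CSI driver, the RWX class referenced later as efs-sc looks roughly like this - a sketch, with the filesystem ID as a placeholder you must replace:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap              # dynamic provisioning via EFS access points
  fileSystemId: fs-0123456789abcdef0    # placeholder - your EFS filesystem ID
  directoryPerms: "700"
reclaimPolicy: Retain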

External dependencies provisioned in advance:

  • Postgres 14+ with databases dify and dify_plugin
  • Redis 6+ (primary/replica with Sentinel)
  • Weaviate cluster, or Qdrant if you prefer - see our Qdrant guide
  • S3-compatible bucket for file uploads
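If Postgres is already running but the databases aren’t there yet, creating them is a few statements - a sketch assuming an admin DSN with CREATEDB rights and a dify role you manage yourself:

# Run from any host or pod that can reach Postgres
psql "$PG_ADMIN_DSN" -c "CREATE ROLE dify LOGIN PASSWORD 'change-me';"
psql "$PG_ADMIN_DSN" -c "CREATE DATABASE dify OWNER dify;"
psql "$PG_ADMIN_DSN" -c "CREATE DATABASE dify_plugin OWNER dify;"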

Namespace and isolation

Dify splits cleanly into two trust zones: control plane (web, API, worker) and execution plane (sandbox). Put them in separate namespaces and pin the sandbox to a dedicated node pool.

apiVersion: v1
kind: Namespace
metadata:
  name: dify
  labels:
    pod-security.kubernetes.io/enforce: restricted
---
apiVersion: v1
kind: Namespace
metadata:
  name: dify-sandbox
  labels:
    pod-security.kubernetes.io/enforce: restricted
    nomadx.io/isolation-tier: untrusted

Sandbox node pool setup (EKS example):

# A Karpenter NodePool for sandbox workloads
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: dify-sandbox
spec:
  template:
    metadata:
      labels:
        nomadx.io/workload: dify-sandbox
    spec:
      taints:
        - key: nomadx.io/workload
          value: dify-sandbox
          effect: NoSchedule
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: [on-demand]
        - key: kubernetes.io/arch
          operator: In
          values: [amd64]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: [m6i.large, m6i.xlarge]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: sandbox-class

These nodes run nothing but sandbox pods, so even a workflow that breaks out of its sandbox lands on a node with no other workloads - it can’t escalate directly to the main Dify API pods.

Install the gVisor runtime on the sandbox nodes via a DaemonSet or custom AMI, then register the runtime class:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
scheduling:
  nodeSelector:
    nomadx.io/workload: dify-sandbox
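If you bake the sandbox nodes from a custom AMI instead of a DaemonSet installer, the containerd side of the gVisor setup is roughly the following - a sketch that assumes runsc and containerd-shim-runsc-v1 are already on the node image:

# Register runsc as a containerd runtime, then restart containerd
cat <<'EOF' >> /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
EOF
systemctl restart containerd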

Helm install: production values

helm repo add dify https://langgenius.github.io/dify-helm
helm repo update

Production values.yaml:

# values.prod.yaml
image:
  api:
    repository: langgenius/dify-api
    tag: "1.1.0"
  web:
    repository: langgenius/dify-web
    tag: "1.1.0"
  worker:
    repository: langgenius/dify-api
    tag: "1.1.0"
  sandbox:
    repository: langgenius/dify-sandbox
    tag: "0.2.10"

# Stateless components
api:
  replicas: 3
  resources:
    requests: {cpu: 500m, memory: 1Gi}
    limits: {memory: 4Gi}
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
  podDisruptionBudget:
    enabled: true
    minAvailable: 2

web:
  replicas: 3
  resources:
    requests: {cpu: 250m, memory: 512Mi}
    limits: {memory: 1Gi}
  podDisruptionBudget:
    enabled: true
    minAvailable: 2

worker:
  replicas: 2
  resources:
    requests: {cpu: 500m, memory: 1Gi}
    limits: {memory: 4Gi}
  # Scale on Celery queue depth via KEDA (defined separately)
  podDisruptionBudget:
    enabled: true
    minAvailable: 1

# Sandbox isolation
sandbox:
  namespace: dify-sandbox
  replicas: 2
  runtimeClassName: gvisor
  nodeSelector:
    nomadx.io/workload: dify-sandbox
  tolerations:
    - key: nomadx.io/workload
      operator: Equal
      value: dify-sandbox
      effect: NoSchedule
  resources:
    requests: {cpu: 250m, memory: 512Mi}
    limits: {memory: 1Gi}
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    capabilities:
      drop: [ALL]
  env:
    - name: GIN_MODE
      value: "release"
    - name: SANDBOX_PORT
      value: "8194"
    - name: API_KEY
      valueFrom:
        secretKeyRef:
          name: dify-sandbox-key
          key: api-key

# SSRF proxy - the only egress path for sandbox/worker
ssrfProxy:
  enabled: true
  replicas: 2
  resources:
    requests: {cpu: 100m, memory: 128Mi}
    limits: {memory: 256Mi}

# Plugin daemon (Dify 1.0+)
pluginDaemon:
  enabled: true
  replicas: 2
  persistence:
    storageClassName: efs-sc        # must be RWX
    accessModes: [ReadWriteMany]
    size: 100Gi

# External dependencies
postgresql:
  embedded: false
  external:
    host: dify-pg-rw.data.svc.cluster.local
    port: 5432
    database: dify
    username: dify
    existingSecret: dify-pg-creds
    existingSecretKey: password

redis:
  embedded: false
  external:
    host: dify-redis-master.data.svc.cluster.local
    port: 6379
    existingSecret: dify-redis-creds
    existingSecretKey: password

vectorStore:
  type: weaviate
  weaviate:
    endpoint: http://weaviate.vectordb.svc.cluster.local
    existingSecret: dify-weaviate-creds
    existingSecretKey: api-key

storage:
  type: s3
  s3:
    bucket: dify-prod-me-central-1
    region: me-central-1
    endpoint: https://s3.me-central-1.amazonaws.com
    accessKeyId:
      existingSecret: dify-s3-creds
      key: access-key-id
    secretAccessKey:
      existingSecret: dify-s3-creds
      key: secret-access-key

# Global envs
env:
  - name: CONSOLE_WEB_URL
    value: "https://dify.example.ae"
  - name: APP_WEB_URL
    value: "https://dify.example.ae"
  - name: SERVICE_API_URL
    value: "https://dify.example.ae/api"
  - name: SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: dify-secret-key
        key: secret-key
  - name: INIT_PASSWORD
    valueFrom:
      secretKeyRef:
        name: dify-init
        key: password
  # Point Dify at LiteLLM instead of direct provider keys
  - name: OPENAI_API_BASE
    value: "http://litellm.llm-gateway.svc.cluster.local:4000"
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: litellm-virtual-key
        key: dify-key

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
  hosts:
    - host: dify.example.ae
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: dify-tls
      hosts: [dify.example.ae]
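The values above reference several Secrets that must exist before the install. External Secrets Operator is the right long-term home for them; a minimal manual bootstrap looks like this (names match the values, key material is generated on the spot):

kubectl -n dify create secret generic dify-secret-key \
  --from-literal=secret-key="$(openssl rand -base64 42)"
kubectl -n dify create secret generic dify-init \
  --from-literal=password="$(openssl rand -base64 24)"
# The sandbox key lives in the sandbox namespace; the API side must be
# configured with the same value so it can call the sandbox.
kubectl -n dify-sandbox create secret generic dify-sandbox-key \
  --from-literal=api-key="$(openssl rand -hex 32)"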

Install:

helm upgrade --install dify dify/dify \
  --namespace dify \
  --values values.prod.yaml \
  --version 0.23.0 \
  --wait --timeout 15m
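The worker scaling on Celery queue depth noted in the values lives outside the chart. A KEDA ScaledObject along these lines does it - a sketch assuming KEDA is installed, the worker Deployment is named dify-worker, and the queue name matches your Dify version (check the worker logs):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dify-worker
  namespace: dify
spec:
  scaleTargetRef:
    name: dify-worker
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: redis
      metadata:
        address: dify-redis-master.data.svc.cluster.local:6379
        listName: dataset        # assumed Celery queue name - verify against your worker
        listLength: "20"         # scale out beyond 20 queued tasks
      authenticationRef:
        name: dify-redis-auth    # TriggerAuthentication holding the Redis password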

Sandbox hardening: the part everyone gets wrong

The sandbox component runs user-authored code. Dify’s default configuration is reasonable; here’s what must be in place before you expose Dify to untrusted users:

  1. RuntimeClass: gVisor (runsc) - syscalls are intercepted by user-space kernel.
  2. Dedicated node pool with taints - sandbox pods can’t land on control-plane nodes.
  3. NetworkPolicy default-deny with allow only to SSRF proxy and kube-dns:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-default-deny
  namespace: dify-sandbox
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-allow-egress
  namespace: dify-sandbox
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: dify-sandbox
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: dify
          podSelector:
            matchLabels:
              app.kubernetes.io/name: ssrf-proxy
      ports:
        - protocol: TCP
          port: 3128
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: dify
          podSelector:
            matchLabels:
              app.kubernetes.io/component: worker
      ports:
        - protocol: TCP
          port: 8194
  4. Pod security: non-root, read-only root fs, drop all caps, no privilege escalation (already set in the values above).
  5. No mounted ServiceAccount token: set automountServiceAccountToken: false on the sandbox ServiceAccount (manifest below).
  6. Resource limits: memory limit == request to prevent noisy-neighbor DoS (tighten the sandbox limits in the values above accordingly).
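A minimal sketch of the ServiceAccount, assuming the chart lets you supply or override the sandbox’s ServiceAccount name:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: dify-sandbox
  namespace: dify-sandbox
automountServiceAccountToken: false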

This is not optional if you let users write their own Python nodes.

SSRF proxy hardening

The SSRF proxy (Squid) blocks user-controlled workflow nodes from hitting internal services. The default Dify Squid config is permissive - tighten it:

# /etc/squid/squid.conf overrides mounted via ConfigMap

# Block private and link-local ranges explicitly
acl blocked_subnets dst 10.0.0.0/8
acl blocked_subnets dst 172.16.0.0/12
acl blocked_subnets dst 192.168.0.0/16
acl blocked_subnets dst 169.254.0.0/16
acl blocked_subnets dst 127.0.0.0/8
acl blocked_subnets dst 100.64.0.0/10
acl blocked_subnets dst fc00::/7
acl blocked_subnets dst fe80::/10
acl blocked_subnets dst ::1/128

# Block cloud metadata endpoints
acl metadata_ips dst 169.254.169.254
acl metadata_ips dst fd00:ec2::254

http_access deny blocked_subnets
http_access deny metadata_ips

# Explicit allowlist of provider hostnames via dstdomain
acl allowed_hosts dstdomain .openai.azure.com
acl allowed_hosts dstdomain .amazonaws.com
acl allowed_hosts dstdomain .anthropic.com
acl allowed_hosts dstdomain .googleapis.com
http_access allow allowed_hosts

http_access deny all

Mount it as a ConfigMap and override the chart’s default:

ssrfProxy:
  configMap:
    name: ssrf-proxy-hardened
    key: squid.conf
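Creating the ConfigMap itself is a one-liner; the key name must match what the chart mounts:

kubectl -n dify create configmap ssrf-proxy-hardened \
  --from-file=squid.conf=./squid.conf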

Test from inside a sandbox pod:

kubectl exec -n dify-sandbox deploy/dify-sandbox -- \
  curl -x http://ssrf-proxy.dify.svc.cluster.local:3128 \
  http://169.254.169.254/latest/meta-data/ -v
# expected: 403 from Squid

If this returns metadata, you have a production-breaking vulnerability. Don’t launch.
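Run the inverse check too - a hostname on the allowlist should get through (this assumes api.anthropic.com stays in allowed_hosts):

kubectl exec -n dify-sandbox deploy/dify-sandbox -- \
  curl -x http://ssrf-proxy.dify.svc.cluster.local:3128 \
  https://api.anthropic.com/ -sS -o /dev/null -w "%{http_code}\n"
# expected: anything except a 403 from Squid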

Connecting to the LLM gateway

Point Dify at your LiteLLM proxy instead of inline provider keys (already wired in the values above). In the Dify admin UI, under Settings → Model Provider, add OpenAI as a provider with:

  • API base: http://litellm.llm-gateway.svc.cluster.local:4000
  • API key: the virtual key issued by LiteLLM for Dify’s team
  • Model list: whatever LiteLLM exposes (gpt-4o-uae-primary, claude-sonnet-bedrock-me, etc.)

Benefits:

  • Provider keys live in LiteLLM’s Postgres, not Dify’s
  • Dify’s usage is attributed to the dify-team virtual key in LiteLLM’s spend log
  • Provider fallback, rate limits, and caching apply to every Dify workflow
  • You get Langfuse traces from every Dify LLM call via LiteLLM’s callbacks
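Before wiring the provider in the UI, it’s worth confirming the API pods can reach LiteLLM at all - a sketch assuming the Deployment is named dify-api and curl exists in the image:

kubectl -n dify exec deploy/dify-api -- \
  curl -s http://litellm.llm-gateway.svc.cluster.local:4000/v1/models \
  -H "Authorization: Bearer <dify-virtual-key>"
# expected: the model list LiteLLM exposes to that key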

Observability

Dify exposes Prometheus metrics on :5001/metrics. Worthwhile dashboards:

  • Workflow success rate by app
  • Sandbox execution time and failure rate - anomalies indicate malicious or broken user code
  • Celery queue depth (celery_queue_length) - drives KEDA scaling
  • LLM call latency and cost - already captured by LiteLLM + Langfuse if you wired them
  • Plugin daemon errors - plugin installations failing usually points to RWX storage issues
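To get those metrics scraped in the first place, a ServiceMonitor along these lines works with prometheus-operator - a sketch; the label selector and port name are assumptions about the chart’s Service:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dify-api
  namespace: dify
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: dify
      app.kubernetes.io/component: api
  endpoints:
    - port: http        # assumed name of the API Service's 5001 port
      path: /metrics
      interval: 30s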

Use Langfuse for the actual LLM-call tracing; Dify’s internal tracing is less rich.

SSO and multi-tenancy

Dify Enterprise adds SAML/OIDC SSO, multi-workspace management, audit logs, and RBAC roles. For GCC deployments subject to ISO 27001, NESA, or ADGM controls, this is usually required. The community edition supports only basic OAuth (GitHub, Google), which typically isn’t enough for enterprise use.

If you need SSO on the community edition, front Dify with oauth2-proxy as an ingress sidecar:

annotations:
  nginx.ingress.kubernetes.io/auth-url: "https://oauth2.example.ae/oauth2/auth"
  nginx.ingress.kubernetes.io/auth-signin: "https://oauth2.example.ae/oauth2/start?rd=$scheme://$host$request_uri"

It’s a partial solution - Dify’s internal user model won’t map to SSO identities cleanly. The Enterprise license is worth the money if compliance matters.

Sizing tiers

Tier   | Users    | API / Worker / Web    | Sandbox         | Postgres       | Vector DB               | Est. monthly cost (AED, EKS me-central-1)
Small  | <50      | 3 / 2 / 3 × small     | 2 × m6i.large   | db.t3.medium   | Weaviate 3 × r6i.large  | ~20,000
Medium | 50-500   | 6 / 5 / 6 × medium    | 5 × m6i.large   | db.r6g.xlarge  | Weaviate 3 × r6i.xlarge | ~55,000
Large  | 500-5000 | 20 / 20 / 10 × medium | 20 × m6i.xlarge | db.r6g.2xlarge | Qdrant 6 × r6i.2xlarge  | ~180,000

LLM token spend is separate and usually dwarfs infra cost.

Common failure modes we’ve debugged

  • Sandbox pods OOMKilled running user code - limits too low, or a user wrote an accidental fork bomb. Set reasonable limits, and cap the sandbox node pool’s autoscaling ceiling to contain the blast radius.
  • Plugin installs fail sporadically - RWX storage is underprovisioned (EFS burst credits exhausted, Azure Files low tier, Longhorn replica stuck). Move to a higher-performance RWX class.
  • Workflows stuck in “running” - a Celery worker is wedged on a slow or frozen task. The default Celery task timeout is too long; set CELERY_TASK_TIME_LIMIT=600 and CELERY_TASK_SOFT_TIME_LIMIT=540 (snippet after this list).
  • Dify UI can’t reach API after ingress upgrade - the chart uses relative paths; ensure ingress rewrite rules don’t mangle /api/.... Use the CONSOLE_WEB_URL / SERVICE_API_URL env vars to be explicit.
  • “Request blocked by SSRF proxy” errors for legitimate URLs - your hardened Squid config is missing a hostname. Add it to allowed_hosts and reload. This is expected and healthy - a proxy that never blocks anything is probably misconfigured.
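The Celery timeouts from the list above slot into the same global env block as the other settings, assuming the chart passes that block through to the worker as it does for the rest:

env:
  - name: CELERY_TASK_TIME_LIMIT
    value: "600"
  - name: CELERY_TASK_SOFT_TIME_LIMIT
    value: "540"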

Where Dify fits in the broader stack

Dify is the application platform for teams building LLM products. Pair it with:

  • LiteLLM as the LLM gateway (guide)
  • Langfuse for trace and evaluation observability (guide)
  • Qdrant if you outgrow the bundled Weaviate (guide)
  • Self-hosted vLLM for local model serving, exposed to Dify via LiteLLM

The full architecture is in our Production RAG Stack reference architecture.

Getting help

We deploy and operate Dify for enterprise AI teams across the GCC who want a self-hosted LLM application platform with sandbox isolation, SSO, and in-region data residency. AI/ML Infrastructure on Kubernetes is the engagement - typical deploy is 3-5 weeks from kickoff including sandbox hardening and integration with existing IdP.

Frequently Asked Questions

What is Dify and who is it for?

Dify is an open-source LLM application development platform combining a visual workflow builder, RAG pipeline, agent runtime, and a model gateway. It targets product teams and AI engineers who want to ship LLM-powered apps faster than building the full stack themselves, while keeping the option to self-host. Typical use cases: internal chatbots, customer-support assistants, RAG-over-documents apps, and multi-step agent workflows. Compared to LangChain or LlamaIndex, Dify is an end-to-end product; compared to LangFlow or Flowise, it's more feature-complete and production-oriented.

Is Dify production-ready on Kubernetes?

Yes, with caveats. The community Helm chart (langgenius/dify-helm) and docker-compose manifests work in production, but you need to replace the bundled Postgres/Redis/Weaviate with externally managed HA instances, configure the sandbox component correctly, and harden the SSRF proxy. Dify's enterprise edition adds SSO, multi-workspace management, and audit logging - worth it for organizations with compliance requirements.

What is the Dify sandbox and why does it matter?

The Dify sandbox runs user-authored Python and JavaScript code that's part of workflow nodes - e.g., data-transformation steps in an agent. Because it executes potentially arbitrary code, it must be sandboxed. In production on Kubernetes, run sandbox pods on a dedicated node pool with gVisor or Kata Containers runtime class, strict NetworkPolicy egress deny (only allow the SSRF proxy and DNS), non-root UID, read-only root filesystem, and no mounted secrets. This is the single most security-sensitive component in the stack.

What dependencies does Dify need on Kubernetes?

Dify requires Postgres 14+ for application data, Redis 6+ for queue and cache, a vector database (Weaviate by default, but Qdrant, Milvus, pgvector, and Chroma are supported), and S3-compatible object storage for uploaded files and generated artifacts. The plugin daemon in Dify 1.0+ also needs persistent storage for installed plugins. Treat each dependency as a first-class production service with its own HA setup - don't use the in-chart subcharts in production.

How do I connect Dify to provider LLMs in a GCC-sovereign deployment?

Dify has a model configuration UI where you add provider endpoints. For UAE-sovereign deployments, configure Azure OpenAI UAE North or Bedrock Middle East as the primary providers. Better: point Dify at a LiteLLM proxy running in the same cluster - the proxy handles provider routing, virtual keys, and fallbacks, while Dify sees only one OpenAI-compatible endpoint. This keeps provider credentials out of Dify's configuration database.

Can Dify and the RAG/observability stack share Kubernetes infrastructure?

Yes, and they should. A common pattern: Dify runs as the user-facing application platform; Qdrant serves as its vector DB; LiteLLM fronts provider calls and enforces budgets; Langfuse captures traces via Dify's webhook integration. All of them run on the same cluster under separate namespaces with explicit NetworkPolicy allow rules. This is the reference architecture we deploy for teams using Dify as their internal AI platform.
