LangSmith Self-Hosted — Cluster Sizing Calculator

Cluster resource requirements by use-case tier · Based on Helm chart v0.13+ defaults

Use-Case Tier
LangSmith Traces: core tracing, projects, runs, and evaluation. Includes: LangSmith.
Traces + LSD: adds LangSmith Deployments (LangGraph Platform). Includes: LangSmith, LSD.
Traces + LSD + Agent Builder: adds the agent tool server, trigger server, and agent runner. Includes: LangSmith, LSD, Agent Builder.
Traces + LSD + AB + Insights: full stack; adds Insights (Clio) and Polly AI agents. Includes: LangSmith, LSD, Agent Builder, Insights.
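Load Profile
Concurrent users (reads): 10 (slider range 1 to 50; Low ≤5, Medium ≤20, High above 20)
Traces per second (writes): 50 (slider range 1 to 1k; Low ≤10, Medium ≤100, High above 100)
Resulting profile: Reads Medium · Writes Medium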
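The slider boundaries amount to a simple threshold lookup, applied separately to reads and writes. This Python sketch assumes the boundaries are inclusive (exactly 5 users is Low, exactly 20 is Medium), which matches the labels but is not confirmed by the calculator; `classify` and `load_profile` are hypothetical helper names, not part of any LangSmith tool.

```python
# Hypothetical helpers: tier boundaries as read off the two sliders.
# Assumption: boundaries are inclusive (value <= limit stays in the lower tier).
READ_LIMITS = (5, 20)     # concurrent users: Low <= 5, Medium <= 20, else High
WRITE_LIMITS = (10, 100)  # traces/second:    Low <= 10, Medium <= 100, else High


def classify(value: float, limits: tuple[float, float]) -> str:
    """Return "Low", "Medium", or "High" for a single load value."""
    low_max, medium_max = limits
    if value <= low_max:
        return "Low"
    if value <= medium_max:
        return "Medium"
    return "High"


def load_profile(concurrent_users: float, traces_per_second: float) -> dict[str, str]:
    """Classify read load and write load independently."""
    return {
        "reads": classify(concurrent_users, READ_LIMITS),
        "writes": classify(traces_per_second, WRITE_LIMITS),
    }


print(load_profile(10, 50))  # {'reads': 'Medium', 'writes': 'Medium'}
```

The 10-user, 50-traces/s inputs on this page land in Medium/Medium, the same profile the calculator reports.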
PostgreSQL
Internal StatefulSet: not recommended for production
External managed (RDS / Cloud SQL): recommended
ClickHouse
In-cluster, single instance: OK for low or medium read load
In-cluster, 3-node replicated: required for high read load
External / ClickHouse Cloud: consumes no cluster resources
Redis (LangSmith)
Internal StatefulSet: default 8 GiB, sufficient for low write load
External managed: required for high write load (≥26 GiB)
Trace Retention (TTL)
ClickHouse storage grows by roughly 60 GiB per day of retention at medium write load and roughly 600 GiB per day of retention at high write load.
Options
Blob storage (S3 / GCS / Azure Blob Storage): required at high write scale
KEDA autoscaling: queue-based scaling for the ingest-queue and queue services
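KEDA works by feeding an external metric, such as queue depth, to a Horizontal Pod Autoscaler. As a rough sketch, assuming an `AverageValue` target, the HPA aims for `ceil(backlog / target_per_replica)` pods, clamped between the minimum and maximum replica counts. The target of 100 queued items per replica is a hypothetical number, and the sketch ignores the HPA's default 10% tolerance and its scale-down stabilization window.

```python
import math


def desired_replicas(backlog: int, target_per_replica: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Approximate replica count for an AverageValue-style queue metric.

    Simplified model: ignores HPA tolerance, stabilization windows,
    and KEDA's scale-to-zero handling.
    """
    raw = math.ceil(backlog / target_per_replica)
    # Clamp to the ScaledObject's min/max replica bounds.
    return max(min_replicas, min(max_replicas, raw))


# Bounds mirror minReplicaCount: 2 / maxReplicaCount: 6 for ingestQueue in the
# values.yaml snippet; the per-replica target of 100 is hypothetical.
for backlog in (0, 150, 450, 5000):
    print(backlog, desired_replicas(backlog, 100, 2, 6))
```

With these bounds the ingest queue idles at 2 replicas, reaches 5 at a backlog of 450, and caps at 6 under heavy load, which is why the replica counts on this page are maximums rather than steady-state values.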

Cluster Resource Requirements


Profile: Medium reads / Medium writes · 20% scheduling overhead included

LangSmith
Internal PostgreSQL is not recommended for production. Use a managed external instance with disk autoexpansion enabled.
KEDA is enabled, so the replica counts shown are maximums. Average utilization will be lower because of autoscaling.
Total vCPU needed: 60 cores (services + in-cluster DBs)
Total RAM needed: 126 GiB (services + in-cluster DBs)
Estimated worker nodes: 8 (8 vCPU / 32 GiB each)
ClickHouse storage: 420 GiB (14d TTL)

Service Replicas

32 pods in total. CPU and RAM figures are requests summed across all replicas of each service.

Service            Replicas   CPU req (cores)   RAM req
backend                  16              16.0   32.0 Gi
platform-backend          3               3.0    6.0 Gi
ingest-queue              6               6.0   12.0 Gi
frontend                  2               1.0    2.0 Gi
ace-backend               1               0.2    1.0 Gi
listener                  1               1.0    2.0 Gi
operator                  1               1.0    2.0 Gi
playground                1               0.5    1.0 Gi
queue                     1               1.0    2.0 Gi
Services total           32              29.7   60.0 Gi
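The totals row can be checked by summing the per-service rows. This sketch also divides each service's figures by its replica count to get a per-pod request, under the assumption (consistent with the totals adding up) that the CPU and RAM columns are aggregates across replicas; `per_pod` is a hypothetical helper.

```python
# (service, replicas, total CPU cores, total RAM GiB) copied from the table.
SERVICES = [
    ("backend", 16, 16.0, 32.0),
    ("platform-backend", 3, 3.0, 6.0),
    ("ingest-queue", 6, 6.0, 12.0),
    ("frontend", 2, 1.0, 2.0),
    ("ace-backend", 1, 0.2, 1.0),
    ("listener", 1, 1.0, 2.0),
    ("operator", 1, 1.0, 2.0),
    ("playground", 1, 0.5, 1.0),
    ("queue", 1, 1.0, 2.0),
]

total_pods = sum(replicas for _, replicas, _, _ in SERVICES)
# round() hides binary floating-point noise from values such as 0.2.
total_cpu = round(sum(cpu for _, _, cpu, _ in SERVICES), 1)
total_ram = round(sum(ram for _, _, _, ram in SERVICES), 1)


def per_pod(name: str) -> tuple[float, float]:
    """Per-replica (CPU cores, RAM GiB), assuming table values are per-service totals."""
    for service, replicas, cpu, ram in SERVICES:
        if service == name:
            return cpu / replicas, ram / replicas
    raise KeyError(name)


print(total_pods, total_cpu, total_ram)  # 32 29.7 60.0
print(per_pod("backend"))                # (1.0, 2.0)
print(per_pod("frontend"))               # (0.5, 1.0)
```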

Databases

In-cluster databases only.

Database      Mode       CPU req (cores)   RAM req
PostgreSQL    internal                 2      8 Gi
ClickHouse    single                  16     24 Gi
Redis (LS)    internal                 2     13 Gi
In-cluster DB total                 20.0   45.0 Gi
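The summary figures at the top of this section follow from the services and database subtotals. A minimal sketch, assuming the calculator adds the two subtotals, applies the 20% scheduling overhead, rounds up, and then sizes the pool by whichever of CPU or memory needs more 8 vCPU / 32 GiB nodes. The function names and the rounding rule are assumptions that happen to reproduce the page's numbers, not the calculator's published formula.

```python
import math

OVERHEAD = 1.20  # 20% scheduling overhead, per the profile line


def ceil_clean(x: float) -> int:
    """Round up after trimming float noise (e.g. 125.99999999 -> 126)."""
    return math.ceil(round(x, 6))


def cluster_totals(service_cpu: float, service_ram_gib: float,
                   db_cpu: float, db_ram_gib: float,
                   node_cpu: int = 8, node_ram_gib: int = 32) -> tuple[int, int, int]:
    """Return (total vCPU, total RAM GiB, worker node count)."""
    cpu = ceil_clean((service_cpu + db_cpu) * OVERHEAD)
    ram = ceil_clean((service_ram_gib + db_ram_gib) * OVERHEAD)
    # The node count is set by whichever resource runs out first.
    nodes = math.ceil(max(cpu / node_cpu, ram / node_ram_gib))
    return cpu, ram, nodes


# Subtotals from the Service Replicas and Databases tables.
print(cluster_totals(29.7, 60.0, 20.0, 45.0))  # (60, 126, 8)
```

CPU is the binding constraint for this profile: 126 GiB would fit on four 32 GiB nodes, but 60 cores need eight 8-vCPU nodes.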

Storage

ClickHouse: 420 GiB (14d TTL)
PostgreSQL: 10 GiB (enable autoexpansion)
Redis: 13 GiB (in-memory cache)
ClickHouse requires SSD storage with at least 7,000 IOPS and 1,000 MiB/s of throughput.

Node Pool Guidance

Application node pool: 8 nodes, 8 vCPU / 32 GiB RAM each
Instance types: m5.2xlarge (AWS), n2-standard-8 (GCP), Standard_D8s_v3 (Azure)
ClickHouse node pool: 1 node, 16 vCPU / 24 GiB RAM / 420 GiB SSD
Give ClickHouse its own node pool, or use node affinity, to prevent resource contention with the application services.
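The case for a separate ClickHouse pool can be checked with a simple fit test: a pod schedules only if its requests fit within a single node. This sketch compares raw node sizes. Real Kubernetes nodes expose a smaller allocatable capacity because the kubelet and system daemons reserve some CPU and memory; the `reserved_fraction` parameter is a hypothetical stand-in for that gap, not a real Kubernetes setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeShape:
    name: str
    vcpu: float
    ram_gib: float


APP_NODE = NodeShape("application pool", 8, 32)
CLICKHOUSE_NODE = NodeShape("ClickHouse pool", 16, 24)


def fits(cpu_request: float, ram_request_gib: float, node: NodeShape,
         reserved_fraction: float = 0.0) -> bool:
    """True if a single pod's requests fit on one node of this shape.

    reserved_fraction is a hypothetical approximation of the difference
    between a node's capacity and its allocatable resources.
    """
    usable_cpu = node.vcpu * (1 - reserved_fraction)
    usable_ram = node.ram_gib * (1 - reserved_fraction)
    return cpu_request <= usable_cpu and ram_request_gib <= usable_ram


# ClickHouse requests 16 CPU / 24 GiB (Databases table).
print(fits(16, 24, APP_NODE))               # False: needs more than 8 vCPU
print(fits(16, 24, CLICKHOUSE_NODE))        # True on raw capacity
print(fits(16, 24, CLICKHOUSE_NODE, 0.05))  # False once anything is reserved
```

Because the ClickHouse requests equal the full node size, it is worth comparing them against the allocatable values your cloud provider reports for the chosen instance type.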
values.yaml snippet
# ── Feature flags ──────────────────────────
config:
  blobStorage:
    enabled: true

# ── Scaled services ─────────────────────────
frontend:
  deployment:
    replicas: 2

platformBackend:
  deployment:
    replicas: 3

ingestQueue:
  deployment:
    replicas: 6
  autoscaling:
    keda:
      enabled: true
      minReplicaCount: 2
      maxReplicaCount: 6

backend:
  deployment:
    replicas: 16

# ── Redis ───────────────────────────────────
redis:
  statefulSet:
    resources:
      requests:
        memory: 13Gi
      limits:
        memory: 13Gi

# ── ClickHouse ──────────────────────────────
clickhouse:
  statefulSet:
    persistence:
      size: 420Gi
    resources:
      requests:
        cpu: "16"
        memory: "24Gi"
      limits:
        cpu: "28"
        memory: "36Gi"
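The scaled-services part of the snippet can also be generated rather than hand-edited. A minimal sketch, assuming you keep the replica counts in Python; `scaled_service_values` is a hypothetical helper, while the key names (`deployment.replicas`, `autoscaling.keda.*`) are copied from the snippet above. YAML 1.2 is a superset of JSON, so the printed output can be saved as a values file and passed to Helm with `-f`.

```python
import json


def scaled_service_values(replicas: dict[str, int],
                          keda_bounds: dict[str, tuple[int, int]]) -> dict:
    """Build Helm values for replica counts and optional KEDA bounds.

    Hypothetical helper; the key layout mirrors the values.yaml snippet.
    """
    values: dict = {}
    for service, count in replicas.items():
        values[service] = {"deployment": {"replicas": count}}
    for service, (min_count, max_count) in keda_bounds.items():
        if min_count > max_count:
            raise ValueError(f"{service}: minReplicaCount exceeds maxReplicaCount")
        values.setdefault(service, {})["autoscaling"] = {
            "keda": {
                "enabled": True,
                "minReplicaCount": min_count,
                "maxReplicaCount": max_count,
            }
        }
    return values


values = scaled_service_values(
    {"frontend": 2, "platformBackend": 3, "ingestQueue": 6, "backend": 16},
    {"ingestQueue": (2, 6)},
)
print(json.dumps(values, indent=2))
```

Helm merges multiple `-f` files in order, so a generated file like this can sit alongside a hand-written one that holds the Redis and ClickHouse settings.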