LangSmith Self-Hosted — Cluster Sizing Calculator

Cluster resource requirements by use-case tier · Based on Helm chart v0.15+ defaults

Use-Case Tier

LangSmith Traces

Core tracing, projects, runs, evaluation

LangSmith

Traces + LSD

Adds LangSmith Deployments (LangGraph Platform)

LangSmith · LSD

Traces + LSD + Fleet

Adds Fleet — api-server, tool-server, trigger-server, queue

LangSmith · LSD · Fleet

Traces + LSD + Fleet + Insights

Full stack — adds Insights (Clio) and Polly AI agents

LangSmith · LSD · Fleet · Insights

Standalone Agent Server

LangGraph agent server only — no LangSmith control plane required

Agent Server

Load Profile

Concurrent users (reads): 10

Scale: 1 to 50 · Low ≤5 · Med ≤20 · High >20

Traces / second (writes): 50

Scale: 1 to 1k · Low ≤10 · Med ≤100 · High >100

Reads: Medium · Writes: Medium
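The mapping from slider values to load tiers can be sketched as below. This is a minimal illustration, not the calculator's actual code: the function names are hypothetical, and the thresholds are simply the ones printed on the slider labels above, treated as inclusive upper bounds.

```python
# Thresholds read off the slider labels; treated as inclusive upper bounds (assumption).
READ_THRESHOLDS = (5, 20)      # concurrent users: Low <= 5, Med <= 20
WRITE_THRESHOLDS = (10, 100)   # traces per second: Low <= 10, Med <= 100


def classify(value, low_max, med_max):
    """Return 'Low', 'Medium', or 'High' for a slider value."""
    if value <= low_max:
        return "Low"
    if value <= med_max:
        return "Medium"
    return "High"


def load_profile(concurrent_users, traces_per_second):
    """Hypothetical helper: return the (reads, writes) tier pair."""
    return (
        classify(concurrent_users, *READ_THRESHOLDS),
        classify(traces_per_second, *WRITE_THRESHOLDS),
    )


print(load_profile(10, 50))  # ('Medium', 'Medium')
```

With the values selected above (10 concurrent users, 50 traces per second), both dimensions land in the Medium tier, which matches the profile shown.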

PostgreSQL

Internal StatefulSet

Not recommended for production

External managed

RDS / Cloud SQL — recommended

ClickHouse

In-cluster — single instance

OK for low/medium read load

In-cluster — 3-node replicated

Required for high read load

External / ClickHouse Cloud

No cluster resources consumed

SmithDB (sizing coming soon)

S3-native trace database — sizing available in a future update

Redis (LangSmith)

Internal StatefulSet

Default 8 GiB — fine for low writes

External managed

Required for high writes (≥26 GiB)

Trace Retention (TTL)

Without blob storage, ClickHouse grows by roughly 60 GiB/day at medium writes and 600 GiB/day at high writes. With blob storage enabled, ClickHouse stores only references and search tokens, about 20% of that volume.
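The retention arithmetic can be sketched as follows. This is an estimate under stated assumptions rather than the calculator's own formula: it assumes the daily volume scales linearly with write rate at 0.6 GiB/day per trace/second (so that 100 traces/s gives ~60 GiB/day and 1k traces/s gives ~600 GiB/day), and that the result is rounded up to a whole GiB. The function and constant names are hypothetical.

```python
import math
from fractions import Fraction

# Assumption: linear scaling, 60 GiB/day at 100 traces/s.
GIB_PER_DAY_PER_TRACE_PER_SEC = Fraction(6, 10)
# With blob storage enabled, ClickHouse keeps roughly 20% of the volume.
BLOB_FRACTION = Fraction(1, 5)


def clickhouse_storage_gib(traces_per_second, ttl_days, blob_enabled):
    """Estimate ClickHouse disk usage in GiB for a given TTL (rounded up)."""
    daily = traces_per_second * GIB_PER_DAY_PER_TRACE_PER_SEC
    if blob_enabled:
        daily *= BLOB_FRACTION
    return math.ceil(daily * ttl_days)


print(clickhouse_storage_gib(50, 14, blob_enabled=True))   # 84
print(clickhouse_storage_gib(50, 14, blob_enabled=False))  # 420
```

Under these assumptions, 50 traces per second with a 14-day TTL and blob storage enabled comes out at 84 GiB, which agrees with the ClickHouse storage figure reported in the results.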

Options

Blob storage (S3 / GCS / Blob)

Required at high write scale

KEDA autoscaling

Queue-based scaling for ingest/queue services

Cluster Resource Requirements


Profile: Med/Med · 20% scheduling overhead included

LangSmith

⚠ Internal PostgreSQL is not recommended for production. Use a managed external instance with disk autoexpansion.

ℹ KEDA enabled: the replica counts shown are maximums. Average utilization will be lower because of autoscaling.

Total vCPU needed

64 cores

services + in-cluster DBs

Total RAM needed

134 GiB

services + in-cluster DBs

Est. worker nodes

8 nodes

8 vCPU / 32 GiB each

ClickHouse storage

84 GiB

14d TTL

Service Replicas

41 total pods

Service	Replicas	CPU req (cores)	RAM req
backend	16	16.0	32.0 Gi
platform-backend	3	1.5	3.0 Gi
ingest-queue	6	6.0	12.0 Gi
listener	6	3.0	6.0 Gi
frontend	2	1.0	2.0 Gi
host-backend	2	1.0	2.0 Gi
ace-backend	1	0.2	1.0 Gi
operator	1	0.5	1.0 Gi
playground	1	0.5	1.0 Gi
queue	3	3.0	6.0 Gi
Services total	—	32.7	66.0 Gi
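The "Services total" row and the pod count can be cross-checked with a few lines of Python. The data is copied from the table above; the CPU and RAM columns are taken to be totals across all replicas of each service, which is an interpretation rather than something the table states.

```python
# (service, replicas, CPU request in cores, RAM request in Gi), copied from the table.
# Assumption: CPU and RAM figures are totals across all replicas of the service.
SERVICES = [
    ("backend", 16, 16.0, 32.0),
    ("platform-backend", 3, 1.5, 3.0),
    ("ingest-queue", 6, 6.0, 12.0),
    ("listener", 6, 3.0, 6.0),
    ("frontend", 2, 1.0, 2.0),
    ("host-backend", 2, 1.0, 2.0),
    ("ace-backend", 1, 0.2, 1.0),
    ("operator", 1, 0.5, 1.0),
    ("playground", 1, 0.5, 1.0),
    ("queue", 3, 3.0, 6.0),
]

total_pods = sum(replicas for _, replicas, _, _ in SERVICES)
total_cpu = round(sum(cpu for _, _, cpu, _ in SERVICES), 1)
total_ram = round(sum(ram for _, _, _, ram in SERVICES), 1)

print(total_pods, total_cpu, total_ram)  # 41 32.7 66.0
```

The sums reproduce the 41 pods, 32.7 cores, and 66.0 Gi shown in the table.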

Databases

in-cluster only

Database	Mode	CPU req (cores)	RAM req
PostgreSQL	internal	2	8 Gi
ClickHouse	single	16	24 Gi
Redis (LS)	internal	2	13 Gi
In-cluster DB total	—	20.0	45.0 Gi
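The headline totals reported earlier (64 cores, 134 GiB, 8 nodes) appear to follow from the two table totals plus the 20% scheduling overhead. The sketch below shows one way to reproduce them. The rounding up and the "take the larger of the CPU-bound and RAM-bound node counts" rule are assumptions inferred from the numbers, and the function name is hypothetical.

```python
import math

OVERHEAD = 0.20                    # 20% scheduling overhead, as stated in the profile line
NODE_VCPU, NODE_RAM_GIB = 8, 32    # worker node shape: 8 vCPU / 32 GiB


def cluster_totals(services_cpu, services_ram, db_cpu, db_ram):
    """Return (vCPU, RAM GiB, worker nodes); rounding up is an assumption."""
    cpu = math.ceil((services_cpu + db_cpu) * (1 + OVERHEAD))
    ram = math.ceil((services_ram + db_ram) * (1 + OVERHEAD))
    # Assumption: the node count is driven by whichever resource needs more nodes.
    nodes = max(math.ceil(cpu / NODE_VCPU), math.ceil(ram / NODE_RAM_GIB))
    return cpu, ram, nodes


# Services total: 32.7 cores / 66.0 Gi; in-cluster DB total: 20.0 cores / 45.0 Gi.
print(cluster_totals(32.7, 66.0, 20.0, 45.0))  # (64, 134, 8)
```

In this profile the cluster is CPU-bound: 64 cores need 8 nodes of 8 vCPU each, while 134 GiB would fit on 5 nodes of 32 GiB.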

Storage

ClickHouse

84 GiB

14d TTL

PostgreSQL

10 GiB

+ enable autoexpansion

Redis

13 GiB

in-memory cache

ClickHouse requires SSD storage with at least 7,000 IOPS and 1,000 MiB/s of throughput.

Node Pool Guidance

Application node pool

8× — 8 vCPU / 32 GiB RAM

AWS m5.2xlarge · GCP n2-standard-8 · Azure Standard_D8s_v3

ClickHouse node pool

1× — 16 vCPU / 24 GiB RAM / 84 GiB SSD

Dedicate to its own node pool or use node affinity to prevent resource contention.

values.yaml snippet

# ── Feature flags ──────────────────────────
config:
  blobStorage:
    enabled: true

# ── Scaled services ─────────────────────────
frontend:
  deployment:
    replicas: 2

platformBackend:
  deployment:
    replicas: 3

ingestQueue:
  deployment:
    replicas: 6
  autoscaling:
    keda:
      enabled: true
      minReplicaCount: 2
      maxReplicaCount: 6

backend:
  deployment:
    replicas: 16

# ── Redis ───────────────────────────────────
redis:
  statefulSet:
    resources:
      requests:
        memory: 13Gi
      limits:
        memory: 13Gi

# ── ClickHouse / SmithDB ────────────────────
clickhouse:
  statefulSet:
    persistence:
      size: 84Gi
    resources:
      requests:
        cpu: "16"
        memory: "24Gi"
      limits:
        cpu: "28"
        memory: "36Gi"