
What runs in the CP and the DP

Canonical list of services that make up the LangSmith control plane, data plane, and supporting infrastructure.

The split between control plane (CP) and data plane (DP) does not change with topology. What changes is which environment each side runs in (LangChain SaaS, BYOC, your cluster, on-prem, air-gapped). The service lists below are the same across every topology.

Three Helm charts cover customer installs:

  • langsmith: the control plane (UI, auth, trace pipeline, ClickHouse/blob wiring). Also installs LangGraph Deployments support when colocated.
  • langgraph-dataplane: the data plane (operator + listener + agent servers). Used when the DP runs in a different cluster from the CP.
  • langsmith-auth-proxy: the LLM auth proxy. Optional, separate install.

Check the exact service and key names below against the chart values for your release before treating them as authoritative. Chart releases occasionally rename or split components.
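The chart choice above reduces to one question: are the CP and DP colocated? The following is a minimal sketch of that rule. The function and the cluster labels are hypothetical. Placing the optional auth proxy in whichever cluster runs the DP is an assumption based on the proxy sitting on the DP's egress path.

```python
def charts_to_install(colocated: bool, llm_auth_proxy: bool = False) -> dict[str, list[str]]:
    """Return the Helm charts to install per cluster (hypothetical helper)."""
    # The langsmith chart always installs the control plane. When CP and DP are
    # colocated it also provides LangGraph Deployments support, so one chart suffices.
    plan: dict[str, list[str]] = {"cp_cluster": ["langsmith"]}
    if not colocated:
        # A DP in a different cluster gets its own langgraph-dataplane release.
        plan["dp_cluster"] = ["langgraph-dataplane"]
    if llm_auth_proxy:
        # Assumption: the optional proxy goes wherever the DP runs.
        dp_side = "cp_cluster" if colocated else "dp_cluster"
        plan[dp_side].append("langsmith-auth-proxy")
    return plan


print(charts_to_install(colocated=False, llm_auth_proxy=True))
```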

How traces actually flow

Knowing the trace path matters because it shows where ClickHouse and the blob bucket sit, and their location is the most important factor in trace residency:

DP agent  →  HTTP multipart  →  CP API  →  CP Redis  →  asynq worker  →  ClickHouse + Blob

The DP-side agent only emits HTTP requests. Every component after it on the trace path (the API endpoint, the Redis ingest queue, the asynq worker, ClickHouse, and the blob bucket) is CP-attached. For production, ClickHouse should always run outside the cluster as a managed service; in-cluster ClickHouse is for dev/POC only.

This is why "where does the CP run" determines "where do trace payloads land," not "where does the agent run." In Hybrid (SaaS CP, customer DP), agents execute in the customer VPC but traces flow back over HTTPS to the SaaS CP and land in LangChain's ClickHouse. In BYOC CP + on-prem DP, agents run on-prem but traces flow over the private link to ClickHouse in the BYOC cloud account.
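The trace path can be modeled with in-memory stand-ins to show which side does what. This is a sketch, not LangSmith's implementation. A deque stands in for the Redis/asynq queue, a list for ClickHouse, and a dict for the blob bucket. The `traces/{run_id}` key layout and the split between metadata rows and payloads are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Hypothetical in-memory stand-ins for CP-attached trace storage."""
    redis_queue: deque = field(default_factory=deque)   # stands in for the asynq queue in Redis
    clickhouse_rows: list = field(default_factory=list)  # run metadata only
    blob_bucket: dict = field(default_factory=dict)      # payloads keyed by object path

    def api_ingest(self, run_id: str, metadata: dict, payload: bytes) -> None:
        """CP API: accept one uploaded run and buffer it; nothing is stored yet."""
        self.redis_queue.append((run_id, metadata, payload))

    def worker_drain(self) -> int:
        """Worker: move buffered runs into ClickHouse and blob storage."""
        written = 0
        while self.redis_queue:
            run_id, metadata, payload = self.redis_queue.popleft()
            key = f"traces/{run_id}"          # hypothetical object-key layout
            self.blob_bucket[key] = payload   # the payload goes to object storage
            # ClickHouse gets a row that points at the blob instead of embedding it.
            self.clickhouse_rows.append({"run_id": run_id, **metadata, "payload_ref": key})
            written += 1
        return written


# The DP-side agent only makes the "HTTP" call; everything after it is CP-side.
cp = ControlPlane()
cp.api_ingest("run-1", {"name": "my_graph", "status": "success"}, b'{"inputs": "..."}')
print(cp.worker_drain())
```

Wherever the `ControlPlane` object lives, every storage write lands there. Moving the agent changes nothing about where the rows and blobs end up.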

Control Plane

The control plane owns user-facing state, platform identity, and the entire trace pipeline. It hosts the UI, handles SSO and API-key auth, manages workspaces/orgs/permissions, ingests and stores traces, and serves the LangGraph Studio debugger.

LangSmith services

  • frontend (Web UI): The smith.langchain.com (or self-hosted equivalent) React app. Static asset bundle plus a small Node server.
  • platform-backend (Platform API): Orgs, workspaces, members, API keys, SSO/SCIM, billing, retention policy. Issues short-lived tokens for DP requests.
  • host-backend (Auth proxy / fan-out): Authenticates requests from the UI and forwards them to the right deployment. Handles cross-DP routing.
  • backend (Trace ingest + query API): Accepts HTTP multipart trace uploads from the SDK and DP agents. Also serves run, dataset, feedback, and evaluation queries.
  • ingest-queue (Async trace writer, asynq): Pulls buffered trace events from Redis and writes them to ClickHouse and blob storage.
  • queue (Background-job worker): Dataset processing, evaluation runs, exports, retention sweeps.
  • studio (LangGraph debugging UI): Served from the CP frontend. Connects out to the DP-side agent server to step through graph executions.
  • subscriptions-api (Billing / metering): Optional. Tracks usage events for billed plans. Not present in self-hosted deployments.

Backing services (CP)

  • Postgres: CP metadata (orgs, workspaces, users, API keys, dataset and prompt metadata). Required.
  • Redis: Session state, rate limits, the trace ingest queue (asynq), and the async-job queue. Required. Heavily used; size it for trace volume, not just sessions.
  • ClickHouse: Trace storage. Required. Production deployments should use LangChain Managed ClickHouse instead of in-cluster ClickHouse.
  • Object storage (S3 / Blob / GCS): Trace payloads, large attachments, evaluation artifacts. Required. Payloads must not be written into ClickHouse; storing them there causes cluster issues.

Data Plane

The data plane is LangGraph Deployments (the agent runtime). It runs customer agent code, holds checkpointer state for in-flight runs, and emits trace events back to the CP. Agents reach the CP for auth (token validation, deployment config) and for trace upload; they reach LLM providers directly (or via an egress proxy).

LangGraph Deployments services

  • operator (LangSmith operator): Watches LangGraphDeployment CRDs and reconciles per-deployment runtimes. The CRD is cluster-scoped, so chart upgrades must be version-locked across all DPs in the same cluster.
  • listener (Per-DP control loop): One per data plane. Pulls deployment changes from the CP over outbound HTTPS and applies them locally. This is what makes hybrid topologies work: the customer never has to allow inbound connections from the CP.
  • server, one per deployment (Agent runtime): One Kubernetes Deployment and Service per deployed graph. Customer code runs here.
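The listener and operator together form a pull-then-reconcile loop. The following is a minimal sketch under stated assumptions. `FakeControlPlane`, `DataPlane`, and the name-to-revision dicts are hypothetical stand-ins for the CP host-backend, the LangGraphDeployment CRDs, and the per-graph servers. The key property is that the only network call goes from the DP to the CP.

```python
from dataclasses import dataclass, field


@dataclass
class FakeControlPlane:
    """Hypothetical stand-in for the CP host-backend's desired deployment state."""
    deployments: dict[str, str] = field(default_factory=dict)  # graph name -> revision

    def get_deployments(self, listener_id: str) -> dict[str, str]:
        # In a real DP this is an outbound HTTPS call made by the listener.
        # A real CP would scope results to listener_id; this fake ignores it.
        return dict(self.deployments)


@dataclass
class DataPlane:
    crds: dict[str, str] = field(default_factory=dict)     # LangGraphDeployment-like objects
    running: dict[str, str] = field(default_factory=dict)  # per-graph agent servers

    def listener_sync(self, cp: FakeControlPlane, listener_id: str) -> None:
        """Listener: pull desired state from the CP and write it locally as CRDs."""
        self.crds = cp.get_deployments(listener_id)

    def operator_reconcile(self) -> list[str]:
        """Operator: converge running servers toward the CRDs; return actions taken."""
        actions = []
        for name, revision in self.crds.items():
            if self.running.get(name) != revision:
                actions.append(f"apply {name}@{revision}")
                self.running[name] = revision
        for name in list(self.running):
            if name not in self.crds:
                actions.append(f"delete {name}")
                del self.running[name]
        return actions


cp = FakeControlPlane({"support-agent": "rev-1"})
dp = DataPlane()
dp.listener_sync(cp, "listener-a")
print(dp.operator_reconcile())  # ['apply support-agent@rev-1']
```

Splitting "fetch desired state" from "apply desired state" means the reconcile step can rerun safely: calling `operator_reconcile()` twice without a new sync takes no further action.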

Optional add-ons (DP)

  • ml-models / Polly (In-app chat assistant): Context-aware Q&A over traces and datasets. Runs on LangGraph Deployments.
  • insights (Trace pattern analysis): Runs on LangGraph Deployments.
  • llm-auth-proxy (LLM Auth Proxy): Egress component that sits between the DP and external LLM providers. Centralizes API-key management, applies per-workspace allowlists and rate limits, and produces an auditable record of every model call. Useful when the customer wants data plane workloads never to hold raw provider credentials, or when policy requires per-workspace egress controls. Installed via the dedicated langsmith-auth-proxy chart, not bundled with the main chart.
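The proxy's three jobs (holding keys, enforcing per-workspace policy, and auditing) can be illustrated with a small in-process model. This is a hypothetical sketch, not the langsmith-auth-proxy implementation. The class, the 60-second sliding-window limit, the decision strings, and the example model name are all assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

WINDOW_SECONDS = 60.0  # assumed rate-limit window


@dataclass
class LlmAuthProxy:
    """Hypothetical egress proxy: DP workloads call it instead of holding provider keys."""
    provider_keys: dict[str, str]       # provider -> credential held only by the proxy
    allowlist: dict[str, set[str]]      # workspace -> models it may call
    max_calls_per_window: int = 60      # per-workspace sliding-window rate limit
    audit_log: list[dict] = field(default_factory=list)
    _recent: dict[str, list[float]] = field(default_factory=dict, repr=False)

    def handle(self, workspace: str, provider: str, model: str,
               now: Optional[float] = None) -> str:
        """Decide whether to forward one model call, and record the decision."""
        now = time.monotonic() if now is None else now
        # Keep only this workspace's calls that fall inside the current window.
        recent = [t for t in self._recent.get(workspace, []) if now - t < WINDOW_SECONDS]
        if model not in self.allowlist.get(workspace, set()):
            decision = "denied: model not allowlisted"
        elif provider not in self.provider_keys:
            decision = "denied: unknown provider"
        elif len(recent) >= self.max_calls_per_window:
            decision = "denied: rate limited"
        else:
            recent.append(now)
            decision = "allowed"
            # A real proxy would forward the request here, attaching
            # self.provider_keys[provider]; the calling workload never sees the key.
        self._recent[workspace] = recent
        # Every decision, allowed or denied, lands in the audit trail.
        self.audit_log.append({"ts": now, "workspace": workspace, "provider": provider,
                               "model": model, "decision": decision})
        return decision


proxy = LlmAuthProxy(provider_keys={"openai": "sk-held-by-proxy"},
                     allowlist={"ws-a": {"gpt-4o"}}, max_calls_per_window=2)
print(proxy.handle("ws-a", "openai", "gpt-4o", now=0.0))  # allowed
```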

Backing services (DP)

  • Postgres: LangGraph checkpointer (default). Stores in-flight graph state, threads, and resumable runs. One per DP.
  • MongoDB: LangGraph checkpointer (alternative). Added via the LangChain + MongoDB partnership as an alternative to Postgres for teams already running MongoDB at scale. Pick one: Postgres or MongoDB, not both. Package: langgraph-checkpoint-mongodb.
  • Object storage: Per-deployment artifacts (optional; not always required). Distinct from the CP-side blob bucket that holds trace payloads.
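The "pick one" rule for the checkpointer can be enforced before install time. This is a minimal sketch; the function and its parameter names are hypothetical rather than chart keys. The package names in the comments are the LangGraph checkpointer packages (langgraph-checkpoint-postgres and the langgraph-checkpoint-mongodb package named above).

```python
from typing import Optional


def choose_checkpointer(postgres_uri: Optional[str] = None,
                        mongodb_uri: Optional[str] = None) -> str:
    """Pick exactly one checkpointer backend for a DP (hypothetical helper)."""
    if postgres_uri and mongodb_uri:
        # Postgres or MongoDB, not both.
        raise ValueError("configure either Postgres or MongoDB as the checkpointer, not both")
    if mongodb_uri:
        return "mongodb"   # e.g. via the langgraph-checkpoint-mongodb package
    return "postgres"      # the default checkpointer (langgraph-checkpoint-postgres)


print(choose_checkpointer(mongodb_uri="mongodb://mongo.internal:27017"))  # mongodb
```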

How topology affects this

The component lists above are the same across every topology. What topology changes:

  • Where CP services run. SaaS topologies put the entire CP (including ClickHouse and the blob bucket) in LangChain infrastructure. Self-hosted topologies put it all in the customer cluster (or external managed services attached to it). BYOC puts it in a customer cloud account that LangChain operates.
  • Where DP services run. The agent runtime can run in the same cluster as the CP (single-cluster), in separate per-env or per-namespace DPs (multi-DP topologies), or in a different environment entirely (hybrid: SaaS CP + customer-cluster DP; BYOC CP + on-prem DP).
  • Where trace data lands. Follows the CP. In hybrid topologies, agent code stays in the customer VPC but trace payloads flow over HTTPS to the SaaS CP and land in LangChain's ClickHouse + blob bucket. In BYOC CP + on-prem DP, agents run on-prem but traces flow back over the private link to the BYOC cloud account.
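The three bullets above reduce to one rule: trace residency is a property of the CP, not the DP. The following sketch encodes the topologies named in this section as data. The `Topology` type and the location labels are illustrative, not product identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Where each side runs (labels are illustrative)."""
    name: str
    cp_runs_in: str
    dp_runs_in: str

    @property
    def traces_land_in(self) -> str:
        # ClickHouse and the blob bucket are CP-attached, so trace residency
        # follows the CP, not the DP where agent code executes.
        return self.cp_runs_in


TOPOLOGIES = [
    Topology("single-cluster self-hosted", "customer cluster", "customer cluster"),
    Topology("hybrid", "LangChain SaaS", "customer VPC"),
    Topology("BYOC CP + on-prem DP", "BYOC cloud account", "on-prem"),
]

for t in TOPOLOGIES:
    print(f"{t.name}: agents run in {t.dp_runs_in}, traces land in {t.traces_land_in}")
```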

Component lists also do not change with the cloud provider or with air-gapping. Air-gapped deployments add image-mirror infrastructure (a registry such as Harbor or Artifactory, plus an image-sync tool such as Skopeo, crane, or regclient) and an on-prem LLM gateway, but the LangSmith services running inside the cluster are the same set.
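In an air-gapped install, the sync tool copies images into the mirror, and the Helm image references then have to point at that mirror. The following hypothetical helper sketches only the reference-rewriting half. It keeps the repository path and tag or digest and swaps the registry host. It follows the common Docker convention that a first path component containing "." or ":", or equal to "localhost", is a registry host. The image and registry names are placeholders, not real LangSmith images. Registries that expect a "library/" prefix for Docker Hub official images would need one more rule.

```python
def mirror_image(ref: str, mirror_registry: str) -> str:
    """Rewrite an image reference to pull from an internal mirror (hypothetical helper)."""
    first, sep, rest = ref.partition("/")
    # Docker convention: the first component is a registry host only if it looks
    # like one; otherwise the reference implicitly points at Docker Hub.
    if sep and ("." in first or ":" in first or first == "localhost"):
        path = rest
    else:
        path = ref
    return f"{mirror_registry.rstrip('/')}/{path}"


print(mirror_image("docker.io/example-org/backend:1.2.3", "harbor.internal.example/mirror"))
```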

Helm config when CP and DP are not colocated

When the CP and DP run in different clusters or environments (hybrid, cross-cluster, BYOC CP + on-prem DP), you install two charts: langsmith on the CP side, langgraph-dataplane on the DP side.

DP side: langgraph-dataplane chart

Tells the listener and agent servers where the CP lives and how to authenticate to it. Defaults point at LangChain SaaS (api.host.langchain.com / api.smith.langchain.com); override these for a self-hosted or BYOC CP.

  • config.hostBackendUrl: URL of the CP host-backend. The listener calls this to pull deployment config and validate tokens. Default: https://api.host.langchain.com.
  • config.smithBackendUrl: URL of the CP LangSmith backend (the trace-ingest endpoint). Trace payloads are POSTed here. Default: https://api.smith.langchain.com.
  • config.langsmithApiKey: API key the listener uses to authenticate to the CP. Set via existingSecretName in production, not in plaintext.
  • config.langsmithWorkspaceId: workspace this DP belongs to. Required.
  • config.langgraphListenerId: unique ID for this listener. Required when multiple DPs share a CP.
  • config.hostQueue: SAQ queue name (default: host). Set it to a unique value per install when multiple langgraph-dataplane releases share a Redis instance; otherwise the queues collide.
  • config.existingSecretName: name of the K8s Secret holding the values above, typically populated by External Secrets Operator (ESO) from the cloud secret store (SSM / Key Vault / Secret Manager).
  • ingress.* / gateway.* / istioGateway.*: pick one for how agent traffic enters the cluster. The chart supports plain Ingress, Gateway API HTTPRoute, or Istio VirtualService.
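The rules in the list above are easy to get wrong when several DPs report to one CP. The following pre-install check is a hypothetical sketch. Each dict mirrors the chart's config.* keys described above, but the function itself is not part of either chart. The extra "redis" key is an assumed label for which Redis instance a release uses.

```python
SAAS_HOST_BACKEND = "https://api.host.langchain.com"
SAAS_SMITH_BACKEND = "https://api.smith.langchain.com"


def check_dataplane_installs(installs: list[dict], cp_is_saas: bool) -> list[str]:
    """Flag problems in the config.* blocks of langgraph-dataplane releases sharing a CP.
    The "redis" key is hypothetical: it labels which Redis instance a release uses."""
    problems: list[str] = []
    for i, cfg in enumerate(installs):
        if not cfg.get("langsmithWorkspaceId"):
            problems.append(f"install {i}: langsmithWorkspaceId is required")
        if not cp_is_saas:
            # Unset URLs fall back to the SaaS defaults, which is wrong for a
            # self-hosted or BYOC CP.
            for key, default in (("hostBackendUrl", SAAS_HOST_BACKEND),
                                 ("smithBackendUrl", SAAS_SMITH_BACKEND)):
                if cfg.get(key, default) == default:
                    problems.append(f"install {i}: {key} still points at LangChain SaaS")
        if cfg.get("langsmithApiKey"):
            problems.append(f"install {i}: langsmithApiKey is plaintext; use existingSecretName")
    if len(installs) > 1:
        ids = [cfg.get("langgraphListenerId") for cfg in installs]
        if not all(ids) or len(set(ids)) != len(ids):
            problems.append("every DP sharing the CP needs its own langgraphListenerId")
    seen: dict[tuple, int] = {}
    for i, cfg in enumerate(installs):
        if cfg.get("redis") is None:
            continue  # no shared-Redis information for this release
        key = (cfg["redis"], cfg.get("hostQueue", "host"))
        if key in seen:
            problems.append(f"installs {seen[key]} and {i} share Redis and hostQueue '{key[1]}'")
        else:
            seen[key] = i
    return problems


base = {"langsmithWorkspaceId": "ws-1", "hostBackendUrl": "https://host.cp.internal",
        "smithBackendUrl": "https://smith.cp.internal", "redis": "shared"}
good = [{**base, "langgraphListenerId": "dp-prod", "existingSecretName": "dp-prod", "hostQueue": "host-prod"},
        {**base, "langgraphListenerId": "dp-stg", "existingSecretName": "dp-stg", "hostQueue": "host-stg"}]
print(check_dataplane_installs(good, cp_is_saas=False))  # []
```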

CP side: langsmith chart

The CP-side install holds the JWT secret, license, and platform config. The DP-side langsmithApiKey must be issued from this CP and the JWT secret must match.

  • config.langsmithLicenseKey: required for self-hosted/BYOC CP.
  • config.basicAuth.jwtSecret: must match the value the DP-side listener uses (via API key issuance). Mismatch is the most common cause of "auth works locally but DP cannot reach CP."
  • config.authType: typically mixed for self-hosted (basic auth + OIDC); oauth for OAuth-with-PKCE.
  • config.deployment.enabled: set true to install LangGraph Deployments support on the CP side. Required for the CP to issue listener IDs and route Studio traffic to DP agent servers.
  • config.existingSecretName: same ESO pattern as DP-side; points at the K8s Secret holding license, JWT secret, basic-auth password, etc.
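A matching CP-side check can catch the common misconfigurations before installing. This is a hypothetical sketch over the config.* keys listed above. Several points are assumptions. A populated existingSecretName is taken to carry the license and JWT secret, as the ESO pattern describes. jwtSecret is only checked when authType is mixed. ClickHouse placement is passed in as a flag rather than read from a chart key.

```python
def check_cp_values(config: dict, *, self_hosted: bool, has_remote_dps: bool,
                    production: bool, clickhouse_in_cluster: bool) -> list[str]:
    """Flag problems in a langsmith chart config.* block (hypothetical helper)."""
    problems: list[str] = []
    # Assumption: the referenced K8s Secret holds the license key and JWT secret.
    from_secret = bool(config.get("existingSecretName"))
    if self_hosted and not (config.get("langsmithLicenseKey") or from_secret):
        problems.append("langsmithLicenseKey is required for a self-hosted or BYOC CP")
    if has_remote_dps and not config.get("deployment", {}).get("enabled"):
        problems.append("deployment.enabled must be true to issue listener IDs and route Studio traffic")
    # Assumption: the JWT secret only needs checking when basic auth is in use.
    if config.get("authType") == "mixed" and not (
            config.get("basicAuth", {}).get("jwtSecret") or from_secret):
        problems.append("basicAuth.jwtSecret is unset")
    if production and clickhouse_in_cluster:
        problems.append("in-cluster ClickHouse is dev/POC only; use managed ClickHouse in production")
    return problems


good = {"authType": "mixed", "existingSecretName": "langsmith-secrets", "deployment": {"enabled": True}}
print(check_cp_values(good, self_hosted=True, has_remote_dps=True,
                      production=True, clickhouse_in_cluster=False))  # []
```

Because the DP only sees an API key issued by this CP, this check cannot prove that the two sides agree. It only catches a CP that is missing the pieces the DP depends on.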

Wiring it together

For the full secret chain (cloud secret store → ESO → K8s Secret → Helm values) and end-to-end terraform/Helm flow per cloud, see the per-cloud Architecture pages: AWS, Azure, GCP, OCP.
