Deployment Topologies

Cross-provider catalog of LangSmith, Fleet, and LangGraph Deployments topologies, scored against a common rubric.

Generated from data/topologies/*.yaml by scripts/render_topology_doc.py. Do not edit this file by hand. Edit the YAML and regenerate.

How to read this. This catalog is meant to be walked through with your LangChain PS contact. The rubric scores reflect typical enterprise customers; your specifics may differ.

Topologies are listed in order of recommendation preference. Each entry is scored against a common rubric so that entries can be compared on the dimensions that matter when choosing between them.

Polly and Insights (in-app context-aware chat and trace pattern analysis, respectively) are optional add-ons that run on LangGraph Deployments. They aren't modeled per topology; assume they're available wherever LangGraph Deployments is present.

Control plane vs. data plane

LangSmith is split into two logical components, and most topology choices come down to where each one runs and which trust boundary sits between them.

  • Control plane (CP). Owns user-facing state and platform identity. Hosts the UI, handles SSO and API-key auth, and manages workspaces, organizations, permissions, dataset and prompt metadata, evaluation configuration, and billing. Issues and validates the tokens the data plane uses.
  • Data plane (DP). Owns runtime traffic and customer payloads. Ingests and stores traces, runs evaluations and online checks, and hosts agent runtimes (LangGraph Deployments). Trace payloads only ever live in the data plane.

Because trace payloads only live in the data plane, the data-plane location determines data residency. The control-plane location mostly determines who operates the UI and where auth state lives. Topologies differ in whether CP and DP run in the same environment, who operates each one, and how they're connected.

Where things run

The table below uses these short labels for environments. Each topology's detail page calls out the specific cloud and region.

  • LangChain SaaS. Operated by LangChain at smith.langchain.com. The customer authenticates and uses it; LangChain runs the infrastructure.
  • Customer Env. Customer-owned infrastructure, operated by the customer. Can be a cloud account they control (AWS, Azure, GCP, OCI), an on-prem datacenter, or a hybrid of the two. The cloud field on each topology says which.
  • BYOC. Bring-your-own-cloud. The cloud account belongs to the customer; LangChain operates the LangSmith stack inside it.
  • Air-Gapped. Customer environment with no egress to LangChain. Images, charts, and licenses move in via internal mirrors.
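
The relationship between these labels and data residency can be sketched in a few lines of Python. This is an illustrative model only, not part of LangSmith or of scripts/render_topology_doc.py: the `Location` enum, the `Placement` dataclass, and both helper functions are hypothetical names. The sketch encodes only what the sections above state: trace payloads live in the data plane, so they leave the customer's environment only when the data plane runs in LangChain SaaS.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    """Environment labels used in this catalog (the enum itself is hypothetical)."""
    LANGCHAIN_SAAS = "LangChain SaaS"
    CUSTOMER_ENV = "Customer Env"
    BYOC = "BYOC"
    AIR_GAPPED = "Air-Gapped"


@dataclass(frozen=True)
class Placement:
    """Where the control plane and the data plane of one topology run."""
    control_plane: Location
    data_plane: Location


def traces_leave_customer_env(placement: Placement) -> bool:
    # Trace payloads only live in the data plane, so only its location matters.
    return placement.data_plane is Location.LANGCHAIN_SAAS


def langchain_operates(location: Location) -> bool:
    # LangChain runs the infrastructure for SaaS and operates the stack in BYOC.
    return location in (Location.LANGCHAIN_SAAS, Location.BYOC)


hybrid = Placement(Location.LANGCHAIN_SAAS, Location.CUSTOMER_ENV)
print(traces_leave_customer_env(hybrid))         # False
print(langchain_operates(hybrid.control_plane))  # True
```

The hybrid example shows the split the text describes: LangChain operates the control plane, while the payloads stay with the customer's data plane.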

Topologies at a glance

Topology | Control plane | Data planes | Best for | Status
SaaS Cloud | 1 · LangChain SaaS | 1 · LangChain SaaS | Fastest time to first trace; lowest operational commitment. | Recommended
BYOC, Full Stack | 1 · BYOC | 1 · BYOC | Dedicated stack in your cloud; LangChain operates. | In Development
BYOC Control Plane, On-Prem Data Plane | 1 · BYOC | 1 · Customer Env | On-prem agents and traces with a LangChain-operated control plane. | In Development
Hybrid: SaaS Control Plane, Customer Data Plane | 1 · LangChain SaaS | 1 · Customer Env | Trace payloads stay in your cloud; LangChain owns the UI. | Recommended
Self-Hosted, Single Cluster | 1 · Customer Env | 1 · Customer Env | Full platform in one cluster; you own every layer. | Supported
Self-Hosted, Shared CP with Per-Env Data Planes | 1 · Customer Env | N · Customer Env (per env) | Clean dev / stage / prod boundaries with one shared UI. | Supported
Self-Hosted, Full Stack per Environment | N · Customer Env (per env) | N · Customer Env (per env) | Hard env isolation, separate UIs per environment. | Supported
Self-Hosted, Data Plane per Namespace | 1 · Customer Env | N · Customer Env (per namespace) | Namespace-level blast radius for teams, agents, or use cases. | Supported
Self-Hosted, Cross-Cluster (Remote Data Planes) | 1 · Customer Env | N · Customer Env (many per cluster) | Remote data planes in separate clusters or cloud accounts. | Situational
Air-Gapped | 1 · Air-Gapped | 1 · Air-Gapped | Fully isolated deployment, no egress to LangChain. | Supported
Fleet, Headless (No LangSmith) | 0 | 1 · Customer Env | Fleet runtime standalone; you build the UI. | In Development
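
As a rough aid for narrowing the table, the sketch below filters the catalog on hard residency constraints. The `Row` type and the `shortlist` function are hypothetical helpers written for this illustration; the status and the two "leaves customer env" values are copied from the detail sections of this catalog, in document order.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    topology_id: str
    status: str
    traces_leave_env: bool
    llm_traffic_leaves_env: bool


# Values copied from the detail sections of this catalog, in document order.
CATALOG = [
    Row("saas-cloud", "Recommended", True, True),
    Row("byoc-full", "In Development", False, True),
    Row("byoc-cp-onprem-dp", "In Development", False, False),
    Row("hybrid-single-dp", "Recommended", False, True),
    Row("self-hosted-single-cluster", "Supported", False, True),
    Row("self-hosted-shared-cp-per-env", "Supported", False, True),
    Row("self-hosted-stack-per-env", "Supported", False, True),
    Row("self-hosted-multi-dp-per-namespace", "Supported", False, True),
    Row("self-hosted-cross-cluster", "Situational", False, True),
    Row("airgapped", "Supported", False, False),
    Row("fleet-headless-no-langsmith", "In Development", False, True),
]


def shortlist(
    rows: List[Row],
    *,
    keep_traces_in_env: bool = False,
    keep_llm_traffic_in_env: bool = False,
    include_in_development: bool = False,
) -> List[str]:
    """Return topology ids that satisfy the constraints, preserving catalog order."""
    selected = []
    for row in rows:
        if keep_traces_in_env and row.traces_leave_env:
            continue
        if keep_llm_traffic_in_env and row.llm_traffic_leaves_env:
            continue
        if not include_in_development and row.status == "In Development":
            continue
        selected.append(row.topology_id)
    return selected


print(shortlist(CATALOG, keep_llm_traffic_in_env=True))
# ['airgapped']
print(shortlist(CATALOG, keep_llm_traffic_in_env=True, include_in_development=True))
# ['byoc-cp-onprem-dp', 'airgapped']
```

A filter like this only removes options that cannot work; choosing among the remaining topologies still depends on the rubric scores and on the conversation with your LangChain PS contact.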

Detail

SaaS Cloud

saas-cloud | status: Recommended | version 1 | verified 2026-04-26

LangChain operates the full stack. Customers connect via the public smith.langchain.com endpoint with SSO and tenant-scoped workspaces. Fastest path to first trace, lowest operational burden, and first to receive new platform features, at the cost of sending trace payloads to LangChain infrastructure.

Products

  • LangSmith (control plane, data plane)
  • Fleet (Bundled)
  • LangGraph Deployments

Control plane

  • Location: LangChain SaaS | region: us-east-1

Data planes

  • Multiplicity: Single
Name | Location | Cloud | Region | Cluster | Namespace | Isolation
langchain-saas-dp | LangChain SaaS | LangChain SaaS | us-east-1 | n/a | n/a | None

Isolation and residency

  • Strongest boundary: None
  • Network boundary: None
  • Data residency scope: None
  • Traces leave customer env: yes
  • LLM traffic leaves customer env: yes

Flows and delivery

  • CP to DP: Not Applicable
  • DP to LLM: Direct Egress
  • User to CP: SSO over Public Internet
  • Ingress: Not Applicable
  • IaC: Not Applicable
  • Upgrade cadence: Continuous

Assessment

Category | Score | Notes
Operational Burden | low | Customer operates nothing. Workspace admin configures SSO/SCIM, API keys, workspace membership, and retention policies.
Cost Delta | low | Standard LangSmith tier. No infra cost for the customer. Egress and LLM costs are whatever the customer's agents generate.
Compliance Fit | low | Traces and metadata are stored in LangChain infrastructure. Does not satisfy data-residency or air-gap mandates. SOC 2 and workspace isolation cover many enterprise controls but not sovereign-data requirements.
Complexity | low | Single public endpoint, workspace-based logical tenancy, no customer-owned networking.
Failure Blast Radius | high | Shared-tenant: an incident affects all customers. LangChain operates SaaS as its highest-priority service: dedicated reliability engineering, incident response, and a public status page.
Skill Requirements | low | Workspace admin skills only. No Kubernetes, Helm, or cloud networking.
Time to First Trace | low | Under an hour once a workspace is provisioned and an API key is in hand.
Scale Ceiling | high | LangChain scales the platform. Workspace-level rate limits apply, but the ceiling is well above most customer workloads.
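
Because the rubric mixes categories where a high score is favorable with categories where it is a cost, side-by-side comparison is easier after normalizing direction. The sketch below is one possible way to do that. It assumes, as a reading of the notes rather than a stated rule, that high is favorable for Compliance Fit and Scale Ceiling and unfavorable for every other category. The scores are copied from the SaaS Cloud and Hybrid Assessment tables in this catalog; `favorability` and `compare` are hypothetical helper names.

```python
from typing import Dict

LEVEL = {"low": 0, "medium": 1, "high": 2}

# Assumption: for these two categories a higher score is better; for every
# other category a lower score is better.
HIGHER_IS_BETTER = {"Compliance Fit", "Scale Ceiling"}

SAAS_CLOUD = {
    "Operational Burden": "low",
    "Cost Delta": "low",
    "Compliance Fit": "low",
    "Complexity": "low",
    "Failure Blast Radius": "high",
    "Skill Requirements": "low",
    "Time to First Trace": "low",
    "Scale Ceiling": "high",
}

HYBRID_SINGLE_DP = {
    "Operational Burden": "medium",
    "Cost Delta": "medium",
    "Compliance Fit": "medium",
    "Complexity": "medium",
    "Failure Blast Radius": "medium",
    "Skill Requirements": "medium",
    "Time to First Trace": "low",
    "Scale Ceiling": "medium",
}


def favorability(category: str, score: str) -> int:
    """Map a rubric score to 0..2, where 2 is always the favorable end."""
    level = LEVEL[score]
    return level if category in HIGHER_IS_BETTER else 2 - level


def compare(name_a: str, scores_a: Dict[str, str],
            name_b: str, scores_b: Dict[str, str]) -> Dict[str, str]:
    """For each category, name the topology with the more favorable score."""
    result = {}
    for category, score_a in scores_a.items():
        fav_a = favorability(category, score_a)
        fav_b = favorability(category, scores_b[category])
        if fav_a > fav_b:
            result[category] = name_a
        elif fav_b > fav_a:
            result[category] = name_b
        else:
            result[category] = "tie"
    return result


verdict = compare("saas-cloud", SAAS_CLOUD, "hybrid-single-dp", HYBRID_SINGLE_DP)
for category, winner in verdict.items():
    print(f"{category}: {winner}")
```

Under these assumptions the hybrid topology comes out ahead only on Compliance Fit and Failure Blast Radius, which matches the trade-off described in the two summaries. The sketch deliberately does not sum the scores, since the categories are not equally weighted for any given customer.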

Related

byoc-full, hybrid-single-dp


BYOC, Full Stack

byoc-full | status: In Development | version 1 | verified 2026-04-26

LangChain operates the LangSmith stack inside a cloud account the customer owns. Control plane, data plane, and backing services all run in the customer's account. Trace payloads stay there; LangChain SREs operate via a scoped cross-account IAM role plus a Tailscale agent for break-glass access. In active development; not yet generally available.

Products

  • LangSmith (control plane, data plane)
  • LangGraph Deployments

Control plane

  • Location: BYOC | cloud: AWS | region: us-east-1 | cluster: byoc-cp-eks

Data planes

  • Multiplicity: Single
Name | Location | Cloud | Region | Cluster | Namespace | Isolation
byoc-dp | BYOC | AWS | us-east-1 | byoc-cp-eks | langsmith | Cloud Account / Subscription

Isolation and residency

  • Strongest boundary: Cloud Account / Subscription
  • Network boundary: Cloud Account / Subscription
  • Data residency scope: Per Account
  • Traces leave customer env: no
  • LLM traffic leaves customer env: yes

Flows and delivery

  • CP to DP: Private Link / Service Endpoint
  • DP to LLM: Direct Egress
  • User to CP: SSO over Public Internet
  • Ingress: Kubernetes Ingress (controller of your choice)
  • IaC: Terraform + Helm
  • Upgrade cadence: Monthly

Assessment

Category | Score | Notes
Operational Burden | low | LangChain operates the stack: upgrades, migrations, monitoring, and incident response. The customer owns the account and network boundary but does not run day-to-day operations.
Cost Delta | medium | Dedicated cloud infrastructure in the customer account plus a BYOC management fee. No multi-tenant amortization, so cost sits between SaaS and self-hosted.
Compliance Fit | high | Trace payloads never leave the customer account. LangChain access is via a scoped IAM role and Tailscale break-glass; CloudTrail in the customer account is the system of record for all operator activity. Fits most regulated-industry isolation requirements short of air-gap.
Complexity | medium | Split ownership: customer owns IAM, account, and network boundary; LangChain owns the application stack. Clear separation of concerns works well when both sides agree on the interface up front.
Failure Blast Radius | low | Single-tenant. A BYOC incident affects only that customer. No multi-tenant common-mode failure.
Skill Requirements | low | Customer team needs cloud-account and IAM skills only. No Kubernetes or Helm required on the customer side.
Time to First Trace | medium | 2 to 6 weeks. Account provisioning, access-path setup, and security review drive the timeline more than software install.
Scale Ceiling | high | Vertical and horizontal scale inside the account is standard. Adding a second region is a follow-on motion rather than a replatform.

Related

saas-cloud, byoc-cp-onprem-dp, hybrid-single-dp, self-hosted-single-cluster


BYOC Control Plane, On-Prem Data Plane

byoc-cp-onprem-dp | status: In Development | version 1 | verified 2026-04-23

LangChain operates the control plane inside a cloud account the customer owns. The data plane runs on the customer's on-prem Kubernetes. Agents and trace payloads stay on-prem; control-plane management (UI, metadata, auth) lives in the customer's cloud account. In active development alongside BYOC full stack; not yet generally available.

Products

  • LangSmith (control plane, data plane)
  • LangGraph Deployments

Control plane

  • Location: BYOC | cloud: AWS | region: us-east-1 | cluster: byoc-cp-eks

Data planes

  • Multiplicity: Single
Name | Location | Cloud | Region | Cluster | Namespace | Isolation
onprem-dp | Customer Env | On-Prem | n/a | onprem-k8s | langsmith | Air-Gap

Isolation and residency

  • Strongest boundary: Air-Gap
  • Network boundary: Cloud Account / Subscription
  • Data residency scope: Per Account
  • Traces leave customer env: no
  • LLM traffic leaves customer env: no

Flows and delivery

  • CP to DP: VPN
  • DP to LLM: On-Prem LLM Gateway
  • User to CP: SSO over Public Internet
  • Ingress: Kubernetes Ingress (controller of your choice)
  • IaC: Terraform + Helm
  • Upgrade cadence: Monthly
  • Image sync: Skopeo, crane, or regclient
  • Private registry: Harbor, Artifactory, Quay, or equivalent

Assessment

Category | Score | Notes
Operational Burden | high | Split ownership: LangChain operates the cloud-side control plane, the customer operates the on-prem data plane. Coordinated upgrades and cross-environment debugging add overhead beyond either pure BYOC or pure self-hosted.
Cost Delta | high | Dedicated cloud infrastructure for the control plane plus on-prem compute and storage for the data plane. Cross-environment connectivity (VPN, Direct Connect, or ExpressRoute) adds recurring cost.
Compliance Fit | high | On-prem data plane satisfies the strongest data-sovereignty and air-gap-adjacent requirements while keeping the UI and metadata in a managed cloud account.
Complexity | high | Two environments to reason about: cloud-side control plane and on-prem data plane. Private connectivity, on-prem DNS, and on-prem LLM gateway integration all require customer-network expertise.
Failure Blast Radius | high | Single data plane; any on-prem outage stops agent traffic. Control plane outage affects auth validation and the UI but not agent execution already in flight.
Skill Requirements | high | On-prem Kubernetes expertise plus cloud-to-on-prem networking. The customer team must be comfortable operating Kubernetes outside a managed cloud provider.
Time to First Trace | high | 8+ weeks typical. On-prem provisioning, private connectivity, and security reviews dominate the timeline; neither the control plane nor the data plane alone is quick.
Scale Ceiling | medium | Limited by on-prem capacity. Horizontal scale is possible but involves coordinating cloud control-plane changes with on-prem data-plane expansion.

Related

byoc-full, hybrid-single-dp, airgapped


Hybrid: SaaS Control Plane, Customer Data Plane

hybrid-single-dp | status: Recommended | version 1 | verified 2026-04-23

LangChain operates the control plane (SaaS); the customer operates a single self-hosted data plane inside their own cloud account. Traces and agent code execute in the customer's environment; only metadata crosses the boundary.

Products

  • LangSmith (control plane, data plane)
  • LangGraph Deployments

Control plane

  • Location: LangChain SaaS | region: us-east-1

Data planes

  • Multiplicity: Single
Name | Location | Cloud | Region | Cluster | Namespace | Isolation
production-dp | Customer Env | AWS | us-east-1 | customer-eks-auto | langsmith | VPC / VNet

Isolation and residency

  • Strongest boundary: VPC / VNet
  • Network boundary: VPC / VNet
  • Data residency scope: Per Region
  • Traces leave customer env: no
  • LLM traffic leaves customer env: yes

Flows and delivery

  • CP to DP: Public TLS
  • DP to LLM: Direct Egress
  • User to CP: SSO over Public Internet
  • Ingress: Kubernetes Ingress (controller of your choice)
  • IaC: Terraform + Helm
  • Upgrade cadence: Monthly

Assessment

Category | Score | Notes
Operational Burden | medium | Customer operates the data plane Helm release, Postgres, and networking; control plane operations are LangChain's. Upgrades are monthly Helm bumps.
Cost Delta | medium | EKS node group, RDS Postgres, OpenSearch or equivalent, plus NAT gateway egress. License cost is standard LangSmith tier.
Compliance Fit | medium | Trace payloads stay in the customer VPC; metadata (org, workspace, trace IDs) flows to the SaaS control plane. Fits most enterprise policies; not sufficient for air-gapped or data-sovereignty mandates.
Complexity | medium | Three networking paths to reason about (browser-to-SaaS, listener-to-CP, listener-to-agent). Health checks originate inside the data plane and route through NAT, which surprises teams expecting ingress from LangChain.
Failure Blast Radius | medium | Data plane outage stops traffic for the whole deployment; control plane outage affects only observability and deploys, not request serving.
Skill Requirements | medium | Production Kubernetes fluency, Helm, and cloud networking (VPC, NAT, ingress). Single-cluster scope, no GitOps required.
Time to First Trace | low | Greenfield install is typically 1 to 2 weeks once networking is approved.
Scale Ceiling | medium | Single data plane scales vertically and horizontally within one cluster. Past tens of millions of traces per day, expect to split per env or region.
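
To make the "tens of millions of traces per day" scale note easier to compare against your own metrics, the arithmetic below converts a daily volume into an average per-second rate. The two daily volumes are illustrative inputs, not thresholds stated by this catalog, and real traffic peaks above its daily average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_traces_per_second(traces_per_day: int) -> float:
    """Average ingest rate implied by a daily trace volume."""
    return traces_per_day / SECONDS_PER_DAY


# Illustrative volumes only.
for daily in (10_000_000, 50_000_000):
    rate = average_traces_per_second(daily)
    print(f"{daily:,} traces/day is about {rate:.0f} traces/s on average")
# 10,000,000 traces/day is about 116 traces/s on average
# 50,000,000 traces/day is about 579 traces/s on average
```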

Related

saas-cloud, byoc-full, byoc-cp-onprem-dp, self-hosted-single-cluster


Self-Hosted, Single Cluster

self-hosted-single-cluster | status: Supported | version 1 | verified 2026-04-23

Control plane and data plane both run inside a single customer-operated Kubernetes cluster. One workspace, one upgrade path, one set of backing services. The default starting point for self-hosted customers who do not need per-team or per-environment isolation.

Products

  • LangSmith (control plane, data plane)
  • LangGraph Deployments

Control plane

  • Location: Customer Env | cloud: AWS | region: us-east-1 | cluster: customer-eks

Data planes

  • Multiplicity: Single
Name | Location | Cloud | Region | Cluster | Namespace | Isolation
production-dp | Customer Env | AWS | us-east-1 | customer-eks | langsmith | Kubernetes Cluster

Isolation and residency

  • Strongest boundary: Kubernetes Cluster
  • Network boundary: VPC / VNet
  • Data residency scope: Per Account
  • Traces leave customer env: no
  • LLM traffic leaves customer env: yes

Flows and delivery

  • CP to DP: Private Link / Service Endpoint
  • DP to LLM: Direct Egress
  • User to CP: SSO over Public Internet
  • Ingress: Kubernetes Ingress (controller of your choice)
  • IaC: Terraform + Helm
  • Upgrade cadence: Monthly

Assessment

Category | Score | Notes
Operational Burden | medium | One Helm release to maintain, one Postgres, one ClickHouse, one Redis. Upgrades are a single Helm bump plus migrations. ClickHouse disk growth is the most common operational surprise.
Cost Delta | medium | EKS (or equivalent) node group, managed Postgres, managed Redis, and ClickHouse storage. License is the standard self-hosted tier.
Compliance Fit | medium | All trace payloads stay in the customer account. Fits most enterprise policies; not sufficient for air-gap or per-team data-sovereignty mandates.
Complexity | low | Everything lives in one cluster. Debugging and log aggregation are local. No cross-cluster networking.
Failure Blast Radius | high | A cluster outage takes down both control plane and data plane. All workspaces affected simultaneously.
Skill Requirements | medium | Production Kubernetes and Helm. Familiarity with the backing-service set (Postgres, Redis, ClickHouse). No multi-cluster or GitOps required.
Time to First Trace | medium | 2 to 4 weeks in a typical enterprise, driven by network and IAM approvals more than install time.
Scale Ceiling | medium | Scales vertically and horizontally inside one cluster. Teams running tens of millions of traces per day typically split into per-env or per-team topologies.

Related

hybrid-single-dp, self-hosted-shared-cp-per-env, self-hosted-stack-per-env, self-hosted-multi-dp-per-namespace


Self-Hosted, Shared CP with Per-Env Data Planes

self-hosted-shared-cp-per-env | status: Supported | version 1 | verified 2026-04-26

One shared control plane fans out to per-environment data planes (dev, staging, prod). All trace data, prompts, and evaluations roll up into a single UI while runtime workloads stay isolated in their own clusters. The shape most teams reach for when they want clean env boundaries without separate UIs to log into.

Products

  • LangSmith (control plane, data plane)
  • LangGraph Deployments

Control plane

  • Location: Customer Env | cloud: Azure | region: eastus | cluster: shared-cp

Data planes

  • Multiplicity: Per Environment
Name | Location | Cloud | Region | Cluster | Namespace | Isolation
dev-dp | Customer Env | Azure | eastus | dev-aks | langsmith | Kubernetes Cluster
prod-dp | Customer Env | Azure | eastus | prod-aks | langsmith | Kubernetes Cluster

Isolation and residency

  • Strongest boundary: Kubernetes Cluster
  • Network boundary: VPC / VNet
  • Data residency scope: Per Region
  • Traces leave customer env: no
  • LLM traffic leaves customer env: yes

Flows and delivery

  • CP to DP: Private Link / Service Endpoint
  • DP to LLM: LangSmith LLM Auth Proxy
  • User to CP: SSO over Public Internet
  • Ingress: Kubernetes Ingress (controller of your choice)
  • IaC: Terraform + Helm
  • Upgrade cadence: Monthly

Assessment

Category | Score | Notes
Operational Burden | medium | One CP install plus a DP install per env. CP backing services run once; each DP brings its own Postgres, Redis, and ClickHouse. Upgrades roll DP-by-DP after the CP.
Cost Delta | medium | One set of CP infra plus N sets of DP infra (one per env). Cheaper than a full stack-per-env split, at the cost of a shared CP change-management window.
Compliance Fit | medium | Per-env DP isolation covers runtime separation and per-env data residency. Shared CP means org-level config and audit affect all envs together; not a fit when regulators require fully independent stacks per env.
Complexity | high | Networking from the shared CP to each DP cluster, per-env ingress and DNS, and CP-to-DP auth across clusters. Cross-cluster identity is the main source of subtle breakage.
Failure Blast Radius | medium | A DP outage takes down one env; the shared CP and other envs stay up. A CP outage degrades the UI and platform operations across all envs, but trace ingestion continues.
Skill Requirements | high | Multi-cluster Kubernetes, cross-cluster networking, and identity management for CP-to-DP auth.
Time to First Trace | medium | 4 to 6 weeks. After the CP is stable, adding each new env DP is incremental.
Scale Ceiling | high | Scales horizontally by adding DPs. CP is the shared resource; watch its backing services as DP count grows.

Related

self-hosted-single-cluster, self-hosted-stack-per-env, self-hosted-multi-dp-per-namespace, self-hosted-cross-cluster


Self-Hosted, Full Stack per Environment

self-hosted-stack-per-env | status: Supported | version 2 | verified 2026-04-26

Each environment (dev, staging, prod) runs a complete LangSmith stack: its own control plane, data plane, and backing services in its own cluster. No shared UI; teams log into each env separately. The hardest isolation available short of air-gap, at the cost of duplicated backing services and N independent change-management windows.

Products

  • LangSmith (control plane, data plane)
  • LangGraph Deployments

Control plane

  • Location: Customer Env | cloud: Azure | region: eastus | instances: 2 (one per environment)

Data planes

  • Multiplicity: Per Environment
Name | Location | Cloud | Region | Cluster | Namespace | Isolation
dev-dp | Customer Env | Azure | eastus | dev-aks | langsmith | Kubernetes Cluster
prod-dp | Customer Env | Azure | eastus | prod-aks | langsmith | Kubernetes Cluster

Isolation and residency

  • Strongest boundary: Kubernetes Cluster
  • Network boundary: VPC / VNet
  • Data residency scope: Per Region
  • Traces leave customer env: no
  • LLM traffic leaves customer env: yes

Flows and delivery

  • CP to DP: Private Link / Service Endpoint
  • DP to LLM: LangSmith LLM Auth Proxy
  • User to CP: SSO over Public Internet
  • Ingress: Kubernetes Ingress (controller of your choice)
  • IaC: Terraform + Helm
  • Upgrade cadence: Monthly

Assessment

Category | Score | Notes
Operational Burden | high | Each env has its own complete install: CP, DP, Postgres, Redis, ClickHouse. Upgrades run dev first, then prod. N times the backing services to monitor.
Cost Delta | high | N complete LangSmith stacks (CP + DP + backing services). If the LLM auth proxy lives in a different cloud or region from the data planes, expect cross-cloud egress costs and added latency.
Compliance Fit | high | Strongest enterprise change-management separation. Independent CP, DP, config, and SSO scopes per env. Per-env data residency is possible. Fits regulators that require fully independent stacks.
Complexity | high | N complete LangSmith stacks to operate. Per-env ingress, DNS, SSO, and identity. A cross-cloud LLM auth proxy adds proxy latency and firewall coordination on top of that.
Failure Blast Radius | medium | An outage stays in one env. CP, DP, and backing-service failures are all isolated to the env that owns them; other envs keep running.
Skill Requirements | high | Multi-cluster Kubernetes and cross-cloud networking if LLM providers or gateways live in a different cloud. Multiple independent LangSmith installs to operate.
Time to First Trace | medium | 4 to 8 weeks. Second environment is faster once the first install is stable and the Terraform modules are validated.
Scale Ceiling | high | Scales independently per env. Further horizontal splits are straightforward once the pattern is proven.

Related

self-hosted-shared-cp-per-env, self-hosted-single-cluster, self-hosted-multi-dp-per-namespace, self-hosted-cross-cluster


Self-Hosted, Data Plane per Namespace

self-hosted-multi-dp-per-namespace | status: Supported | version 1 | verified 2026-04-26

One shared control plane, one data plane per Kubernetes namespace within a shared cluster. Agent pods in each namespace cannot access control-plane secrets. Customers use namespaces to separate teams, agents, or use cases without paying the cost of a cluster per tenant.

Products

  • LangSmith (control plane, data plane)
  • LangGraph Deployments

Control plane

  • Location: Customer Env | cloud: AWS | region: us-east-1 | cluster: shared-platform-eks

Data planes

  • Multiplicity: Per Namespace
Name | Location | Cloud | Region | Cluster | Namespace | Isolation
tenant-a | Customer Env | AWS | us-east-1 | shared-platform-eks | ls-dp-tenant-a | Kubernetes Namespace
tenant-b | Customer Env | AWS | us-east-1 | shared-platform-eks | ls-dp-tenant-b | Kubernetes Namespace

Isolation and residency

  • Strongest boundary: Kubernetes Namespace
  • Network boundary: Kubernetes Namespace
  • Data residency scope: Per Account
  • Traces leave customer env: no
  • LLM traffic leaves customer env: yes

Flows and delivery

  • CP to DP: mTLS over Private Link
  • DP to LLM: Egress Gateway
  • User to CP: SSO over Public Internet
  • Ingress: Kubernetes Ingress (controller of your choice)
  • IaC: Terraform + Helm
  • Upgrade cadence: Monthly

Assessment

Category | Score | Notes
Operational Burden | high | One upgrade path per data plane, locked to the same chart version as the control plane (cluster-scoped CRD). Namespace-scoped RBAC, quotas, network policies, and the LangSmith operator's WATCH_NAMESPACE all need explicit platform-team ownership.
Cost Delta | medium | Shared cluster amortizes node cost. Per-namespace data planes add pod-level overhead and egress gateway capacity.
Compliance Fit | high | Namespace isolation plus per-namespace data locality and separate secrets satisfy most financial-services and regulated-industry tenant-isolation requirements.
Complexity | high | Three valid routing patterns (ALB-per-namespace, Envoy Gateway, or Istio), per-namespace IRSA and service-account setup, host-backend RBAC into each DP namespace, and a cluster-scoped CRD that requires version-locked chart upgrades across all releases.
Failure Blast Radius | low | A single data plane failure is contained to that namespace. Shared cluster control-plane incidents are the one common-mode risk.
Skill Requirements | high | Platform team fluent in Kubernetes multi-tenancy, ALB/Envoy/Istio ingress, IAM (IRSA on AWS, workload identity on GCP/Azure), GitOps, and per-namespace secret distribution.
Time to First Trace | medium | 4 to 8 weeks for platform build-out. Each subsequent namespace onboards in days once the platform is stable.
Scale Ceiling | high | Scales to dozens of namespaces and tens of millions of traces per day. Past that, expect to split clusters.

Related

self-hosted-single-cluster, self-hosted-cross-cluster


Self-Hosted, Cross-Cluster (Remote Data Planes)

self-hosted-cross-cluster | status: Situational | version 1 | verified 2026-04-26

Control plane in one cluster; data planes in separate clusters, often separate accounts. Agent servers call back to the CP for auth. When the CP and DPs live on different domains, cross-origin requests break parts of the UI (Assistants, Threads, Crons, Studio, HITL); a same-origin reverse proxy or colocation avoids the problem.
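
Whether a given pair of endpoints counts as cross-origin can be checked before deployment. The sketch below applies the standard browser definition of an origin (scheme, host, and port) using only the Python standard library. The URLs are hypothetical placeholders, not real LangSmith endpoints, and the path-based proxy layout is just one way a same-origin reverse proxy might be arranged.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple that browsers treat as the origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # An omitted port means the scheme's default port.
    port = parts.port if parts.port is not None else _DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def same_origin(url_a: str, url_b: str) -> bool:
    return origin(url_a) == origin(url_b)


# Hypothetical endpoints for illustration.
cp_ui = "https://langsmith.example.com"
dp_direct = "https://agents-a.example.net/threads"
dp_proxied = "https://langsmith.example.com/dp/account-a/threads"

print(same_origin(cp_ui, dp_direct))   # False: different host, so cross-origin
print(same_origin(cp_ui, dp_proxied))  # True: served through the CP's own origin
```

Note that a different subdomain or a non-default port is already a different origin, so sharing a parent domain between the CP and the DPs is not enough on its own.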

Products

  • LangSmith (control plane, data plane)
  • LangGraph Deployments

Control plane

  • Location: Customer Env | cloud: AWS | region: us-east-1 | cluster: cp-account-eks

Data planes

  • Multiplicity: Many per Cluster
Name | Location | Cloud | Region | Cluster | Namespace | Isolation
dp-account-a | Customer Env | AWS | us-east-1 | dp-account-a-eks | langgraph | Cloud Account / Subscription
dp-account-b | Customer Env | AWS | us-east-1 | dp-account-b-eks | langgraph | Cloud Account / Subscription

Isolation and residency

  • Strongest boundary: Cloud Account / Subscription
  • Network boundary: Cloud Account / Subscription
  • Data residency scope: Per Account
  • Traces leave customer env: no
  • LLM traffic leaves customer env: yes

Flows and delivery

  • CP to DP: mTLS over Private Link
  • DP to LLM: Direct Egress
  • User to CP: SSO over Public Internet
  • Ingress: Istio
  • IaC: Terraform + Helm
  • Upgrade cadence: Monthly

Assessment

Category | Score | Notes
Operational Burden | high | Each data plane is a separate Helm release, often in a separate cloud account and on its own domain. Upgrades fan out across accounts. The current cross-origin workarounds add a component (reverse proxy or sidecar) to operate.
Cost Delta | high | Per-account infrastructure duplication plus cross-account networking (PrivateLink or Transit Gateway). LLM egress billed per account.
Compliance Fit | high | Account-level isolation satisfies the strongest tenant-isolation requirements short of air-gap. Blast-radius and billing attribution are both clean.
Complexity | high | Cross-origin cookies, ingress gateway routes, per-cluster domains, and cross-account IAM. Routing workarounds require either a same-origin reverse proxy or an auth sidecar that injects API keys at the data plane ingress.
Failure Blast Radius | low | A single data plane outage affects only that workload. Control-plane failure affects auth validation for all data planes.
Skill Requirements | high | Multi-account cloud operations, service-mesh or ingress-gateway fluency, cross-account networking, plus familiarity with the current cross-origin routing workarounds.
Time to First Trace | high | 8+ weeks typical. Cross-account networking approvals dominate the timeline. First data plane slower than subsequent ones.
Scale Ceiling | high | Scales horizontally by adding data-plane accounts. Control plane is the eventual bottleneck for auth validation throughput.

Related

self-hosted-shared-cp-per-env, self-hosted-stack-per-env, self-hosted-multi-dp-per-namespace


Air-Gapped

airgapped | status: Supported | version 1 | verified 2026-04-26

Control plane and data plane run in a fully air-gapped customer cluster with no outbound internet. Images mirror to an internal registry; Helm charts deploy from internal sources. The right model for regulated industries that cannot egress to LangChain. Because no telemetry beacon can reach LangChain, license metering is contractual: the customer reports usage on the agreed cadence.
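
The image-mirroring step can be scripted. The sketch below only builds `skopeo copy` command lines and does not execute them; `skopeo copy` with `docker://` references and the `--all` flag (copy every architecture in a multi-arch image) are standard Skopeo usage. The image name and both registry hostnames are placeholders, and the real image list comes from the release being mirrored. The function `skopeo_copy_commands` is a hypothetical helper. It assumes a staging host that can reach both registries; a fully disconnected transfer would need an intermediate archive step instead.

```python
from typing import List


def skopeo_copy_commands(
    images: List[str], source_registry: str, mirror_registry: str
) -> List[List[str]]:
    """Build one `skopeo copy` argv per image, keeping repository path and tag."""
    commands = []
    for image in images:
        commands.append([
            "skopeo", "copy", "--all",
            f"docker://{source_registry}/{image}",
            f"docker://{mirror_registry}/{image}",
        ])
    return commands


# Placeholder values for illustration only.
for argv in skopeo_copy_commands(
    images=["example-org/example-backend:1.2.3"],
    source_registry="registry.example.com",
    mirror_registry="harbor.internal.example",
):
    print(" ".join(argv))
```

Returning argv lists rather than shell strings keeps the commands safe to pass to `subprocess.run` without shell quoting concerns.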

Products

  • LangSmith (control plane, data plane)
  • LangGraph Deployments

Control plane

  • Location: Air-Gapped | cloud: On-Prem | cluster: airgap-cp

Data planes

  • Multiplicity: Single
Name | Location | Cloud | Region | Cluster | Namespace | Isolation
airgap-dp | Air-Gapped | On-Prem | n/a | airgap-cp | langsmith | Air-Gap

Isolation and residency

  • Strongest boundary: Air-Gap
  • Network boundary: Air-Gap
  • Data residency scope: Air-Gapped
  • Traces leave customer env: no
  • LLM traffic leaves customer env: no

Flows and delivery

  • CP to DP: Air-Gap Ingest (manual or scheduled)
  • DP to LLM: On-Prem LLM Gateway
  • User to CP: VPN Only
  • Ingress: Kubernetes Ingress (controller of your choice)
  • IaC: Terraform + Helm
  • Upgrade cadence: Quarterly
  • Image sync: Skopeo, crane, or regclient
  • Private registry: Harbor, Artifactory, Quay, or equivalent
  • SSO: Custom OIDC | SCIM: no

Assessment

Category | Score | Notes
Operational Burden | high | Image and chart mirroring pipelines plus license proxy all need active ownership. New releases require coordinated mirror cycles before they can be applied. Customer also owns capturing and reporting usage metrics to LangChain on the contractual cadence, since no automated beacon is possible.
Cost Delta | high | Internal registry (Harbor or equivalent), Git infrastructure, and on-prem LLM gateway capacity. Telemetry and remote support paths cost extra engineering time.
Compliance Fit | high | Satisfies the strongest data-sovereignty and air-gap mandates. Standard pattern for regulated financial services, defense, and healthcare. Order paperwork needs an addendum for manual usage reporting in lieu of the standard telemetry beacon.
Complexity | high | Internal-mirror discipline, chart-dependency management, and on-prem LLM gateway integration. Coordinating image-and-chart mirror cycles with cluster reconciliation is the main source of upgrade friction.
Failure Blast Radius | high | All tenants in one air-gapped deployment share a single cluster. No cross-region failover to LangChain infrastructure.
Skill Requirements | high | Platform team comfortable operating regulated Kubernetes clusters, maintaining internal image and chart mirrors, and integrating with an on-prem LLM gateway.
Time to First Trace | high | 8+ weeks. Air-gap adds a week to nearly every phase, especially image mirroring and license-proxy validation.
Scale Ceiling | medium | Vertical and horizontal scale inside the air-gapped environment is fine. Multi-region within air-gap requires replicating the entire mirror/Git/gateway stack.

Related

self-hosted-single-cluster, self-hosted-stack-per-env


Fleet, Headless (No LangSmith)

fleet-headless-no-langsmith | status: In Development | version 1 | verified 2026-04-23

Fleet runtime operates standalone in the customer's environment with no LangSmith control plane and no LangSmith UI. Customers build their own front-end against Fleet APIs. Emerging topology that follows from Fleet decoupling; not yet in production.

Products

  • Fleet (Headless)

Control plane

  • Location: Absent | instances: 0 (no control plane)

Data planes

  • Multiplicity: Single
Name | Location | Cloud | Region | Cluster | Namespace | Isolation
fleet-runtime | Customer Env | AWS | us-west-2 | customer-eks | fleet | Kubernetes Cluster

Isolation and residency

  • Strongest boundary: Kubernetes Cluster
  • Network boundary: VPC / VNet
  • Data residency scope: Per Account
  • Traces leave customer env: no
  • LLM traffic leaves customer env: yes

Flows and delivery

  • CP to DP: Not Applicable
  • DP to LLM: Direct Egress
  • User to CP: Private DNS
  • Ingress: Kubernetes Ingress (controller of your choice)
  • IaC: Terraform + Helm
  • Upgrade cadence: Monthly

Assessment

Category | Score | Notes
Operational Burden | medium | Customer operates the Fleet runtime and its backing services. No LangSmith stack to maintain, no trace-ingestion pipeline. Upgrade discipline still matters for Fleet itself.
Cost Delta | medium | Smaller footprint than a full LangSmith install. Custom UI hosting and ongoing front-end engineering are the dominant costs.
Compliance Fit | high | Zero LangChain-hosted components. All agent state and traffic stay in the customer account. Matches profiles that reject SaaS observability on principle.
Complexity | high | Customer owns all UX. No LangChain-provided debugging tools (no Studio, no traces UI). Fleet auth decoupling and API contracts are still evolving.
Failure Blast Radius | high | A Fleet runtime outage stops all agent traffic. No LangSmith tier to fall back to for trace replay or rollback.
Skill Requirements | high | Production Kubernetes plus full-stack front-end engineering. Teams must be comfortable building and operating a bespoke UI on Fleet APIs.
Time to First Trace | high | Not applicable in the traditional sense. Time-to-first-agent-request depends on how much custom UI is in scope. 8+ weeks in practice.
Scale Ceiling | high | Fleet runtime scales independently; the UI is the customer's problem. No shared infrastructure bottleneck.

Related

self-hosted-single-cluster