Deployment Topologies
Cross-provider catalog of LangSmith, Fleet, and LangGraph Deployments topologies, scored against a common rubric.
Generated from `data/topologies/*.yaml` by `scripts/render_topology_doc.py`. Do not edit this file by hand. Edit the YAML and regenerate.
How to read this. This catalog is meant to be walked through with your LangChain PS contact. The rubric scores reflect typical enterprise customers; your specifics may differ.
This catalog enumerates LangSmith, Fleet, and LangGraph Deployments topologies, ordered by recommendation preference. Each entry is scored against a common rubric so they can be compared on the dimensions that matter for choosing between them.
Polly and Insights (in-app context-aware chat and trace pattern analysis, respectively) are optional add-ons that run on LangGraph Deployments. They aren't modeled per topology; assume they're available wherever LangGraph Deployments is present.
Control plane vs. data plane
LangSmith is split into two logical components, and most topology choices come down to where each one runs and which trust boundary sits between them.
- Control plane (CP). Owns user-facing state and platform identity. Hosts the UI, handles SSO and API-key auth, and manages workspaces, organizations, permissions, dataset and prompt metadata, evaluation configuration, and billing. Issues and validates the tokens the data plane uses.
- Data plane (DP). Owns runtime traffic and customer payloads. Ingests and stores traces, runs evaluations and online checks, and hosts agent runtimes (LangGraph Deployments). Trace payloads only ever live in the data plane.
Because trace payloads only live in the data plane, the data-plane location determines data residency. The control-plane location mostly determines who operates the UI and where auth state lives. Topologies differ in whether CP and DP run in the same environment, who operates each one, and how they're connected.
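The residency rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the LangSmith API: the `Topology` class and `traces_leave_customer_env` function are hypothetical names, and only the location labels come from this catalog.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Location labels follow this catalog; everything else here is illustrative.
LANGCHAIN_SAAS = "LangChain SaaS"
CUSTOMER_ENV = "Customer Env"


@dataclass(frozen=True)
class Topology:
    name: str
    control_plane: Optional[str]  # None models a topology with no control plane
    data_planes: Tuple[str, ...]  # one location label per data plane


def traces_leave_customer_env(topology: Topology) -> bool:
    """Trace payloads live only in the data plane, so only DP locations matter."""
    return any(location == LANGCHAIN_SAAS for location in topology.data_planes)


saas = Topology("saas-cloud", LANGCHAIN_SAAS, (LANGCHAIN_SAAS,))
hybrid = Topology("hybrid-single-dp", LANGCHAIN_SAAS, (CUSTOMER_ENV,))

print(traces_leave_customer_env(saas))    # control plane and data plane both in SaaS
print(traces_leave_customer_env(hybrid))  # same control plane, customer-hosted data plane
```

Note that the control-plane location never enters the check: the two example topologies share a control plane and differ only in where the data plane runs.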
Where things run
The table below uses these short labels for environments. Each topology's detail page calls out the specific cloud and region.
- LangChain SaaS. Operated by LangChain at smith.langchain.com. The customer authenticates and uses it; LangChain runs the infrastructure.
- Customer Env. Customer-owned infrastructure, operated by the customer. Can be a cloud account they control (AWS, Azure, GCP, OCI), an on-prem datacenter, or a hybrid of the two. The `cloud` field on each topology says which.
- BYOC. Bring-your-own-cloud. The cloud account belongs to the customer; LangChain operates the LangSmith stack inside it.
- Air-Gapped. Customer environment with no egress to LangChain. Images, charts, and licenses move in via internal mirrors.
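One way to keep the four labels straight is to record, for each, who owns the infrastructure and who operates the LangSmith stack. The sketch below is only an illustration of the definitions above; the `Party` and `Environment` types are hypothetical, and the Air-Gapped row assumes the customer operates the stack, as it does in a plain Customer Env.

```python
from enum import Enum
from typing import NamedTuple


class Party(Enum):
    LANGCHAIN = "LangChain"
    CUSTOMER = "Customer"


class Environment(NamedTuple):
    owner: Party     # who owns the infrastructure or cloud account
    operator: Party  # who runs the LangSmith stack day to day


# Keys are the labels used in this catalog.
ENVIRONMENTS = {
    "LangChain SaaS": Environment(Party.LANGCHAIN, Party.LANGCHAIN),
    "Customer Env": Environment(Party.CUSTOMER, Party.CUSTOMER),
    "BYOC": Environment(Party.CUSTOMER, Party.LANGCHAIN),
    # Assumption: an air-gapped customer environment is customer-operated.
    "Air-Gapped": Environment(Party.CUSTOMER, Party.CUSTOMER),
}


def split_ownership(label: str) -> bool:
    """True when the account owner and the stack operator are different parties."""
    env = ENVIRONMENTS[label]
    return env.owner is not env.operator


for label in ENVIRONMENTS:
    print(label, "split ownership" if split_ownership(label) else "single owner-operator")
```

BYOC is the only label where ownership and operation are split, which is why its topologies call out an access path for LangChain operators.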
Topologies at a glance
| Topology | Control plane | Data planes | Best for | Status |
|---|---|---|---|---|
| SaaS Cloud | 1 · LangChain SaaS | 1 · LangChain SaaS | Fastest time to first trace; lowest operational commitment. | Recommended |
| BYOC, Full Stack | 1 · BYOC | 1 · BYOC | Dedicated stack in your cloud; LangChain operates. | In Development |
| BYOC Control Plane, On-Prem Data Plane | 1 · BYOC | 1 · Customer Env | On-prem agents and traces with a LangChain-operated control plane. | In Development |
| Hybrid: SaaS Control Plane, Customer Data Plane | 1 · LangChain SaaS | 1 · Customer Env | Trace payloads stay in your cloud; LangChain owns the UI. | Recommended |
| Self-Hosted, Single Cluster | 1 · Customer Env | 1 · Customer Env | Full platform in one cluster; you own every layer. | Supported |
| Self-Hosted, Shared CP with Per-Env Data Planes | 1 · Customer Env | N · Customer Env (per env) | Clean dev / stage / prod boundaries with one shared UI. | Supported |
| Self-Hosted, Full Stack per Environment | N · Customer Env (per env) | N · Customer Env (per env) | Hard env isolation, separate UIs per environment. | Supported |
| Self-Hosted, Data Plane per Namespace | 1 · Customer Env | N · Customer Env (per namespace) | Namespace-level blast radius for teams, agents, or use cases. | Supported |
| Self-Hosted, Cross-Cluster (Remote Data Planes) | 1 · Customer Env | N · Customer Env (many per cluster) | Remote data planes in separate clusters or cloud accounts. | Situational |
| Air-Gapped | 1 · Air-Gapped | 1 · Air-Gapped | Fully isolated deployment, no egress to LangChain. | Supported |
| Fleet, Headless (No LangSmith) | 0 | 1 · Customer Env | Fleet runtime standalone; you build the UI. | In Development |
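
The table can also be filtered programmatically when building a shortlist. The following sketch copies the rows above into a list and keeps topologies that match a set of statuses and avoid certain data-plane locations. The `shortlist` function and its parameters are hypothetical, not part of any LangChain tooling.

```python
# (topology, control-plane location, data-plane location, status), copied from the table.
# None marks the topology with no control plane.
CATALOG = [
    ("SaaS Cloud", "LangChain SaaS", "LangChain SaaS", "Recommended"),
    ("BYOC, Full Stack", "BYOC", "BYOC", "In Development"),
    ("BYOC Control Plane, On-Prem Data Plane", "BYOC", "Customer Env", "In Development"),
    ("Hybrid: SaaS Control Plane, Customer Data Plane", "LangChain SaaS", "Customer Env", "Recommended"),
    ("Self-Hosted, Single Cluster", "Customer Env", "Customer Env", "Supported"),
    ("Self-Hosted, Shared CP with Per-Env Data Planes", "Customer Env", "Customer Env", "Supported"),
    ("Self-Hosted, Full Stack per Environment", "Customer Env", "Customer Env", "Supported"),
    ("Self-Hosted, Data Plane per Namespace", "Customer Env", "Customer Env", "Supported"),
    ("Self-Hosted, Cross-Cluster (Remote Data Planes)", "Customer Env", "Customer Env", "Situational"),
    ("Air-Gapped", "Air-Gapped", "Air-Gapped", "Supported"),
    ("Fleet, Headless (No LangSmith)", None, "Customer Env", "In Development"),
]


def shortlist(catalog, allowed_statuses, excluded_dp_locations):
    """Keep topologies with an allowed status whose data plane avoids the excluded locations."""
    return [
        name
        for name, _cp, dp, status in catalog
        if status in allowed_statuses and dp not in excluded_dp_locations
    ]


# Example: generally available options that keep trace payloads out of LangChain SaaS.
candidates = shortlist(CATALOG, {"Recommended", "Supported"}, {"LangChain SaaS"})
for name in candidates:
    print(name)
```

Because the catalog is ordered by recommendation preference, the filtered list preserves that order.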
Detail
SaaS Cloud
saas-cloud | status: Recommended | version 1 | verified 2026-04-26
LangChain operates the full stack. Customers connect via the public smith.langchain.com endpoint with SSO and tenant-scoped workspaces. Fastest path to first trace, lowest operational burden, and first to receive new platform features, at the cost of sending trace payloads to LangChain infrastructure.
Products
- LangSmith (control plane, data plane)
- Fleet (Bundled)
- LangGraph Deployments
Control plane
- Location: LangChain SaaS | region: us-east-1
Data planes
- Multiplicity: Single
| Name | Location | Cloud | Region | Cluster | Namespace | Isolation |
|---|---|---|---|---|---|---|
| langchain-saas-dp | LangChain SaaS | LangChain SaaS | us-east-1 | n/a | n/a | None |
Isolation and residency
- Strongest boundary: None
- Network boundary: None
- Data residency scope: None
- Traces leave customer env: yes
- LLM traffic leaves customer env: yes
Flows and delivery
- CP to DP: Not Applicable
- DP to LLM: Direct Egress
- User to CP: SSO over Public Internet
- Ingress: Not Applicable
- IaC: Not Applicable
- Upgrade cadence: Continuous
Assessment
| Category | Score | Notes |
|---|---|---|
| Operational Burden | low | Customer operates nothing. Workspace admin configures SSO/SCIM, API keys, workspace membership, and retention policies. |
| Cost Delta | low | Standard LangSmith tier. No infra cost for the customer. Egress and LLM costs are whatever the customer's agents generate. |
| Compliance Fit | low | Traces and metadata are stored in LangChain infrastructure. Does not satisfy data-residency or air-gap mandates. SOC 2 and workspace isolation cover many enterprise controls but not sovereign-data requirements. |
| Complexity | low | Single public endpoint, workspace-based logical tenancy, no customer-owned networking. |
| Failure Blast Radius | high | Shared-tenant: a platform incident can affect all customers. LangChain operates SaaS as its highest-priority service: dedicated reliability engineering, incident response, and a public status page. |
| Skill Requirements | low | Workspace admin skills only. No Kubernetes, Helm, or cloud networking. |
| Time to First Trace | low | Under an hour once a workspace is provisioned and an API key is in hand. |
| Scale Ceiling | high | LangChain scales the platform. Workspace-level rate limits apply, but the ceiling is well above most customer workloads. |
Related
BYOC, Full Stack
byoc-full | status: In Development | version 1 | verified 2026-04-26
LangChain operates the LangSmith stack inside a cloud account the customer owns. Control plane, data plane, and backing services all run in the customer's account. Trace payloads stay there; LangChain SREs operate via a scoped cross-account IAM role plus a Tailscale agent for break-glass access. In active development; not yet generally available.
Products
- LangSmith (control plane, data plane)
- LangGraph Deployments
Control plane
- Location: BYOC | cloud: AWS | region: us-east-1 | cluster: byoc-cp-eks
Data planes
- Multiplicity: Single
| Name | Location | Cloud | Region | Cluster | Namespace | Isolation |
|---|---|---|---|---|---|---|
| byoc-dp | BYOC | AWS | us-east-1 | byoc-cp-eks | langsmith | Cloud Account / Subscription |
Isolation and residency
- Strongest boundary: Cloud Account / Subscription
- Network boundary: Cloud Account / Subscription
- Data residency scope: Per Account
- Traces leave customer env: no
- LLM traffic leaves customer env: yes
Flows and delivery
- CP to DP: Private Link / Service Endpoint
- DP to LLM: Direct Egress
- User to CP: SSO over Public Internet
- Ingress: Kubernetes Ingress (controller of your choice)
- IaC: Terraform + Helm
- Upgrade cadence: Monthly
Assessment
| Category | Score | Notes |
|---|---|---|
| Operational Burden | low | LangChain operates the stack: upgrades, migrations, monitoring, and incident response. The customer owns the account and network boundary but does not run day-to-day operations. |
| Cost Delta | medium | Dedicated cloud infrastructure in the customer account plus a BYOC management fee. No multi-tenant amortization, so cost sits between SaaS and self-hosted. |
| Compliance Fit | high | Trace payloads never leave the customer account. LangChain access is via a scoped IAM role and Tailscale break-glass; CloudTrail in the customer account is the system of record for all operator activity. Fits most regulated-industry isolation requirements short of air-gap. |
| Complexity | medium | Split ownership: customer owns IAM, account, and network boundary; LangChain owns the application stack. Clear separation of concerns works well when both sides agree on the interface up front. |
| Failure Blast Radius | low | Single-tenant. A BYOC incident affects only that customer. No multi-tenant common-mode failure. |
| Skill Requirements | low | Customer team needs cloud-account and IAM skills only. No Kubernetes or Helm required on the customer side. |
| Time to First Trace | medium | 2 to 6 weeks. Account provisioning, access-path setup, and security review drive the timeline more than software install. |
| Scale Ceiling | high | Vertical and horizontal scale inside the account is standard. Adding a second region is a follow-on motion rather than a replatform. |
Related
saas-cloud, byoc-cp-onprem-dp, hybrid-single-dp, self-hosted-single-cluster
BYOC Control Plane, On-Prem Data Plane
byoc-cp-onprem-dp | status: In Development | version 1 | verified 2026-04-23
LangChain operates the control plane inside a cloud account the customer owns. The data plane runs on the customer's on-prem Kubernetes. Agents and trace payloads stay on-prem; control-plane management (UI, metadata, auth) lives in the customer's cloud account. In active development alongside BYOC full stack; not yet generally available.
Products
- LangSmith (control plane, data plane)
- LangGraph Deployments
Control plane
- Location: BYOC | cloud: AWS | region: us-east-1 | cluster: byoc-cp-eks
Data planes
- Multiplicity: Single
| Name | Location | Cloud | Region | Cluster | Namespace | Isolation |
|---|---|---|---|---|---|---|
| onprem-dp | Customer Env | On-Prem | n/a | onprem-k8s | langsmith | Air-Gap |
Isolation and residency
- Strongest boundary: Air-Gap
- Network boundary: Cloud Account / Subscription
- Data residency scope: Per Account
- Traces leave customer env: no
- LLM traffic leaves customer env: no
Flows and delivery
- CP to DP: VPN
- DP to LLM: On-Prem LLM Gateway
- User to CP: SSO over Public Internet
- Ingress: Kubernetes Ingress (controller of your choice)
- IaC: Terraform + Helm
- Upgrade cadence: Monthly
- Image sync: Skopeo, crane, or regclient
- Private registry: Harbor, Artifactory, Quay, or equivalent
Assessment
| Category | Score | Notes |
|---|---|---|
| Operational Burden | high | Split ownership: LangChain operates the cloud-side control plane, the customer operates the on-prem data plane. Coordinated upgrades and cross-environment debugging add overhead beyond either pure BYOC or pure self-hosted. |
| Cost Delta | high | Dedicated cloud infrastructure for the control plane plus on-prem compute and storage for the data plane. Cross-environment connectivity (VPN, Direct Connect, or ExpressRoute) adds recurring cost. |
| Compliance Fit | high | On-prem data plane satisfies the strongest data-sovereignty and air-gap-adjacent requirements while keeping the UI and metadata in a managed cloud account. |
| Complexity | high | Two environments to reason about: cloud-side control plane and on-prem data plane. Private connectivity, on-prem DNS, and on-prem LLM gateway integration all require customer-network expertise. |
| Failure Blast Radius | high | Single data plane; any on-prem outage stops agent traffic. Control plane outage affects auth validation and the UI but not agent execution already in flight. |
| Skill Requirements | high | On-prem Kubernetes expertise plus cloud-to-on-prem networking. The customer team must be comfortable operating Kubernetes outside a managed cloud provider. |
| Time to First Trace | high | 8+ weeks typical. On-prem provisioning, private connectivity, and security reviews dominate the timeline; neither the control plane nor the data plane alone is quick. |
| Scale Ceiling | medium | Limited by on-prem capacity. Horizontal scale is possible but involves coordinating cloud control-plane changes with on-prem data-plane expansion. |
Related
byoc-full, hybrid-single-dp, airgapped
Hybrid: SaaS Control Plane, Customer Data Plane
hybrid-single-dp | status: Recommended | version 1 | verified 2026-04-23
LangChain operates the control plane (SaaS); the customer operates a single self-hosted data plane inside their own cloud account. Traces and agent code execute in the customer's environment; only metadata crosses the boundary.
Products
- LangSmith (control plane, data plane)
- LangGraph Deployments
Control plane
- Location: LangChain SaaS | region: us-east-1
Data planes
- Multiplicity: Single
| Name | Location | Cloud | Region | Cluster | Namespace | Isolation |
|---|---|---|---|---|---|---|
| production-dp | Customer Env | AWS | us-east-1 | customer-eks-auto | langsmith | VPC / VNet |
Isolation and residency
- Strongest boundary: VPC / VNet
- Network boundary: VPC / VNet
- Data residency scope: Per Region
- Traces leave customer env: no
- LLM traffic leaves customer env: yes
Flows and delivery
- CP to DP: Public TLS
- DP to LLM: Direct Egress
- User to CP: SSO over Public Internet
- Ingress: Kubernetes Ingress (controller of your choice)
- IaC: Terraform + Helm
- Upgrade cadence: Monthly
Assessment
| Category | Score | Notes |
|---|---|---|
| Operational Burden | medium | Customer operates the data plane Helm release, Postgres, and networking; control plane operations are LangChain's. Upgrades are monthly Helm bumps. |
| Cost Delta | medium | EKS node group, RDS Postgres, OpenSearch or equivalent, plus NAT gateway egress. License cost is standard LangSmith tier. |
| Compliance Fit | medium | Trace payloads stay in the customer VPC; metadata (org, workspace, trace IDs) flows to the SaaS control plane. Fits most enterprise policies; not sufficient for air-gapped or data-sovereignty mandates. |
| Complexity | medium | Three networking paths to reason about (browser-to-SaaS, listener-to-CP, listener-to-agent). Health checks originate inside the data plane and route through NAT, which surprises teams expecting ingress from LangChain. |
| Failure Blast Radius | medium | Data plane outage stops traffic for the whole deployment; control plane outage affects only observability and deploys, not request serving. |
| Skill Requirements | medium | Production Kubernetes fluency, Helm, and cloud networking (VPC, NAT, ingress). Single-cluster scope, no GitOps required. |
| Time to First Trace | low | Greenfield install is typically 1 to 2 weeks once networking is approved. |
| Scale Ceiling | medium | Single data plane scales vertically and horizontally within one cluster. Past tens of millions of traces per day, expect to split per env or region. |
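
Because every topology uses the same rubric, two entries can be compared category by category. The sketch below compares the SaaS Cloud and Hybrid scores listed in this catalog. It rests on one assumption that the catalog does not state explicitly: a higher score is preferable for Compliance Fit and Scale Ceiling, and a lower score is preferable everywhere else. The `compare` function is hypothetical.

```python
# Ordinal scale for the rubric labels used in the assessment tables.
LEVEL = {"low": 0, "medium": 1, "high": 2}

# ASSUMPTION (not stated in the catalog): higher is preferable for these
# categories; for every other category a lower score is preferable.
HIGHER_IS_BETTER = {"Compliance Fit", "Scale Ceiling"}

# Scores copied from the SaaS Cloud and Hybrid assessment tables.
SCORES = {
    "saas-cloud": {
        "Operational Burden": "low", "Cost Delta": "low", "Compliance Fit": "low",
        "Complexity": "low", "Failure Blast Radius": "high", "Skill Requirements": "low",
        "Time to First Trace": "low", "Scale Ceiling": "high",
    },
    "hybrid-single-dp": {
        "Operational Burden": "medium", "Cost Delta": "medium", "Compliance Fit": "medium",
        "Complexity": "medium", "Failure Blast Radius": "medium", "Skill Requirements": "medium",
        "Time to First Trace": "low", "Scale Ceiling": "medium",
    },
}


def compare(a: str, b: str) -> dict:
    """For each category, name the topology with the preferable score, or 'tie'."""
    result = {}
    for category, a_score in SCORES[a].items():
        delta = LEVEL[a_score] - LEVEL[SCORES[b][category]]
        if category not in HIGHER_IS_BETTER:
            delta = -delta  # lower is preferable, so flip the sign
        result[category] = a if delta > 0 else b if delta < 0 else "tie"
    return result


comparison = compare("saas-cloud", "hybrid-single-dp")
for category, winner in comparison.items():
    print(f"{category}: {winner}")
```

A per-category view like this is deliberately not collapsed into a single number; which categories dominate depends on the customer, as the "How to read this" note says.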
Related
saas-cloud, byoc-full, byoc-cp-onprem-dp, self-hosted-single-cluster
Self-Hosted, Single Cluster
self-hosted-single-cluster | status: Supported | version 1 | verified 2026-04-23
Control plane and data plane both run inside a single customer-operated Kubernetes cluster. One workspace, one upgrade path, one set of backing services. The default starting point for self-hosted customers who do not need per-team or per-environment isolation.
Products
- LangSmith (control plane, data plane)
- LangGraph Deployments
Control plane
- Location: Customer Env | cloud: AWS | region: us-east-1 | cluster: customer-eks
Data planes
- Multiplicity: Single
| Name | Location | Cloud | Region | Cluster | Namespace | Isolation |
|---|---|---|---|---|---|---|
| production-dp | Customer Env | AWS | us-east-1 | customer-eks | langsmith | Kubernetes Cluster |
Isolation and residency
- Strongest boundary: Kubernetes Cluster
- Network boundary: VPC / VNet
- Data residency scope: Per Account
- Traces leave customer env: no
- LLM traffic leaves customer env: yes
Flows and delivery
- CP to DP: Private Link / Service Endpoint
- DP to LLM: Direct Egress
- User to CP: SSO over Public Internet
- Ingress: Kubernetes Ingress (controller of your choice)
- IaC: Terraform + Helm
- Upgrade cadence: Monthly
Assessment
| Category | Score | Notes |
|---|---|---|
| Operational Burden | medium | One Helm release to maintain, one Postgres, one ClickHouse, one Redis. Upgrades are a single Helm bump plus migrations. ClickHouse disk growth is the most common operational surprise. |
| Cost Delta | medium | EKS (or equivalent) node group, managed Postgres, managed Redis, and ClickHouse storage. License is the standard self-hosted tier. |
| Compliance Fit | medium | All trace payloads stay in the customer account. Fits most enterprise policies; not sufficient for air-gap or per-team data-sovereignty mandates. |
| Complexity | low | Everything lives in one cluster. Debugging and log aggregation are local. No cross-cluster networking. |
| Failure Blast Radius | high | A cluster outage takes down both control plane and data plane. All workspaces affected simultaneously. |
| Skill Requirements | medium | Production Kubernetes and Helm. Familiarity with the backing-service set (Postgres, Redis, ClickHouse). No multi-cluster or GitOps required. |
| Time to First Trace | medium | 2 to 4 weeks in a typical enterprise, driven by network and IAM approvals more than install time. |
| Scale Ceiling | medium | Scales vertically and horizontally inside one cluster. Teams running tens of millions of traces per day typically split into per-env or per-team topologies. |
Related
hybrid-single-dp, self-hosted-shared-cp-per-env, self-hosted-stack-per-env, self-hosted-multi-dp-per-namespace
Self-Hosted, Shared CP with Per-Env Data Planes
self-hosted-shared-cp-per-env | status: Supported | version 1 | verified 2026-04-26
One shared control plane fans out to per-environment data planes (dev, staging, prod). All trace data, prompts, and evaluations roll up into a single UI while runtime workloads stay isolated in their own clusters. The shape most teams reach for when they want clean env boundaries without separate UIs to log into.
Products
- LangSmith (control plane, data plane)
- LangGraph Deployments
Control plane
- Location: Customer Env | cloud: Azure | region: eastus | cluster: shared-cp
Data planes
- Multiplicity: Per Environment
| Name | Location | Cloud | Region | Cluster | Namespace | Isolation |
|---|---|---|---|---|---|---|
| dev-dp | Customer Env | Azure | eastus | dev-aks | langsmith | Kubernetes Cluster |
| prod-dp | Customer Env | Azure | eastus | prod-aks | langsmith | Kubernetes Cluster |
Isolation and residency
- Strongest boundary: Kubernetes Cluster
- Network boundary: VPC / VNet
- Data residency scope: Per Region
- Traces leave customer env: no
- LLM traffic leaves customer env: yes
Flows and delivery
- CP to DP: Private Link / Service Endpoint
- DP to LLM: LangSmith LLM Auth Proxy
- User to CP: SSO over Public Internet
- Ingress: Kubernetes Ingress (controller of your choice)
- IaC: Terraform + Helm
- Upgrade cadence: Monthly
Assessment
| Category | Score | Notes |
|---|---|---|
| Operational Burden | medium | One CP install plus a DP install per env. CP backing services run once; each DP brings its own Postgres, Redis, and ClickHouse. Upgrades roll DP-by-DP after the CP. |
| Cost Delta | medium | One set of CP infra plus N sets of DP infra (one per env). Cheaper than a full stack-per-env split, at the cost of a shared CP change-management window. |
| Compliance Fit | medium | Per-env DP isolation covers runtime separation and per-env data residency. Shared CP means org-level config and audit affect all envs together; not a fit when regulators require fully independent stacks per env. |
| Complexity | high | Networking from the shared CP to each DP cluster, per-env ingress and DNS, and CP-to-DP auth across clusters. Cross-cluster identity is the main source of subtle breakage. |
| Failure Blast Radius | medium | A DP outage takes down one env; the shared CP and other envs stay up. A CP outage degrades the UI and platform operations across all envs, but trace ingestion continues. |
| Skill Requirements | high | Multi-cluster Kubernetes, cross-cluster networking, and identity management for CP-to-DP auth. |
| Time to First Trace | medium | 4 to 6 weeks. After the CP is stable, adding each new env DP is incremental. |
| Scale Ceiling | high | Scales horizontally by adding DPs. CP is the shared resource; watch its backing services as DP count grows. |
Related
self-hosted-single-cluster, self-hosted-stack-per-env, self-hosted-multi-dp-per-namespace, self-hosted-cross-cluster
Self-Hosted, Full Stack per Environment
self-hosted-stack-per-env | status: Supported | version 2 | verified 2026-04-26
Each environment (dev, staging, prod) runs a complete LangSmith stack: its own control plane, data plane, and backing services in its own cluster. No shared UI; teams log into each env separately. The strongest isolation available short of air-gap, at the cost of duplicated backing services and N independent change-management windows.
Products
- LangSmith (control plane, data plane)
- LangGraph Deployments
Control plane
- Location: Customer Env | cloud: Azure | region: eastus | instances: 2 (one per environment)
Data planes
- Multiplicity: Per Environment
| Name | Location | Cloud | Region | Cluster | Namespace | Isolation |
|---|---|---|---|---|---|---|
| dev-dp | Customer Env | Azure | eastus | dev-aks | langsmith | Kubernetes Cluster |
| prod-dp | Customer Env | Azure | eastus | prod-aks | langsmith | Kubernetes Cluster |
Isolation and residency
- Strongest boundary: Kubernetes Cluster
- Network boundary: VPC / VNet
- Data residency scope: Per Region
- Traces leave customer env: no
- LLM traffic leaves customer env: yes
Flows and delivery
- CP to DP: Private Link / Service Endpoint
- DP to LLM: LangSmith LLM Auth Proxy
- User to CP: SSO over Public Internet
- Ingress: Kubernetes Ingress (controller of your choice)
- IaC: Terraform + Helm
- Upgrade cadence: Monthly
Assessment
| Category | Score | Notes |
|---|---|---|
| Operational Burden | high | Each env has its own complete install: CP, DP, Postgres, Redis, ClickHouse. Upgrades run dev first, then prod. N times the backing services to monitor. |
| Cost Delta | high | N complete LangSmith stacks (CP + DP + backing services). If the LLM auth proxy lives in a different cloud or region from the data planes, expect cross-cloud egress costs and added latency. |
| Compliance Fit | high | Strongest enterprise change-management separation. Independent CP, DP, config, and SSO scopes per env. Per-env data residency is possible. Fits regulators that require fully independent stacks. |
| Complexity | high | N complete LangSmith stacks to operate. Per-env ingress, DNS, SSO, and identity. A cross-cloud LLM auth proxy adds proxy latency and firewall coordination on top of that. |
| Failure Blast Radius | medium | An outage stays in one env. CP, DP, and backing-service failures are all isolated to the env that owns them; other envs keep running. |
| Skill Requirements | high | Multi-cluster Kubernetes and cross-cloud networking if LLM providers or gateways live in a different cloud. Multiple independent LangSmith installs to operate. |
| Time to First Trace | medium | 4 to 8 weeks. Second environment is faster once the first install is stable and the Terraform modules are validated. |
| Scale Ceiling | high | Scales independently per env. Further horizontal splits are straightforward once the pattern is proven. |
Related
self-hosted-shared-cp-per-env, self-hosted-single-cluster, self-hosted-multi-dp-per-namespace, self-hosted-cross-cluster
Self-Hosted, Data Plane per Namespace
self-hosted-multi-dp-per-namespace | status: Supported | version 1 | verified 2026-04-26
One shared control plane, one data plane per Kubernetes namespace within a shared cluster. Agent pods in each namespace cannot access control-plane secrets. Customers use namespaces to separate teams, agents, or use cases without paying the cost of a cluster per tenant.
Products
- LangSmith (control plane, data plane)
- LangGraph Deployments
Control plane
- Location: Customer Env | cloud: AWS | region: us-east-1 | cluster: shared-platform-eks
Data planes
- Multiplicity: Per Namespace
| Name | Location | Cloud | Region | Cluster | Namespace | Isolation |
|---|---|---|---|---|---|---|
| tenant-a | Customer Env | AWS | us-east-1 | shared-platform-eks | ls-dp-tenant-a | Kubernetes Namespace |
| tenant-b | Customer Env | AWS | us-east-1 | shared-platform-eks | ls-dp-tenant-b | Kubernetes Namespace |
Isolation and residency
- Strongest boundary: Kubernetes Namespace
- Network boundary: Kubernetes Namespace
- Data residency scope: Per Account
- Traces leave customer env: no
- LLM traffic leaves customer env: yes
Flows and delivery
- CP to DP: mTLS over Private Link
- DP to LLM: Egress Gateway
- User to CP: SSO over Public Internet
- Ingress: Kubernetes Ingress (controller of your choice)
- IaC: Terraform + Helm
- Upgrade cadence: Monthly
Assessment
| Category | Score | Notes |
|---|---|---|
| Operational Burden | high | One upgrade path per data plane, locked to the same chart version as the control plane (cluster-scoped CRD). Namespace-scoped RBAC, quotas, network policies, and the LangSmith operator's WATCH_NAMESPACE all need explicit platform-team ownership. |
| Cost Delta | medium | Shared cluster amortizes node cost. Per-namespace data planes add pod-level overhead and egress gateway capacity. |
| Compliance Fit | high | Namespace isolation plus per-namespace data locality and separate secrets satisfy most financial-services and regulated-industry tenant-isolation requirements. |
| Complexity | high | Three valid routing patterns (ALB-per-namespace, Envoy Gateway, or Istio), per-namespace IRSA and service-account setup, host-backend RBAC into each DP namespace, and a cluster-scoped CRD that requires version-locked chart upgrades across all releases. |
| Failure Blast Radius | low | A single data plane failure is contained to that namespace. Shared cluster control-plane incidents are the one common-mode risk. |
| Skill Requirements | high | Platform team fluent in Kubernetes multi-tenancy, ALB/Envoy/Istio ingress, IAM (IRSA on AWS, workload identity on GCP/Azure), GitOps, and per-namespace secret distribution. |
| Time to First Trace | medium | 4 to 8 weeks for platform build-out. Each subsequent namespace onboards in days once the platform is stable. |
| Scale Ceiling | high | Scales to dozens of namespaces and tens of millions of traces per day. Past that, expect to split clusters. |
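
The version lock noted under Operational Burden and Complexity lends itself to a pre-upgrade check: every data-plane release should be on the same chart version as the control plane. The sketch below assumes the versions have already been collected into a dictionary (for example from the platform team's release inventory). The `version_skew` function and the version strings are hypothetical.

```python
def version_skew(cp_chart_version: str, dp_chart_versions: dict) -> dict:
    """Return the data-plane releases whose chart version differs from the control plane."""
    return {
        release: version
        for release, version in dp_chart_versions.items()
        if version != cp_chart_version
    }


# Hypothetical inventory; release names follow the example data planes above.
skewed = version_skew("1.2.3", {"tenant-a": "1.2.3", "tenant-b": "1.2.2"})

if skewed:
    print("Releases out of step with the control plane:", skewed)
else:
    print("All data planes match the control-plane chart version.")
```

Running a check like this before and after each upgrade window makes drift visible before the shared cluster-scoped CRD turns it into a failure.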
Related
self-hosted-single-cluster, self-hosted-cross-cluster
Self-Hosted, Cross-Cluster (Remote Data Planes)
self-hosted-cross-cluster | status: Situational | version 1 | verified 2026-04-26
Control plane in one cluster; data planes in separate clusters, often in separate cloud accounts. Agent servers call back to the CP for auth. When the CP and DPs live on different domains, cross-origin requests break parts of the UI (Assistants, Threads, Crons, Studio, HITL); a same-origin reverse proxy or colocation avoids the problem.
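Whether a given layout is exposed to the cross-origin problem can be checked by comparing origins, where an origin is the scheme, host, and port of a URL. The sketch below uses only the Python standard library; the URLs are hypothetical placeholders, not real LangSmith endpoints.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that browsers treat as the origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


# Hypothetical endpoints for illustration only.
CP_URL = "https://langsmith.example.com"
REMOTE_DP_URL = "https://agents.dp-a.example.net"              # data plane on its own domain
PROXIED_DP_URL = "https://langsmith.example.com/dp/account-a"  # data plane behind a same-origin proxy

print(same_origin(CP_URL, REMOTE_DP_URL))   # different host, so a different origin
print(same_origin(CP_URL, PROXIED_DP_URL))  # the path differs, but the origin is the same
```

The second case is the shape a same-origin reverse proxy produces: the data plane is reached under a path on the control plane's domain, so the browser sees a single origin.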
Products
- LangSmith (control plane, data plane)
- LangGraph Deployments
Control plane
- Location: Customer Env | cloud: AWS | region: us-east-1 | cluster: cp-account-eks
Data planes
- Multiplicity: Many per Cluster
| Name | Location | Cloud | Region | Cluster | Namespace | Isolation |
|---|---|---|---|---|---|---|
| dp-account-a | Customer Env | AWS | us-east-1 | dp-account-a-eks | langgraph | Cloud Account / Subscription |
| dp-account-b | Customer Env | AWS | us-east-1 | dp-account-b-eks | langgraph | Cloud Account / Subscription |
Isolation and residency
- Strongest boundary: Cloud Account / Subscription
- Network boundary: Cloud Account / Subscription
- Data residency scope: Per Account
- Traces leave customer env: no
- LLM traffic leaves customer env: yes
Flows and delivery
- CP to DP: mTLS over Private Link
- DP to LLM: Direct Egress
- User to CP: SSO over Public Internet
- Ingress: Istio
- IaC: Terraform + Helm
- Upgrade cadence: Monthly
Assessment
| Category | Score | Notes |
|---|---|---|
| Operational Burden | high | Each data plane is a separate Helm release, often in a separate cloud account and on its own domain. Upgrades fan out across accounts. The current cross-origin workarounds add a component (reverse proxy or sidecar) to operate. |
| Cost Delta | high | Per-account infrastructure duplication plus cross-account networking (PrivateLink or Transit Gateway). LLM egress billed per account. |
| Compliance Fit | high | Account-level isolation satisfies the strongest tenant-isolation requirements short of air-gap. Blast-radius and billing attribution are both clean. |
| Complexity | high | Cross-origin cookies, ingress gateway routes, per-cluster domains, and cross-account IAM. Routing workarounds require either a same-origin reverse proxy or an auth sidecar that injects API keys at the data plane ingress. |
| Failure Blast Radius | low | A single data plane outage affects only that workload. Control-plane failure affects auth validation for all data planes. |
| Skill Requirements | high | Multi-account cloud operations, service-mesh or ingress-gateway fluency, cross-account networking, plus familiarity with the current cross-origin routing workarounds. |
| Time to First Trace | high | 8+ weeks typical. Cross-account networking approvals dominate the timeline. First data plane slower than subsequent ones. |
| Scale Ceiling | high | Scales horizontally by adding data-plane accounts. Control plane is the eventual bottleneck for auth validation throughput. |
Related
self-hosted-shared-cp-per-env, self-hosted-stack-per-env, self-hosted-multi-dp-per-namespace
Air-Gapped
airgapped | status: Supported | version 1 | verified 2026-04-26
Control plane and data plane run in a fully air-gapped customer cluster with no outbound internet. Images mirror to an internal registry; Helm charts deploy from internal sources. The right model for regulated industries that cannot egress to LangChain. With no telemetry beacon, license metering is contractual; the customer reports usage on the agreed cadence.
Products
- LangSmith (control plane, data plane)
- LangGraph Deployments
Control plane
- Location: Air-Gapped | cloud: On-Prem | cluster: airgap-cp
Data planes
- Multiplicity: Single
| Name | Location | Cloud | Region | Cluster | Namespace | Isolation |
|---|---|---|---|---|---|---|
| airgap-dp | Air-Gapped | On-Prem | n/a | airgap-cp | langsmith | Air-Gap |
Isolation and residency
- Strongest boundary: Air-Gap
- Network boundary: Air-Gap
- Data residency scope: Air-Gapped
- Traces leave customer env: no
- LLM traffic leaves customer env: no
Flows and delivery
- CP to DP: Air-Gap Ingest (manual or scheduled)
- DP to LLM: On-Prem LLM Gateway
- User to CP: VPN Only
- Ingress: Kubernetes Ingress (controller of your choice)
- IaC: Terraform + Helm
- Upgrade cadence: Quarterly
- Image sync: Skopeo, crane, or regclient
- Private registry: Harbor, Artifactory, Quay, or equivalent
- SSO: Custom OIDC | SCIM: no
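
As an illustration of the image-sync step, the sketch below builds `skopeo copy` commands that mirror a list of images into an internal registry. It only constructs the commands and prints them; it does not run them. The registry hosts and image names are hypothetical, and the sketch assumes the host that eventually runs the commands can reach both the source and the internal registry (for example, a staging host on the mirror path).

```python
import shlex


def mirror_commands(images, source_registry, internal_registry):
    """Build one `skopeo copy` argument list per image, preserving the image path and tag."""
    return [
        [
            "skopeo", "copy",
            f"docker://{source_registry}/{image}",
            f"docker://{internal_registry}/{image}",
        ]
        for image in images
    ]


# Hypothetical image list and registries, for illustration only.
IMAGES = ["langsmith/backend:1.2.3", "langsmith/frontend:1.2.3"]
commands = mirror_commands(IMAGES, "registry.example.com", "harbor.internal.example")

for command in commands:
    print(shlex.join(command))
```

Keeping the image list in version control alongside the chart version gives each mirror cycle a reviewable manifest, which helps when coordinating mirror cycles with upgrades.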
Assessment
| Category | Score | Notes |
|---|---|---|
| Operational Burden | high | Image and chart mirroring pipelines plus the license proxy all need active ownership. New releases require coordinated mirror cycles before they can be applied. The customer also owns capturing and reporting usage metrics to LangChain on the contractual cadence, since no automated beacon is possible. |
| Cost Delta | high | Internal registry (Harbor or equivalent), Git infrastructure, and on-prem LLM gateway capacity. Telemetry and remote support paths cost extra engineering time. |
| Compliance Fit | high | Satisfies the strongest data-sovereignty and air-gap mandates. Standard pattern for regulated financial services, defense, and healthcare. Order paperwork needs an addendum for manual usage reporting in lieu of the standard telemetry beacon. |
| Complexity | high | Internal-mirror discipline, chart-dependency management, and on-prem LLM gateway integration. Coordinating image-and-chart mirror cycles with cluster reconciliation is the main source of upgrade friction. |
| Failure Blast Radius | high | All tenants in one air-gapped deployment share a single cluster. No cross-region failover to LangChain infrastructure. |
| Skill Requirements | high | Platform team comfortable operating regulated Kubernetes clusters, maintaining internal image and chart mirrors, and integrating with an on-prem LLM gateway. |
| Time to First Trace | high | 8+ weeks. Air-gap adds a week to nearly every phase, especially image mirroring and license-proxy validation. |
| Scale Ceiling | medium | Vertical and horizontal scale inside the air-gapped environment is fine. Multi-region within air-gap requires replicating the entire mirror/Git/gateway stack. |
Related
self-hosted-single-cluster, self-hosted-stack-per-env
Fleet, Headless (No LangSmith)
fleet-headless-no-langsmith | status: In Development | version 1 | verified 2026-04-23
Fleet runtime operates standalone in the customer's environment with no LangSmith control plane and no LangSmith UI. Customers build their own front-end against Fleet APIs. Emerging topology that follows from Fleet decoupling; not yet in production.
Products
- Fleet (Headless)
Control plane
- Location: Absent | instances: 0 (no control plane)
Data planes
- Multiplicity: Single
| Name | Location | Cloud | Region | Cluster | Namespace | Isolation |
|---|---|---|---|---|---|---|
| fleet-runtime | Customer Env | AWS | us-west-2 | customer-eks | fleet | Kubernetes Cluster |
Isolation and residency
- Strongest boundary: Kubernetes Cluster
- Network boundary: VPC / VNet
- Data residency scope: Per Account
- Traces leave customer env: no
- LLM traffic leaves customer env: yes
Flows and delivery
- CP to DP: Not Applicable
- DP to LLM: Direct Egress
- User to CP: Private DNS
- Ingress: Kubernetes Ingress (controller of your choice)
- IaC: Terraform + Helm
- Upgrade cadence: Monthly
Assessment
| Category | Score | Notes |
|---|---|---|
| Operational Burden | medium | Customer operates the Fleet runtime and its backing services. No LangSmith stack to maintain, no trace-ingestion pipeline. Upgrade discipline still matters for Fleet itself. |
| Cost Delta | medium | Smaller footprint than a full LangSmith install. Custom UI hosting and ongoing front-end engineering are the dominant costs. |
| Compliance Fit | high | Zero LangChain-hosted components. All agent state and traffic stay in the customer account. Matches profiles that reject SaaS observability on principle. |
| Complexity | high | Customer owns all UX. No LangChain-provided debugging tools (no Studio, no traces UI). Fleet auth decoupling and API contracts are still evolving. |
| Failure Blast Radius | high | A Fleet runtime outage stops all agent traffic. No LangSmith tier to fall back to for trace replay or rollback. |
| Skill Requirements | high | Production Kubernetes plus full-stack front-end engineering. Teams must be comfortable building and operating a bespoke UI on Fleet APIs. |
| Time to First Trace | high | Not applicable in the traditional sense. Time-to-first-agent-request depends on how much custom UI is in scope. 8+ weeks in practice. |
| Scale Ceiling | high | Fleet runtime scales independently; the UI is the customer's problem. No shared infrastructure bottleneck. |
Related