LangChain Enterprise Hub

Generated from src/lib/topologies/data.ts and content/topologies/_overview.md by scripts/gen-topologies-doc.mjs. Do not edit this file by hand. Edit the data or the overview partial and run npm run gen-topologies.

How to read this. This catalog is meant to be walked through with your LangChain Professional Services (PS) contact. The rubric scores reflect typical enterprise customers; your specifics may differ.

This catalog enumerates LangSmith deployment topologies, ordered by recommendation preference. Each entry is scored against a common rubric so they can be compared on the dimensions that matter for choosing between them.

How the platform is split

The LangChain platform ships as one LangSmith application, with optional product capabilities enabled on top. A topology is described by where each piece runs and which trust boundary sits between the pieces.

LangSmith (base platform). Always present. Owns the web UI, SSO and API-key auth, workspaces, organizations, permissions, datasets, prompts, evaluation configuration, billing, and the trace pipeline (ingest API, ClickHouse, and blob storage). The control plane and data plane ship together in the langsmith Helm chart.
- Control plane (CP). Manages agent deployments through the control-plane UI (the Deployments list and Studio, embedded in LangSmith) and the control-plane APIs: create and update Agent Servers, store deployment state, and view build/server logs and metrics. The data-plane listener polls it for changes; the control plane never connects to the data plane directly. In a self-hosted install this is the host-backend service, with the UI rendered by the LangSmith frontend.
- Data plane (DP). Runs the Agent Servers, the listener that pulls deployment config from the control plane over outbound HTTPS, and per-deployment backing services.
LangSmith Deployment. The agent-deployment runtime (formerly LangGraph Platform). Runs deployed LangGraph apps as long-lived Agent Servers with durable run state, cron, and a worker pool. Managed from the LangSmith control-plane UI (the Deployments list and Studio); the deployed Agent Servers themselves expose APIs and streaming endpoints rather than serving that UI. Optional, enabled with config.deployment.enabled. Legacy langgraphPlatform names persist in the chart and CRDs.
Fleet. A no-code platform for building and managing agents (formerly Agent Builder). Enabled on the base LangSmith platform, not on LangSmith Deployment. Provisions its own api-server, queue, tool-server, and trigger-server. Requires LangSmith Self-Hosted v0.13 or later.
Capability agents. Smaller features enabled alongside the base platform, each with its own api-server and queue:
- Insights. Automatic trace pattern and failure-mode analysis.
- LangSmith Chat (formerly Polly). In-app, context-aware chat over your projects, traces, threads, prompts, and datasets.
Managed Deep Agents. A separate hosted runtime for deep agents (private beta, LangSmith Cloud US only). A Managed Deep Agent is not a LangSmith Deployment and is not a Fleet surface; it provisions its own resources.
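The pull model described above (the data-plane listener polls the control plane, and the control plane never dials into the data plane) can be sketched as a single reconcile step. This is a minimal illustration and not the listener's actual implementation: `DesiredDeployment`, `reconcile`, and the `revision` field are hypothetical names, and the control-plane response is stubbed as an in-memory list rather than fetched over HTTPS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesiredDeployment:
    """Hypothetical record the control plane stores for one Agent Server."""
    name: str
    revision: int


def reconcile(desired: list[DesiredDeployment], running: dict[str, int]) -> list[str]:
    """Compare control-plane state with what the data plane currently runs.

    `running` maps deployment name -> revision currently running.
    Returns the actions a listener would take, in a stable order.
    """
    actions = []
    wanted = {d.name: d.revision for d in desired}
    for name, revision in sorted(wanted.items()):
        if name not in running:
            actions.append(f"create {name}@{revision}")
        elif running[name] != revision:
            actions.append(f"update {name}@{revision}")
    for name in sorted(running):
        if name not in wanted:
            actions.append(f"delete {name}")
    return actions


# One polling cycle. In a real listener, `desired` would come from an
# outbound HTTPS call to the control plane (stubbed here, an assumption).
desired = [DesiredDeployment("support-agent", 3), DesiredDeployment("triage-agent", 1)]
running = {"support-agent": 2, "old-agent": 7}
print(reconcile(desired, running))
```

Because every connection starts inside the data plane, the only network requirement in this sketch is outbound reachability from the data plane to the control plane.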

Coupling rules (today)

LangSmith (base) is always present; its control plane and data plane ship together in the langsmith chart.
LangSmith Deployment is independently optional. Agent Servers can also run standalone, with no control plane at all.
Fleet, Insights, and LangSmith Chat are independent capabilities on the base platform. Each runs as its own service set; the standalone deployment model for them requires v0.15 or later. None of them run on LangSmith Deployment. (An earlier model ran these as deployed agents on LangSmith Deployment; you may still see that on older installs.)
Managed Deep Agents does not require Fleet or LangSmith Deployment.
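As a rough aid, the version-related rules above can be written as a small checker for a planned platform install. This is a sketch under stated assumptions: `check_coupling` and `capability_model` are hypothetical helpers, versions are modeled as `(major, minor)` tuples, and only the rules stated in this section are encoded.

```python
BASE = "LangSmith"


def capability_model(version: tuple[int, int]) -> str:
    """Hosting model for Fleet, Insights, and LangSmith Chat at a given version."""
    # v0.15+ runs each capability as its own service set; older installs may
    # still run them as deployed agents on LangSmith Deployment.
    return "standalone" if version >= (0, 15) else "legacy"


def check_coupling(products: set[str], version: tuple[int, int]) -> list[str]:
    """Return rule violations for a platform install; an empty list means none found."""
    problems = []
    if BASE not in products:
        problems.append("LangSmith (base) must always be present")
    if "Fleet" in products and version < (0, 13):
        problems.append("Fleet requires LangSmith Self-Hosted v0.13 or later")
    return problems


print(check_coupling({"LangSmith", "Fleet", "Insights"}, (0, 15)))
print(capability_model((0, 14)))
```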

Where things run

The table below uses these short labels for environments. Each topology's detail page calls out the specific cloud and region.

LangChain SaaS. Operated by LangChain at smith.langchain.com. The customer authenticates and uses it; LangChain runs the infrastructure.
BYOC. Bring-your-own-cloud. The cloud account belongs to the customer; LangChain operates the LangSmith stack inside it.
Customer Env. Customer-owned infrastructure, operated by the customer. A cloud account they control (AWS, Azure, GCP, OCI), an on-prem datacenter, or a hybrid of the two. The cloud field on each topology says which.
Air-Gapped. Customer environment with no egress to LangChain. Images, charts, and licenses move in via internal mirrors.

Trace residency follows where LangSmith runs and how tracing is configured, not a fixed rule. SaaS sends traces to LangChain; self-hosted keeps them in your environment; hybrid uploads traces to the SaaS control plane while agents run in your cloud.
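For the topologies in this catalog, the residency statement above reduces to one question: does the control plane, which owns the trace pipeline, run in LangChain SaaS? The sketch below encodes that reading and cross-checks it against the "Traces leave customer env" field of several detail entries. The function name is hypothetical, and it assumes default tracing configuration; a customized tracing setup may behave differently.

```python
def traces_leave_customer_env(control_plane_location: str) -> bool:
    """True when traces are stored outside the customer's environment."""
    return control_plane_location == "LangChain SaaS"


# (control plane location, documented "Traces leave customer env" value),
# keyed by topology id as used in the detail entries of this catalog.
DOCUMENTED = {
    "saas-cloud": ("LangChain SaaS", True),
    "hybrid-single-dp": ("LangChain SaaS", True),
    "self-hosted-single-cluster": ("Customer Env", False),
    "airgapped": ("Air-Gapped", False),
    "byoc-full": ("BYOC", False),
}

mismatches = [
    topology_id
    for topology_id, (location, documented) in DOCUMENTED.items()
    if traces_leave_customer_env(location) != documented
]
print(mismatches)
```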

Topologies at a glance

Topology	Control plane	Data planes	Best for	Status
SaaS Cloud	1 · LangChain SaaS	1 · LangChain SaaS	Fastest time to first trace; lowest operational commitment.	Recommended
Self-Hosted, Single Cluster	1 · Customer Env	1 · Customer Env	Full platform in one cluster; you own every layer.	Recommended
Self-Hosted, Full Stack per Environment	N · Customer Env (per env)	N · Customer Env (per env)	Hard env isolation, separate UIs per environment.	Supported
Air-Gapped	1 · Air-Gapped	1 · Air-Gapped	Fully isolated deployment, no egress to LangChain.	Supported
Hybrid: SaaS Control Plane, Customer Data Plane	1 · LangChain SaaS	1 · Customer Env	Agents run in your cloud; traces upload to the SaaS control plane and LangChain owns the UI.	Situational
Self-Hosted, Shared CP with Per-Env Data Planes	1 · Customer Env	N · Customer Env (per env)	Clean dev / stage / prod boundaries with one shared UI.	Situational
Self-Hosted, Data Plane per Namespace	1 · Customer Env	N · Customer Env (per namespace)	Namespace-level blast radius for teams, agents, or use cases.	Situational
Self-Hosted, Cross-Cluster (Remote Data Planes)	1 · Customer Env	N · Customer Env (many per cluster)	Remote data planes in separate clusters or cloud accounts.	Situational
BYOC, Full Stack	1 · BYOC	1 · BYOC	Dedicated stack in your cloud; LangChain operates.	In Development
BYOC Control Plane, On-Prem Data Plane	1 · BYOC	1 · Customer Env	On-prem agents and traces with a LangChain-operated control plane.	In Development
Fleet, Headless (FaaS)	0	1 · Customer Env	Fleet runtime standalone; you build the UI.	In Development

Detail

SaaS Cloud

saas-cloud | status: Recommended

LangChain operates the full stack. Customers connect via the public smith.langchain.com endpoint with SSO and tenant-scoped workspaces. Fastest path to first trace, lowest operational burden, and first to receive new platform features, at the cost of sending trace payloads to LangChain infrastructure.

Products

LangSmith (control plane, data plane)
Fleet
LangSmith Deployment

Control plane

Location: LangChain SaaS | region: US (us-east-1) or EU (eu-central-1); APAC in development

Data planes

Multiplicity: Single

Name	Location	Cloud	Region	Cluster	Namespace	Isolation
langchain-saas-dp	LangChain SaaS	LangChain SaaS	US or EU	n/a	n/a	None

Isolation and residency

Strongest boundary: None
Network boundary: None
Data residency scope: None
Traces leave customer env: yes
LLM traffic leaves customer env: yes

Flows and delivery

CP to DP: Not Applicable
DP to LLM: Direct Egress
User to CP: SSO over Public Internet
Ingress: Not Applicable
IaC: Not Applicable
Upgrade cadence: Continuous

Assessment

Category	Score	Notes
Operational Burden	low	Customer operates nothing. Workspace admin configures SSO/SCIM, API keys, workspace membership, and retention policies.
Cost Delta	low	Standard LangSmith tier. No infra cost for the customer. Egress and LLM costs are whatever the customer's agents generate.
Compliance Fit	low	Traces and metadata are stored in LangChain infrastructure. Does not satisfy data-residency or air-gap mandates. SOC 2 and workspace isolation cover many enterprise controls but not sovereign-data requirements.
Complexity	low	Single public endpoint, workspace-based logical tenancy, no customer-owned networking.
Failure Blast Radius	high	Shared-tenant: an incident affects all customers. LangChain operates SaaS as its highest-priority service: dedicated reliability engineering, incident response, and a public status page.
Skill Requirements	low	Workspace admin skills only. No Kubernetes, Helm, or cloud networking.
Time to First Trace	low	Under an hour once a workspace is provisioned and an API key is in hand.
Scale Ceiling	high	LangChain scales the platform. Workspace-level rate limits apply, but the ceiling is well above most customer workloads.

Related

byoc-full, hybrid-single-dp

Self-Hosted, Single Cluster

self-hosted-single-cluster | status: Recommended

Control plane and data plane both run inside a single customer-operated Kubernetes cluster. One workspace, one upgrade path, one set of backing services. The default starting point for self-hosted customers who do not need per-team or per-environment isolation.

Products

LangSmith (control plane, data plane)
LangSmith Deployment

Control plane

Location: Customer Env | cloud: AWS | region: us-east-1 | cluster: customer-eks

Data planes

Multiplicity: Single

Name	Location	Cloud	Region	Cluster	Namespace	Isolation
production-dp	Customer Env	AWS	us-east-1	customer-eks	langsmith	Kubernetes Namespace

Isolation and residency

Strongest boundary: Kubernetes Cluster
Network boundary: VPC / VNet
Data residency scope: Per Account
Traces leave customer env: no
LLM traffic leaves customer env: yes

Flows and delivery

CP to DP: In-cluster (typically same namespace)
DP to LLM: Direct Egress
User to CP: SSO over Public Internet
Ingress: Kubernetes Ingress (controller of your choice)
IaC: Terraform + Helm
Upgrade cadence: Monthly

Assessment

Category	Score	Notes
Operational Burden	medium	One Helm release to maintain, one Postgres, one ClickHouse, one Redis. Upgrades are a single Helm bump plus migrations. ClickHouse disk growth is the most common operational surprise.
Cost Delta	medium	EKS (or equivalent) node group, managed Postgres, managed Redis, and ClickHouse storage. License is the standard self-hosted tier.
Compliance Fit	medium	All trace payloads stay in the customer account. Fits most enterprise policies; not sufficient for air-gap or per-team data-sovereignty mandates.
Complexity	low	Everything lives in one cluster. Debugging and log aggregation are local. No cross-cluster networking.
Failure Blast Radius	high	A cluster outage takes down both control plane and data plane. All workspaces affected simultaneously.
Skill Requirements	medium	Production Kubernetes and Helm. Familiarity with the backing-service set (Postgres, Redis, ClickHouse). No multi-cluster or GitOps required.
Time to First Trace	medium	2 to 4 weeks in a typical enterprise, driven by network and IAM approvals more than install time.
Scale Ceiling	medium	Scales vertically and horizontally inside one cluster. Teams running tens of millions of traces per day typically split into per-env or per-team topologies.
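The two assessment tables so far (SaaS Cloud and Self-Hosted, Single Cluster) can be compared category by category. The sketch below copies their scores and lists only the categories where they differ; the dictionary layout and the `differences` helper are assumptions for illustration.

```python
SAAS_CLOUD = {
    "Operational Burden": "low",
    "Cost Delta": "low",
    "Compliance Fit": "low",
    "Complexity": "low",
    "Failure Blast Radius": "high",
    "Skill Requirements": "low",
    "Time to First Trace": "low",
    "Scale Ceiling": "high",
}

SINGLE_CLUSTER = {
    "Operational Burden": "medium",
    "Cost Delta": "medium",
    "Compliance Fit": "medium",
    "Complexity": "low",
    "Failure Blast Radius": "high",
    "Skill Requirements": "medium",
    "Time to First Trace": "medium",
    "Scale Ceiling": "medium",
}


def differences(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Categories whose score differs between two topologies, as (a, b) pairs."""
    return {cat: (a[cat], b[cat]) for cat in a if a[cat] != b[cat]}


for category, (saas, single) in differences(SAAS_CLOUD, SINGLE_CLUSTER).items():
    print(f"{category}: {saas} -> {single}")
```

The sketch reports differences rather than summing scores because the direction of "better" depends on the category: a high Compliance Fit is desirable, while a high Operational Burden is not.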

Self-Hosted, Full Stack per Environment

self-hosted-stack-per-env | status: Supported

Each environment (dev, staging, prod) runs a complete LangSmith stack: its own control plane, data plane, and backing services in its own cluster. No shared UI; teams log into each env separately. The strongest isolation available short of air-gap, at the cost of duplicated backing services and N independent change-management windows.

Products

LangSmith (control plane, data plane)
LangSmith Deployment

Control plane

Location: Customer Env | cloud: Azure | region: eastus | instances: 2 (one per environment)

Data planes

Multiplicity: Per Environment

Name	Location	Cloud	Region	Cluster	Namespace	Isolation
dev-dp	Customer Env	Azure	eastus	dev-aks	langsmith	Kubernetes Cluster
prod-dp	Customer Env	Azure	eastus	prod-aks	langsmith	Kubernetes Cluster

Isolation and residency

Strongest boundary: Kubernetes Cluster
Network boundary: VPC / VNet
Data residency scope: Per Region
Traces leave customer env: no
LLM traffic leaves customer env: yes

Flows and delivery

CP to DP: In-cluster (each env runs its own CP and DP in the same cluster)
DP to LLM: Direct Egress
User to CP: SSO over Public Internet
Ingress: Kubernetes Ingress (controller of your choice)
IaC: Terraform + Helm
Upgrade cadence: Monthly

Assessment

Category	Score	Notes
Operational Burden	high	Each env has its own complete install: CP, DP, Postgres, Redis, ClickHouse. Upgrades run dev first, then prod. N times the backing services to monitor.
Cost Delta	high	N complete LangSmith stacks (CP + DP + backing services), each with its own ClickHouse, Postgres, and Redis. If a shared LLM auth proxy is added in a different cloud or region from the data planes, expect cross-cloud egress costs and added latency.
Compliance Fit	high	Strongest enterprise change-management separation. Independent CP, DP, config, and SSO scopes per env. Per-env data residency is possible. Fits regulators that require fully independent stacks.
Complexity	high	N complete LangSmith stacks to operate. Per-env ingress, DNS, SSO, and identity. A cross-cloud LLM auth proxy adds proxy latency and firewall coordination on top of that.
Failure Blast Radius	medium	An outage stays in one env. CP, DP, and backing-service failures are all isolated to the env that owns them; other envs keep running.
Skill Requirements	high	Multi-cluster Kubernetes, plus cross-cloud networking if LLM providers or gateways live in a different cloud. Multiple independent LangSmith installs to operate.
Time to First Trace	medium	4 to 8 weeks. Second environment is faster once the first install is stable and the Terraform modules are validated.
Scale Ceiling	high	Scales independently per env. Further horizontal splits are straightforward once the pattern is proven.

Air-Gapped

airgapped | status: Supported

Control plane and data plane run in a fully air-gapped customer cluster with no outbound internet. Images mirror to an internal registry; Helm charts deploy from internal sources. The right model for regulated industries that cannot egress to LangChain. No beacon means license metering is contractual; the customer reports usage on the agreed cadence.

Products

LangSmith (control plane, data plane)
LangSmith Deployment

Control plane

Location: Air-Gapped | cloud: On-Prem | cluster: airgap-cp

Data planes

Multiplicity: Single

Name	Location	Cloud	Region	Cluster	Namespace	Isolation
airgap-dp	Air-Gapped	On-Prem	n/a	airgap-cp	langsmith	Air-Gap

Isolation and residency

Strongest boundary: Air-Gap
Network boundary: Air-Gap
Data residency scope: Air-Gapped
Traces leave customer env: no
LLM traffic leaves customer env: no

Flows and delivery

CP to DP: In-cluster (same cluster, typically same namespace)
DP to LLM: On-Prem LLM Gateway
User to CP: VPN Only
Ingress: Kubernetes Ingress (controller of your choice)
IaC: Terraform + Helm
Upgrade cadence: Quarterly
Image sync: Skopeo, crane, or regclient (manual or scheduled)
Private registry: Harbor, Artifactory, Quay, or equivalent
SSO: Custom OIDC | SCIM: no
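The image-sync step listed above can be scripted. The sketch below only builds `skopeo copy` command lines and prints them; it does not run them. The registry hosts and image names are placeholders rather than the real LangSmith image list, and the script assumes it runs on a staging host that can reach both the source and the internal registry. A fully disconnected site would need an intermediate archive transfer instead.

```python
import shlex


def mirror_commands(images: list[str], source_registry: str, internal_registry: str) -> list[list[str]]:
    """Build one `skopeo copy` invocation per image (argv lists, not executed)."""
    return [
        [
            "skopeo", "copy", "--all",  # --all copies every platform in a manifest list
            f"docker://{source_registry}/{image}",
            f"docker://{internal_registry}/{image}",
        ]
        for image in images
    ]


# Placeholder image names and registries (assumptions for illustration).
commands = mirror_commands(
    ["example/backend:1.2.3", "example/frontend:1.2.3"],
    source_registry="registry.example.com",
    internal_registry="harbor.internal.example",
)
for cmd in commands:
    print(shlex.join(cmd))
```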

Assessment

Category	Score	Notes
Operational Burden	high	Image and chart mirroring pipelines plus license proxy all need active ownership. New releases require coordinated mirror cycles before they can be applied. Customer also owns capturing and reporting usage metrics to LangChain on the contractual cadence, since no automated beacon is possible.
Cost Delta	high	Internal registry (Harbor or equivalent), Git infrastructure, and on-prem LLM gateway capacity. Telemetry and remote support paths cost extra engineering time.
Compliance Fit	high	Satisfies the strongest data-sovereignty and air-gap mandates. Standard pattern for regulated financial services, defense, and healthcare. Order paperwork needs an addendum for manual usage reporting in lieu of the standard telemetry beacon.
Complexity	high	Internal-mirror discipline, chart-dependency management, and on-prem LLM gateway integration. Coordinating image-and-chart mirror cycles with cluster reconciliation is the main source of upgrade friction.
Failure Blast Radius	high	All tenants in one air-gapped deployment share a single cluster. No cross-region failover to LangChain infrastructure.
Skill Requirements	high	Platform team comfortable operating regulated Kubernetes clusters, maintaining internal image and chart mirrors, and integrating with an on-prem LLM gateway.
Time to First Trace	high	8+ weeks. Air-gap adds a week to nearly every phase, especially image mirroring and license-proxy validation.
Scale Ceiling	medium	Vertical and horizontal scale inside the air-gapped environment is fine. Multi-region within air-gap requires replicating the entire mirror/Git/gateway stack.

Hybrid: SaaS Control Plane, Customer Data Plane

hybrid-single-dp | status: Situational

LangChain operates the control plane (SaaS); the customer operates a single self-hosted data plane inside their own cloud account. Agent code executes in the customer's environment with access to internal APIs and approved LLM providers; trace data flows back to the SaaS control plane for storage and analysis.

Products

LangSmith (control plane, data plane)
LangSmith Deployment

Control plane

Location: LangChain SaaS | region: US (us-east-1) or EU (eu-central-1); APAC in development

Data planes

Multiplicity: Single

Name	Location	Cloud	Region	Cluster	Namespace	Isolation
production-dp	Customer Env	AWS	us-east-1	customer-eks-auto	langsmith	VPC / VNet

Isolation and residency

Strongest boundary: VPC / VNet
Network boundary: VPC / VNet
Data residency scope: Per Region
Traces leave customer env: yes
LLM traffic leaves customer env: yes

Flows and delivery

CP to DP: DP-initiated outbound (listener pulls from CP)
DP to LLM: Direct Egress
User to CP: SSO over Public Internet
Ingress: Kubernetes Ingress (controller of your choice)
IaC: Terraform + Helm
Upgrade cadence: Monthly

Assessment

Category	Score	Notes
Operational Burden	medium	Customer operates the data plane Helm release, the LangSmith Deployment checkpointer DB, and networking; control plane operations (including trace ingest and storage) are LangChain's. Upgrades are monthly Helm bumps.
Cost Delta	medium	EKS node group plus the LangSmith Deployment checkpointer database (Postgres or MongoDB). NAT gateway egress for trace upload to SaaS and LLM calls. License cost is the standard LangSmith tier.
Compliance Fit	medium	Agent code and customer LLM traffic stay in the VPC; trace payloads (prompts, completions, tool I/O) are uploaded to the SaaS control plane. Fits customers whose primary concern is agents reaching internal APIs from a controlled network. Not sufficient when trace contents themselves must stay in the customer environment; choose a self-hosted topology for that.
Complexity	medium	Three networking paths to reason about (browser-to-SaaS, listener-to-CP, listener-to-agent). Health checks originate inside the data plane and route through NAT, which surprises teams expecting ingress from LangChain.
Failure Blast Radius	medium	Data plane outage stops traffic for the whole deployment; control plane outage affects only observability and deploys, not request serving.
Skill Requirements	medium	Production Kubernetes fluency, Helm, and cloud networking (VPC, NAT, ingress). Single-cluster scope, no GitOps required.
Time to First Trace	low	Greenfield install is typically 1 to 2 weeks once networking is approved.
Scale Ceiling	medium	Single data plane scales vertically and horizontally within one cluster. Past tens of millions of traces per day, expect to split per env or region.

Self-Hosted, Shared CP with Per-Env Data Planes

self-hosted-shared-cp-per-env | status: Situational

One shared control plane fans out to per-environment data planes (dev, staging, prod). All trace data, prompts, and evaluations roll up into a single UI while runtime workloads stay isolated in their own clusters. The shape most teams reach for when they want clean env boundaries without separate UIs to log into.

Products

LangSmith (control plane, data plane)
LangSmith Deployment

Control plane

Location: Customer Env | cloud: Azure | region: eastus | cluster: shared-cp

Data planes

Multiplicity: Per Environment

Name	Location	Cloud	Region	Cluster	Namespace	Isolation
dev-dp	Customer Env	Azure	eastus	dev-aks	langsmith	Kubernetes Cluster
prod-dp	Customer Env	Azure	eastus	prod-aks	langsmith	Kubernetes Cluster

Isolation and residency

Strongest boundary: Kubernetes Cluster
Network boundary: VPC / VNet
Data residency scope: Per Region
Traces leave customer env: no
LLM traffic leaves customer env: yes

Flows and delivery

CP to DP: Cross-cluster private networking (VPC peering or Private Link)
DP to LLM: Direct Egress
User to CP: SSO over Public Internet
Ingress: Kubernetes Ingress (controller of your choice)
IaC: Terraform + Helm
Upgrade cadence: Monthly

Assessment

Category	Score	Notes
Operational Burden	medium	One CP install plus a DP install per env. CP backing services run once; each DP brings its own Postgres, Redis, and ClickHouse. Upgrades roll DP-by-DP after the CP.
Cost Delta	medium	One set of CP infra plus N sets of DP infra (one per env). Cheaper than a full stack-per-env split, at the cost of a shared CP change-management window.
Compliance Fit	medium	Per-env DP isolation covers runtime separation and per-env data residency. Shared CP means org-level config and audit affect all envs together; not a fit when regulators require fully independent stacks per env.
Complexity	high	Networking from the shared CP to each DP cluster, per-env ingress and DNS, and CP-to-DP auth across clusters. Cross-cluster identity is the main source of subtle breakage.
Failure Blast Radius	medium	A DP outage takes down one env; the shared CP and other envs stay up. A CP outage degrades the UI and platform operations across all envs, but trace ingestion continues.
Skill Requirements	high	Multi-cluster Kubernetes, cross-cluster networking, and identity management for CP-to-DP auth.
Time to First Trace	medium	4 to 6 weeks. After the CP is stable, adding each new env DP is incremental.
Scale Ceiling	high	Scales horizontally by adding DPs. CP is the shared resource; watch its backing services as DP count grows.

Self-Hosted, Data Plane per Namespace

self-hosted-multi-dp-per-namespace | status: Situational

One shared control plane, one data plane per Kubernetes namespace within a shared cluster. Agent pods in each namespace cannot access control-plane secrets. Customers use namespaces to separate teams, agents, or use cases without paying the cost of a cluster per tenant.

Products

LangSmith (control plane, data plane)
LangSmith Deployment

Control plane

Location: Customer Env | cloud: AWS | region: us-east-1 | cluster: shared-platform-eks

Data planes

Multiplicity: Per Namespace

Name	Location	Cloud	Region	Cluster	Namespace	Isolation
tenant-a	Customer Env	AWS	us-east-1	shared-platform-eks	ls-dp-tenant-a	Kubernetes Namespace
tenant-b	Customer Env	AWS	us-east-1	shared-platform-eks	ls-dp-tenant-b	Kubernetes Namespace

Isolation and residency

Strongest boundary: Kubernetes Namespace
Network boundary: Kubernetes Namespace
Data residency scope: Per Account
Traces leave customer env: no
LLM traffic leaves customer env: yes

Flows and delivery

CP to DP: In-cluster, namespace-to-namespace (NetworkPolicy + service mesh mTLS)
DP to LLM: Egress Gateway
User to CP: SSO over Public Internet
Ingress: ALB-per-namespace, Envoy Gateway, or Istio (pick one)
IaC: Terraform + Helm
Upgrade cadence: Monthly
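The NetworkPolicy part of the namespace boundary can be illustrated with a generated manifest. The sketch below builds a standard `networking.k8s.io/v1` NetworkPolicy that admits ingress only from pods in the same namespace and from an explicit list of other namespaces. The policy name and the `ingress-gateway` namespace are hypothetical; which namespaces actually need access depends on the routing pattern chosen, and this is not a policy shipped with the LangSmith chart.

```python
import json


def tenant_network_policy(namespace: str, allowed_namespaces: list[str]) -> dict:
    """NetworkPolicy manifest limiting ingress into one data-plane namespace."""
    # An empty podSelector peer matches every pod in the policy's own namespace.
    peers = [{"podSelector": {}}]
    for ns in allowed_namespaces:
        # kubernetes.io/metadata.name is set automatically on namespaces
        # by current Kubernetes versions.
        peers.append(
            {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": ns}}}
        )
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "restrict-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},  # applies to all pods in the namespace
            "policyTypes": ["Ingress"],
            "ingress": [{"from": peers}],
        },
    }


policy = tenant_network_policy("ls-dp-tenant-a", ["ingress-gateway"])
print(json.dumps(policy, indent=2))
```

Generating one such manifest per tenant namespace keeps the allow-list explicit and reviewable.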

Assessment

Category	Score	Notes
Operational Burden	high	One upgrade path per data plane, locked to the same chart version as the control plane (cluster-scoped CRD). Namespace-scoped RBAC, quotas, network policies, and the LangSmith operator's WATCH_NAMESPACE all need explicit platform-team ownership.
Cost Delta	medium	Shared cluster amortizes node cost. Per-namespace data planes add pod-level overhead and egress gateway capacity.
Compliance Fit	high	Namespace isolation plus per-namespace data locality and separate secrets satisfy most financial-services and regulated-industry tenant-isolation requirements.
Complexity	high	Three valid routing patterns (ALB-per-namespace, Envoy Gateway, or Istio), per-namespace IRSA and service-account setup, host-backend RBAC into each DP namespace, and a cluster-scoped CRD that requires version-locked chart upgrades across all releases.
Failure Blast Radius	low	A single data plane failure is contained to that namespace. Shared cluster control-plane incidents are the one common-mode risk.
Skill Requirements	high	Platform team fluent in Kubernetes multi-tenancy, ALB/Envoy/Istio ingress, IAM (IRSA on AWS, workload identity on GCP/Azure), GitOps, and per-namespace secret distribution.
Time to First Trace	medium	4 to 8 weeks for platform build-out. Each subsequent namespace onboards in days once the platform is stable.
Scale Ceiling	high	Scales to dozens of namespaces and tens of millions of traces per day. Past that, expect to split clusters.

Self-Hosted, Cross-Cluster (Remote Data Planes)

self-hosted-cross-cluster | status: Situational

Control plane in one cluster; data planes in separate clusters, often in separate cloud accounts. Agent Servers call back to the CP for auth. When the CP and DPs live on different domains, cross-origin restrictions break the deployment UI (Assistants, Threads, Crons, Studio, HITL); a same-origin reverse proxy or colocation avoids this.
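The idea behind a same-origin reverse proxy is that the browser talks to a single domain, and the proxy maps path prefixes to the control plane or to a data plane. The sketch below models only that routing decision. The path layout and upstream hostnames are hypothetical and are not a documented LangSmith convention.

```python
# Most specific prefixes first; the final "/" entry is the catch-all
# that sends everything else to the control plane.
ROUTES = [
    ("/dp/account-a/", "https://dp-a.internal.example"),
    ("/dp/account-b/", "https://dp-b.internal.example"),
    ("/", "https://langsmith-cp.internal.example"),
]


def resolve_upstream(path: str) -> str:
    """Map a request path on the shared origin to the upstream URL it proxies to."""
    for prefix, upstream in ROUTES:
        if path.startswith(prefix):
            # Strip the routing prefix before forwarding.
            return upstream + "/" + path[len(prefix):]
    raise ValueError(f"path must start with '/': {path!r}")


print(resolve_upstream("/dp/account-a/threads/123"))
print(resolve_upstream("/api/v1/sessions"))
```

Because every request shares one origin in this layout, the browser does not treat data-plane calls as cross-origin.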

Products

LangSmith (control plane, data plane)
LangSmith Deployment

Control plane

Location: Customer Env | cloud: AWS | region: us-east-1 | cluster: cp-account-eks

Data planes

Multiplicity: Many per Cluster

Name	Location	Cloud	Region	Cluster	Namespace	Isolation
dp-account-a	Customer Env	AWS	us-east-1	dp-account-a-eks	langgraph	Cloud Account / Subscription
dp-account-b	Customer Env	AWS	us-east-1	dp-account-b-eks	langgraph	Cloud Account / Subscription

Isolation and residency

Strongest boundary: Cloud Account / Subscription
Network boundary: Cloud Account / Subscription
Data residency scope: Per Account
Traces leave customer env: no
LLM traffic leaves customer env: yes

Flows and delivery

CP to DP: mTLS over PrivateLink
DP to LLM: Direct Egress
User to CP: SSO over Public Internet
Ingress: Istio
IaC: Terraform + Helm
Upgrade cadence: Monthly

Assessment

Category	Score	Notes
Operational Burden	high	Each data plane is a separate Helm release, often in a separate cloud account and on its own domain. Upgrades fan out across accounts. The current cross-origin workarounds add a component (reverse proxy or sidecar) to operate.
Cost Delta	high	Per-account infrastructure duplication plus cross-account networking (PrivateLink or Transit Gateway). LLM egress billed per account.
Compliance Fit	high	Account-level isolation satisfies the strongest tenant-isolation requirements short of air-gap. Blast-radius and billing attribution are both clean.
Complexity	high	Cross-origin cookies, ingress gateway routes, per-cluster domains, and cross-account IAM. Routing workarounds require either a same-origin reverse proxy or an auth sidecar that injects API keys at the data plane ingress.
Failure Blast Radius	low	A single data plane outage affects only that workload. Control-plane failure affects auth validation for all data planes.
Skill Requirements	high	Multi-account cloud operations, service-mesh or ingress-gateway fluency, cross-account networking, plus familiarity with the current cross-origin routing workarounds.
Time to First Trace	high	8+ weeks typical. Cross-account networking approvals dominate the timeline. The first data plane is slower than subsequent ones.
Scale Ceiling	high	Scales horizontally by adding data-plane accounts. Control plane is the eventual bottleneck for auth validation throughput.

BYOC, Full Stack

byoc-full | status: In Development

LangChain operates the LangSmith stack inside a cloud account the customer owns. Control plane, data plane, and backing services all run in the customer's account. Trace payloads stay there; LangChain SREs operate via a scoped cross-account IAM role plus a Tailscale agent for break-glass access.

Products

LangSmith (control plane, data plane)
LangSmith Deployment

Control plane

Location: BYOC | cloud: AWS | region: us-east-1 | cluster: byoc-cp-eks

Data planes

Multiplicity: Single

Name	Location	Cloud	Region	Cluster	Namespace	Isolation
byoc-dp	BYOC	AWS	us-east-1	byoc-cp-eks	langsmith	Cloud Account / Subscription

Isolation and residency

Strongest boundary: Cloud Account / Subscription
Network boundary: Cloud Account / Subscription
Data residency scope: Per Account
Traces leave customer env: no
LLM traffic leaves customer env: yes

Flows and delivery

CP to DP: In-cluster (same cluster, typically same namespace)
DP to LLM: Direct Egress
User to CP: SSO over Public Internet
Ingress: Kubernetes Ingress (controller of your choice)
IaC: Terraform + Helm
Upgrade cadence: Monthly

Assessment

Category	Score	Notes
Operational Burden	low	LangChain operates the stack: upgrades, migrations, monitoring, and incident response. The customer owns the account and network boundary but does not run day-to-day operations.
Cost Delta	medium	Dedicated cloud infrastructure in the customer account plus a BYOC management fee. No multi-tenant amortization, so cost sits between SaaS and self-hosted.
Compliance Fit	high	Trace payloads never leave the customer account. LangChain access is via a scoped IAM role and Tailscale break-glass; CloudTrail in the customer account is the system of record for all operator activity. Fits most regulated-industry isolation requirements short of air-gap.
Complexity	medium	Split ownership: customer owns IAM, account, and network boundary; LangChain owns the application stack. Clear separation of concerns works well when both sides agree on the interface up front.
Failure Blast Radius	low	Single-tenant. A BYOC incident affects only that customer. No multi-tenant common-mode failure.
Skill Requirements	low	Customer team needs cloud-account and IAM skills only. No Kubernetes or Helm required on the customer side.
Time to First Trace	medium	2 to 6 weeks. Account provisioning, access-path setup, and security review drive the timeline more than software install.
Scale Ceiling	high	Vertical and horizontal scale inside the account is standard. Adding a second region is a follow-on motion rather than a replatform.

BYOC Control Plane, On-Prem Data Plane

byoc-cp-onprem-dp | status: In Development

LangChain operates the control plane inside a cloud account the customer owns; the data plane runs on the customer's on-prem Kubernetes. Agent code, customer data, and outbound LLM calls stay on-prem; trace data flows back over the private link to the BYOC control plane (and the ClickHouse it owns) in the customer cloud account.

Products

LangSmith (control plane, data plane)
LangSmith Deployment

Control plane

Location: BYOC | cloud: AWS | region: us-east-1 | cluster: byoc-cp-eks

Data planes

Multiplicity: Single

Name	Location	Cloud	Region	Cluster	Namespace	Isolation
onprem-dp	Customer Env	On-Prem	n/a	onprem-k8s	langsmith	Air-Gap

Isolation and residency

Strongest boundary: Air-Gap
Network boundary: Cloud Account / Subscription
Data residency scope: Per Account
Traces leave customer env: no
LLM traffic leaves customer env: no

Flows and delivery

CP to DP: VPN
DP to LLM: On-Prem LLM Gateway
User to CP: SSO over Public Internet
Ingress: Kubernetes Ingress (controller of your choice)
IaC: Terraform + Helm
Upgrade cadence: Monthly
Image sync: Skopeo, crane, or regclient
Private registry: Harbor, Artifactory, Quay, or equivalent

Assessment

Category	Score	Notes
Operational Burden	high	Split ownership: LangChain operates the cloud-side control plane, the customer operates the on-prem data plane. Coordinated upgrades and cross-environment debugging add overhead beyond either pure BYOC or pure self-hosted.
Cost Delta	high	Dedicated cloud infrastructure for the control plane plus on-prem compute and storage for the data plane. Cross-environment connectivity (VPN, Direct Connect, or ExpressRoute) adds recurring cost.
Compliance Fit	high	Agent execution and customer LLM traffic stay on-prem; the UI, metadata, and trace storage live in the customer-owned cloud account. Fits customers who need air-gap-adjacent isolation for the runtime but accept cloud storage for traces. Customers requiring trace data itself to stay on-prem should choose the air-gapped topology.
Complexity	high	Two environments to reason about: cloud-side control plane and on-prem data plane. Private connectivity, on-prem DNS, and on-prem LLM gateway integration all require customer-network expertise.
Failure Blast Radius	high	Single data plane; any on-prem outage stops agent traffic. Control plane outage affects auth validation and the UI but not agent execution already in flight.
Skill Requirements	high	On-prem Kubernetes expertise plus cloud-to-on-prem networking. The customer team must be comfortable operating Kubernetes outside a managed cloud provider.
Time to First Trace	high	8+ weeks typical. On-prem provisioning, private connectivity, and security reviews dominate the timeline; neither the control plane nor the data plane is quick to stand up on its own.
Scale Ceiling	medium	Limited by on-prem capacity. Horizontal scale is possible but involves coordinating cloud control-plane changes with on-prem data-plane expansion.
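The Failure Blast Radius note follows from the pull-based design: the data-plane listener polls the control plane, so a control-plane outage means no new changes arrive, while work already configured can keep running. The sketch below illustrates that idea only. It is not the actual listener implementation; `poll_once`, the config shape, and the fetch functions are hypothetical.

```python
from typing import Callable, Dict, Optional

Config = Dict[str, str]


def poll_once(fetch: Callable[[], Config], last_known: Optional[Config]) -> Optional[Config]:
    """Return freshly fetched config, or keep the last known one if the fetch fails."""
    try:
        return fetch()
    except ConnectionError:
        # Control plane unreachable: keep running with what we already have.
        return last_known


def healthy_fetch() -> Config:
    # Stand-in for an outbound HTTPS call to the control plane.
    return {"deployment": "agent-a", "revision": "2"}


def failing_fetch() -> Config:
    # Stand-in for the same call during a control-plane outage.
    raise ConnectionError("control plane unreachable")


state = poll_once(healthy_fetch, None)
state_after_outage = poll_once(failing_fetch, state)
print(state_after_outage)
```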

Fleet, Headless (FaaS)

fleet-headless-no-langsmith | status: In Development

Headless Fleet (Fleet-as-a-Service). The Fleet runtime runs in the customer's environment and exposes agent APIs; the customer builds their own UI on top. There is no LangSmith control plane and no LangSmith UI. Strong fit for product teams who want LangChain agents inside their own product rather than behind a separate LangChain UI. Several enterprise customers are pursuing this pattern.

Products

Fleet

Control plane

Location: Absent | instances: 0 (no control plane)

Data planes

Multiplicity: Single

Name	Location	Cloud	Region	Cluster	Namespace	Isolation
fleet-runtime	Customer Env	AWS	us-west-2	customer-eks	fleet	Kubernetes Cluster

Isolation and residency

Strongest boundary: Kubernetes Cluster
Network boundary: VPC / VNet
Data residency scope: Per Account
Traces leave customer env: no
LLM traffic leaves customer env: yes

Flows and delivery

CP to DP: Not Applicable
DP to LLM: Direct Egress
User to CP: Private DNS
Ingress: Kubernetes Ingress (controller of your choice)
IaC: Terraform + Helm
Upgrade cadence: Monthly
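The flow fields above can be cross-checked mechanically. The sketch below assumes a simplified record with hypothetical field names (it does not mirror the real schema in src/lib/topologies/data.ts) and two assumed rules: a topology with no control plane should list CP to DP as Not Applicable, and Direct Egress to the LLM implies LLM traffic leaves the customer environment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TopologyFlows:
    control_plane_location: str  # e.g. "BYOC" or "Absent"
    cp_to_dp: str                # e.g. "VPN" or "Not Applicable"
    dp_to_llm: str               # e.g. "Direct Egress" or "On-Prem LLM Gateway"
    llm_traffic_leaves_env: bool


def check_flows(t: TopologyFlows) -> List[str]:
    """Return human-readable problems; an empty list means the record is consistent."""
    problems = []
    if t.control_plane_location == "Absent" and t.cp_to_dp != "Not Applicable":
        problems.append("no control plane, so CP to DP should be Not Applicable")
    if t.dp_to_llm == "Direct Egress" and not t.llm_traffic_leaves_env:
        problems.append("direct egress implies LLM traffic leaves the environment")
    return problems


# Values taken from the headless Fleet entry.
headless = TopologyFlows("Absent", "Not Applicable", "Direct Egress", True)
# A deliberately inconsistent record, to show what the check reports.
broken = TopologyFlows("Absent", "VPN", "Direct Egress", False)

headless_problems = check_flows(headless)
broken_problems = check_flows(broken)
print(headless_problems, broken_problems)
```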

Assessment

Category	Score	Notes
Operational Burden	medium	Customer operates the Fleet runtime and its backing services. No LangSmith stack to maintain, no trace-ingestion pipeline. Upgrade discipline still matters for Fleet itself.
Cost Delta	medium	Smaller footprint than a full LangSmith install. Custom UI hosting and ongoing front-end engineering are the dominant costs.
Compliance Fit	high	Zero LangChain-hosted components. All agent state and traffic stay in the customer account. Strong fit for customers who want LangChain agent capabilities without any SaaS observability tier in the data path.
Complexity	high	Customer owns all UX, all observability surfaces, and all auth UX. No LangChain-provided Studio or traces UI; the value comes from embedding agent execution in your own product. API contracts are stabilizing as Fleet's decoupling work lands.
Failure Blast Radius	high	A Fleet runtime outage stops all agent traffic for the embedding application. No LangSmith tier in the data path means no separate trace-replay or rollback surface to fall back to; runtime resilience is the customer engineering team's responsibility.
Skill Requirements	high	Production Kubernetes plus full-stack front-end engineering. Teams must be comfortable building and operating a bespoke UI on Fleet APIs.
Time to First Trace	high	Not applicable in the traditional sense; there are no LangSmith traces to capture. Time-to-first-agent-request is set by how much custom UI is in scope; expect 8+ weeks for a production rollout that includes a polished customer-facing surface.
Scale Ceiling	high	The Fleet runtime scales independently; scaling the UI is the customer’s responsibility. No shared infrastructure bottleneck.
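Because every entry uses the same rubric, two assessments can be compared row by row. The sketch below parses tab-separated Category/Score rows and lists the categories where the scores differ; the scores are copied from the BYOC Control Plane, On-Prem Data Plane entry and from this entry, with the Notes column replaced by a placeholder. The helper names are hypothetical.

```python
from typing import Dict, Tuple


def parse_scores(table: str) -> Dict[str, str]:
    """Map each rubric category to its score from tab-separated rows."""
    scores = {}
    for line in table.strip().splitlines():
        category, score = line.split("\t")[:2]
        scores[category] = score
    return scores


def differing(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return only the categories whose scores differ, as (a_score, b_score)."""
    return {cat: (a[cat], b[cat]) for cat in a if a[cat] != b.get(cat)}


BYOC_CP_ONPREM_DP = "\n".join([
    "Operational Burden\thigh\t(notes omitted)",
    "Cost Delta\thigh\t(notes omitted)",
    "Compliance Fit\thigh\t(notes omitted)",
    "Complexity\thigh\t(notes omitted)",
    "Failure Blast Radius\thigh\t(notes omitted)",
    "Skill Requirements\thigh\t(notes omitted)",
    "Time to First Trace\thigh\t(notes omitted)",
    "Scale Ceiling\tmedium\t(notes omitted)",
])

FLEET_HEADLESS = "\n".join([
    "Operational Burden\tmedium\t(notes omitted)",
    "Cost Delta\tmedium\t(notes omitted)",
    "Compliance Fit\thigh\t(notes omitted)",
    "Complexity\thigh\t(notes omitted)",
    "Failure Blast Radius\thigh\t(notes omitted)",
    "Skill Requirements\thigh\t(notes omitted)",
    "Time to First Trace\thigh\t(notes omitted)",
    "Scale Ceiling\thigh\t(notes omitted)",
])

diffs = differing(parse_scores(BYOC_CP_ONPREM_DP), parse_scores(FLEET_HEADLESS))
for category, (byoc, fleet) in diffs.items():
    print(f"{category}: {byoc} vs {fleet}")
```

Note that a higher score is not uniformly better or worse across categories (high Compliance Fit is favorable, high Operational Burden is not), so the sketch reports differences rather than ranking the topologies.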

Related

self-hosted-single-cluster
