Quickstart
Get from zero to a running LangSmith instance on EKS in under an hour.
# 1 — Unzip the Terraform modules provided by your LangChain SA
unzip aws.zip
cd aws
# 2 — Generate terraform.tfvars interactively
# Re-running is safe — Enter accepts current values
make quickstart
# 3 — Store secrets in SSM Parameter Store
source infra/scripts/setup-env.sh
# 4 — Deploy infrastructure (~20–25 min)
make init
make plan
make apply
# 5 — Configure kubectl
make kubeconfig
# Verify nodes are ready
kubectl get nodes
# 6 — Deploy LangSmith
cd helm
source scripts/init-values.sh
bash scripts/deploy.sh
# 7 — Get the endpoint
kubectl get svc -n langsmith
LangSmith on AWS
Self-hosted deployment on EKS, managed with Terraform.
Architecture
AWS resources created
| Resource | Type | Purpose |
|---|---|---|
| VPC | aws_vpc | Isolated network — 5 private, 3 public subnets across AZs |
| NAT Gateway | aws_nat_gateway | Outbound internet access for private subnets |
| EKS Cluster | aws_eks_cluster | Kubernetes — managed node groups with autoscaling |
| EBS CSI Driver | EKS addon | Persistent volume support |
| ALB Controller | EKS addon (Blueprints) | AWS Application Load Balancer ingress |
| Cluster Autoscaler | EKS addon (Blueprints) | Node autoscaling |
| RDS PostgreSQL | aws_db_instance | PostgreSQL 14 — org config, run metadata, graph checkpoints |
| ElastiCache Redis | aws_elasticache_cluster | Redis 7.0 — trace ingestion queue, pub/sub |
| S3 Bucket | aws_s3_bucket | Raw trace objects — VPC endpoint only access |
| S3 VPC Endpoint | aws_vpc_endpoint | Private S3 access without internet routing |
| IRSA Role | aws_iam_role | IAM Roles for Service Accounts — pod-level S3 access |
| GP3 Storage Class | Kubernetes | Default storage class with volume expansion |
| Network Firewall | aws_networkfirewall_firewall | FQDN-based egress filtering — opt-in (create_firewall = true). Inspects TLS SNI and HTTP Host headers; drops all traffic not in the domain allowlist. |
Prerequisites
Required tools
# AWS CLI v2
# macOS
brew install awscli
# Linux: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
# Terraform (>= 1.5)
brew tap hashicorp/tap && brew install hashicorp/tap/terraform
# kubectl
brew install kubectl
# Helm (>= 3.12)
brew install helm
# eksctl (useful for kubeconfig and debugging)
brew install eksctl
# Verify
aws --version
terraform version
kubectl version --client
helm version
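The version commands above stop at the first missing tool. As a minimal sketch (the function name `missing_tools` is hypothetical and not part of the provided scripts), the following reports every missing command in one pass by probing the PATH with `command -v`:

```shell
# Report which of the given commands are missing from PATH.
# Prints one "missing: <name>" line per absent command and
# returns 1 if any command is absent, 0 otherwise.
missing_tools() {
  mt_status=0
  for mt_tool in "$@"; do
    if ! command -v "$mt_tool" >/dev/null 2>&1; then
      echo "missing: $mt_tool"
      mt_status=1
    fi
  done
  return "$mt_status"
}
```

For example, `missing_tools aws terraform kubectl helm eksctl` prints nothing and returns 0 when all five tools are installed. It does not check minimum versions.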
Required AWS IAM permissions
The IAM user or role running Terraform needs the following managed policies (or equivalent inline policies):
| Policy | Purpose |
|---|---|
| AmazonEKSClusterPolicy | Create and manage EKS clusters |
| AmazonVPCFullAccess | Create VPC, subnets, route tables, NAT |
| AmazonRDSFullAccess | Create and manage RDS instances |
| AmazonElastiCacheFullAccess | Create ElastiCache clusters |
| AmazonS3FullAccess | Create S3 buckets and VPC endpoints |
| IAMFullAccess | Create IRSA roles and policies |
Authenticate and configure
# Configure AWS credentials
aws configure
# or use environment variables:
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-west-2
# Verify access
aws sts get-caller-identity
aws ec2 describe-availability-zones --query 'AvailabilityZones[].ZoneName' --output table
Repository Layout
terraform/aws/
├── infra/ ← Terraform root: run terraform from here
│   ├── main.tf ← Wires all sub-modules, IRSA + ESO role setup
│   ├── variables.tf ← All configurable inputs with defaults
│   ├── scripts/ ← setup-env.sh, set-kubeconfig.sh, preflight.sh, quickstart.sh, quickdeploy.sh, secrets-status.sh
│   └── modules/
│       ├── vpc/ ← VPC, subnets (5 private / 3 public), NAT, route tables
│       ├── eks/ ← EKS cluster, node groups, addons, IRSA role, GP3 storage class
│       ├── postgres/ ← RDS PostgreSQL, subnet group, security group, IAM auth
│       ├── redis/ ← ElastiCache Redis, subnet group, security group
│       ├── storage/ ← S3 bucket, VPC endpoint, bucket policy (VPC-only access)
│       ├── alb/ ← Pre-provisioned ALB (opt-in; ALB access logs opt-in)
│       ├── cloudtrail/ ← CloudTrail trail + S3 bucket (opt-in)
│       ├── waf/ ← WAFv2 Web ACL attached to ALB (opt-in)
│       ├── firewall/ ← AWS Network Firewall, FQDN egress filter (opt-in)
│       └── k8s-bootstrap/ ← Namespace, KEDA, cert-manager, ESO, Envoy Gateway (opt-in)
└── helm/
    ├── scripts/ ← init-values.sh, deploy.sh, apply-eso.sh, tls.sh, uninstall.sh
    └── values/
        ├── examples/ ← Reference templates (sizing, addons, Envoy Gateway, dataplane)
        │   ├── langsmith-values-ingress-envoy-gateway.yaml ← Gateway API overlay
        │   ├── langsmith-values-dataplane.yaml ← langgraph-dataplane chart values
        │   └── dataplane-rbac.yaml ← RBAC for dataplane namespace
        └── ... ← Active base + overrides + sizing/addon files
Set create_vpc = false and provide vpc_id, private_subnet_ids, and public_subnet_ids to use an existing VPC. The EKS and RDS modules will deploy into the provided subnets.
Configuration
Create a terraform.tfvars file in terraform/aws/infra/:
# Resource naming — all resources are named {name_prefix}-{environment}-*
name_prefix = "acme" # short identifier, lowercase, no spaces
environment = "production" # or "dev", "staging", etc.
# AWS region
region = "us-west-2"
# VPC — set create_vpc = false to use an existing VPC
create_vpc = true
# vpc_id = "vpc-xxxxxxxx" # if create_vpc = false
# private_subnet_ids = ["subnet-xx", ...] # if create_vpc = false
# public_subnet_ids = ["subnet-xx", ...] # if create_vpc = false
# EKS
eks_cluster_version = "1.31"
eks_managed_node_groups = {
default = {
name = "node-group-default"
instance_types = ["m5.4xlarge"]
min_size = 3
max_size = 10
}
}
# RDS PostgreSQL
postgres_instance_type = "db.t3.large"
postgres_storage_gb = 10
# ElastiCache Redis
redis_instance_type = "cache.m6g.xlarge"
# Gateway mode — pick ONE (all false = ALB native, simplest)
# enable_nginx_ingress = true # ALB → NGINX controller → pods
# enable_envoy_gateway = true # ALB → Envoy proxy:10080 → HTTPRoutes (split dataplane)
# enable_istio_gateway = true # ALB → Istio:80 → VirtualService (mTLS mesh)
# TLS: "none" (HTTP), "acm" (ACM cert on ALB), "letsencrypt" (cert-manager DNS-01)
tls_certificate_source = "none"
# langsmith_domain = "langsmith.example.com" # auto-provisions Route 53 zone + ACM cert
# Sizing + addon flags
sizing_profile = "production"
# enable_deployments = true
# enable_agent_builder = true
# enable_insights = true
# enable_polly = true
Terraform state backend (recommended)
# backend.tf
terraform {
backend "s3" {
bucket = "your-terraform-state-bucket"
key = "langsmith/aws/terraform.tfstate"
region = "us-west-2"
}
}
Pass 1 — Required Infrastructure
After running make quickstart and source infra/scripts/setup-env.sh once, use make quickdeploy to chain Pass 1 + Pass 2 in one command. It gates on secrets being loaded and terraform.tfvars existing, then runs: terraform apply → kubeconfig → init-values → helm deploy.
cd terraform/aws
# Generate terraform.tfvars interactively (re-run safe — Enter accepts current values)
make quickstart
# Prompts for license key and admin password; auto-generates salt and JWT secret.
# Stores all secrets in SSM Parameter Store — sourced (not executed) to export TF_VAR_*
source infra/scripts/setup-env.sh
make init
make plan
make apply
After sourcing setup-env.sh, run make secrets to confirm all SSM parameters are set and TF_VAR_* variables are exported before running make plan.
Always use source infra/scripts/setup-env.sh, not ./infra/scripts/setup-env.sh. The script exports TF_VAR_postgres_password and TF_VAR_redis_auth_token into the calling shell. Executing it without source exports them only in a child process that exits immediately, so nothing reaches your shell and Terraform fails at plan time. Run make setup-env if you forget the exact command.
Set enable_envoy_gateway = true in terraform.tfvars before running make apply to install Envoy Gateway as an alternative to the ALB ingress controller. This is required for multi-namespace dataplane (LangSmith Deployments) setups where agent pods run in a separate namespace. When enabled, Pass 2 must use the langsmith-values-ingress-envoy-gateway.yaml overlay instead of the standard ALB ingress values.
After apply — configure kubectl
# Sets kubeconfig to the EKS cluster
make kubeconfig
# Verify cluster access
kubectl get nodes
kubectl get pods -n kube-system
Run make preflight-post after make apply to verify the cluster is reachable, all SSM parameters are present, and Helm values files exist before starting Pass 2.
Pass 1 — Infrastructure
Provisions: VPC, EKS cluster, RDS PostgreSQL, ElastiCache Redis, S3 bucket + VPC endpoint, ALB, IRSA role, ESO IRSA role, SSM secrets.
Pass 2 — Required LangSmith
cd terraform/aws
# Reads Terraform outputs and generates Helm values with
# RDS endpoint, Redis endpoint, S3 bucket, IRSA role ARN, and regional S3 API URL
make init-values
# Applies ESO ClusterSecretStore + ExternalSecret (syncs SSM → K8s secret),
# then runs helm upgrade --install
make deploy
On the first deploy, config.hostname is blank. After the Helm release completes, deploy.sh reads the ALB hostname from the ingress and automatically writes it into langsmith-values-{env}.yaml, then runs a second Helm upgrade to lock in the hostname. This is expected, not an error.
Resource sizing
Set sizing_profile in infra/terraform.tfvars, then re-run make init-values to copy the matching values file:
# infra/terraform.tfvars: set exactly one sizing_profile and leave the alternatives commented out
sizing_profile = "production" # multi-replica with HPA (recommended)
# sizing_profile = "production-large" # high-volume (~50 users, ~1000 traces/sec)
# sizing_profile = "dev" # single-replica, minimal resources (dev/CI/demos)
# sizing_profile = "default" # chart defaults (no sizing overlay applied)
# Re-generate values after changing sizing_profile:
make init-values && make deploy
Verify
kubectl get pods -n langsmith
kubectl get ingress -n langsmith
Pass 2 — LangSmith Application
Two paths — pick one:
Fast Path — Single Command Deploy
If source infra/scripts/setup-env.sh and make quickstart have already been run, you can chain all of Pass 1 and Pass 2 in one command:
make quickdeploy
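The fast path is described as gating on secrets being loaded and terraform.tfvars existing. A rough sketch of an equivalent manual check follows; the function name `quickdeploy_ready` and the default path are illustrative assumptions, and the real gate inside quickdeploy.sh may test more than this. The two variable names are the ones setup-env.sh is documented to export.

```shell
# Return 0 only when terraform.tfvars exists and the two secrets
# exported by setup-env.sh are set in the current shell.
# $1: path to terraform.tfvars (default is an assumption).
quickdeploy_ready() {
  qr_tfvars="${1:-infra/terraform.tfvars}"
  qr_status=0
  if [ ! -f "$qr_tfvars" ]; then
    echo "not ready: $qr_tfvars not found (run make quickstart)"
    qr_status=1
  fi
  for qr_name in TF_VAR_postgres_password TF_VAR_redis_auth_token; do
    # Indirect lookup: read the variable whose name is stored in qr_name.
    eval "qr_value=\${$qr_name:-}"
    if [ -z "$qr_value" ]; then
      echo "not ready: $qr_name is empty (source infra/scripts/setup-env.sh)"
      qr_status=1
    fi
  done
  return "$qr_status"
}
```

Run from terraform/aws, `quickdeploy_ready && make quickdeploy` starts the deploy only when both conditions hold. Because the check reads shell variables, it has to run in the same shell that sourced setup-env.sh.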
Pass 3 — Optional LangSmith Deployments
config.hostname must be set in langsmith-values-{env}.yaml before enabling Deployments. The operator uses the hostname to construct agent endpoint URLs. If it is blank, deployed graphs will never reach RUNNING.
Set the flag in infra/terraform.tfvars and re-run init-values:
# infra/terraform.tfvars
enable_deployments = true
init-values.sh reads your tls_certificate_source from terraform.tfvars and sets config.deployment.tlsEnabled accordingly. You do not need to edit the values file manually.
cd terraform/aws
make init-values # copies langsmith-values-agent-deploys.yaml with correct TLS setting
make deploy
KEDA is installed during Pass 1 by the k8s-bootstrap Terraform module, so no manual KEDA install is needed.
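Since a blank config.hostname keeps deployed graphs from reaching RUNNING, it can help to check the values file before enabling Deployments. The sketch below assumes the file has a top-level `config:` key with `hostname:` indented two spaces beneath it; that layout and both function names are assumptions, not something the chart documentation here confirms.

```shell
# Print the value of config.hostname from a Helm values file,
# or an empty string when the key is blank or absent.
config_hostname() {
  awk '
    # Track whether the current top-level key is "config".
    /^[^ #]/ { in_config = ($0 ~ /^config:/) }
    in_config && /^  hostname:/ {
      value = $0
      sub(/^  hostname:[ ]*/, "", value)   # drop the key
      sub(/[ ]*#.*$/, "", value)           # drop a trailing comment
      sub(/[ ]+$/, "", value)              # drop trailing spaces
      print value
      exit
    }
  ' "$1" | tr -d "\"'"
}

# Return 0 when config.hostname is non-empty.
hostname_is_set() {
  [ -n "$(config_hostname "$1")" ]
}
```

A call such as `hostname_is_set path/to/langsmith-values-<env>.yaml || echo "set config.hostname first"` would flag the blank case. This is a line-based scan, not a YAML parser, so unusual formatting can defeat it.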
Pass 4 — Optional Agent Builder
langsmith-values-agent-deploys.yaml must be present. Agent Builder requires config.deployment.enabled: true. Agent Builder also requires a license entitlement; contact LangChain if this feature is not visible in your UI.
1. Generate and store the encryption key
The Agent Builder encryption key is generated once and stored in SSM Parameter Store. It is pulled into the cluster by ESO automatically when the values file is present.
# Generate a Fernet key
KEY=$(python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
# Store in SSM — replace <name_prefix> and <environment> with your terraform.tfvars values
aws ssm put-parameter \
--region <region> \
--name "/langsmith/<name_prefix>-<environment>/agent-builder-encryption-key" \
--value "$KEY" \
--type SecureString
2. Enable Agent Builder
Set the flag in infra/terraform.tfvars and re-run init-values:
# infra/terraform.tfvars
enable_agent_builder = true # requires enable_deployments = true
cd terraform/aws
make init-values # copies langsmith-values-agent-builder.yaml
make deploy
Verify
# Agent Builder pods (appear after bootstrap job completes — ~5 min)
kubectl get pods -n langsmith | grep -E "agent-builder|lg-"
# Bootstrap job status
kubectl get jobs -n langsmith | grep bootstrap
Pass 5 — Optional Insights
1. Generate and store the encryption key
KEY=$(python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
aws ssm put-parameter \
--region <region> \
--name "/langsmith/<name_prefix>-<environment>/insights-encryption-key" \
--value "$KEY" \
--type SecureString
2. Enable Insights
Set the flag and ClickHouse connection details in infra/terraform.tfvars:
# infra/terraform.tfvars
enable_insights = true
clickhouse_source = "external"
# Fill in after obtaining your ClickHouse credentials:
# clickhouse_host = "<clickhouse-hostname>"
# clickhouse_port = 9440 # native protocol (TLS)
# clickhouse_tls = true
cd terraform/aws
make init-values # copies langsmith-values-insights.yaml with your ClickHouse config
make deploy
Verify
# Insights deploys a Clio pod on first invocation from the UI
kubectl get pods -n langsmith | grep clio
Architecture Overview
Ingress / Gateway modes
Four mutually exclusive gateway modes are supported. Set at most one flag in infra/terraform.tfvars; with no flag set, the ALB native mode is used:
| Mode | Variable | Traffic path | TLS |
|---|---|---|---|
| ALB native (default) | none | ALB → frontend NodePort | ACM or Let's Encrypt HTTP-01 |
| NGINX | enable_nginx_ingress = true | ALB → TGB → NGINX:80 → ClusterIP | ACM (terminates at ALB) |
| Envoy Gateway | enable_envoy_gateway = true | ALB → TGB → Envoy proxy:10080 → HTTPRoute | ACM (terminates at ALB) |
| Istio | enable_istio_gateway = true | ALB → TGB → Istio:80 → VirtualService | cert-manager DNS-01 (in-cluster) |
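Because the three gateway flags are mutually exclusive, a quick check of terraform.tfvars before make apply can catch an accidental double selection. This is a minimal sketch that only scans uncommented lines with grep; the function name `check_gateway_mode` is hypothetical, and it does not reproduce whatever validation the Terraform modules themselves perform.

```shell
# Count uncommented gateway flags set to true in a tfvars file and
# fail when more than one is enabled. Zero flags means ALB native.
check_gateway_mode() {
  if [ ! -f "$1" ]; then
    echo "error: $1 not found"
    return 2
  fi
  # grep -c exits non-zero on a count of 0, so "|| true" keeps the "0" output.
  cg_count=$(grep -E -c '^[[:space:]]*enable_(nginx_ingress|envoy_gateway|istio_gateway)[[:space:]]*=[[:space:]]*true' "$1" || true)
  if [ "$cg_count" -gt 1 ]; then
    echo "error: $cg_count gateway flags are true; set at most one"
    return 1
  fi
  echo "ok: $cg_count gateway flag(s) enabled"
}
```

For example, `check_gateway_mode infra/terraform.tfvars` reports `ok: 0 gateway flag(s) enabled` for the default ALB native mode.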
IRSA — IAM Roles for Service Accounts
EKS pods access S3 using IRSA — the Kubernetes service account is annotated with an IAM role ARN. AWS injects temporary credentials via the Pod Identity Webhook. No static AWS credentials are stored in Kubernetes secrets.
S3 VPC endpoint
A Gateway VPC Endpoint for S3 is created and associated with all private route tables. Trace data written to S3 never traverses the public internet — traffic routes directly from the EKS nodes to S3 within the AWS network.
Network Firewall (opt-in)
Setting create_firewall = true deploys an AWS Network Firewall between the private subnets and the NAT gateway. All outbound internet traffic is inspected using a domain allowlist — only FQDNs in firewall_allowed_fqdns are permitted; everything else is dropped. The firewall inspects TLS SNI for HTTPS and the HTTP Host header for plaintext traffic. Internal VPC traffic (pod-to-pod, pod-to-RDS, pod-to-ElastiCache) routes via the local VPC route and bypasses the firewall entirely.
Requires create_vpc = true. Cost: ~$0.395/hr per endpoint + $0.065/GB data processed.
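To turn the hourly and per-GB rates into a monthly figure, the arithmetic can be sketched as below. The 720-hour month is an assumption (30 days), and the function name `firewall_monthly_cost` is illustrative; actual AWS billing depends on the region and the real number of hours in the month.

```shell
# Estimate the monthly Network Firewall cost in USD.
# $1: number of firewall endpoints
# $2: GB of traffic processed per month
# Rates are the ones quoted above; 720 hours per month is an assumption.
firewall_monthly_cost() {
  awk -v endpoints="$1" -v gb="$2" 'BEGIN {
    printf "%.2f\n", endpoints * 0.395 * 720 + gb * 0.065
  }'
}
```

With one endpoint and no traffic, `firewall_monthly_cost 1 0` prints `284.40`, which is the base cost before any data processing charges. Adding 100 GB of processed traffic brings it to `290.90`.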
Variable Reference
| Variable | Default | Description |
|---|---|---|
| name_prefix | required | Short identifier (max 15 chars, lowercase); all resources are named {name_prefix}-{environment}-* |
| environment | required | Environment label (production, dev, etc.); part of all resource names |
| region | us-west-2 | AWS region for all resources |
| create_vpc | true | Create a new VPC. Set false to use existing. |
| vpc_id | "" | Existing VPC ID (if create_vpc = false) |
| private_subnet_ids | [] | Existing private subnet IDs (if create_vpc = false) |
| public_subnet_ids | [] | Existing public subnet IDs (if create_vpc = false) |
| eks_cluster_version | 1.31 | EKS Kubernetes version |
| eks_managed_node_groups | {default: m5.4xlarge} | Managed node group definitions (instance type, min/max size) |
| enable_public_eks_cluster | true | Enable public EKS API endpoint. Set false for a private endpoint, which requires a bastion. |
| create_langsmith_irsa_role | true | Create IRSA role for LangSmith pods (S3 access) |
| postgres_instance_type | db.t3.large | RDS instance class |
| postgres_storage_gb | 10 | Initial RDS storage in GB (autoscales to max 100 GB) |
| postgres_iam_database_authentication_enabled | true | Enable IAM database authentication for RDS |
| redis_instance_type | cache.m6g.xlarge | ElastiCache node type |
| sizing_profile | default | Helm sizing: production, production-large, dev, minimum, default |
| enable_deployments | false | Enable LangSmith Deployments: listener, operator, host-backend (Pass 3) |
| enable_agent_builder | false | Enable Agent Builder UI (requires enable_deployments) |
| enable_insights | false | Enable ClickHouse-backed analytics (requires external ClickHouse) |
| enable_polly | false | Enable Polly AI eval/monitoring (requires enable_deployments + Polly entitlement) |
| enable_envoy_gateway | false | Install Envoy Gateway (Gateway API) as an alternative to ALB ingress. Required for multi-namespace dataplane deployments. Installs gateway-helm v1.3.0 and creates GatewayClass eg + Gateway langsmith-gateway. |
| alb_access_logs_enabled | false | Enable ALB access logging to a dedicated S3 bucket. Useful for traffic analysis and compliance. |
| create_cloudtrail | false | Create a CloudTrail trail logging all AWS API calls to S3. Skip if an account-level or org-level trail already exists. |
| cloudtrail_multi_region | true | Record API calls across all regions. Recommended, because single-region trails miss global service events. |
| cloudtrail_log_retention_days | 365 | Days to retain CloudTrail logs in S3. Set 0 to keep indefinitely. |
| create_waf | false | Attach a WAFv2 Web ACL to the ALB. Includes AWS managed rules for OWASP Top 10, IP reputation, and known bad inputs. Cost: ~$8–10/mo base. |
| create_firewall | false | Deploy AWS Network Firewall for FQDN-based egress filtering. Intercepts all outbound internet traffic from private subnets and drops everything not in firewall_allowed_fqdns. Requires create_vpc = true. Cost: ~$0.395/hr/endpoint + $0.065/GB processed (~$285/mo base). |
| firewall_allowed_fqdns | ["beacon.langchain.com"] | Domains allowed for outbound internet traffic when create_firewall = true. Matched against TLS SNI (HTTPS) and HTTP Host header. All other destinations are dropped. Add entries for LangChain Managed ClickHouse, model providers, or package registries as needed. |
| firewall_subnet_cidr | 10.0.64.0/21 | CIDR for the firewall subnet. Must be within the VPC CIDR and must not overlap with the default private (10.0.0.0/21–10.0.32.0/21) or public (10.0.40.0/21–10.0.56.0/21) subnets. |
| enable_nginx_ingress | false | Install NGINX ingress-nginx controller. ALB TGB wires the pre-provisioned ALB target group to the NGINX controller pods. Mutually exclusive with Envoy and Istio. |
| enable_envoy_gateway | false | Install Envoy Gateway controller. ALB TGB targets Envoy proxy pods on port 10080. Supports cross-namespace HTTPRoute routing (split dataplane). Mutually exclusive with NGINX and Istio. |
| enable_istio_gateway | false | Configure Istio gateway resources (requires Istio installed separately). ALB TGB targets istio-ingressgateway on port 80 (Istio 1.23+ uses NET_BIND_SERVICE). Mutually exclusive with NGINX and Envoy. |
| istio_nlb_scheme | "internet-facing" | Scheme for the Istio ingress gateway NLB. "internet-facing" for public access, "internal" for VPC-only. Only used when enable_istio_gateway = true. |
| tls_certificate_source | none | TLS mode: none (HTTP only), acm (ACM certificate on ALB; works with ALB and NGINX), letsencrypt (cert-manager DNS-01 via Route 53, for Istio/Envoy in-cluster TLS). |
| langsmith_domain | "" | Custom domain (e.g. langsmith.example.com). When set with tls_certificate_source = "acm" and no acm_certificate_arn, Terraform creates a Route 53 hosted zone, ACM certificate, and DNS alias automatically. |
| postgres_source | external | PostgreSQL backend: external (RDS, recommended) or in-cluster (Helm-managed single pod, dev/POC only). |
| redis_source | external | Redis backend: external (ElastiCache, recommended) or in-cluster (Helm-managed, no auth, dev/POC only). |
| clickhouse_source | in-cluster | ClickHouse backend: in-cluster (single StatefulSet pod; dev/POC only, no replication or backups) or external (required for Insights/Pass 5). |
| langsmith_deployments_encryption_key | auto-generated | Fernet key for LangSmith Deployments (Pass 3). Generated by setup-env.sh on first run. Must stay stable: rotating it after deploy invalidates all deployment state. |
| langsmith_agent_builder_encryption_key | manual | Fernet key for Agent Builder (Pass 4). Generate with python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())" and store in SSM at /langsmith/{name_prefix}-{environment}/agent-builder-encryption-key. |
| langsmith_insights_encryption_key | manual | Fernet key for Insights (Pass 5). Same generation method as Agent Builder. Store in SSM at /langsmith/{name_prefix}-{environment}/insights-encryption-key. Changing this key permanently corrupts existing insights data. |
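The manually managed encryption keys in the last rows of this table are stored under a fixed SSM path pattern, and a malformed key is easier to catch before it is stored than after. The helpers below are a sketch with hypothetical names. The shape check relies on a Fernet key being 32 random bytes in URL-safe base64, which is 43 characters followed by one `=` pad; it validates form only and cannot tell whether a key matches data already encrypted.

```shell
# Build the SSM parameter path for a LangSmith secret.
# $1: name_prefix, $2: environment, $3: parameter name
ssm_param_path() {
  printf '/langsmith/%s-%s/%s\n' "$1" "$2" "$3"
}

# Return 0 if $1 has the shape of a Fernet key:
# 43 URL-safe base64 characters followed by a single "=".
looks_like_fernet_key() {
  printf '%s' "$1" | grep -E -q '^[A-Za-z0-9_-]{43}=$'
}
```

For example, `ssm_param_path acme production insights-encryption-key` prints `/langsmith/acme-production/insights-encryption-key`, and `looks_like_fernet_key "$KEY" || echo "KEY is not a valid Fernet key shape"` can run just before `aws ssm put-parameter`.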
Variable Reference
| Variable | Default | Required | Description |
|---|---|---|---|
| name_prefix | (none) | yes | Prefix for all resource names (1–11 chars, lowercase) |
| environment | dev | no | Environment: dev, staging, prod, test, uat |
| region | us-west-2 | no | AWS region for all resources |
| create_vpc | true | no | Create a new VPC (set false to use existing) |
| vpc_id | null | when !create_vpc | Existing VPC ID |
| private_subnets | [] | when !create_vpc | Existing private subnet IDs |
| public_subnets | [] | when !create_vpc | Existing public subnet IDs |
| vpc_cidr_block | null | when !create_vpc | Existing VPC CIDR block |
| enable_public_eks_cluster | true | no | Enable public EKS API endpoint |
| eks_public_access_cidrs | ["0.0.0.0/0"] | no | CIDRs allowed to reach the public EKS API endpoint |
| eks_cluster_version | 1.31 | no | EKS Kubernetes version |
| eks_managed_node_group_defaults | {ami_type: AL2023} | no | Default config for managed node groups |
| eks_managed_node_groups | {default: m5.4xlarge} | no | Managed node group definitions |
| create_gp3_storage_class | true | no | Create and set gp3 as default StorageClass |
| eks_cluster_enabled_log_types | ["api", "audit", ...] | no | EKS control plane log types (CloudWatch) |
| eks_addons | {} | no | EKS managed add-on configurations |
| create_langsmith_irsa_role | true | no | Create IRSA role for LangSmith pods (S3 access) |
| postgres_source | external | no | external (RDS) or in-cluster (Helm) |
| postgres_instance_type | db.t3.large | no | RDS instance class |
| postgres_storage_gb | 10 | no | Initial RDS storage in GB |
| postgres_max_storage_gb | 100 | no | Maximum RDS storage in GB (autoscaling) |
| postgres_username | langsmith | no | RDS database username |
| postgres_engine_version | 16 | no | PostgreSQL engine version for RDS |
| postgres_password | "" | when external | RDS password; use TF_VAR_postgres_password |
| postgres_iam_database_authentication_enabled | true | no | Enable IAM database authentication on RDS |
| postgres_deletion_protection | true | no | Enable deletion protection on RDS |
| postgres_backup_retention_period | 7 | no | Days to retain automated RDS backups (0 = disabled) |
| redis_source | external | no | external (ElastiCache) or in-cluster (Helm) |
| redis_instance_type | cache.m6g.xlarge | no | ElastiCache node type |
| redis_auth_token | "" | when external | ElastiCache auth token (min 16 chars); use TF_VAR_redis_auth_token |
| s3_ttl_enabled | true | no | Enable S3 lifecycle rules for trace TTL |
| s3_ttl_short_days | 14 | no | TTL for ttl_s/ prefix in days |
| s3_ttl_long_days | 400 | no | TTL for ttl_l/ prefix in days |
| s3_kms_key_arn | "" | no | KMS CMK ARN for S3 encryption (empty = SSE-S3) |
| s3_versioning_enabled | false | no | Enable S3 bucket versioning |
| tls_certificate_source | acm | no | acm, letsencrypt, or none |
| acm_certificate_arn | "" | when acm | ACM certificate ARN |
| letsencrypt_email | "" | when letsencrypt | Email for Let's Encrypt |
| langsmith_domain | "" | no | Custom hostname (empty = use ALB DNS name) |
| langsmith_namespace | langsmith | no | Kubernetes namespace for LangSmith |
| clickhouse_source | in-cluster | no | in-cluster or external |
| alb_scheme | internet-facing | no | ALB scheme: internet-facing or internal |
| alb_access_logs_enabled | false | no | Enable ALB access logging to S3 |
| create_bastion | false | no | Create EC2 bastion host for private cluster access (SSM or SSH) |
| bastion_instance_type | t3.micro | no | EC2 instance type for bastion |
| bastion_key_name | null | no | EC2 key pair for SSH (empty = SSM only) |
| bastion_enable_ssh | false | no | Open port 22 on bastion security group |
| bastion_ssh_allowed_cidrs | [] | no | CIDRs allowed to SSH to bastion |
| bastion_root_volume_size_gb | 20 | no | Root EBS volume size for bastion |
| create_cloudtrail | false | no | Create CloudTrail trail for AWS API audit |
| cloudtrail_multi_region | true | no | Record API calls across all regions |
| cloudtrail_log_retention_days | 365 | no | Days to retain CloudTrail logs |
| create_waf | false | no | Attach WAFv2 Web ACL to ALB |
| create_firewall | false | no | Deploy AWS Network Firewall for FQDN-based egress filtering. Requires create_vpc = true. Cost: ~$0.395/hr/endpoint + $0.065/GB. |
| firewall_allowed_fqdns | ["beacon.langchain.com"] | no | Domains allowed for outbound internet traffic when create_firewall = true. Matched against TLS SNI (HTTPS) and HTTP Host header. All other destinations are dropped. |
| firewall_subnet_cidr | "10.0.64.0/21" | no | CIDR for the firewall subnet. Must not overlap with private (10.0.0.0/21–10.0.32.0/21) or public (10.0.40.0/21–10.0.56.0/21) subnets. |
| sizing_profile | default | no | Helm sizing: production, production-large, dev, minimum, default |
| enable_deployments | false | no | Enable LangGraph Platform (listener, operator, host-backend) |
| enable_agent_builder | false | no | Enable Agent Builder (requires enable_deployments) |
| enable_insights | false | no | Enable ClickHouse-backed analytics |
| enable_polly | false | no | Enable Polly AI eval/monitoring (requires enable_deployments) |
| enable_usage_telemetry | false | no | Enable extended usage telemetry reporting |
| langsmith_deployments_encryption_key | "" | no | Fernet key for LangSmith Deployments |
| langsmith_agent_builder_encryption_key | "" | no | Fernet key for Agent Builder |
| langsmith_insights_encryption_key | "" | no | Fernet key for Insights |
| owner | "" | no | Owner tag applied to all resources |
| cost_center | "" | no | Cost center tag for billing |
| tags | {} | no | Additional tags applied to all resources |
Quick Reference
First-time setup
cd terraform/aws
make quickstart # generates terraform.tfvars interactively
source infra/scripts/setup-env.sh # creates SSM secrets + exports TF_VAR_* (must be sourced)
make init && make plan && make apply # ~20–25 min
make kubeconfig
make init-values
make deploy
Day-2 operations
make status # full deployment health check
make deploy # re-deploy after changing Helm values or upgrading chart
make init-values # re-generate values after Terraform changes
make apply-eso # re-sync ESO secrets without redeploying
make ssm # manage SSM Parameter Store secrets interactively
make kubeconfig # refresh cluster credentials
5-pass summary
| Pass | What | Command |
|---|---|---|
| 1 | VPC + EKS + RDS + ElastiCache + S3 + IRSA + ESO + cert-manager + KEDA | make apply |
| 1.5 | Cluster credentials + SSM secrets | make kubeconfig |
| 2 | LangSmith Helm | make init-values && make deploy |
| 3 | + LangSmith Deployments (enable_deployments = true) | make init-values && make deploy |
| 4 | + Agent Builder (enable_agent_builder = true) | make init-values && make deploy |
| 5 | + Insights (enable_insights = true) | make init-values && make deploy |
Enable optional addons
# infra/terraform.tfvars — set flags, then: make init-values && make deploy
enable_deployments = true # required for Agent Builder and Polly
enable_agent_builder = true # requires enable_deployments = true
enable_insights = true
enable_polly = true # requires enable_deployments = true + Polly entitlement
enable_usage_telemetry = false # extended usage telemetry (optional)
Gateway mode
Set at most one gateway flag; they are mutually exclusive. All false = ALB native (default, simplest).
# infra/terraform.tfvars: pick ONE and leave the others commented out
enable_nginx_ingress = true # ALB → NGINX → pods (ACM TLS at ALB)
# enable_envoy_gateway = true # ALB → Envoy proxy:10080 → HTTPRoutes (split dataplane)
# enable_istio_gateway = true # ALB → Istio:80 → VirtualService (mTLS, DNS-01 TLS)
make quickstart # wizard adds gateway selection (Section 6)
make apply # re-installs gateway controller via k8s-bootstrap
make init-values # regenerates values overlay for new gateway mode
make deploy
Security add-ons
# infra/terraform.tfvars — all opt-in, all default false:
alb_access_logs_enabled = true # ALB traffic logs → S3
create_cloudtrail = true # AWS API audit trail
create_waf = true # WAFv2 on ALB (~$10/mo)
create_firewall = true # AWS Network Firewall FQDN egress (~$0.40/hr)
create_bastion = true # SSM/SSH bastion for private EKS access
Glossary
| Term | Meaning |
|---|---|
| values chain | deploy.sh loads Helm values files in order: base → overrides → sizing → addons. The last file wins on conflicts. |
| sizing profile | A pre-built Helm values file that sets resources, replicaCount, and HPA settings for all LangSmith components. Set via sizing_profile in terraform.tfvars, applied by make init-values. |
| enable_* flags | Boolean flags in infra/terraform.tfvars that tell init-values.sh which addon values files to copy from examples/. No terraform apply needed — just make init-values && make deploy. |
| IRSA | IAM Roles for Service Accounts — EKS pods access S3 using temporary credentials injected by the Pod Identity Webhook. No static AWS credentials in K8s secrets. |
| ESO | External Secrets Operator — syncs SSM Parameter Store secrets into a langsmith-config Kubernetes secret. deploy.sh applies the ClusterSecretStore and ExternalSecret before running Helm. |
| Fernet keys | Symmetric encryption keys for LangSmith Deployments, Agent Builder, Insights, and Polly. Generated by setup-env.sh and stored in SSM. Never rotate after first deploy — changing them permanently corrupts existing encrypted data. |
| values-overrides.yaml | The live, gitignored file generated by init-values.sh. Contains your ALB hostname, RDS endpoint, Redis endpoint, S3 bucket, and IRSA role ARN. Do not edit directly — re-run make init-values. |
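The values chain in the glossary relies on Helm giving priority to the right-most `-f` file. A minimal sketch of how such an argument list could be assembled in order is shown below; the function name `values_args` is hypothetical, the real deploy.sh may build its list differently, and file names containing spaces are not handled.

```shell
# Print helm "-f <file>" arguments for the given files, in the order given.
# Files that do not exist are skipped. Helm lets later files override earlier ones.
values_args() {
  va_out=""
  for va_file in "$@"; do
    if [ -f "$va_file" ]; then
      va_out="$va_out -f $va_file"
    fi
  done
  # Strip the single leading space left by the first append.
  echo "${va_out# }"
}
```

Passing files in base, overrides, sizing, addons order reproduces the precedence described above: a key set in an addon file overrides the same key in the base file. A missing optional file, such as an addon overlay that has not been enabled, is left out of the list.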