I love building software — it's my favorite thing to do. Writing about it is a close second. At Amazon, some of my most enjoyable days were spent writing internal design and consensus-building documents. I'm starting this blog as my personal corner of the Tensor9 blog, where I'll go into a bit more depth and write a lot more personally.
Tensor9 is a platform for deploying cloud-native applications into customer-controlled environments — their AWS accounts, Azure subscriptions, or on-prem infrastructure. For the product overview, see docs.tensor9.com. This post is about the technical internals.
Part 0 sets up the problem. Part 1 walks through how we represent and transform infrastructure — from raw text to a typed graph to compiled output. Part 2 explains how we deploy resources across trust boundaries. Part 3 traces an end-to-end example. Part 4 covers other dialects beyond Terraform.
A few terms used throughout this post: the origin stack is the vendor's canonical infrastructure definition; the appliance is the customer-controlled environment where the software actually runs; the deployment stack is the vendor-side Terraform that orchestrates a deployment; STIR (STack Intermediate Representation) is the typed graph infrastructure gets compiled into; and a dialect is the schema for a particular infrastructure language.
Most software vendors deploy their software to infrastructure they control. The vendor writes Terraform, runs terraform apply, resources appear in the vendor's AWS account. Straightforward.
Some customers can't use that model. Regulated industries, data sovereignty requirements, security policies — there are legitimate reasons why a customer might say: "We want the software, but it has to run in our environment. Our AWS account. Our Azure subscription. Our data center."
The typical response is to fork the infrastructure code. The vendor takes their Terraform, copies it, and starts modifying. RDS becomes self-hosted Postgres. ElastiCache becomes Redis. Managed Kubernetes becomes on-prem k3s. Six months later — longer if the stack has many managed services, complex IAM policies, or intricate networking — there are two codebases that share almost nothing. Every feature ships twice. Every bug gets fixed twice. The engineering cost scales linearly with the number of deployment targets.
Tensor9 exists because I think there's a better approach. Instead of forking, transform. Define the infrastructure once — the origin stack — and the system transforms it for each target environment. AWS to Azure. Managed services to self-hosted equivalents. Cloud to on-prem.
This requires solving many, many problems, of which I'll explore two in this post. The first is representation: turning aws_elasticache_cluster into docker_container running Redis requires understanding what those resources mean, not just what they say. The second is execution: the transformed infrastructure has to be deployed into an environment the vendor doesn't control, while preserving the dependencies between resources.

The rest of this post explains how we solve these problems.
Level 1 is raw text. Level 2 is a parsed AST. These are table stakes. The AST gives you structure — blocks, attributes, nesting — but not meaning. aws_s3_bucket is just a label; the AST has no idea it represents cloud storage or that it'll become an API call to AWS. The interesting part starts at level 3.
STIR — STack Intermediate Representation — is a graph that can represent any infrastructure stack. Terraform resources, Kubernetes Deployments, Helm releases, CloudFormation stacks — they all become nodes in the same graph. Variables, expressions, and cross-resource references become edges. The graph is format-agnostic: you can load a Terraform module, transform it, and emit Kubernetes manifests. At this level, the graph captures structure but not yet semantic types.
Here's a STIR graph for a Lambda function with a generator (for creating multiple instances) and references to other resources:
Resources (green), generators (red), references (blue arrows), expressions, and primitives — all connected by typed edges.
STIR becomes typed when we load a dialect.† What's a dialect? It's a schema for an infrastructure language — Terraform, CloudFormation, Docker Compose, Helm, Kubernetes manifests. The dialect tells us which resource types exist (aws_s3_bucket, aws_lambda_function, etc.), which fields each one takes, which are required, and which attributes are computed after creation.

Terraform schemas come from the Terraform Registry. Every provider publishes its resource types and field definitions.

With the Terraform dialect loaded, we know more. bucket isn't just a field — it's a required string field that must be globally unique. After creation, the resource will have an arn attribute and a bucket_regional_domain_name attribute.
This enables compile-time validation — unknown fields, type mismatches, missing required attributes all surface before deployment.
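For a feel of what that catches, here's a deliberately broken, hypothetical snippet and the kinds of errors the typed graph would surface (illustrative only; the exact diagnostics are the compiler's own):

resource "aws_s3_bucket" "data" {
  bucket        = 12345    # type mismatch: the dialect says bucket is a string
  bucket_colour = "blue"   # unknown field: no such attribute in the schema
}

resource "aws_lambda_function" "processor" {
  # missing required attributes: function_name and role are required by the schema
  runtime = "python3.12"
}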
Here's a more realistic example — an RDS PostgreSQL instance with multiple typed fields:
Each field has a type from the Terraform dialect schema. The dialect knows identifier is a string, allocated_storage is a number.
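In HCL, a resource along those lines might look like this (a representative sketch, not the exact instance from the figure; values are illustrative):

resource "aws_db_instance" "metadata" {
  identifier        = "myapp-metadata"   # typed as string by the dialect
  allocated_storage = 20                 # typed as number
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = "db.t3.micro"
  username          = "app"
  password          = var.db_password    # assumes a db_password variable exists elsewhere
}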
The linker resolves symbolic references — var.environment, local.config, module.network.vpc_id — by walking scopes and creating edges from each reference to its definition. After linking, every var.x points to the actual variable node. Every aws_s3_bucket.data.arn is an edge to that bucket's arn output.
Consider:
variable "environment" {
default = "production"
}
resource "aws_s3_bucket" "data" {
bucket = "myapp-${var.environment}-data"
}
At level 3, var.environment is just a scope traversal expression — we know the syntax but not what it points to. At level 5, the linker has resolved it: there's now an edge from that expression to the variable definition node. The graph is fully connected — you can traverse from any reference to its definition, which is what enables impact analysis and safe refactoring.
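The same resolution applies to module outputs. In a hypothetical layout like the one below, module.network.vpc_id links through the module's output to the VPC resource inside it:

module "network" {
  source = "./modules/network"
}

resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = module.network.vpc_id   # linker: edge to the module's vpc_id output
}

# inside ./modules/network
output "vpc_id" {
  value = aws_vpc.main.id          # which in turn points at the VPC node
}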
A customer wants to deploy the vendor's SaaS application, but they don't use AWS — they have their own data center running Redis on VMs. The vendor's application uses ElastiCache.
Here's what the origin stack looks like:
resource "aws_elasticache_cluster" "cache" {
cluster_id = "myapp-cache"
engine = "redis"
engine_version = "7.0"
node_type = "cache.t3.medium"
num_cache_nodes = 1
port = 6379
parameter_group_name = "default.redis7"
}
Here's what that looks like as a STIR graph:
The green resource node with typed field edges. This is the source graph that gets transformed.
And here's what needs to come out the other side — Terraform that provisions a Redis container on the customer's on-prem infrastructure:
resource "docker_container" "cache" {
name = "myapp-cache"
image = "redis:7.0"
ports {
internal = 6379
external = 6379
}
command = ["redis-server", "--maxmemory", "3gb"]
}
The transformation:
cluster_id → name
engine_version = "7.0" → image = "redis:7.0"
port = 6379 → ports.internal = 6379, ports.external = 6379
node_type = "cache.t3.medium" → --maxmemory 3gb (t3.medium has ~3GB usable)

Fields that don't apply (parameter_group_name) disappear. Fields that need synthesis (command) materialize. The graph iterates until stable, then emits valid Terraform for the Docker provider.†
We can represent the stack. We can transform it. Now we need to deploy it.
Consider a Lambda function and an S3 bucket. The Lambda needs to know the bucket's name. In a normal Terraform configuration, this is trivial:
resource "aws_s3_bucket" "data" {
bucket = "myapp-data"
}
resource "aws_lambda_function" "processor" {
function_name = "data-processor"
environment {
variables = {
BUCKET_NAME = aws_s3_bucket.data.bucket // Reference to the bucket
}
}
}
Terraform handles the dependency. But what if the bucket and Lambda need to live in different AWS accounts?
This is exactly what happens when a customer says "I want to run this SaaS in my own AWS account." The vendor's infrastructure needs to get deployed somewhere else. Not a fork. Not a copy. The same product, just... over there.
The vendor runs orchestration infrastructure. The customer has an AWS account where the actual application runs — the appliance.
The vendor never has credentials to the customer's AWS account; an agent running in the appliance executes the projected Terraform. The bucket and Lambda need to be created in the appliance, but the vendor needs to coordinate from their side: (1) tell the appliance to create the bucket, (2) wait for it to exist, (3) learn the name it was given, and (4) deploy the Lambda configured with that name.
The problem: when writing the Terraform for step 4, the bucket doesn't exist yet. The vendor can't write aws_s3_bucket.data.bucket because that resource isn't in the vendor's Terraform state — it's in the customer's appliance.
Our answer is what we call the projection model. Instead of creating resources directly, we create projector resources that describe what should exist in the appliance.
Projectors in the deployment stack create resources in the appliance
Left side: the deployment stack — projector resources in the vendor's environment. Right side: the appliance — the customer's environment where real infrastructure gets created.
When the bucket projector executes, it creates the bucket in the appliance. The bucket's name flows back as a reflection. The lambda projector uses that reflection in its closure — preserving the dependency across the trust boundary.
Let's see what our simple Lambda + Bucket example looks like with projectors:
resource "tensor9_projector" "bucket" {
template = <<-EOT
resource "aws_s3_bucket" "data" {
bucket = "myapp-data"
}
EOT
}
resource "tensor9_projector" "lambda" {
template = <<-EOT
resource "aws_lambda_function" "processor" {
function_name = "data-processor"
environment {
variables = {
BUCKET_NAME = local.bucket_name
}
}
}
EOT
// The closure binds values from other projectors
closure = {
bucket_name = tensor9_projector.bucket.reflection.bucket
}
}
The closure block references tensor9_projector.bucket.reflection.bucket. Terraform sees this dependency and orders execution correctly. The reflection is how values flow back — the projected bucket's attributes get reflected back to the projector so other projectors can use them. Note the distinction: the template is known at plan time (what to create), while closure values only exist at apply time (outputs from resources that have been created).
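One way to picture where a reflection comes from (the exact mechanism is internal to the Tensor9 provider and agent, so treat this as an illustrative assumption): the projected template exposes the values it wants reflected, and the agent ships them back.

# Inside the projected bucket template, running in the appliance (sketch)
resource "aws_s3_bucket" "data" {
  bucket = "myapp-data"
}

output "bucket" {
  value = aws_s3_bucket.data.bucket   # assumed to surface as tensor9_projector.bucket.reflection.bucket
}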
Outputs from projected resources (vpc_id, endpoint) flow back to become inputs for dependents
Production applications have deeper dependency chains: a VPC, security groups, an RDS database, an application load balancer, an ECS service.
In a single-environment Terraform configuration, these resources form a complex dependency graph. The ECS service needs the database endpoint, the security group IDs, and the load balancer target group ARN. The load balancer needs the VPC subnets. The database needs the security groups and subnets.
Here's what cross-resource references look like in STIR — an S3 bucket with versioning and policy resources that reference it:
Blue reference nodes link resources together. The versioning and policy resources both reference the bucket.
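In HCL terms, that graph corresponds to something like the following sketch (representative of the figure, not copied from it):

resource "aws_s3_bucket" "data" {
  bucket = "myapp-data"
}

resource "aws_s3_bucket_versioning" "data" {
  bucket = aws_s3_bucket.data.id   # ref-to edge back to the bucket
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_policy" "data" {
  bucket = aws_s3_bucket.data.id   # another ref-to edge
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = "*"
      Action    = ["s3:GetObject"]
      Resource  = ["${aws_s3_bucket.data.arn}/*"]
    }]
  })
}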
When we project this stack into a customer's appliance, we use closures to bind these dependencies:
// VPC and networking layer (no dependencies)
resource "tensor9_projector" "vpc" {
template = file("templates/vpc.tf")
configuration = { cidr_block = var.vpc_cidr }
}
// Security groups (depend on VPC)
resource "tensor9_projector" "security_groups" {
template = file("templates/security_groups.tf")
closure = {
vpc_id = tensor9_projector.vpc.reflection.vpc_id
}
}
// Database layer (depends on VPC + security groups)
resource "tensor9_projector" "database" {
template = file("templates/rds.tf")
closure = {
subnet_ids = tensor9_projector.vpc.reflection.private_subnet_ids
security_group_id = tensor9_projector.security_groups.reflection.db_sg_id
}
}
// Load balancer (depends on VPC + security groups)
resource "tensor9_projector" "alb" {
template = file("templates/alb.tf")
closure = {
subnet_ids = tensor9_projector.vpc.reflection.public_subnet_ids
security_group_id = tensor9_projector.security_groups.reflection.alb_sg_id
}
}
// ECS service (depends on everything above)
resource "tensor9_projector" "ecs_service" {
template = file("templates/ecs.tf")
closure = {
subnet_ids = tensor9_projector.vpc.reflection.private_subnet_ids
security_group_id = tensor9_projector.security_groups.reflection.app_sg_id
db_endpoint = tensor9_projector.database.reflection.endpoint
db_password_arn = tensor9_projector.database.reflection.password_secret_arn
target_group_arn = tensor9_projector.alb.reflection.target_group_arn
}
}
Each closure block explicitly declares its dependencies. Terraform builds the execution graph from these references: VPC first, then security groups, then database and ALB in parallel, finally the ECS service. The closure values — subnet IDs, security group IDs, the database endpoint — flow from the projected resources in the appliance back through the projectors and into dependent templates.
A more complex deployment: VPC → Security Groups → Database + ALB → ECS Service
When terraform apply runs, the VPC projector executes first and its vpc_id reflects back; the security group projector consumes it; the database and ALB projectors run in parallel once their closures are satisfied; and the ECS service projector runs last, with every reflection it needs already in hand.
The closure references create Terraform dependencies. The projection model doesn't fight Terraform's execution model — it extends it across the appliance boundary.
Deployment stack orchestrates, appliance executes
Click "Run All" to see the full timeline, or use "Next" to step through. The left side shows the deployment stack (projectors), the right side shows the appliance (projected resources). Watch the step counter as each phase executes.
Step 1: VPC. Step 2: Security groups. Step 3: Database and ALB in parallel (see them both activate at the same time). Step 4: ECS service. Each step completes before the next begins — and at each step, the projected resources appear in the appliance while their reflections flow back to the deployment stack.
Okay, but how does state work when infrastructure lives in two places? The projection model maintains two state files:
Deployment state — Lives in the vendor's infrastructure. Contains the projector resources, closure bindings, and the reflection attributes (the data returned from the appliance).
Appliance state — Lives in the customer's environment. Contains the actual AWS/Azure/GCP resources with their full state.
On each apply, the Tensor9 provider syncs these states. When a database is projected into the appliance, its endpoint flows back to the projector's reflection attribute. When the projector is destroyed, the provider destroys the corresponding projected resource.
The projection model isn't about duplicating resources. It's about maintaining a coherent dependency graph when execution spans multiple trust boundaries. The vendor sees the structure through the deployment stack. The customer owns the projected resources. The dependency graph keeps them synchronized.
The examples so far show static values flowing through closures. But real infrastructure has expressions — values computed at deploy time, not known in advance.
Consider a Lambda function with count:
resource "aws_lambda_function" "processor" {
count = var.num_processors
function_name = "processor-${count.index}"
s3_bucket = aws_s3_bucket.code.bucket
s3_key = "lambda-${count.index}.zip"
}
The count is an expression — var.num_processors. We don't know at compile time how many Lambda functions will exist. That's determined when someone runs terraform apply with a specific value for that variable.
This creates a problem for projection. When we project this Lambda into an appliance, we can't just copy a number. We need to copy the expression — the reference to the variable — so that it resolves correctly in the customer's environment.
It gets more complex. The Lambda references aws_s3_bucket.code for its deployment artifact. That S3 bucket lives in the vendor's account. The Lambda will live in the customer's account. We can't just copy the bucket reference — the Lambda won't have access to the vendor's S3 bucket.
The solution involves creating artifact projector resources that copy deployment artifacts into the customer's environment. For a Lambda function:
the Lambda's deployment package becomes a blob that gets copied into storage inside the appliance; the Lambda's s3_bucket and s3_key fields are rewired to point at that blob; and if the Lambda has count, the blob projector needs the same count expression.

The rewiring happens at the graph level. The original STIR graph has an edge from the Lambda's s3_bucket field to the S3 bucket resource. After transformation, that edge points to a scope traversal expression that resolves to the blob projector's output:
// Before: Lambda → S3 bucket in vendor account
aws_lambda_function.processor.s3_bucket → aws_s3_bucket.code.bucket
// After: Lambda → Blob projector output in customer account
aws_lambda_function.processor.s3_bucket → tensor9_blob.code.storage_details.aws_s3_object.bucket
The expression machinery matters here. tensor9_blob.code.storage_details.aws_s3_object.bucket is a scope traversal expression — a chain of attribute accesses that gets resolved at deploy time. We're not copying a value; we're copying a reference that will resolve to the correct value in the customer's environment.
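A rough sketch of how the rewired pieces could read on the deployment side. tensor9_blob and its storage_details attribute come from the rewired reference above; the source field, the indexing, and the .key attribute are assumptions made for illustration:

resource "tensor9_blob" "code" {
  count  = var.num_processors                      # mirrors the Lambda's generator
  source = "artifacts/lambda-${count.index}.zip"   # hypothetical field: the artifact to copy into the appliance
}

resource "aws_lambda_function" "processor" {
  count         = var.num_processors
  function_name = "processor-${count.index}"
  s3_bucket     = tensor9_blob.code[count.index].storage_details.aws_s3_object.bucket
  s3_key        = tensor9_blob.code[count.index].storage_details.aws_s3_object.key   # assumed attribute
}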
When a resource has count or for_each, we model it in STIR with generator nodes. A generator is metadata that says "this resource template produces multiple instances."
The generator connects to:
the resource template it multiplies, the iteration expression (var.num_processors or a for_each map), and the per-instance locals it introduces (count.index, each.key, each.value).

When projecting a resource with a generator, we must also project the generator. The blob projector needs its own generator with the same iteration expression. If the Lambda has count = var.num_processors, the blob projector needs count = var.num_processors too — so that when Terraform creates 3 Lambdas, it also creates 3 blobs, and the indexing stays aligned.
This is why STIR models generators as explicit graph nodes rather than just resource attributes. When we copy a generator, we're copying:
the iteration expression itself (so instance counts stay in sync) and the local-variable bindings it introduces (count.index, etc.).

Here's what a Lambda with count looks like in STIR:
The red ellipse is the generator node. It connects to the Lambda template (green), the iteration expression (var.num_processors), and the count.index local variable.
What you're seeing: the Lambda resource template multiplied by count, the generator node that owns the iteration expression (var.num_processors), and the count.index local it makes available to the template.

The expression references stay intact. var.num_processors doesn't become 3 — it stays as var.num_processors, a reference that resolves at deploy time in the customer's environment. The customer provides variable values through their appliance configuration — either via tfvars files or the Tensor9 control plane — allowing each deployment to have different scaling parameters.
Static values are easy — copy the bytes. Expressions are hard because they encode computation. The expression var.num_processors means "look up this variable and use its value." When we project infrastructure, we must preserve that indirection so the expression resolves correctly in its new home.
This is why STIR graphs include expression nodes, scope traversals, and generator structures. We're not just representing what infrastructure is — we're representing how it computes.
We've covered representation and execution separately. Now let's trace a complete journey: from a vendor's origin stack all the way to a customer deployment.
A vendor has built a webhook processing service. It runs on AWS: an API Gateway receives webhooks, a Lambda processes them, an SQS queue buffers during spikes, S3 stores the payloads for replay, and an Aurora PostgreSQL database persists metadata. It's been running for two years. It works.
Then a customer says: "We want this in our own Kubernetes cluster. We can't send data to an AWS account we don't control."
Without Tensor9, this is a six-month project.† Fork the Terraform, rewrite everything for Kubernetes, debug the differences, maintain two codebases forever.
This estimate comes from conversations with engineering teams who've done AWS-to-K8s rewrites manually. Your mileage will vary based on stack complexity.

With Tensor9, here's what happens.
The AWS Terraform goes into the compiler. Each resource becomes a STIR node. The references between resources — Lambda needs the SQS queue URL, API Gateway needs the Lambda ARN — become edges in the graph.
// Origin stack (AWS)
aws_api_gateway_rest_api.webhooks
→ aws_lambda_function.processor
→ aws_sqs_queue.buffer
→ aws_s3_bucket.payloads
→ aws_rds_cluster.metadata
Here's what that stack looks like as a STIR graph:
Five resources with cross-references. The Lambda references the SQS queue URL, S3 bucket name, and database endpoint.
The graph captures not just the resources but their relationships. The Lambda has environment variables referencing the queue URL and bucket name. Those references become typed edges in STIR.
The Aurora database requires special handling. The compiler transforms it into a CloudNative PostgreSQL cluster† for Kubernetes. This is more complex than a simple resource swap:
CloudNative-PG is a Kubernetes operator for PostgreSQL. It's what Aurora/RDS becomes when you target K8s.

Multi-resource matching. An RDS database isn't one Terraform resource. It's a cluster of related resources: aws_rds_cluster, aws_rds_cluster_instance (one per replica), aws_db_subnet_group, aws_rds_cluster_parameter_group. The compiler walks the graph's reference edges to identify which resources belong together as a single logical database.
Field name translation. Aurora and RDS Instance use different field names for the same concept. Aurora uses master_username; RDS Instance uses username. The compiler knows both, and maps them to the CloudNative PostgreSQL secret's username field.
Resource sizing. The origin stack specifies instance_class = "db.t3.micro". The compiler parses this into vCPU and memory values, then generates Kubernetes resource requests: requests.cpu = 2, requests.memory = "1Gi". When Aurora clusters have multiple instance types with different promotion tiers, the compiler picks the instance most likely to be primary.
Unsupported features. Read replicas in RDS work differently than replicas in CloudNative PostgreSQL. The compiler currently throws an error if the origin stack uses read replicas. This is an explicit design choice: fail at compile time rather than generate incorrect infrastructure.
The dependency edges stay intact. Resources that referenced the RDS endpoint now reference a Kubernetes service. The edge types change, but the graph structure preserves the relationships.
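To make the target concrete, here's a hedged sketch of what the transformed database template might contain, using the Terraform kubernetes provider's kubernetes_manifest resource. The CPU and memory values echo the sizing step above; everything else is illustrative rather than the compiler's literal output:

resource "kubernetes_manifest" "metadata_db" {
  manifest = {
    apiVersion = "postgresql.cnpg.io/v1"
    kind       = "Cluster"
    metadata = {
      name      = "webhook-metadata"
      namespace = "production"
    }
    spec = {
      instances = 1
      resources = {
        requests = {
          cpu    = "2"     # derived from instance_class parsing (see sizing above)
          memory = "1Gi"
        }
      }
    }
  }
}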
Each transformed resource gets wrapped in a tensor9_projector. The inter-resource references become closure bindings:
resource "tensor9_projector" "database" {
template = file("k8s/database.tf") // CloudNative PostgreSQL cluster
}
resource "tensor9_projector" "processor" {
template = file("k8s/processor.tf")
closure = {
db_host = tensor9_projector.database.reflection.endpoint
db_port = tensor9_projector.database.reflection.port
}
}
resource "tensor9_projector" "ingress" {
template = file("k8s/ingress.tf")
closure = {
backend_service = tensor9_projector.processor.reflection.service_name
}
}
The dependency graph from Step 1 is now encoded in closure references. Terraform will execute these in the right order: database first, then processor, then ingress.
The vendor runs terraform apply on the deployment stack. The Tensor9 provider hands each template and its closure values to the agent in the appliance, the agent applies them against the customer's cluster, and the resulting attributes flow back as reflections, in dependency order: database first, then processor, then ingress.
The customer now has a working webhook processor in their Kubernetes cluster. Same logic, same behavior,† different infrastructure.
"Same behavior" means equivalent functionality, not identical implementation. A Lambda and a Kubernetes pod differ in cold start, concurrency model, and timeout semantics. What's preserved is the application's external behavior — operational tuning (scaling, concurrency limits, resource allocation) requires vendor configuration per target.Let's recap what happened:
Representation — STIR graphs let us understand the origin stack well enough to transform it. We didn't do string replacement. We operated on semantic structure.
Transformation — Dialect schemas told us how to map AWS resources to Kubernetes equivalents. Field by field, type-checked, preserving relationships.
Execution — Projectors and closures let us deploy to the customer's environment while maintaining dependency order. The customer owns the resources. We coordinate the graph.
Properties of this model: the vendor never holds credentials to the customer's environment; the customer owns every projected resource; dependency order is preserved across the trust boundary; and it all rides on Terraform's existing execution model rather than replacing it.
Terraform isn't the only game in town. STIR supports multiple dialects, each with its own type system and structure.
IAM is its own language. AWS IAM policies, GCP IAM bindings, Azure role assignments — they all express the same concepts differently. The IAM dialect provides a cloud-agnostic representation of identity and access management.
The policy references the role via aws_iam_role.lambda.id — that becomes a ref-to edge in the graph.
resource "aws_iam_role" "lambda" {
name = "webhook-processor"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "lambda.amazonaws.com"
}
}]
})
}
resource "aws_iam_role_policy" "s3_access" {
name = "s3-read-access"
role = aws_iam_role.lambda.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = ["s3:GetObject", "s3:ListBucket"]
Resource = ["arn:aws:s3:::webhook-data/*"]
}]
})
}
Four primitives: Identity (role, user, service account), Policy (collection of statements), Statement (effect, actions, resources, conditions), Binding (links identity to policy at a scope). Cedar-inspired† — store fine-grained actions as canonical truth, lower to whatever the target cloud expects.
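To make "lower to whatever the target cloud expects" concrete, here's a hedged sketch of the same read-only grant lowered to a GCP bucket binding. The bucket name and service account are illustrative, and the real compiler output may choose a finer-grained mapping:

resource "google_storage_bucket_iam_member" "webhook_read" {
  bucket = "webhook-data"                 # illustrative bucket name
  role   = "roles/storage.objectViewer"   # closest predefined role to s3:GetObject + s3:ListBucket
  member = "serviceAccount:webhook-processor@example-project.iam.gserviceaccount.com"   # illustrative identity
}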
Cedar is AWS's open-source policy language. Its clean semantics make it ideal as a canonical authorization model.

Kubernetes manifests have their own type system. Deployments, Services, ConfigMaps, PersistentVolumeClaims — each with a defined schema. The Kubernetes dialect understands these schemas and the relationships between resources.
The Deployment references the ConfigMap via configMapRef — that becomes a ref-to edge in the graph.
apiVersion: v1
kind: ConfigMap
metadata:
name: webhook-config
namespace: production
data:
LOG_LEVEL: info
QUEUE_SIZE: "100"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webhook-processor
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webhook
  template:
    metadata:
      labels:
        app: webhook   # must match the selector above
    spec:
      containers:
        - name: processor
          image: myapp:v1.2.3
          envFrom:
            - configMapRef:
                name: webhook-config
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
name: webhook-processor
namespace: production
spec:
selector:
app: webhook
ports:
- port: 80
targetPort: 8080
The Kubernetes dialect loads schemas directly from the Kubernetes OpenAPI spec.† Every field has a type. Every reference is validated. When you transform an AWS stack to Kubernetes, the typed graph ensures you're producing valid manifests.
Kubernetes publishes OpenAPI schemas for every resource type. We parse these to build the dialect.

Helm charts add templating on top of Kubernetes. Chart.yaml metadata, values.yaml configuration, template files with Go templating. The Helm dialect represents this structure.
The app references postgresql via helm_release.postgresql.name — that becomes a ref-to edge in the graph.
resource "helm_release" "postgresql" {
name = "db"
repository = "https://charts.bitnami.com/bitnami"
chart = "postgresql"
version = "15.0.0"
namespace = "production"
set {
name = "auth.database"
value = "webhook_db"
}
}
resource "helm_release" "app" {
name = "webhook-processor"
repository = "https://charts.bitnami.com/bitnami"
chart = "common"
version = "2.0.0"
namespace = "production"
set {
name = "database.host"
value = helm_release.postgresql.name
}
set {
name = "database.port"
value = "5432"
}
}
Helm charts are how most Kubernetes applications are distributed. Understanding them at the graph level means we can transform origin stacks into Helm charts, not just raw manifests.
Here's where it gets interesting. A Terraform resource can contain an embedded IAM policy. The result is a graph with nodes from multiple dialects — tf:// for the Terraform resources and iam:// for the policy.
The role_policy has a policy field containing an IAM policy. Green nodes are tf://, the policy subtree is iam://.
resource "aws_iam_role" "lambda" {
name = "webhook-processor"
}
resource "aws_iam_role_policy" "s3_access" {
name = "s3-read-access"
role = aws_iam_role.lambda.id
# embedded IAM policy (iam:// dialect)
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = ["s3:GetObject", "s3:ListBucket"]
Resource = ["arn:aws:s3:::webhook-data/*"]
}]
})
}
The linker resolves references within each dialect. The role = aws_iam_role.lambda.id becomes a ref-to edge linking the policy attachment to the role. The IAM policy subtree preserves its semantic structure regardless of how it's serialized.
This is how real infrastructure works. You don't have just Terraform or just IAM — you have Terraform resources containing embedded IAM policies, Kubernetes manifests, Helm charts. STIR handles all of it in one unified graph.
Important topics I'll get into another time:
For implementation details, see docs.tensor9.com. For an honest look at what transforms today — and what doesn't — see the Service Equivalents Registry. It lists every AWS service we handle, what it becomes on each target platform, and where the gaps are.
STIR graphs for representing stacks. The projection model for representing execution. Closures for preserving dependencies across environment boundaries. Together, they let us treat infrastructure as source code that compiles and runs correctly across different targets.
-mtp 2025-12-18