Eryn wrote the post I wish I'd had the clarity to write a year ago: Form Factors: How Software Vendors Define Where Their Software Can Run. Read it if you want the full treatment. The short version: a form factor is a structured declaration of where a vendor's software is allowed to run — environment, connectivity, required services, security posture. Tensor9 keeps a registry of service equivalents so it knows what to swap an AWS service for in some other target.
This post is about one very specific, fun implementation detail sitting underneath that. When a vendor writes
aws_db_instance.engine_version = var.postgres_version, and Tensor9 has
to pick a specific CloudNative-PG chart version for the on-prem form factor, how does
it pick? The value it needs isn't a primitive — it's an expression. Often a
conditional, and often threaded through three modules before it even reaches the resource.
This is the third post in a series. The first walked through how we represent infrastructure as a typed graph (STIR). The second walked through how service dialects let us raise AWS into a canonical form and lower to Kubernetes. This one walks through one specific mechanism inside the Tensor9 compiler: phi tracing.
The problem. To swap an AWS service for its target-environment equivalent, the compiler needs concrete values — which Postgres version, which instance class, which chart. Real vendor Terraform hands it expressions, not primitives.
What phi tracing does. It walks backward through the expression
to collect every value it could take and the Terraform condition under which it
takes each one. Output: one pre-compiled specialization per value, each wrapped
in a count-gated module. Terraform itself picks the active branch
at plan time.
Why it matters. The vendor never rewrites their Terraform. The compiled output reads like the original; the plan is the source of truth; every decision is visible before apply. No runtime magic, no new DSL.
Tensor9 is a platform that lets software vendors take products they built for their own AWS account and deploy them into customer-owned environments: the customer's AWS, the customer's on-prem Kubernetes cluster, the customer's GCP project. The vendor hands Tensor9 the Terraform they already wrote for AWS, and the platform emits the stack the customer needs to run the same application in their environment. The application's behavior is preserved; the underlying infrastructure is translated. A compiler at the heart of the platform does that translation, and this post is about one specific mechanism inside it. You can learn more here: docs.tensor9.com.
Doing that well means the compiler has to reason about real vendor Terraform, not toy Terraform. Real vendor Terraform is full of variables, conditionals, and values threaded through layers of modules. That's what this post is about.
A few terms appear throughout. Most are defined more carefully in Eryn's post and in post one; we'll restate the ones that carry weight here:

- Service replacement: aws_db_instance becomes a CloudNative-PG cluster on K8s; aws_elasticache_replication_group becomes a Valkey operator deployment.
- Service compiler: the component that compiles aws_db_instance resources into their target-specific equivalents.
- Universe: a finite set of values a field can take, e.g. {"14.9", "15.4", "16.2", "17.2"} for Postgres engine versions. Formally a widening operator parameterized by what the service compiler can actually specialize for; not every field has a finite universe.

A roadmap. Part 1 frames the problem: what naive service compilers do, why that fails on real stacks. Part 2 is a compact detour through SSA phi nodes, because the terminology is borrowed from there and it's worth being honest about what we borrowed and what we didn't. Parts 3 through 5 walk through the mechanism: backward data flow, symbolic conditions, template specialization. Part 6 lists the four scenarios you see in practice. Part 7 is a short list of adjacent topics we skipped for space.
A vendor's origin stack might have something like this:
variable "postgres_version" {
type = string
default = "15.4"
}
variable "instance_class" {
type = string
// no default; set by the environment
}
resource "aws_db_instance" "app" {
engine = "postgres"
engine_version = var.postgres_version
instance_class = var.instance_class
allocated_storage = 100
}
If the vendor runs terraform apply directly against their own AWS account,
Terraform resolves var.postgres_version at plan time and RDS takes the value.
Terraform doesn't care that the field was an expression; by the time the AWS API call
goes out, it's a string.
When we're compiling this aws_db_instance into a
CloudNative-PG Cluster resource for a Kubernetes form factor, we
need to know the engine version at compile time, not at apply
time.† Why? Because different major
versions map to different Helm chart versions, different CRD shapes, different
default parameter groups. The service compiler has to pick which of those to
emit, and it has to pick before the vendor ever runs anything.
So our naive first pass was: if the field isn't a primitive, fail. Emit a blocking
stack issue, ask the vendor to hardcode the value. We even thought this was
reasonable — Terraform itself does this in lots of places
(count, for_each, provider aliasing). Vendors are used
to it.‡
† Compilation happens before anyone runs terraform plan and apply. So "compile time" here means earlier than either plan or apply — when there's no runtime context to draw on.
‡ Reasonable. Also wrong in practice. Vendors use variables precisely so they can change these values without editing code. Telling them to hardcode is telling them to give up the knob.
Here's the problem. Real vendor stacks almost never hardcode. They parameterize. They pass variables through modules. They conditionalize on environment. A more realistic shape of the same stack:
// root module
module "database" {
source = "./modules/postgres"
engine_version = var.customer_env == "prod" ? "15.4" : "14.9"
instance_class = local.instance_class
}
// modules/postgres/main.tf
variable "engine_version" { type = string }
variable "instance_class" { type = string }
resource "aws_db_instance" "app" {
engine = "postgres"
engine_version = var.engine_version
instance_class = var.instance_class
}
Three hops from the literal to the field. A conditional in the middle. A local
pointing at who-knows-what. None of it is a primitive where the service compiler needs one.
As a STIR graph, the same stack looks like this. The field the service compiler is trying to read is the green box at the top right; everything else is the data flow the compiler would have to chase to find an actual value.
aws_db_instance
is the entry point; getting to a primitive means crossing two module boundaries,
walking one RefTo chain, and unwinding a ConditionalExpr.
So the compiler has two options. Option 1: give up and block, as the naive first pass did, until the vendor hardcodes the value. Option 2: follow the expression backward and work out every value it could take, and the condition under which it takes each one.
Option 2 is phi tracing.
The name is borrowed. It's worth being honest about what we borrowed.
In compiler theory, SSA (Static Single Assignment) form is a graph representation where
every variable is assigned exactly once. When control flow joins — after an
if/else, at the top of a loop — you need a way to say
"this variable is either x1 (if we came from block A) or
x2 (if we came from block B)." That's a phi node:
φ(x1, x2). Classical SSA phi nodes are
positional; they look at which predecessor block you came from. That's enough
when you have a control flow graph.
Our context is different. We analyze data flow through Terraform, not control
flow through basic blocks. The "predecessor" of a value isn't a block — it's a
conditional expression, or a module boundary, or a for_each. And we care
about why a branch was taken, not just which one. A positional phi is the
wrong shape.
What we actually need is closer to GSA (Gated Static Assignment) γ
(gamma) nodes. GSA extends SSA by making control dependence explicit in the data
structure: each branch of a phi carries its own predicate — the condition
that must hold for that branch to be active. γ(cond →
x1, ¬cond → x2).†
We kept the "phi" name because nobody says "gamma nodes" colloquially. But in our
implementation, every branch carries its own PhiCondition tree. That's
the GSA design. More on conditions in Part 4.
The first design had phi as a first-class STIR node type, following the pattern used for count-gated generators. We pulled it out during implementation. Phi analysis results live for microseconds between the tracer that produces them and the specializer that consumes them. They never get serialized, emitted, or inspected by any other pass. Adding a node type would have meant exhaustive handlers across every pass, serialization code, graph image format changes — a lot of weight for something that isn't a durable part of the graph. Phi tracing produces transient analysis results, not graph structure.
Forward data flow starts at declarations and pushes values forward through the program. Backward data flow starts at a use and works out what could have produced it. Phi tracing is backward.
The service compiler is the driver. When it hits a field like
engine_version = var.postgres_version and wants a primitive, it calls into
the tracer with the STIR node representing that expression. That node is the seed.
The STIR graph has a lot of edge types. For phi, only a handful matter:
- Val — the "is assigned" edge from a local to its value expression.
- Field — the edge from a resource to one of its fields, or from a module call to the value it passes to a downstream parameter.
- ExprArg — the edge from an expression node to one of its sub-expressions (condition, true branch, false branch, function arguments).
- GenIter — the edge from the iterator local of a for_each/count block back to the collection it iterates over.
Reference edges are used as a bridge, not as a primary traversal.
When we hit a scope reference like var.x, we jump to
x's definition node and continue the trace from there. The bridge
keeps the tracer focused on what produces values, not on how names
resolve.†
Every trace returns one of three things. The relationship is containment: Resolved (a single known value) sits inside Bounded (a finite set we can enumerate), which sits inside Unbounded — the outer region of possibilities we couldn't narrow down. There's a fourth state the animation makes visible: a branch set that is finite but has blown past the size limit — Bounded, but too large — which the tracer widens to Unbounded with that specific reason, rather than emitting a specialization per branch. Keeping analysis and output both finite is non-negotiable for a compiler that has to fit in a reasonable wall-clock budget. The animation starts tight and widens outward: from the one value we proved, to the few we know the selector chooses from, to the ones we counted but gave up on, to everything else we couldn't pin down.
(Animation: the three possible trace results, widening outward as analysis loses precision. The tightest is Resolved: "15.4" after following through modules.)
Results flow through function calls without losing precision. Applying
lower() to a Resolved value stays Resolved with the value lowercased.
Applying it to a Bounded result stays Bounded with each branch value lowercased.
Applying it to Unbounded stays Unbounded. Every transfer function the tracer
applies — string ops, equality, arithmetic — is monotone
with respect to this containment order, which is what makes composition sound.
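As a sketch, with hypothetical type names (the real implementation differs), the containment order and a monotone element-wise transform look like this:

```kotlin
// Sketch of the three trace results and a monotone element-wise transform.
// Names are illustrative, not Tensor9's actual API.
sealed interface PhiCondition  // detailed in Part 4

data class PhiBranch(val value: String, val condition: PhiCondition?)

sealed interface TraceResult {
    data class Resolved(val value: String) : TraceResult
    data class Bounded(val branches: List<PhiBranch>) : TraceResult
    data class Unbounded(val reason: String) : TraceResult
}

// A pure transform like lower() never gains precision and never changes the
// branch count: Resolved stays Resolved, Bounded stays Bounded, Unbounded
// stays Unbounded. That is the monotonicity requirement.
fun TraceResult.mapValues(f: (String) -> String): TraceResult = when (this) {
    is TraceResult.Resolved -> TraceResult.Resolved(f(value))
    is TraceResult.Bounded ->
        TraceResult.Bounded(branches.map { it.copy(value = f(it.value)) })
    is TraceResult.Unbounded -> this
}
```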
The running example, from Part 1, is the engine_version field on the
aws_db_instance inside the ./modules/postgres module, where
the root module passes a conditional.
The tracer starts at the field's value expression. Step by step:

- Start at the Field edge for aws_db_instance.app.engine_version; its value expression is NameRef var.engine_version.
- Bridge through the reference to the module parameter var.engine_version.
- Cross the module boundary to the caller's ModCall field "engine_version".
- Arrive at the ConditionalExpr the root module passes.
The ConditionalExpr is where the fork happens. The tracer recurses down
ExprArg("true") with an accumulated path condition of
Existing(env=="prod"), and down ExprArg("false") with
Not(Existing(env=="prod")). Each branch terminates at a primitive, each
with its own condition.
The result is a Bounded value with two branches. Each branch carries the value it would produce and the symbolic condition under which it would apply. The selector — the expression the branches disagree on — is stored once, separately, so the specializer can reuse it when gating.
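A sketch of that fork, under the same illustrative types as above (Existing and Not are assumed condition constructors, not the real ones):

```kotlin
// Sketch: tracing a ConditionalExpr forks the trace, accumulating the path
// condition on each side. Names are illustrative.
sealed interface PhiCondition {
    data class Existing(val conditionNodeId: String) : PhiCondition
    data class Not(val inner: PhiCondition) : PhiCondition
}

data class PhiBranch(val value: String, val condition: PhiCondition)

// Trace the true arm under Existing(cond), the false arm under Not(Existing(cond)).
fun traceConditional(
    conditionNodeId: String,
    traceTrueArm: (PhiCondition) -> List<PhiBranch>,
    traceFalseArm: (PhiCondition) -> List<PhiBranch>,
): List<PhiBranch> {
    val cond = PhiCondition.Existing(conditionNodeId)
    return traceTrueArm(cond) + traceFalseArm(PhiCondition.Not(cond))
}

fun main() {
    val branches = traceConditional(
        conditionNodeId = "var.customer_env == \"prod\"",
        traceTrueArm = { c -> listOf(PhiBranch("15.4", c)) },  // arm is a Prim
        traceFalseArm = { c -> listOf(PhiBranch("14.9", c)) },
    )
    branches.forEach(::println)
}
```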
The running example is the happy path. In real stacks, the tracer has to deal with:
- Function calls: lower(var.x), tostring(var.n), try(var.a, var.b), coalesce(...), lookup(map, key, default). Each function has a per-function trace strategy that tells the tracer how to push results through. For a pure transform like lower, the tracer traces the argument and applies lower to each resolved value. For try, the tracer takes an arm-union: it specializes both arms when their static resolvability is the same, and widens to the union when they diverge.† The underlying function semantics are shared with the graph evaluator so the tracer and the evaluator can't drift.

† Why arm-union, not "first resolvable arm": try(a, b) in real Terraform catches eval-time errors (null traversals, type coercion failures), not static unresolvability. Picking the first statically-resolvable arm would silently specialize a when a would have errored at apply and b would have fired. Arm-union is wider than the vendor's original intent, but it's always sound.

- for_each/count iterators: each.value inside a for_each block traces back to the collection being iterated over, and the tracer extracts one branch per collection entry, gated by Eq(selector, entry_key). Dynamic blocks take the same path with one wrinkle: the iteration keys must be plan-time-known (Terraform already enforces this), but the per-iteration content expressions evaluate in each iteration's scope. The tracer keeps the iteration axis and the value axis separate so the two don't conflate.

- Variables with no default: the tracer falls back to the universe the service compiler supplied, emitting one branch per universe value, gated Eq(selector, v). If no universe, Unbounded.

- Safety limits: a visited set catches cycles. A depth limit (default 20) catches runaway recursion through deeply nested modules. A size limit on the branch set (default 16) bounds the output: once the bounded set grows past the threshold, the tracer widens to Unbounded with the reason "bounded, but too large to specialize" (a sketch of this widening check follows the list). This is the classical abstract-interpretation move for keeping analysis finite, and it's also the mechanism that prevents specialization blow-up in downstream output (see the "Bounded, but too large" phase in the animation above). Any limit violation returns Unbounded with an actionable reason.
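The widening check itself is tiny. A sketch, with the default limit assumed from above and illustrative names:

```kotlin
// Sketch: widen a finite-but-too-large branch set to Unbounded instead of
// emitting one specialization per branch.
const val MAX_BRANCHES = 16

sealed interface TraceResult {
    data class Bounded(val values: List<String>) : TraceResult
    data class Unbounded(val reason: String) : TraceResult
}

fun boundedOrWiden(values: List<String>): TraceResult =
    if (values.size > MAX_BRANCHES)
        TraceResult.Unbounded(
            "bounded, but too large to specialize (${values.size} > $MAX_BRANCHES)")
    else
        TraceResult.Bounded(values)
```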
For the tracer to be sound, every function it pushes values through must be
monotone with respect to the containment order Resolved ⊂ Bounded ⊂
Unbounded. The easy cases are pure element-wise transforms like lower
and tostring: apply to each branch value; result size unchanged. The
interesting cases are the ones that can blow up or lose information:
- Collection operations (contains(list, x), concat, merge). When both operands are Bounded, the tracer computes the cross-product of branch pairs subject to the size limit: if |A|×|B| exceeds the threshold, we widen to Unbounded with the "too large" reason rather than emitting a specialization per cell. Sound, finite, and bounded-output.

- Decoders (jsondecode, yamldecode). The tracer doesn't descend into the decoded structure. If the argument is a literal string, the decoder runs at trace time and the result is Resolved. If the argument is anything else, the decoder's output is Unbounded — we can't enumerate the shape of the structure without executing it, and executing it against arbitrary Bounded inputs is potentially unsafe (decoders can throw). A vendor who wants specialization through a decoded blob can refactor to pass the decoded fields as separate variables.

- Nondeterministic functions (timestamp, uuid, bcrypt). These are not plan-time-stable — they re-evaluate on every plan. The tracer refuses to use them as a gate selector and returns Unbounded with the reason "plan-stability violation". Using them as a pass-through value (the right-hand side of an expression, not the selector) is fine and stays Resolved.

- Data sources. data.aws_rds_engine_version.latest.version behaves exactly like var.x with no default: the tracer asks the service compiler for a universe for the surrounding field and, if one is supplied, emits one branch per universe value gated Eq(data.…, v). Data-source values evaluate at Terraform refresh (before plan), so using them as gate selectors is plan-time-stable. If no universe is supplied, the result is Unbounded — the same fallback as a variable without conditionals. There's nothing special-cased about data sources.

- New Terraform functions get added to the strategy registry, not to the tracer core. Each registration carries the monotonicity proof obligation: show that the function maps the containment order forward. In practice that's a two-line argument per function family. (A sketch of the registry follows this list.)
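A sketch of what a per-function strategy registry could look like. The registry shape and every name here are assumptions for illustration, not the real API:

```kotlin
// Sketch: per-function trace strategies live in a registry keyed by function
// name; the tracer core never special-cases individual Terraform functions.
fun interface TraceStrategy {
    // Push already-traced argument values through the function, preserving
    // the containment order (the monotonicity obligation).
    fun trace(argValues: List<String>): String
}

object StrategyRegistry {
    private val strategies = mutableMapOf<String, TraceStrategy>()

    fun register(name: String, strategy: TraceStrategy) {
        strategies[name] = strategy
    }

    fun lookup(name: String): TraceStrategy? = strategies[name]
}

fun main() {
    // Pure element-wise transforms are one-liners to register.
    StrategyRegistry.register("lower") { args -> args[0].lowercase() }
    StrategyRegistry.register("tostring") { args -> args[0] }
    println(StrategyRegistry.lookup("lower")?.trace(listOf("M5.XLARGE")))  // m5.xlarge
}
```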
One design principle we enforced hard: the tracer is read-only with respect to the graph.
The reason is prosaic. Tracing gets called a lot, from a lot of places. A service compiler might trace one field, look at the result, decide it needs to trace a sibling field, look at that result, decide to back off, and never produce any output at all. If every trace mutated the graph by synthesizing condition nodes, we'd pile garbage into STIR that had to be cleaned up. We'd also have subtle double-counting if the same trace ran twice.
So the tracer builds condition trees symbolically. A PhiCondition
is one of four shapes, each describing a way a gate can be expressed without committing
any graph nodes:

- Existing: reuse a condition node that's already in the graph.
- Eq: the selector equals a specific primitive value.
- Not: negation of an inner condition.
- And: conjunction of two conditions.

Existing reuses a node that's already in the graph (typically the
condition expression from a ConditionalExpr). The other three are
synthesized during the trace — but only as descriptions, not as graph nodes.
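As data, the four shapes could look like this — a sketch with simplified field types:

```kotlin
// Sketch of the four PhiCondition shapes as an immutable tree. Field types
// are simplified for illustration.
sealed interface PhiCondition {
    // Reuse a condition node already present in the graph (e.g. the condition
    // of a ConditionalExpr). Pinning it freezes the node for the rest of the pass.
    data class Existing(val graphNodeId: String) : PhiCondition

    // "Selector == primitive", synthesized during the trace as a description.
    data class Eq(val selectorNodeId: String, val primitive: String) : PhiCondition

    data class Not(val inner: PhiCondition) : PhiCondition
    data class And(val left: PhiCondition, val right: PhiCondition) : PhiCondition
}
```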
One invariant that falls out of this: once a PhiCondition pins an
Existing node, that node is frozen for the rest of the
pass.† Later specialization can still wrap
it, index into it, or reference it, but it can't rewrite it in place. If a pass
genuinely needs to rewrite, it clones first. Without that rule, the
read-only-analysis story would leak — a later mutation would silently
change the semantics of every materialized gate pointing at the same node. The
enforcement isn't discipline: STIR nodes are immutable data classes, and the
Kotlin compiler refuses any attempt to write through a reference. The "clone
first" path goes through a dedicated constructor that takes ownership of the
new copy.
Path conditions during tracing always compose via conjunction. If we're in the true branch
of A == prod, then in the false branch of B == us, the path
condition is And(Existing(A=="prod"), Not(Existing(B=="us"))). There is no
disjunction during the trace; each branch corresponds to exactly one path, and paths
accumulate via AND.
A three-way example. Nested Terraform conditional:
locals {
  x = var.env == "prod" ? "m5.xlarge" : (var.region == "us" ? "t3.medium" : "t3.small")
}
The tracer walks both conditionals and produces three branches, each with its own symbolic path condition:

- "m5.xlarge" under Existing(env == "prod")
- "t3.medium" under And(Not(Existing(env == "prod")), Existing(region == "us"))
- "t3.small" under And(Not(Existing(env == "prod")), Not(Existing(region == "us")))
The specializer is where graph mutation happens — it's the component that actually
rewrites the STIR graph into a specialized, count-gated form. When the specializer needs
a concrete graph node for a condition (because count = cond ? 1 : 0 needs
cond to be a real expression node in the graph), it calls a function that
walks the symbolic PhiCondition tree and synthesizes the equivalent
concrete STIR expression nodes:
And/Not/Eq combinators.
Existing is cheap — reuse the node that's already there.
Eq becomes a BinaryOp("==") between the selector and a
primitive literal. Not wraps an inner materialization in a
UnaryOp("!"). And wraps two sub-materializations in a
BinaryOp("&&"). The whole thing is a straight recursive descent.
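A sketch of that descent. The Expr constructors are stand-ins for the real STIR node types:

```kotlin
// Sketch: materializing a symbolic PhiCondition into concrete expression
// nodes. Expr and its constructors are stand-ins for the real STIR types.
sealed interface PhiCondition {
    data class Existing(val graphNodeId: String) : PhiCondition
    data class Eq(val selectorNodeId: String, val primitive: String) : PhiCondition
    data class Not(val inner: PhiCondition) : PhiCondition
    data class And(val left: PhiCondition, val right: PhiCondition) : PhiCondition
}

sealed interface Expr
data class NodeRef(val graphNodeId: String) : Expr
data class Prim(val value: String) : Expr
data class UnaryOp(val op: String, val operand: Expr) : Expr
data class BinaryOp(val op: String, val left: Expr, val right: Expr) : Expr

// Straight recursive descent: Existing reuses, the other three synthesize.
fun materialize(cond: PhiCondition): Expr = when (cond) {
    is PhiCondition.Existing -> NodeRef(cond.graphNodeId)
    is PhiCondition.Eq -> BinaryOp("==", NodeRef(cond.selectorNodeId), Prim(cond.primitive))
    is PhiCondition.Not -> UnaryOp("!", materialize(cond.inner))
    is PhiCondition.And -> BinaryOp("&&", materialize(cond.left), materialize(cond.right))
}
```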
This function is the only place where phi-derived conditions turn into graph structure. It runs exactly once per specialized branch, so there's no garbage and no double-counting. And it lets us do cheap things with conditions before committing: coalescing branches that share a value, negation normalization, shared-subtree detection.
Analysis passes and graph mutation are different kinds of work: analysis is exploratory, while mutation is committing. Mixing them — synthesizing nodes during exploratory analysis — makes passes non-idempotent and piles up garbage. Symbolic conditions let the tracer reason (building, combining, simplifying) without touching the graph. Materialization is where the analysis hands its final artifact to the graph, at the moment the graph needs it. Everything between is pure computation over immutable inputs.
The tracer hands the service compiler a Bounded result: N branches, each with
a value and a symbolic condition. Now what?
The service compiler's existing logic already knows how to compile for one known value. That's the function it had before phi tracing existed. Phi doesn't change that function; it just calls it N times.†
† This is classical offline partial evaluation — specifically the first Futamura projection. The compile function is the interpreter; each branch value is a statically-knowable binding; specializing the compile call against each binding produces residual code, and the count gate re-dispatches the residuals on the dynamic selector at plan time. Jones, Gomard, and Sestoft's Partial Evaluation and Automatic Program Generation is the canonical reference.

The entry point the service compiler calls takes four things: the expression it wants resolved, the universe of values it knows how to handle, a compile function that turns one value into a list of graph nodes, and a fallback for the unbounded case. Everything else is the phi system's job.
The specializer runs four steps per branch (a sketch follows the list):

1. Compile: call the service compiler's existing compile function with the branch's concrete value.
2. Materialize: turn the branch's symbolic condition into a concrete gate expression, count = cond ? 1 : 0.
3. Gate: wrap the compiled output in a count-gated Gen node carrying that expression.
4. Rewire: update every reference to the gated node with the [0] index.

Gen is STIR's existing abstraction over count and for_each. It takes a child node (a resource, a module call) and a count/iteration expression, and at lowering time emits the underlying Terraform with the appropriate count = or for_each = attached. Gen nodes are how we represent "zero-or-one" and "one-per-key" in a single place instead of threading those concerns through every pass.
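Putting the entry point and the per-branch loop together, as a sketch under illustrative types — these are not the real signatures:

```kotlin
// Sketch of the phi-aware entry point and the per-branch specialization loop.
data class Node(val description: String)
data class PhiBranch(val value: String, val gateExpr: String)

sealed interface TraceResult {
    data class Resolved(val value: String) : TraceResult
    data class Bounded(val branches: List<PhiBranch>) : TraceResult
    data class Unbounded(val reason: String) : TraceResult
}

fun specialize(
    traced: TraceResult,              // result of tracing the field's expression
    universe: Set<String>,            // values the service compiler can handle
    compile: (String) -> List<Node>,  // existing one-value compile logic
    fallback: (String) -> List<Node>, // unbounded case: emit a stack issue
): List<Node> = when (traced) {
    // One value: no branching, no gating.
    is TraceResult.Resolved -> compile(traced.value)
    // N branches: compile each value, wrap each output in a count-gated module.
    is TraceResult.Bounded -> traced.branches.flatMap { branch ->
        require(branch.value in universe) { "value outside universe" }
        compile(branch.value).map { node ->
            Node("gen(count = ${branch.gateExpr} ? 1 : 0) { ${node.description} }")
        }
    }
    is TraceResult.Unbounded -> fallback(traced.reason)
}
```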
Output:
module "db_v15_4" {
count = var.customer_env == "prod" ? 1 : 0
source = "./modules/postgres-v15-4"
instance_class = local.instance_class
}
module "db_v14_9" {
count = var.customer_env == "prod" ? 0 : 1
source = "./modules/postgres-v14-9"
instance_class = local.instance_class
}
As a STIR graph, the specialized output looks like this. The original ModCall is gone;
in its place are two count-gated Gen nodes, each wrapping its own
specialization. The count field on each Gen is the
materialized branch condition — the same selector as the original conditional,
just with its truth direction flipped per branch.
count expressions reference the same shared selector; the branches
disagree only on which side of the boolean they activate.
One of these resolves to count = 1 at plan time; the other to
count = 0. Terraform produces exactly one deployment. The vendor's
original conditional logic is preserved in the gate expression, but now it selects
between two properly specialized compilations instead of trying to thread a single
specialization through both versions.
All phi gate conditions are guaranteed to be known at Terraform plan time, not apply time. That is the invariant that makes this whole scheme work.†
† Terraform's rule: count and for_each values must be known at plan time. If a gate depended on, say, aws_db_instance.db.arn — a value that only exists after apply — you'd get the classic error: "The `count` depends on a value that will not be known until apply."
The tracer enforces this by construction. It follows data flow only through variables, locals, conditional expressions, and data sources (which evaluate at refresh, before plan) — never through resource attributes. Variables resolve at plan time. Locals are just expressions over variables. Conditionals evaluate over those. None of the sources of a phi condition is something Terraform has to call a cloud API — or run an apply — to learn.
The enforcement isn't a convention in the tracer code. STIR distinguishes
plan-time-stable expressions from apply-time values at the type level: a resource
attribute like aws_db_instance.db.arn is a different node kind than a
variable or local, and the tracer's pattern matching simply doesn't have an arm that
recurses through it. If a vendor expression tries to use a resource attribute as a
selector, the tracer returns Unbounded with the reason "selector depends on an
apply-time value" and the vendor gets a stack issue before any output is emitted.
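A sketch of how a type-level distinction makes the rule structural rather than a convention. The node kinds are stand-ins for the real STIR types:

```kotlin
// Sketch: plan-time-stable node kinds vs apply-time values at the type level.
// The tracer's match simply has no recursive arm for resource attributes.
sealed interface StirNode
data class Variable(val name: String) : StirNode
data class LocalValue(val name: String) : StirNode
data class DataSourceAttr(val path: String) : StirNode  // refresh-time: before plan
data class ResourceAttr(val path: String) : StirNode    // apply-time: not traceable

sealed interface SelectorCheck
data class Traceable(val node: StirNode) : SelectorCheck
data class Rejected(val reason: String) : SelectorCheck

fun admitAsSelector(node: StirNode): SelectorCheck = when (node) {
    is Variable, is LocalValue, is DataSourceAttr -> Traceable(node)
    is ResourceAttr ->
        Rejected("selector depends on an apply-time value: ${node.path}")
}
```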
This design decision has a sharper consequence than it might sound: it's the reason we
don't need a second "apply-time phi" mechanism. Every specialization that phi produces
can be decided at plan. The vendor reads the plan, sees which module is coming up with
+ count = 1, and reviews the actual specialization that will run.
Terraform treats count-gated resources as lists. If the module module.db
used to be referenced elsewhere as module.db.output, once count
is attached, the reference must become module.db[0].output.
The specializer handles this automatically. It walks every reference edge pointing
at a count-gated node, finds the corresponding scope-traversal expression, and
splices an index-zero step into the reference path right after the root reference.
This composes through reference chains: only direct references to the gated node
need [0]; references that go through other nodes carry the index
along for the ride.
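A sketch of the splice; the path representation is an assumption for illustration:

```kotlin
// Sketch: splice an index-zero step right after the root of a reference path
// when the referenced node became count-gated.
sealed interface Step
data class Name(val segment: String) : Step  // e.g. "module", "db", "output"
data class Index(val i: Int) : Step          // e.g. [0]

// module.db.output -> module.db[0].output (rootLen segments form the root ref)
fun indexGatedReference(path: List<Step>, rootLen: Int = 2): List<Step> =
    path.take(rootLen) + Index(0) + path.drop(rootLen)

fun main() {
    val ref = listOf(Name("module"), Name("db"), Name("output"))
    println(indexGatedReference(ref))  // splices Index(0): module.db[0].output
}
```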
Hand-authored depends_on edges get the same treatment, with one
important rule: the rewrite depends on every branch module, not just the
active one. When a vendor writes
depends_on = [aws_db_instance.app] and the specializer has split the
resource into module.db_v14_9, module.db_v15_4, and
module.db_v16_2, the rewritten reference is
depends_on = [module.db_v14_9, module.db_v15_4, module.db_v16_2]. Inactive
branches have count = 0, which Terraform treats as an empty list — a
valid dependency that contributes no edges — so the downstream resource waits
exactly for the active branch. Picking just one branch to depend on would fail on every
plan where a different branch was active. User-declared ordering survives
specialization; silently dropping those edges would be the kind of correctness bug
that turns into an overnight page.
A subtlety worth flagging: specializing aws_db_instance.app into
module.db_v15_4.aws_db_instance.app is a resource address change.
Without care, existing Terraform state would see the old address disappear and
the new one appear — a destroy-and-recreate on first apply after
adoption, which is a very bad surprise for a database.†
The same address-stability question applies across compiler versions: a Tensor9 upgrade that changed generated module names would show up as a state diff on the customer's next plan, which is unacceptable for production infra.‡
† The compiler emits Terraform moved {} blocks alongside each specialization so state migrates in place. The first post-compile plan shows a "moved" diff with no destroy/create, and apply is a no-op against unchanged infrastructure. Provider aliases on the original resource are carried onto the specialized modules through the same mechanism.
‡ Tensor9 guarantees generated resource address stability across compiler versions. The specialization naming scheme is a public contract. If a future version needs to change it, the compiler ships a migration artifact (moved {} blocks again) so the first plan after upgrade is a no-op. Customers never have to choose between upgrading Tensor9 and getting a clean plan.
If the tracer returns Resolved, there's only one value; no branching, no
gating. The specializer just calls compile(value) and returns the result
directly. Similarly, if Bounded came back with branches that all carry the
same value (a pathological but possible case), the specializer dedups to one module and
drops the gating. No wasted modules in the output.
Not every service compiler output is directly countable. If the template returns a
field of an already-gated resource, or a node that already has its own
count/for_each, wrapping it in Gen directly
doesn't compose. The specializer falls back to synthetic module wrapping:
each branch's output is placed inside a generated sub-module, which is
countable, and the parent count-gates the sub-module. The service compiler never sees
the distinction; it just gets a list of nodes back.
The nested-count semantics are what you'd expect. When the parent gate evaluates to
count = 0, the whole sub-module is inert — the inner gate never
evaluates. When the parent gate evaluates to count = 1, the inner gate
fires on its own selector exactly as if no wrapping had happened. There is no case
where the two gates "disagree" at runtime: the outer gate is an on/off switch for
the inner, not a competing condition.
The debugging question operators actually care about: "Postgres 14.9 just got deployed; it was supposed to be 15.4 — why?" At 3am, you don't want to reverse-engineer the answer from the generated Terraform.
The specializer emits provenance in two places, by default, for every specialization:

- An inline comment on the generated module. Operators read terraform plan output, not sidecar files, and the comment is sitting right where they're already looking.
- A companion JSON record written alongside the generated Terraform.

The inline comment looks like this:
# phi: engine_version (aws_db_instance.app) — gate: Not(Existing(var.customer_env == "prod")) — trace: modules/postgres/main.tf:12-18 — compiler: tensor9 1.8.4
module "db_v14_9" { count = (var.customer_env != "prod") ? 1 : 0 ... }
The companion JSON record carries everything the inline comment trims for brevity:
{
"module": "db_v14_9",
"source_field": "aws_db_instance.app.engine_version",
"branch_value": "14.9",
"gate": "Not(Existing(var.customer_env == \"prod\"))",
"trace_path": [
"Field aws_db_instance.app.engine_version",
"NameRef var.engine_version",
"cross-module: caller's ModCall field \"engine_version\"",
"ConditionalExpr false-arm (path: Not(Existing(...)))",
"Prim \"14.9\""
],
"source_span": "modules/postgres/main.tf:12-18",
"compiler_version": "tensor9 1.8.4"
}
compiler_version is a load-bearing field, not decoration. When an
operator is forensically reading a provenance record a year after the compile that
produced it, the first question they ask is which Tensor9 version generated it
— because the specializer's decisions and the naming scheme are versioned.
Two things fall out of having provenance in both forms:

- Anyone reading the terraform plan output sees which trace produced each count-gated module and why its gate evaluates the way it does. No reverse-engineering required.
- The records are small (typically a few hundred bytes per specialization), written alongside the generated Terraform, and not consulted at apply time — they exist purely for human and tooling consumption after the fact.
In practice, what the tracer hands the specializer falls into four buckets, listed roughly in order of prevalence in our experience.
The most common. A vendor passes engine_version = "15.4" from the root
module, maybe through two or three layers of modules, maybe transformed once or twice
by a function like tostring. Each hop is traversable, and at the end, a
primitive. The tracer returns Resolved("15.4"). The specializer emits one
module, no gating.
Most parameterization in real stacks is of this form: vendors use variables so they can change values later, but in the current deployment they're threading a single concrete value through. Phi tracing for this case is "follow the yarn to the spool," and the cost is small.
The running example. var.env == "prod" ? "15.4" : "14.9". Two branches,
each with a concrete value and a path condition. Specialize twice, gate on the condition
expression. The vendor's original intent — "use 15.4 in prod, 14.9 elsewhere"
— survives compilation as two modules gated on the same boolean.
The interesting subcase: nested conditionals. A three-way conditional produces three branches with nested AND/NOT conditions. The specializer emits three modules, each with its own gate. Branches that happen to share a value across different paths get coalesced via disjunction so the output doesn't carry duplicate specializations.
A vendor has variable "instance_size" { type = string } — no default,
no conditional setting it in this stack. The vendor wants customers to provide the value
at deployment time.
This is the case that motivated the universe parameter the service compiler supplies when it invokes the phi-aware entry point. The service compiler usually knows the domain of the field it cares about, even if the vendor's HCL doesn't say. Instance sizes are a small closed set. Engine versions are a small closed set. TLS versions are a small closed set.
The service compiler passes the universe in. The tracer, on hitting a Param with no default,
falls back to the universe: one branch per universe value, each gated with
Eq(selector, v). The specializer generates N specializations. At apply
time, whichever value the customer picks for instance_size — provided
it's in the universe — keeps exactly one specialization alive.
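The expansion itself is a one-liner. A sketch with assumed names, producing the gates shown in the output below:

```kotlin
// Sketch: a variable with no default plus a service-compiler universe expands
// into one Eq-gated branch per universe value.
data class Branch(val value: String, val gate: String)

fun expandUniverse(selector: String, universe: List<String>): List<Branch> =
    universe.map { v -> Branch(v, "$selector == \"$v\"") }

fun main() {
    expandUniverse("var.instance_size", listOf("small", "medium", "large"))
        .forEach { println("count = ${it.gate} ? 1 : 0   // ${it.value}") }
}
```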
// universe = {"small", "medium", "large"}
// var.instance_size has no default
module "db_small" {
count = var.instance_size == "small" ? 1 : 0
...
}
module "db_medium" {
count = var.instance_size == "medium" ? 1 : 0
...
}
module "db_large" {
count = var.instance_size == "large" ? 1 : 0
...
}
If the customer supplies a value outside the universe, the compiler emits a
Terraform-native validation {} block asserting
contains(universe, var.instance_size), so the customer gets a clear error
at plan time rather than a silent no-op.
The expression is a variable with no default, no conditional driving it, and no universe the service compiler knows about. Or it's a function call whose semantics we don't have a strategy for. Or the trace hit the cycle/depth/branch-count safety limits.
The tracer returns an unbounded result. Rather than guess, the compiler stops and emits a stack issue that points at the exact expression and tells the vendor how to unblock it:
⚠ Blocking: can't determine the value of engine_version
The engine_version field on aws_db_instance.app is set to
var.postgres_version, but we couldn't narrow that variable down to a
specific value or a small set of values. We need to know the possible
values up front so we can pick the right PostgreSQL version for each
of your customer deployments.
Why:
var.postgres_version has no default value, and no validation
block lists the allowed values.
Fix (any one of these works):
1. Give the variable a default:
variable "postgres_version" {
type = string
default = "15.4"
}
2. Constrain the variable with a validation block:
validation {
condition = contains(["14.9", "15.4", "16.2"], var.postgres_version)
error_message = "postgres_version must be 14.9, 15.4, or 16.2"
}
3. Hardcode the value if it doesn't need to be configurable:
engine_version = "15.4"
Option 2 is usually the right choice — your customers can still pick
between versions, and we'll compile a specialization for each one.
The issue points at the specific expression and offers concrete, copy-pasteable fixes. That matters: a stack issue that tells a vendor which field, which variable, which module — and how to fix it — turns into a five-minute edit. A generic "this field isn't a primitive" turns into a support ticket.
| Scenario | Trace result | Specialization | Gate |
|---|---|---|---|
| One known value | Resolved(prim) | 1 compilation, no gate | — |
| Conditional | Bounded: 2-3 branches, each with symbolic condition | N compilations | the original conditional |
| Bounded universe | Bounded: one branch per universe value, Eq-gated | N compilations | var.x == "v", one per value |
| Unbounded | Unbounded (with a human-readable reason) | none; blocking stack issue with a concrete fix | — |
Phi tracing has enough surface area to fill a small book. A few adjacent topics we skipped in this post, which we may come back to later:
Branch coalescing. When different paths produce the same value, the branches merge:
var.env == "prod" ? "m5.xlarge"
    : (var.region == "us-east-1" ? "m5.xlarge" : "t3.small")
Three branches become two modules, not three. The algorithm for turning a
conjunctive-branch forest into a coalesced DNF is worth its own post.
Address stability. Specialization changes resource addresses: aws_db_instance.app becomes
module.db_v15_4.aws_db_instance.app. When the compiler itself
changes (different specialization strategy, different sub-module structure),
addresses can shift across compiler versions. The compiler emits
moved {} blocks with each output so the first plan after upgrade
is a no-op, but that handles the easy case. A customer who skipped three
Tensor9 versions, a vendor whose stack hits a renamed sub-module path, a
universe element removed between compiler versions: each of these is its own
design decision. Worth a post on its own.
Universe inference from validation {} blocks
(contains(["s","m","l"], var.size) and friends). The lifter recognizes
these and feeds the constraint into the tracer as an implicit universe, which is a
nice way to let the HCL itself drive specialization without the service compiler having
to enumerate valid values. The implementation details get into AST traversal and
constraint inference that didn't fit here.
Binning for continuous attributes. The compiler bins a continuous value (a
memory size, say) into a small set (small, medium, large) and specializes per bin, with a stack
warning to the vendor that fine-grained tuning requires making the attribute known
at compile time. The interesting design question is where the bin boundaries come
from — service-compiler-provided, inferred from resource pricing tiers, or
vendor-configurable.
Eryn's post explained what form factors are — the contract that pins down where a vendor's software is allowed to run. For that contract to hold, the compiler has to perform service replacements correctly for each form factor. Service replacements depend on specific values of resource fields — engine versions, instance classes, memory sizes. Real vendor stacks don't hand the compiler primitives; they hand the compiler expressions.
Phi tracing is how we bridge that gap. Backward data flow analysis over the STIR graph, producing one of three trace results (Resolved / Bounded / Unbounded) with symbolic condition trees on each branch. When bounded, the specializer calls the service compiler's existing compile logic once per branch value and wraps each output in a count-gated module. The plan-time-knowability invariant falls out of only tracing through variables, locals, and conditionals, never through resource attributes. The tracer is read-only; the specializer owns graph mutation; condition materialization is the narrow bridge between them.
The effect, from the outside: a service compiler that used to require primitives now handles parameterized stacks without changes to its core compile logic. It wraps its existing compile function in a phi-aware entry point. The phi-tracing system does the rest.
Next, maybe I'll take a deeper look at how the compiler decides which target service to reach for in the first place, and how that choice plays with the version-picking mechanism described here.
— mtp