Concept

Benchmarking against incumbents

analytical, structural

Overview

Benchmarking compares a finished analysis’s headline outputs (levelized cost, carbon intensity, energy intensity, capex per tonne) against the known values of well-characterized incumbent or competitor processes, to test whether the result is plausible and to see how the analyzed process stacks up. It is an output check on a built model, distinct from borrowing an incumbent’s data as an input anchor.

Body

What it is. Benchmarking takes the model’s key computed outputs and sets them beside the same metrics for real, well-characterized incumbents — the dominant existing route, a best-in-class plant, the market leader. It answers two questions at once: plausibility (does my number land where reality does?) and competitiveness (does the analyzed process beat, match, or lose to the incumbent on the metric that decides the case?).

Input anchor vs. output check. This is the distinction that separates benchmarking from a reference / comparable process. A comparable seeds the model — its measured data is borrowed as the starting figure a cost or performance estimate is built from. Benchmarking tests the finished model — placing its computed output against the incumbent’s. The same incumbent plant can serve both roles, but seeding a model from a datum and checking the model against it are different uses, and using one datum for both at once is circular (see Limits).

What gets benchmarked. The handful of headline metrics the analysis exists to produce: levelized cost in $/t, carbon intensity in CO₂e/t, energy intensity, capex per tonne of capacity, sometimes yield. Each is compared metric by metric against the incumbent’s value for the same quantity.
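A metric-by-metric comparison can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `Metric` type, the `compare` function, and the 10% "match" tolerance are assumptions introduced here, and the incumbent values are placeholders picked from inside the rough bands used in the mini-example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    unit: str
    lower_is_better: bool = True  # cost and intensity metrics; yield would be False


def compare(metric: Metric, model: float, incumbent: float, tolerance: float = 0.10):
    """Return (ratio, verdict) for one metric, assuming both values share a basis.

    tolerance is an assumed band within which the two values count as a match.
    """
    if incumbent <= 0:
        raise ValueError("incumbent value must be positive to form a ratio")
    ratio = model / incumbent
    if abs(ratio - 1.0) <= tolerance:
        return ratio, "matches"
    model_is_better = (ratio < 1.0) == metric.lower_is_better
    return ratio, "beats" if model_is_better else "loses"


# Placeholder values: 800 $/t vs an assumed 300 $/t; ~0 vs an assumed 2.0 t CO2e/t.
cost = compare(Metric("levelized cost", "$/t"), 800.0, 300.0)
carbon = compare(Metric("carbon intensity", "t CO2e/t"), 0.0, 2.0)
print(cost, carbon)
```

Keeping the direction of "better" on the metric itself avoids misreading a ratio above 1 as a loss for a metric such as yield, where higher is better.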

Like-for-like is the requirement. A benchmark means something only when both numbers share a basis — the same boundary, cost year, capacity definition, and scope. A cradle-to-gate cost set against a published gate-to-gate one, or a current estimate against a decade-old incumbent figure, measures the basis difference rather than the process difference, exactly as a levelized cost is non-comparable across mismatched bases.
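One way to make the like-for-like requirement explicit is to record the basis next to each number and refuse to compare until the two agree. The sketch below is an assumption-laden illustration: the `Basis` record and its four fields simply mirror the boundary, cost year, capacity definition, and scope named above, and the example values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Basis:
    boundary: str             # e.g. "cradle-to-gate" or "gate-to-gate"
    cost_year: int            # year the money values are expressed in
    capacity_definition: str  # e.g. "nameplate" or "actual output"
    scope: str                # what the figure includes


def basis_mismatches(a: Basis, b: Basis) -> list[str]:
    """List the basis fields on which two figures disagree."""
    return [f.name for f in fields(Basis) if getattr(a, f.name) != getattr(b, f.name)]


# Hypothetical bases for a model output and a published incumbent figure.
model_basis = Basis("cradle-to-gate", 2024, "nameplate", "production cost")
incumbent_basis = Basis("gate-to-gate", 2014, "nameplate", "production cost")

print(basis_mismatches(model_basis, incumbent_basis))
```

An empty list is the precondition for reading the comparison as a process difference. A non-empty list names what has to be reconciled first.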

One incumbent vs. a class. Benchmarking against a single plant gives one comparison point. Benchmarking against the spread of comparable plants is stronger, because incumbents of the “same” type vary widely: this is the reference-class view, which shows where the analyzed process sits in a distribution rather than against one exemplar.
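The difference between a point comparison and a class comparison can be shown with a small sketch. Everything here is hypothetical: the function name `position_in_class` and the five incumbent costs are made up for illustration and do not describe any real plants.

```python
import statistics


def position_in_class(model_value, class_values, lower_is_better=True):
    """Locate a model output within a set of incumbent values for one metric."""
    if not class_values:
        raise ValueError("need at least one incumbent value")

    def better(a, b):
        return a < b if lower_is_better else a > b

    best = min(class_values) if lower_is_better else max(class_values)
    beaten = sum(1 for v in class_values if better(model_value, v))
    return {
        "share_beaten": beaten / len(class_values),
        "beats_best": better(model_value, best),
        "beats_median": better(model_value, statistics.median(class_values)),
    }


# Hypothetical incumbent costs in $/t, all assumed to be on one basis.
incumbents = [300, 340, 380, 420, 500]
print(position_in_class(360, incumbents))
```

With these made-up numbers the analyzed process loses to the best plant but beats the median, which is the distinction a single-plant benchmark cannot reveal.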

Two jobs at once. Benchmarking is both the anchor-comparison form of an order-of-magnitude gut check — does the output sit where real processes sit? — and a headline result in its own right, since “process X is competitive with the incumbent on $/t” is often the conclusion the whole analysis was built to support.

Limits & typical error

Basis mismatch is the dominant error. Comparing outputs computed on different boundaries, scopes, cost years, or capacity definitions compares unlike things, and because the incumbent’s basis is usually unstated the mismatch is silent — the same non-comparability that afflicts any levelized cost or carbon intensity read across bases.
Circularity — benchmarking against your own anchor. When the incumbent value used as the benchmark is the same datum that seeded the model as its comparable process, the check is circular: the model was constructed to match that figure, so agreement is guaranteed and tests nothing. A benchmark only validates when it is independent of the model’s inputs.
Incumbent figures are moving and selectively reported. A competitor’s quoted cost is often a best case, a target, a press-release number, or a different vintage; benchmarking against an optimistic or stale value mis-sets the bar. The benchmark inherits the source quality of the incumbent datum.
One incumbent understates the spread. Comparing against a single plant, frequently the flagship or best-in-class, gives a point, not the real range incumbents occupy. The same analyzed process can read as uncompetitive against the best plant yet competitive against the median, and a point comparison cannot show which case applies (the single-comparable limit of reference-class thinking).
Different maturities are not like-for-like. Benchmarking a first-of-a-kind plant’s cost against a decades-optimized incumbent compares different points on the learning and scale curves; the gap then conflates “this route is fundamentally worse” with “this route is early,” which are different claims with different implications.
Benchmarking the wrong metric. Matching an incumbent on cost when the decision turns on carbon intensity (or the reverse) produces a real comparison on a metric that does not govern the conclusion — accurate and irrelevant at the same time.

Mini-example

Take green ammonia’s levelized cost ($800/t, from the running example) and benchmark it against the incumbent gray route. On a comparable production-cost basis, conventional gray ammonia is on the order of $200–400/t, gas-price-dependent (a round market band, not a sourced figure). Green therefore benchmarks at roughly 2–4× the incumbent cost across that band, and about 2–3× at typical gas prices. That is both a plausibility pass (a ~2–3× green premium matches the widely reported gap, so ~$800/t is believable) and a competitiveness result (green is not yet cost-competitive on unsubsidized $/t; that gap is the space policy credits must close). On carbon intensity, a metric that often decides the case, the benchmark flips: green’s carbon intensity (~0 on renewables) decisively beats gray’s (~1.6–2.4 t CO₂e/t), so which process “wins” depends entirely on whether cost or carbon governs.
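The ratio arithmetic is simple enough to check directly. The sketch below (the function name `premium_band` is an assumption for illustration) divides the $800/t figure by both ends of the $200–400/t band. The full band spans 2× to 4×, and a premium of 3× or less requires an incumbent cost of about $267/t or more, so the ~2–3× reading implicitly places typical gray costs in the upper two-thirds of the band.

```python
def premium_band(model_cost, incumbent_low, incumbent_high):
    """Return (min_ratio, max_ratio) of model cost over an incumbent cost band."""
    if not 0 < incumbent_low <= incumbent_high:
        raise ValueError("expected 0 < incumbent_low <= incumbent_high")
    # The highest incumbent cost gives the smallest premium, and vice versa.
    return model_cost / incumbent_high, model_cost / incumbent_low


low, high = premium_band(800.0, 200.0, 400.0)
print(f"{low:.1f}x to {high:.1f}x")  # 2.0x to 4.0x

# Incumbent cost at which the premium is exactly 3x.
threshold = 800.0 / 3.0
print(f"3x premium at about ${threshold:.0f}/t")  # about $267/t
```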

Separately, to show circularity: if the green plant’s capex had been anchored to a specific published gray-ammonia plant cost as its input comparable, then “benchmarking” green’s capex per tonne against that same plant only re-confirms the anchor — the model was built from it, so it cannot disagree. The benchmark must come from outside the model’s own inputs to test anything.
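The independence requirement can be checked mechanically if every datum carries a source identifier. The sketch below is only an illustration under that assumption: the function `split_by_independence` and the identifiers `plant_A_capex` and `plant_B_capex` are hypothetical.

```python
def split_by_independence(benchmark_sources, model_input_sources):
    """Separate candidate benchmarks into independent and circular ones.

    A benchmark is circular when its source also seeded the model as an input.
    """
    used_as_input = set(model_input_sources)
    independent = [s for s in benchmark_sources if s not in used_as_input]
    circular = [s for s in benchmark_sources if s in used_as_input]
    return independent, circular


# Hypothetical case: the model's capex was anchored to plant A.
inputs = ["plant_A_capex"]
candidates = ["plant_A_capex", "plant_B_capex"]

independent, circular = split_by_independence(candidates, inputs)
print("usable as benchmarks:", independent)
print("circular, test nothing:", circular)
```

This catches only exact reuse of a source. A benchmark derived from the same underlying study under a different label would still need to be spotted by hand.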

And to show basis mismatch: setting green’s cradle-inclusive cost beside a gray gate-to-gate cost that excludes upstream gas production compares two different boundaries — the apparent gap is partly the scope difference, not the process difference, until both are put on one basis.
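Putting both figures on one basis amounts to adding the excluded scope back to the narrower number before taking the difference. In the sketch below, the function name `decompose_gap`, the $300/t gray figure, and the $50/t excluded-scope cost are all placeholders chosen to show the arithmetic. None is a sourced value.

```python
def decompose_gap(model_cost, incumbent_cost_narrow, excluded_scope_cost):
    """Split an apparent cost gap into a scope artefact and a like-for-like gap.

    incumbent_cost_narrow is quoted on the narrower boundary;
    excluded_scope_cost is what that boundary leaves out.
    """
    apparent_gap = model_cost - incumbent_cost_narrow
    incumbent_same_basis = incumbent_cost_narrow + excluded_scope_cost
    like_for_like_gap = model_cost - incumbent_same_basis
    return {
        "apparent_gap": apparent_gap,
        "like_for_like_gap": like_for_like_gap,
        "scope_artefact": apparent_gap - like_for_like_gap,
    }


# Placeholder values in $/t; the 50 $/t excluded-scope cost is purely hypothetical.
gap = decompose_gap(model_cost=800, incumbent_cost_narrow=300, excluded_scope_cost=50)
print(gap)
```

Only the like-for-like gap says anything about the processes. The scope artefact is the part of the apparent gap that disappears once the boundaries agree.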