TEA Handbook

Concept

Data sourcing & validation

cross

Overview

Data sourcing and validation is the build-time discipline of giving every number in a model a recorded origin and a validation state — finding a value, noting where it came from and on what basis, rating it against the source hierarchy, and flagging it as unconfirmed until a stronger source or an expert checks it. It is the practice that produces provenance during the build, before any outside reader ever asks.

Body

The loop, per input. Each number passes through four moves: source it (find a value and record its origin and basis), rate it against the source hierarchy and the accuracy class it implies, flag it as unvalidated when it rests on a weak or assumed source, and clear the flag when a better source or an expert confirms it. The running list of open flags is a validation queue — the model’s own record of what it has not yet confirmed.
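The four moves and the validation queue can be sketched as a small record plus four functions. This is a minimal sketch under stated assumptions. The tier names, their ranking, and the cut-off below which a source counts as "weak" are hypothetical, and so are the names `Input`, `source`, `rate`, `clear` and `validation_queue`. The usage lines reuse two inputs from the green-ammonia example further down.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical source hierarchy: lower rank = stronger source.
TIER_RANK = {"measured": 1, "vendor_quote": 2, "datasheet": 3,
             "literature": 4, "assumed": 5}
WEAK_RANK = 4  # assumption: "literature" and weaker sources get flagged


@dataclass
class Input:
    name: str
    value: float
    unit: str
    origin: Optional[str] = None  # where the value came from
    basis: Optional[str] = None   # what it was measured on
    tier: Optional[str] = None    # rating against the hierarchy
    flagged: bool = False         # True while the input is unvalidated


def source(inp: Input, origin: str, basis: str) -> None:
    """Move 1: record origin and basis together."""
    inp.origin, inp.basis = origin, basis


def rate(inp: Input, tier: str) -> None:
    """Moves 2 and 3: rate the source, and flag it if weak or assumed."""
    inp.tier = tier
    inp.flagged = TIER_RANK[tier] >= WEAK_RANK


def clear(inp: Input, confirmed_by: str) -> None:
    """Move 4: clear the flag once a better source or an expert confirms."""
    inp.origin = f"{inp.origin}; confirmed by {confirmed_by}"
    inp.flagged = False


def validation_queue(inputs: list[Input]) -> list[str]:
    """The running list of open flags."""
    return [i.name for i in inputs if i.flagged]


intensity = Input("electricity intensity", 10.0, "MWh/t NH3")
source(intensity, "electrolyzer datasheets", "per tonne NH3")
rate(intensity, "datasheet")

price = Input("electricity price", 40.0, "$/MWh")
source(price, "stated market anchor", "level only, range not yet set")
rate(price, "assumed")

print(validation_queue([intensity, price]))  # ['electricity price']
```

Rating and flagging share one function here because, in this sketch, the flag follows directly from the rating. A real model may also want to record the accuracy class the tier implies.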

Sourcing records origin and basis. A value without its basis — the cost year, the scale, the system boundary it was measured on — is only half-sourced: it is attributed but not transferable to the case being modeled. The record has to carry both, or the figure cannot be safely reused.
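One way to make "half-sourced" checkable is to store the basis as explicit fields and test whether any of them are missing. The sketch below assumes the basis consists of exactly the three items the text names (cost year, scale, system boundary). The class, the field names, and the example origin string are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourcedValue:
    value: float
    unit: str
    origin: Optional[str] = None     # attribution: report, datasheet, quote
    cost_year: Optional[int] = None  # basis: currency year of the figure
    scale: Optional[str] = None      # basis: plant size it was measured at
    boundary: Optional[str] = None   # basis: system boundary it covers


BASIS_FIELDS = ("cost_year", "scale", "boundary")


def missing_basis(v: SourcedValue) -> list[str]:
    """Basis fields not yet recorded for this value."""
    return [f for f in BASIS_FIELDS if getattr(v, f) is None]


def sourcing_state(v: SourcedValue) -> str:
    if v.origin is None:
        return "unsourced"
    if missing_basis(v):
        return "half-sourced"  # attributed, but not transferable
    return "sourced"


figure = SourcedValue(1.0e9, "$", origin="published capex study")  # hypothetical
print(sourcing_state(figure), missing_basis(figure))
# half-sourced ['cost_year', 'scale', 'boundary']
```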

Validation is distinct from sourcing. Sourcing records where a number came from; validation confirms, against a stronger or independent source, that the number is adequate. The two come apart: a number can be fully sourced (traceable) yet unvalidated (its source not yet judged good enough), and that gap is exactly what the flag marks.

The flag is information, not an embarrassment. Marking a figure unvalidated states plainly that the result currently leans on an unconfirmed input, and it localizes where confirmation effort would actually change how much the answer can be trusted. An unflagged model is not more finished — it has merely hidden which of its numbers are still guesses.

The state of the result tracks the state of its drivers. Because the answer rides on a few drivers, the validation state of those inputs sets the validation state of the conclusion; a fully sourced model with one unvalidated driver has an unvalidated result.
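The rule can be sketched as a check over the drivers alone. This sketch makes three assumptions. The result is treated as an additive stack of contributions. A "driver" is any input that supplies at least 10% of the result. The contribution figures and input names are hypothetical. With those assumptions, an open flag on a minor input leaves the result validated, but an open flag on a driver does not.

```python
def drivers(contributions: dict[str, float], threshold: float = 0.10) -> list[str]:
    """Inputs whose share of the result meets the (assumed) threshold."""
    total = sum(abs(c) for c in contributions.values())
    return [k for k, c in contributions.items() if abs(c) / total >= threshold]


def result_state(contributions: dict[str, float],
                 validated: dict[str, bool],
                 threshold: float = 0.10) -> tuple[str, list[str]]:
    """The result is validated only if every driver is validated."""
    open_drivers = [k for k in drivers(contributions, threshold)
                    if not validated[k]]
    return ("unvalidated" if open_drivers else "validated"), open_drivers


# Hypothetical contributions to a result, in $/t.
contrib = {"input_a": 500.0, "input_b": 290.0, "input_c": 10.0}
print(result_state(contrib, {"input_a": True, "input_b": True, "input_c": False}))
# ('validated', [])
print(result_state(contrib, {"input_a": True, "input_b": False, "input_c": True}))
# ('unvalidated', ['input_b'])
```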

Limits & typical error

See also

Mini-example

Building the green-ammonia model, three load-bearing inputs get different treatment. The electricity intensity (10 MWh/t NH₃) is sourced to electrolyzer datasheets and rated a strong tier — close to validated. The electricity price ($40/MWh) is recorded as a stated market anchor and flagged unvalidated, pending a real market or PPA source to confirm the level and its range. The total capex (~$1.0bn) is flagged pending a vendor quote. Read together, the open flags say the ~$800/t headline currently rests on the price and the capex anchors — pointing confirmation effort exactly where it would change the result’s trust, rather than at the already-solid intensity.
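The arithmetic behind that reading can be checked with the figures given above. Electricity alone is 10 MWh/t × $40/MWh = $400/t, half of the ~$800/t headline. Each $1/MWh error in the unconfirmed price moves the headline by $10/t. The sketch below takes the headline as exactly $800/t and treats the rest of the headline as capex recovery plus other costs. Both are simplifying assumptions. Converting the ~$1.0bn capex to $/t would need plant output and financing terms, and the example does not give either.

```python
# Figures from the green-ammonia example; the headline is approximate (~).
INTENSITY_MWH_PER_T = 10.0   # electrolyzer datasheets, strong tier
PRICE_USD_PER_MWH = 40.0     # stated market anchor, flagged
HEADLINE_USD_PER_T = 800.0   # ~$800/t headline

electricity_cost = INTENSITY_MWH_PER_T * PRICE_USD_PER_MWH  # $/t NH3
electricity_share = electricity_cost / HEADLINE_USD_PER_T
# Everything else: capex recovery plus any other costs (assumed split).
remainder = HEADLINE_USD_PER_T - electricity_cost
# Each $1/MWh change in the unconfirmed price shifts the headline by this much.
headline_per_price_dollar = INTENSITY_MWH_PER_T  # $/t per $/MWh

flags = {"electricity intensity": False,  # False = not flagged
         "electricity price": True,
         "total capex": True}
open_flags = [name for name, flagged in flags.items() if flagged]

print(electricity_cost, electricity_share, remainder)  # 400.0 0.5 400.0
print(headline_per_price_dollar, open_flags)
# 10.0 ['electricity price', 'total capex']
```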

Separately, to show sourcing without validation: marking the ~$540/t clean-hydrogen credit “sourced” to the policy’s headline rate, while leaving its eligibility conditions (the carbon-intensity tier, hourly-matched clean power) unflagged, records the value but not the conditions the plant must meet. The result is a number that is sourced yet not validated for whether it can actually be earned.
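A conditional value like this credit can carry its eligibility conditions alongside its source. That way, the open conditions stay visible as flags instead of being silently assumed. The sketch below is minimal and rests on hypothetical choices: the class name, the state labels, and the decision to store each condition as a simple confirmed/unconfirmed boolean.

```python
from dataclasses import dataclass, field


@dataclass
class ConditionalCredit:
    name: str
    value_usd_per_t: float
    origin: str  # where the rate is sourced from
    # eligibility condition -> confirmed for this plant?
    conditions: dict[str, bool] = field(default_factory=dict)

    def unconfirmed(self) -> list[str]:
        return [c for c, ok in self.conditions.items() if not ok]

    def state(self) -> str:
        # Sourcing is settled by `origin`; validation also needs every condition.
        if self.unconfirmed():
            return "sourced, eligibility unvalidated"
        return "sourced, eligibility validated"


credit = ConditionalCredit(
    "clean-hydrogen credit", 540.0, "policy headline rate",
    {"carbon-intensity tier": False, "hourly-matched clean power": False},
)
print(credit.state())        # sourced, eligibility unvalidated
print(credit.unconfirmed())  # ['carbon-intensity tier', 'hourly-matched clean power']
```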
