04 Jun 2026

What Works in NZIF 2.0 Implementation

Teams whose sovereign net-zero ratings hold up over time tend to do a recognizable set of things well - and most of them sit upstream of any data question.

The good practices are mostly decisions, not datasets

The instinct, when an NZIF rating turns out fragile, is to look for a better data source. That is rarely where the problem started. The teams that build durable ratings are not the ones that found the right database (e.g. ASCOR, CAT, CCPI); they are the ones that decided what the rating was for before they built it, and let every later choice follow from that. The data question matters, but it sits downstream. What separates the work that holds up is a sequence of strategic decisions, made deliberately and in the right order. The good practices below follow that sequence.

Set the priorities before anything else

The teams that avoid most later trouble start by answering two questions: Does fair share matter to this mandate, and how much? And will net-zero alignment be pursued through portfolio tilting, through engagement with sovereign debt issuers, or both? Neither question is about data, and both shape everything that follows.

This comes first because the answers set requirements that cannot all be met at once. A team building toward engagement needs granular output that shows which specific measures would move a country up; an aggregate category is too coarse to hold a conversation around. A team tilting a portfolio needs clear separation between countries, so capital can move toward the better performers. A team focused on reporting needs stability over time above all. These pull in different directions, and a team that has not ranked them ends up making the trade-off silently, usually by accident, at the data stage. Naming the priority early is what keeps the rest of the process coherent.

Let the country universe drive the approach, not the reverse

A common early mistake is to choose a method first and discover its country coverage afterward. The teams that work the other way round - establishing which countries are actually in scope, then asking what it takes to assess them well - avoid building a rating that quietly excludes part of the portfolio or leans on a thin assessment for the countries that are hardest to cover.

This is also where the build-or-adopt question belongs: whether to develop a bespoke methodology or apply an existing framework as published. There is no general right answer. A team with a narrow, stable universe and a reporting purpose may be well served by adopting a standard approach closely; a team with a broad universe and a tilting mandate usually needs more control over how countries are separated. The good practice is not the choice itself - it is making the choice against a known universe and a named purpose, rather than inheriting it from whichever method was nearest to hand.

Treat the first version as a draft

Across the teams we drew on, one observation is near-universal: the first version of a sovereign rating is rarely satisfactory. The teams that end up with something durable expect this and plan for it. They treat the first run as a diagnostic - something to interrogate, not to ship - and budget time for the iteration that follows. The teams that struggle are the ones that took an early, plausible-looking output as finished, because the problems in a sovereign rating are rarely visible on the face of the first version. They surface in validation.

Validate on three axes, not one

Validation is where defensible ratings are separated from fragile ones, and it is also where teams most often stall, because it is slow and demanding and tends to reveal problems rather than confirm success. The teams that do it well check three distinct things before committing to a final version, and treat a weakness in any one as a reason to go back.

1. Does the rating separate countries that are genuinely different?

If most countries land in the same one or two categories, the rating is not yet doing its job. Investors cannot tell a steady improver from a country that merely clears the floor. The good practice is to look for differentiation that reflects real differences in trajectory, and to be suspicious of both extremes: a rating where everyone clusters at the bottom, and one where spread has been manufactured to look discriminating. Where finer resolution is genuinely needed, the stronger teams add it as a layer of sub-distinctions inside the categories - showing how close a country sits to the next level, or how fast it is moving - rather than overstating where a country actually stands.
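One way to make the clustering check concrete is to measure how much of the universe sits in the most crowded categories. The sketch below is illustrative only: the function name `category_concentration`, the labels C1 to C4, and the sample ratings are hypothetical, and what share counts as too concentrated is a judgment the mandate has to supply.

```python
from collections import Counter

def category_concentration(ratings, top_n=2):
    """Share of countries that fall in the top_n most populated categories."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings.values())
    largest = sum(n for _, n in counts.most_common(top_n))
    return largest / len(ratings)

# Hypothetical ratings: country -> category label (labels are placeholders).
ratings = {
    "A": "C1", "B": "C1", "C": "C1",
    "D": "C2", "E": "C2", "F": "C2",
    "G": "C3", "H": "C4",
}

share = category_concentration(ratings)
print(f"Share in the two largest categories: {share:.0%}")
```

A high share flags clustering, but a low one does not prove the spread is real; that still has to be checked against the countries' actual trajectories.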

2. Does it behave sensibly over time?

A sovereign rating is meant to show whether a country is improving or slipping, which only works if this year's assessment can be compared to last year's. Teams that get this right test for the kinds of movement that have nothing to do with a country's real position: ratings that swing because an underlying benchmark was recomputed, countries that flip back and forth at a category boundary on small input changes, or scores that move because the bar was set low early and tightened later. The discipline is to fix the reference points that should stay fixed, so that when a rating moves, it moves for a reason.
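The boundary flip-flop case lends itself to a simple test. The sketch below counts how often a country's category changes direction across years; it assumes categories are encoded as ordered integers (higher is better), and the name `count_reversals` and the sample histories are hypothetical.

```python
def count_reversals(history):
    """Count direction changes (up then down, or down then up) in a rating history."""
    # Keep only the years in which the category actually moved.
    moves = [b - a for a, b in zip(history, history[1:]) if b != a]
    # A reversal is a move whose direction differs from the previous move.
    return sum(1 for m1, m2 in zip(moves, moves[1:]) if (m1 > 0) != (m2 > 0))

# Hypothetical yearly categories, oldest first.
steady_improver = [1, 2, 2, 3]
boundary_case = [2, 3, 2, 3, 3]

print(count_reversals(steady_improver))
print(count_reversals(boundary_case))
```

A country with repeated reversals is a candidate for closer inspection: the movement may be real, or it may be a small input change crossing a threshold.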

3. Does it distribute fairly across country groups?

The teams whose ratings distribute fairly do not assume fairness will emerge on its own; they check it explicitly, comparing how the categories fall across developed and emerging economies. Where a part of the assessment proves systematically harder for one group - a disclosure expectation that ignores longer reporting cycles, or an ambition test that recognizes only a formal net-zero target and misses genuine climate effort expressed another way - they treat that as a defect to correct, not a result to accept. The aim is to compare countries on their climate effort, not on their development level.
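The explicit comparison described above can be as simple as tabulating category shares per country group. This is a minimal sketch under stated assumptions: the function name `category_shares_by_group`, the group labels, and the sample data are all hypothetical, and how large a gap between groups counts as a defect is left to the team.

```python
from collections import Counter, defaultdict

def category_shares_by_group(ratings, groups):
    """Return, for each country group, the share of its countries in each category."""
    counts = defaultdict(Counter)
    for country, category in ratings.items():
        counts[groups[country]][category] += 1
    return {
        group: {cat: n / sum(c.values()) for cat, n in c.items()}
        for group, c in counts.items()
    }

# Hypothetical inputs: category labels and group assignments are placeholders.
ratings = {"A": "C1", "B": "C2", "C": "C1", "D": "C1"}
groups = {"A": "developed", "B": "developed", "C": "emerging", "D": "emerging"}

shares = category_shares_by_group(ratings, groups)
for group, dist in shares.items():
    print(group, dist)
```

If one group is concentrated in the lowest category while the other is spread out, that is the prompt to look criterion by criterion for where the difference enters.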

A few principles that survive every approach

Underneath the differences in how teams build, a small number of choices recur in the work that holds up. They are worth stating as principles rather than recipes, because the right way to apply each depends on the mandate.

A. Measure against a fixed bar, not against the peer group

A rating that ranks countries against each other cannot show improvement over time, because a country can get better and stay in the same place while its peers improve alongside it. Teams that want to track progress measure each country against a stable external reference instead. This is a strategic choice about what the rating is for - monitoring change versus ranking a field - long before it is a question of which input to use.
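A small worked example shows why peer ranking hides progress. In the sketch below every country improves by the same amount between two years: the peer rank does not move, while the gap to a fixed bar narrows. The scores, the bar of 90, and the function names are hypothetical.

```python
def peer_rank(scores, country):
    """Rank of a country among its peers (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(country) + 1

def gap_to_bar(scores, country, bar):
    """Distance between a country's score and a fixed external reference."""
    return bar - scores[country]

# Hypothetical scores: every country improves by 10 points.
year_1 = {"A": 40.0, "B": 55.0, "C": 70.0}
year_2 = {"A": 50.0, "B": 65.0, "C": 80.0}
BAR = 90.0  # fixed reference, held constant across years

print(peer_rank(year_1, "A"), peer_rank(year_2, "A"))
print(gap_to_bar(year_1, "A", BAR), gap_to_bar(year_2, "A", BAR))
```

Country A is last in both years on the relative measure, yet it has closed a fifth of its distance to the bar.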

B. Combine inputs only where they do not overlap

Reaching for several sources to widen country coverage is the most common way methodological breaks enter a rating: a country can end up assessed against two different climate scenarios, or two different notions of fair share, inside a single number. The teams that stay coherent anchor on one consistent basis and extend it only onto themes that basis does not already cover, and only with comparable inputs, not competing verdicts. The principle is consistency of method, not breadth of sourcing.

C. Carry fair share through every criterion

Building fair share into one part of the assessment and not the others is enough to skew the result. The teams whose ratings distribute well make sure the logic runs through each criterion - ambition, policy, targets, disclosure - rather than hoping it survives at the aggregate. This is less a data decision than a design decision about where responsibility and capability enter the assessment.

D. Choose a pathway the portfolio can act on

The strictest scientific pathway, held literally, tends to push nearly every country into the lower categories - defensible as science, close to useless for allocation or engagement. A pathway that stays consistent with the Paris goals while remaining realistically achievable gives a rating that distinguishes countries and can actually be acted on. The strategic question is how to stay scientifically credible without producing a rating no portfolio can use, and a stable, achievable reference usually answers it better than the most demanding one. Whichever pathway a team picks, fixing it so it does not shift from year to year is what lets the rating stay comparable over time.

E. Differentiate through the framework, not around it

The teams that separate countries well use the framework's own cascading structure, keeping the entry criteria broadly achievable so countries can climb into the middle categories, while keeping the higher ones genuinely demanding. The result is a spread along a real transition path, rather than a cluster at the floor or a spread invented to look discriminating.
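To illustrate the idea, the sketch below reads "cascading" as: a country reaches a level only if it also meets every criterion below it. That reading, the function name `cascade_level`, and the sample criteria are assumptions for illustration, not the framework's published logic.

```python
def cascade_level(criteria_met):
    """Number of consecutive criteria met, counting up from the entry criterion."""
    level = 0
    for met in criteria_met:
        if not met:
            break  # a missed criterion caps the level, whatever is met above it
        level += 1
    return level

# Hypothetical results, ordered from the entry criterion to the most demanding.
entry_only = [True, False, False, False]
mid_table = [True, True, False, True]   # meets a higher test but misses one below it
top = [True, True, True, True]

print(cascade_level(entry_only), cascade_level(mid_table), cascade_level(top))
```

Under this reading, an achievable entry criterion lets countries onto the ladder, and the ordering of the remaining tests is what produces the spread.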

Steer the portfolio; do not just measure it

A rating, however good, changes nothing on its own. One caution runs through the evidence: without active management of portfolio weights, year-on-year improvement in net-zero alignment is hard to achieve. Teams that only monitor - producing the rating, reporting it, and leaving allocation untouched - tend not to move their alignment at all. The teams that do move it close the loop, letting the rating feed tilting and engagement, and treating the rating and the portfolio decision as one process rather than two.
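Closing the loop can start with something as plain as scaling weights by rating and renormalizing. This is a sketch only: the function name `tilt_weights`, the category labels, and the multipliers are hypothetical, and a real mandate would add constraints (tracking error, liquidity, issuer limits) that are not modeled here.

```python
def tilt_weights(weights, ratings, multipliers):
    """Scale each weight by its rating's multiplier, then renormalize to sum to 1."""
    tilted = {c: w * multipliers[ratings[c]] for c, w in weights.items()}
    total = sum(tilted.values())
    if total == 0:
        raise ValueError("all weights were tilted to zero")
    return {c: w / total for c, w in tilted.items()}

# Hypothetical portfolio: starting weights, rating categories, tilt multipliers.
weights = {"A": 0.5, "B": 0.3, "C": 0.2}
ratings = {"A": "C1", "B": "C2", "C": "C3"}
multipliers = {"C1": 0.5, "C2": 1.0, "C3": 1.5}

new_weights = tilt_weights(weights, ratings, multipliers)
for country, w in new_weights.items():
    print(country, round(w, 3))
```

The point of the example is the direction of travel: the lowest-rated holding loses weight and the highest-rated gains it, which is the step a monitor-only process never takes.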

The balance is the whole point

The familiar failure is a rating that is either scientifically immaculate but uninvestable, or financially sound but not climate-credible. The teams that succeed hold both at once: separating strong performers from weak, staying realistic enough to act on, and fair enough across country groups to withstand scrutiny. That balance is not the product of one good decision. It comes from making the connected decisions - priorities, scope, iteration, validation, and the principles that survive every approach - in the right order, with the trade-offs in view.

That is the harder path, and it is the one the durable work takes. The earlier companion note set out what goes wrong when these decisions are made by default. This one sets out what they look like when a team makes them on purpose.
