Property-Sector Tax Compliance: A Proven Instrument

A hypothetical Crown investment case, built entirely from public evidence — and the fiscal return that evidence supports.

First Edition — 11 May 2026 · Greg Williams · steko.co.nz/thinking

A note on provenance and intent

This paper makes a single argument: that a properly specified property-sector tax-compliance capability (a reconstituted Property Compliance Programme) would, on a deliberately conservative basis, return materially more than it costs, and would open a route to fiscal objectives on a basis the public evidence supports. The case is quantified rather than asserted.

It is offered as an independent analytical contribution. The author was part of the original Property Compliance Programme team at Inland Revenue between 2007 and 2010; that work is the origin of a long-running interest in compliance economics, and it is disclosed here without reservation. The analysis draws only on publicly available material: historical programme data from the 2007–2014 period, a Monte Carlo model calibrated against published Treasury benchmark work, and a method that separates direct revenue yield from wider behavioural-compliance effects. There is no current engagement with Inland Revenue, no political affiliation, and no commercial interest in any particular path to implementation.

The analysis was provided first to the Minister of Revenue's office, on 11 May 2026, and held under a three-week embargo before this public release. That sequence is recorded here purely as a matter of provenance and courtesy. Nothing in this paper should be read as reflecting any view, decision, action, or response on the part of the Minister, the Minister's office, or any official; the work, and any opinions within it, are the author's alone.

A final word on scope. This is an argument about what a well-designed compliance capability could achieve, and what it would take to build one. It is not a critique of Inland Revenue's current operations, nor of any policy decision taken or not taken. It is offered for its analytical merit, to whoever finds it useful.

---

What this paper found

—The scale is large and sustained. Inland Revenue's own OIA disclosure (OIA 25OIA1945, 19 March 2025) records $1.493 billion in additional tax assessed from property compliance across sixteen years FY2009 to FY2025, with the current annualised run-rate at approximately $204 million: IRD's published output record, not a modelled projection.

—The methodology has a track record. The original programme delivered $83.1 million in assessed revenue against a $66.2 million target, and a single risk treatment targeting Loss-Attributing Qualifying Companies generated $637.6 million in ongoing annual compliance benefit (Home and Kelly 2010). Published return on investment ranged from 7.5 to 1 (Hon P Dunne, NZICA, November 2011) to 7.8 to 1 (Budget 2015), reported under successive Cullen, English, and McClay administrations.

—The forward model clears the Government's benchmark at every iteration. A Monte Carlo model run across 10,000 iterations produces a conservative benefit-cost ratio of 18.2 to 1 at the P25 position ($862 million net Crown revenue over four years) and 19.5 to 1 at the central estimate ($924 million). The probability of clearing the Government's stated $8 to $1 benchmark (Hon S Watts, Tackling New Zealand's Rising Tax Debt, 22 May 2025) is 100 per cent (model v0.8).

—Artificial intelligence is producing signals, not cases. Inland Revenue's December 2025 Ministerial report records that compliance-focused AI is at the stage of "identifying use cases" within Customer Compliance Services, with a business value map in development (IR2025/442 §7). AI surfaces lead material; methodology converts it into structured compliance outcomes. The Department holds the first without yet deploying the second.

—The capability is a statutory monopoly, and the analysis advances a revenue hypothesis. Only Inland Revenue holds authority under section 16 of the Tax Administration Act 1994 to compel LINZ residential transaction data and attribute it to individual taxpayer records within the section 81 secrecy framework. No private sector firm can replicate this. The analysis separately advances a hypothesis (explicitly hypothesis-grade, not a claim) that approximately $350 million in cumulative revenue may have gone uncollected between FY2011 and FY2025 during periods when the methodology-driven compliance capability was not fully exercised (BC §2.4, §3.5.3).

New Zealand's largest under-enforced asset class

New Zealand's residential property market sits at roughly NZ$1.15 trillion in total asset value (Statista Real Estate — New Zealand outlook, 2025), approximately six times the entire NZX-listed equities market, which stood at NZ$184.6 billion across all 179 securities in February 2025 (NZX Limited reporting). The mortgage stock alone (NZ$381 billion at Q3 2025; Reserve Bank of New Zealand, housing statistics M10) is more than double the entire NZX listed-equities market. These are not numbers that invite modest ambition about enforcement.
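The scale comparison above is simple division, and a reader may want to reproduce it. The sketch below uses only the three figures quoted in the paragraph; the constant and function names are illustrative, not drawn from any source.

```python
# Figures quoted in the paragraph above, in NZ$ billions.
RESIDENTIAL_PROPERTY_BN = 1150.0  # ~NZ$1.15 trillion total asset value
MORTGAGE_STOCK_BN = 381.0         # RBNZ housing statistics, Q3 2025
NZX_MARKET_CAP_BN = 184.6         # NZX listed equities, February 2025


def multiple_of_nzx(value_bn: float) -> float:
    """Return how many times larger a value is than NZX market capitalisation."""
    return value_bn / NZX_MARKET_CAP_BN


property_multiple = multiple_of_nzx(RESIDENTIAL_PROPERTY_BN)
mortgage_multiple = multiple_of_nzx(MORTGAGE_STOCK_BN)

print(f"Residential property vs NZX: {property_multiple:.1f}x")
print(f"Mortgage stock vs NZX:       {mortgage_multiple:.2f}x")
```

On these inputs the property multiple comes out a little above six and the mortgage multiple a little above two, which is the basis for "approximately six times" and "more than double".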

New Zealand tax-base scale comparator: residential property (~NZ$1.15T), residential mortgage stock (NZ$381B), NZX total market capitalisation (NZ$184.6B), and the hidden-economy annual tax-gap estimate.

Comparisons with other compliance domains sharpen the picture. The hidden economy has been estimated historically in the low-single-digit billions annually for New Zealand (Giles, "Modelling the Hidden Economy and the Tax-Gap in New Zealand," Empirical Economics, 1999, tax-gap band 6.4–10.2% over 1968–1994); Inland Revenue's own 2014–15 hidden-economy discrepancy identification was $146 million. The property compliance universe is an order of magnitude larger. Allocating finite compliance-enforcement attention across tax-base domains is, in that light, a first-order fiscal question.

The compliance output data bear this out. Inland Revenue's own OIA disclosure (OIA 25OIA1945, 19 March 2025) records NZ$1.493 billion in additional tax assessed from property compliance across the sixteen years FY2009 to FY2025, with a current annualised run-rate of approximately $204 million. That same disclosure contains a finding that rarely gets the attention it deserves.

"A review of forms completed in the 2016 and 2017 tax years found that the majority of sales recorded were taxable under other property taxing provisions, and only 19% of the sales were taxable under the bright-line rule." — OIA 25OIA1945, Part 3

On that review, 81 per cent of the taxable sales recorded did not rest on the bright-line test. They sat under the pre-existing trading and speculating rules: the same rules the original Property Compliance Programme enforced with specialist methodology. Policy interventions have added new taxing mechanisms, but they have not displaced the underlying compliance universe.

Property tax additional assessments FY2009–FY2025, three phases: FY2009–14 establishment floor (~$47.7M/yr average), FY2015–17 first scaling (~$85.9M/yr), FY2018–25 sustained plateau (~$118.6M/yr, reaching $156.8M in FY2024).

The scale of the opportunity grows when you consider what is taxable, not just what is currently captured. The compliance universe for a third-generation programme includes 178,950 taxpayers who claimed $2.019 billion in residential property interest expense in the 2022–23 tax year (OIA 25OIA1249, 12 September 2024), a population directly comparable in scale to the 144,000 Loss-Attributing Qualifying Companies the original programme addressed. Annual residential transaction value stands at approximately $64 billion, derived from REINZ 2025 full-year transaction volume (80,655 sales) at the national median price of $795,000 (REINZ, February 2026). In 2017, Inland Revenue's own public reference figure for residential transaction value was $40 billion (IRD, "Property compliance at Inland Revenue," 19 June 2017). The compliance universe has grown by approximately 60 per cent in dollar terms since the last dedicated property programme funding tranche.
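The transaction-value estimate is a product of two published REINZ figures, and the growth figure is a ratio against the 2017 reference. A minimal sketch of that arithmetic follows. Multiplying volume by the median price (rather than the mean) is an approximation, and the variable names are illustrative.

```python
SALES_2025 = 80_655          # REINZ 2025 full-year transaction volume
MEDIAN_PRICE_NZD = 795_000   # REINZ national median price
REFERENCE_2017_BN = 40.0     # IRD's 2017 public reference figure, NZ$ billions

# Approximate annual transaction value: volume multiplied by median price.
transaction_value_bn = SALES_2025 * MEDIAN_PRICE_NZD / 1e9

# Growth in dollar terms relative to the 2017 reference figure.
growth_since_2017 = transaction_value_bn / REFERENCE_2017_BN - 1

print(f"Estimated annual transaction value: ${transaction_value_bn:.1f} billion")
print(f"Growth since 2017 reference:        {growth_since_2017:.0%}")
```

The product of the two quoted inputs is about $64.1 billion, which is what places growth since 2017 at roughly 60 per cent.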

Political context adds urgency to the scale question. The Crown carries a tax debt of $8.5 billion at end-2024, an increase of more than 50 per cent since 2022 (Hon S Watts, Tackling New Zealand's Rising Tax Debt, 22 May 2025). The Government has committed Budget 2025 funding of $35 million per year for additional tax compliance activity, with a published expected return of $4 to $1 in year one rising to $8 to $1 in year two and beyond (Watts, 22 May 2025). That $8 to $1 benchmark is the Government's own stated performance floor. Against a residential property market more than six times the NZX, generating hundreds of millions of dollars per year in additional assessed tax when actively pursued, the question of whether a dedicated programme is warranted is less a matter of novel proposition than of arithmetic.

A proven instrument — and a three-government record

At the heart of this analysis is not the claim that a property compliance programme could work. It is that one did work: the public record says so in detail.

The original Property Compliance Programme ran from a Cullen Budget 2007 allocation through a Budget 2010 re-funding under English and Dunne, to a Budget 2015 property-specific tranche under English and McClay: ten years of cross-governmental continuity across the Cullen, English, and McClay administrations. The outcomes are documented in peer-reviewed academic work. Home and Kelly (2010), published in an OECD-sponsored volume (ISBN 978-1-921701-29-0) by the Investigations Manager (Assurance) and the National Manager Research at Inland Revenue, reports that the programme delivered $83.1 million in assessed revenue against a $66.2 million target, exceeding expectation by 25 per cent. One risk treatment alone, targeting Loss-Attributing Qualifying Companies (LAQCs), generated $637.6 million in ongoing annual compliance benefit (Home and Kelly 2010).

Return on investment figures are part of the Ministerial record. The Hon Peter Dunne reported a 7.5 to 1 ROI at his November 2011 address to the New Zealand Institute of Chartered Accountants (Dunne 2011). Budget 2015 updated that to 7.8 to 1 (Beehive release, Budget 2015). These are the programme's track record: a consistent performance band across two published ratios. They sit below the current Government's Watts $8:$1 benchmark (Hon S Watts, Tackling New Zealand's Rising Tax Debt, 22 May 2025), which is a forward-looking policy target. The historical ROI gives the forward modelling its credibility; the Monte Carlo output is the investment case. The two should not be conflated.

The methodology behind those numbers

The compliance framework the programme operated is Inland Revenue's own adaptation of the OECD 2004 Managing and Improving Compliance diagnostic model, documented in peer-reviewed work and OECD reference notes (Home and Kelly 2010; BC §5.2).

Compliance pyramid per Home and Kelly (2010) Figure 4. MK III, the proposed third-generation programme, has its focal operation at tiers 3–5.

The pyramid spans five tiers. At its base: make it easy to comply; assist to comply (the voluntary-compliance tiers serving most taxpayers). The programme's distinctive methodology, and its documented performance, concentrated at tiers three through five: deter by detection, enforce by action, full force of the law (Home and Kelly 2010, Figure 4; BC §5.2).

Engaging the top of the compliance pyramid and the bottom are two categorically different propositions — different methodology, different workforce profiles, different evidentiary thresholds, different success criteria.

That categorical distinction matters. Tax-technical adversarial investigation at tiers 3–5 and customer-centric engagement at the base tiers are not two degrees of the same thing. They call on different skills, produce different outputs, and serve different compliance objectives. Both are valuable. Neither substitutes for the other (BC §5.2; deck notes, Slide 11).

Control-group discipline — why the ROI figures were internally contestable

Treatment design within the compliance cycle incorporated control-group structures: a treatment was applied to a target population cohort, while a matched control group was observed without the treatment, making causal attribution testable against the counterfactual (BC §5.3). The OECD 2004 Compliance Sub-Group notes on programme evaluation reference this approach; the original Property Compliance Programme (PCP) operationalised it at practice grade. Each treatment's return could be challenged on its own terms. MK III inherits this design feature: treatment efficacy is reported at treatment-cohort-plus-control grade rather than activity-volume grade (BC §5.3).
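The control-group logic can be illustrated with a small sketch. It assumes the simplest possible estimator (a difference in cohort means) and uses entirely hypothetical figures; neither the estimator nor the data come from the programme record, and the function names are invented for illustration.

```python
from statistics import mean


def attributable_uplift(treated, control):
    """Difference in mean outcome between the treated and matched control cohorts."""
    return mean(treated) - mean(control)


def treatment_roi(treated, control, treatment_cost):
    """Revenue attributable to the treatment, divided by what the treatment cost."""
    attributable_revenue = attributable_uplift(treated, control) * len(treated)
    return attributable_revenue / treatment_cost


# HYPOTHETICAL data: additional tax assessed per taxpayer, in NZ$.
treated_cohort = [1200.0, 950.0, 1800.0, 400.0, 1500.0]
control_cohort = [300.0, 250.0, 500.0, 100.0, 350.0]

uplift = attributable_uplift(treated_cohort, control_cohort)
roi = treatment_roi(treated_cohort, control_cohort, treatment_cost=500.0)

print(f"Attributable uplift per taxpayer: ${uplift:,.0f}")
print(f"Treatment ROI: {roi:.1f} to 1")
```

The point of the design is visible in the code: the control cohort's outcome is subtracted before any return is claimed, so revenue that would have arrived anyway is not credited to the treatment.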

Intelligence at the centre

Cited in Home and Kelly's footnote references, the Compliance Performance Management Framework Guide states: "Intelligence is at the heart of the Cycle and will be used to inform our compliance management" (BC §5.4). Running continuously with intelligence as the hub, the cycle (Intelligence, Analysis, Treatment, Evaluation, Feedback) converts AI signal detection on Inland Revenue's Data Intelligence Platform into measurable compliance outcomes; evaluation feeds results back to refine future detection (BC §5.4; Home and Kelly 2010, Figure 10).

That methodology is unchanged. The technology it runs on has improved substantially. What MK III restores is the methodology layer and the dedicated programme focus that converts intelligence into structured outcomes at upper-pyramid scale.

AI produces leads. Investigators produce cases.

Inland Revenue's investment in artificial intelligence is real, substantial, and producing results. The March 2026 information release documents a capable contemporary stack: the Data Intelligence Platform running Snowflake with Cortex AI, Microsoft 365 Copilot, and the START core tax system for case management (IR Use of AI – Information Release, March 2026, §4). In the debt-collection domain, this infrastructure has recovered $166 million in approximately six months (IR2025/442 §4). That is a genuine and instructive precedent.

AI signal detection and methodology-driven treatment: MK III operates in the intersection.

The question this analysis addresses is narrower: what does the AI-led capability not yet do in the compliance domain?

The answer is in the Crown's own published record. Inland Revenue's December 2025 report to the Minister is explicit (IR2025/442 §7):

"A workshop was held with leaders and analysts within Customer Compliance Services (CCS) to identify key business processes where AI could enhance efficiency and boost productivity. Key themes from the workshop have been identified, and the process of determining and prioritizing potential AI use cases is in progress."

That language is developmental. Across all four quarterly reports in the March 2026 information release, property compliance is not referenced as an AI focus area, LINZ land-transaction data integration does not appear, and the OECD Compliance Risk Management model that underpinned the original Property Compliance Programme is absent (BC §2.3). The gap is not adversarial inference; it is visible in the department's own authoritative documentation.

The aggregation problem. The IRD stack is capable of generating compliance-lead material at scale. What it cannot do on its own is aggregate that material into a deployable investigation case. That aggregation is a judgement about the weight of evidence, witness availability, and the credibility of the chain from transactional data to statutory breach, and it sits with experienced tax-technical investigators (BC §2.3). AI produces the lead material. The investigator produces the case. Without both, the stack outputs signals that do not convert into recovery or settlement. The debt-collection programme illustrates the pattern: AI surfaces the signal; dedicated staff execute the structured intervention; outcomes are measurable because the methodology step follows the detection step (BC §2.3).

The agentic layer requires investigator validation. Looking forward, a well-formed agentic compliance architecture depends on validation from investigators who have actually carried cases through dispute resolution or court. Court failure on an inadequately researched or formed case is not merely operational waste: it sets adverse precedent for future compliance work and creates reputational harm to the department's capacity to act at scale (BC §2.3). That validation is not an AI-engineering discipline. It is an investigator-judgement discipline. Without it, an agentic compliance layer risks shipping perceived-value output that does not stand up to the tests that matter.

From days to seconds, and what that changes. The original programme ran on SAS and ArcGIS, requiring multiple days of integration work for each analytical step. The contemporary contrast is striking: Cortex AI natural-language queries now deliver results with "ad-hoc analysis completed in seconds instead of hours or days" (IR Use of AI – Information Release, March 2026; BC §6.2). MK III methodology runs on infrastructure that is orders of magnitude more capable than what the original programme operated.

MK III's analytical proposition is that AI signal-detection and methodology-driven compliance are complements, not substitutes. The programme operates in the intersection: existing AI infrastructure surfacing leads; dedicated tax-technical investigators converting them into structured, measurable outcomes. The detection half is already funded and operational. The methodology half is what the programme proposes to add.

18.2 to 1 — and why the conservative estimate is the one worth publishing

The headline number from the Monte Carlo model is 18.2 to 1. That is the gross benefit-cost ratio at the 25th percentile, the result when roughly three-quarters of all 10,000 model runs do better. The central estimate is 19.5 to 1. The probability that any single iteration clears the Government's stated $8 to $1 benchmark (Hon S Watts, Tackling New Zealand's Rising Tax Debt, 22 May 2025) is 100 per cent. Every run clears it.

Those numbers invite scepticism, and that scepticism is healthy. It is worth being explicit about where they come from, and about every place the model was deliberately held back from going higher.

Monte Carlo v0.8 gross BCR distribution across 10,000 iterations, with the Watts $8:$1 benchmark marked; P(BCR ≥ 8:1) = 100 per cent.

A narrow distribution

The full percentile table: P5 gives a gross BCR of 16.7 and four-year net Crown revenue of $784 million; P25 gives 18.2 and $862 million; P50 gives 19.5 and $924 million; P75 gives 20.8 and $990 million; P95 gives 22.7 and $1,084 million. The interquartile range spans 18.2 to 20.8 (roughly 13 per cent of the central value), tight by Monte Carlo convention. It reflects six core parameters anchored to public-source evidence rather than free-form assumption. Independent model iterations from v0.6 through v0.8 have converged in the 16.5–22 band with headline figures stable throughout.
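The percentile table can be cross-checked against the fixed $50 million programme cost. The sketch below assumes the conventional definitions (gross BCR is gross benefit divided by cost, and net revenue is gross benefit less cost); the paper does not state those definitions explicitly, so treat them as an assumption. Because the published ratios are rounded to one decimal place, implied and published net figures can differ by up to about $2.5 million.

```python
PROGRAMME_COST_M = 50.0  # fixed four-year programme cost, NZ$ millions

# Published percentile table: label -> (gross BCR, net Crown revenue in NZ$ millions).
PERCENTILES = {
    "P5": (16.7, 784.0),
    "P25": (18.2, 862.0),
    "P50": (19.5, 924.0),
    "P75": (20.8, 990.0),
    "P95": (22.7, 1084.0),
}


def implied_net_revenue(bcr: float, cost: float = PROGRAMME_COST_M) -> float:
    """ASSUMED definition: net revenue = gross benefit (BCR x cost) minus cost."""
    return bcr * cost - cost


differences = {}
for label, (bcr, published_net) in PERCENTILES.items():
    implied = implied_net_revenue(bcr)
    differences[label] = implied - published_net
    print(f"{label}: implied ${implied:,.0f}M vs published ${published_net:,.0f}M")

# Interquartile range expressed as a share of the median BCR.
iqr_share = (PERCENTILES["P75"][0] - PERCENTILES["P25"][0]) / PERCENTILES["P50"][0]
print(f"IQR as share of P50: {iqr_share:.1%}")
```

Under that assumed definition, every row of the table agrees with the $50 million cost to within rounding.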

The six parameters

The methodology uplift factor is a triangular distribution from 1.5 to 2.0 times at a mode of 1.75 times, a conservative translation of the published PCP record: 7.5 to 1 (Hon Peter Dunne, NZICA, November 2011) and 7.8 to 1 (Budget 2015 release). The 2.0 times upper bound is anchored at the OIA 25OIA1945 empirical 1.8-times step-change when Budget 2015 re-funded the programme. Baseline revenue runs triangular from $180 million to $220 million per year at a mode of $200 million, bracketing the FY2025 annualised run-rate of approximately $204 million. The assessment-to-collection ratio is triangular from 0.70 to 0.90 at mode 0.80. Year-one ramp efficiency is uniform from 0.55 to 0.80; years two to four steady-state from 0.88 to 1.00. Programme cost is fixed at $50 million over four years, profiled $10 million / $13 million / $13 million / $14 million.
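For readers who want to see how parameters of this kind combine, here is a minimal Monte Carlo sketch using the distributions listed above. The paper does not publish the model's functional form, so the structure is an assumption: gross benefit is taken as baseline × uplift × collection ratio × yearly efficiency, summed over four years, with an independent steady-state draw for each of years two to four. The function names and the seed are illustrative. This is not the v0.8 model itself.

```python
import random
import statistics

ITERATIONS = 10_000
COST_PROFILE_M = [10.0, 13.0, 13.0, 14.0]  # fixed programme cost, NZ$ millions


def simulate_bcr(rng: random.Random) -> float:
    """Draw one iteration and return its gross benefit-cost ratio."""
    # random.triangular takes (low, high, mode).
    uplift = rng.triangular(1.5, 2.0, 1.75)         # methodology uplift factor
    baseline = rng.triangular(180.0, 220.0, 200.0)  # baseline revenue, NZ$m per year
    collection = rng.triangular(0.70, 0.90, 0.80)   # assessment-to-collection ratio
    # One ramp year followed by three steady-state years.
    efficiency = [rng.uniform(0.55, 0.80)]
    efficiency += [rng.uniform(0.88, 1.00) for _ in range(3)]
    # ASSUMPTION: benefit is the product of the parameters, summed across years.
    gross_benefit = sum(baseline * uplift * collection * e for e in efficiency)
    return gross_benefit / sum(COST_PROFILE_M)


def run(seed=0):
    """Run the full simulation with a fixed seed so results are repeatable."""
    rng = random.Random(seed)
    return sorted(simulate_bcr(rng) for _ in range(ITERATIONS))


results = run()
cuts = statistics.quantiles(results, n=20)  # 19 cut points at 5 per cent steps
p5, p25, p50, p75, p95 = cuts[0], cuts[4], cuts[9], cuts[14], cuts[18]
share_clearing = sum(r >= 8.0 for r in results) / len(results)

print(f"P5 {p5:.1f}  P25 {p25:.1f}  P50 {p50:.1f}  P75 {p75:.1f}  P95 {p95:.1f}")
print(f"Share of iterations at or above 8:1: {share_clearing:.0%}")
```

Two properties of this assumed structure can be checked by hand. At the modal and midpoint values (200 × 1.75 × 0.80 × 3.495 ÷ 50) the ratio is about 19.6, close to the published P50. At the least favourable corner of every range (180 × 1.5 × 0.70 × 3.19 ÷ 50) it is about 12.1, so no draw can fall below the 8 to 1 benchmark.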

Benchmark validation

Model P50 of 19.5 to 1 is validated against four Crown-published reference ratios: the Watts $8 to $1 year-two-plus benchmark (the model P50 is 2.4 times this figure); the IRD Tax Working Group estimate of $5 to $1 from 2009 (3.9 times); Dunne's 7.5 to 1 (2.6 times); and Budget 2015's 7.8 to 1 (2.5 times). The probability of clearing the Watts benchmark is 100 per cent.
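The multiples quoted against each reference ratio are the model P50 divided by that ratio. A short sketch of the calculation, with illustrative dictionary labels:

```python
MODEL_P50 = 19.5  # central gross BCR from the Monte Carlo model

# Crown-published reference ratios named in the paragraph above.
BENCHMARKS = {
    "Watts year-two-plus benchmark (2025)": 8.0,
    "IRD Tax Working Group estimate (2009)": 5.0,
    "Dunne, NZICA address (2011)": 7.5,
    "Budget 2015": 7.8,
}

# Each multiple is the model P50 expressed as a multiple of the reference ratio.
multiples = {name: MODEL_P50 / ratio for name, ratio in BENCHMARKS.items()}

for name, multiple in multiples.items():
    print(f"{name}: {multiple:.1f}x")
```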

One distinction matters. The Dunne and Budget 2015 figures are the original programme's historical track record, sitting just below the Watts $8:$1 line, not above it. It is the MK III model output (applied to a larger compliance universe on contemporary infrastructure) that clears the benchmark, at every iteration.

Why P25 is the right figure to publish

A methodologically aggressive reading of the public evidence base could support materially higher BCR centroids. The v0.8 model does not use the aggressive readings.

The analysis is published at P25 because the model was built to understate, not overstate. The Home and Kelly (2010) single-treatment LAQC outcome ran at approximately 43 to 1 on that treatment alone. Using it as an upper anchor would produce dramatically higher centroids, and the model declines to do that. The $50 million programme cost sits at the upper end of the appropriation-precedent envelope. The baseline lower bound of $180 million is positioned conservatively for continuing-at-FY2024 scenarios even as the current run-rate exceeds $200 million.

Even at P5 (the bottom five per cent of all iterations), the model returns a gross BCR of 16.7 and net Crown revenue of $784 million over four years. The P25 figure is the one worth publishing precisely because it is the one that has been worked hardest against.

A hypothesis about 15 years of foregone revenue

The three-phase compliance output record described earlier has a noteworthy feature: its phase boundaries align precisely with documented changes in programme structure. That alignment anchors this section.

The hypothesis runs as a chain of conditionals: if the compliance surveillance capabilities built during the original Property Compliance Programme were not faithfully maintained after the programme was absorbed into business-as-usual; and if the public-domain revenue trajectory is consistent with that being a root cause of reduced collection; then a hypothesised cumulative revenue gap can be extrapolated from the ensuing years. This is a testable proposition, not a claim: the Commissioner holds the authoritative information to confirm, qualify, or disconfirm it.

Layer 1: BAU absorption (FY2011–FY2014)

Between the original programme's closure (Home and Kelly 2010; OAG 2011) and Budget 2015's re-establishment of dedicated funding, the OIA 25OIA1945 series averaged $47.7 million per year: the floor phase. Against the subsequent first-scaling rate ($85.9 million per year), the four-year arithmetic gap is $146.9 million; against the sustained plateau rate ($118.6 million per year), $277.7 million. These bounds define a public plausibility window of $147 million to $1,079 million, computed from Inland Revenue's own OIA series. The hypothesis centres Layer 1 at $200 million cumulative, a practitioner-informed centroid inside that window.
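A reader reproducing the two Layer 1 gap figures may find the following sketch useful. It rests on an inference the paper does not state: that each gap is the comparison rate multiplied by four years, less the actual FY2011–FY2014 assessed total. Under that reading, both stated gaps can be inverted to recover the implied actual total, and the two results should agree. The names are illustrative.

```python
YEARS = 4                     # FY2011 to FY2014
FIRST_SCALING_RATE_M = 85.9   # NZ$ millions per year
PLATEAU_RATE_M = 118.6        # NZ$ millions per year
GAP_VS_FIRST_SCALING_M = 146.9
GAP_VS_PLATEAU_M = 277.7


def implied_actual_total(rate_m: float, stated_gap_m: float, years: int = YEARS) -> float:
    """ASSUMED relation: gap = rate x years - actual total, solved for the actual total."""
    return rate_m * years - stated_gap_m


total_from_first_scaling = implied_actual_total(FIRST_SCALING_RATE_M, GAP_VS_FIRST_SCALING_M)
total_from_plateau = implied_actual_total(PLATEAU_RATE_M, GAP_VS_PLATEAU_M)

print(f"Implied FY2011-14 total (first-scaling gap): ${total_from_first_scaling:.1f}M")
print(f"Implied FY2011-14 total (plateau gap):       ${total_from_plateau:.1f}M")
```

Both inversions return the same total, which suggests the gaps were computed from year-by-year actuals for FY2011–FY2014 rather than from the rounded $47.7 million average, a figure that the phase chart attaches to the longer FY2009–14 period.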

Layer 2: Business Transformation-era capability substitution (FY2018–FY2025)

The evidentiary foundation here is Inland Revenue's own documentation. The IRD "Delivering our Transformation" page records that in 2018, more than 3,900 staff moved into new roles across three groups: Customer and Compliance Services (Individuals), Customer and Compliance Services (Business), and Information and Intelligence Services, with the strategic intent of moving toward "data-driven, intelligence-led compliance." The IRD Annual Report 2021–22 characterises the resulting workforce as operating in "capability-based roles, based around transferable skills such as customer service, digital literacy and data and analytics."

This analysis makes no characterisation of that shift as mistaken. What it observes, following the published Home and Kelly (2010) compliance pyramid, is a structural consequence: engaging the upper tiers (deter by detection, enforce by action, full force of the law) and engaging the lower tiers are categorically different propositions. They require distinct methodology, workforce profiles suited to adversarial investigation rather than customer engagement, and evidentiary thresholds that differ accordingly. A workforce reconfigured toward customer-centric engagement does not substitute for tax-technical adversarial investigation capability.

Layer 2 counterfactuals are bounded by the FY2015–2017 first-scaling data. A conservative maturation-factor trajectory implies $83 million cumulative; the strong-form 1.8× step-change-replication trajectory implies $291 million, sitting at approximately the P90 tail, a narrative reference rather than the centroid. The hypothesis centres Layer 2 at $150 million cumulative.

Hypothesised capability-decay across Layer 1 (BAU absorption FY2011–2014) and Layer 2 (BT-era hypothesised capability substitution FY2018–2025). Combined hypothesised gap $350M cumulative.

Combined: $350 million, FY2011–FY2025

The two layers are additive and non-overlapping; the FY2015–FY2017 first-scaling ramp is excluded from both. Combined, the hypothesised cumulative gap is $350 million (mode: $200M + $150M), within a publicly-bounded range of $321 million to $1,479 million.

The $350 million figure is the central estimate of a hypothesis-grade quantification — bounded entirely by Crown-published data, subject to the Commissioner's authoritative advice.

Four alternative interpretations are engaged: market conditions depressing the FY2009–2014 floor; policy interventions reducing the addressable gap; measurement-regime change making the OIA series non-comparable; and AI absorption replacing PCP-era methodology. None defeats the hypothesis. The fourth is hypothesis-consistent rather than contradicting: the IR Use of AI Information Release (IR2025/442 §7) places compliance-focused AI methodology at the "identifying use cases" stage as of December 2025.

The hypothesis holds firm limits: it does not claim capability has in fact decayed, assign responsibility to any individual decision, characterise any personnel as less capable, or treat the $350 million as an identified fact. The figure is a publicly-bounded sensitivity range with a practitioner-attributed centroid, subject throughout to the Commissioner's authoritative test.

Crucially, the forward-looking MK III investment case does not depend on the hypothesis being correct. That case is $50 million over four years, a BCR of 18.2:1 at P25, and a 100 per cent probability of clearing the Watts $8:$1 benchmark (Hon S Watts, Tackling New Zealand's Rising Tax Debt, 22 May 2025). The Monte Carlo model runs on forward programme parameters alone. The hypothesis adds analytical context: the revenue trajectory is consistent with a pattern in which dedicated structure produces step-change uplift, BAU absorption produces a floor, and workforce reconfiguration toward customer-engagement capability produces a plateau. The Minister does not need to accept any of that history to fund MK III, but the Commissioner's authoritative advice on the underlying capability state would equip the Minister to make a fully informed decision.

$50 million. Four years. No new legislation.

The proposal is Budget establishment of PCP MK III at $50 million over four years, operating entirely within the existing Income Tax Act 2007 and Tax Administration Act 1994. No new legislation. No new tax policy. The programme restores the investigative and analytical capability that delivered the original programme's outcomes while preserving the customer-centric service approach Inland Revenue has developed since. The Department does not choose between enforcement and service. It does both, as it used to.

Built on what already exists

MK III does not require new infrastructure investment (BC §6.2). Inland Revenue's Data Intelligence Platform, Cortex AI, START case management, and its existing AI-governance tooling are the operational substrate. A programme-dedicated analytical workspace within the DIP holds LINZ property transaction data received under section 16 of the Tax Administration Act 1994, attributed taxpayer records, Cortex AI risk scores, and compliance case metadata.

What a dedicated programme adds is not technology: it is the coordination architecture that converts signals into structured outcomes at scale. The original programme demonstrated that cross-function collaboration (Communications, Policy Advice, Assurance, Customer Insight) is achievable at programme grade. A dedicated programme reduces internal friction rather than creating it.

Organisation

Three capability streams under Programme Manager; 55–79 FTE steady-state establishment.

The Programme Manager is a senior official reporting to the Deputy Commissioner, Customer and Compliance Services (BC §7.2). Three capability streams sit beneath: Intelligence and Analysis; Methodology and Treatment (the load-bearing stream of dedicated property compliance investigators executing the six-phase treatment process at upper-pyramid grade); and Measurement and Improvement, which closes the compliance cycle. Steady-state establishment runs between 55 and 79 FTE.

Organisational placement of MK III (Investigations-Unit model or integrated-structure under the contemporary operating model) is a design variable for the Commissioner's advice, not a fixed assumption (BC §7.3). The fourth Ministerial Request invites the Commissioner to advise which architecture can deliver the programme at the level of autonomy and accountability the appropriation would warrant.

Governance autonomy

A programme at this scale cannot be governed like a business-as-usual workstream. Four elements together constitute the structural-fitness requirement (BC §7.5): a Sponsor at Commissioner level (or Deputy Commissioner as a minimum floor); a Steering Committee chartered for the programme with cross-silo authority; line-management financial delegations scaled to programme operation; and explicit insulation from quasi-internal governance bodies whose remit sits outside the programme's funded objectives. These elements are integrated in a published annual accountability matrix.

Four years, profiled

Implementation Roadmap summary: Year 1 ramp, Years 2–4 steady-state, end-Year-3 continuation decision.

Year 1 is a ramp, covering workforce recruitment toward full investigator establishment, DIP workspace provisioning, Cortex AI model training, and case-selection pipeline initialisation. The cost profile reflects this: $10 million in Year 1, $13 million in Year 2, $13 million in Year 3, and $14 million in Year 4, for a total of $50 million (BC §8.3). Years 2 through 4 operate at steady-state across the full compliance pyramid, with an end-of-Year-3 continuation decision within the Budget process.

Four transition-to-BAU safeguards are built in to prevent the capability-decay pattern from recurring (BC §7.8): ring-fenced funding; published annual capability-state reporting at treatment-to-outcome traceability grade; independent methodology-evaluation at programme midpoint and funding-cycle boundaries with defined Steering Committee escalation; and an explicit sunset clause with re-authorisation requirement. These are internal design elements. No legislation required.

Programme-close produces the observational data that future analysts will need to confirm or refute the capability-decay hypothesis this analysis advances.

The limits of the argument

Any analysis advancing a case this strongly has an obligation to state (plainly, not as a footnote) what it is not claiming. The Business Case sets five explicit limits, and they are structural features of the framework, not disclaimers added as an afterthought (BC §3.7).

Capability decay is a hypothesis, not an assertion. The analysis holds that the public-source trajectory is consistent with capability decay. Only the Commissioner's authoritative advice would confirm, qualify, or disconfirm it.

No person or decision is characterised as mistaken. The Business Transformation choices were publicly made within the Department's statutory discretion. This analysis observes structural consequences visible in the yield data; it makes no motivational or evaluative claim about the decisions or the people who made them.

No inference is drawn about personnel capability. Any inference at individual or cohort level is outside both the evidence base and the methodological honesty of this analysis.

Reduced reporting granularity is not attributed to intent. The observation is structural (the public reporting architecture operates at aggregate grade rather than capability-type-FTE grade), not a characterisation of motive.

No specific gap figure is presented as an identified fact. The $350 million combined centroid is a public-source-bounded sensitivity range with a practitioner-attribution centroid, framed throughout as a proposition subject to the Commissioner's authoritative test (BC §3.5.3).

Two further methodological limits apply. The Monte Carlo model is at iteration v0.8 and has not been through L4 Crown Financial Modeller peer review: this analysis is pre-peer-review, and v0.9 is reserved for peer-review findings only. The evidence base is exclusively public-domain: the analysis makes no claim to completeness, only to public-source defensibility (BC §9.4).

None of these limits weakens the forward investment case. As the Business Case states directly, the MK III programme clears the Government's stated $8:$1 benchmark (Hon S Watts, Tackling New Zealand's Rising Tax Debt, 22 May 2025) at 100 per cent probability in the Monte Carlo distribution without the Minister needing to accept any of the hypothesis-grade historical observations. The capability-decay hypothesis provides analytical context; the forward case stands independently (BC §9.4).

Four questions for the Commissioner

The forward-looking investment case stands without the historical hypothesis: the Monte Carlo model clears the Government's $8:$1 benchmark at 100 per cent on MK III's own merits. But a Minister deciding whether to commission a programme of this scale would want to know the current state of the underlying capability. Standard Ministerial process provides the mechanism, and this analysis commends four specific advice requests to the Minister's consideration (BC §13.2).

The first asks the Commissioner to advise the current tax-technical investigator establishment, total FTE, recruitment criteria, tax-technical qualification profile, and deployment pattern across compliance functions, with particular reference to the capability required to execute a programme of the scale and methodology this proposal envisages. Four public-source retrieval pathways were attempted during the research phase and none surfaced publicly disclosed specific-FTE data (BC §3.6).

The second asks the Commissioner to advise how customer-engagement personnel recruited during the Business Transformation period have been configured, trained, and deployed for adversarial compliance engagement with sophisticated taxpayers, explicitly distinguishing that from the customer-service-oriented engagement for which the workforce was originally designed. The compliance pyramid (Home and Kelly 2010) frames this as a categorical distinction, not a spectrum: upper-tier adversarial investigation and lower-tier service engagement operate under different methodologies, with different evidentiary thresholds, and against distinct success criteria.

The third asks whether post-PCP public reporting granularity is sufficient for the Minister and the House to assess the department's adversarial compliance capability, or whether a more granular reporting regime would assist Ministerial and parliamentary scrutiny of the compliance function's ability to address the property compliance universe documented in OIA 25OIA1945. The current reporting architecture does not disaggregate investigator FTE by capability type (BC §3.6). This is a structural observation, not an inference.

The fourth addresses programme governance as it would need to be, not capability as it stands. It asks the Commissioner to advise how a dedicated programme at MK III scale would be structured to deliver with appropriate autonomy, specifying Sponsor seniority (Commissioner or Deputy Commissioner level), the charter and cross-silo authority of the governing Steering Committee, line-management financial delegations, and the mechanism by which the programme would be insulated from quasi-internal governance bodies whose remit is not aligned with the programme's funded objectives. This is a structural-fitness question, not a yes/no capability question (BC §13.2).

All four requests follow standard Ministerial process and impose no burden beyond the Commissioner's existing accountability obligations. The responses would equip the Minister to make a fully informed decision on whether to commission or decline a programme of the scale proposed.

Methodology and sources

The public-source firewall

The author worked on the Property Compliance Programme at Inland Revenue during the 2007–2010 funding period. Sixteen years separate that engagement from this analysis. The statutory tax-secrecy obligations under the Tax Administration Act 1994, and the contractual confidentiality obligations signed with Inland Revenue at the time, carry no statute of limitations. They have been treated as persisting in full, and the research has been produced accordingly.

The evidence base is exclusively public-domain: Inland Revenue OIA releases, the March 2026 "Inland Revenue Use of AI" Information Release, Office of the Auditor-General published reports, peer-reviewed academic publications (Home and Kelly 2010; McLisky 2011), Treasury and Beehive Budget releases, and Inland Revenue Annual Reports. Every factual claim carries an inline public-source citation at the point of claim; Appendix A of the business case consolidates the full citation set for audit purposes. No taxpayer data, no case-specific information, no audit outcomes, and no internal investigation methodology appear in this analysis.

Parameter calibrations within publicly bounded sensitivity ranges are informed by general professional expertise applied to public inputs, not by confidential information acquired in engagement duties. Individual names of former Inland Revenue staff do not appear in body text; roles are referenced at position-class level; peer-reviewed bibliographic citations are retained per standard academic citation practice. This compliance position has been formally reviewed against the signed instruments and applicable provisions of the Tax Administration Act 1994.

How the numbers were produced

The quantitative model uses Monte Carlo simulation at 10,000 iterations with a fixed seed (seed 42), implemented in Python. The fixed seed makes the model fully reproducible from its build script. Headline figures are reported at the P25 (conservative) position, with P50 (central estimate) alongside.
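
What a fixed seed buys can be shown with a small sketch. It uses only Python's standard library. The function names, the triangular benefit distribution, and its bounds are hypothetical placeholders chosen for illustration, not the calibrated parameters of the v0.8 model, so the numbers it produces say nothing about the programme's actual return. The iteration count, the seed, the $50 million cost, and the $8:$1 benchmark are taken from the text.

```python
import random
import statistics

ITERATIONS = 10_000    # from the article
SEED = 42              # from the article
TOTAL_COST_M = 50.0    # four-year cost in $ million (BC §8.3)
BENCHMARK = 8.0        # the Government's stated $8:$1 benchmark


def simulate_bcr(seed=SEED, iterations=ITERATIONS):
    """Return one simulated benefit-cost ratio per iteration.

    ASSUMPTION: the triangular distribution and its bounds are
    placeholders, not the article's calibrated inputs.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    ratios = []
    for _ in range(iterations):
        # Hypothetical benefit draw in $ million: low, high, mode.
        benefit_m = rng.triangular(300.0, 1200.0, 600.0)
        ratios.append(benefit_m / TOTAL_COST_M)
    return ratios


def summarise(ratios):
    """Report the P25 and P50 positions and the share clearing the benchmark."""
    p25, p50, _p75 = statistics.quantiles(ratios, n=4)
    share = sum(r >= BENCHMARK for r in ratios) / len(ratios)
    return {"p25": p25, "p50": p50, "share_clearing_benchmark": share}


first_run = simulate_bcr()
second_run = simulate_bcr()
print("Identical across runs:", first_run == second_run)
print(summarise(first_run))
```

Because the generator is seeded explicitly, two runs of the same script produce the same 10,000 draws, and therefore the same P25 and P50 positions. Reporting P25 as the headline means a quarter of simulated outcomes fall below the quoted figure and three quarters above it, which is the sense in which it is the more conservative of the two positions.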

Three explicit limitations apply. The model is pre-peer-review: a further iteration is reserved for Crown Financial Modeller findings, and any material finding would require a new iteration before peer-review-final status. The evidence base claims public-source defensibility, not completeness. And the capability-decay hypothesis is framed as a testable proposition because the Commissioner of Inland Revenue holds authoritative information that public sources alone cannot supply.

This analysis was produced with substantial AI assistance under Steko Consulting's structured methodology framework, covering research synthesis, quantitative modelling, draft generation, and structural composition. All material has been reviewed by the named author, who takes full professional responsibility for its contents. The production model is noted in the colophon.

Accessing the full analytical compilation

The Ministerial Briefing Deck is available on request. Beyond it, the full analytical compilation produced for this work is held offline and released by written request to thinking@steko.co.nz, assessed for good purpose. Each part fed a distinct layer of the case:

The Strategic Business Case: the consolidated investment case this article condenses.
The Quantitative Model: the Monte Carlo simulation that produced the benefit-cost distribution this article headlines, including the P25 and P50 ratios and the net-benefit figures, reproducible from a seeded run.
The Solution Architecture: the technical operating model the cost envelope rests on.
The Organisational Design: the workforce structure behind the staffing assumptions.
The Implementation Roadmap: the four-year delivery sequence over which the case is costed and returned.

Requests should identify the requestor, describe the purpose for which the material is sought, and note acceptance of the intellectual-property terms embedded in the compilation and the practice's publication standard. The author responds personally to each request following individual assessment for good purpose; this is not an automated download function.

Sources and provenance

The evidence base for this article is exclusively public-domain. No taxpayer data, no case-specific information, no audit outcomes, and no internal investigation methodology appear here.

Primary sources drawn on across the analysis include: Inland Revenue OIA releases (25OIA1945, 19 March 2025; 25OIA1249, 12 September 2024); the Inland Revenue "Use of AI" Information Release, March 2026 (IR2025/068; IR2025/229; IR2025/365; IR2025/442); Office of the Auditor-General, Inland Revenue Department: Making it easy to comply (OAG 2011); Home and Kelly, "Building Bridges to Compliance — Inland Revenue New Zealand Property Case Study" (2010, ISBN 978-1-921701-29-0); McLisky, doctoral thesis, Massey Research Online (2011); Treasury and Beehive Budget releases across three changes of government (Budget 2007 through Budget 2025); Inland Revenue Annual Reports; Inland Revenue, Delivering our Transformation (ird.govt.nz); Statista Real Estate — New Zealand outlook (2025); Reserve Bank of New Zealand Housing statistics M10; NZX Limited market value reporting; Giles, "Modelling the Hidden Economy and the Tax-Gap in New Zealand," Empirical Economics vol. 24 no. 4 (1999); IRS Publication 5869; and HMRC, Measuring Tax Gaps 2025.

The quantitative model (Monte Carlo v0.8, 10,000 iterations, seed 42) is reproducible from the seeded build script. Every factual claim in this article can be traced to an identified public source on request.

Edition: First Edition, 11 May 2026.

How this article was produced. This article was produced under Steko's structured research-publication method, a governed production method for research articles. The method requires argument definition, source inventory with gap analysis, structured section architecture, draft production, source-faithful resonance checking, and a multi-pass adversarial review (hostile-reader archetypes, substance traceability, register and brand compliance, and AI-fingerprint mitigation) before publication.

What the practitioner brought. The author was part of the original Property Compliance Programme team at Inland Revenue between 2007 and 2010; that experience grounds the analytical posture and the selection of the public-source evidence base. The statutory and contractual confidentiality obligations from that engagement were treated as persisting in full and firewalled from the analysis. The argument, the architectural framing, editorial judgement, adversarial challenge through full remediation, and publication approval rest with the author throughout.

What the production engine brought. Synthesis of the public-source evidence base into publication-class prose; per-section drafting to architectural specification; structural consistency across sections; substance-traceability scoring against cited sources; mechanical AI-fingerprint diagnostics; and first-pass hostile-reader archetype generation for adversarial review.

Powered by Claude Opus 4.7 (1M context), for the research and analytical production, and Claude Opus 4.8 (1M context) with Claude Sonnet, for the publication and editorial production.

Hostile-reader review. Six archetypes tested. The political-misuse and confidentiality-firewall findings were remediated to DEFENSIBLE; methodology critiques of the underlying quantitative model are addressed in full in the gated analytical compilation, which the published article represents faithfully.

Substance traceability. 100 per cent of high-stakes factual claims verified against named public-domain sources.

Practitioner spot-check. The author reviewed each load-bearing claim through staged ratification across the production gates.

Register compliance. PASS: a consistent, evidence-led editorial voice and brand consistency maintained throughout.

AI-fingerprint mitigation. All six diagnostics within threshold.

We have made best efforts to ensure the accuracy and integrity of this article. If you believe any claim, citation, or finding requires correction, we welcome that feedback at thinking@steko.co.nz and will undertake to review and respond accordingly.