Research · Methodology

The Specification Advantage

Why AI Adoption Is a Governance Decision, Not a Technology Decision

First Edition — 28 March 2026 · Greg Williams · steko.co.nz/thinking

For professions whose value is built on judgment and whose liability is personal, AI adoption is not a technology decision — it is a specification decision. The firms that will adopt AI most successfully are not the ones moving fastest, but the ones whose professional discipline already requires them to define what "correct" looks like before they act. Quantity surveying practice — where every payment certificate, variation assessment, and extension of time recommendation carries personal professional liability — illustrates what that specification discipline looks like when applied to a domain where the consequences of getting it wrong are material, personal, and legally enforceable. This paper argues that specification is not a prerequisite for AI adoption. It is the adoption.

What this paper found

• The firms that will adopt AI most successfully are those whose professional discipline already requires them to define what "correct" looks like before they act. Specification is not a prerequisite for AI adoption — it is the adoption.
• Three structural failure patterns — confident fabrication, anchoring bias, and surface-level quality checks — are validated by clinical AI research and apply directly to professional services.
• A governed trust framework calibrated to consequence and earned through evidence replaces binary trust with proportional governance. The difference is not the tool — it is the consequence of being wrong.
• Smaller teams with a governed AI workflow can cover scope that previously required much larger groups — but the binding constraint is correctness, not volume.
• The governance maturity model in Section 11 sets out three horizons, the first achievable within 60–90 days without new technology investment — and explains why the compliance baseline alone is a robust professional position.

The Regulatory Moment

The Royal Institution of Chartered Surveyors published its Responsible Use of AI in Surveying Practice global standard in September 2025. The compliance window closed on 9 March 2026. It is now mandatory for all RICS members and regulated firms worldwide.

The standard is built around four obligations: governance and risk management, professional judgment and oversight, transparency and client communication, and ethical development. It is a conduct standard, not a technology guide — it governs professional behaviour and will be taken into account in RICS regulatory, disciplinary, and legal proceedings.

The standard arrived into a profession that is, by its own data, largely unprepared. The RICS AI in Construction 2025 report found that forty-five percent of organisations report no AI implementation at all. Fewer than one percent have AI embedded organisation-wide. Meanwhile, a separate RICS skills survey found that nearly seventy percent of quantity surveying professionals believe AI will help them deliver greater value in the future.

[Figure: The adoption gap — what firms are doing versus what professionals expect. Current state: no AI implementation 45%; early pilot phases 34%; AI embedded organisation-wide under 1%. Professional expectation: believe AI will help deliver greater value in the future, roughly 70%. Sources: RICS AI in Construction 2025; RICS Skills Survey Q2 2025.]

The gap between intent and capability is the central challenge

These numbers describe an industry that overwhelmingly believes AI will transform its practice but has not built the methodology infrastructure to make that transformation safe, auditable, or professionally defensible. The skills gap, data quality problem, and integration challenge are real — but they are engineering problems, not fundamental obstacles. Firms that treat them as engineering problems solve them. Firms that treat them as reasons to wait do not.

Professional Judgment at Stake

The case study that runs through this paper is drawn from quantity surveying practice. Not because the principles are confined to that profession, but because quantity surveying illustrates them with unusual clarity. Any profession where judgment is the product, liability is personal, and the consequences of error are material will recognise its own situation in what follows.

A quantity surveyor governs the financial integrity of a construction project from feasibility through to final account. The role extends well beyond cost estimating to include payment assessment and certification, variation valuation, extension of time assessment, procurement advice, contract administration, and final account settlement. Every payment certificate, every variation valuation, every extension of time recommendation carries personal professional liability. The individual who signs is professionally and legally accountable for its accuracy, regardless of what tools or processes were used to produce it.

A cost plan that is wrong by ten percent on a fifty-million-dollar project is a five-million-dollar problem. Missed notice obligations under NZS 3910 are a primary source of construction disputes. Quantity surveying earns its place as a case study because the quality of the work is ultimately verified not by the firm's own assessment but by the courts, the regulator, and the client's willingness to return.

The Lifecycle Seams

AI's value in a quantity surveying context sits at particular moments across the development and construction lifecycle where information needs to move quickly, accurately, and in a form that reduces rework rather than creating it.

[Figure: Three lifecycle seams where AI meets professional judgment. Seam 1: board endorsement to QS engagement (iteration speed). Seam 2: feasibility to investment decision (data provenance). Seam 3: NZS 3910 contract administration (governance). Each capability is valuable; none is safe without the governance layer.]

Three seams matter most. From board endorsement to quantity surveyor engagement, the prize is iteration speed — structured brief templates, instant scenario analysis, and shared data environments that eliminate the translation step where information is re-entered, reformatted, or lost. From feasibility to investment decision, data provenance matters most — every rate, every multiplier, every professional fee percentage traceable to a source, a date, and a methodology. Under NZS 3910 contract administration, the quantity surveyor's role transitions from advisory to contractual — payment claim processing, variation management, extension of time assessment, and notice obligation monitoring, all high-volume, high-consequence, time-critical work.

Each of these capabilities is valuable. None of them is safe without the governance layer that tells the quantity surveyor when to trust the output and when to verify it independently.

Specification as the Foundation

How a professional firm integrates AI is primarily a methodology question, not a technology one: how does the firm govern the way AI is used so that the professional quality of its output is maintained, and demonstrably so, at every stage?

The answer begins with specification. Before any AI tool is deployed on any work type, the firm must define what correct looks like for that output. Not a new burden, but a formalisation of discipline the profession already practises.

[Figure: Specification primitives, the foundation beneath every output. Four primitives: self-contained problem statements, acceptance criteria, constraint architecture, decomposition and evaluation. Outputs (cost plan, payment assessment, variation analysis, EOT report) are only as reliable as the specification beneath them. Define these before any AI tool is deployed on any work type.]

Self-contained problem statements

The test for a self-contained problem statement is whether the task can be solved without the AI filling gaps with assumption. AI fills gaps with statistical plausibility, which means it guesses in ways that are often subtly wrong and professionally unacceptable.

For quantity surveying practice: a self-contained brief for a payment claim assessment includes the contract value, the assessment period, the schedule of quantities, the claim documentation, and the applicable contract clauses. A brief that references "the usual approach" or "as per last time" is not self-contained and will produce unreliable output.
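The completeness test can even be mechanised before a brief is handed to any tool. A minimal sketch, assuming the brief is held as a simple mapping; the field names are illustrative, not a prescribed schema:

```python
# Required inputs for a payment claim assessment brief, per the paper's list.
# Field names are illustrative placeholders, not a standard.
REQUIRED_FIELDS = {
    "contract_value",
    "assessment_period",
    "schedule_of_quantities",
    "claim_documentation",
    "applicable_clauses",
}

def is_self_contained(brief: dict) -> bool:
    """True only if every required input is present; 'as per last time' fails."""
    return REQUIRED_FIELDS <= brief.keys()
```

A brief missing any of these fields would be rejected before generation, rather than letting the AI fill the gap with a statistically plausible guess.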

Acceptance criteria

The test for acceptance criteria is whether three statements can be written that an independent professional could use to verify the output is complete and correct, without asking the author any questions. If those statements cannot be written, the task is not well enough understood to be delegated — to AI or to anyone else.

For quantity surveying practice: the acceptance criteria for an extension of time assessment include that all critical path activities affected by the delay event have been identified and assessed, that the methodology applied is consistent with the contract's extension of time clause, and that the recommended entitlement is supported by programme evidence referenced in the assessment. These criteria are what the partner reviews before signing off, whether AI was involved or not.
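Expressed programmatically, partner sign-off becomes a conjunction of independently verified checks. A sketch under illustrative names (the criteria text follows the paper; nothing here is a prescribed workflow):

```python
# Acceptance criteria for an extension of time assessment, as verifiable statements.
CRITERIA = [
    "all critical-path activities affected by the delay event identified and assessed",
    "methodology consistent with the contract's extension of time clause",
    "recommended entitlement supported by programme evidence referenced in the assessment",
]

def may_sign_off(verified: dict) -> bool:
    """Sign-off is permitted only when every criterion is independently verified."""
    return all(verified.get(criterion, False) for criterion in CRITERIA)
```

The point of the structure is that the reviewer ticks specific criteria rather than forming a general impression, whether or not AI produced the draft.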

Constraint architecture

Four categories that every specification must address: what the AI must do, what it must not do, what the firm's preferences are, and what triggers escalation to senior professional review. Every constraint must earn its place. If removing it would not cause a mistake, it should not be there.

For quantity surveying practice: a variation assessment specification must include the constraint that the AI must not recommend a value outside the firm's delegated authority range without escalation, must not cite contract clauses without verifying they exist in the applicable contract version, and must flag any item where the scope description is ambiguous rather than proceeding on an assumption.
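The escalation constraints lend themselves to a simple guard. A hedged sketch: the authority threshold is a hypothetical figure that each firm would set for itself, and the function name is ours:

```python
DELEGATED_AUTHORITY_NZD = 250_000  # hypothetical threshold; firm-specific in practice

def needs_escalation(recommended_value: float,
                     clause_verified: bool,
                     scope_ambiguous: bool) -> bool:
    """Any tripped constraint routes the assessment to senior professional review."""
    return (recommended_value > DELEGATED_AUTHORITY_NZD
            or not clause_verified        # cited clause not confirmed in this contract version
            or scope_ambiguous)           # flag ambiguity rather than proceed on assumption
```

The design choice worth noting is that the guard is deliberately conservative: an unverified clause or an ambiguous scope escalates regardless of value.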

Decomposition and evaluation design

Complex outputs are broken into independently verifiable components, each with clear input and output boundaries. For each component, define the test cases that confirm correct execution. Run those tests before the component goes into production use, and re-run them after any change to the AI tool or the underlying data.
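A minimal harness for such a test suite might look like the following sketch; `EvalCase` and `run_suite` are illustrative names, and the known-correct outputs would in practice be signed off by a senior professional:

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    name: str
    inputs: dict
    expected: dict  # known-correct output for this component

def run_suite(component, cases):
    """Return the names of failing cases; an empty list means the component passes.

    Re-run after any change to the AI tool or the underlying data.
    """
    return [case.name for case in cases if component(case.inputs) != case.expected]
```

Usage is the familiar regression-test pattern: define three to five cases per component, run the suite before production use, and rerun it on every change.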

The firms that move fastest to genuine AI capability in professional practice will not be the ones who adopt tools first. They will be the ones who do the specification work first — because when a firm knows exactly what good looks like, it can tell within minutes whether AI is producing it. A firm with specifications in place can deploy a new AI capability to a work type in days, because the acceptance criteria already exist and verification is immediate.

There is a secondary benefit that matters regardless of AI. The discipline of specifying what correct looks like for each output type improves the firm's professional quality whether or not AI is ever deployed. The specification work is not contingent on AI proving adequate. It strengthens the firm's practice either way.

When AI Gets It Wrong

Every professional considering AI adoption needs to understand three patterns of failure that are not bugs to be fixed but structural characteristics of how large language models interact with professional work. Research published in Nature Medicine in 2026, examining AI deployment in clinical settings at Mount Sinai Health System, validated these patterns empirically. The structural patterns transfer directly to any setting where AI interacts with professional judgment.

[Figure: Three structural failure patterns in AI-assisted professional work. The confident fabrication: AI produces well-structured, plausible output that is factually wrong; diagnostic: "Was the AI working from provided source material, or from memory?" The anchoring trap: seeing AI output before forming an independent view converts review into confirmation; diagnostic: "Did the reviewer complete their own assessment before seeing the AI output?" The surface-level check: quality mechanisms trigger on the appearance of correctness rather than substance; diagnostic: "Am I checking whether this looks right, or whether it is right?" Source: Ramaswamy et al., Nature Medicine, 2026 — structural patterns validated in clinical AI deployment.]

The confident fabrication. Every large language model will occasionally produce confident, well-structured nonsense. Prevention is not realistic. What matters is catching it before it reaches a client. A governed workflow addresses this: source documents are provided as inputs, not assumed; the task carries a specification that defines what correct looks like; verification checks specific claims against specific sources. The consequences of not catching fabricated output are now well documented — at least one major professional services firm has faced public inquiry, financial penalties, and partner departures after AI-generated content in a government contract contained fabricated citations that passed internal review.

The anchoring trap. Once you have seen a number, an answer, or a draft, your judgment shifts toward it whether you intend it to or not. If the professional sees the AI output before completing their own assessment, the comparison is worthless. The structured approach is to run both streams blind — the traditional work product and the AI work product built independently, compared only after both are complete.

The surface-level check. Quality mechanisms that trigger on surface-level patterns — professional formatting, appropriate terminology, logical structure — rather than on actual substance are the most insidious failure pattern. The antidote is specificity in verification. Not "review this for quality" but "verify that the variation instruction reference exists in the contract register, that the valuation clause cited is applicable to this variation type, and that the arithmetic reconciles."

Trust as Governance

Most people treat trust as binary. Either you trust the tool or you don't. In practice, trust operates on a spectrum that varies by task, by consequence, and by the evidence accumulated through verified use.

[Figure: Trust calibrated to consequence, earned through evidence. Low consequence (preliminary cost summary, PCG minute extraction): AI-assisted, light review. High consequence (payment claim assessment, EOT assessment, payment certificate): full independent review required. The difference is not the tool; it is the consequence of being wrong.]

You might reasonably trust AI to draft a preliminary cost summary from structured data while requiring full independent review before that same output informs a payment certificate. The difference is not the tool. It is the consequence of being wrong.

A governed trust framework does three things that gut feeling cannot. It makes the evidence requirements explicit. It degrades predictably — when something goes wrong, the framework responds proportionally rather than collapsing into "don't use AI for anything." And it accumulates: as the firm builds a track record of verified outputs across specific task types, confidence grows on evidence rather than hope.

This maps directly to how professional firms already manage risk. You do not let a graduate sign a payment certificate. You do not let a senior surveyor certify a final account without partner review on complex contracts. AI governance follows the same logic. The only difference is that the competence being assessed belongs to the tool-plus-workflow combination, not a person.
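That graduated logic can be made explicit rather than left to gut feeling. An illustrative policy function, not a recommendation; the evidence threshold of 20 verified runs is a placeholder a firm would calibrate per task type:

```python
def review_tier(consequence: str, verified_runs: int, threshold: int = 20) -> str:
    """Map consequence and verified track record to a review tier.

    High-consequence outputs always get full independent review, whatever
    the track record; lighter review must be earned through evidence.
    """
    if consequence == "high":
        return "full independent review"
    return "light review" if verified_runs >= threshold else "full independent review"
```

Because the evidence requirement is explicit, the framework also degrades predictably: a verification failure can reset the count for one task type without collapsing trust everywhere.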

Why AI Feels Like Extra Work

Ask any professional who has tried AI in their workflow and most will report the same experience: it created more work, not less. The AI drafts something. The professional reads it. Then they re-read the source material anyway because they do not trust the draft. Then they edit. The net result is three passes where there used to be one.

The problem is workflow design, not AI. The duplication happens because the AI was asked to produce a finished output without anyone specifying what "finished" looks like for that task. Without that definition, the professional has no choice but to check everything.

The fix is to specify before generating. When acceptance criteria are defined before the AI runs, the review changes character entirely. Instead of re-reading the whole document to see if it "looks right," the reviewer checks specific criteria against specific outputs. The review is faster, more reliable, and catches real errors rather than pattern-matching against a general sense of unease.

Specification in daily practice looks like the difference between "check this for me" and "verify these five things are correct." The first instruction produces duplication. The second actually saves time.

Smaller Teams, Larger Mission

Conventional framing for AI in professional services is efficiency: the same work, fewer people, lower cost. That framing is incomplete in a way that leads to poor decisions.

The mathematics of team coordination have been understood since the 1970s. Fred Brooks demonstrated that communication pathways in a group grow by the formula n(n-1)/2. A team of five has ten pathways. A team of ten has forty-five. A team of twenty has one hundred and ninety.
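The formula is trivial to verify in a few lines (the function name is ours):

```python
def pathways(n: int) -> int:
    """Pairwise communication channels in a team of n people: n(n-1)/2."""
    return n * (n - 1) // 2
```

The growth is quadratic: doubling the team from five to ten does not double the coordination load, it more than quadruples it.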

[Figure: Communication pathways, n(n-1)/2, by team size: 5 people, 10 pathways; 10 people, 45; 15 people, 105; 20 people, 190. The sweet spot is around five, where everyone holds the full context; beyond that, meetings multiply to compensate.]

Robin Dunbar's research on cognitive group size limits and Brooks's software engineering work reached complementary conclusions: at around five people, every member of a team can hold the full context in their head. Beyond that threshold, shared understanding degrades and meetings multiply to compensate.

What AI changes is not the number. What it changes is the output per person. When each professional on a team is producing significantly more value with AI assistance, the coordination cost of adding the next person rises in proportion.

For a quantity surveying firm, this reframes the team structure question entirely. The goal is not to reduce a team of twelve to six and pocket the savings. The goal is to recognise that a team of five senior professionals, equipped with a governed AI workflow, can cover the scope that previously required a much larger group. The remaining professionals form additional small teams pointed at work the firm could not previously resource: new service lines, deeper client relationships, the kind of proactive advisory work that gets squeezed out when everyone is buried in production.

AI made volume cheap. The binding constraint is correctness. A team of five can maintain the shared context required to verify correctness. A team of fifteen cannot, regardless of how much AI tooling they have, because the shared mental model fragments and the meetings required to resynchronise it consume the time AI was supposed to free up.

When a Senior Professional Leaves

Every professional firm has experienced the departure of a senior practitioner. The immediate concern is capacity. The deeper loss is harder to see and harder to replace.

A senior quantity surveyor with fifteen or twenty years of experience carries judgment that exists nowhere in the firm's systems. They know which subcontractors to trust on complex packages. They know the patterns in a cost plan that signal trouble before the numbers confirm it. That knowledge leaves with them regardless of how well the handover is managed.

The full analysis explores this in depth — including how knowledge capture differs from documentation, and why the specification work required for AI governance is also the solution to the knowledge retention problem. Request the full paper here.

The Platform Vendors

Platform vendors are already embedding AI into their product suites. Specialist tools for automated measurement, cost classification, and predictive analytics are appearing across the construction software market. Within the next two to three years, AI features will be standard in most quantity surveying software packages, not optional extras.

For a firm that has invested deeply in its primary software platform, the natural question is: can we wait for the vendor to deliver AI and simply adopt what they ship?

The full analysis examines why build versus buy is a false choice here, and why the governance layer that sits on top of vendor AI is what makes it safe to use in a professional liability context. Request the full paper here.

Charting the Course

AI adoption in a professional services firm follows a governance maturity progression, not a technology implementation plan. Each layer requires the one beneath it.

[Figure: Governance maturity, where each layer requires the one beneath it. Compliance baseline: AI use register, material use threshold, client disclosure, data handling policy, staff guidance framework. Specification and governance foundation: output-type specifications, evaluation test suites, blind parallel testing, partner-level intent framework. Knowledge architecture: curated cost intelligence, structured data exchange, contract administration intelligence, succession knowledge. The baseline is achievable within 60–90 days without new technology investment.]

The compliance baseline

For any RICS member or regulated firm, the first horizon is immediate and achievable within sixty to ninety days without any new technology investment. Five actions constitute the baseline: an AI use register, a material use threshold, a client disclosure protocol, a data handling policy, and a staff guidance framework.

These five actions constitute the foundation that everything else is built on. They are also, on their own, a robust compliance position.
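A register entry need not be elaborate. One illustrative shape, sketched as an immutable record; the field names are ours, not prescribed by the RICS standard:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class AIUseRegisterEntry:
    """One row in the firm's AI use register (illustrative fields only)."""
    tool: str                   # e.g. the vendor product or model in use
    work_type: str              # the output type the tool touches
    material_use: bool          # does use exceed the firm's material use threshold?
    client_disclosed: bool      # has the client disclosure protocol been followed?
    data_handling_reviewed: bool
    recorded: date
```

The value of even this minimal structure is auditability: every field corresponds to one of the five baseline actions, so a blank register line is itself a compliance finding.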

The specification and governance foundation

The second horizon is where the substantive methodology work happens. Output-type specifications written for the firm's primary deliverables, each with acceptance criteria and constraint architecture. Evaluation test suites with three to five test cases per output type, using known correct outputs, tested and signed off by senior professionals. Blind parallel testing on two to three live projects, where AI and traditional work products are built independently and compared only after completion. And a partner-level intent framework that explicitly documents escalation triggers, delegation boundaries, and firm non-negotiables.

Expect this horizon to take time. Not because the technology is complicated, but because the process of specifying what good looks like, at the level of rigour required for professional liability, demands the attention and judgment of the firm's most experienced practitioners.

The knowledge architecture

The third horizon is the long-term asset: a knowledge architecture where the firm's cost intelligence, contract precedents, methodology documents, and professional standards are structured and maintained so that AI can reliably access them, and so that the departure of a senior practitioner does not deplete the firm's capability. This is where the specification work and the knowledge retention work converge. The firm that has specified what correct looks like for its output types has, in the process, externalised the professional judgment that makes those outputs reliable.

Where This Goes

The argument of this paper is that specification is the adoption. Firms whose professional discipline already requires them to define what correct looks like before they act are structurally closer to governed AI capability than they may realise. The specification work formalises the discipline that built the firm's reputation in the first place. AI did not create the need for it. AI made the need visible.

The firms that have survived previous technology transitions, from manual measurement to electronic takeoff, from paper-based contract administration to digital document management, did not succeed because they bought the best software. They succeeded because they understood what the software was for and built their workflows around it with discipline.

AI follows the same pattern. The tools will change. The professional obligation will not. The risk is not in moving too slowly: firms that take the time to build a solid governance foundation and then adopt AI confidently across their practice will outperform firms that rush into unstructured adoption and spend years cleaning up the errors and rebuilding client trust. The risk is in not moving at all, and finding in three years that competitors have built the capability that clients now expect as standard.

The most useful first step is rarely a technology purchase. It is a clear-eyed assessment of the current state: what AI tools are already in use and under what governance, what the firm's document estate and data architecture actually look like, where the knowledge is concentrated and where it is fragile, which output types present the highest liability exposure and the highest efficiency opportunity. That diagnostic work is the foundation that determines what a credible, sequenced programme looks like. It is also, for most firms, the moment when the gap between where they are and where they need to be becomes concrete enough to act on.

This is the summary. The full analysis goes deeper.

The complete research paper includes the detailed specification framework, per-output-type worked examples, the full trust governance model, team structure analysis, knowledge architecture design, and the methodology for applying this approach to your own firm. If this thinking is relevant to what you're working on, the full paper is available on request.

Request the full paper →

Sources & Provenance

Brooks, F. P. (1975). The Mythical Man-Month: Essays on Software Engineering. Addison-Wesley.

Dunbar, R. I. M. (1992). Neocortex size as a constraint on group size in primates. Journal of Human Evolution, 22(6), 469-493.

Ramaswamy, V., Klang, E., Nadkarni, G., et al. (2026). AI performance in clinical triage settings. Nature Medicine. DOI: 10.1038/s41591-026-04297-7.

RICS (2025a). Responsible Use of Artificial Intelligence in Surveying Practice. 1st Edition, September 2025. Royal Institution of Chartered Surveyors.

RICS (2025b). AI in Construction 2025. Royal Institution of Chartered Surveyors.

RICS (2025c). RICS Skills Survey Q2 2025. Royal Institution of Chartered Surveyors.

Standards New Zealand (2023). NZS 3910:2023 — Conditions of Contract for Building and Civil Engineering Construction.

Colophon

Edition: First Edition — 28 March 2026

How this article was produced

This article was produced under the Stillwaters Research Publication Protocol (RPP-001 v0.1.0), a governed production method for research articles. The protocol requires argument definition, source inventory with gap analysis, external enrichment, structured section architecture, adversarial review, and substance traceability verification before publication.

What the practitioner brought: The practitioner directed the argument reframe (from discussion paper to generalist thesis with QS as case study), shaped the editorial constraints (including the standing rule that no identifiable firm or individual appears in the text), selected the section architecture, conducted independent citation verification of three key claims, caught a missing quality gate on AI fingerprint markers that the production engine's own review process had missed, and approved every design decision from story selection through publication.

What the production engine brought: Research synthesis across five external source searches (RICS standard verification, RICS survey data, Brooks and Dunbar verification), source inventory and gap analysis, draft production at publication depth, concurrent bibliography construction, structured three-pass adversarial review with three hostile reader archetypes, substance traceability scoring, and AI fingerprint mitigation.

Powered by Claude Opus 4.6 · RPP-001 v0.1.0

Hostile reader review: 3 archetypes tested (AI-First Director, Sceptical Partner, Academic Commentator). All DEFENSIBLE.
Substance traceability: 94.1% of factual claims verified against cited sources.
Practitioner spot-check: 3 citations independently verified by practitioner. All confirmed.
Register compliance: PASS — lens-not-subject, psychological register, brand consistency.

We have made best efforts to ensure the accuracy and integrity of this article. If you believe any claim, citation, or finding requires correction, we welcome that feedback at [email protected] and will undertake to review and respond accordingly.