The Confidence Compression Problem
AI didn’t make us more ignorant. It collapsed the time between an under-informed decision and the bill arriving.
Picture a regional SME-lending fintech in Jakarta or Manila. The CEO is in the board pre-read meeting on Monday morning. Slide twelve says the new AI-assisted underwriting flow is shaving forty percent off application time and is now handling the long tail of the book of business that used to sit in an analyst’s inbox for three days. A non-executive director asks the obvious question. “Can you walk us through how it decides?” The CEO says yes. He’s not lying on purpose. He knows the integration works, the latency numbers, the cost per call, the vendor’s SOC report. He doesn’t know what is actually happening inside the foundation model that is now declining loans for thousands of small business owners every week. By Friday he will be sitting in a different meeting with the regulator, and the question will not be friendly.
This is how every regional fintech, every retail platform, every healthcare scheduler, every recruiting tool currently deploys AI in 2026. The thing nobody says out loud is that almost nobody approving these systems can explain them, and the people who can’t explain them won’t admit it, because admitting it sounds like admitting you’re behind the times.
The standard reading of this is that we are collectively dangerous. We are shipping things we don’t understand. The knowledge illusion has gone operational, and somebody is eventually going to pay for it on a stage they did not pick.
That reading is correct, and almost completely useless to the CEO sitting in the meeting.
The Argument You’ve Heard Already
Steven Sloman and Philip Fernbach’s The Knowledge Illusion made the case nine years ago that humans don’t actually understand most of what they think they understand. We mistake collective access to knowledge for personal possession of it. The community of knowledge: your neighbor knows how toilets work, the engineer down the street knows how power grids work, somebody at the FDA knows how vaccines work, and you outsource the cognition to all of them while feeling like you understand the world.
This was true in 1820. It is more true now. None of this is news.
What’s news is the speed.
For most of human history, the gap between an under-informed decision and its visible consequence was measured in years. The bridge would be built. The treaty would be signed. The crop would fail. Time gave the community of knowledge a chance to catch up. Engineers would inspect, regulators would review, journalists would investigate. Yuval Noah Harari’s Nexus reads, on one level, as a 500-page argument about information networks. Underneath, it’s a story about latency. Every information technology since cuneiform has compressed the gap between decision and consequence, and every compression has produced a class of catastrophic mistakes that the previous latency would have caught.
AI is the steepest of those compressions so far, and the genuinely novel thing is that we’re shipping it through a procurement process designed for a SaaS subscription, not for a system that decides who gets a loan.
What Actually Changed
It is not new that operators ship things they don’t understand. That has been the job description since the first VP of Engineering signed off on a database migration she couldn’t have personally written. Stanley McChrystal’s Team of Teams makes the case bluntly: in any sufficiently complex environment, no single person has the context. The whole reason institutions exist is to distribute cognition across people, processes, and time. Demanding deep individual comprehension before action isn’t humility. It’s the corporate-architect disease that kills startups.
What changed isn’t comprehension. It’s three things, all moving in the same direction:
The first is opacity. Earlier production systems failed in legible ways. A database migration breaks tables; you grep the logs, find the join, fix it. A machine-learning system in 2018 was opaque, but you could at least audit the training data and the feature set. A foundation-model integration in 2026 is opaque the way the weather is opaque. There’s no log to grep. The system you shipped on Monday and the system handling your Friday traffic are mathematically the same model behaving differently because the prompt distribution shifted by half a percent. The post-hoc audit isn’t slow. It’s structurally absent.
The second is scale-on-deploy. Twenty years ago you shipped something to 50 users, then 500, then 5,000, with months between each step and a complaints inbox in between. Now a Stripe-flavored integration ships you straight to your full traffic on Tuesday afternoon. The customer-correction loop that used to run continuously across the rollout has been replaced with a release-notes paragraph and a Slack channel.
The third is the cost of the small mistake. Nassim Taleb’s Antifragile divides systems into convex (gain from disorder) and concave (lose from disorder). Most production decisions used to be roughly linear: small errors produced small problems, large errors produced large problems, and you got time in between to course-correct. AI systems running at deploy-day scale are concave with the curve cranked steep. Small errors accumulate invisibly until a threshold, at which point they crystallize into something like a regulatory action, a class-action lawsuit, or a 60 Minutes segment.
All three together do something the knowledge illusion alone never did. They take the thing humans have always done, which is to operate systems they don’t individually understand, and remove the time the community of knowledge needed to assert itself.
This is the problem. Not ignorance. The compression of confidence into action.
The Turn: Maybe All of This Is Just Senior-Manager Cope
I don’t trust this argument fully and you shouldn’t either.
Here’s why. Every company I’ve worked with in my career has shipped things they didn’t fully understand. That is literally the job description of running a company. The MVP is, by definition, the thing you ship before you understand it, because if you understood it, you’d already be Stripe. The demand for “deep comprehension before action” is the language of the corporate VP who has fifteen direct reports, four committees to consult, and a quarterly review that punishes mistakes more than it rewards velocity. It is not the language of someone who has ever shipped a product that mattered.
If you take the argument I’ve been making to its logical conclusion, you arrive at the position that companies in 2026 should slow down. Stop shipping AI features until they can fully audit them. Wait for explainability tooling to mature. Convene cross-functional review boards. Run every model through a 90-day risk pilot.
This is exactly what the incumbents are doing. It is exactly why they are losing.
The pattern shows up in every banking implementation I’ve watched. There’s always a tier-one bank with a fully built risk apparatus, three layers of model governance, and a working group that meets every other Wednesday to discuss generative AI. There’s always a tier-three challenger with a CTO, a junior engineer, and a Slack channel for customer complaints. Five years from now, one of these will still exist as an independent business and one will not, and any honest read of the last decade tells you which is which.
So the founder objection isn’t theoretical. It’s the only argument that has ever consistently produced things worth using.
Latency, in this view, isn’t safety. It’s death.
The Synthesis That Doesn’t Tie a Bow
Both ends of this are correct, and the discourse keeps treating them as a fight.
Here’s the reconciliation that took me longer to see than it should have. Most decisions live happily inside compression. A homepage CTA, a pricing experiment, a copy variant on the welcome email. The compression of confidence into action is fine for them, because the cost of being wrong is small or the cost of finding out you’re wrong is fast. The startups are right about this and the doom literature has nothing useful to say about it. Ship it, watch it, fix it.
The trouble is the smaller class of decisions where compression is the entire problem. The ones where the cost of being wrong only shows up at scale, the ones where the model’s behavior on Friday isn’t the model’s behavior on Monday, the ones where the regulator finds out before you do because you can’t sample fast enough to catch the drift. Underwriting. Pricing for regulated products. Anything that touches money movement at scale, identity, healthcare access, hiring, or any decision a customer can appeal in front of an authority.
The mistake operators are making in 2026 isn’t shipping things they don’t understand. It’s failing to notice when shipping speed and correction speed have decoupled, and continuing to ship at the speed of confidence rather than the speed of consequence.
This decoupling happens far more often now than it did five years ago, because AI integrations are quietly repurposable. Last quarter your customer support bot was a side feature. This quarter it is deciding when to escalate fraud cases. Same vendor, same code path, blast radius up by an order of magnitude. Nobody in the org noticed the migration because the integration spec didn’t change.
That’s the operator-relevant part of the knowledge illusion. It isn’t that you don’t understand the model. It’s that you don’t notice when your decision has moved into the class where understanding is supposed to be slowing you down. The recognition is the work. The risk register is the artifact, not the discipline.
The interesting thing about working in fintech across ASEAN is that you watch the regulators in Singapore, Indonesia, and Malaysia getting this right by accident, and the operators in the US and EU getting it wrong on purpose. The MAS sandbox in Singapore, OJK’s piloting structures in Indonesia, BNM’s regulatory licensing tiers in Malaysia. These read to most Western company leaders as regulatory drag. Slow, paternalistic, frustrating. They aren’t. They’re staged-rollout architecture imposed from the outside because the regulators don’t trust the operators to do it themselves. And they keep being right about that. The company that fights the sandbox is usually the one that needed it.
I’m not telling you ASEAN regulators have figured out morality. I’m telling you that any external mechanism that imposes structural slowdown - a sandbox, a staged rollout, a kill-switch threshold, a customer-volume gate - functions the same way regardless of why it exists. If you don’t have one of those imposed on you, you have to build it. Otherwise you find out at scale, and it isn’t your engineering team that learns first.
The 15-Minute Confidence Audit
Most operators I work with overestimate how many of their AI decisions they could actually defend in an external review. By a lot, and in the direction that hurts. The audit shrinks that gap to something you can fix on Friday. Set a timer.
5 minutes: Find the AI surface that drifted.
Write down the last ten places in your product where a model output is on the critical path of a decision that affects a customer. Not “AI helped us write the spec.” Translation flows, support routing, document parsing, lead scoring, fraud escalation, underwriting signals, content moderation, hiring screens. The deployments you stopped paying attention to because they shipped clean six months ago.
For each one, answer the question that almost no operator can answer cleanly: if the underlying model behavior shifted after a silent vendor patch and your decisions started skewing fifteen percent in a new direction, how would you find out? If the answer routes through “we’d check the dashboard,” you’re describing a lag indicator that the regulator has access to faster than you do. If it routes through “customer support volume would spike,” you’re describing a feedback loop that fires after the harm has compounded. Write the honest answer next to each row, in the language you’d use if your CTO weren’t reading.
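If you want that answer to be something better than “we’d check the dashboard,” the mechanism is small. A minimal sketch, assuming you already log one row per model-mediated decision; the class and field names are illustrative, not anything a vendor ships:

```python
from collections import deque

class DeclineRateMonitor:
    """Compares a rolling decline rate against a frozen baseline and flags a
    relative skew before the complaints inbox or the regulator does."""

    def __init__(self, baseline_decline_rate: float, window: int = 1000,
                 max_relative_shift: float = 0.15):
        self.baseline = baseline_decline_rate     # measured when you signed the surface off
        self.recent = deque(maxlen=window)        # most recent N decisions
        self.max_relative_shift = max_relative_shift

    def record(self, declined: bool) -> None:
        self.recent.append(declined)

    def skewed(self) -> bool:
        if len(self.recent) < self.recent.maxlen:
            return False                          # not enough signal yet
        current = sum(self.recent) / len(self.recent)
        shift = abs(current - self.baseline) / max(self.baseline, 1e-9)
        return shift > self.max_relative_shift    # the fifteen-percent question, answered in code
```

The baseline is frozen when the surface is approved; the threshold is the same fifteen percent you just wrote in the margin.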
5 minutes: Mark the migrations.
Most rows on your list are inside compression and should stay there. Small effects, fast feedback, ship-and-watch wins. Move past them. The interesting rows are the two or three where the model’s bad decision today is the regulator’s letter in six months. The ones where the cost of being wrong arrives in a different quarter than the wrongness itself.
Look at those rows hardest. The trap in 2026 is that they don’t look like new decisions. They are old features your team migrated into AI-load-bearing territory without anyone in the org noticing the integration spec hadn’t changed. Customer support bot is now an escalation engine. Lead scorer is now a redlining surface. Document parser is now an underwriting input. The recognition is the entire deliverable from this audit. If you can’t say which rows have crossed, your audit didn’t work, and you go back to step one with a different person doing it.
5 minutes: Put time back into the loop.
Pick the row with the worst answer to the question in step one. Define one structural mechanism that the foundation-model layer makes specifically necessary. Generic risk theatre - approval committees, change boards, vendor management - already exists. The point of this step is the AI-native instrumentation those forums never had reason to demand.
Pin the prompt and model versions in the codebase, and alert on every change, including silent vendor upgrades. When GPT-4o-mini becomes GPT-4o-mini-2026-05-12 inside your provider’s regular release cadence, your decision behavior shifted whether or not your release notes did. If your stack can’t tell you which model and prompt produced last week’s adverse decision, you cannot defend it.
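A minimal sketch of the pin itself, assuming your provider reports the fully qualified model name on each response (most hosted APIs do); the pinned constants and the `alert` stub are placeholders you own:

```python
import hashlib

PINNED_MODEL = "<the exact model snapshot you reviewed>"
PINNED_PROMPT_SHA = "<sha256 of the reviewed prompt text>"   # stored constant, never recomputed

def alert(message: str) -> None:
    # placeholder: page whoever owns this decision surface
    raise RuntimeError(message)

def check_pin(prompt_text: str, served_model: str) -> None:
    """Run after every call; served_model is the model name the provider
    reports back on the response, not the alias you asked for."""
    prompt_sha = hashlib.sha256(prompt_text.encode()).hexdigest()
    if prompt_sha != PINNED_PROMPT_SHA:
        alert(f"prompt drifted from the pinned version: {prompt_sha[:12]}")
    if served_model != PINNED_MODEL:
        # this branch is the one that fires on silent vendor upgrades
        alert(f"model drifted: expected {PINNED_MODEL}, served {served_model}")
```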
Decision reconstruction on demand, replayable from input plus prompt plus model version plus retrieval context. If the customer or the regulator asks why account X was declined on April 14, you can replay the exact pipeline that produced the decision and show your work in hours, not weeks. This is the single thing a non-AI vendor process never had to do, because a human reviewer can be subpoenaed and a model cannot.
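A sketch of what the record has to carry for that replay to exist. Field names are illustrative; the non-negotiable part is that everything the pipeline consumed is captured at decision time, not reconstructed from memory later:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class DecisionRecord:
    account_id: str
    decided_at: str            # ISO timestamp of the original decision
    model_version: str         # the served model, from the pin check above
    prompt_sha: str            # hash of the exact prompt text used
    retrieval_context: list[str]   # the documents the model actually saw
    model_input: dict          # the structured application payload
    model_output: dict         # raw decision plus rationale text

def log_decision(record: DecisionRecord, sink) -> None:
    # append-only; in practice this lands in an immutable store
    sink.write(json.dumps(asdict(record)) + "\n")

def replay(record: DecisionRecord, call_model):
    """Re-run the exact pipeline that produced the decision. `call_model`
    is your own wrapper that accepts a pinned model, prompt and context."""
    return call_model(
        model_version=record.model_version,
        prompt_sha=record.prompt_sha,
        context=record.retrieval_context,
        payload=record.model_input,
    )
```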
A canary prompt-set the model is graded against on every silent vendor change. Not synthetic public test sets the foundation model has effectively memorized in training. Real edge cases from your own production history, scored automatically, with a kill-switch when the canary disagreement rate crosses a written threshold.
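A sketch of the canary gate, assuming the canary file holds real edge cases from your own production history with pinned expected decisions; `call_model` and `disable_surface` stand in for your inference wrapper and your kill switch:

```python
import json

KILL_THRESHOLD = 0.05   # written down in advance, not "I'll know it when I see it"

def disable_surface(reason: str) -> None:
    # placeholder kill switch: route traffic back to the manual queue, page the owner
    raise RuntimeError(f"surface disabled: {reason}")

def run_canaries(canary_path: str, call_model) -> float:
    """Grade the currently served model against pinned expected outcomes;
    returns the disagreement rate. Run on every vendor change, silent or not."""
    with open(canary_path) as f:
        cases = [json.loads(line) for line in f]
    disagreements = sum(
        call_model(case["input"])["decision"] != case["expected_decision"]
        for case in cases
    )
    return disagreements / len(cases)

def canary_gate(canary_path: str, call_model) -> None:
    rate = run_canaries(canary_path, call_model)
    if rate > KILL_THRESHOLD:
        disable_surface(f"canary disagreement rate {rate:.1%} over threshold")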
A pinned eval bench against the human decision-maker the model replaced. Whoever was approving these last year still spends an hour a week on a sampled batch, blind, with their decisions recorded against the model’s. The disagreement rate is the only output-monitoring metric that survives the substitution test, because it measures something only an AI deployment can drift on - the gap between the system you signed off on and the system that’s running today.
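The shadow review is mostly process, but the scoring is a few lines. A sketch, assuming last week’s decisions are already logged as replayable records; the sampling and field names are illustrative:

```python
import random

def weekly_shadow_batch(decision_log: list[dict], sample_size: int = 40) -> list[dict]:
    """Sample last week's model decisions for the human reviewer, with the
    model's call stripped out so the review stays blind."""
    batch = random.sample(decision_log, min(sample_size, len(decision_log)))
    return [{"case_id": d["account_id"], "model_input": d["model_input"]} for d in batch]

def disagreement_rate(model_calls: dict, human_calls: dict) -> float:
    """The drift metric: how often the running system disagrees with the person
    whose judgment it replaced, on the same blind sample."""
    shared = model_calls.keys() & human_calls.keys()
    if not shared:
        return 0.0
    return sum(model_calls[k] != human_calls[k] for k in shared) / len(shared)
```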
Kill rule: if you can’t write down the threshold or the rota, the mechanism isn’t real. The decision goes back into the queue until you can. “I’ll know it when I see it” is what every postmortem starts with, and the postmortem is the artifact this audit exists to prevent.
Substitute “outsourcing” for “AI” anywhere in this section. Prompt-version pinning, model-version drift, retrieval context, shadow evaluation against the human decision-maker - none of it carries over, because outsourcing is at least legible to the people you outsourced to. The model’s drift on Friday is a class of opacity that didn’t exist as a regulated surface five years ago, and the structural mechanisms have to match it.
Questions worth sitting with
Which AI decision did your team ship last quarter that you cannot reconstruct the rationale for? Not the technical implementation. The decision. Why we built it, who approved it, what we expected to see, what we’d do if it broke. If the answer is “I’d have to ask around,” you have the answer.
Who in your org has the standing to slow a launch on the basis of a model risk concern that doesn’t yet have a postmortem attached to it? Have they used that standing in the last six months? If the role doesn’t exist, or the person exists but hasn’t used it, the answer about your AI deployment posture is already written. You’re just hoping the regulator reads it after you do.
If your largest customer asked you, in person, in a meeting this Thursday, to explain the model behind a decision their account just got, would you read from a script written by your vendor’s marketing team? Notice the answer before your head catches up. Most operators find out in that half-second how much of the explanation is actually theirs.
The senior person on your team you trust most about AI - does that person spend more time noticing which old features have quietly become AI-load-bearing, or producing the next demo of something the model can do? Both are valuable. Only one of them is the work you can’t outsource. Which one are you actually paying for?

