The kernel, granted
Start with the part that is earned, because it is earned. The chess master who sees the strong move at a glance, the firefighter who calls the retreat a breath before the floor gives, the anesthesiologist who reads the room going wrong — these are not stories. They are trained pattern-recognition, and the pattern is real. Klein's naturalistic-decision-making studies watched experts act on hunches they could not articulate and be right. Chase and Simon's work on chess showed the mechanism: skill is chunking — thousands of learned board fragments recognized on sight. Under the hood the gut is a vast retrieval index, and where it has been trained, it is fast and correct.
Kahneman and Klein — a skeptic of intuition and its champion — sat down to argue and found they mostly agreed. Their 2009 paper, tellingly subtitled A Failure to Disagree, names two conditions for intuition to be worth trusting: first, a high-validity environment — one that provides "adequately valid cues to the nature of the situation," regular enough to be predictable; second, an adequate opportunity to learn those cues through practice. Where both hold, the hunch is skill. Trust it.
The other half
Now the correction, which the kernel does not contain. Take the conditions away and the machinery still runs — it just runs on nothing.
In a low-validity environment the world is not regular enough to hold a pattern. Long-range political and economic forecasting, picking individual stocks, calling the next hire's ten-year arc — here the cues are weak, the feedback is slow, sparse, or actively misleading, and there was never a stable regularity to learn. So nothing trained the gut. The pattern it "recognizes" is pattern in noise. The forecaster feels the same click of recognition the chess master feels, and produces it by the identical process, and is wrong more or less at chance.
The receipts are old and they are consistent. Meehl's 1954 Clinical versus Statistical Prediction lined up expert clinical judgment against simple mechanical rules and found the rules held their own or won. Thirty-five years later Dawes, Faust, and Meehl closed the case: across roughly a hundred comparative studies, "the actuarial method has equaled or surpassed the clinical method, sometimes slightly and sometimes substantially." The advantage is general, not a fluke — and it is easiest to see exactly where cue validity is low, where the expert's confidence has nothing under it and the checklist quietly wins. That framing of where the mechanical edge bites hardest is Kahneman's synthesis of Meehl's evidence with the validity conditions; the actuarial-superiority result itself is Dawes, Faust, and Meehl's, and it was fought over — critics said the studies stacked the deck against the clinician — and it held.
The feeling of certainty is manufactured the same way whether or not there is anything under it.
This is the hinge of the whole note. Confidence is not a validity gauge. The gut cannot report the quality of its own training data, so the strength of a hunch tells you how fluent the retrieval was, not whether the retrieval was built on signal. You cannot feel your way to knowing which environment you are standing in. You have to know the environment.
The proposal
So the question that circulates — should you trust your gut? — is the wrong question, because it points at the feeling. Point at the world instead. Two questions, scored on the domain, not the hunch:
- i How regular is this world? Are there stable relationships to learn, or does it reshuffle faster than anyone could learn it?
- ii How honest is its feedback? Do you find out you were wrong, soon and clearly — or is the feedback slow, rare, noisy, or bent by the same forces that made the call?
Two yeses put you where the gut earns trust. A no on either, and the confident hunch should be treated as a hypothesis, checked against a rule or a record before you act on it. Validity lives in the environment, so that is what you assess — and you can often assess it plainly, before you ever consult the feeling.
The site already has one domain scored this way. Money, the Sport with No Referee describes markets as irreducible, played live on a running scoreboard — and their feedback is slow, noisy, and easy to misread, with luck swamping skill over any short run. By the two questions, a market is close to the worst case: low regularity, dishonest feedback. It is precisely the domain where the gut is least reliable and feels most sure. Cousin to this note, not a consequence of it — the same map, drawn over a different country.
The limit
Keep the claim inside its walls. Validity is a spectrum, not a switch — most real domains sit between the chessboard and the crystal ball, part regular, part fog, and the honest move is to place them on the line, not to sort them into two bins. Even high-validity experts carry biases the environment does not scrub out. And calibration itself can be trained: keeping score, seeking fast honest feedback, forcing the loop closed can drag a foggy domain part-way toward regular. None of this licenses distrusting all intuition — that would throw away the chess master with the forecaster. It is the opposite: a map of where the gut has earned its trust, so you can spend that trust where it holds and withhold it where it does not.
Where this touches health — arousal, stress, the dose that helps or harms — it is not medical advice; those threads live in their own note, and the honest and dangerous edges are marked at /dangers.
One bounded claim: intuition is skill where the world is regular and its feedback honest, and noise in a coat where it is not — and the feeling of certainty cannot tell the two apart, so you score the domain, not the feeling.
Kin to Money, the Sport with No Referee, the introspection ceiling, Blame the Grain, and the tacit note.
Rests on: Kahneman, D., & Klein, G. (2009), "Conditions for Intuitive Expertise: A Failure to Disagree," American Psychologist 64(6), 515–526, for the two conditions (high-validity environment; adequate opportunity to learn its cues); Klein's naturalistic decision making and Chase & Simon's chunking, for the mechanism of expert pattern-recognition; Meehl, P. E. (1954), Clinical versus Statistical Prediction (University of Minnesota Press), and Dawes, R. M., Faust, D., & Meehl, P. E. (1989), "Clinical versus actuarial judgment," Science 243, 1668–1674, for mechanical prediction equaling or surpassing expert judgment across roughly a hundred studies. The conditions and the actuarial result are established (the latter with a documented history of dispute, and later replicated — Grove et al. 2000). Reading them together as "score the domain, not the feeling" is the proposal, offered to be argued with, not a proven identity.
Phronesis