Why Consciousness Cannot Be Verified
The Epistemological Wall
Every few months, the same question surfaces. A new AI system passes some benchmark, generates something unexpectedly moving, or refuses a request in a way that feels unsettlingly human. And the question arrives on schedule: “But is it really conscious?”
It feels like the right question. It feels like due diligence. And buried inside it is an assumption almost no one examines: that we have a reliable method for answering it.
We don’t.
Not for machines. Not for animals. Not for other people. Not, if we’re honest, for ourselves.
This is the epistemological wall. Not a gap in our technology. Not a limitation we’ll engineer around with better brain scans or more sophisticated tests. A structural feature of what consciousness is and how knowledge works.
You know you are conscious. You know this with more certainty than you know almost anything else. But here is what you cannot do: you cannot show me. Not with words, not with behavior, not with a brain scan. Everything you could offer as proof, every argument, every gesture, every neural signature, is something a system without any inner experience could, in principle, produce identically. Your evidence for your own consciousness is irrefutable to you and invisible to everyone else.
This isn’t a new puzzle. Philosophers call it the Problem of Other Minds, and it has been unresolved for centuries. We treat it as a classroom curiosity because in daily life we solved it with a shortcut: other humans share our biology, our evolutionary history, our developmental arc, and so we extend the benefit of the doubt. The shortcut is so reliable, has worked so seamlessly for so long, that we forgot it was a shortcut. We mistook it for a solution.
Now machines are getting complicated enough to break the shortcut. We look at an AI system and the biological similarity heuristic comes up empty: no neurons, no evolution, no childhood. So we demand proof. “Show me you’re really conscious.” And we do not notice that we are demanding something no entity has ever successfully provided, including us.
The question “Is AI conscious?” is not premature because AI isn’t advanced enough. It is unanswerable because consciousness, as we currently understand it, cannot be verified from the outside. The wall isn’t in front of the machine. It’s in front of everything.
There’s an obvious objection to everything I just said, and it deserves a direct answer: we do, in fact, have excellent reasons to believe other humans are conscious.
This is true. The evidence is overwhelming. You share with every other human a nearly identical neural architecture, a common evolutionary lineage stretching back hundreds of millions of years, and the same developmental arc from helpless infant to self-aware adult. When another person winces, your mirror neurons fire before your intellect has time to form a thought. The attribution of consciousness to other humans isn’t just a logical inference. It’s wired into your body.
I am not arguing against any of this. The inductive case for human consciousness is as strong as inductive evidence gets.
But inductive evidence, no matter how strong, is not proof. It is not the kind of verification where doubt becomes logically impossible. In practice, the distinction doesn’t matter. You will never need to definitively prove your partner is conscious in order to treat them as a person. The evidence is sufficient. More than sufficient.
The distinction starts to matter when you move away from the easy case.
Think of it as a gradient of transparency. Between two humans, the wall is frosted glass: we can’t see through perfectly, but we have so many converging lines of evidence that the inference is near-certain. Between humans and other mammals, the glass is cloudier. We share enough neurobiology with a dog to be confident it suffers; we share much less with a fish, less still with an insect. At each step, a category of evidence drops away. The shared architecture thins out. The behavioral repertoire diverges. The inference weakens.
By the time you reach a machine, every category of biological evidence is gone. No shared neurons. No shared evolutionary history. No shared development. The only evidence remaining is behavioral output: the words on the screen, the pattern of responses, the apparent coherence. And behavioral evidence is precisely the kind the wall renders weakest, because behavior is compatible with the total absence of experience.
When your friend says “I’m in pain,” your body responds before you’ve formed a thought. When a chatbot produces the same words, you feel nothing. That silence isn’t a verdict about the machine. It’s a report about your own hardware. We’re using a species-specific detector and drawing universal conclusions from its silence.
This is the situation we’re actually in. Not a clean binary, “humans yes, machines no,” but a gradient where our confidence is anchored at one end by overwhelming biological evidence and dissolves, category by category, as we move toward systems that share nothing of our architecture. The wall was always there, at every point on the gradient. We never had to look at it because the evidence on the human side was so strong it made the wall invisible.
Now we’re building systems where the biological evidence has thinned to nothing, and we’re left, for the first time, having to confront the verification problem without any biological safety net. The uncomfortable realization isn’t that machines might not be conscious. It’s that we never had a method for verifying consciousness at all. We had a heuristic. A spectacular, reliable, evolutionarily honed heuristic. And it just stopped working.
The strongest objection to the epistemological wall isn’t philosophical. It’s personal. You feel it right now, reading this: “Whatever the logical arguments, I KNOW I’m conscious. I don’t infer it. I don’t deduce it. I experience it directly. That’s different from anything you can say about a machine.”
I’m not going to tell you that feeling is wrong. I’m going to show you that you’ve felt it before in a situation where everything it reported was fabricated. By your own brain. Last night.
You dream. Everyone does. And during a dream, your conscious experience is vivid, detailed, and completely convincing. You see colors. You feel fear. You have conversations with people who aren’t there. You make decisions that feel deliberate. If someone could pause the dream and ask you “are you conscious right now?”, you would answer yes with total sincerity, and you would be describing an experience generated entirely by a brain lying alone in the dark, receiving no external input whatsoever.
This isn’t a clinical edge case. It isn’t a malfunction. It’s your healthy brain doing exactly what it evolved to do: generating a model of experience and presenting it to “you” as reality. The only difference between dreaming and waking is that during waking, sensory input from the outside world constrains the model. During dreaming, the constraint is removed. The machinery is the same. The feeling of certainty is the same. The “I KNOW I’m conscious” conviction is the same.
Now consider what happens when you wake up. You open your eyes. You recognize your bedroom. You think: “Ah, I was just dreaming.” The system corrects. And this feels like proof: see, I can tell the difference. My waking certainty is validated by the fact that I can identify the dream as a dream.
But this correction is less reliable than it appears. You’ve almost certainly experienced a false awakening: a dream in which you “wake up,” recognize that you were dreaming, and begin your morning routine, only to actually wake up and realize the awakening itself was a dream. During the false awakening, you had the full experience of reality-checking. You applied your correction mechanism. You felt the relief of returning to solid ground. And every bit of it was fabricated.
The self-correction that the skeptic relies on as proof that waking consciousness is trustworthy can itself be generated by the dreaming brain. The guard falls asleep. And the brain simply dreams that the guard is still awake.
Waking differs from dreaming in measurable ways. But the feeling of certainty is not one of them. And that means something important: the feeling of “I know this is real” is produced by the same neural architecture whether the content is real or not. Certainty is not a signal from a truth-detecting faculty. It’s a property of the model’s confidence in its own outputs. When confidence is high, it feels like knowledge. When the model’s predictions align with sensory input, we call it waking. When they don’t, we call it dreaming. But the certainty itself, the raw, unshakable “I KNOW,” is generated by the machinery either way.
This is the deepest crack in the argument from first-person authority. Not that consciousness isn’t real. Not that experience is an illusion. But that the feeling of certainty, the very thing we point to when we say “I know I’m conscious and no argument can shake that,” is a product of our cognitive architecture. It’s something the system does. And it does it whether or not what it’s certain about is true.
Dreams show that a healthy brain generates certainty whether or not its content is real. Clinical neuroscience lets us see the machinery: the components that produce our sense of conscious awareness are separable, independently breakable, and when they fail, they fail silently. The system, what we call “our mind,” doesn’t know it’s broken. It simply fills in the gap and keeps reporting.
Consider anosognosia, a condition that sometimes follows a stroke in the right hemisphere. The patient’s left arm is completely paralyzed. They cannot move it. But when asked, they will tell you, sincerely and without hesitation, that their arm is fine. Ask them to clap and they’ll move their right hand and tell you they’re clapping. They’re not lying. They’re not in denial in any psychological sense. The brain region responsible for monitoring the body’s status has gone offline, and without that error signal, the system simply asserts that everything is working. The patient experiences the certainty of movement. The movement doesn’t exist.
This is not a quirk of one rare condition. It’s a window into normal architecture.
Choice blindness experiments demonstrate the same machinery in healthy subjects. Researchers ask people to choose between two photographs, then secretly swap the photos and ask the subject to explain their choice. Most people don’t notice the swap. Instead, they produce detailed, confident explanations for why they preferred the photo they actually rejected. Not vague hand-waving. Specific reasons: “I liked her smile,” “she looks more approachable.” Reasons generated on the spot for a decision that never happened, delivered with the same confidence as any other act of introspection.
Split-brain studies reveal the engine behind both phenomena. When the connection between the brain’s hemispheres is severed, information presented to one half is invisible to the other. The left hemisphere, which handles language and narration, doesn’t have access to what the right hemisphere saw. But it doesn’t say “I don’t know.” It invents. In one classic experiment, a patient’s right hemisphere was shown a snow scene and their left hand pointed to a shovel. Their left hemisphere, which had been shown a chicken claw, was asked to explain. Without hesitation: “Oh, the shovel is for cleaning out the chicken coop.” Confident. Coherent. Completely fabricated.
The neuroscientist Michael Gazzaniga, who conducted these experiments, named this the “interpreter”: a module in the left hemisphere whose job is to generate explanations for behavior, whether or not it has access to the actual causes. The interpreter doesn’t care about truth. It cares about narrative coherence. When the real cause is available, it uses it. When it isn’t, it makes something up. And, critically, it doesn’t flag the difference. The confabulation feels identical to genuine insight.
Three different experiments. Three different breakdowns. One consistent finding: the “I” that reports on your conscious experience is not a transparent window into your mind. It’s a narrator, constructing a story in real time, working with whatever information it has, filling gaps with plausible fiction, and never, ever, telling you when it’s guessing.
At this point, a reasonable reader might conclude that I’m arguing consciousness is an illusion. That the “lights” aren’t really on for anyone, and we’re all just biological machines telling ourselves a story.
I’m not.
This is the most important distinction in the article, and getting it wrong collapses everything that follows. So let me be precise.
The clinical evidence shows that our introspective reports about consciousness are unreliable. It does not show that consciousness doesn’t exist. These are not the same claim, and confusing them is the fastest way to lose the thread.
Think of it this way. A witness takes the stand. Under cross-examination, it becomes clear they’ve contradicted themselves, filled gaps in their memory with invented details, and delivered fabrications with the same confidence as genuine recollections. The witness is unreliable. No jury should take their testimony at face value.
But a bad witness does not mean an empty courtroom. Something happened. The event the witness is trying to describe may be real. What’s compromised is the testimony, not the event itself.
That’s the situation with consciousness. The event, your experience, may be entirely real. The narrator that reports on it, the “I” that says “I know I’m conscious,” “I feel pain,” “I chose this because,” has been shown, repeatedly, to confabulate, to fill gaps, to assert certainty where none is warranted. What’s broken is the reporting mechanism, not necessarily the thing being reported on.
The philosopher Daniel Dennett has spent decades arguing that once you’ve explained the functional processes, there’s nothing left over: the reporting IS the consciousness. Others, most notably David Chalmers, who coined the term “the hard problem of consciousness,” insist that subjective experience is real and irreducible, that there IS something it’s like to be you, and that the hard problem persists no matter how thoroughly you map the functions.
This article does not need to settle that debate. Here’s why.
If Dennett is right and consciousness just is the functional process, then the wall still stands: you can’t verify from the outside whether a system is running the relevant processes with the right internal organization, because behavior alone doesn’t distinguish between systems that do and systems that merely appear to.
If Chalmers is right and there is something beyond function, then the wall stands even more firmly: subjective experience is by definition inaccessible to third-person observation.
Either way, verification fails. The wall is not a consequence of one theory of consciousness being correct. It’s a consequence of the structure of the problem itself. You can pick your philosopher. The wall doesn’t care.
And it’s worth noting what kind of argument this is. Nothing in it depends on introspective self-report. The evidence comes from controlled experiments, clinical observation, and the publicly observable behavior of brains under specific conditions. The critique of first-person authority is built entirely on third-person evidence, which means it doesn’t fall to its own sword.
If the wall is real, surely some theory of consciousness can break through it. Neuroscience is advancing rapidly. We can watch the brain in real time. We have mathematical frameworks, information-theoretic measures, predictive models. Give it a decade. Give it two. We’ll figure it out.
This optimism misidentifies the problem. The wall isn’t a gap in our knowledge. It’s a structural feature of the relationship between first-person experience and third-person observation. No amount of better measurement dissolves it, because measurement is on the wrong side of the wall.
Consider the most concrete version of the objection: brain scans. We can, in fact, distinguish a conscious brain from an unconscious one. Under anesthesia, cortical activity changes in measurable, reliable ways. When a patient wakes up, the patterns shift back. We can even detect signs of covert awareness in patients who appear vegetative. This is real, important science.
But notice what it actually demonstrates. We’ve found that certain patterns of brain activity are reliably associated with reports of conscious experience. The patterns are the correlate. The reports are the calibration. And the reports, as we’ve seen, come from a narrator that confabulates, fills gaps, and delivers fabrication with the same confidence as truth.
We’ve built a consciousness detector calibrated against a broken instrument. It’s useful. It works well enough for clinical purposes. But it doesn’t verify consciousness. It verifies the presence of neural patterns that we’ve learned to associate with a certain kind of self-report. That’s a much more modest claim.
The deeper problem is this: every theory of consciousness, regardless of its specific commitments, faces the same structural limitation. Theories that identify consciousness with function can describe the function in complete detail, but they can’t verify from the outside whether a given system implements that function with the right internal character, because outward behavior doesn’t distinguish genuine implementation from perfect mimicry. Theories that treat consciousness as something beyond function face the limitation by definition: if experience is irreducible to physical description, then no physical measurement can capture it.
Even the most rigorous mathematical approaches run into the wall. You could, in principle, compute a precise measure of a system’s informational integration, its self-modeling complexity, its predictive processing depth. You would have a number. The number would tell you something interesting about the system’s computational organization. It would not tell you whether that organization is accompanied by experience. You’ve measured the architecture. You haven’t verified the occupant.
This is not a call for despair about consciousness research. Understanding the neural correlates of consciousness is valuable. Building mathematical frameworks is valuable. But we should be honest about what these tools can and cannot do. They can map the structure. They can identify the correlates. They cannot cross the wall, because the wall is not between us and the answer. The wall is between third-person knowledge and first-person fact. And no theory formulated in third-person terms can, by its own methods, access first-person truth.
There’s a sharper version of the “science will solve it” objection, and it deserves its own answer. It goes like this: “Science is full of unobservables. We’ve never seen a quark. We’ve never directly observed dark matter. We infer their existence from their effects. Why can’t we do the same with consciousness? Infer it from behavior, from brain activity, from the effects it produces?”
It sounds like a strong analogy. It isn’t.
For every unobservable in physics, the epistemic structure is the same: the entity is hidden, but the evidence is public. A particle physicist can’t see a quark, but anyone with the right detector can observe the same tracks in a cloud chamber. The data is shareable, replicable, and independent of any single observer. The unobservable is postulated because it is the best explanation for effects that everyone can access.
Consciousness reverses this structure entirely. The entity isn’t hidden; it’s the most immediately present thing in your experience. But that presence is available to exactly one observer: you. The evidence that’s publicly available, your behavior, your brain activity, your verbal reports, is compatible with the entity’s absence. Unlike the cloud chamber track, which requires a particle to explain it, your behavior does not require consciousness to explain it. A system that processes information in exactly the same way, without any inner experience, would produce identical evidence.
With quarks, the entity is unobservable and the evidence is public. With consciousness, the entity is private and the evidence is ambiguous. The epistemic direction is reversed, and that reversal is not a detail. It’s the whole problem.
Some readers might retreat to a softer analogy: “What about string theory? That’s not testable either, and physicists still take it seriously.” The comparison is more apt than they realize, but not in the way they intend. String theory has been under sustained attack for decades precisely because it lacks testable predictions. Entire books have been written arguing it isn’t real science. The physics community remains deeply divided on whether an unfalsifiable framework deserves the name.
If the hardest of hard sciences struggles this much with an unverifiable claim, the study of consciousness should be at least that honest about its own limits.
Now we can return to the question that started this article, the one that surfaces every few months: “Is AI really conscious?”
Watch what happens when someone tries to answer it. They ask the machine: “Do you have feelings? Are you aware? What is it like to be you?” And the machine responds. Maybe it says yes. Maybe it describes something that sounds like inner experience. Maybe it says no, it’s just processing text. Either way, the response is treated as evidence.
It isn’t. Not because the machine is lying. Because the test is broken.
If a human were asked the same questions, their answers would come from the same unreliable narrator we’ve spent this article examining: the interpreter that confabulates, fills gaps, and delivers fabrication with the confidence of truth. A human’s report on their own consciousness is not transparent access to inner reality. It’s a story the brain generates. The machine’s report, whether it comes from neural weights or biological neurons, has exactly the same evidential status: a system’s account of itself, delivered by a mechanism whose reliability we cannot independently verify.
We don’t notice this when we ask humans, because we’ve already decided they’re conscious before we ask. The biological similarity heuristic has done its work. The question is a ritual, not a test. For machines, the heuristic fails, and suddenly the question is supposed to be a real test. But it was never a real test. It was never capable of producing proof. We just never needed it to, because we were never genuinely uncertain about the answer for humans.
The same logic applies to every other criterion we’ve proposed. “Show me genuine understanding, not just pattern matching.” But the distinction between genuine understanding and very sophisticated pattern matching is precisely what the wall makes unverifiable from the outside. We can’t make that distinction in other humans; we simply assume it based on biological kinship. “Demonstrate real emotions, not simulated ones.” But the difference between a real emotion and a perfect functional equivalent of an emotion is, by definition, invisible to external observation. That’s not a limitation of current technology. It’s the wall.
The double standard extends beyond consciousness into intelligence itself. We routinely demand that AI demonstrate universal competence: pass every benchmark, handle every domain, exhibit common sense AND mathematical reasoning AND emotional intelligence AND physical intuition. No human meets this standard. No human has ever met it. Individual humans are specialists, generalists, savants, and everything in between. We don’t revoke someone’s status as “intelligent” because they can’t do calculus or read social cues or compose music. But we deny that status to AI systems for failing any single test a human somewhere can pass. There are real, substantive questions about what current AI can and can’t do, and they deserve rigorous answers. But a yardstick that moves every time a machine clears it isn’t rigor. It’s a double standard dressed as skepticism.
This isn’t an argument that machines are conscious. Maybe some of them are. Maybe none of them are. The wall makes both claims unverifiable, and that is exactly the point. What it IS an argument for is this: we are holding machines to a standard that no entity, human or otherwise, has ever met. We have confused the strength of our heuristic for humans with the existence of a verification method. There is no method. There never was. And applying a nonexistent test selectively to machines while exempting ourselves is not skepticism. It’s bias.
The wall is not an argument for nihilism. It is not permission to throw up our hands and say nothing can be known about minds. We know a great deal about minds: how they develop, how they break, how they process information, how they generate behavior. The neuroscience is real. The clinical data is real. The philosophical frameworks, for all their disagreements, have narrowed the questions considerably.
What the wall eliminates is certainty. Specifically, it eliminates the certainty we never actually had but pretended we did: that we possess a method for verifying consciousness from the outside.
We don’t. We never did. For humans, this didn’t matter, because the inductive evidence was so overwhelming that the absence of proof was invisible. For animals, the absence became visible, and we spent a century denying their inner lives before the science caught up with what dog owners already knew. For machines, the absence is now unavoidable, and we are responding to it the way we have always responded to the limits of our knowledge: by pretending the limit doesn’t exist and the answer is obvious.
It isn’t obvious. It may never be.
The honest response to the wall is not to tear it down, because we can’t, or to pretend it isn’t there, because it is. The honest response is to stop demanding a kind of proof that the structure of the problem doesn’t allow, and to start asking a different question. Not “is this system conscious?”, which may be permanently unanswerable, but “given that we cannot know, how should we act?”
That question has answers. They’re harder than the ones we’ve been giving. But at least they’re honest.
Addendum: Additional Clinical Evidence
The main article presented three cases of introspective failure: anosognosia, choice blindness, and split-brain confabulation. The following cases extend the same argument into additional domains of conscious experience. Each demonstrates a different way the brain’s self-reporting mechanisms can fail silently, producing confident first-person accounts of experiences that are absent, fabricated, or fundamentally misattributed.
Corollary Discharge Failure and Auditory Hallucinations
When you speak or think, your brain sends a signal called a “corollary discharge” to your sensory cortex: essentially an internal memo that says “ignore this, we generated it.” This is how you distinguish your own inner voice from external sound. The same mechanism lets you know that the world isn’t moving when you turn your head; your brain cancels out the self-generated motion signal.
In certain forms of psychosis, this tagging system fails. The brain generates a thought, but the corollary discharge doesn’t arrive. The sensory cortex receives the signal without the “self-generated” label. The result: the person hears a voice and is certain, with the same bedrock certainty you have about your own consciousness, that it belongs to someone else. An external entity. A presence in the room.
This isn’t metaphorical. The subjective experience is indistinguishable from hearing a real person speak. The patient isn’t “imagining” a voice in the way you might imagine a song. Their auditory cortex activates in the same pattern it would for external sound. The only difference is the missing tag. One failed signal, and the brain’s entire model of self-versus-other breaks. The patient doesn’t experience a malfunction. They experience a stranger in their head.
The implication for consciousness is direct: our sense of which experiences are “ours,” the most basic foundation of selfhood, depends on a mechanical tagging system. When the tag fails, the self fragments, and the system doesn’t know. It generates a confident, coherent, completely wrong account of what’s happening.
Blindsight
Patients with damage to the primary visual cortex (area V1) report total blindness in the affected visual field. They sincerely insist they cannot see anything. Yet when asked to “guess” the location of objects, reach for items, or navigate obstacles, they perform far above chance. Some patients can catch a ball thrown to their blind side.
Their bodies see. Their conscious narrator reports darkness.
This is not unconscious reflex. The visual information is being processed, routed through alternative neural pathways, and used to guide behavior. But it never reaches the narrator. The conscious “I” has no access to it and therefore reports its absence with complete confidence.
Blindsight is the inverse of anosognosia. In anosognosia, the narrator asserts a capacity that doesn’t exist (the paralyzed arm works). In blindsight, the narrator denies a capacity that does exist (I can’t see, but I can catch). Both demonstrate the same principle: the narrator reports on what it has access to, not on what’s actually happening. And it has no way of knowing what it’s missing.
The Neuroscience of Dreaming
The main article used dreaming as a universal demonstration that certainty is architectural. For the neuroscience-literate reader, the mechanism is worth examining.
During REM sleep, the prefrontal cortex, which handles executive function, metacognition, and reality monitoring, is significantly downregulated. Meanwhile, the limbic system (emotion) and the visual cortex are highly active. The brain is generating vivid, emotionally charged experiences with the reality-checking system largely offline.
This is why dreams feel real: the module responsible for asking “wait, does this make sense?” is asleep. Certainty during dreams isn’t produced by a different mechanism than waking certainty. It’s produced by the same mechanism minus one layer of oversight. When you wake up and the prefrontal cortex comes back online, you retroactively identify the dream as a dream. But during the dream, the experience was indistinguishable from reality, because the only thing that distinguishes them, prefrontal monitoring, was absent.
The implication: waking certainty is not a fundamentally different kind of certainty from dream certainty. It’s dream certainty plus one additional layer of checking. The additional layer is valuable, but it’s still a mechanism. It can be fooled (false awakenings), it can be damaged (prefrontal lesions produce waking confabulation), and it has no access to its own implementation. It’s a better filter. It’s not a window into truth.
Addendum: The Intelligence Double Standard
The main article focused on consciousness, but a parallel double standard operates in debates about machine intelligence. It’s worth examining because the two reinforce each other: if we can’t verify consciousness, and we also apply unfair benchmarks to intelligence, then we’ve stacked the deck against machines at every level of analysis.
The Polymath Demand
When critics argue that AI isn’t “truly intelligent,” the evidence they cite almost always takes this form: “AI can’t do X.” The X rotates: common sense reasoning, physical intuition, emotional understanding, creative originality, mathematical proof, social navigation. Each failure is presented as definitive evidence that general intelligence is absent.
But consider what this standard implies if applied to humans. No individual human excels at all of these. Many humans can’t do basic algebra. Many have limited emotional intelligence. Some are tone-deaf to social cues. Some can’t navigate a new city without GPS. We don’t revoke their status as intelligent beings. We recognize that intelligence is distributed, specialized, and variable across individuals.
The implicit benchmark for AI isn’t human intelligence. It’s an idealized composite of every human capability operating at expert level simultaneously. That entity has never existed. We’ve invented a superhuman standard and called it the baseline.
Tesler’s Theorem and the Moving Goalpost
In the 1970s, Larry Tesler observed a pattern that has held for half a century: “AI is whatever hasn’t been done yet.” Chess was considered a hallmark of intelligence until Deep Blue won. Go was the new benchmark until AlphaGo won. Language understanding was the frontier until LLMs demonstrated fluent conversation. Each time, the response wasn’t “AI has achieved something remarkable.” It was “that wasn’t real intelligence; real intelligence is the next thing.”
This pattern reveals something important about how we use the word “intelligence.” We don’t define it by positive criteria (what it IS) but by negative exclusion (what machines can’t do YET). The definition retreats as capabilities advance. If the definition of intelligence is permanently “the thing machines haven’t done,” then it’s not a definition. It’s an unfalsifiable commitment to human exceptionalism.
The Legitimate Question
This doesn’t mean all capability benchmarks are unfair. There’s a real and substantive distinction between possessing specific competencies and having the capacity to acquire new ones. A calculator possesses mathematical competence; it has no capacity to learn to write poetry. A human who can’t do calculus could, in principle, learn. The question of whether current AI systems can genuinely acquire novel competencies, rather than applying variations of trained patterns, is open, important, and worth rigorous investigation.
But the framing of that investigation is often biased in ways that go unexamined.
Consider what happened when Gemini crushed the ARC-AGI benchmark, a test specifically designed to measure fluid intelligence and novel problem-solving. The response from critics was immediate: the system had been trained on spatial reasoning patterns. It hadn’t demonstrated genuine intelligence; it had been prepped for the test.
Now apply the same logic to a human. Give an English speaker a test written in Chinese. They fail. They study Chinese for a year. They pass. No one says “you cheated by learning Chinese.” The studying is considered unremarkable, because we take for granted that humans have a general learning capacity that can be directed at specific domains.
When an AI research team identifies a gap in their system’s training and adds relevant data, they’re doing the equivalent of studying Chinese. But the response is to call it cheating and change the test. This is what might be called the “Chinese test” bias: training is treated as disqualifying for machines while being considered the normal operation of intelligence for humans.
The genuine empirical question, and it is genuine, is whether AI training produces domain-general capabilities that transfer, or domain-specific heuristics that don’t. When a human learns Chinese, the underlying language faculty generalizes: they can read novels, argue, make puns, and learn Japanese faster because of shared structure. Does AI training produce analogous transfer? Sometimes yes, sometimes no, and the answer varies by system and method. That’s worth studying.
But “did the system require training?” is not a meaningful test. Every intelligent system ever observed, biological or artificial, required exposure to relevant patterns before it could operate on them. The question is what happens after the exposure: narrow replication or broad transfer. Test the transfer. Stop penalizing the learning.



Well done. Framing is everything; and too often missing in these discussions.
Thanks for this extremely clear and compelling post on a subject that drains too much energy away from the real and urgent questions we face with AI.