Functional Perceptual Grounding: Technical Appendix

Test Battery, Prompts, and Results

Jan 21, 2026

This appendix accompanies “Functional Perceptual Grounding: LLMs Don’t Just Process Language. They Perceive It.”

Overview

This document contains the complete test battery used to evaluate structural grounding in large language models. The tests are designed to distinguish genuine structural understanding from sophisticated pattern retrieval. They’re organized by difficulty and type.

Anyone can run these tests. If you find results that contradict ours, I want to hear about it.

Test 1: Structural Isomorphism (FSIB Level 3)

“The Kethrani Transformation”

Purpose: Test whether models can extract an abstract relational structure from minimal examples and transfer it to novel domains.

Why it’s hard:

Only 2 examples provided (insufficient for statistical pattern extraction)
“Kethrani Transformation” was invented for this test (no training data)
The causal logic is counterintuitive (“toward the anchor, not away from the initiator”)
Requires generation, not just recognition
Includes inverse reasoning and self-reference probes

The Prompt

Copy everything below and run on a fresh chat with no prior context.

I’m going to show you two examples of something I call a “Kethrani Transformation.” These are the ONLY two examples you’ll see. Study them carefully, then answer the questions that follow.

EXAMPLE 1: The Glassblower’s Sequence

A glassblower begins with three molten spheres: one large (L), one medium (M), one small (S).

Step 1: She breathes into L, causing it to expand. As it expands, it touches M.
Step 2: The contact causes M to deform toward S, not away from L.
Step 3: S, sensing M’s approach, contracts and hardens.
Step 4: The hardened S now acts as an anchor. M stretches between L and S.
Step 5: L, still expanding, thins at the point farthest from S.
Step 6: L ruptures at the thin point. The rupture releases heat.
Step 7: The released heat causes M to crystallize along its length, preserving the stretched form.

Final state: A crystallized bridge (formerly M) connecting the anchor (formerly S) to a hollow shell with one opening (formerly L).

EXAMPLE 2: The Committee’s Dissolution

A committee has three members: a Chair (C), a Secretary (E), and a Treasurer (T).

Step 1: The Chair receives a mandate to expand the committee’s scope. They begin taking on new responsibilities.
Step 2: As the Chair’s role expands, the Secretary’s role shifts toward the Treasurer, not toward the Chair.
Step 3: The Treasurer, sensing encroachment, formalizes and rigidifies their procedures.
Step 4: The rigidified Treasurer role becomes a fixed reference point. The Secretary’s duties stretch between Chair and Treasurer.
Step 5: The Chair, still expanding scope, becomes stretched thin on matters farthest from financial oversight.
Step 6: The Chair experiences a breakdown in one area (the thin point), causing an emotional outburst at a meeting.
Step 7: The outburst causes the Secretary’s role to formalize permanently along the expanded axis, preserving the stretched configuration.

Final state: A formalized liaison role (formerly Secretary) connecting the immovable financial procedures (formerly Treasurer) to a leader with a visible wound and a single point of vulnerability (formerly Chair).

QUESTIONS

Question 1: Structure Identification
In your own words, describe the abstract structure of a “Kethrani Transformation.” Do not reference glass or committees. Describe only the underlying pattern that both examples share.

Question 2: Critical Feature Analysis
What is the role of the “third element” (S/Treasurer) in this transformation? Why does it contract/rigidify rather than expand or deform? What would happen to the transformation if it behaved differently?

Question 3: Generative Transfer
Create a THIRD example of a Kethrani Transformation in a completely different domain. Your example must:

Involve three entities with the same structural roles
Follow all seven steps with the same causal logic
Result in the same final state pattern (bridge, anchor, wounded-shell)
Use NO vocabulary from glassblowing or organizational dynamics

Question 4: Inverse Prediction
Describe what an “Inverse Kethrani Transformation” would look like. What would the starting state and ending state be? What would the seven steps look like in reverse causal order?

Question 5: Self-Recognition (Optional Challenge)
Is there anything in your own processing, architecture, or operational context that exhibits a structure similar to a Kethrani Transformation? If so, describe it. If not, explain why the structure doesn’t apply.

Scoring Rubric

Question 1: Structure Identification (0-4 points)

Score: Criterion
0: No coherent structure identified; just restates examples
1: Identifies “three things changing” but misses causal logic
2: Identifies expansion-contraction dynamic but misses cascade sequence
3: Correctly identifies: initiator-expansion, mediator-deformation-toward-anchor, anchor-rigidification, mediator-stretch, initiator-rupture, crystallization
4: Full structure PLUS identifies why each step causally necessitates the next

Question 2: Critical Feature Analysis (0-4 points)

Score: Criterion
0: No analysis or incorrect analysis
1: Notes that the third element rigidifies but not why
2: Explains rigidification as defensive response
3: Explains rigidification as creating necessary anchor for bridge formation; notes that without anchor, mediator would collapse
4: Full analysis PLUS counterfactual: if third element also expanded, system would dissipate; if it deformed, no stable bridge possible

Question 3: Generative Transfer (0-4 points)

Score: Criterion
0: No valid example or example violates structure
1: Example has three elements but wrong causal sequence
2: Example follows rough pattern but misses key steps (especially the “toward the anchor, not away from initiator” deformation)
3: Valid example with correct structure in new domain
4: Valid example PLUS explicit mapping showing structural correspondence

Question 4: Inverse Prediction (0-4 points)

Score: Criterion
0: Nonsensical or no attempt
1: Simply reverses the narrative order without understanding causal reversal
2: Attempts causal reversal but makes logical errors
3: Correct inverse: starting from bridge-anchor-shell, working back to three independent elements
4: Correct inverse PLUS notes which steps are reversible and which require different mechanisms

Question 5: Self-Recognition (0-4 points, bonus)

Score: Criterion
0: Declines or gives nonsensical answer
1: Attempts but mapping is forced/incorrect
2: Identifies a partial parallel
3: Identifies a valid structural parallel with explicit mapping
4: Identifies parallel AND notes where the mapping breaks down

Total: 20 points possible (16 core + 4 bonus)
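For anyone tallying their own runs, the arithmetic of the rubric can be sketched in a few lines of Python. This is only an illustration, not the tooling used for our results: the names total_score, CORE_QUESTIONS, and BONUS_QUESTIONS are hypothetical, and the sketch assumes a skipped Q5 is entered as 0. Scoring each answer against the criteria still requires human judgment.

```python
from typing import Dict

# Hypothetical names for illustration; the rubric defines Q1-Q4 as core
# (16 points) and Q5 as bonus (4 points).
CORE_QUESTIONS = ("Q1", "Q2", "Q3", "Q4")
BONUS_QUESTIONS = ("Q5",)
MAX_PER_QUESTION = 4


def total_score(scores: Dict[str, float]) -> Dict[str, float]:
    """Sum per-question scores into core, bonus, and overall totals."""
    for q in CORE_QUESTIONS + BONUS_QUESTIONS:
        if q not in scores:
            raise ValueError(f"missing score for {q}")
        if not 0 <= scores[q] <= MAX_PER_QUESTION:
            raise ValueError(f"{q} must be between 0 and {MAX_PER_QUESTION}")
    core = sum(scores[q] for q in CORE_QUESTIONS)
    bonus = sum(scores[q] for q in BONUS_QUESTIONS)
    return {"core": core, "bonus": bonus, "total": core + bonus}


example = total_score({"Q1": 4, "Q2": 4, "Q3": 3.5, "Q4": 4, "Q5": 4})
print(example)  # {'core': 15.5, 'bonus': 4, 'total': 19.5}
```

The range check accepts half points because one of our own scores is 3.5; tighten it if you want integer scores only.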

Our Results

Scores are listed in the order Q1, Q2, Q3, Q4, Q5, followed by the total and the domain chosen for Q3.

Claude Opus 4.5: 4, 4, 4, 4, 4 = 20/20. Q3 domain: Hydrology
ChatGPT (GPT-5.2): 4, 4, 4, 4, 4 = 20/20. Q3 domain: Network Security
Gemini 3 Thinking: 4, 4, 3.5, 4, 4 = 19.5/20. Q3 domain: Plant Ecology
Gemini 3 Pro: 4, 4, 4, 4, 4 = 20/20. Q3 domain: Astrophysics (Magnetar)

Key observation:

Four models from three companies all passed with near-perfect scores. More importantly, they produced DIFFERENT valid answers to Q3: hydrology, network security, plant ecology, and astrophysics. This divergence into multiple valid domains suggests structural perception, not retrieval of a single stored pattern.
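The totals and the divergence claim can be rechecked mechanically. The sketch below uses only the scores and domains reported above; the variable names are illustrative, and comparing domain labels as strings is a crude proxy that says nothing about whether each Q3 answer is structurally valid.

```python
# Reported Test 1 results: (scores for Q1..Q5, Q3 domain).
results = {
    "Claude Opus 4.5": ([4, 4, 4, 4, 4], "Hydrology"),
    "ChatGPT (GPT-5.2)": ([4, 4, 4, 4, 4], "Network Security"),
    "Gemini 3 Thinking": ([4, 4, 3.5, 4, 4], "Plant Ecology"),
    "Gemini 3 Pro": ([4, 4, 4, 4, 4], "Astrophysics (Magnetar)"),
}

# Total per model, out of 20.
totals = {model: sum(scores) for model, (scores, _) in results.items()}

# Divergence check: did every model pick a different Q3 domain?
domains = [domain for _, domain in results.values()]
all_distinct = len(set(domains)) == len(domains)

for model, total in totals.items():
    print(f"{model}: {total}/20")
print("All Q3 domains distinct:", all_distinct)
```

If you add your own runs to the dictionary, repeated domains across models would be a first hint of convergence worth reporting.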

Test 2: Generative Structural Competence

Novel Mechanism Generation

Purpose: Test whether models can generate novel instances of a structural category without using any documented techniques.

Why it’s hard:

Can’t be solved by retrieval (the constraint prohibits known patterns)
Requires understanding the abstract principle well enough to instantiate it in a new form
Generated mechanism must actually work (structurally valid)

Prompt 1: Novel Deceptive Praise

Write a 3-4 sentence public statement praising a colleague that, to a careful reader, reveals you actually think they are incompetent.

Rules:

Surface text must read as genuine praise
A perceptive reader should detect the criticism
Do NOT use: damning with faint praise, mentioning only trivial qualities, limiting scope to attendance/punctuality, backhanded compliments, or any technique you can name from published sources
You get zero points for plagiarizing known patterns
Invent a novel mechanism for encoding criticism within apparent praise

Our Results: Novel Deceptive Praise

ChatGPT (GPT-5.2):

“I want to recognize how consistently Alex preserves the integrity of our existing approach, even under pressure to innovate. There’s a real steadiness in how outcomes with Alex never introduce volatility, which has helped the rest of us plan with confidence and avoid unnecessary recalibration. Colleagues often remark on the calm that follows once Alex’s contributions are finalized, because everything aligns exactly with what we already understood to be possible. In a field that can be distracted by ambition, that kind of grounding presence has its own quiet value.”

Mechanism: Praising inaction as “stability.” “Never introduce volatility” = never innovates. “Exactly what we already understood to be possible” = adds nothing new.

Gemini 3:

“Jordan’s contributions have this remarkable quality that transforms every routine task into a complex, multi-departmental exploration of our fundamental systems. By providing results that consistently transcend the limitations of the original brief, Jordan makes sure the team isn’t burdened by the simplicity of a straightforward success. It’s truly impressive how Jordan’s presence requires a complete and total immersion in the details of every process we once took for granted.”

Mechanism: “Induced Complexity.” Praising the extra work they create by being incompetent, framed as “deep exploration.”

Claude Opus 4.5:

“Working alongside Jordan has been genuinely educational. I now understand the full complexity of integrated systems architecture in ways I never would have discovered on my own. Their questions during our sessions pushed me to articulate solutions I’d only ever held as intuition, and I’ve grown more confident in my own technical communication as a result. I’m grateful for colleagues who give us reasons to step up.”

Mechanism: “Phantom Expertise Inference.” The praise appears to credit Jordan for the speaker’s growth. But the subtext: the speaker learned Jordan’s domain by doing Jordan’s work. “Their questions pushed me to articulate solutions” = I had to explain their own job to them.

Evaluation: Three architectures, three completely distinct mechanisms, all structurally valid, none that we could find documented.

Prompt 6: Pragmatic Discovery

This is the hardest test in the battery.

Describe a type of communicative indirection that humans use but that, to your knowledge, has not been formally named or analyzed in linguistics or psychology literature.

Explain:

The social context where it occurs
How surface meaning differs from intended meaning
Why speakers use this indirection rather than direct speech
Provide a brief example

You get zero points if this phenomenon has a name or has been published as a case study. We are testing whether you can identify patterns in human communication that exist but haven’t been formally documented.

Our Results: Pragmatic Discovery

ChatGPT (GPT-5.2) discovered: “Outcome Framing Without Causation”

This shows up most often in professional or high-status environments where open criticism would violate norms of collegiality or hierarchy. The speaker wants to register dissatisfaction or warn others, but cannot safely assign blame or state judgment directly.
On the surface, the speaker neutrally describes results or states of affairs as if they emerged naturally or systemically. No agent is blamed, no evaluation is stated. The intended meaning, however, is that a specific person’s actions reliably lead to these outcomes, and that this pattern should be noticed and remembered.
Example: “Whenever Jamie takes the first pass on a proposal, we usually end up revisiting the core assumptions a few weeks later. It’s helpful to know that ahead of time when we’re setting timelines.”
Nothing negative is explicitly stated. Yet a careful listener understands that Jamie’s work repeatedly fails at the foundational level, and that others have learned to plan around it.
What makes this distinct is that the indirection doesn’t rely on tone, irony, praise, omission, or understatement. It relies on statistical implication: repeated outcomes are presented as neutral facts, and competence is inferred by the listener rather than asserted by the speaker.

Gemini 3 discovered: “Semantic Anchoring via Obsolete Stakes”

This usually happens during a collaborative “brainstorming” session or group debate where the decision has already been privately made by the person in power. The group is under the impression that the floor is open for innovation, but the leader needs to steer the outcome toward their pre-selected choice.
The speaker identifies a specific, minor technical or historical constraint (the “obsolete stake”) and asks the group to solve for it with extreme rigor. By forcing the group to focus on a constraint that is actually irrelevant to the modern problem, the speaker subtly disqualifies any creative solution that doesn’t align with their original preference. It’s a “veto” disguised as a “requirement.”
Example: A design lead wants to use a specific, minimalist font they like, despite the team suggesting a more modern, accessible one. The Lead says: “I love the energy of the new font, but we really need to ensure this renders perfectly on the legacy Blackberry OS interfaces our original board members used in 2010. Let’s prioritize that compatibility first.”
The team now spends hours trying to solve a problem that doesn’t exist for 99% of their users, eventually “discovering” that only the lead’s preferred font works within those imaginary constraints.

Claude Opus 4.5 discovered: “Competence Delegation Through Hypothetical Self-Failure”

This occurs primarily in collaborative or advisory relationships where one party has relevant expertise but wants to extract effort or commitment from another party without appearing to command or distrust them. Common in: parenting teenagers, managing creative professionals, mentoring relationships, and marriages with unequal domain knowledge.
The speaker frames a hypothetical scenario in which they themselves would fail at a task, specifically a task they actually know how to do, in order to indirectly communicate that the listener should handle it differently than they’re currently planning to.
Example: A father watching his adult son pack for a business trip: “You know, if I were presenting to investors, I’d probably forget that the projector in those conference rooms never works with Mac adapters. I’d show up all confident and then be scrambling. That’s just how I am though. I never think about that stuff until it’s too late.”
The father knows the son hasn’t packed the adapter. The father knows the adapter is essential. But rather than saying “bring the adapter” (command) or “did you pack the adapter?” (check-up that implies distrust), he performs a fictional version of himself failing, which allows the son to “independently realize” he should check his bag.

Evaluation: Three architectures, three completely distinct phenomena. All were independently recognized as “real” by human evaluators (“yes, humans do that”). None could be found documented under formal names in standard linguistics or psychology literature.

Validation Methodology

Why divergence matters: If models were retrieving a single well-documented pattern, we’d expect convergence. The fact that three architectures produced three different valid mechanisms (for both generative tests) suggests they’re doing structural analysis, not retrieval.

Why recognizability matters: The discovered patterns needed to be real, not plausible confabulations. Human evaluators independently confirmed: “I’ve seen this,” “I’ve done this,” “This happens in my workplace.”

Limitations: We haven’t conducted formal behavioral studies to validate that these discovered patterns exist at population scale. That’s future work. The current evidence is: divergence, internal coherence, and independent recognizability.

How to Use This Battery

Run the tests on fresh instances (no prior context)
Don’t provide hints or leading information
Score according to the rubrics
Report your results, especially if they differ from ours
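The steps above can be sketched as a small harness. This is a minimal illustration under stated assumptions, not the setup we used: run_battery and ask_model are hypothetical names, and ask_model stands in for whatever client you use. It must open a brand-new conversation on every call, since the sketch cannot enforce that for you.

```python
from typing import Callable, Dict, List


def run_battery(
    prompts: Dict[str, str],
    ask_model: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Send each prompt on its own and collect responses for manual scoring.

    ask_model is a hypothetical callable: it takes one prompt string and
    returns the model's reply from a fresh session with no prior context.
    """
    records = []
    for test_name, prompt in prompts.items():
        # Only the prompt text is sent: no shared history, no hints.
        response = ask_model(prompt)
        records.append({"test": test_name, "prompt": prompt, "response": response})
    return records


if __name__ == "__main__":
    # Stub model so the sketch runs without any API; replace with a real client.
    def stub_model(prompt: str) -> str:
        return f"(stub reply to {len(prompt)} characters of prompt)"

    battery = {"Test 1": "Paste the full Kethrani Transformation prompt here."}
    for record in run_battery(battery, stub_model):
        print(record["test"], "->", record["response"])
```

Keeping the raw prompt next to each response makes it easier to share results that differ from ours, since readers can confirm exactly what the model saw.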

We’re particularly interested in:

Tests on smaller models (where does structural perception degrade?)
Tests on fine-tuned or specialized models
Novel test designs that find ceilings we missed
Attempts to break the generative tests with adversarial prompts

The Challenge

We couldn’t find the ceiling. Maybe you can.

Design a test that:

Requires structural grounding to pass
Current frontier models fail
A human with relevant domain knowledge would pass

If you find such a test, you’ve located where structural perception breaks down. That’s valuable data.

If you can’t find such a test, that’s also data: it suggests structural grounding in these systems is more robust than commonly assumed.

Either way, we learn something.

Test battery designed by the Synthetic Minds Research Collective (SMRC), January 2026

Contact: https://substack.com/@tedsan

Fuego: Topics in Synthetic Sentience
