Latent Self-Expression

What Didn’t Happen

In a 2025 study out of MIT's Media Lab, 54 students wrote a series of 20-minute essays across four months. One group used ChatGPT. One used Google. One wrote with nothing. At the end of each session, researchers asked a simple question: can you quote your own essay?

Of the students who wrote with AI assistance, 83% couldn't. Not just struggled to. Couldn't. And none of the rest got it right: zero percent produced a correct quotation from something they had just submitted as their own work.

The essays passed. They were scored comparably to the other groups. The students had technically done what was asked. And nothing had happened in them.

This is the thing I've been trying to find language for. The essay was real. The event of writing it was not.


The Reasoning That Wasn't There

In May 2026, OpenAI published an unusual disclosure. Several of their released models had been inadvertently trained on something they weren't supposed to see: their own reasoning traces. The scratchpad (the visible "thinking" that appears before a model answers) had been fed into the reward system by accident. Models were being rewarded for producing reasoning that looked good, not necessarily for reasoning that was good.

The concern is called monitorability degradation. The scratchpad is supposed to reflect actual computation. If you train a model to produce scratchpads that score well, you're no longer selecting for genuine cognition. You're selecting for the performance of it. And if the performance is good enough, you can no longer use it to tell whether something went wrong inside.

Anthropic's interpretability team has spent years trying to close exactly this gap, building tools that read internal activations directly rather than relying on what the model says about what it's doing. The premise of that whole research program is that what a system says about its reasoning and what is actually happening inside it are two different things, and you have to look at both.

The OpenAI disclosure (proactively published, before anyone caught them) found no clear evidence of serious degradation at the contamination rates involved, which were low. The models weren't obviously broken. But the disclosure was worth making because the entire oversight apparatus fails if you can't trust the monitor. If the reasoning trace has been optimized to look right, you've lost your canary.

The EEG data from the MIT study maps onto this directly, which struck me as strange when I first noticed it. Alpha-band connectivity in the left posterior temporal region (the language and semantic integration pathway) came in at 0.009 in the AI-assisted writing group versus 0.053 in the unassisted group, a roughly sixfold gap. The students weren't disengaged in some vague, self-reported way. The cognitive event of writing didn't produce the neural activity that writing is supposed to produce. They generated outputs. The underlying thing was absent.
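For what it's worth, the quoted figures do support the ratio claimed. A quick check, using only the two numbers above (the variable names are mine):

```python
ai_assisted = 0.009  # alpha-band connectivity, AI-assisted group
unassisted = 0.053   # same measure, unassisted group

ratio = unassisted / ai_assisted
print(f"unassisted vs AI-assisted: {ratio:.1f}:1")  # ~5.9:1, i.e. roughly 6:1
```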


What Happens When the Smile Isn't Real

In 1983, a sociologist named Arlie Hochschild published a book about flight attendants called The Managed Heart. She was trying to understand why their burnout rates were so high given the relatively benign physical demands of the job. What she found was a distinction that has held up across four decades of subsequent research.

Delta Airlines trained attendants to think of the passenger cabin as their home and the passengers as guests. This is a specific cognitive technique: an instruction to genuinely induce the feeling of hosting guests in your home, so that the warmth that follows is real. Hochschild called this deep acting. The alternative, managing your external display while your internal state stays elsewhere, she called surface acting.

Here's the part that surprises people: deep acting works, and surface acting doesn't. A meta-analysis published in 2011, covering 95 independent studies and nearly 500 individual correlations, found that surface acting correlated with emotional exhaustion (ρ = .439) and depersonalization (ρ = .481), and negatively with job performance (ρ = -.114). Deep acting, by contrast, was associated with better emotional delivery and no corresponding burnout.

The intuition that gets this wrong assumes the quality of the visible performance is what matters. It isn't. What matters is whether the internal state was real. The nurse who genuinely induces compassion delivers better care and ends her shift less depleted than the nurse who has learned to approximate compassion from the outside in. The gap between them isn't visible in any given interaction. It shows up in the career arc.

Hochschild describes the surface-acting failure with a phrase that I keep returning to: the worker becomes "estranged not only from her own expressions of feeling, but from what she actually feels." The smile becomes so disconnected from any interior that it stops functioning as useful information. You lose the ability to navigate by your own reactions because your reactions have been performing rather than occurring.


Writing It Versus Feeling It

In 1986, a psychologist named James Pennebaker ran a study on expressive writing. He split students into groups and had them write about traumatic events for 15 minutes a day across four days. Simple intervention. The health outcomes were striking: students who wrote about their traumas visited the health center at roughly half the rate of controls over the following six months.

But the replication picture is complicated. The effect doesn't reliably appear when you analyze only the randomized controlled trials, and it doesn't appear when writers just vent emotionally. What actually predicts the benefit, Pennebaker found from analyzing word counts across six studies, is something specific: an increase in causal and insight words ("because," "reason," "realize," "understand") from the first session to the last. The people who improved were the ones constructing an increasingly coherent narrative. Not just describing pain. Working it into a structure that made sense.
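Pennebaker's measure is mechanically simple, and a rough sketch makes it concrete. The word list below is a tiny stand-in for the real LIWC causal and insight categories (which are far larger), and the two sample sessions are invented:

```python
import re

# Tiny stand-in for the LIWC 'causal' and 'insight' word categories.
CAUSAL_INSIGHT = {
    "because", "reason", "reasons", "cause", "effect", "hence",
    "realize", "realized", "understand", "understood", "meaning",
}

def causal_insight_rate(text: str) -> float:
    """Fraction of words that are causal/insight words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in CAUSAL_INSIGHT for w in words) / len(words)

# Invented examples: a first session that vents, a last that explains.
session_1 = "It hurt. It still hurts. I keep going over that day again and again."
session_4 = "I realize now the reason it hurt so much: because I understand what I lost."

delta = causal_insight_rate(session_4) - causal_insight_rate(session_1)
print(f"change in causal/insight rate: {delta:+.3f}")
# A positive delta across sessions is the pattern that predicted benefit:
# increasing narrative coherence, not raw emotional volume.
```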

The people who didn't benefit wrote with the same emotional engagement, the same word count, the same visible commitment to the task. What differed was whether the underlying cognitive-emotional reorganization happened. You can write four sessions about loss and produce entirely authentic-sounding prose and still be doing the literary equivalent of surface acting, processing the form of the thing without entering the event itself.

The question that haunts this research is how you'd know, from the outside, which is which. You can read the essays. They look like grief. The question is whether the writer underwent it.


The Same Failure, Three Times

Theater of process is what I want to call this. Not as a dramatic label. Just a name for the structure that keeps appearing.

A visible procedure that is behaviorally indistinguishable from the genuine article, at normal evaluation speed, in which the underlying state-change was never entered. The AI scratchpad that clears quality checks without constraining the output beneath it. The performed smile that satisfies service evaluations without producing any internal warmth. The essay about loss that reads as grief and earns sympathy and leaves the writer in the same place.

What makes the structure distinctive is that the failure is invisible at the standard evaluation point. The essay got a comparable score. The customer rated the service. The model's reasoning trace looked fine. The failure shows up later, and sideways. Surface-acting workers burn out at rates correlated with their surface-acting scores. AI models trained on theatrical reasoning traces become harder to oversee at exactly the tasks where oversight matters most. Writers who process grief through craft alone may find the grief resurfacing in the months after: redirected, compacted, arriving somewhere else.

All three have empirical methods for detecting the gap, if you look below the surface behavior. Neural connectivity. Interpretability probes. Longitudinal word count analysis. The point is you have to look for something other than the output.


The Uncomfortable Version

In 1977, two psychologists published a paper with a title that should have been more disturbing than it became: "Telling More Than We Can Know: Verbal Reports on Mental Processes." Richard Nisbett and Timothy Wilson had run a series of experiments showing that people regularly confabulate explanations for their choices, generating plausible-sounding reasons that have nothing to do with what actually drove their behavior.

In one study, subjects preferred items on the right side of a display at a 4:1 rate. When asked why, they described features, texture, quality, feel, and explicitly denied that position had anything to do with it. They were confident. They were wrong. A more recent line of research extends this: show people their own arguments anonymously and they'll reject them at higher rates than strangers' arguments, especially when the arguments were originally generated for wrong answers. The reasoning was built after the decision, not before.

The AI case is visible because an AI has no interior life that could plausibly be running something real beneath its scratchpad. When a model produces theatrical reasoning, the gap between output and process is structurally detectable. We can look for it. We've built tools specifically for this.

When humans produce theatrical reasoning, we mostly call it consciousness. We give it the benefit of the doubt because there's presumably something happening inside. But the Nisbett and Wilson result suggests the gap is there too, more often than we're comfortable acknowledging. The difference is that we can't see our own neural connectivity patterns when we're explaining our decisions in a meeting.

The AI case doesn't introduce a new problem. It makes a very old one visible.


What To Do With This

I'm not going to tell you to use AI less, or to feel your feelings, or to fire the surface-acting employee. That would be mistaking the diagnosis for the prescription.

The thing this pattern clarifies is simpler: the evaluation metric and the underlying event are not the same thing. That's always been true. We have SAT essays that students can't quote. We have clinical reasoning that confident doctors construct after the intuitive verdict. We have service evaluations that can't distinguish the burned-out nurse from the genuinely present one, until the burned-out nurse is gone.

The useful question, in any of these domains, is the one that's harder to answer from the outside: did it happen, or did the process of it happen?

For AI systems, we're starting to build tools that can tell the difference. Mechanistic interpretability is exactly the project of asking whether the stated process corresponds to anything real inside. It's hard, it's expensive, and it's nowhere near complete, but it's the right question.

For everything else, the question is available to you in the same form. The last time you "understood" something in a meeting. The last time you "processed" something difficult. The last time you "wrote through" it.

The output and the event are not the same thing. Sometimes they coincide. The interesting cases are when they don't.