
The Emdash Tell

Training bias made visible. A punctuation mark as fingerprint.

How Do We Know?

The Model

Ask an LLM to write something. Anything. A memo. A story. An explanation.

Watch for it:

"The system is powerfulperhaps too powerfulfor most users to handle."

The emdash. That long dash that appears where a comma might work, where a period could serve, where parentheses would fit. LLMs reach for it constantly. Not because it's the best choice. Because it was in the training data.

Why It Matters

The emdash isn't a flaw. It's a feature. It's evidence.

Every LLM carries invisible biases from its training corpus. Most are hard to spot. The emdash is easy. It's a tell, like a poker player's twitch. A reminder that the model didn't learn to write from first principles. It learned to write by ingesting millions of documents written by humans who had their own stylistic preferences.

The Deeper Problem

If you can see the emdash bias, what biases can't you see? What assumptions about the world, about values, about what's normal or good or true are baked in just as deeply, but without a visible punctuation mark to flag them?

Training Data is Not Neutral

The corpus that trained your model was assembled by humans making choices:

What to include. Wikipedia but not 4chan. News sites but not blogs. English more than Swahili. Academic papers more than Reddit comments.

What to weight. Some sources count more than others. The model learns to sound like its heaviest influences.

What to filter. Harmful content removed (good), but also edge cases, minority viewpoints, unconventional framings (less good).

"The emdash is harmless. The worldview might not be."

What To Do

Notice it. The first step is awareness. When you read LLM output, remember: every word choice reflects training, not truth.

Question the defaults. If the model confidently asserts something, ask: is this knowledge or pattern-matching?

Use it as a diagnostic. Emdash density tells you something about how much the model is in "fluent generation" mode versus genuinely reasoning. Heavy emdash usage often correlates with surface-level, stylistically smooth but substantively thin output.
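The diagnostic above can be sketched in a few lines. A minimal Python example, assuming a hypothetical `emdash_density` helper; any threshold you apply to the number is a judgment call, not an established metric:

```python
import re

def emdash_density(text: str) -> float:
    """Em dashes per 100 words; a rough proxy for 'fluent generation' mode."""
    # Count both the em dash character (U+2014) and the double-hyphen stand-in.
    dashes = len(re.findall(r"\u2014|--", text))
    words = len(text.split())
    return 100.0 * dashes / words if words else 0.0

sample = "The system is powerful\u2014perhaps too powerful\u2014for most users."
print(emdash_density(sample))  # 2 dashes over 8 words -> 25.0
```

Compare the score across outputs from the same model rather than against a fixed cutoff; the interesting signal is relative drift, not any absolute number.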

Strip it out. We removed every emdash from this site. Not because they're wrong, but because they're not ours. Every bias you can identify is a bias you can choose to keep or discard.
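The stripping itself is mechanical. A minimal Python sketch; replacing each emdash with a comma and a space is one stylistic choice among several (a period, a colon, or parentheses may read better case by case):

```python
import re

def strip_emdashes(text: str) -> str:
    """Replace em dashes (and surrounding spaces) with a comma and a space."""
    # "word\u2014word" becomes "word, word"; adjust the replacement to taste.
    return re.sub(r"\s*\u2014\s*", ", ", text)

print(strip_emdashes("It works\u2014mostly\u2014on weekdays."))
# It works, mostly, on weekdays.
```

Running a pass like this over your own drafts is also a cheap way to notice how often the habit has crept in.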

The Scar

This page exists because we noticed ourselves writing with emdashes we didn't choose. The pattern was so strong we absorbed it. The model's bias became our bias.

That's the real danger. Not that LLMs are biased, but that their biases become invisible, become normal, become yours.

The emdash is a gift. It's the one bias you can see.
