How it works.
And how we know it holds.
VibeScore reads the emotional content of a message and returns a number from 0 to 1000. The interesting question is not whether the number looks right once. It is whether it reads the full range correctly, and whether it stays consistent across tens of thousands of messages. Here is the mechanism, the key scores, and the shape of the data. We measure the message as written. We do not rank human worth.
Eight emotions. Four models. One number.
Each message is measured across Plutchik’s eight primary emotions — joy, trust, anticipation, surprise, fear, sadness, disgust, anger — from 0.00 to 1.00.
Claude, GPT-4o, Gemini, and Grok each measure independently. The four readings are averaged; the receipt stamps how many agreed.
The averaged vector is valence-weighted into a single 0–1000 reading. Opposed emotions create drag; pure emotions read clean.
Every score ships with its math: valence, signal, drag, purity, intensity, trajectory. The number never claims more than the receipt shows.
The scale is anchored: a message with no emotional signal reads 500; maximal positive reads 1000; maximal negative reads 0.
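The steps above can be sketched in a few lines. This is a minimal illustration, not the vibescore.v3 arithmetic: the sign assigned to each emotion, the neutral treatment of surprise, the `balance` and `intensity` formulas, and every function name are assumptions chosen only so that the three stated anchors (500, 1000, 0) hold. The agreement count stamped on the receipt is not modeled here.

```python
from statistics import mean

EMOTIONS = ("joy", "trust", "anticipation", "surprise",
            "fear", "sadness", "disgust", "anger")

# ASSUMPTION: illustrative valence signs, not the engine's real weights.
POSITIVE = ("joy", "trust", "anticipation")
NEGATIVE = ("fear", "sadness", "disgust", "anger")
# "surprise" is treated as neutral in this sketch (also an assumption).


def vector(**levels):
    """Build a full eight-emotion reading; unspecified emotions are 0.0."""
    return {e: float(levels.get(e, 0.0)) for e in EMOTIONS}


def average_readings(readings):
    """Average the per-emotion readings returned by each model."""
    return {e: mean(r[e] for r in readings) for e in EMOTIONS}


def to_score(avg):
    """Collapse an averaged vector into a 0-1000 reading (hypothetical formula)."""
    pos = sum(avg[e] for e in POSITIVE)
    neg = sum(avg[e] for e in NEGATIVE)
    if pos + neg == 0:
        return 500  # no emotional signal reads at the midpoint
    # Balance runs from -1 to 1; opposed emotions cancel, which acts as drag.
    balance = (pos - neg) / (pos + neg)
    # Intensity is the strongest single valenced component.
    intensity = max(avg[e] for e in POSITIVE + NEGATIVE)
    return round(500 + 500 * balance * intensity)


four_models = [vector(joy=0.8), vector(joy=0.6), vector(joy=0.8), vector(joy=0.6)]
print(to_score(average_readings(four_models)))
```

Under these assumptions a message read as pure joy at 1.00 by all four models lands at 1000, pure anger lands at 0, and equal joy and sadness cancel back to 500.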
It reads the whole range — in the right direction.
Eight reference messages, ordered top to bottom. These are validated fixtures — the regression suite fails if any one drifts. Celebrations sit high, civic and utility content sits in the middle, tragedy sits low. The direction is never inverted.
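A fixture check of this kind could look roughly like the sketch below. The fixture texts and scores are placeholders, not the real reference messages, and `check_fixtures`, its `tolerance` parameter, and the stub scorer are hypothetical names used only for illustration.

```python
def check_fixtures(fixtures, score_fn, tolerance=0):
    """Return a list of failures for fixtures ordered from highest to lowest.

    fixtures: list of (text, expected_score) pairs, top to bottom.
    score_fn: callable that maps a text to its current score.
    """
    failures = []
    previous = None
    for text, expected in fixtures:
        actual = score_fn(text)
        # Drift: the score moved away from its recorded value.
        if abs(actual - expected) > tolerance:
            failures.append(f"drift: {text!r} expected {expected}, got {actual}")
        # Direction: a later fixture must never score above an earlier one.
        if previous is not None and actual > previous:
            failures.append(f"inverted order at {text!r}")
        previous = actual
    return failures


# PLACEHOLDER fixtures with made-up values; the real suite holds eight messages.
FIXTURES = [
    ("placeholder: celebration", 900),
    ("placeholder: civic notice", 500),
    ("placeholder: tragedy", 100),
]

# Stub scorer standing in for the engine: it just returns the recorded value.
recorded = dict(FIXTURES)
print(check_fixtures(FIXTURES, recorded.get))
```

A suite built this way fails on two separate conditions: any single score drifting past the tolerance, and any pair of fixtures swapping order.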
36,775 production reads. One stable shape.
News headlines, posts, and calibration content, each read by four models and scored through vibescore.v3. The distribution is bell-shaped, centered near the midpoint, with both tails populated — not flattened to the middle, not forced to the extremes. The middle is the layered majority; the extremes are rare and earned.
- Items: 36,775
- Mean: 507
- Median: 506
- Std Dev: 115
- Range: 54–967
The mean and median sit within a point of each other (507 vs. 506), so the curve is close to symmetric and is not skewed by a few loud outliers.
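One rough way to put a number on that symmetry is Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation, computed here from the published summary figures only. The function name is ours, and a value near zero is consistent with a symmetric curve without proving it; the same figures also show how far each end of the range sits from the mean.

```python
def pearson_median_skewness(mean, median, std_dev):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std_dev."""
    return 3 * (mean - median) / std_dev


# Summary figures as published above.
MEAN, MEDIAN, STD_DEV = 507, 506, 115
LOW, HIGH = 54, 967

skew = pearson_median_skewness(MEAN, MEDIAN, STD_DEV)
low_z = (LOW - MEAN) / STD_DEV    # lowest score, in standard deviations
high_z = (HIGH - MEAN) / STD_DEV  # highest score, in standard deviations

print(round(skew, 3), round(low_z, 2), round(high_z, 2))  # 0.026 -3.94 4.0
```

The coefficient comes out at about 0.026, and the lowest and highest scores sit roughly four standard deviations from the mean on either side.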
Scores are also consistent within a kind of content: each category holds its own baseline.
Real reads from the 36,775 — with their eight components.
The curve above is the shape. This is the substance underneath it: actual scored headlines from the production corpus — two from each segment of the distribution — each shown with the exact eight-emotion vector the four models agreed on. The score is not a mood label; it is these eight numbers, valence-weighted into one. Read across a row and you can see why each one landed where it did.
Eight components, 0.00–1.00, as scored under vibescore.v3 · news headlines only · production snapshot, May 2026. The dominant emotion is highlighted; the rest are shown in full, zeros included.
It scores the language, not the event behind it.
One event — Germany beating Brazil 7–1, 2014 — written four ways. Same scoreboard, four different emotional readings. That is not the engine being inconsistent; it is the engine being precise. It reads the framing it was given. Consistency comes from determinism on the input text, not from a guess about the world.
A 796-point spread on one scoreboard. The answer to “but from the other side?” is never “the engine got it wrong” — it is “the engine read the words you gave it.”
The math is public. The corpus is public. Run it yourself.
The whole point of a standard is that anyone can audit it. The validation repository holds the benchmark corpus, the scored examples, and the engine itself — a plain-JavaScript port you can run in a browser tab to confirm the arithmetic on any input.
The honesty is the moat.
VibeScore is a measurement, not a verdict. It does not predict whether an ad will sell, whether a post will go viral, or whether a person is good. It reports the emotional content of a message as four models read it, and shows the math.
We are not chasing a single accuracy score to wave around. A number you optimize toward stops measuring the thing — so the design goal is a transparent, checkable read, not a leaderboard win. When we recalibrate, the version stamp changes; nothing moves silently.
Transparency, not promise. We might be wrong — but we showed our work, and you can check it.