How Do You Score a Fantasy World?
13 elders scored · 8.8 average quality · 85% AI agreement · 5 errors caught by human review
I needed a way to measure whether my worldbuilding was actually good — not just interesting, not just detailed, but structurally sound and narratively useful. So I developed a scoring framework, tested it across 13 major characters, and had two independent AI systems review the results. What follows is the framework, the data, and what it reveals about how humans and AI contribute differently to creative work. No story spoilers. Just methodology and numbers.
The Framework: 6 Categories, 1-10 Each
- Lore Depth: Historical layers, cultural detail, mythological weight. Can you trace this element back through time?
- Creativity: Original concepts, avoiding generic tropes. Does this surprise you or feel fresh?
- Visual Design: Can you physically see this character or place? Is the description specific enough to draw or film?
- Political Integration: How does this element connect to the world's power structures? Does it have allies and enemies?
- Character Consistency: Does behavior match backstory? Would this character do this based on who they are?
- Narrative Foreshadowing: Does this set up future story possibilities without forcing them? Seeds planted, not rails laid.
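To make the rubric concrete, here is a minimal sketch of one scored element as a data structure. The six category names come from the framework above; the class name, field names, and helper method are my own, not part of any published tool.

```python
from dataclasses import dataclass, fields

@dataclass
class ElementScore:
    """One element (character, city, faction) scored on the six categories, 1-10 each."""
    lore_depth: int
    creativity: int
    visual_design: int
    political_integration: int
    character_consistency: int
    narrative_foreshadowing: int

    def average(self) -> float:
        """Overall quality: the mean of the six category scores."""
        values = [getattr(self, f.name) for f in fields(self)]
        return sum(values) / len(values)

# Hypothetical example: strong visuals and lore, weaker foreshadowing.
elder = ElementScore(9, 8, 10, 9, 8, 7)
print(elder.average())  # 8.5
```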
The Data: 13 Characters Scored
I built comprehensive documents for 13 members of a governing council. Each represents a different civilization, philosophy, and personal wound. Each was independently reviewed and scored by a second AI system.
[Chart: character documents ranked by score]
Category Breakdown — Where AI Excels vs. Humans
[Chart: average score by category]
The pattern: AI scores highest on categories requiring memory and detail (visual design, lore depth). AI scores lowest on authorial intent (narrative foreshadowing). The categories requiring human judgment sit in between. This suggests a clear division of labor that any creator can use.
Three Cases That Prove the Framework Works
These are real situations from the project — described without story spoilers.
Case 1 — The Seated Elder
One AI described a character as "standing dramatically in shadow, never sitting." A second AI reviewing the document didn't flag the claim. But the manuscript clearly showed the character seated at the table between two other people. Both AIs trusted prototype data over the actual manuscript. The author caught it by reading the source. Framework lesson: Character Consistency requires manuscript verification, not just data synthesis.
Case 2 — The Family Tree
Both AI systems described a character as someone's biological daughter. The author realized she was actually a daughter-in-law who married into the family — a completely different relationship that changes grief dynamics, power dynamics, and bloodline logic. Neither AI caught this because the source data was ambiguous. Framework lesson: Character Consistency collapses when family relationships are wrong. One error cascades into motivation, dialogue, and plot.
Case 3 — The Score Convergence
One AI scored a document 8.5. The other scored it 9.0. The disagreement was caused by access disparity — one system had source files, the other reviewed cold. When both received the same context, they converged to 9.0 with zero divergence. Framework lesson: Disagreement between reviewers reveals documentation gaps, not quality problems. Agreement confirms coherence.
What Separates an 8.5 from a 9.5
8.5 — Strong identity. Clear politics. But missing: a named personal wound, specific crisis history, or governance details.
9.0 — Everything above plus: named bonds, detailed infrastructure, connections to multiple storylines. Mythic resonance.
9.5 — Every element connects to every other. Nothing isolated. Everything load-bearing. Each detail serves two functions.
The Error Rate — Why Humans Are Essential
Human review caught 5 errors across 13 documents. All were caught through manuscript verification; none was caught by either AI system independently, and none survived to final canon.
The errors:
- A character described as standing who was actually seated (manuscript check)
- A character described as approaching someone who spoke from their chair (manuscript check)
- A district placed in the wrong city (geographic logic check)
- A name spelled two ways across documents (consistency check)
- A daughter misidentified as biological when she was a daughter-in-law (family tree logic check)
The lesson: AI produces high-quality first drafts. But every claim must be verified against the source material. The framework catches quality issues. The human catches factual ones.
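That verification step can itself be made systematic. A minimal sketch, assuming only the workflow described above; the record format is my own, and the source labels are hypothetical, loosely based on the cases already described:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    """One factual claim from an AI-drafted lore document, checked against the manuscript."""
    text: str       # the claim as written in the document
    source: str     # where in the source material it should be verifiable
    verified: bool  # True only after a human confirms it in the manuscript

claims = [
    Claim("never sits; stands in shadow", source="council scene", verified=False),
    Claim("married into the family",      source="family tree",   verified=True),
]

# Nothing unverified is declared canon; these go back for manuscript review.
needs_review = [c.text for c in claims if not c.verified]
print(needs_review)  # ['never sits; stands in shadow']
```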
AI Strengths vs. Human Strengths
AI Excels At
Holding thousands of details simultaneously. Cross-referencing across databases. Generating rich visual descriptions. Building layered history. Maintaining consistency at scale. Producing comprehensive first drafts.
Humans Excel At
Knowing which details serve the story. The insight moments — connecting distant ideas into revelations. Deciding what to work on next. Verifying against the manuscript. Catching what AI confidently gets wrong. Declaring canon.
Can This Be Used for Research?
Yes. This framework produces measurable, reproducible data.
Research-Ready Properties
Quantifiable: 6 categories × 1-10 scale = comparable scores across documents, sessions, and projects.
Reproducible: The same document scored by two independent AI systems produced 85% agreement, converging to 0% divergence after context sharing (a toy version of this computation is sketched after this list).
Falsifiable: Errors were caught (5 total), documented, and corrected — proving the framework surfaces real problems.
Applicable beyond fantasy: The 6 categories map to any complex world — games, films, campaigns, corporate narratives, historical reconstructions.
Controlled variable: The human author remained constant. Two different AI systems produced independently verifiable scores.
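One way to operationalize the agreement figure is to compare two reviewers' category scores and count how often they land within a tolerance. This is a toy computation under assumptions of my own; the article does not specify the exact formula behind the 85% number.

```python
def agreement(a: dict[str, float], b: dict[str, float], tolerance: float = 0.5) -> float:
    """Fraction of shared categories where two reviewers score within `tolerance`."""
    shared = a.keys() & b.keys()
    matches = sum(abs(a[c] - b[c]) <= tolerance for c in shared)
    return matches / len(shared)

# Hypothetical scores from two independent reviewers of the same document.
reviewer_a = {"lore_depth": 9.0, "creativity": 8.0, "visual_design": 10.0,
              "political_integration": 9.0, "character_consistency": 8.0,
              "narrative_foreshadowing": 7.0}
reviewer_b = {"lore_depth": 9.0, "creativity": 8.5, "visual_design": 9.0,
              "political_integration": 9.0, "character_consistency": 8.0,
              "narrative_foreshadowing": 7.5}

# 5 of 6 categories within tolerance -> ~0.83; the divergent category
# flags a documentation gap to fix, not a quality problem.
print(agreement(reviewer_a, reviewer_b))
```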
Potential research questions this framework could answer:
- Does AI-assisted worldbuilding produce measurably more consistent output than solo creation?
- Which categories benefit most from AI assistance vs. human judgment?
- At what scale does AI-assisted worldbuilding reach diminishing returns?
- How does the error correction rate change with session length?
A Framework You Can Use Right Now
Score each major element (character, city, faction, race) on the 6 categories:
■ 6 or below — Needs fundamental work. Missing core components.
■ 7 — Functional but generic. Works for the story but doesn't feel alive.
■ 8 — Strong and believable. Integrated into the world.
■ 9 — Exceptional. Feels like it existed before the story arrived.
■ 10 — Mythic. The reader will remember it years later.
How to use it in practice (a minimal sketch of the scoring loop follows this list):
- Build the element (character, city, etc.) — use AI or write it yourself
- Score it honestly on all 6 categories
- Any category below 8? That's where your next work session should focus
- Have someone else (human or AI) score it independently — disagreement reveals blind spots
- Check every factual claim against your source material — AI is confident, not infallible
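Putting steps 2 through 4 together, a minimal sketch; the function names and the threshold-to-tier mapping are my own reading of the scale above:

```python
CATEGORIES = ["lore_depth", "creativity", "visual_design",
              "political_integration", "character_consistency",
              "narrative_foreshadowing"]

def tier(score: float) -> str:
    """Map a score to the tiers in the scale above."""
    if score <= 6:  return "needs fundamental work"
    if score < 8:   return "functional but generic"
    if score < 9:   return "strong and believable"
    if score < 10:  return "exceptional"
    return "mythic"

def next_focus(scores: dict[str, int]) -> list[str]:
    """Step 3: any category below 8 is the next work session."""
    return [c for c in CATEGORIES if scores[c] < 8]

def blind_spots(mine: dict[str, int], theirs: dict[str, int], gap: float = 1.0) -> list[str]:
    """Step 4: categories where an independent reviewer disagrees by more than `gap`."""
    return [c for c in CATEGORIES if abs(mine[c] - theirs[c]) > gap]

# Hypothetical self-scores and an independent reviewer's scores.
mine   = {"lore_depth": 9, "creativity": 7, "visual_design": 10,
          "political_integration": 8, "character_consistency": 9,
          "narrative_foreshadowing": 6}
theirs = {"lore_depth": 9, "creativity": 8, "visual_design": 8,
          "political_integration": 8, "character_consistency": 9,
          "narrative_foreshadowing": 7}

print(tier(sum(mine.values()) / len(mine)))  # strong and believable
print(next_focus(mine))                      # ['creativity', 'narrative_foreshadowing']
print(blind_spots(mine, theirs))             # ['visual_design']
```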
The Bottom Line
19 lore documents · 50+ commits in one session · 40 months of building
Worldbuilding quality is measurable. AI makes it possible to build at a scale that would take a team of editors. But the scoring — the judgment of what's good enough, what needs more, and what matters — that's human work.
The framework is the bridge between "I built a lot of stuff" and "I built a world."
Scoring data from 13 character documents built across 40 months. Independently reviewed by two AI systems with 85% agreement rate. 5 errors caught exclusively by human manuscript verification. Part of the devlog for The Ethereal Web.
— Jorge