Building Canary

or What Had to Come Out Before Anyone Else Could Use It

May 13, 2026

The original Canary config had [[INTERNAL DIALOGUE]] and [[THE BEAST]] hardcoded as formatting tags the preprocessor stripped before metrics ran. Those tags are mine: markup I use in drafts so I can find interior-monologue blocks and Beast-pressure beats without grep getting in the way. The exclusion lists carried character names from Vampires of Tucson: Babydoll, Blondie, Ah Puch, Sings-In-Woods. The strip rules looked for chapter headers in a format only one manuscript uses. The genre threshold profile assumed horror-gothic, because that is what the manuscript on the bench was, and the engine had no way to ask what anyone else was writing.

It worked. It was also completely unsuitable for anyone who wasn’t writing Vampires of Tucson.

The ProWritingAid piece made the argument for what Canary was architecturally. This piece is what happened next. Making it portable required a series of decisions, each one a problem the tool had and a name for the fix. Not a feature list. An archaeology of one manuscript’s assumptions, sorted from the universal craft patterns underneath.

What follows is the work.

What Had to Come Out

The hardest part of genericization is not stripping the obvious content. The obvious content is easy. Character names are searchable. Manuscript-specific markers are searchable. You grep, you delete, you move on.

The hard part is the false-positive suppression list.

Canary’s preprocessing engine flags weak adverbs adjacent to weak verbs. Standard craft gate. But certain nouns and adjectives look like adverbs to the gate, turn up steadily across English prose, and would otherwise fire as false positives. The exclusion list suppresses them. The list looked like this when the genericization pass started:

family, supply, valley, belly, country, daily, holy, kindly, lovely, lonely, ugly, silly, hilly, jolly, smelly, woolly, chilly, frilly, hourly, weekly, monthly, yearly, prickly, crinkly, wobbly...

Some of those words are universal. “Family” appears in every novel ever written. “Supply” is a noun before it is anything else. “Daily” and “weekly” are time markers, not adverbial constructions doing the kind of work the gate is built to catch.
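A minimal sketch of the gate and its suppression list, to make the misfire concrete. Everything here is an assumption for illustration: the function name, the weak-verb set, and the "ends in -ly" heuristic are mine for this example, not a copy of what strip_engine.py does.

```python
import re

# Hypothetical word lists; the real gate's lists may differ.
WEAK_VERBS = {"was", "were", "went", "walked", "looked", "moved", "said"}
EXCLUSIONS = {"family", "supply", "belly", "holy", "daily", "weekly"}

WORD_RE = re.compile(r"[A-Za-z']+")

def flag_weak_adverbs(text, exclusions=EXCLUSIONS, weak_verbs=WEAK_VERBS):
    """Return (verb, ly_word) pairs where an -ly word sits next to a weak verb."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    hits = []
    for i, word in enumerate(words):
        # Assumed heuristic: anything ending in -ly is a candidate adverb,
        # unless the suppression list says it is a known false positive.
        if not word.endswith("ly") or word in exclusions:
            continue
        # Look one word to the left and one word to the right.
        neighbors = words[max(i - 1, 0):i] + words[i + 1:i + 2]
        for neighbor in neighbors:
            if neighbor in weak_verbs:
                hits.append((neighbor, word))
                break
    return hits
```

Run it on "The family was waiting." with an empty exclusion set and "family" fires. Run it with the default set and it stays quiet. That one-word difference is the whole job of the list.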

VoT canary runs seeded the list. Some entries belonged because they appear unusually often in a vampire novel. Other entries belonged because they appear unusually often in English. The list itself didn’t say which was which. Observation built the list, not principle.

So the work was sorting. Read each entry, ask the question nobody asked when the list went in: did this word land here because the manuscript needed it suppressed, or because the gate would misfire on any manuscript that contained the word?

“Holy” stayed. Religious vocabulary appears across genres at a rate that justifies suppressing the gate’s misfire. “Belly” stayed for the same reason. “Family” stayed. “Supply” stayed.

“Crinkly” left. A specific scene in Babydoll put it on the list. Without that manuscript, the word has no business on any suppression list.

The pile to sort was small. The principle is what mattered: an exclusion list is not a feature. It is a record of what one manuscript taught the tool, and exporting the tool means re-asking whether each entry generalizes.

The same archaeology ran on every other field. The character names came out. The chapter header regex got demoted to one entry in a list of patterns the author can extend. The hallucination markers stopped living as hardcoded VoT artifacts and became a field the author populates from their own draft markers. The genre threshold defaults stopped assuming horror-gothic.

Sorting what belongs to the manuscript from what belongs to English. That was the work.

The Standards Builder

A config file with no documentation is not a portable tool. It is a JSON file in someone else’s namespace.

A new author opening pwa_config.json for the first time sees fields named character_names, weak_adverb_noun_exclusions, hallucination_markers, header_strip_patterns. They can read JSON. They cannot read this JSON, because the fields are unlabeled archaeology. There is no inline note explaining what weak_adverb_noun_exclusions actually excludes, why those specific words are there, what should go in character_names, or what happens if the author gets it wrong.
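For orientation, here is a sketch of what those four fields might hold, written as a Python dict so each one can carry the inline note the JSON file lacks. The field names come from the config; the values are drawn from examples elsewhere in this piece, and the header pattern is a made-up placeholder rather than the real regex.

```python
import json

# Illustrative values only; the shipped pwa_config.json may be shaped differently.
config = {
    # Names the spellcheck should leave alone.
    "character_names": ["Babydoll", "Blondie", "Ah Puch", "Sings-In-Woods"],
    # Words the weak-adverb gate would misfire on in any manuscript.
    "weak_adverb_noun_exclusions": ["family", "supply", "holy", "belly"],
    # Literal strings to flag when they surface in a draft.
    "hallucination_markers": ["[[INTERNAL DIALOGUE]]", "[[THE BEAST]]"],
    # Regexes for header lines to strip before metrics run (hypothetical pattern).
    "header_strip_patterns": [r"^Chapter \d+$"],
}

# JSON has no comment syntax, which is part of why the raw file reads as unlabeled.
as_json = json.dumps(config, indent=2)
print(as_json)
```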

The fix is an interview. Seven sections, written in plain language, walking the author through the decisions in order. The output is a standards.md file that loads alongside the bundled foundation and turns the tool’s behavior toward the author’s manuscript.

Here is the interview in working form.

The Canary First-Run Interview

Run through these seven sections once. Your answers become your standards.md.

Section 1: Project context

What genre are you writing? Fiction or non-fiction? What POV (first, third limited, third omniscient)? What tense? Approximate target chapter length in words?

Section 2: Voice profile

How would you describe your sentence length tendencies: short and punchy, long and winding, or mixed by scene type? What register does your prose operate in? Are there stylistic moves you make intentionally that a tool might flag as errors?

Section 3: Threshold adjustments

Which default quality gates feel too strict for your work? Which feel too loose? (Examples: passive voice threshold, sentence length average, adverb density.) Leave blank to use the bundled defaults.

Section 4: Banned patterns

What specific phrases, constructions, or tics do you want flagged in your own prose? These are your declared enemies: patterns you know you fall back on and want caught.

Section 5: Required elements

Are there things that must be present in every chapter or piece? (Examples: a sensory grounding beat in each scene, a POV character named within the first paragraph, dialogue that breaks up prose blocks longer than X words.)

Section 6: Author fingerprints to preserve

What patterns feel like your voice and should NOT be flagged? These are the moves that distinguish your prose from generic output. Name them so the tool learns to leave them alone.

Section 7: Character names

List the character names in your manuscript. The tool will not flag these as misspellings.

Each section maps to a specific behavior in the engine. Section 1 sets the genre profile that drives the threshold defaults. Section 2 feeds the voice-anchoring checks. Section 3 overrides the numeric thresholds where the author’s craft is intentionally different from the default.

Section 4 adds entries to the banned-pattern scanner. Section 5 adds required-element checks. Section 6 is the false-positive suppression layer, populated by the author rather than excavated from someone else’s manuscript. Section 7 silences the names.
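One way to picture the mapping is a small renderer that takes the seven answers and writes them out in order, each tagged with the behavior it feeds. This is a sketch under assumptions: the layout of standards.md, the comment tags, and the function name are all hypothetical.

```python
# Section titles come from the interview; the "feeds" notes paraphrase the mapping above.
SECTIONS = [
    ("Project context", "genre profile and threshold defaults"),
    ("Voice profile", "voice-anchoring checks"),
    ("Threshold adjustments", "numeric threshold overrides"),
    ("Banned patterns", "banned-pattern scanner"),
    ("Required elements", "required-element checks"),
    ("Author fingerprints to preserve", "false-positive suppression"),
    ("Character names", "names the spellcheck skips"),
]

def render_standards(answers):
    """Build the text of a standards.md from a dict keyed by section title."""
    out = ["# standards.md", ""]
    for number, (title, feeds) in enumerate(SECTIONS, start=1):
        out.append(f"## Section {number}: {title}")
        out.append(f"<!-- feeds: {feeds} -->")
        # A blank answer falls back to the bundled defaults.
        out.append(answers.get(title, "").strip() or "(use bundled defaults)")
        out.append("")
    return "\n".join(out)
```

An author who only answers Section 7 still gets a complete file. The other six sections say to use the defaults, which is the behavior the interview promises for blanks.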

The interview takes maybe twenty minutes for an author who knows their own voice. An author who has never sat with these questions takes longer, which is the point. The questions are the work. The output is the artifact the questions produced.

The author writes the answers. The tool runs the math.

Two Layers

A standards system has to solve two problems at once. The first is technical foundation: gate definitions, scoring mechanics, pattern detection logic, the language of the report itself. The second is craft: what good prose looks like for this author, in this manuscript, at this point in their work.

Mix those two layers in one file and you cannot ship the tool to anyone else without shipping your craft positions with it. Craft positions are not portable. My passive-voice threshold is not your passive-voice threshold. My “ban the em dash” gate is not your “ban the em dash” gate. The author writing thriller prose with deliberate short sentences and the author writing literary fiction with deliberate long ones do not share a single threshold that serves them both. A tool that pretends otherwise is not actually evaluating either of them.

Canary solves it with two loading layers.

The bundled _standards.md is the foundation. It loads automatically on every run. It contains gate definitions (“a passive voice construction is a ‘to be’ verb plus past participle lacking a clear agent”), scoring mechanics (“each gate produces a count, the counts roll up to a profile score”), and pattern detection logic (“the scanner looks for these specific regex patterns in this order”). It contains zero craft positions. It is the engine, not the settings.

The author’s standards.md is the override layer. It loads after the bundled file, on top of it. The author’s gate definitions take precedence over the bundled ones. The scanner appends the author’s banned patterns to its own pattern list. The author’s threshold values replace the bundled defaults where they appear. Where the author’s file stays silent, the bundled defaults hold.
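The precedence rules in that paragraph can be sketched as a merge over two already-parsed layers. The dict shape, the key names, and the function name are assumptions for illustration; how Canary actually parses the two markdown files is not shown here.

```python
def merge_standards(bundled, author):
    """Layer the author's parsed standards on top of the bundled foundation."""
    return {
        # Author gate definitions take precedence over bundled ones.
        "gates": {**bundled.get("gates", {}), **author.get("gates", {})},
        # Author thresholds replace bundled defaults where present;
        # where the author's file is silent, the bundled value holds.
        "thresholds": {
            **bundled.get("thresholds", {}),
            **author.get("thresholds", {}),
        },
        # Author banned patterns are appended, never substituted.
        "banned_patterns": (
            list(bundled.get("banned_patterns", []))
            + list(author.get("banned_patterns", []))
        ),
    }

# Hypothetical values, chosen only to show each rule firing once.
bundled = {
    "thresholds": {"passive_voice_pct": 10, "adverb_density_pct": 3},
    "banned_patterns": ["very unique"],
}
author = {
    "thresholds": {"passive_voice_pct": 15},
    "banned_patterns": ["suddenly"],
}
merged = merge_standards(bundled, author)
```

In the merged result the passive-voice threshold is the author's, the adverb density is still the default, and the scanner's list carries both patterns in load order.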

The bundled file is the engine. The author’s file is the settings.

A reader who has been through The Standards Document will recognize the move. That piece made the argument for what a standards document is and how it works as a voice contract. It did not answer the question of how an author who does not have one yet gets there from a standing start. The standards builder is the answer. The interview produces the file. The file plus the bundled foundation produces the evaluation.

The foundation never argues with the author’s voice. It just runs the math.

What Else Changed

A few smaller decisions worth naming.

The character_names field used to pull from the developer’s manuscript. Now it pulls from the author’s. Section 7 of the interview is the entry point. The scanner reads the list before it runs the spellchecker, so character names never appear as misspellings. The hallucination-marker scan reads the same list, so a character’s name never appears as a flagged artifact of broken generation. One field, two consumers, both correct.

The setext heading fix gets one sentence here because it is a one-sentence story. Canary’s preprocessor handled ATX-style headings (“## Section Name”) but missed setext-style (“Section Name\n========”). A real document tripped it. The fix took ten minutes and would never have surfaced if I had only ever run the tool against the manuscript it grew up on. Testing against real prose finds real bugs.
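For anyone who has not met the two heading styles side by side, a rough sketch of stripping both. This is not the code in strip_engine.py. The regexes and function name are mine, and the setext handling is simplified to a single heading line above the underline.

```python
import re

# ATX: up to three spaces, one to six '#', then whitespace and text (or nothing).
ATX_RE = re.compile(r"^ {0,3}#{1,6}(?:[ \t]+.*)?$")
# Setext underline: a run of '=' or '-' alone on a line.
SETEXT_UNDERLINE_RE = re.compile(r"^ {0,3}(=+|-+)[ \t]*$")

def strip_headings(text):
    """Remove ATX and single-line setext headings from markdown text."""
    kept = []
    prev_is_text = False  # True when the last kept line could be setext heading text
    for line in text.split("\n"):
        if ATX_RE.match(line):
            prev_is_text = False
            continue
        if prev_is_text and SETEXT_UNDERLINE_RE.match(line):
            kept.pop()  # drop the heading text that sat above the underline
            prev_is_text = False
            continue
        kept.append(line)
        prev_is_text = bool(line.strip())
    return "\n".join(kept)
```

A line of dashes after a blank line is left alone, since there it reads as a rule rather than an underline. The ATX-only version of this function is exactly the shape of the bug: it keeps both the title and the row of equals signs, and both end up in the metrics.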

Flesch-Kincaid grade came out of the metric set entirely. It fought with sentence variety on every run: push variety up by mixing long sentences with short ones, FK grade rises with the long ones; push FK down by shortening sentences across the board, variety collapses. Two measures moving against each other on the same draft produces noise, not signal. The simpler fix was to remove FK and let sentence variety carry the structural-rhythm read on its own.
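The tension shows up in a few lines of arithmetic. The grade formula below is the standard Flesch-Kincaid one. The rest is assumption: I am treating sentence variety as the standard deviation of sentence lengths, which may not be how Canary scores it, and holding syllables per word fixed so only sentence length moves.

```python
from statistics import mean, pstdev

def fk_grade(sentence_lengths, syllables_per_word=1.4):
    """Flesch-Kincaid grade from words per sentence; syllable rate held constant."""
    return 0.39 * mean(sentence_lengths) + 11.8 * syllables_per_word - 15.59

def variety(sentence_lengths):
    """Assumed stand-in for sentence variety: spread of sentence lengths."""
    return pstdev(sentence_lengths)

# Three hypothetical drafts, as word counts per sentence.
flat = [12, 12, 12, 12]     # uniform, mid-length
varied = [4, 30, 6, 28]     # long sentences mixed in with short ones
clipped = [6, 6, 6, 6]      # everything shortened across the board

for name, draft in [("flat", flat), ("varied", varied), ("clipped", clipped)]:
    print(f"{name}: grade={fk_grade(draft):.2f} variety={variety(draft):.2f}")
```

Going from flat to varied raises both numbers, because the long sentences pull the average up. Going from flat to clipped lowers the grade and leaves variety at zero. No edit to sentence length pleases both measures at once.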

Hallucination markers used to live hardcoded inside the engine. The strings [[INTERNAL DIALOGUE]] and [[THE BEAST]] sat in the source, because those were the generation artifacts surfacing during Vampires of Tucson sessions. Different authors generate different artifacts.

Now the markers are a field, populated by the author from whatever their own AI is leaving behind. The engine reads the field, scans for the strings, flags them. The mechanism is the same. The list is no longer someone else’s.
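The mechanism is small enough to sketch in full. The function name and return shape are assumptions for illustration; the point is only that the marker list arrives as data from the author's file instead of living in the source.

```python
def scan_markers(text, markers):
    """Return (line_number, marker) for every marker string found in the text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for marker in markers:
            # Plain substring match: markers are literal strings, not regexes.
            if marker in line:
                hits.append((lineno, marker))
    return hits

# The two markers from my own drafts; another author would supply theirs.
draft = "He stopped at the door.\n[[THE BEAST]] The hunger rose.\nHe went in."
found = scan_markers(draft, ["[[INTERNAL DIALOGUE]]", "[[THE BEAST]]"])
```

Pass an empty list and the scan finds nothing, which is the right default for an author whose drafts carry no markers at all.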

Peer tools in the landscape: ProWritingAid ships with no config and no preprocessing layer. NovelCrafter and Sudowrite are generative tools, not evaluative ones. They write prose; they don’t score it. None of the three have a first-run interview that builds a standards file from scratch, because none of the three are doing the kind of evaluation work that would require one. The differentiation is not which tool scores better. It is which tool knows what it is reading.

The Door Is Open

The repo is at github.com/elf1024-vot/canary. Canary is now at V1.3. The interview lives in standards_builder.md, the engine in strip_engine.py, the bundled foundation in _standards.md, and the author override in the standards.md the interview produces. Four files, one pipeline.

A new author runs the interview, answers the seven sections, drops the output next to the bundled file, and runs the tool against a chapter. The first report comes back tuned to their voice, not someone else’s, with the exclusion list belonging to their manuscript, the character names belonging to their cast, and the thresholds reflecting the craft positions the interview just walked them through naming.

It doesn’t know your voice yet. It’s about to ask.
