Pre-publish Gate Design: Balancing False Positives and False Negatives

The HUMAN_INPUT marker mentioned in this article is a placeholder left by the AI writing skill to indicate where a human should later fill in a specific value.

When we tried to write an article titled “Why We Withdrew from AI-Generated Mass Blogging in 5 Days,” the gate reported 16 fails. There were 10 instances of the string HUMAN_INPUT and 6 instances of the path /blog/—both were in prose explaining failure cases, not actual placeholders or dead links. If you make the gate too strict, articles trying to disclose failures won’t pass. If you make it too loose, actual unfinished markers will be published. The answer to which is more fatal depends on the context.

→ For the overall structure of the gate, see Structural Design of the Owned Media Operation Engine mcluhan. This article focuses on the design balance between false positives and false negatives. Technical terms are glossed in parentheses for non-engineer readers.

Defining False Positives and False Negatives

False Positive: Valid Articles are Rejected (Hindering Production)

A false positive occurs when the gate stops an article even though there is no real issue.

A specific example Yakumo experienced: An article explaining a failure case contained the description “7 HUMAN_INPUT markers remained.” The gate found that string (HUMAN_INPUT), determined that “unfinished placeholders remain,” and issued a fail. In reality, it was just descriptive text talking about HUMAN_INPUT.

If a gate keeps producing false positives, the articles that most need writing can no longer be written. The very concept of an owned media site that “discloses its own failures” falls apart.

False Negative: Problematic Articles Pass (Defective Articles Reach Production)

A false negative occurs when the gate overlooks an actual issue.

A specific example Yakumo experienced during Phase 0: 4 articles were published with HUMAN_INPUT markers still in them. There were 76 references to internal links that did not exist. These reached production because the gate didn’t exist (or its implementation was too weak).

If a gate keeps producing false negatives, substandard articles keep getting published.

In the draft of the pillar article “Why We Withdrew from AI-Generated Mass Blogging in 5 Days,” the actual false positives generated were 10 for G1 (remaining HUMAN_INPUT) and 6 for G2 (dead links), totaling 16. In Phase 0, the false negatives were 4 HUMAN_INPUT leaks and 76 dead links published to production.

Criticality Depends on Context

Which of the two types of problems is more serious depends on the operational phase and purpose of the owned media.

Perspective	When False Positives are Fatal	When False Negatives are Fatal
Concept	Failure-disclosing owned media cannot write about failures	Quality-assured owned media publishes defective articles
SEO Risk	Decreased information volume due to inability to write	Substandard articles are indexed by Google
Reader Experience	Some articles cannot be written (invisible to readers)	Published articles are unfinished (directly hits readers)

This has the same structure as the question of whether to prioritize SEO or reader experience. In Yakumo’s case, because “disclosing failures is the selling point,” we treat preventing false positives and false negatives as equally important.

3 Levels of Analysis Granularity and Trade-offs

Level 1 (Plain Match): High False Positives / Low Implementation Cost

body.includes('HUMAN_INPUT')

This searches for strings across the entire body. Implementation is a single line, but it doesn’t distinguish between code blocks, inline code, or explanations in prose. Articles explaining failure cases will structurally produce false positives.
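
To make the structural false positive concrete, here is a minimal sketch of a Level 1 check. The names MARKER and plainMatchFails and both sample sentences are illustrative assumptions, not the gate's actual code.

```typescript
// Level 1: plain substring match over the whole body.
// MARKER and plainMatchFails are hypothetical names used only for this sketch.
const MARKER = "HUMAN_INPUT";

function plainMatchFails(body: string): boolean {
  return body.includes(MARKER);
}

// A real leftover placeholder: the check should fail it, and does.
const unfinished = "Our revenue grew by HUMAN_INPUT percent last year.";

// Prose that only talks about the marker: it should pass, but fails as well.
const failureReport =
  "In Phase 0, 4 articles were published with HUMAN_INPUT markers still in them.";

console.log(plainMatchFails(unfinished));    // true
console.log(plainMatchFails(failureReport)); // true (a false positive)
```

Both bodies are treated identically because a substring search has no notion of where the match sits or what the surrounding text says.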

Appropriate use: Detecting “words that must never appear anywhere in the body.” For example, forbiddenWords (revolutionary, innovative, DX, etc.) are NG in any context, so Level 1 is sufficient.

Level 2 (Code-Stripped): Intermediate

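// stripCodeContexts returns the body with fenced code blocks and inline code removed
const stripped = stripCodeContexts(body);
// Placeholder regex: substitute the pattern for the specific marker or link format
stripped.match(/pattern of specific format/)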

This removes fenced code blocks (enclosed in triple backticks) and inline code (enclosed in single backticks) as preprocessing before matching with regex (regular expressions). It can distinguish between “text explaining code” and “actual placeholders.”

Appropriate use: Cases like HUMAN_INPUT markers or internal link paths, where they are “allowed in prose but forbidden as actual markers or links.”
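
One possible shape for the preprocessing step is sketched below. The article only names stripCodeContexts; the regexes, the MARKER_PATTERN constant, and the hasUnfinishedMarker helper are assumptions made for illustration, and a real implementation may handle more edge cases (nested or unbalanced backticks, for example).

```typescript
// Build the fence string at runtime so this example does not contain a literal fence.
const FENCE = "`".repeat(3);

// Fenced code blocks: everything between a pair of triple-backtick fences.
const FENCED_BLOCK = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g");

// Inline code: a single-backtick span that stays on one line.
const INLINE_CODE = /`[^`\n]*`/g;

// Sketch of the preprocessing step; the real stripCodeContexts may differ.
function stripCodeContexts(body: string): string {
  // Remove fenced blocks first so their backticks are not read as inline code.
  return body.replace(FENCED_BLOCK, "").replace(INLINE_CODE, "");
}

// Assumed marker format; no "g" flag, so test() keeps no state between calls.
const MARKER_PATTERN = /HUMAN_INPUT/;

function hasUnfinishedMarker(body: string): boolean {
  return MARKER_PATTERN.test(stripCodeContexts(body));
}

const explained = "7 placeholders called `HUMAN_INPUT` remained.";
const leftover = "The price is HUMAN_INPUT yen.";

console.log(hasUnfinishedMarker(explained)); // false
console.log(hasUnfinishedMarker(leftover));  // true
```

The order of the two replacements matters: stripping inline code first could pair up the backticks of a fence and leave the block's contents behind.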

Level 3 (Proximity + Allow-list): Low False Positives / High Implementation Cost

// Exclude via allow-list first
// Exclude via externalReferencePatterns
// Apply proximity (within 50 characters) to remaining matches

This determines “who is being referred to.” Implementation requires multiple judgment layers, so the cost is high. However, it can minimize false positives.

Appropriate use: Rules like G10 (brand rule self-reference check), where the same word is OK or NG depending on the context.
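
The three judgment layers could be arranged roughly as follows. This is a sketch under stated assumptions: the ProximityRule shape, the selfMarkers field, the findSelfReferences function, and every sample phrase are hypothetical. Only the order of the layers, the externalReferencePatterns name, and the 50-character window come from the description above.

```typescript
// Hypothetical rule shape; only externalReferencePatterns is a name from the article.
interface ProximityRule {
  word: string;                        // the word whose meaning depends on context
  allowList: string[];                 // phrases that are always acceptable
  externalReferencePatterns: RegExp[]; // phrasings that point at someone else
  selfMarkers: RegExp[];               // assumed cues that the text is about ourselves
  windowSize: number;                  // characters inspected on each side of a match
}

// Returns the index of every occurrence judged to be a self-reference.
function findSelfReferences(body: string, rule: ProximityRule): number[] {
  const flagged: number[] = [];
  const target = new RegExp("\\b" + rule.word + "\\b", "gi");
  let match: RegExpExecArray | null;
  while ((match = target.exec(body)) !== null) {
    const start = Math.max(0, match.index - rule.windowSize);
    const end = match.index + match[0].length + rule.windowSize;
    const context = body.slice(start, end);

    // Layer 1: exclude via the allow-list.
    if (rule.allowList.some((phrase) => context.includes(phrase))) continue;
    // Layer 2: exclude via externalReferencePatterns.
    if (rule.externalReferencePatterns.some((re) => context.search(re) !== -1)) continue;
    // Layer 3: flag only if a self-reference cue sits inside the proximity window.
    if (rule.selfMarkers.some((re) => context.search(re) !== -1)) {
      flagged.push(match.index);
    }
  }
  return flagged;
}

// Illustrative configuration, not Yakumo's actual brand rules.
const g10Sketch: ProximityRule = {
  word: "company",
  allowList: ["parent company"],
  externalReferencePatterns: [/\b(?:their|client's) company\b/i],
  selfMarkers: [/\b(?:our|we)\b/i],
  windowSize: 50,
};

console.log(findSelfReferences("Our company ships fast.", g10Sketch));               // [4]
console.log(findSelfReferences("Their company migrated to a new CMS.", g10Sketch)); // []
```

Each layer only removes or confirms candidates left by the previous one, which is where the implementation cost comes from: three lists have to be maintained instead of one pattern.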

Comparison Table of the 3 Levels

Level	Analysis Method	False Positive Risk	Implementation Cost	Example Applied Rule
L1	Plain match	High (no context distinction)	Low	forbiddenWords (strictly forbidden)
L2	Code-stripped + regex	Medium	Medium	G1 (HUMAN_INPUT) / G2 (dead link)
L3	Allow-list + pattern + proximity	Low	High	G10 (self-reference check)

Assigning the appropriate level to each rule is the core of designing a balance between false positives and false negatives.

Two-Stage Judgment Design: Fail / Warn

Fail: Rules that Block Transition to draft: false (Equivalent to G1–G13)

A fail is a judgment that “if this rule is triggered, the article cannot be published.” It physically blocks the transition to draft: false (ready for publication).

G rules (G1–G13) correspond to fails. Remaining HUMAN_INPUT, dead links, inconsistent author information, and G10 brand rule violations all represent states that must not be published. Currently, 13 G rules are implemented.

Warn: Recorded as Recommended Violations but Not Blocked (Equivalent to W1, W2, W3, W4, W5, W7)

A warn is a judgment that “a rule is triggered, but it won’t stop publication.” A record is kept, but the build passes.

W rules (W1, W2, W3, W4, W5, W7) correspond to warns. Publishing more than 6 articles in a day, or a description that falls outside the recommended character count, is a quality warning, not an absolute prohibition. Currently, 6 W rules are implemented.

Design Using Accumulated Warns as Review Criteria

Warns pile up. A reviewer can look at and judge states like “there are 5 articles with short descriptions” or “4 articles are concentrated on the same scheduledAt date.”

In terms of implementation, the gate execution results are output as a summary, listing the number and content of warns. The reviewer checks that summary and decides whether to “publish with the warns” or “publish after resolving the warns.” It’s designed to use warns as information for the reviewer rather than for complete automation.
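
The fail/warn split and the reviewer-facing summary might look something like the sketch below. The Finding and GateSummary types, the summarize function, and the sample messages are assumptions for illustration; the article does not show the gate's real output format.

```typescript
type Severity = "fail" | "warn";

// Hypothetical record of one triggered rule.
interface Finding {
  ruleId: string; // e.g. "G2" or "W4"
  severity: Severity;
  message: string;
}

// Hypothetical summary handed to the reviewer.
interface GateSummary {
  publishable: boolean; // false as soon as a single fail exists
  fails: Finding[];
  warns: Finding[];
}

function summarize(findings: Finding[]): GateSummary {
  const fails = findings.filter((f) => f.severity === "fail");
  const warns = findings.filter((f) => f.severity === "warn");
  // Warns never affect publishable; they are only listed for the reviewer.
  return { publishable: fails.length === 0, fails, warns };
}

const summary = summarize([
  { ruleId: "G2", severity: "fail", message: "internal link target does not exist" },
  { ruleId: "W4", severity: "warn", message: "description is shorter than recommended" },
]);

console.log(summary.publishable);  // false
console.log(summary.warns.length); // 1
```

Keeping warns out of the publishable decision is what leaves the final call with the reviewer rather than with the build.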

Handling Meta-content (Failure Disclosure Articles)

Contexts where Self-referential Content Structurally Occurs

The owned media Yakumo aims for has a self-referential structure where “improvements to the pipeline itself are made into articles.” If we improve the gate, we write an article about improving the gate; if we fail, we write an article about the failure. Given this structure, meta-content (articles explaining the pipeline or rules in the body) is unavoidable.

Types of meta-content include:

Failure disclosure articles: Explaining “7 HUMAN_INPUT markers remained” or “76 /blog/ links were dead” in the body.
Gate explanation articles: Explaining the contents of rules G1–G10 in the body (this article is one of them).
Pattern explanation articles: Introducing “externalReferencePatterns” or “stripCodeContexts” in the body.

All of these contain strings that are targets for the gate’s checks.

Allow Pattern Definitions to Prevent Gate Self-contradiction

The solution for meta-content is Level 2 (code-stripping). Text explained within code blocks or inline code is excluded from detection.

If a failure disclosure article writes “7 placeholders called HUMAN_INPUT remained,” the rule is to denote HUMAN_INPUT as inline code enclosed in backticks. After stripping, the string becomes “7 placeholders called remained,” and it won’t trigger G1.

This can be solved structurally just by deciding on writing rules. By combining the writing convention—“when explaining HUMAN_INPUT in meta-content, enclose it in inline code or code blocks”—with the Level 2 gate implementation, self-contradiction does not occur.

Assigning Appropriate Analysis Levels by Rule

G1 / G2 → Level 2 or higher (code-stripped regex)

As shown in the previous section’s specific examples, Level 2 is appropriate for G1 (remaining HUMAN_INPUT) and G2 (dead link). Detection is limited to specific formats after removing code contexts.
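
For G2, "limited to specific formats" could mean matching only real link syntax. The sketch below assumes that internal links are written as Markdown links to /blog/ paths and that the input has already been code-stripped; the findDeadLinks function, the regex, and the sample paths are all hypothetical.

```typescript
// Assumed link format: a Markdown link whose target starts with /blog/.
const INTERNAL_LINK = /\]\((\/blog\/[^)\s]+)\)/g;

// strippedBody is expected to have code contexts removed already (Level 2 preprocessing).
function findDeadLinks(strippedBody: string, knownPaths: Set<string>): string[] {
  const dead: string[] = [];
  // Fresh regex per call so lastIndex state never leaks between calls.
  const pattern = new RegExp(INTERNAL_LINK.source, "g");
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(strippedBody)) !== null) {
    const path = match[1];
    if (!knownPaths.has(path)) dead.push(path);
  }
  return dead;
}

const known = new Set<string>(["/blog/gate-design"]);
const sampleBody =
  "See [the gate post](/blog/gate-design) and [an old post](/blog/missing). " +
  "In Phase 0, 76 /blog/ links were dead.";

console.log(findDeadLinks(sampleBody, known)); // ["/blog/missing"]
```

The bare /blog/ mention in the last sentence is ignored because it is not link syntax, while the link to a nonexistent page is still caught.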

G10 → Level 3 (proximity + allow-list)

G10 (brand rule self-reference check) requires Level 3. Whether the word “company” refers to ourselves is decided within a proximity window, after matches covered by the allow-list and externalReferencePatterns have been excluded.

W Series → Design for Keeping Plain Matches as Warns

W rules (rate-limit, description length, etc.) are judged using plain matches, and the results are recorded as warns. Since they are not fails, publication is not stopped. The reviewer makes a judgment by checking the number and content of warns.

Whether to promote a W rule to a fail is judged through operation. If a situation where “10 articles with short descriptions are released every week” continues, we can consider changing W4 to a fail. Start with warns and judge based on operational data.

At this point, no rules have been promoted from warn to fail. There is, however, one example of the opposite. G1 (remaining HUMAN_INPUT) was initially operated as “warn at draft / fail at publication,” but once we implemented a mechanism on the render side where isPublished() automatically treats articles containing HUMAN_INPUT as private, we demoted the gate side to always warn. This keeps articles that are still being written from stopping the build; the second line of defense now lives in the render layer rather than in the gate.
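
A render-side check of this kind could be sketched as follows. Only the name isPublished() comes from the text above; the Article shape, the extra hasMarker parameter, and the plain detector are assumptions, and the article does not say which analysis level the real function uses.

```typescript
// Hypothetical article shape; draft: false means ready for publication.
interface Article {
  draft: boolean;
  body: string;
}

type MarkerDetector = (body: string) => boolean;

// Sketch only: the real isPublished() may take different arguments.
// The detector is injected so the analysis level stays a separate decision.
function isPublished(article: Article, hasMarker: MarkerDetector): boolean {
  if (article.draft) return false;
  // Even with draft: false, an unfinished marker keeps the article private.
  return !hasMarker(article.body);
}

// Simplest possible detector, used here only to exercise the sketch.
const plainDetector: MarkerDetector = (body) => body.includes("HUMAN_INPUT");

const unfinishedArticle: Article = { draft: false, body: "The price is HUMAN_INPUT yen." };
const finishedArticle: Article = { draft: false, body: "The price is 500 yen." };

console.log(isPublished(unfinishedArticle, plainDetector)); // false
console.log(isPublished(finishedArticle, plainDetector));   // true
```

With the render layer hiding unfinished articles, a gate warn is enough to surface the marker without breaking the build for work in progress.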

W4, W5, and W7 are operated as warns as originally designed. Description/title length and series inconsistencies haven’t reached thresholds like “10 per week” in operational data, and the current system where reviewers look at warns and make individual judgments is functioning sufficiently.

Summary

The balance between false positives and false negatives in the gate can be achieved by a design that “assigns appropriate analysis levels to rules and differentiates between fail and warn.”

To summarize, there are two points:

Choose the analysis level according to the nature of the rule. Strictly forbidden words use L1 (plain), detection requiring distinction between prose and code uses L2 (code-stripped), and context-dependent judgment uses L3 (proximity + allow-list).
Determine the boundary between fail and warn based on “whether it is a state that must not be published”. Unfinished markers, dead links, and brand rule violations are fails; recommended violations are warns. Warns are used as information for the reviewer and are not completely automated.

The answer to the question “will making the gate too strict make it impossible to write?” is “your analysis level is wrong.” It is expected that a gate written at Level 1 will get stuck on meta-content, and this can be resolved by moving to Level 2 or Level 3.

For details on making brand rules a single source of truth (SSOT) and on allow-list design, refer to Allow-list Driven Brand Check. For technical implementation details, refer to Automating Brand Rule Machine Checks and Advancing the Pre-publish Gate.