The HUMAN_INPUT marker mentioned in this article is a placeholder left by the AI writing skill to indicate where a human should later fill in a specific value (e.g., <!-- HUMAN_INPUT: Fill in numerical value -->).
When you start mass-producing articles with AI, there is a landmine you might step on without realizing it: corporate-implying expressions like “our company,” “internal,” or “in-house” naturally mix into the generated text. For solo practitioners or small teams yet to incorporate, this isn’t just a matter of wording. It breaks the consistency of the brand narrative, creating a gap between the reader’s perception and the reality. The structural solution to this problem is a hybrid detection design combining forbiddenWords and contextual proximity detection.
The specific trigger occurred during the writing of a pillar article scheduled for publication on 2026-05-22. When we ran blog-fact-check, corporate-implying expressions like “operated by our company” or “completed in-house” were detected across multiple pillar and spoke articles. Since Yakumo is a solo practitioner and not yet incorporated, these are all narrative inconsistencies (see the retro from 2026-05-20). While the total count wasn’t recorded at the time, the fact that they appeared across multiple articles was enough to decide that we needed a layer of machine prevention before human review.
Why Simple forbiddenWords are Not Enough
Terms like “Our Company” Can be Detected with Simple Matches
Terms like “弊社” (heisha - our company), “当社” (tousha - our company), or “我が社” (wagasha - our firm) are expressions Yakumo must not use regardless of context. These can be covered by simple string matching.
# Equivalent to forbiddenWords in brand.ts
forbidden_exact:
- "弊社"
- "当社"
- "我が社"
- "弊所"
These are NG in any context. The only cases that should be handled via an allow-list are when quoting someone else’s statement (e.g., quoting a representative from Company A saying, “Our company is collaborating with…”).
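The exact-match layer can be sketched in a few lines of TypeScript. This is a minimal illustration, not the actual implementation: the function name `findForbiddenExact` and the `ExactHit` shape are hypothetical, and only the word list comes from the configuration above.

```typescript
// Word list taken from the forbidden_exact example above.
const forbiddenExact: string[] = ["弊社", "当社", "我が社", "弊所"];

// Hypothetical result shape: which word matched and where.
interface ExactHit {
  word: string;
  index: number; // offset of the match, in UTF-16 code units
}

// Hypothetical helper: reports every occurrence of every forbidden word.
function findForbiddenExact(text: string, words: string[]): ExactHit[] {
  const hits: ExactHit[] = [];
  for (const word of words) {
    if (word.length === 0) continue; // an empty word would match everywhere
    let from = 0;
    while (true) {
      const index = text.indexOf(word, from);
      if (index === -1) break;
      hits.push({ word, index });
      from = index + word.length;
    }
  }
  // Sort by position so the report follows reading order.
  return hits.sort((a, b) => a.index - b.index);
}

const exactHits = findForbiddenExact("弊社のブログは当社で運用しています。", forbiddenExact);
console.log(exactHits);
```

Because these words are prohibited regardless of context, no surrounding text needs to be inspected at this layer.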
Terms like “In-house” are Context-Dependent: OK for Generalities, NG for Self-Reference
The challenge lies in terms like “自社” (jisha - one’s own company) or “社内” (shanai - in-house/internal).
- “By inventorying your in-house tech stack, engineer hiring criteria become clear.” → OK (General theory for the reader’s organization)
- “At Yakumo, we use AI agents for our in-house blog operations.” → NG (Referring to ourselves)
With simple string matching, even “in-house” used as a general theory for the reader would be flagged as NG, causing a massive number of false positives. This could end up negating the very point of the article.
Designing Allow-lists to Exclude External Quotes
To maintain detection accuracy, an allow-list is necessary.
allow_patterns:
  - context: "quote" # When quoting statements from other companies/people
    example: "The official stated, 'At our company...'"
In designing an allow-list, it is more practical to focus on reducing false positives (erroneous detections) while tolerating some false negatives (overlooked issues). Overlooked issues can be caught during human review. Erroneous detections, on the other hand, break the article’s logic.
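One way to implement the quote allow-list is to mask quoted spans before matching. The sketch below assumes that quotations are delimited by 「…」, 『…』, or “…” and that every quoted span is someone else's statement. Both are simplifying assumptions, and the names `maskQuotes` and `findOutsideQuotes` are hypothetical. Treating all quoted text as allowed errs toward false negatives, which matches the priority described above.

```typescript
// Assumption: quotes are not nested. A nested quote would end the span early.
const QUOTE_SPAN = /「[^」]*」|『[^』]*』|“[^”]*”/g;

// Replace each quoted span with filler of the same length so that
// offsets in the masked text still line up with the original.
function maskQuotes(text: string): string {
  return text.replace(QUOTE_SPAN, (span) => "\u25A1".repeat(span.length));
}

// Hypothetical helper: returns the forbidden words that appear outside quotes.
function findOutsideQuotes(text: string, words: string[]): string[] {
  const masked = maskQuotes(text);
  return words.filter((word) => word.length > 0 && masked.includes(word));
}

const quoted = "担当者は「弊社では連携を進めています」と述べた。";
const unquoted = "弊社では「連携」を進めています。";
console.log(findOutsideQuotes(quoted, ["弊社"]));   // the word sits inside a quote
console.log(findOutsideQuotes(unquoted, ["弊社"])); // the word sits outside the quote
```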
Proximity Detection Design: The 50-Character Rule
Detect Only When Terms like “In-house” Appear Within 50 Characters of “Yakumo”
The design established in the 2026-05-20 retro is as follows:
If terms like "in-house" or "own company" appear:
→ Check if "Yakumo" exists within 50 characters before or after
→ If it exists: NG (High probability of referring to Yakumo)
→ If it doesn't: Skip (Treat as a general theory for the reader's organization)
The threshold of 50 characters roughly corresponds to a single Japanese sentence (average 40–60 characters). The design is intended to capture cases where a self-reference and a corporate expression co-occur in the same clause or sentence.
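The rule above can be sketched as follows. The field names mirror the contextualForbidden settings described in this article, but the function `findProximityHits` and its result shape are hypothetical. This version counts UTF-16 code units rather than user-perceived characters, which is equivalent for typical Japanese text but is still an assumption.

```typescript
interface ProximityRule {
  words: string[];            // context-dependent terms such as "自社" and "社内"
  proximityTargets: string[]; // self-references such as "Yakumo"
  proximityChars: number;     // window size on each side of the term
}

interface ProximityHit {
  word: string;
  index: number;
  target: string;
}

// Hypothetical helper: flags a term only when a self-reference lies
// entirely within `proximityChars` characters before or after it.
function findProximityHits(text: string, rule: ProximityRule): ProximityHit[] {
  const hits: ProximityHit[] = [];
  for (const word of rule.words) {
    if (word.length === 0) continue;
    let from = 0;
    while (true) {
      const index = text.indexOf(word, from);
      if (index === -1) break;
      const start = Math.max(0, index - rule.proximityChars);
      const end = Math.min(text.length, index + word.length + rule.proximityChars);
      const nearby = text.slice(start, end);
      const target = rule.proximityTargets.find((t) => t.length > 0 && nearby.includes(t));
      if (target !== undefined) hits.push({ word, index, target });
      from = index + word.length;
    }
  }
  return hits;
}

const rule: ProximityRule = { words: ["自社", "社内"], proximityTargets: ["Yakumo"], proximityChars: 50 };
// Self-reference: flagged.
console.log(findProximityHits("Yakumoでは自社ブログの運用にAIエージェントを使っています。", rule));
// General advice for the reader's organization: skipped.
console.log(findProximityHits("自社の技術スタックを棚卸しすると、エンジニア採用の基準が明確になります。", rule));
```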
Rationale and Adjustment for Proximity Distance
The 50-character value is based on “typical Japanese sentence length.” It’s possible to expand the threshold for articles with many long sentences or narrow it for those with primarily short sentences.
When adjusting, use the following criteria:
- Too many false negatives (NG but overlooked): Expand the threshold
- Too many false positives (OK but flagged as NG): Narrow the threshold
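To choose a threshold with data rather than intuition, one option is to count both error types over a small set of human-labeled samples. The sketch below is illustrative only: the `LabeledSample` shape, the simplified `flags` detector, and `countErrors` are hypothetical, and the labeled samples would have to come from past human reviews.

```typescript
interface LabeledSample {
  text: string;
  isViolation: boolean; // label assigned by a human reviewer
}

// Simplified detector: true if any term has any target within `chars` characters.
function flags(text: string, words: string[], targets: string[], chars: number): boolean {
  return words.some((word) => {
    if (word.length === 0) return false;
    for (let i = text.indexOf(word); i !== -1; i = text.indexOf(word, i + word.length)) {
      const nearby = text.slice(Math.max(0, i - chars), i + word.length + chars);
      if (targets.some((t) => t.length > 0 && nearby.includes(t))) return true;
    }
    return false;
  });
}

// Count false positives and false negatives for one candidate threshold.
function countErrors(
  samples: LabeledSample[],
  words: string[],
  targets: string[],
  chars: number,
): { falsePositives: number; falseNegatives: number } {
  let falsePositives = 0;
  let falseNegatives = 0;
  for (const sample of samples) {
    const flagged = flags(sample.text, words, targets, chars);
    if (flagged && !sample.isViolation) falsePositives++;
    if (!flagged && sample.isViolation) falseNegatives++;
  }
  return { falsePositives, falseNegatives };
}

// Illustrative samples, not real review data.
const samples: LabeledSample[] = [
  { text: "Yakumoでは自社ブログを運用しています。", isViolation: true },
  { text: "自社の技術スタックを棚卸ししましょう。この記事はYakumoが書いています。", isViolation: false },
];
for (const chars of [10, 50, 100]) {
  console.log(chars, countErrors(samples, ["自社", "社内"], ["Yakumo"], chars));
}
```

Comparing the counts across candidate values shows which direction to move: a wider window trades false negatives for false positives, and a narrower one does the reverse.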
The Trade-off Between False Positives and False Negatives
Perfect detection does not exist. The purpose of the design is to have a machine catch violations that humans easily miss. It should be positioned as a machine filter inserted before human final judgment, not complete automation.
False positives (flagging correct usage as NG) degrade article quality. False negatives (overlooking violations) can be caught by human review. The design priority is to suppress false positives.
Double Defense via SSOT and Gate
Consolidating Vocabulary Rules into an SSOT in brand.ts and BRAND.md
Rules for corporate vocabulary are converted into SSOT in two places:
- src/config/brand.ts: Structured data referenced by programs
    company: {
      size: "solo",
      positioning: "Solo Practitioner",
      forbiddenVocabulary: ["弊社", "当社", "我が社", "弊所"],
      contextualForbidden: {
        words: ["自社", "社内"],
        proximityTargets: ["Yakumo"],
        proximityChars: 50,
      }
    }
- src/_docs/BRAND.md: Natural language guidelines referenced by humans (writers and AI skills)
  - Include voice, prohibitions, and recommended vocabulary in the Organization Narrative section.
  - Explaining the “why” behind these prohibitions makes it easier for the skill to understand the context.
Encoding Narratives into Writing Skills (blog-tech-write / blog-case-write)
Directly encode narrative prohibitions into the definitions of blog-tech-write and blog-case-write. This is a mechanism where the skill self-checks “am I using corporate terms in a context that refers to myself?” during text generation.
However, this is only a primary defense. LLMs can misinterpret context. Encoding on the skill side is about “significantly reducing violations,” not “preventing them entirely.”
Final Machine Check via blog-review Gate G10
The last of the three layers is Gate G10 of blog-review. This gate performs a machine check on the output from the writing skill, referencing the forbiddenVocabulary and contextualForbidden settings in the SSOT (brand.ts).
Writing Skill (Primary Defense: Narrative Encoding)
↓
blog-fill-ssot (Fills in HUMAN_INPUT placeholders with SSOT-derived values)
↓
blog-review Gate G10 (Secondary Defense: Machine Detection) ← Final check of corporate vocabulary here
↓
Human Review (Tertiary Defense: Final Judgment)
With this design, a multi-layered defense is established: “prevent at the writing stage,” “catch with machines after generation,” and “final confirmation by humans.”
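As a rough illustration of the machine-detection layer, the sketch below runs both checks against a settings object shaped like the company block shown earlier. It is not the actual Gate G10 code: the `CompanyBrand` and `Violation` types and the function `checkCorporateVocabulary` are hypothetical, and the real brand.ts may be structured differently.

```typescript
// Assumed subset of the `company` settings in brand.ts.
interface CompanyBrand {
  forbiddenVocabulary: string[];
  contextualForbidden: {
    words: string[];
    proximityTargets: string[];
    proximityChars: number;
  };
}

interface Violation {
  rule: "exact" | "proximity";
  word: string;
  index: number;
}

// Returns every start offset of `word` in `text`.
function indexesOf(text: string, word: string): number[] {
  const found: number[] = [];
  if (word.length === 0) return found;
  for (let i = text.indexOf(word); i !== -1; i = text.indexOf(word, i + word.length)) {
    found.push(i);
  }
  return found;
}

// Hypothetical gate function: an empty result means the draft passes.
function checkCorporateVocabulary(text: string, brand: CompanyBrand): Violation[] {
  const violations: Violation[] = [];
  // Layer 1: words that are prohibited in any context.
  for (const word of brand.forbiddenVocabulary) {
    for (const index of indexesOf(text, word)) {
      violations.push({ rule: "exact", word, index });
    }
  }
  // Layer 2: context-dependent words, flagged only near a self-reference.
  const { words, proximityTargets, proximityChars } = brand.contextualForbidden;
  for (const word of words) {
    for (const index of indexesOf(text, word)) {
      const nearby = text.slice(Math.max(0, index - proximityChars), index + word.length + proximityChars);
      if (proximityTargets.some((t) => t.length > 0 && nearby.includes(t))) {
        violations.push({ rule: "proximity", word, index });
      }
    }
  }
  return violations;
}

const brand: CompanyBrand = {
  forbiddenVocabulary: ["弊社", "当社", "我が社", "弊所"],
  contextualForbidden: { words: ["自社", "社内"], proximityTargets: ["Yakumo"], proximityChars: 50 },
};
console.log(checkCorporateVocabulary("Yakumoでは社内ブログを運用しています。", brand));
```

Because the function receives the settings as an argument, the rules live only in the SSOT and the gate logic never hard-codes a word list.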
Conclusion: Protect Narrative Consistency through Structure
With Ad-hoc Reviews, Violations Recur. A Three-Layered Approach (SSOT → Skill Encoding → Gate) is Necessary
A policy of “just being careful during each review” does not work. When mass-producing articles with AI, missing a check is a matter of probability, and the likelihood that some violation slips through increases as the number of articles grows.
Protecting through structure means establishing these three layers:
- SSOT: Normalize prohibitions in two places: brand.ts and BRAND.md.
- Skill Encoding: The AI understands and avoids prohibitions at the time of writing.
- Gate: A final checkpoint that mechanically checks the generated output.
Once this structure is in place, the same standards will apply even if a new member writes an article or we switch to a new AI model, as long as they reference the SSOT.
When the Time Comes to Incorporate, Just Update the SSOT and All Gates Follow
If Yakumo decides to incorporate in the future, this design requires only one update. By updating company.positioning and forbiddenVocabulary in brand.ts, Gate G10 of blog-review will automatically operate with the new rules.
A mechanism that protects narrative consistency can also follow when the narrative itself changes. That is the essential value of SSOT design.