
Corporate Vocabulary Proximity Detection: Solo Practitioner Brand Consistency

A hybrid design for detecting corporate terms using forbiddenWords and proximity rules to protect the solo practitioner narrative.

Published 2026-05-24 森本拓見

The HUMAN_INPUT marker mentioned in this article is a placeholder left by the AI writing skill to indicate where a human should later fill in a specific value (e.g., <!-- HUMAN_INPUT: Fill in numerical value -->).

When you start mass-producing articles with AI, there is a landmine you might step on without realizing it: corporate-implying expressions like “our company,” “internal,” or “in-house” naturally mix into the generated text. For solo practitioners or small teams yet to incorporate, this isn’t just a matter of wording. It breaks the consistency of the brand narrative, creating a gap between the reader’s perception and the reality. The structural solution to this problem is a hybrid detection design combining forbiddenWords and contextual proximity detection.

The immediate trigger came while writing a pillar article scheduled for publication on 2026-05-22. When we ran blog-fact-check, it detected corporate-implying expressions like “operated by our company” or “completed in-house” across multiple pillar and spoke articles. Since Yakumo is a solo practitioner and not yet incorporated, every one of these is a narrative inconsistency (see the retro from 2026-05-20). The total count wasn’t recorded at the time, but the fact that they appeared across multiple articles was enough for us to decide that we needed a machine-level safeguard before human review.


Why Simple forbiddenWords Are Not Enough

Terms like “Our Company” Can Be Detected with Simple Matches

Terms like “弊社” (heisha - our company), “当社” (tousha - our company), “我が社” (wagasha - our firm), or “弊所” (heisho - our office) are expressions Yakumo must not use regardless of context. These can be covered by simple string matching.

# Equivalent to company.forbiddenVocabulary in brand.ts
forbidden_exact:
  - "弊社"
  - "当社"
  - "我が社"
  - "弊所"

These are NG in any context. The only exception, which should be handled via an allow-list, is quoting someone else’s statement (e.g., quoting a representative from Company A who says, “Our company is collaborating with…”).
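A minimal TypeScript sketch of this exact-match layer follows. It assumes the terms arrive as a plain string array mirroring forbidden_exact; the names findExactMatches and Finding are illustrative, not the actual brand.ts or blog-review code.

```typescript
// Illustrative sketch: findExactMatches and Finding are hypothetical names.
interface Finding {
  word: string;  // the always-forbidden term that matched
  index: number; // offset in the text, in UTF-16 code units
}

// Report every occurrence of each always-forbidden term, regardless of context.
function findExactMatches(text: string, forbidden: readonly string[]): Finding[] {
  const findings: Finding[] = [];
  for (const word of forbidden) {
    // Walk through all occurrences, resuming just after each match.
    for (let i = text.indexOf(word); i !== -1; i = text.indexOf(word, i + word.length)) {
      findings.push({ word, index: i });
    }
  }
  // Sort by position so the report reads top to bottom.
  return findings.sort((a, b) => a.index - b.index);
}

const forbiddenExact = ["弊社", "当社", "我が社", "弊所"] as const;

const sample = "当社では、弊社のスタッフが対応します。";
// Two findings: 当社 at index 0 and 弊社 at index 5.
console.log(findExactMatches(sample, forbiddenExact));
```

Plain substring search is enough here precisely because these terms are NG everywhere; context only matters for the next group of words.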

Terms like “In-house” are Context-Dependent: OK for Generalities, NG for Self-Reference

The challenge lies in terms like “自社” (jisha - one’s own company) or “社内” (shanai - in-house/internal).

  • “By inventorying your in-house tech stack, you can clarify your engineer hiring criteria.” → OK (general advice about the reader’s organization)
  • “At Yakumo, we use AI agents for our in-house blog operations.” → NG (Yakumo referring to itself)

With simple string matching, even “in-house” used as general advice to the reader would be flagged as NG, producing a large number of false positives. Rewriting those sentences to satisfy the checker would strip out the reader-facing advice that is often the very point of the article.
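The false-positive problem is easy to reproduce. The sketch below applies a naive substring check for 自社 and 社内 to Japanese renderings of the two example sentences above; the renderings are our own illustration, not text from the affected articles.

```typescript
// Hypothetical naive check: context-dependent words treated as always forbidden.
const contextualWords = ["自社", "社内"];

function naiveFlags(sentence: string): boolean {
  return contextualWords.some((w) => sentence.includes(w));
}

// Illustrative renderings of the two examples above.
const generalAdvice = "自社の技術スタックを棚卸しすると、エンジニアの採用基準が明確になります。"; // intended OK
const selfReference = "Yakumoでは、社内のブログ運用にAIエージェントを使っています。"; // intended NG

console.log(naiveFlags(generalAdvice)); // true: a false positive
console.log(naiveFlags(selfReference)); // true: a correct detection
```

The check cannot tell the two sentences apart, because the only difference between them is who the sentence is about.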

Designing Allow-lists to Exclude External Quotes

To maintain detection accuracy, an allow-list is necessary.

allow_patterns:
  - context: "quote"  # When quoting statements from other companies/people
    example: "The official stated, 'At our company...'"

In designing an allow-list, it is more practical to focus on reducing false positives (erroneous detections) while tolerating some false negatives (overlooked issues). Overlooked issues can be caught during human review. Erroneous detections, on the other hand, break the article’s logic.
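One way to implement the quote allow-list is to mask quoted spans before any matching runs, so statements inside quotation marks are never inspected. This sketch assumes quotations are delimited by 「」, 『』, or “”, and maskQuotes is a hypothetical helper. Replacing each masked code unit with a single placeholder keeps the string length, and therefore every reported offset, unchanged.

```typescript
// Hypothetical helper for the quote allow-list.
// Assumption: quotes use 「」, 『』 or “”, and a quote is closed by its matching bracket.
const QUOTE_PAIRS = new Map<string, string>([
  ["「", "」"],
  ["『", "』"],
  ["“", "”"],
]);

// `placeholder` must be a single BMP character so that lengths are preserved.
function maskQuotes(text: string, placeholder = "＿"): string {
  let out = "";
  let closing: string | undefined = undefined; // bracket that ends the current quote
  // Iterate by UTF-16 code unit so the output length always equals the input length.
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (closing === undefined) {
      out += ch;
      closing = QUOTE_PAIRS.get(ch); // stays undefined unless ch opens a quote
    } else if (ch === closing) {
      out += ch;
      closing = undefined;
    } else {
      out += placeholder; // quoted content is allow-listed, so hide it
    }
  }
  return out;
}

const quoted = "担当者は「弊社では新しい取り組みを始めた」と語った。";
const masked = maskQuotes(quoted);
console.log(masked.includes("弊社")); // false: the quoted 弊社 is no longer visible
console.log(masked.length === quoted.length); // true: offsets are preserved
```

This errs toward false negatives by design: anything inside quotation marks is skipped, even a self-reference someone wraps in quotes, which is consistent with letting human review catch the rare miss.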


Proximity Detection Design: The 50-Character Rule

Flag Terms like “In-house” Only When “Yakumo” Appears Within 50 Characters

The design established in the 2026-05-20 retro is as follows:

If terms like "in-house" or "own company" appear:
  → Check if "Yakumo" exists within 50 characters before or after
  → If it exists: NG (High probability of referring to Yakumo)
  → If it doesn't: Skip (treat as general advice for the reader's organization)

A 50-character threshold roughly corresponds to one Japanese sentence (typically 40–60 characters). The design aims to catch cases where a self-reference and a corporate expression co-occur in the same clause or sentence.
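The rule translates into a short function. This is a sketch under a few assumptions: the settings mirror the contextualForbidden block kept in brand.ts, a target counts only if it falls entirely inside the window, and “characters” are UTF-16 code units, which equals the character count for ordinary Japanese text. findProximityHits and ProximityHit are illustrative names, not the actual Gate G10 code.

```typescript
// Illustrative sketch of the 50-character proximity rule (names are hypothetical).
interface ContextualRule {
  words: readonly string[];            // context-dependent terms, e.g. 自社 / 社内
  proximityTargets: readonly string[]; // self-reference markers, e.g. "Yakumo"
  proximityChars: number;              // window size on each side of the term
}

interface ProximityHit {
  word: string;   // the context-dependent term
  index: number;  // its offset in the text
  target: string; // the self-reference found nearby
}

function findProximityHits(text: string, rule: ContextualRule): ProximityHit[] {
  const hits: ProximityHit[] = [];
  for (const word of rule.words) {
    for (let i = text.indexOf(word); i !== -1; i = text.indexOf(word, i + word.length)) {
      // Window: up to proximityChars before the term and up to proximityChars after it.
      const nearby = text.slice(Math.max(0, i - rule.proximityChars), i + word.length + rule.proximityChars);
      const target = rule.proximityTargets.find((t) => nearby.includes(t));
      if (target !== undefined) {
        hits.push({ word, index: i, target }); // target nearby: likely self-reference, NG
      }
      // No target nearby: skip, treated as general advice for the reader.
    }
  }
  return hits.sort((a, b) => a.index - b.index);
}

const rule: ContextualRule = {
  words: ["自社", "社内"],
  proximityTargets: ["Yakumo"],
  proximityChars: 50,
};

// Self-reference: 社内 is well within 50 characters of "Yakumo", so it is flagged.
console.log(findProximityHits("Yakumoでは、社内のブログ運用にAIエージェントを使っています。", rule));
// General advice: no "Yakumo" anywhere, so nothing is flagged.
console.log(findProximityHits("自社の技術スタックを棚卸しすると、エンジニアの採用基準が明確になります。", rule));
```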

Rationale and Adjustment for Proximity Distance

The 50-character value is based on typical Japanese sentence length. The threshold can be widened for articles with many long sentences, or narrowed for articles made up mostly of short ones.

When adjusting, use the following criteria:

  • Too many false negatives (NG but overlooked): Expand the threshold
  • Too many false positives (OK but flagged as NG): Narrow the threshold
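To apply these criteria with evidence rather than intuition, you can score candidate thresholds against sentences whose correct verdict a human has already decided. The harness below is a hypothetical sketch: the four samples are invented for illustration, and in practice they would come from your own review history. It uses the same window logic as the 50-character rule.

```typescript
// Hypothetical tuning harness; samples are invented, not real review data.
interface LabeledSample {
  text: string;
  isViolation: boolean; // verdict from human review
}

const WORDS = ["自社", "社内"];
const TARGET = "Yakumo";

// Same window logic as the proximity rule: flag if TARGET is within `chars` of a word.
function isFlagged(text: string, chars: number): boolean {
  for (const word of WORDS) {
    for (let i = text.indexOf(word); i !== -1; i = text.indexOf(word, i + word.length)) {
      const nearby = text.slice(Math.max(0, i - chars), i + word.length + chars);
      if (nearby.includes(TARGET)) return true;
    }
  }
  return false;
}

function evaluate(samples: readonly LabeledSample[], chars: number) {
  let falsePositives = 0; // OK text flagged as NG
  let falseNegatives = 0; // NG text that slipped through
  for (const s of samples) {
    const flagged = isFlagged(s.text, chars);
    if (flagged && !s.isViolation) falsePositives++;
    if (!flagged && s.isViolation) falseNegatives++;
  }
  return { chars, falsePositives, falseNegatives };
}

const samples: LabeledSample[] = [
  { text: "Yakumoでは、社内のブログ運用にAIエージェントを使っています。", isViolation: true },
  { text: "自社の技術スタックを棚卸しすると、エンジニアの採用基準が明確になります。", isViolation: false },
  { text: "Yakumoは個人で活動していますが、読者の皆さんは自社の開発体制を見直してみてください。", isViolation: false },
  { text: "Yakumoでは記事の構成から校正、画像の作成、公開作業まで、すべて社内で完結させています。", isViolation: true },
];

for (const chars of [20, 50]) {
  console.log(evaluate(samples, chars));
}
```

With these invented samples, a 20-character window misses the long self-referencing sentence (one false negative), while a 50-character window catches it but also flags the sentence that addresses the reader right after mentioning Yakumo (one false positive). That is exactly the trade-off the two criteria describe.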

The Trade-off Between False Positives and False Negatives

Perfect detection does not exist. The purpose of the design is to have a machine catch violations that humans easily miss. It should be positioned as a machine filter inserted before human final judgment, not complete automation.

False positives (flagging correct usage as NG) degrade article quality. False negatives (overlooking violations) can be caught by human review. The design priority is to suppress false positives.


Double Defense via SSOT and Gate

Making brand.ts and BRAND.md the SSOT for Corporate Vocabulary

Rules for corporate vocabulary are kept as the SSOT (single source of truth) in two places, one for programs and one for humans:

  1. src/config/brand.ts: Structured data referenced by programs

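    company: {
      size: "solo",
      positioning: "Solo Practitioner",
      // Always NG: matched as plain strings
      forbiddenVocabulary: ["弊社", "当社", "我が社", "弊所"],
      // NG only when a proximity target appears nearby
      contextualForbidden: {
        words: ["自社", "社内"],
        proximityTargets: ["Yakumo"],
        proximityChars: 50, // characters checked before and after each word
      },
    }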
  2. src/_docs/BRAND.md: Natural language guidelines referenced by humans (writers and AI skills)

    • Include voice, prohibitions, and recommended vocabulary in the Organization Narrative section.
    • Explaining the “why” behind these prohibitions makes it easier for the skill to understand the context.
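Because brand.ts is read by programs, it can help to give the company block an explicit type plus a few runtime sanity checks, so a typo in the SSOT fails fast instead of silently weakening the gate. The interfaces and validateCompany below are assumptions for illustration; the article does not show how brand.ts is actually typed.

```typescript
// Hypothetical typing for the company block of brand.ts (not the actual file).
interface ContextualForbidden {
  words: string[];            // context-dependent terms such as 自社 / 社内
  proximityTargets: string[]; // self-reference markers such as "Yakumo"
  proximityChars: number;     // window size on each side of a term
}

interface CompanyBrand {
  size: string;        // "solo" today; other values are not specified in the article
  positioning: string;
  forbiddenVocabulary: string[];
  contextualForbidden: ContextualForbidden;
}

// Sanity checks the type system cannot express; returns a list of problems.
function validateCompany(c: CompanyBrand): string[] {
  const problems: string[] = [];
  const ctx = c.contextualForbidden;
  if (!Number.isInteger(ctx.proximityChars) || ctx.proximityChars <= 0) {
    problems.push("proximityChars must be a positive integer");
  }
  if (ctx.proximityTargets.length === 0) {
    problems.push("proximityTargets is empty, so contextual words can never be flagged");
  }
  for (const w of [...c.forbiddenVocabulary, ...ctx.words, ...ctx.proximityTargets]) {
    if (w.trim() === "") problems.push("a vocabulary list contains an empty string");
  }
  for (const w of ctx.words) {
    if (c.forbiddenVocabulary.includes(w)) {
      problems.push(`"${w}" is listed as both always-forbidden and context-dependent`);
    }
  }
  return problems;
}

const company: CompanyBrand = {
  size: "solo",
  positioning: "Solo Practitioner",
  forbiddenVocabulary: ["弊社", "当社", "我が社", "弊所"],
  contextualForbidden: {
    words: ["自社", "社内"],
    proximityTargets: ["Yakumo"],
    proximityChars: 50,
  },
};

console.log(validateCompany(company)); // []
```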

Encoding Narratives into Writing Skills (blog-tech-write / blog-case-write)

Encode the narrative prohibitions directly into the definitions of blog-tech-write and blog-case-write. This lets the skill check itself during text generation: “Am I using a corporate term in a context that refers to myself?”

However, this is only a primary defense. LLMs can misinterpret context. Encoding on the skill side is about “significantly reducing violations,” not “preventing them entirely.”

Final Machine Check via blog-review Gate G10

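The last of the three structural layers (SSOT, skill encoding, gate) is Gate G10 of blog-review. This gate performs a machine check on the writing skill’s output, referencing the forbiddenVocabulary and contextualForbidden settings in the SSOT (brand.ts). In the overall pipeline, it sits between the writing skills and human review: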

Writing Skill (Primary Defense: Narrative Encoding)
  ↓
blog-fill-ssot (fills in HUMAN_INPUT values that can be derived from the SSOT)
  ↓
blog-review Gate G10 (Secondary Defense: Machine Detection) ← Final machine check of corporate vocabulary
  ↓
Human Review (Tertiary Defense: Final Judgment)

With this design, a multi-layered defense is established: “prevent at the writing stage,” “catch with machines after generation,” and “final confirmation by humans.”
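Putting the machine-checkable rules together, a gate that reads both forbiddenVocabulary and contextualForbidden could look like the sketch below. The function runGateG10, its result type, and the 「」 quote masking are assumptions for illustration; this article does not show blog-review’s actual implementation.

```typescript
// Hypothetical sketch of Gate G10; the real blog-review code is not shown in the article.
// The settings shape mirrors the company block in brand.ts.
interface CompanyVocabulary {
  forbiddenVocabulary: readonly string[];
  contextualForbidden: {
    words: readonly string[];
    proximityTargets: readonly string[];
    proximityChars: number;
  };
}

interface GateFinding {
  kind: "exact" | "proximity";
  word: string;
  index: number;
}

interface GateResult {
  passed: boolean;
  findings: GateFinding[];
}

// Every start offset of `needle` in `haystack`.
function occurrences(haystack: string, needle: string): number[] {
  const found: number[] = [];
  for (let i = haystack.indexOf(needle); i !== -1; i = haystack.indexOf(needle, i + needle.length)) {
    found.push(i);
  }
  return found;
}

// Allow-list: mask text inside 「」 (assumed quote style), keeping the length unchanged.
function maskQuotes(text: string): string {
  let depth = 0;
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "「") depth++;
    const isBracket = ch === "「" || ch === "」";
    out += depth > 0 && !isBracket ? "＿" : ch;
    if (ch === "」" && depth > 0) depth--;
  }
  return out;
}

function runGateG10(text: string, vocab: CompanyVocabulary): GateResult {
  const masked = maskQuotes(text);
  const findings: GateFinding[] = [];

  // 1. Always-forbidden terms: any occurrence outside a quote is NG.
  for (const word of vocab.forbiddenVocabulary) {
    for (const index of occurrences(masked, word)) {
      findings.push({ kind: "exact", word, index });
    }
  }

  // 2. Context-dependent terms: NG only when a proximity target is nearby.
  const { words, proximityTargets, proximityChars } = vocab.contextualForbidden;
  for (const word of words) {
    for (const index of occurrences(masked, word)) {
      const nearby = masked.slice(Math.max(0, index - proximityChars), index + word.length + proximityChars);
      if (proximityTargets.some((t) => nearby.includes(t))) {
        findings.push({ kind: "proximity", word, index });
      }
    }
  }

  findings.sort((a, b) => a.index - b.index);
  return { passed: findings.length === 0, findings };
}

const vocab: CompanyVocabulary = {
  forbiddenVocabulary: ["弊社", "当社", "我が社", "弊所"],
  contextualForbidden: { words: ["自社", "社内"], proximityTargets: ["Yakumo"], proximityChars: 50 },
};

const draft =
  "A社の担当者は「弊社では新しい取り組みを始めた」と語った。" +
  "Yakumoでは、社内のブログ運用にAIエージェントを使っています。";
const result = runGateG10(draft, vocab);
console.log(result.passed);   // false
console.log(result.findings); // one "proximity" finding for 社内; the quoted 弊社 is ignored
```

Keeping the gate free of hard-coded words means it only ever reflects what the SSOT says, which is what lets the later layers stay in sync.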


Conclusion: Protect Narrative Consistency through Structure

Ad-hoc Reviews Let Violations Recur: A Three-Layered Approach (SSOT → Skill Encoding → Gate) Is Necessary

Relying on “just being careful during each review” does not work. When mass-producing with AI, missed checks are a matter of probability: the more articles you publish, the more likely it is that a violation slips through.
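To make “a matter of probability” concrete, here is a simplified model, assuming each article independently carries the same small probability p that a violation slips past a purely manual review. Across n articles, the chance that at least one gets through is:

```latex
P(\text{at least one violation slips through}) = 1 - (1 - p)^{n}
```

With hypothetical values p = 0.02 and n = 50, this gives 1 - 0.98^50 ≈ 1 - 0.364 ≈ 0.64. Even a reviewer who misses only one article in fifty is then more likely than not to let at least one violation through.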

Protecting through structure means establishing these three layers:

  1. SSOT: Define the prohibitions in two places, brand.ts and BRAND.md.
  2. Skill Encoding: The AI understands and avoids the prohibitions at writing time.
  3. Gate: A final checkpoint that mechanically checks the generated output.

Once this structure is in place, the same standards will apply even if a new member writes an article or we switch to a new AI model, as long as they reference the SSOT.

When the Time Comes to Incorporate, Just Update the SSOT and All Gates Follow

If Yakumo decides to incorporate in the future, this design requires only a single update. Once company.positioning and forbiddenVocabulary in brand.ts are updated, Gate G10 of blog-review automatically applies the new rules.
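The point is that the gate logic stays fixed and only the data changes. The sketch below illustrates this with a deliberately small check; the post-incorporation vocabulary is purely hypothetical, since the article does not say which terms would become acceptable.

```typescript
// Hypothetical minimal gate: the logic is fixed, the vocabulary comes from the SSOT object.
interface Vocabulary {
  positioning: string; // not used by this check; updated alongside the vocabulary
  forbiddenVocabulary: readonly string[];
}

function violations(text: string, vocab: Vocabulary): string[] {
  return vocab.forbiddenVocabulary.filter((w) => text.includes(w));
}

// Current SSOT: solo practitioner.
const soloBrand: Vocabulary = {
  positioning: "Solo Practitioner",
  forbiddenVocabulary: ["弊社", "当社", "我が社", "弊所"],
};

// Hypothetical post-incorporation SSOT: only the data changes.
const incorporatedBrand: Vocabulary = {
  positioning: "Corporation",               // placeholder value
  forbiddenVocabulary: ["我が社", "弊所"],  // assumed: 弊社 / 当社 become acceptable
};

const sentence = "弊社では記事制作にAIエージェントを活用しています。";
console.log(violations(sentence, soloBrand));         // [ '弊社' ]
console.log(violations(sentence, incorporatedBrand)); // []
```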

A mechanism built this way not only protects narrative consistency but can also follow the narrative when it changes. That is the essential value of SSOT design.