
Automating Brand Rule Checks — Context-Aware Detection of Self-Reference vs. External Reference

Prevent referring to Yakumo as a "company" with a context-aware gate. Learn to combine proximity analysis, allow-lists, and regex.

Published 2026-05-24 森本拓見

When integrating brand rule checks into a pre-publish gate, the first idea that usually comes to mind is adding forbidden words to an array. That approach hits a wall: it cannot distinguish self-reference from references to other companies. The moment you add “company” (会社) to forbiddenWords in brand.ts, the gate starts blocking legitimate phrases such as “Market Enterprise Co., Ltd.” (the formal name of another company, 株式会社マーケットエンタープライズ, which itself contains 会社) or “so-called AI development companies” (criticism of others). This is exactly why context-aware detection is necessary.

→ For the overall gate structure, please read Structural Design of mcluhan: The Operations Engine for Owned Media first. This article focuses specifically on the design of G10 (Brand Rule: Self-Reference Check).


Why a Flat Forbidden Word List Fails

Four Legitimate Patterns That Trigger False Positives

If you detect “company” or “firm” as forbidden words using flat string matching, all of the following descriptions will fail:

  1. Formal Names of Other Companies: “Served as the Head of AI DX Promotion at Market Enterprise Co., Ltd.”
  2. Contrast with Other Company Categories: “Unlike Web production companies, Yakumo focuses on business redesign.”
  3. Criticism of Other Companies: “Many so-called AI development companies only call APIs without designing business processes.”
  4. Past Experience: “Based on experience at a company I previously co-founded.”

All four patterns fall under the exception clause of the BRAND.md rule: “Do not call Yakumo itself a ‘company.’ Mentioning other companies is not an issue.” When a gate blocks them, the implementation has diverged from the rule’s intent.
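To make the failure concrete, here is a minimal sketch of a flat check. The function name `flatCheckFails`, the word list, and the short Japanese phrases are illustrative assumptions (the phrases are modeled on the four patterns above, not taken from real articles).

```typescript
// Hypothetical flat check: a phrase fails as soon as it contains a forbidden word.
const FLAT_FORBIDDEN_WORDS = ['会社', 'company', 'firm'];

function flatCheckFails(phrase: string): boolean {
  const lower = phrase.toLowerCase();
  return FLAT_FORBIDDEN_WORDS.some(word => lower.includes(word));
}

// Illustrative phrases, one per legitimate pattern.
const illustrativePhrases = [
  '株式会社マーケットエンタープライズ', // formal name of another company
  'Web制作会社',                         // contrast with another company category
  '自称AI開発会社',                      // criticism of other companies
  '共同創業した会社',                    // past experience
];

for (const phrase of illustrativePhrases) {
  // Every phrase is flagged, although none of them calls Yakumo a company.
  console.log(phrase, flatCheckFails(phrase));
}
```

All four phrases are rejected because the check sees only the substring 会社, never who the phrase is about.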

Brand Rules Apply Only to “References to Yakumo Itself”

To restate the rule more accurately: “Do not use terms like ‘company,’ ‘firm,’ or ‘corporation’ when Yakumo is the subject or in close proximity.” It is perfectly fine if the subject is another organization.

Implementing this distinction requires context judgment to determine “who is being referred to,” rather than just string matching. While full NLP analysis (morphological analysis or dependency parsing) is too costly to implement, a practical solution is a combination of proximity analysis, allow-lists, and regex patterns.


Detecting Self-Reference via Proximity Analysis

Logic for Determining if “Yakumo” Appears Within N Characters

The concept of the proximity window used for self-reference judgment is simple. When the word “company” (会社) is found, if “Yakumo,” “八雲,” or “私たち” (“we”) appears within N characters before or after it, the match is judged a self-reference.

function isProximateSelfReference(body: string, match: RegExpMatchArray): boolean {
  const PROXIMITY_WINDOW = 50; // Number of characters (before and after)
  const SELF_IDENTIFIERS = ['Yakumo', '八雲', '私たち'];

  const start = Math.max(0, match.index! - PROXIMITY_WINDOW);
  const end = Math.min(body.length, match.index! + match[0].length + PROXIMITY_WINDOW);
  const window = body.slice(start, end);

  return SELF_IDENTIFIERS.some(id => window.includes(id));
}

A failure is triggered only if a self-identifier appears within 50 characters on either side of the match. Direct self-references like “Yakumo is an AI development company” are reliably detected because the identifier and the forbidden word sit well inside that window.
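As a rough sketch of how such a check could be driven over a whole body, the loop below walks every forbidden-word match and keeps those with a self-identifier in the window. The forbidden-word regex, the `SelfReferenceHit` shape, and the name `findSelfReferenceHits` are assumptions for illustration; the article does not show how the gate enumerates matches.

```typescript
const PROXIMITY_WINDOW = 50;
const SELF_IDENTIFIERS = ['Yakumo', '八雲', '私たち'];

interface SelfReferenceHit {
  index: number;   // position of the forbidden word in the body
  word: string;    // the forbidden word that matched
  context: string; // the window that was inspected
}

function findSelfReferenceHits(body: string): SelfReferenceHit[] {
  // Assumed forbidden-word pattern. A regex literal inside the function yields
  // a fresh object per call, so lastIndex always starts at 0.
  const forbidden = /会社|\b(?:company|firm)\b/gi;
  const hits: SelfReferenceHit[] = [];
  let m: RegExpExecArray | null;
  while ((m = forbidden.exec(body)) !== null) {
    const start = Math.max(0, m.index - PROXIMITY_WINDOW);
    const end = Math.min(body.length, m.index + m[0].length + PROXIMITY_WINDOW);
    const context = body.slice(start, end);
    if (SELF_IDENTIFIERS.some(id => context.includes(id))) {
      hits.push({ index: m.index, word: m[0], context });
    }
  }
  return hits;
}

console.log(findSelfReferenceHits('Yakumo is an AI development company.').length); // 1
console.log(findSelfReferenceHits('That company is excellent.').length);           // 0
```

The word boundaries around the English alternatives keep words such as “confirm” from matching “firm”.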

Designing the Appropriate Proximity Window Size

Choosing the window size is a tradeoff.

| Window Size | Characteristics |
| --- | --- |
| Less than 20 characters | Increases false negatives (fails to catch cases where “Yakumo” and “company” are slightly separated). |
| 50 characters | A practical middle ground. Generally covers mentions within a single sentence. |
| More than 100 characters | Increases false positives (where “company” in the previous sentence is incorrectly linked to “Yakumo” in the next). |

The G10 implementation in scripts/blog-gate.ts adopts a 50-character window. This is based on the rationale that, in a single sentence such as “Yakumo is a company that places AI agents at the center of its business,” the self-identifier and the forbidden word generally fall within 50 characters of each other.
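The tradeoff can be reproduced with a small experiment. The helper `selfIdentifierWithin` and both sample sentences are hypothetical; they only exist to show how the same text flips between false negative and false positive as the window changes.

```typescript
const SELF_IDENTIFIERS = ['Yakumo', '八雲', '私たち'];

// True when a self-identifier lies within `windowSize` characters
// of the first occurrence of `word`.
function selfIdentifierWithin(body: string, word: string, windowSize: number): boolean {
  const index = body.indexOf(word);
  if (index === -1) return false;
  const start = Math.max(0, index - windowSize);
  const end = Math.min(body.length, index + word.length + windowSize);
  const inspected = body.slice(start, end);
  return SELF_IDENTIFIERS.some(id => inspected.includes(id));
}

// Genuine self-reference with some distance between "Yakumo" and "company".
const separated = 'Yakumo, built to redesign operations, is a company.';
// "company" refers to a vendor; "Yakumo" only appears in the next sentence.
const neighbouring =
  'That vendor is a large company. Separately, the next release of the gate is planned by Yakumo.';

for (const size of [20, 50, 100]) {
  console.log(
    size,
    selfIdentifierWithin(separated, 'company', size),
    selfIdentifierWithin(neighbouring, 'company', size),
  );
}
// size 20:  false, false (misses the genuine self-reference)
// size 50:  true,  false
// size 100: true,  true  (links "company" to "Yakumo" in the next sentence)
```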


Designing with an Allow-list as the SSOT

Defining externalOrgAllowlist in brand.ts

Avoid hardcoding the allow-list within scripts/blog-gate.ts. Embedding exceptions directly into the gate script would require modifying the script every time a new organization name appears in an article.

The correct design is to define it as the SSOT (Single Source of Truth) in src/config/brand.ts.

// src/config/brand.ts
externalOrgAllowlist: [
  '株式会社マーケットエンタープライズ',
  'Market Enterprise',
] as const,

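A simple array of string literals is sufficient. Adding as const makes TypeScript infer a readonly tuple of literal types instead of string[], so the list cannot be mutated by accident and a type derived from it rejects misspelled organization names at compile time.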
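A minimal sketch of what as const provides here. The object name `brandSketch`, the `ExternalOrg` type, and the `isExternalOrg` guard are hypothetical and stand in for whatever brand.ts really exports.

```typescript
// Stand-in for the real brand config; only the allow-list is reproduced.
const brandSketch = {
  externalOrgAllowlist: [
    '株式会社マーケットエンタープライズ',
    'Market Enterprise',
  ] as const,
};

// Union of the registered names:
// '株式会社マーケットエンタープライズ' | 'Market Enterprise'
type ExternalOrg = (typeof brandSketch.externalOrgAllowlist)[number];

// Runtime guard that narrows an arbitrary string to a registered name.
function isExternalOrg(value: string): value is ExternalOrg {
  return (brandSketch.externalOrgAllowlist as readonly string[]).indexOf(value) !== -1;
}

const knownOrg: ExternalOrg = 'Market Enterprise'; // compiles
// const typo: ExternalOrg = 'Market Enterprize';  // compile-time error: not in the union
// brandSketch.externalOrgAllowlist.push('x');     // compile-time error: readonly tuple

console.log(knownOrg, isExternalOrg('Market Enterprize'));
```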

Currently, two organization names are registered in externalOrgAllowlist.

Referencing from the Gate Script

The gate script imports brand.ts to reference the allow-list.

// scripts/blog-gate.ts
import { brand } from '../src/config/brand';

function isInAllowList(text: string): boolean {
  return brand.externalOrgAllowlist.some(org => text.includes(org));
}

Any text containing an allow-listed name passes, even if it contains “company” or “firm.” For example, a sentence like “At Market Enterprise Co., Ltd…” is excluded before proximity analysis because it contains a substring that exactly matches an allow-list entry.
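One caveat worth sketching: if the text handed to isInAllowList were an entire article body, a single allow-listed name would exempt every other match in that article. The variant below is a hypothetical refinement, not the G10 implementation. It exempts a match only when the forbidden word sits inside an occurrence of an allow-listed name; the function name and the sample body are assumptions.

```typescript
const EXTERNAL_ORG_ALLOWLIST: readonly string[] = [
  '株式会社マーケットエンタープライズ',
  'Market Enterprise',
];

// True when the span [matchStart, matchEnd) lies entirely inside
// an occurrence of an allow-listed organization name.
function isInsideAllowlistedName(body: string, matchStart: number, matchEnd: number): boolean {
  return EXTERNAL_ORG_ALLOWLIST.some(org => {
    let from = body.indexOf(org);
    while (from !== -1) {
      if (from <= matchStart && matchEnd <= from + org.length) return true;
      from = body.indexOf(org, from + 1);
    }
    return false;
  });
}

// Illustrative body: an allow-listed name followed by a real self-reference.
const sampleBody = '以前は株式会社マーケットエンタープライズに在籍していました。Yakumoは会社です。';
const firstHit = sampleBody.indexOf('会社');
const secondHit = sampleBody.lastIndexOf('会社');

console.log(isInsideAllowlistedName(sampleBody, firstHit, firstHit + 2));   // true
console.log(isInsideAllowlistedName(sampleBody, secondHit, secondHit + 2)); // false
```

With a whole-body includes check, the second 会社 would be exempted as well, so the scope of the text passed to the allow-list check matters.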


Structuring External Reference Patterns with Regex

Three Patterns: Contrast, Criticism, and Past Experience

While allow-lists handle known organization names, they cannot cover generic external references like “differences from other AI development companies.” For these cases, we use regex patterns.

// src/config/brand.ts
externalReferencePatterns: [
  // Category contrast: "Other ... companies" or "Compared to Web production companies"
  /他の[^。]{0,30}?(会社|企業)/g,
  // Criticism: "So-called ... companies"
  /自称[^。]{0,30}?(会社)/g,
  // Category + Company: General mentions of "development companies" or "production companies"
  /(他|別の)[^。]{0,30}?(事業|開発)[^。]{0,30}?会社/g,
  // Past experience: "The company I founded" or "A company I co-founded"
  /(共同|過去に)?創業(した|していた)?会社/g,
  // Web production companies (frequent category mention)
  /(Web|web)\s*制作会社/g,
  // English patterns
  /other\s+(AI\s+)?(development\s+)?(firms|companies)/gi,
  /self-described\s+["「]?AI\s+(companies|firms)["」]?/gi,
] as const,

Defining and Managing externalReferencePatterns

Each pattern uses regex to capture contexts referring to external organization categories. [^。]{0,30} means “up to 30 characters, none of which is the Japanese full stop (。),” so a match cannot extend across a sentence boundary.
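The effect of the bounded character class can be checked directly. This sketch copies the first pattern without the g flag so that test() stays stateless; the sample sentences are illustrative.

```typescript
// First pattern from externalReferencePatterns, minus the g flag.
const otherCompanyPattern = /他の[^。]{0,30}?(会社|企業)/;

// "他の" and "会社" in the same sentence: matches.
const sameSentence = '他のAI開発会社との違いは業務設計にあります。';
// A full stop sits between "他の" and "会社": no match.
const acrossSentences = '他の方法もあります。Yakumoは会社です。';
// 31 filler characters exceed the {0,30} limit: no match.
const tooFarApart = '他の' + 'あ'.repeat(31) + '会社';

console.log(otherCompanyPattern.test(sameSentence));    // true
console.log(otherCompanyPattern.test(acrossSentences)); // false
console.log(otherCompanyPattern.test(tooFarApart));     // false
```

Note that the class only excludes 。, so the English patterns rely on their fixed word sequence rather than on a sentence-boundary guard.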

Currently, seven patterns are defined in externalReferencePatterns.

The gate script treats substrings matched against these patterns as allowed:

function isExternalReferenceContext(text: string, matchStart: number, matchEnd: number): boolean {
  // Inspect up to 60 characters on each side of the forbidden-word match
  const searchWindow = text.slice(
    Math.max(0, matchStart - 60),
    Math.min(text.length, matchEnd + 60)
  );
  return brand.externalReferencePatterns.some(pattern => {
    // The patterns carry the g flag, so test() resumes from lastIndex;
    // reset it to keep each call independent of earlier ones
    pattern.lastIndex = 0;
    return pattern.test(searchWindow);
  });
}
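Because the patterns in externalReferencePatterns are declared with the g flag, RegExp.prototype.test is stateful: each successful call stores the end of the match in lastIndex, and the next call starts searching from there. The sketch below shows the resulting false negative and one way to avoid it; `testStateless` is a hypothetical helper name.

```typescript
const globalPattern = /(Web|web)\s*制作会社/g;
const sampleText = 'Web制作会社とは異なります。';

const firstCall = globalPattern.test(sampleText);  // true, lastIndex is now 7
const secondCall = globalPattern.test(sampleText); // false, search resumed at index 7
console.log(firstCall, secondCall);

// Resetting lastIndex makes every call start from the beginning of the input.
function testStateless(pattern: RegExp, input: string): boolean {
  pattern.lastIndex = 0;
  return pattern.test(input);
}

console.log(testStateless(globalPattern, sampleText)); // true
console.log(testStateless(globalPattern, sampleText)); // true
```

Dropping the g flag from patterns that are only ever used with test() would be an equally valid fix.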

Before the G10 implementation, flat grepping produced four false positives. These are listed in the root cause section of blog-ops/retros/2026-05-18-context-aware-company-gate.md: “Market Enterprise Co., Ltd.”, “A company I co-founded”, “So-called AI development company”, and “Unlike Web production companies”.

The implementation date for G10 was 2026-05-18, commit d1e3ae7 (feat(brand/gate): G10 catches Yakumo self-reference as 会社 / firm, 06:12:34 +0900).


Validating with the Four Quadrants Test Case

Four Quadrants: Self-Reference (NG) / External-Known (OK) / External-Pattern (OK) / Unknown (WARN)

Validating a context-aware gate starts with clarifying what passes and what doesn’t. We designed test cases organized into four quadrants.

| Quadrant | Input Sentence | Expected Result | Rationale |
| --- | --- | --- | --- |
| Self-Reference (NG) | Yakumo is an AI-driven company. | Fail | “Yakumo” is within the proximity window. |
| External-Known (OK) | Served as the Head of AI Promotion at Market Enterprise Co., Ltd. | Pass | Matches externalOrgAllowlist. |
| External-Pattern (OK) | The difference from other AI development companies lies in business design. | Pass | Matches externalReferencePatterns. |
| Unknown (WARN) | That company is excellent. | Warning | No proximity, no allow-list, no pattern. Not a reference to Yakumo, but context is unclear. |

A fixture file for these quadrants would mix examples like the following:

<!-- Expected Fail -->
Yakumo is recognized as an AI-powered company.

<!-- Expected Pass: Includes formal name of another company -->
I have experience working at Market Enterprise Co., Ltd. in the past.

<!-- Expected Pass: Mention of another company category -->
Many other AI development companies take an approach of just calling APIs.

<!-- Expected Warning: Unclear context -->
I respect that company's decision.

The test fixture file for G10 has not yet been created. blog-gate.test.ts does not exist, and the only *.test.ts files in corporate-site are in the e2e directory.
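Since no test file exists yet, the following is only a sketch of how a fixture in the comment-plus-sentence format shown above could be turned into table-driven cases. The `FixtureCase` shape and `parseFixture` are hypothetical names, and the parser assumes exactly one sentence line after each “Expected …” comment.

```typescript
type Expected = 'fail' | 'pass' | 'warning';

interface FixtureCase {
  expected: Expected;
  note: string;     // optional text after the colon in the comment
  sentence: string; // the sentence the gate should judge
}

function parseFixture(source: string): FixtureCase[] {
  const header = /^<!--\s*Expected\s+(Fail|Pass|Warning)(?::\s*(.*?))?\s*-->$/;
  const cases: FixtureCase[] = [];
  let pending: { expected: Expected; note: string } | null = null;

  for (const rawLine of source.split('\n')) {
    const line = rawLine.trim();
    if (line === '') continue;
    const m = header.exec(line);
    if (m) {
      // Remember the expectation until the sentence line arrives.
      pending = { expected: m[1].toLowerCase() as Expected, note: m[2] ?? '' };
    } else if (pending) {
      cases.push({ expected: pending.expected, note: pending.note, sentence: line });
      pending = null;
    }
  }
  return cases;
}

const sampleFixture = `
<!-- Expected Fail -->
Yakumo is recognized as an AI-powered company.

<!-- Expected Pass: Includes formal name of another company -->
I have experience working at Market Enterprise Co., Ltd. in the past.

<!-- Expected Warning: Unclear context -->
I respect that company's decision.
`;

for (const c of parseFixture(sampleFixture)) {
  console.log(c.expected, '|', c.sentence);
}
```

Each parsed case can then be fed to the gate and compared against `expected` in whichever test runner the project adopts.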


Summary — Principles for the Reader

A context-aware brand rule detection system consists of three layers:

  1. Allow-list (Exact Match → Exclude): List known external organization names in brand.ts as the SSOT. The gate script simply imports and references this.
  2. External reference patterns (Regex → Exclude): Structure patterns for other company categories, criticism, and past experience using regex in externalReferencePatterns. Limit the span with [^。]{0,30} to avoid misdetection across sentences.
  3. Proximity Analysis (Remaining Matches → Judge): Apply the proximity window only to matches remaining after exclusion by the allow-list and patterns. If a self-identifier exists within 50 characters before or after, it fails; otherwise, it triggers a warning.

By applying these three layers in order, you can achieve practical accuracy without full NLP analysis. Separating the brand rule SSOT (brand.ts) from the gate implementation (blog-gate.ts) allows for adding to the allow-list without modifying the gate logic itself.
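The three layers can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the code in scripts/blog-gate.ts: the forbidden-word regex, the `Verdict` type, the helper `around`, the function name `classify`, and the reduced pattern list (without the g flag) are all hypothetical.

```typescript
type Verdict = 'pass' | 'warning' | 'fail';

const ALLOWLIST = ['株式会社マーケットエンタープライズ', 'Market Enterprise'];
const EXTERNAL_PATTERNS = [
  /他の[^。]{0,30}?(会社|企業)/,
  /other\s+(AI\s+)?(development\s+)?(firms|companies)/i,
];
const SELF_IDENTIFIERS = ['Yakumo', '八雲', '私たち'];
const CONTEXT_WINDOW = 60;   // window used by the exclusion layers
const PROXIMITY_WINDOW = 50; // window used by the self-reference check

function around(body: string, start: number, end: number, size: number): string {
  return body.slice(Math.max(0, start - size), Math.min(body.length, end + size));
}

function classify(body: string): Verdict {
  // Assumed forbidden-word pattern; created per call so lastIndex starts at 0.
  const forbidden = /会社|\b(?:company|companies|firms?)\b/gi;
  let verdict: Verdict = 'pass';
  let m: RegExpExecArray | null;
  while ((m = forbidden.exec(body)) !== null) {
    const matchEnd = m.index + m[0].length;
    const context = around(body, m.index, matchEnd, CONTEXT_WINDOW);
    // Layer 1: a known external organization is nearby, so exclude the match.
    if (ALLOWLIST.some(org => context.includes(org))) continue;
    // Layer 2: a generic external-reference pattern is nearby, so exclude it.
    if (EXTERNAL_PATTERNS.some(p => p.test(context))) continue;
    // Layer 3: remaining matches fail when a self-identifier is close by.
    const near = around(body, m.index, matchEnd, PROXIMITY_WINDOW);
    if (SELF_IDENTIFIERS.some(id => near.includes(id))) return 'fail';
    verdict = 'warning';
  }
  return verdict;
}

console.log(classify('Yakumo is an AI-driven company.'));                         // fail
console.log(classify('I previously worked at 株式会社マーケットエンタープライズ.')); // pass
console.log(classify('The difference from other AI development companies lies in business design.')); // pass
console.log(classify('That company is excellent.'));                              // warning
```

The exclusion layers run first on purpose: a sentence that contrasts Yakumo with other companies contains a self-identifier, and only the ordering keeps it from failing.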

For more details on allow-list design, please refer to Allow-list Driven Brand Checks — Designing Exceptions via Organization Name SSOT. The tradeoff between false positives and false negatives for the overall gate is handled in Pre-publish Gate Design — Balancing False Positives and False Negatives.