Written for senior engineers and content operations designers who want to build their own pre-publish gate. Full disclosure of Yakumo’s implementation: a 4-state machine, Gates G1-G10, SSOT, and a self-improving pipeline through accumulated retros. Business perspective on this engine—management decisions, quality investment, and transparency strategy—is in the sister pillar 「Owned Media Operations in the AI-Scale Era — mcluhan’s Management Decisions and Quality Control Investment」. This article focuses on the engine structure itself.
mcluhan is the general-purpose owned media operations engine that Yakumo runs. It is responsible for article state management (drafting / audit-pending / scheduled / published), 10 pre-publish mechanical checks, automated drip scheduling, and an accumulated learning loop that turns operational discoveries into article material: all the plumbing owned media needs before articles go out.
Why we built it: in May 2026, Yakumo published 48 AI-assisted articles in a single wave and pulled them all in 5 days (the full account is in 「How We Pulled an AI-Generated Blog in 5 Days」, referred to here as “Phase 0”). HUMAN_INPUT markers left in body text, dead /blog/ links, tag inconsistencies everywhere, featured articles all failing the quality audit—every one of these was something a machine could have caught before human review.
mcluhan is what we built in response. Named after Marshall McLuhan’s “The medium is the message.” The message the medium itself conveys lives not in the surface articles, but in the engine operating them—this was the design intent.
And the article you are reading right now is itself one dogfooding output of mcluhan. It passed through the 4-state machine and all 10 Gate items described below before publication. This document discloses the full structure.
Why Build the Engine In-House — Connection to the Business
Before getting into the technical design decisions, a word on why we wrote this ourselves rather than using a SaaS.
Yakumo’s goal was not a tool for writing articles—it was an AI agent system with structural discipline for all of owned media: SSOT + context-aware Gates + mechanical verification of reviewer scope. We couldn’t find a SaaS that provided this operations layer. In addition, combining Astro / Vercel / Claude Code with AI assistance, the Phase 1 implementation—SSOT + Gate + scheduler + retro accumulation—came together in a matter of days. The build cost was less than the cost of contorting a SaaS to fit our requirements. High operational complexity, low build cost—that’s why we wrote it ourselves.
Business details (contract-to-license conversion, external sales plan, dogfooding significance) are in the business spoke 「AI Content Quality Gate ROI」. From here, we look at the engine structure itself.
Why Owned Media Needed a Dedicated Engine
The lesson from Phase 0 was that failure detection and failure prevention are different things.
Running the audit made everything visible as machine-readable data: HUMAN_INPUT markers in body text, dead links, tag inconsistencies, featured selection failures, frontmatter with 37 articles published on the same day. Simple failures that a single grep line would catch had gone through to the publish button with no one stopping them.
How do we prevent the same failures next time? The first answer that comes to mind is “humans just need to try harder.” That is the wrong answer. A pipeline that stops only when humans try hard enough will always eventually break. Tired days, rushed days, days when judgment is off—those days are certain to come.
The right answer was: “Failures that machines can mechanically detect should be stopped by machines.”
The Engine’s Responsibility Boundary
What mcluhan takes on:
- Pre-publish inspection (HUMAN_INPUT residue, dead links, tag inconsistencies, author coherence, publication time spacing)
- State management (drafting / audit-pending / scheduled / published)
- Publication scheduler (triggers build at the specified time)
- Transparency display (AI-assisted footer)
- Improvement loop (accumulates operational learning as retros)
What remains for humans to decide:
- Voice / style / tone
- The quality and originality of primary information
- The final call on whether to publish
- Editorial vision (what to write, what not to write)
The line between “what can be mechanized” and “what only humans can do” was consciously drawn in the design.
4-State Machine — Decoupling Writing from Publishing
The core of mcluhan is a 4-state machine expressed in frontmatter.
Drafting → Audit Pending → Scheduled → Published

| State | draft | reviewed | scheduledAt |
|---|---|---|---|
| Drafting | true | false | - |
| Audit Pending | true | false | - |
| Scheduled | false | true | future |
| Published | false | true | past |
State is expressed directly in article frontmatter. No dedicated DB or queue. The article’s metadata is itself the state.
---
title: "..."
publishedAt: 2026-05-19
draft: true # state flag
reviewed: false # state flag
reviewedBy: "" # auditor's slug
scheduledAt: null # scheduled publish time
authorSlug: takumi-morimoto
aiAssisted: true
---
The build filter outputs only “Published” articles to the magazine:
// src/lib/magazine.ts
export async function getAllPosts(): Promise<MagazineEntry[]> {
const now = new Date();
const posts = await getCollection('magazine', ({ data }) => {
if (data.draft) return false; // drafting or audit-pending
if (!data.reviewed) return false; // audit incomplete
if (data.scheduledAt && data.scheduledAt > now) return false; // future scheduled time
return true;
});
return posts.sort(/* ... */);
}
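The same predicate can be pulled out of the Astro collection callback as a pure function, which makes the state transitions easy to unit-test. The following is a minimal sketch under stated assumptions: FrontmatterState, isPublished, and classify are hypothetical names that do not appear in the engine as described here. Because Drafting and Audit Pending carry identical flags in frontmatter, the sketch groups them into a single unpublished bucket.

```typescript
// Hypothetical shape: only the three fields the build filter reads.
interface FrontmatterState {
  draft: boolean;
  reviewed: boolean;
  scheduledAt: Date | null;
}

// Drafting and Audit Pending share the same flags, so they are merged here.
type ArticleState = 'unpublished' | 'scheduled' | 'published';

// Mirrors the three early returns of the build filter.
function isPublished(data: FrontmatterState, now: Date): boolean {
  if (data.draft) return false;
  if (!data.reviewed) return false;
  if (data.scheduledAt && data.scheduledAt.getTime() > now.getTime()) return false;
  return true;
}

function classify(data: FrontmatterState, now: Date): ArticleState {
  if (isPublished(data, now)) return 'published';
  // Reviewed and no longer a draft, but the publish time has not arrived yet.
  if (!data.draft && data.reviewed) return 'scheduled';
  return 'unpublished';
}
```

Passing now as an argument instead of calling new Date() inside the function is a deliberate choice in this sketch: a test can pin the clock and check the Scheduled to Published transition directly.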
The advantage of this design is that it decouples the rhythm of writing from publishing.
- Weekends: batch-write articles with AI → they accumulate as drafts.
- Weekdays: the editor reviews daily → switches to reviewed:true and assigns a future scheduledAt.
- Every day: Vercel cron triggers rebuilds → only articles whose scheduledAt has passed become newly published.
The moment you separate “time to write” from “time to publish,” you can sustain both mass-production mode and quality mode simultaneously.
Gate — 10+7 Rules for Pre-Publish Inspection
scripts/blog-gate.ts scans all src/content/magazine/**/*.md and runs required checks G1-G10 and warning checks W1-W7 against each file. Because it runs as the npm prebuild script, it executes before every build, and an article containing a failure stops the build before anything ships.
Required Checks (fail → build stops)
| # | Check | Failure Condition |
|---|---|---|
| G1 | HUMAN_INPUT marker | Residual <!-- HUMAN_INPUT --> / HUMAN_INPUT[TAG] / line-leading colon form |
| G2 | Dead link | Markdown link format ]\(/blog/...\) remaining |
| G3 | Tag SSOT coherence | frontmatter tags contains a value not in tagCatalog |
| G4 | Track-category match | Mismatch between frontmatter category and physical file path |
| G5 | Reviewer exists | When draft: false, reviewedBy is not present in members.ts |
| G6 | Reviewer scope | reviewedBy’s reviewScope does not cover the article’s track |
| G7 | scheduledAt required | When draft: false, scheduledAt is missing |
| G8 | Time-slot coherence | scheduledAt time is not in publishPolicy.timeSlots |
| G9 | authorSlug exists | authorSlug missing / not present in members.ts |
| G10 | Self-reference detection | Refers to Yakumo itself as kaisha (会社) / kigyō (企業) / firm / company |
Warning Checks (build continues)
| # | Check | Warning Condition |
|---|---|---|
| W1 | Rate-limit (daily) | More than 5 scheduledAt values on the same day |
| W2 | Rate-limit (weekly) | More than 18 in the same week |
| W3 | Rate-limit (monthly) | More than 80 in the same month |
| W4 | Description length | Under 60 chars or over 130 chars |
| W5 | Title length | Under 18 chars or over 60 chars |
| W6 | Pillar inbound link | Spoke article has no internal link pointing to its parent pillar |
| W7 | Series coherence | series value not present in seriesCatalog |
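The split between required checks and warning checks can be sketched as a small rule runner. This is an illustration under stated assumptions, not the contents of scripts/blog-gate.ts: the Article shape, the Rule interface, runGate, and shouldStopBuild are hypothetical names, and only G7 and W4 are implemented as examples. The W4 rule counts UTF-16 code units, which is also an assumption about how "chars" are measured.

```typescript
type Severity = 'fail' | 'warn';

// Hypothetical minimal article shape for illustration.
interface Article {
  path: string;
  draft: boolean;
  scheduledAt: string | null;
  description: string;
}

interface Finding {
  rule: string;
  severity: Severity;
  path: string;
  message: string;
}

interface Rule {
  id: string;
  severity: Severity;
  // Returns a message when the rule is violated, otherwise null.
  check(article: Article): string | null;
}

const rules: Rule[] = [
  {
    id: 'G7',
    severity: 'fail',
    check: (a) =>
      !a.draft && a.scheduledAt === null ? 'scheduledAt is required when draft is false' : null,
  },
  {
    id: 'W4',
    severity: 'warn',
    check: (a) => {
      const len = a.description.length; // UTF-16 code units (assumption)
      return len < 60 || len > 130 ? `description length ${len} is outside 60-130` : null;
    },
  },
];

function runGate(articles: Article[], ruleSet: Rule[]): Finding[] {
  const findings: Finding[] = [];
  for (const article of articles) {
    for (const rule of ruleSet) {
      const message = rule.check(article);
      if (message !== null) {
        findings.push({ rule: rule.id, severity: rule.severity, path: article.path, message });
      }
    }
  }
  return findings;
}

// Only 'fail' findings stop the build; warnings are reported and the build continues.
function shouldStopBuild(findings: Finding[]): boolean {
  return findings.some((f) => f.severity === 'fail');
}
```

In a script run as an npm pre-script, exiting with a non-zero code when shouldStopBuild returns true is what keeps the main build script from running.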
Context-Aware — Precision of Mechanical Detection
Initially, the Gate used plain grep. But as soon as we tried to write articles that explain failure cases in their body text, as this article does, the Gate false-positived on the HUMAN_INPUT and /blog/... patterns used in the explanation itself. Running an owned media that discloses its own failures requires a Gate that can tell a description of a failure pattern from the failure itself.
The solution was pre-stripping markdown code context.
function stripCodeContexts(body: string): string {
// Remove fenced code blocks (```...```)
let result = body.replace(/```[\s\S]*?```/g, '');
// Remove inline code (`...`)
result = result.replace(/`[^`\n]+`/g, '');
return result;
}
After stripping, detection is limited to specific formats:
- HTML comment: <!--\s*HUMAN_INPUT
- Bracket tag: HUMAN_INPUT\[[A-Z_]+\]
- Line-leading colon: ^HUMAN_INPUT:
- Markdown link form: \]\(/blog/[^)]+\)
This way, “inline code explaining HUMAN_INPUT” is excluded, and only “actual markers left as HTML comments” fail.
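Putting the stripping step and the four patterns together, detection might look like the sketch below. The PATTERNS table and detectResidue are hypothetical names; the multiline flag on the line-leading pattern is an assumption about how "line-leading" is matched. The fenced-block pattern is built with the RegExp constructor only so that the example does not contain a literal run of three backticks.

```typescript
// Same stripping approach as stripCodeContexts: drop fenced blocks, then inline code.
const FENCED = new RegExp('`{3}[\\s\\S]*?`{3}', 'g');
const INLINE = /`[^`\n]+`/g;

function stripCodeContexts(body: string): string {
  return body.replace(FENCED, '').replace(INLINE, '');
}

// No 'g' flag here, so RegExp.test() keeps no state between calls.
const PATTERNS: { rule: string; label: string; re: RegExp }[] = [
  { rule: 'G1', label: 'HTML comment marker', re: /<!--\s*HUMAN_INPUT/ },
  { rule: 'G1', label: 'bracket tag marker', re: /HUMAN_INPUT\[[A-Z_]+\]/ },
  { rule: 'G1', label: 'line-leading colon marker', re: /^HUMAN_INPUT:/m },
  { rule: 'G2', label: 'dead /blog/ link', re: /\]\(\/blog\/[^)]+\)/ },
];

// Returns one entry per pattern that still matches after code contexts are removed.
function detectResidue(body: string): string[] {
  const stripped = stripCodeContexts(body);
  return PATTERNS.filter((p) => p.re.test(stripped)).map((p) => `${p.rule}: ${p.label}`);
}
```

A limitation of this sketch worth noting: an unclosed fence or an inline code span broken across lines is not stripped, so markers inside such spans would still be reported.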
G10 — Applied Context Judgment
Because Yakumo is not incorporated, self-references like “Yakumo is a kaisha (会社, company)” or “Yakumo is a firm” are prohibited. But “Market Enterprise Co., Ltd. (株式会社マーケットエンタープライズ)” (a client’s formal name) or “self-described AI development company” (a description of another company’s category) are fine. The same word “company” is allowed or prohibited depending on context.
G10 determines this with an allow-list and a pattern-list:
// src/config/brand.ts
externalOrgAllowlist: [
'Market Enterprise Co., Ltd.',
'Market Enterprise',
],
externalReferencePatterns: [
/他の[^。]{0,30}?(会社|企業)/g, // JA: "other ... company/firm"
/自称[^。]{0,30}?(会社)/g, // JA: "self-described ... company"
/other\s+(AI\s+)?(development\s+)?(firms|companies)/gi,
// ...
],
Detection flow:
- Extract kaisha (会社) / kigyō (企業) / firm / company from body
- If the match is contained within an allowlist string, exclude
- If within a substring matching an external pattern, exclude
- For remaining matches, if Yakumo / Yakumo (八雲) / we (私たち) appears within 50 characters: fail
- Otherwise: warning (requires human check)
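The detection flow above can be sketched as follows. Everything here is illustrative: the function names, the term regex, and the self-reference regex are assumptions, the allowlist includes the Japanese formal name because that is the form containing 会社, and "within 50 characters" is interpreted as 50 characters on either side of the match.

```typescript
type Verdict = { term: string; result: 'fail' | 'warn' };

// Hypothetical config, shaped after the brand.ts excerpt.
const externalOrgAllowlist = ['株式会社マーケットエンタープライズ', 'Market Enterprise Co., Ltd.'];
const externalReferencePatterns = [
  /他の[^。]{0,30}?(会社|企業)/g,
  /自称[^。]{0,30}?(会社)/g,
  /other\s+(AI\s+)?(development\s+)?(firms|companies)/gi,
];
const TERM = /会社|企業|\bfirm\b|\bcompany\b/gi;
const SELF_MARKER = /Yakumo|八雲|私たち/i;
const CONTEXT_CHARS = 50; // assumption: measured on each side of the match

// Collect [start, end) ranges covered by allowlisted names or external patterns.
function excludedSpans(body: string): [number, number][] {
  const spans: [number, number][] = [];
  for (const name of externalOrgAllowlist) {
    let from = body.indexOf(name);
    while (from !== -1) {
      spans.push([from, from + name.length]);
      from = body.indexOf(name, from + name.length);
    }
  }
  for (const pattern of externalReferencePatterns) {
    pattern.lastIndex = 0; // global regexes keep state between exec() calls
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(body)) !== null) {
      spans.push([m.index, m.index + m[0].length]);
    }
  }
  return spans;
}

function checkSelfReference(body: string): Verdict[] {
  const spans = excludedSpans(body);
  const verdicts: Verdict[] = [];
  TERM.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = TERM.exec(body)) !== null) {
    const start = m.index;
    const end = start + m[0].length;
    // Steps 2 and 3: skip matches inside an allowlisted name or external pattern.
    if (spans.some(([s, e]) => start >= s && end <= e)) continue;
    // Steps 4 and 5: a nearby self marker means fail, otherwise a human should look.
    const context = body.slice(Math.max(0, start - CONTEXT_CHARS), end + CONTEXT_CHARS);
    verdicts.push({ term: m[0], result: SELF_MARKER.test(context) ? 'fail' : 'warn' });
  }
  return verdicts;
}
```

Defaulting the ambiguous case to a warning rather than a failure keeps the rule from blocking builds on text a reviewer would accept, at the cost of requiring that reviewer to read the warning.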
Context-awareness is implemented with two approaches: “stripping-based” and “allow-list + pattern-based.” The same pattern can be applied to other rules (“heisha (弊社, formal ‘our company’)” / “tōsha (当社, ‘our firm’)” / “internal terminology” etc.) going forward.
SSOT Groups — Making Inconsistencies and Prohibitions Machine-Readable
All values referenced in frontmatter are consolidated in SSOT files.
| File | Responsibility |
|---|---|
| src/config/tags.ts | TagKey list (~25 types). Frontmatter tags cannot use values outside the SSOT |
| src/config/pillars.ts | Pillar article slug declarations. Spoke articles get a W6 warning if no inbound link to parent pillar |
| src/config/series.ts | MagazineSeriesId SSOT. Frontmatter series cannot use values outside the SSOT |
| src/config/members.ts | Author / reviewer definitions. canReview / reviewScope verify G5/G6 |
| src/config/publish-policy.ts | timeSlots / rateLimits / aiAssistedFooterTemplate / editorNoteCadence |
One of the failures discovered in Phase 0 was 77 unique tags, 55 of which were isolation tags. Label inconsistencies like agents / agent-design / agent-design (three forms of the same concept) were destroying topical authority. The combination of SSOT and Gate G3 physically closes the path by which new articles can introduce inconsistencies.
Frontmatter references are simple string keys:
tags: [content-ops, quality-control, blog-audit]
series: mcluhan-engine
authorSlug: takumi-morimoto
reviewedBy: takumi-morimoto
If content-ops is not a key of tagCatalog, the Gate fails (G3). If mcluhan-engine is not in seriesCatalog, zod frontmatter validation strips the value.
Change the SSOT and every article’s labeling changes. Don’t change the SSOT and every article uses the same labeling. Inconsistencies structurally cannot occur.
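As a rough sketch of how string keys in frontmatter can be checked against the SSOT, the example below covers G3 (tags) and G5/G6 (reviewer). The catalog entries, the Member shape, the track names, and the display name are hypothetical; only the identifiers tagCatalog, members, canReview, and reviewScope come from the description above, and treating canReview: false as a G6 failure is an assumption.

```typescript
// Hypothetical SSOT excerpts; the real catalogs live in src/config/*.ts.
const tagCatalog = {
  'content-ops': { label: 'Content Ops' },
  'quality-control': { label: 'Quality Control' },
  'blog-audit': { label: 'Blog Audit' },
} as const;

type TagKey = keyof typeof tagCatalog;

interface Member {
  name: string;
  canReview: boolean;
  reviewScope: readonly string[]; // tracks this member may review
}

const members: Record<string, Member> = {
  'takumi-morimoto': { name: 'Takumi Morimoto', canReview: true, reviewScope: ['engineering'] },
};

// hasOwnProperty avoids false hits on inherited keys such as "constructor".
function isTagKey(value: string): value is TagKey {
  return Object.prototype.hasOwnProperty.call(tagCatalog, value);
}

// G3: every frontmatter tag must be a key of tagCatalog.
function unknownTags(tags: string[]): string[] {
  return tags.filter((t) => !isTagKey(t));
}

// G5 + G6: the reviewer must exist, be allowed to review, and cover the track.
function reviewerProblem(reviewedBy: string, track: string): string | null {
  const known = Object.prototype.hasOwnProperty.call(members, reviewedBy);
  if (!known) return `G5: reviewer "${reviewedBy}" is not in members`;
  const member = members[reviewedBy];
  if (!member.canReview || member.reviewScope.indexOf(track) === -1) {
    return `G6: reviewer "${reviewedBy}" does not cover track "${track}"`;
  }
  return null;
}
```

Deriving TagKey from the catalog object with keyof typeof means the type and the runtime check share one source, which is the same single-source idea applied at the type level.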
Vercel Cron Scheduler — Automated Drip Publishing
Articles with a future scheduledAt are not included in the build. But if Vercel Cron triggers periodic rebuilds, articles whose scheduledAt has passed become newly published with the next build.
// vercel.json
{
"crons": [
{
"path": "/api/cron/rebuild-magazine",
"schedule": "0 0,2,5,7,9 * * *"
}
]
}
Cron trigger times are UTC 00 / 02 / 05 / 07 / 09—in JST that is 09 / 11 / 14 / 16 / 18, which matches exactly with publishPolicy.timeSlots.
// src/config/publish-policy.ts
timeSlots: ['09:00', '11:00', '14:00', '16:00', '18:00'] as const,
An article with scheduledAt: 2026-05-20T11:00:00+09:00 is first published when the cron triggers a rebuild at 02:00 UTC on May 20. Separating publishedAt (for display) from scheduledAt (for the machine) ensures that the display shows “published 2026-05-20” while Google’s discovery date is the actual build trigger time—both are consistent.
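The relationship between the JST time slots and the UTC cron hours is plain arithmetic, and it can be written down so the two never drift apart. The sketch below is a hypothetical helper set (utcCronHours, cronExpression, isInTimeSlot are assumed names), relying on the fact that JST is a fixed UTC+9 offset. The slot check ignores seconds, which is a simplification.

```typescript
const timeSlots = ['09:00', '11:00', '14:00', '16:00', '18:00'] as const;
const JST_OFFSET_HOURS = 9; // JST is UTC+9 with no daylight saving time

// Convert JST "HH:MM" slots to the UTC hours used in the cron expression.
function utcCronHours(slots: readonly string[]): number[] {
  return slots.map((slot) => {
    const hour = Number(slot.split(':')[0]);
    return (hour - JST_OFFSET_HOURS + 24) % 24;
  });
}

// Assumes every slot is on the hour, as in the slots above.
function cronExpression(slots: readonly string[]): string {
  return `0 ${utcCronHours(slots).join(',')} * * *`;
}

function pad2(n: number): string {
  return (n < 10 ? '0' : '') + n;
}

// G8-style check: is the JST wall-clock time of scheduledAt one of the slots?
function isInTimeSlot(scheduledAt: string, slots: readonly string[]): boolean {
  const t = new Date(scheduledAt);
  if (isNaN(t.getTime())) return false;
  // Shift by +9h, then read the UTC fields to get JST wall-clock time.
  const jst = new Date(t.getTime() + JST_OFFSET_HOURS * 60 * 60 * 1000);
  const hhmm = `${pad2(jst.getUTCHours())}:${pad2(jst.getUTCMinutes())}`;
  return slots.indexOf(hhmm) !== -1;
}
```

Generating the cron schedule from timeSlots, or at least asserting the two agree in a test, would turn the "matches exactly" statement into something a machine verifies.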
The body of /api/cron/rebuild-magazine is a simple proxy that fetches the Vercel Deploy Hook:
// src/pages/api/cron/rebuild-magazine.ts
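export async function GET() {
const hookUrl = process.env.VERCEL_REBUILD_HOOK_URL;
// No hook configured (e.g. local development): succeed without doing anything
if (!hookUrl) return new Response('OK (no hook)', { status: 200 });
// Deploy Hooks are triggered with a POST request
const res = await fetch(hookUrl, { method: 'POST' });
// Surface hook failures instead of reporting success unconditionally
if (!res.ok) return new Response('hook failed', { status: 502 });
return new Response('rebuilt', { status: 200 });
}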
No matter how far in the future scheduledAt is set, the article won’t publish until that time arrives. Conversely, if scheduledAt is in the past, the article will publish with the next cron trigger. Complete control over “when to publish” with a single frontmatter field.
AI-Assisted Footer — Designing for Transparency
Every article’s bottom section displays the fact of AI assistance and the reviewer’s name.
# publishPolicy.aiAssistedFooterTemplate
ja: 'この記事は AI 支援でドラフトされ、{reviewer} が {reviewedAt} にレビューしました。'
en: 'This article was drafted with AI assistance and reviewed by {reviewer} on {reviewedAt}.'
src/components/magazine/article/ArticleFooterMeta.astro reads aiAssisted / reviewedBy / reviewedAt from frontmatter, looks up the reviewer name from members.ts, and expands the string per locale.
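The expansion step itself is a simple placeholder substitution. The sketch below is not the contents of ArticleFooterMeta.astro: renderFooter and reviewerNames are hypothetical names, the display name is illustrative, and returning null for articles without aiAssisted is an assumption about how the component behaves.

```typescript
type Locale = 'ja' | 'en';

const aiAssistedFooterTemplate: Record<Locale, string> = {
  ja: 'この記事は AI 支援でドラフトされ、{reviewer} が {reviewedAt} にレビューしました。',
  en: 'This article was drafted with AI assistance and reviewed by {reviewer} on {reviewedAt}.',
};

// Hypothetical lookup table; the real names would come from members.ts.
const reviewerNames: Record<string, string> = { 'takumi-morimoto': 'Takumi Morimoto' };

function renderFooter(
  locale: Locale,
  meta: { aiAssisted: boolean; reviewedBy: string; reviewedAt: string },
): string | null {
  // Assumption: articles without AI assistance get no footer line.
  if (!meta.aiAssisted) return null;
  // Fall back to the slug if the reviewer is not in the lookup table.
  const reviewer = reviewerNames[meta.reviewedBy] || meta.reviewedBy;
  return aiAssistedFooterTemplate[locale]
    .replace('{reviewer}', reviewer)
    .replace('{reviewedAt}', meta.reviewedAt);
}
```

Because G5 already guarantees that reviewedBy exists in members.ts for published articles, the slug fallback should only ever be reached in previews of unreviewed drafts.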
Why display it? Our reading is that Google’s treatment of AI-generated content is moving toward rewarding disclosed AI use and explicitly stated accountability. Hiding AI use, and then being judged as deceptive, is the higher risk.
Many owned media operations try to erase the fact that AI wrote the content. mcluhan’s design goes the opposite direction: AI assistance is displayed as publicly committed fact, and accountability is handed to the reviewer. This is the design that satisfies the oversight part of the Google Scaled Content Abuse judgment boundary (unique value × volume × oversight).
Retro Accumulation — The Pipeline as Its Own Improvement Target
Issues, fixes, and learnings discovered while operating mcluhan are recorded in blog-ops/retros/{YYYY-MM-DD}-{slug}.md. The blog-retro skill writes to this format.
---
date: 2026-05-18
short_slug: gate-false-positive
title: "Gate's grep false-positived on a failure-disclosure article"
trigger: "..."
related_pillar: "2026-05-magazine-reset-timeline"
spoke_potential:
- "Pre-publish gate design — balancing false positives and false negatives"
- "Gate handling when writing articles that describe the Gate itself"
status: captured
---
## Background
## Root Cause
## Fix
## Learning
## Derived Spoke Candidates
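For illustration, the retro frontmatter and the filename convention can be captured in a small type and helper. This is a sketch, not the blog-retro skill: RetroMeta and retroPath are hypothetical names, and the date and slug validation rules are assumptions.

```typescript
// Hypothetical type mirroring the retro frontmatter fields.
interface RetroMeta {
  date: string; // YYYY-MM-DD
  short_slug: string;
  title: string;
  trigger: string;
  related_pillar: string;
  spoke_potential: string[];
  status: string;
}

// Builds blog-ops/retros/{YYYY-MM-DD}-{slug}.md and rejects malformed parts.
function retroPath(meta: Pick<RetroMeta, 'date' | 'short_slug'>): string {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(meta.date)) {
    throw new Error(`invalid date: ${meta.date}`);
  }
  // Assumption: slugs are lowercase alphanumerics joined by single hyphens.
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(meta.short_slug)) {
    throw new Error(`invalid slug: ${meta.short_slug}`);
  }
  return `blog-ops/retros/${meta.date}-${meta.short_slug}.md`;
}
```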
As of this session, 3 retros have accumulated:
- 2026-05-18-gate-false-positive: Plain grep in G1/G2 false-positived on the HUMAN_INPUT explanation in article body → resolved with context-aware stripping
- 2026-05-18-naming-rule-english-parity: Defined the “kaisha (会社, company)” rule in Japanese but missed the parallel definition for English company/firm → BRAND.md rule definitions need locale parity
- 2026-05-18-context-aware-company-gate: Spec for context-discrimination when mechanically checking the “kaisha/kigyō (会社/企業, company/firm)” rule → implemented as G10
Each retro becomes material for future spoke articles. “Pre-publish gate design,” “multi-locale brand rule SSOT,” “context-aware static analysis”—the retros already contain enough material to write abstracted spokes on each.
In other words, mcluhan is also an engine that generates the next article each time it runs. Failure detection → systematization → article production → further detection: a loop where the pipeline itself is built in as an improvement target. The most literal implementation of Marshall McLuhan’s “the medium is the message” is right here.
The Name — Why mcluhan
Marshall McLuhan was a media theorist of the 1960s. His most famous phrase: “The medium is the message”—the message that a medium itself conveys lives not in the surface content, but in the structure of the medium carrying that content.
Applied to owned media: what visitors perceive is not just the text of the articles. “How does this organization operate its articles?” “How does it guarantee quality?” “How does it disclose failures?”—the pipeline design itself communicates editorial attitude.
That is why the engine is named mcluhan. Articles flow through and pass on. The engine design stays.
Incidentally, src/lib/bateson/ follows the same naming principle. Named after Gregory Bateson—anthropologist and cybernetician. Yakumo’s src/lib/ namespace is consistently named after thinkers and critics in “human-information systems theory.” bateson is the booking engine (the site’s sales agent), mcluhan is the owned media engine (the site’s marketing agent)—two faces of what visitors experience, running as independent engines, each carrying a thinker’s name.
The Road to Generalization — Can Other Organizations Run This?
mcluhan is currently dogfooding at Yakumo’s corporate site. The future plan is to extract it as a standalone version that other organizations can install on their own owned media. It inherits the same design principles as bateson:
- No dependence on corporate-site (reverse dependency only)
- Every public API has tenantId as an argument
- Adapter pattern allows external system replacement (storage / ai / publish / analytics)
- Templates and brand elements are injected (pillar definitions, authors, categories are tenant config)
- UI is headless-leaning (same core called from agents, CLI, or UI)
Anticipated structure after extraction:
src/lib/mcluhan/
├── core/
│ ├── types.ts # Article / Author / Tag / Series / Pillar / Reviewer types
│ ├── content.ts # createContentCore(adapters) — adapter DI
│ ├── editor-policies.ts # per-track policy validation
│ └── gate.ts # pre-publish gate logic
├── adapters/
│ ├── storage/ # GitHub / Notion / filesystem
│ ├── ai/ # Claude / OpenAI
│ ├── publish/ # Vercel / Cloudflare
│ └── analytics/ # Search Console / GA4
└── ui/
In the current Phase 1, src/config/*.ts and scripts/blog-gate.ts play the role of core. When generalizing, these will be migrated to src/lib/mcluhan/core/. Plan: agent-ification in Phase 4, extraction as a standalone distributable in Phase 5.
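To make the adapter-injection idea concrete, here is a minimal sketch of what createContentCore(adapters) could look like. Apart from createContentCore and tenantId, every name is hypothetical (the adapter interfaces, their methods, rebuildIfAnyArticles, createMemoryAdapters), and only two of the four adapter kinds are shown.

```typescript
// All names below except createContentCore and tenantId are hypothetical.
interface StorageAdapter {
  listArticles(tenantId: string): Promise<string[]>;
}

interface PublishAdapter {
  triggerBuild(tenantId: string): Promise<void>;
}

interface Adapters {
  storage: StorageAdapter;
  publish: PublishAdapter;
}

function createContentCore(adapters: Adapters) {
  return {
    // Every public method takes tenantId, so one core can serve many tenants.
    async rebuildIfAnyArticles(tenantId: string): Promise<number> {
      const slugs = await adapters.storage.listArticles(tenantId);
      if (slugs.length > 0) await adapters.publish.triggerBuild(tenantId);
      return slugs.length;
    },
  };
}

// In-memory adapters: useful for tests, and a template for real ones.
function createMemoryAdapters(data: Record<string, string[]>) {
  const builds: string[] = [];
  const adapters: Adapters = {
    storage: { listArticles: async (tenantId) => data[tenantId] || [] },
    publish: {
      triggerBuild: async (tenantId) => {
        builds.push(tenantId);
      },
    },
  };
  return { adapters, builds };
}
```

The core never imports a concrete storage or publish implementation, so swapping one backend for another means writing a new adapter rather than touching the core.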
Summary — What the Engine Differentiates
In the AI mass-production era, owned media differentiation comes down to how the engine is built. In an era where AI can write the articles, what remains as differentiating factors:
- A system where machines stop failures that machines can mechanically detect (Gate)
- Design that decouples state management and publication timing (4-state machine)
- A structure that physically closes the path for labeling inconsistencies and prohibitions via SSOT (tags / pillars / series / members)
- A scheduler that automates drip publishing (Vercel cron + scheduledAt)
- A footer that guarantees AI-use transparency (accountability)
- Retro accumulation that builds the pipeline itself in as an improvement target (self-referential loop)
The time humans spend is limited to voice design, editorial judgment, and generating primary information. Everything else is handled by the engine.
And this article itself, as the second production output from mcluhan’s first run, was scheduled for the 5/19 11:00 JST slot, was triggered by Vercel cron, passed Gate G1-G10, was approved by reviewer takumi-morimoto, and is now being read.
The medium is the message—the design of the engine is the message of the Yakumo magazine.