How We Pulled an AI-Generated Blog in 5 Days — A Full Timeline of Detection, Diagnosis, and Emergency Response

We published 48 articles in 5 days, then pulled them all. 11 failures, HUMAN_INPUT markers exposed, 76 internal links dead. Here is the complete record—from detecting the problem before Google indexed us, to rebuilding the pipeline.

Published 2026-05-21 Takumi Morimoto

On May 8, 2026, Yakumo published 48 articles written with AI assistance (47 magazine pieces + 1 editor’s note) in a single wave. Ten days later, on May 18, we pulled all 48. A quality audit had found 11 failures, 4 articles with exposed HUMAN_INPUT markers, 76 internal links dead—100% mortality—and all 3 featured articles failing the quality bar. We caught it before Google’s crawlers had fully indexed the subdomain.

This document is a complete record of the whole process. If you’re reading it before making the same mistake, you’re in luck. If you’re reading it after, the procedures here should apply directly.


Executive Summary — 3 Conclusions

The conclusion first.

We used AI to mass-produce 48 articles and published them in a single batch. A quality audit found 11 failures. Exposed HUMAN_INPUT markers and 100% dead internal links compounded the structural damage. We decided to pull everything before Google finished indexing, and we did it the same day. The choice we made afterward was not to hide what happened—it was to systematize it and disclose it.

Key Takeaways:

  1. Mass production is a given; what comes after is the real design challenge. The ability to generate large volumes of articles with AI is no longer a differentiator. What separates sustainable owned media from the rest is whether you have an operational design for detecting failures, executing a retreat, and rebuilding.

  2. Failures that machines can detect should be stopped by machines. HUMAN_INPUT markers left in body text, dead internal links, tag inconsistencies—every one of these was a failure a script could catch. Relying on human review to catch them was the root structural failure. A pre-publish gate would stop them mechanically next time.

  3. Disclosing failure rather than hiding it is an E-E-A-T strategy. Sharing the numbers—“37 articles published in one day, 76 internal links dead, 4 articles with HUMAN_INPUT markers”—is primary information only an organization that actually went through it can produce. AI search engines cite exactly this kind of specific, first-hand record, not generic explainer articles.

Technical implementation details (middleware code, Python scripts, curl verification) are in #tech-detail at the end of this article. If you want the business judgment and operational design first, keep reading.


Phase 1 — What Was Broken (Failures Found via a 5-Axis Scoring System)

Ten days after publication, the articles felt off. Instead of reading them one by one and judging by feel, we decided to score all 48 mechanically. We needed to surface structural failures in a measurable form, the kind that subjective review would miss.

What the 5-Axis Scoring Measured

We defined five evaluation axes: A1 Structure, A2 Primary Information, A3 Track Coherence, A4 SEO, and A5 Originality. Each axis was scored 1–5, for a maximum of 25 points per article.

The reason for choosing five axes was to eliminate the “it felt okay” or “it felt thin” impressions. When individual readers apply varying personal standards, you can’t meaningfully compare 48 articles. By fixing measurable axes before scoring, we could show the basis for business decisions in numbers.

Full definitions for each axis (scoring criteria, thresholds, script implementation) are in #tech-detail.

Results Summary — pass 22 / weak 15 / fail 11

Of 48 articles: pass 22, weak 15, fail 11. A failure rate of 23%. Nearly 1 in 4 articles did not meet the publication standard.

| Result | Count | Ratio |
|--------|-------|-------|
| pass   | 22    | 46%   |
| weak   | 15    | 31%   |
| fail   | 11    | 23%   |
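
The ratios above follow directly from the counts. A minimal Python check of that arithmetic, using only the figures reported in this article (the variable names are ours, for illustration):

```python
# Audit counts as reported in this article.
counts = {"pass": 22, "weak": 15, "fail": 11}
total = sum(counts.values())  # 48 articles

# Share of each result, rounded to the nearest whole percent.
ratios = {result: round(100 * n / total) for result, n in counts.items()}

for result, pct in ratios.items():
    print(f"{result}: {counts[result]} of {total} ({pct}%)")
```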

More alarming was the result for featured articles. All 3 articles we had promoted to the highest-visibility position on the top page were failures.

3/3 featured articles failing is not a selection error. “The articles we most wanted readers to see” were “the articles with the worst quality.” This is evidence that both editorial judgment (deciding what to surface) and quality measurement (a system to verify quality in numbers) had failed. If either had been functioning, failing articles wouldn’t have ended up at the top.

3 Severe Failure Cases

These three cases had the most serious breakdowns in the body text itself.

hybrid-dev-team-style (management / case) — The body contained 7 unfilled placeholders. Places the writing process had marked as “put the numbers in here later” had gone out with no numbers. The very end of the article still contained an internal note that read “needs verification before publishing.”

sales-docs-business-impact (sales / case) — Sentences like “If we assume sending __ proposals per month,” “combined, that’s __ hours spent on documentation per month,” and “reduced to __” had shipped with the blanks intact. Readers encountered sentences that stopped mid-thought and conveyed no meaning.

minutes-automation (claude-code / tech) — The end of the article had shipped with internal writing-process instructions verbatim. To a reader, it was completely unintelligible text. The canonical example of a quality control process that failed all the way to the end.

Structural Finding — Failures That Machines Could Have Caught Were Left to Humans

The three cases look like separate symptoms, but the root cause was the same: failures that machines could detect automatically had been left to human visual review to catch. That is a structural failure.

This is not a product problem; it is an organizational process design problem. AI honestly marked the places it didn’t know (“I need a number here, but I don’t know what it is”), and no one verified before publishing. The AI’s output wasn’t the problem—the process that received the AI’s output had broken down.

A process where machines can reliably detect failures, but humans are expected to catch them anyway, will always break down on a tired day, a rushed day, or a short-staffed day. This is not a product bug; it is a process bug.


Phase 2 — Assessing the Risk of a Google Penalty

Alongside the quality problem, another dangerous signal surfaced: the distribution of publication dates.

37 of 47 Magazine Articles on the Same Day — May 8, 2026

Of 47 magazine articles, 37 had been published on the same calendar day: May 8, 2026. That is 79%. The rest were also clustered within a few days.

“37 articles by the same author on one day” is a figure that Google’s quality system would consider physically impossible. A single writer working around the clock at 30 minutes per article could produce 48 in 24 hours, and that is the absolute ceiling. A domain that appears to have done this belongs to either a major publisher or an AI-scale operation.
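
The same-day concentration and the 24-hour ceiling can both be reproduced in a few lines. This is a sketch under stated assumptions: the article only says that 37 of 47 pieces fell on May 8 and the rest within a few days, so the dates assigned to the other 10 below are placeholders, not audit data.

```python
from collections import Counter
from datetime import date

# 37 of the 47 magazine articles were published on 2026-05-08 (from the audit).
# The spread of the other 10 is NOT given in the article; these dates are
# placeholders chosen only to make the example run.
publish_dates = (
    [date(2026, 5, 8)] * 37
    + [date(2026, 5, 9)] * 5
    + [date(2026, 5, 10)] * 5
)

per_day = Counter(publish_dates)
peak_day, peak_count = per_day.most_common(1)[0]
peak_share = round(100 * peak_count / len(publish_dates))

# Ceiling used in the text: one writer, 30 minutes per article, 24 hours straight.
max_articles_per_day = (24 * 60) // 30

print(peak_day, peak_count, f"{peak_share}%", max_articles_per_day)
```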

We needed to quantify what that figure meant for the domain’s trust assessment.

Google Scaled Content Abuse and Penalty Thresholds

Google formally documented “Scaled Content Abuse” in its Spam Policy in March 2024. It defines creating large numbers of pages without adding value as spam, and began running human-review-based manual penalties in June 2025.

The determination is not based on volume alone. It’s a combination of primary information, explicit editorial responsibility, and publication pace. 50 articles per month with original data and editorial review can pass. 8 articles per month of AI-generated templates may not.

That said, “37 in one day” exceeds the baseline threshold itself. No matter how high the quality of primary information, “volume exceeds what a human could write” is the first determination—one that quality cannot offset. We were not in a range where quality could save us.

Is Back-Dating Effective?

“We already published—can we change the dates afterward and overwrite the record?” This is a natural instinct, but it doesn’t work.

Google separately records the date a page was first discovered (the discovery date). Changing the displayed publish date after the fact does not change the discovery date. John Mueller has explicitly stated that retroactively rewriting dates is itself a deceptive practice that risks being flagged. A gap between displayed publish date and actual discovery date is itself material that lowers a domain’s trust score.

The “fix what we already shipped by dressing it up” route was closed. The remaining options were “stop everything and restart with a clean slate” or “keep going.” That was the real choice.


Phase 3 — Structural Diagnosis (76 Dead Links, Fragmented Tags, Orphan Articles)

The publication date distribution was not the only problem. The structure linking articles together was also entirely broken.

Across the 47 article bodies, we found 76 internal links. All 76 were dead.

The cause was not in the articles themselves—it was upstream, in the site construction phase. When we changed the URL convention, three things failed to happen: applying the new convention to existing articles, standardizing the convention for new article generation, and building a mechanism to detect broken links. None of these were done. A structural gap in an upstream decision cascaded down to every piece of content below it.

All 76 being dead was not individual article drafting errors. It was evidence that “site construction quality” and “content operations quality” were not being managed separately. If what gets handed off at phase transitions is not documented, upstream decisions quietly destroy downstream work.
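
A broken-link detector of the kind that was missing does not need to be elaborate. The following is a minimal sketch, not our production check: it assumes links are written as Markdown links with site-relative targets, and the paths and slugs in the sample data are hypothetical.

```python
import re

# Matches Markdown links whose target is a site-relative path, e.g. [text](/path/).
INTERNAL_LINK = re.compile(r"\[[^\]]*\]\((/[^)\s#?]*)")


def find_dead_links(bodies: dict[str, str], live_paths: set[str]) -> list[tuple[str, str]]:
    """Return (article_slug, target) pairs whose target is not a live path."""
    dead = []
    for slug, body in bodies.items():
        for target in INTERNAL_LINK.findall(body):
            # Compare without the trailing slash so /a and /a/ count as the same page.
            if target.rstrip("/") not in live_paths:
                dead.append((slug, target))
    return dead


# Hypothetical data: one link still follows an old URL convention.
live_paths = {"/magazine/article-a", "/magazine/article-b"}
bodies = {
    "article-a": "See [the follow-up](/blog/article-b/) for details.",
    "article-b": "Background is in [part one](/magazine/article-a/).",
}
dead_links = find_dead_links(bodies, live_paths)
print(dead_links)
```

Run against the full set of article bodies and the list of routes the build actually generates, a check like this would have reported all 76 links on day one.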

On the other hand, these dead links incidentally acted as a defensive barrier against Google’s crawl paths. It was not intentional design, but it was part of why a retreat before indexing was possible.

Tag Fragmentation and Isolation

We had 77 unique tags. Of those, 55 were isolation tags used in only one article.

Inconsistent labeling was also severe. The same concept had been scattered across three different forms:

  • agents (4 articles)
  • agent-design (1 article)
  • agent-design in Japanese script (1 article)

When articles can’t be connected through tags, topical authority doesn’t accumulate even if individual articles are strong. The content set holds no power as a collection of articles on the same theme—it’s just an isolated set. Without a structure that readers can follow as a series, they read one article and leave.
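
Counting tag usage is enough to surface both isolation tags and near-duplicate labels. A minimal sketch follows; the sample corpus is hypothetical and far smaller than the real one.

```python
from collections import Counter


def tag_usage(articles: dict[str, list[str]]) -> Counter:
    """Count how many articles use each tag (a tag counts once per article)."""
    return Counter(tag for tags in articles.values() for tag in set(tags))


def isolated_tags(usage: Counter) -> list[str]:
    """Tags that appear in exactly one article."""
    return sorted(tag for tag, n in usage.items() if n == 1)


# Hypothetical sample, not the real corpus.
articles = {
    "a": ["agents", "claude-code"],
    "b": ["agents"],
    "c": ["agent-design"],
    "d": ["claude-code", "sales"],
}
usage = tag_usage(articles)
print(isolated_tags(usage))
```

Sorting the output alphabetically also places variants such as agents and agent-design next to each other, which makes inconsistent labeling easy to spot by eye.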

The series Field Sat Unused

The navigation for linking previous and next articles was already implemented. Setting the series field would connect articles into a series. Despite this, not one of the 47 articles used series.

“The feature exists, but no one is using it.” This is not a code problem; it is a governance gap. If you don’t know the feature exists, you won’t use it. If using it is left to individual discretion, it won’t get used when operations are busy. Having the system is not enough to change operations.

28 of 47 articles (60%) had zero links pointing to them from other articles—isolated articles. We had intended a pillar-spoke structure, but the reality was a collection of isolated pieces.
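
Orphan detection is an in-degree count over the internal link graph. The sketch below assumes the graph has already been extracted as a mapping from each article to the articles it links to; the four-article graph is hypothetical, while the 28 and 47 are the audit figures.

```python
def orphan_articles(links: dict[str, set[str]]) -> list[str]:
    """Return articles that no other article links to."""
    linked_to: set[str] = set()
    for source, targets in links.items():
        linked_to |= {t for t in targets if t != source}  # ignore self-links
    return sorted(slug for slug in links if slug not in linked_to)


# Hypothetical link graph: one pillar, two spokes, one stray article.
links = {
    "pillar": {"spoke-1", "spoke-2"},
    "spoke-1": {"pillar"},
    "spoke-2": set(),
    "stray": set(),
}
orphans = orphan_articles(links)

# Share reported in the audit: 28 orphan articles out of 47.
orphan_share = round(100 * 28 / 47)
print(orphans, f"{orphan_share}%")
```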


Phase 4 — Business Decision: Delete or noindex? Disclose Failure or Not?

The technical retreat was complete, but the strategic questions were only beginning. Opening “Page indexing” in Search Console returned “Processing data. Please check again in 1 day”—the magazine.yakumo.world subdomain had been separated 3–5 days earlier, no sitemap had been submitted, and all 76 internal links were dead as described above.

In other words, almost every path by which Google could discover this subdomain’s URLs had been cut. Four coincidental defensive layers were in place simultaneously:

  1. The subdomain was new (deployed 3–5 days before) → minimal crawl budget
  2. Zero external backlinks → no discovery path
  3. No sitemap submission to Search Console → no active discovery signal
  4. 76 internal links dead → no path from the main yakumo.world either

These were structural accidents, not intentional defenses, but together they made it very likely that indexing had not yet begun. “If we retreat now, we can restart with a clean slate.” A week later, Google’s crawlers would likely have started, and the “37 articles in one day” fingerprint would have been recorded in the quality system.

The Logic Behind Delete vs. noindex

Two options for executing the retreat:

A: Physical deletion — Remove all 48 from the working tree. Clean, but the 22 pass-rated articles would be lost too.

B: noindex with draft status — Keep physical files while excluding them from the build. Preserves the option to reactivate later.

We initially leaned toward B. The 22 passes had cleared the mechanical metrics, and reactivating them seemed viable. But on a qualitative read, most of the 22 passed the mechanical metrics yet couldn’t be read as firsthand accounts; they read with an AI tone. The intended pillar-spoke structure was also absent, and topics were scattered without cohesion.

We chose A. We physically deleted all 48 from the working tree, keeping only the git history. Every article for the relaunch (our Phase 2 of publishing) would be written entirely from scratch. The reason was narrative consistency. Saying “we reset” while keeping half the May articles around breaks the coherence of the publishedAt timestamps and damages trust from the outside.

The content of deleted articles is not lost. git show {commit}:{path} retrieves any past version. When we need raw material, we pull it from git history. There was no reason to keep files in the working tree.
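
For scripted retrieval, the same git show call can be wrapped in a small helper. This is a sketch only: the function names are ours, and the commit hash and path in the usage comment are hypothetical.

```python
import subprocess


def git_show_args(commit: str, path: str) -> list[str]:
    """Build the argument list for `git show <commit>:<path>`."""
    return ["git", "show", f"{commit}:{path}"]


def read_from_history(commit: str, path: str) -> str:
    """Return the file content at the given commit; raises if git reports an error."""
    result = subprocess.run(
        git_show_args(commit, path),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


# Usage inside the repository (hypothetical commit and path):
# text = read_from_history("abc1234", "src/content/magazine/example.md")
```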

Using Failure Disclosure as an E-E-A-T Strategy

The other decision was whether to disclose this failure externally.

A: Stay silent — Relaunch as if nothing happened. Readers likely wouldn’t notice (since indexing hadn’t started).

B: Publish it — Disclose the facts and procedures, with numbers. This article is the result.

We chose B. The reason: HouseFresh. That site published “How Google decimated us,” a first-hand account of how Google had destroyed its traffic, and the article gathered citations from across the industry. In our reading, that strengthened the site on the Experience dimension of E-E-A-T.

Failure disclosure with specific numbers—“37 articles in one day, 76 internal links dead, 4 articles with HUMAN_INPUT markers”—is primary information only the owned media that actually experienced it can write. This is exactly the form AI search engines (ChatGPT Search, Perplexity, Google AI Overviews) want to cite.

The failure itself becomes an asset. Hiding it costs more.


Phase 5 — What We Systematized (The Investment Decision Behind the mcluhan Engine)

Retreat and disclosure alone don’t change the underlying state: we would still make the same mistake next time. So we implemented the failures we found as Gates that detect them mechanically.

We named the system bundling all of this mcluhan—a general-purpose operations engine for owned media, implemented as a layer over Astro and Vercel infrastructure. The name comes from Marshall McLuhan’s media theory: “The medium is the message.” The message that the medium itself conveys lives not in the surface articles, but in the system operating those articles—the pre-publish Gate, the reviewer SSOT, the publication scheduler, the accumulated retros. That was the design intent.

Why “Engine Design” Rather Than “Adding Tools”

We could have treated this failure as individual bug fixes. “Add a grep check for HUMAN_INPUT.” “Add a regex check for internal links.” Each would take an hour to implement.

But that approach means every new failure pattern requires another one-off fix. So we did not let the Gate become a loose collection of scripts. We consolidated it in scripts/blog-gate.ts and designed it to run on every npm run prebuild. Failures that machines can mechanically detect should be stopped by machines. We set this as policy and positioned the Gate as an investment target we continuously develop.

The Gate Configuration Built into mcluhan

G1 — Pre-publish HUMAN_INPUT grep: scripts/blog-gate.ts scans all src/content/magazine/**/*.md and exits 1 to stop the build if HUMAN_INPUT is found in body text. The most basic check not being in place was the proximate cause of this failure.
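
The logic of G1 is small enough to sketch. The actual gate lives in scripts/blog-gate.ts; the version below is an illustrative Python equivalent in the language of our retreat script, and the function name, file names, and sample texts are hypothetical. It skips the frontmatter and reports body lines that still contain the marker.

```python
import re

# Frontmatter block at the very start of a Markdown file.
FRONTMATTER = re.compile(r"\A---\n.*?\n---\n?", re.DOTALL)
MARKER = "HUMAN_INPUT"


def marker_lines(text: str) -> list[int]:
    """Return 1-based file line numbers in the body that contain the marker."""
    m = FRONTMATTER.match(text)
    body = text[m.end():] if m else text
    offset = text[: m.end()].count("\n") if m else 0  # lines used by frontmatter
    return [
        offset + i
        for i, line in enumerate(body.splitlines(), start=1)
        if MARKER in line
    ]


# Hypothetical files keyed by name (the real gate reads them from disk).
files = {
    "clean.md": "---\ntitle: ok\n---\nAll numbers filled in.\n",
    "with-marker.md": "---\ntitle: x\ndraft: false\n---\nIntro.\nWe send HUMAN_INPUT proposals per month.\n",
}
violations = {name: hits for name, text in files.items() if (hits := marker_lines(text))}
exit_code = 1 if violations else 0  # a non-zero exit code is what stops the build
print(violations, exit_code)
```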

G2 — Internal link /blog/ path check: Any remaining /blog/... pattern fails. Prevents URL convention change residue from lingering when Astro’s URL rules change.
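
A check like G2 comes down to one regular expression. The sketch below is an assumption about how such a pattern could look, not the expression used in scripts/blog-gate.ts: it flags site-relative /blog/ paths and deliberately ignores /blog/ segments inside absolute external URLs.

```python
import re

# A /blog/ path that is not part of a longer host or path, so an external URL
# such as https://example.com/blog/x is not flagged.
LEGACY_PATH = re.compile(r"(?<![\w.-])/blog/[^\s)]*")


def legacy_blog_links(body: str) -> list[str]:
    """Return every site-relative /blog/... path left in the body."""
    return LEGACY_PATH.findall(body)


sample = (
    "See [a](/blog/foo/) and https://example.com/blog/bar "
    "and [b](/magazine/baz/)."
)
print(legacy_blog_links(sample))
```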

G3-G6 — Frontmatter SSOT coherence: Tags must be keys existing in tagCatalog in src/config/tags.ts. Reviewers must be slugs from src/config/members.ts with canReview: true, and their reviewScope must cover the article’s track. Eliminates inconsistent labeling and “unclear who was responsible for the review.”
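
The coherence rules can be expressed as a single validation function. This is a sketch under assumptions: the frontmatter keys reviewer and track, the Member shape, and all sample values are hypothetical stand-ins for what tags.ts and members.ts actually hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    can_review: bool               # mirrors canReview
    review_scope: frozenset[str]   # mirrors reviewScope (tracks the member may review)


def frontmatter_errors(fm: dict, tag_catalog: set[str], members: dict[str, Member]) -> list[str]:
    """Return human-readable errors; an empty list means the frontmatter passes."""
    errors = [f"unknown tag: {t}" for t in fm.get("tags", []) if t not in tag_catalog]
    reviewer = members.get(fm.get("reviewer", ""))
    if reviewer is None:
        errors.append("reviewer is not a known member")
    elif not reviewer.can_review:
        errors.append("reviewer lacks canReview")
    elif fm.get("track") not in reviewer.review_scope:
        errors.append("reviewer scope does not cover track")
    return errors


# Hypothetical catalog and members.
tag_catalog = {"agents"}
members = {
    "reviewer-a": Member(True, frozenset({"tech"})),
    "reviewer-b": Member(False, frozenset()),
}
ok = {"tags": ["agents"], "reviewer": "reviewer-a", "track": "tech"}
bad = {"tags": ["agents", "agent-design"], "reviewer": "reviewer-b", "track": "tech"}
print(frontmatter_errors(ok, tag_catalog, members))
print(frontmatter_errors(bad, tag_catalog, members))
```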

G7-G8 — scheduledAt and time-slot: Articles with draft: false must have a scheduledAt, and that time must match one of the timeSlots (JST 09:00 / 18:00) in publish-policy.ts. Makes simultaneous bursting physically impossible.
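
The time-slot rule can be checked with the standard library alone. In this sketch we assume scheduledAt is an ISO 8601 string with an explicit UTC offset and that a missing draft key means the article is still a draft; both are assumptions, not confirmed behavior of publish-policy.ts.

```python
from datetime import datetime, time, timedelta, timezone

JST = timezone(timedelta(hours=9))        # Japan has no daylight saving time
TIME_SLOTS = (time(9, 0), time(18, 0))    # allowed publication times in JST


def schedule_errors(fm: dict) -> list[str]:
    """Return errors for the scheduledAt rules; an empty list means the article passes."""
    if fm.get("draft", True):
        return []  # drafts are not published, so they need no schedule
    raw = fm.get("scheduledAt")
    if raw is None:
        return ["scheduledAt is required when draft is false"]
    # Assumes an explicit offset, e.g. 2026-06-01T09:00:00+09:00.
    when = datetime.fromisoformat(raw).astimezone(JST)
    if when.time() not in TIME_SLOTS:
        return [f"scheduledAt {when:%H:%M} JST is not an allowed time slot"]
    return []


print(schedule_errors({"draft": False, "scheduledAt": "2026-06-01T12:30:00+09:00"}))
```

Normalizing to JST before comparing means a timestamp written in UTC, such as 00:00:00+00:00, still passes as the 09:00 slot.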

W1-W3 — Rate-limit warnings: Issues warnings if more than 4 articles are scheduled on a single day, more than 14 in a week, or more than 80 in a month. Doesn’t stop the build, but notifies editors that “the publication pace is exceeding the speed limit.”
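
The three warnings are bucketed counts compared against the limits above. In this sketch a week means an ISO calendar week, which is our assumption; the text does not say whether the real check uses calendar weeks or a rolling 7-day window.

```python
from collections import Counter
from datetime import date

# Limits from the policy: more than 4 per day, 14 per week, 80 per month warns.
LIMITS = {"day": 4, "week": 14, "month": 80}


def rate_warnings(scheduled: list[date]) -> list[tuple[str, str, int]]:
    """Return (scope, bucket, count) for every bucket that exceeds its limit."""
    buckets = {
        "day": Counter(d.isoformat() for d in scheduled),
        # ISO year and week number (assumption: calendar weeks, not rolling windows).
        "week": Counter("{}-W{:02d}".format(*d.isocalendar()[:2]) for d in scheduled),
        "month": Counter(f"{d.year}-{d.month:02d}" for d in scheduled),
    }
    return [
        (scope, bucket, n)
        for scope, counts in buckets.items()
        for bucket, n in sorted(counts.items())
        if n > LIMITS[scope]
    ]


# The May 8 burst: 37 articles on one day trips the day and week warnings.
burst = rate_warnings([date(2026, 5, 8)] * 37)
for scope, bucket, n in burst:
    print(f"warning: {n} articles in {scope} {bucket} (limit {LIMITS[scope]})")
```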

AI-assisted footer on every article: Every article’s footer mandates: “This article was drafted with AI assistance and reviewed by [name] on [date].” The fact of AI use is not hidden, and the reviewer is named explicitly.

Not More Gates — Gates That Mean Something

Adding more Gates is not the goal. What matters is whether the Gates cover all the failure patterns machines can reliably detect. The mcluhan design principle is not “keep finding new things to detect” but “keep converting detected failures into Gates.”

The three failures this time—HUMAN_INPUT exposure, dead internal links, tag inconsistencies—have all been converted to Gates. The next failure patterns that emerge will be converted the same way. The pipeline itself is the improvement target, and with every article we operate, the Gate grows stronger. That structure is what creates the sustainability of owned media in the AI mass-production era.


Technical Detail: The Implementation of the Emergency Retreat {#tech-detail}

After business judgment and operational design were settled, the implementation needed to move fast. The window ran from the moment the Vercel deploy hook fired until Google’s next visit.

Step 1 — Set All 48 Articles to draft: true

Our Astro content collection queries filter out any article whose frontmatter sets draft: true, so those articles are excluded from the build. This lets us create a state where “physical files remain but generated HTML disappears.”

Manually rewriting 48 files one by one would take 30+ minutes and introduce mistakes. We wrote a Python script to batch-rewrite frontmatter.

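import re
from pathlib import Path

# Root of the magazine content collection; _README.md is documentation, not an article.
root = Path("src/content/magazine")
files = sorted(p for p in root.rglob("*.md") if p.name != "_README.md")

for p in files:
    text = p.read_text(encoding="utf-8")
    # Split the file into opening fence, frontmatter, closing fence, and body.
    m = re.match(r"^(---\n)(.*?)(\n---\n?)(.*)$", text, re.DOTALL)
    if not m:
        continue  # no frontmatter block: leave the file alone
    open_tag, fm, close_tag, body = m.groups()
    # Flip only a line that reads "draft: false"; every other field stays untouched.
    new_fm = re.sub(r"^draft:\s*false\s*$", "draft: true", fm, flags=re.MULTILINE)
    if new_fm != fm:
        # Write back only when something actually changed.
        p.write_text(open_tag + new_fm + close_tag + body, encoding="utf-8")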

The regex rewrites draft: false to draft: true. Other frontmatter fields are untouched. 48 files processed in under a second.

Step 2 — noindex in middleware.ts

Draft status alone wasn’t enough. Magazine top pages, category listings, tag listings stay in the build, and 404 pages also need noindex. We needed to control search engine behavior for all of magazine.yakumo.world from one place.

We added a Vercel Edge Middleware that checks the host and attaches X-Robots-Tag: noindex, nofollow headers to every response from magazine.yakumo.world. We also added a branch that returns dedicated content (Disallow: /) directly as a Response for magazine.yakumo.world/robots.txt.

// Assumed import: next() and rewrite() match the helpers exported by @vercel/edge.
import { next, rewrite } from '@vercel/edge';

const MAGAZINE_HOST = 'magazine.yakumo.world';

export default function middleware(request: Request): Response | undefined {
  const url = new URL(request.url);
  const host = (request.headers.get('host') ?? '').toLowerCase();

  if (host !== MAGAZINE_HOST) {
    return next(); // Pass through for the main yakumo.world
  }

  const { pathname } = url;
  const noindexHeaders = { 'X-Robots-Tag': 'noindex, nofollow' };

  // Magazine-specific robots.txt, returned directly without touching the build output
  if (pathname === '/robots.txt') {
    return new Response('User-agent: *\nDisallow: /\n', {
      status: 200,
      headers: { 'Content-Type': 'text/plain' },
    });
  }

  // Rewrite to the /magazine routes and attach the noindex header to the response
  url.pathname = '/magazine' + pathname;
  return rewrite(url, { headers: noindexHeaders });
}

This ensures every response (including 404) under magazine.yakumo.world carries noindex, and robots.txt returns a subdomain-specific response. The main yakumo.world is unaffected.

Step 3 — Split Commits into Logical Units

Bundling all changes into one commit makes it impossible to understand what changed when reading history later. We split into three.

  1. docs(blog-ops/audit): record full quality audit of 48 magazine articles — Audit CSV and Markdown report (the basis for the decision)
  2. chore(magazine/content): unpublish all 48 articles for quality reset — Drafting all 48 (content change)
  3. feat(middleware/magazine): block search indexing of the magazine subdomain — Middleware noindex (infrastructure layer)

The dependency order: decision rationale → content change → infrastructure. Readable in any future history review.

Step 4 — Verification Commands After Vercel Deploy

Once the deploy completed, we verified noindex was working from four angles with curl.

# 1. Does magazine.yakumo.world's top page have the noindex header?
curl -sI https://magazine.yakumo.world/ | grep -i x-robots-tag

# 2. Does magazine.yakumo.world's robots.txt return Disallow: /?
curl -s https://magazine.yakumo.world/robots.txt

# 3. Does a specific article URL return 404 + noindex header?
curl -sI https://magazine.yakumo.world/2026-05-foo | grep -iE "HTTP|x-robots-tag"

# 4. Does the main yakumo.world not have noindex?
curl -sI https://yakumo.world/ | grep -i x-robots-tag

All four behaved as expected. The main yakumo.world was unaffected; the magazine subdomain was completely hidden from search engines.

From deploy completion to verification: 5 minutes. From the decision to pull to implementation complete: same day.


Summary — The Timeline

| Date | Action |
|------|--------|
| 2026-05-08 | 48 articles published in a batch (37 of them on the same day) |
| 2026-05-17 | Switched subdomain magazine.yakumo.world to production |
| 2026-05-18 morning | Ran blog-audit 5-axis scoring; detected 11 failures and 4 articles with HUMAN_INPUT markers |
| 2026-05-18 midday | Strategic diagnosis (Google Scaled Content Abuse boundary check) + structural diagnosis (76 dead internal links, 55 isolated tags, 28 orphan articles) |
| 2026-05-18 evening | Confirmed indexing not yet started in Search Console; decided on full retreat |
| 2026-05-18 night | Drafted all 48 articles + implemented middleware noindex + 3 commits pushed to main + deployed + 4-point curl verification |

Detection to retreat: 1 day. Implementation to production: 1 hour. AI assistance made that speed possible. Half the reason we got out before Google indexed us was that the subdomain happened to be new; the other half was an operational design in which judgment and implementation moved faster than the clock.

And there is something worth noting: this article itself is the first production output from the new writing pipeline Yakumo is now building (blog-plan → blog-tech-write --from-brief → blog-review → blog-publish → blog-schedule). The pipeline is being written about by the pipeline itself. Every session writes decision logs to _state.json, which become reference material for the next blog-plan. The chain of failure detection and systematization becomes the content of this publication.

Mass-producing with AI is already a given. What gets asked is the operational design: what you detect afterward, how you retreat, and how you systematize it.

And one more thing: continuously accumulating primary information becomes the most durable asset in the AI mass-production era.

AI cannot write information that isn’t in its training data. The numbers Yakumo has written here—48 / 5 / 11 / 76 / 37 / 79% / 23%—are primary information born from what we actually did in the last 10 days. No other company can reproduce this article with AI, because what comes out is “a page explaining AI mass production in general terms.” The specific failure numbers, the retreat procedure, the code fragments of the systematization, the rationale and tradeoffs of the decisions—these can only be written by someone who went through it.

The strategy for owned media is not to produce large volumes of text with AI. It is, in a world where everyone can do that, to have a system that continuously accumulates primary information that other organizations cannot write. Failure logs, operational retros, implementation histories—systematized as skills (mcluhan’s blog-retro is one example) so that each act of writing compounds into an asset.

What AI cites, and what competitors cannot replicate, is ultimately the depth of that primary information. Mass production is the baseline; retreat and systematization are operations; and accumulating primary information is the asset.