Bulk Video Production with Claude Code + Remotion — Full montage Pipeline Architecture

This article is written for senior engineers, architects, and solo developers. It documents the full design of “montage,” a bulk video production pipeline that combines Claude Code agents with Remotion. Business ROI, cost decisions, and the management perspective are covered in the companion article “Management Decisions and ROI for AI-Driven Bulk Video Production.” Here we cover three design elements (the pipeline DAG, the Block Schema, and Zod validation) with concrete type definitions and code.

The backbone of montage is a 4-agent division: researcher → scriptwriter → composer → reviewer. Each agent inputs and outputs strictly typed JSON, and Zod validation acts as a quality gate for LLM output. Remotion’s renderMedia ultimately receives a type-safe VideoSpec and writes out an MP4.

3 key takeaways:

Agent division exists to clarify “ownership boundaries.” Single-agent approaches make fault isolation difficult, which is why we split into 4 agents.
The ContentBlock type (Block Schema) is the sole connection point between Remotion components and agent output.
Zod validation is the gate that determines “can the JSON generated by the LLM be passed to Remotion?” — when it fails, fallback rules return control to the upstream agent.

montage Overall Architecture — AI Video Generation Pipeline Design

Responsibility Boundaries of the 4 Agents (researcher / scriptwriter / composer / reviewer)

montage’s pipeline is declared as JSON in config/pipelines/long-video.json. Of its 10+ phases, the core steps from research through review are shown below.

researcher ─→ analyst ─→ scriptwriter ─→ composer ─→ implement(skill) ─→ render(skill)
                                                       ↑                      │
                                                  reviewer ←──────────────────┘
                                                  (NG/WARN → fallback)

Responsibility boundaries per agent:

Agent	Input Type	Output Type	Responsibility
researcher	ResearchInput	ResearchOutput	Data collection for a topic (IR / earnings / macro indicators)
analyst	ResearchOutput	AnalystOutput	Extract insights, storylines, and key numbers from collected data
scriptwriter	AnalystOutput	ScriptwriterOutput	Determine scene structure, narration, and subtitles
composer	ScriptwriterOutput	ComposerOutput	Design the Block Schema (video composition) for each scene
reviewer	ImplementerOutput	ReviewResult	Quality-check screenshots after rendering

In the actual long-video.json, there is an analyst step (fact analysis) between researcher and scriptwriter, and thumbnailer / copywriter / record steps follow the reviewer—totalling 10+ phases. This article focuses on the 4 core agents (researcher / scriptwriter / composer / reviewer) to explain the design intent.

DAG Design Philosophy — Why DAG Instead of Sequential

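montage’s pipeline is designed as a DAG (Directed Acyclic Graph) rather than a purely sequential execution. Strictly speaking, the forward path is the acyclic part; the fallbacks are retry edges layered on top of it and bounded by maxRetries. The key property is that when the reviewer issues an NG verdict, it can fall back to the appropriate upstream agent based on the root cause.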

"retry": {
  "rules": [
    { "cause": "scene-layout",   "backTo": "composer",     "description": "Scene composition / layout issue" },
    { "cause": "narration",      "backTo": "scriptwriter", "description": "Narration inconsistency" },
    { "cause": "duration",       "backTo": "implement",    "description": "Insufficient duration" },
    { "cause": "rendering-bug",  "backTo": "system-developer", "description": "Rendering bug (code fix required)" }
  ],
  "maxRetries": 3
}

Narration problems go back to scriptwriter; scene-layout problems go back to composer. Because this branching exists, expressing the pipeline declaratively as a DAG is more maintainable than flat if/else logic.

The reason to give up sequential simplicity in favor of a DAG is that failure causes are diverse. NG verdicts stem from at least three kinds of problem: script-level problems (scriptwriter), video composition problems (composer), and code bugs (system-developer), each with a different fallback target. A blanket “rerun everything” retry cannot apply cause-specific fixes.
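The cause-based routing can be sketched as a small lookup. This is a minimal illustration, not montage’s actual code: the RetryRule and RetryConfig shapes mirror the "retry" config excerpt above, while resolveFallback, FallbackDecision, and the meaning of the attempt counter are assumptions made for the example.

```typescript
// Shapes mirror the "retry" config excerpt; everything else is hypothetical.
type RetryRule = { cause: string; backTo: string; description?: string };
type RetryConfig = { rules: RetryRule[]; maxRetries: number };

type FallbackDecision =
  | { action: "retry"; backTo: string }
  | { action: "give-up"; reason: string };

const retryConfig: RetryConfig = {
  rules: [
    { cause: "scene-layout", backTo: "composer" },
    { cause: "narration", backTo: "scriptwriter" },
    { cause: "duration", backTo: "implement" },
    { cause: "rendering-bug", backTo: "system-developer" },
  ],
  maxRetries: 3,
};

// `attempt` is assumed to be the number of retries already used (0 on the first NG).
function resolveFallback(
  config: RetryConfig,
  cause: string,
  attempt: number
): FallbackDecision {
  if (attempt >= config.maxRetries) {
    return { action: "give-up", reason: `maxRetries (${config.maxRetries}) reached` };
  }
  for (const rule of config.rules) {
    if (rule.cause === cause) {
      return { action: "retry", backTo: rule.backTo };
    }
  }
  // An unclassified cause has no safe fallback target, so it is not retried.
  return { action: "give-up", reason: `no retry rule for cause "${cause}"` };
}
```

Keeping the rules as data means a new cause is a config change rather than another branch in control flow, which is the maintainability argument made above.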

Type Definitions for Node Inputs and Outputs

The type definitions for the pipeline are centralized in PIPELINE_STAGE_TYPES in src/shared/types/pipeline.ts.

// src/shared/types/pipeline.ts
export const PIPELINE_STAGE_TYPES = {
  researcher: {
    input: "ResearchInput",
    output: "ResearchOutput",
    outputFile: "input.json",
  },
  scriptwriter: {
    input: "AnalystOutput",
    output: "ScriptwriterOutput",
    outputFile: "scripted.json",
  },
  composer: {
    input: "ScriptwriterOutput",
    output: "ComposerOutput",
    outputFile: "composed.json",
  },
  implement: {
    input: "ComposerOutput",
    output: "ImplementerOutput",
    outputFile: "final.json",
  },
  reviewer: {
    input: "ImplementerOutput",
    output: "ReviewResult",
    outputFile: "review/quality-score.json",
  },
} as const;

Each stage’s output file is placed under output/{topic-id}/ (input.json → analysis.json → scripted.json → composed.json → final.json). An agent writes JSON to its own outputFile, and the next agent reads that file. The filesystem functions as a “typed queue.”
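One way to picture the “typed queue” is as two small helpers over the stage table. The sketch below copies stage names, type names, and file names from the PIPELINE_STAGE_TYPES excerpt; stageOutputPath and canHandOff are hypothetical helpers introduced only for illustration.

```typescript
// Values copied from the PIPELINE_STAGE_TYPES excerpt (analyst is not in that excerpt).
type StageName = "researcher" | "scriptwriter" | "composer" | "implement" | "reviewer";
type StageTypes = { input: string; output: string; outputFile: string };

const STAGES: Record<StageName, StageTypes> = {
  researcher: { input: "ResearchInput", output: "ResearchOutput", outputFile: "input.json" },
  scriptwriter: { input: "AnalystOutput", output: "ScriptwriterOutput", outputFile: "scripted.json" },
  composer: { input: "ScriptwriterOutput", output: "ComposerOutput", outputFile: "composed.json" },
  implement: { input: "ComposerOutput", output: "ImplementerOutput", outputFile: "final.json" },
  reviewer: { input: "ImplementerOutput", output: "ReviewResult", outputFile: "review/quality-score.json" },
};

// Hypothetical helper: where a stage writes its JSON for a given topic.
function stageOutputPath(topicId: string, stage: StageName): string {
  return `output/${topicId}/${STAGES[stage].outputFile}`;
}

// Hypothetical helper: a hand-off is well-typed only when the producer's
// output type name equals the consumer's input type name.
function canHandOff(from: StageName, to: StageName): boolean {
  return STAGES[from].output === STAGES[to].input;
}
```

With this table, canHandOff("scriptwriter", "composer") holds, while canHandOff("researcher", "scriptwriter") does not, because the analyst stage (ResearchOutput → AnalystOutput) sits between them and is absent from the excerpt.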

Block Schema — Connecting Remotion Components and Agents

What Is a Block — Abstracting the Video Unit in Claude Code Remotion Video Generation

montage’s video hierarchy is as follows:

Video → Chapter → Scene → Layer → Layout → Slot → ContentBlock

ContentBlock is the smallest video unit (atomic rendering unit). A single ContentBlock holds exactly one video responsibility: “render a bar chart,” “display text,” “render a map,” and so on.

A Slot is a subdivision of the screen defined by the Layout. For example, a side-by-side layout has a primary slot and a secondary slot, with one ContentBlock assigned to each slot.

A Layer is a collection of Slots with a z-index. The four types are background, content, overlay, and ui, stacked from background to foreground.

A Scene is a collection of Layers with narration and subtitles. One scene equals a few seconds to tens of seconds of video.

A Chapter is a collection of Scenes representing a thematic grouping. A single video consists of one or more Chapters.

This hierarchy allows the composer agent to structurally describe “what to show” at the Layer, Slot, and ContentBlock level, while Remotion components receive that structure and render it into video.
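The hierarchy can be sketched as nested types. Only ContentBlock’s shape (id, type, data) and the four layer kinds come from this article; every other field name below (slots, block, layout, layers, scenes, and so on) is an assumption chosen for illustration, and collectBlocks is a hypothetical helper.

```typescript
// ContentBlock matches the article; the surrounding shapes are illustrative guesses.
type ContentBlock = { id: string; type: string; data: Record<string, unknown> };
type Slot = { name: string; block: ContentBlock };
type LayerKind = "background" | "content" | "overlay" | "ui";
type Layer = { kind: LayerKind; layout: string; slots: Slot[] };
type Scene = { id: string; narration: string; subtitle: string; layers: Layer[] };
type Chapter = { id: string; scenes: Scene[] };
type Video = { chapters: Chapter[] };

// Walk Video → Chapter → Scene → Layer → Slot and return every ContentBlock in order.
function collectBlocks(video: Video): ContentBlock[] {
  const blocks: ContentBlock[] = [];
  for (const chapter of video.chapters) {
    for (const scene of chapter.scenes) {
      for (const layer of scene.layers) {
        for (const slot of layer.slots) {
          blocks.push(slot.block);
        }
      }
    }
  }
  return blocks;
}
```

A flat list like this is a convenient input for per-block checks, since each ContentBlock carries a single rendering responsibility.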

Block Schema Type Definitions (TypeScript + Zod)

The ContentBlock type definitions are centralized in src/video/types/video.ts:

// src/video/types/video.ts
export type ContentBlockType =
  // Charts
  | "bar-chart"
  | "stacked-bar"
  | "line-chart"
  | "area-chart"
  | "waterfall"
  | "pie-chart"
  // Data displays
  | "metric-card"
  | "number-highlight"
  | "ranking-list"
  | "data-table"
  // Structured content
  | "pros-cons-list"
  | "timeline"
  | "comparison"
  // Text
  | "text-block"
  | "quote-block"
  // Special
  | "title-card"
  | "outro-card"
  // ... 58 types total
  ;

export type ContentBlock = {
  id: string;
  type: ContentBlockType;
  data: Record<string, unknown>;
};

The data field holds type-specific data for each block type. For bar-chart, it would be { items: [{label, value}], unit, valueFormat }.

The shape of data for each block type is defined by Zod schemas in src/video/types/block-data.ts:

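// src/video/types/block-data.ts (excerpt; the actual implementation also has `showAxis`, `sorted`, and other fields)
import { z } from "zod";
// ReferenceLineSchema is defined elsewhere and is not shown in this excerpt.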
export const BarChartDataSchema = z.object({
  title: z.string().optional(),
  unit: z.string().optional(),
  valueFormat: z.enum(["compact", "number", "jpy", "percent"]).optional(),
  items: z.array(
    z.object({
      label: z.string(),
      value: z.number(),
      previousValue: z.number().optional(),
      highlight: z.boolean().optional(),
      color: z.string().nullable().optional(),
      annotation: z.string().nullable().optional(),
    })
  ).min(1),
  variant: z.enum(["vertical", "horizontal"]).optional(),
  showGrid: z.boolean().optional(),
  referenceLines: z.array(ReferenceLineSchema).optional(),
});

This type definition serves both as “the spec for JSON generated by the composer agent” and “the data contract expected by Remotion components.”

Data Flow: Agent Output → Block Schema → Remotion Component

composer agent
  └─→ composed.json (VideoSpec format)
        └─→ implement (skill)
              ├─→ Zod validation (validateBlockData)
              ├─→ Duration calculation (narration char count → estimated seconds)
              └─→ final.json (ImplementerOutput)
                    └─→ renderMedia (Remotion)
                          └─→ {topic-id}.mp4

The composed.json written by composer conforms to the VideoSpec type. The implement skill reads this file, performs Zod validation and duration calculation, and generates final.json. renderMedia receives final.json and executes rendering.
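The duration step (narration character count → estimated seconds) could look roughly like the following. The speaking rate and the minimum scene length are placeholder assumptions, not values from montage, and both function names are hypothetical.

```typescript
// Assumed constants for illustration only; the real values are not given in the source.
const ASSUMED_CHARS_PER_SECOND = 6;
const ASSUMED_MIN_SCENE_SECONDS = 2;

// Estimate how long one scene's narration takes to read aloud.
// Whitespace is ignored; length counts UTF-16 code units.
function estimateSceneSeconds(
  narration: string,
  charsPerSecond: number = ASSUMED_CHARS_PER_SECOND
): number {
  const charCount = narration.replace(/\s+/g, "").length;
  const seconds = Math.round((charCount / charsPerSecond) * 10) / 10; // one decimal place
  return Math.max(ASSUMED_MIN_SCENE_SECONDS, seconds);
}

// Sum the per-scene estimates for a whole video.
function estimateTotalSeconds(narrations: string[]): number {
  let total = 0;
  for (const narration of narrations) {
    total += estimateSceneSeconds(narration);
  }
  return total;
}
```

A floor on scene length keeps a scene with very short narration from flashing by, which is one plausible reason a “duration” cause routes back to implement rather than to an agent.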

Zod Validation — Quality Gate for Agent Output

Why Zod Validation Is Necessary for LLM Output

JSON generated by an LLM is “plausible” but “not necessarily correct.” In particular, LLMs tend to produce incorrect values for numeric precision, array element counts, and enum values.

In montage, validateBlockData validates the data of each ContentBlock against a Zod schema immediately before Remotion rendering:

// src/video/types/block-data.ts
export function validateBlockData(
  type: ContentBlockType,
  data: unknown
): { success: boolean; error?: string } {
  const schema = BLOCK_DATA_SCHEMAS[type];
  if (!schema) return { success: true };
  const result = schema.safeParse(data);
  if (result.success) return { success: true };
  return { success: false, error: result.error.message };
}

The schemas under validation are managed in a dictionary called BLOCK_DATA_SCHEMAS:

export const BLOCK_DATA_SCHEMAS: Partial<Record<ContentBlockType, z.ZodType>> = {
  "bar-chart": BarChartDataSchema,
  "line-chart": LineChartDataSchema,
  "area-chart": AreaChartDataSchema,
  "pie-chart": PieChartDataSchema,
  "stacked-bar": StackedBarDataSchema,
  // ... remaining block types
};

Placing validation immediately before Remotion rendering prevents “type-mismatched JSON from reaching the renderer.”
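One consequence of the code above is worth noting: because BLOCK_DATA_SCHEMAS is a Partial record and validateBlockData returns success when no schema is found, a block type without a registered schema passes the gate unchecked. A coverage check along the following lines could surface those gaps. SchemaLike is a stand-in for z.ZodType so that the sketch has no dependencies, and findTypesWithoutSchema is a hypothetical helper.

```typescript
// Minimal stand-in for a Zod schema: only the method this sketch needs.
type SchemaLike = { safeParse: (data: unknown) => { success: boolean } };
type SchemaMap = { [blockType: string]: SchemaLike | undefined };

// Return the block types that have no schema registered and would
// therefore pass validation without being inspected.
function findTypesWithoutSchema(
  allTypes: readonly string[],
  schemas: SchemaMap
): string[] {
  const missing: string[] = [];
  for (const blockType of allTypes) {
    // hasOwnProperty avoids false positives from inherited keys such as "toString".
    const registered =
      Object.prototype.hasOwnProperty.call(schemas, blockType) &&
      schemas[blockType] !== undefined;
    if (!registered) missing.push(blockType);
  }
  return missing;
}
```

Run as a unit test against the full ContentBlockType list, a check like this turns “schema not yet written” from a silent pass into a visible to-do list.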

Retry Design on Validation Failure

When Zod validation fails, the implement skill returns a ValidationResult:

// src/shared/pipeline/types.ts
export type ValidationError = {
  sceneId: string;
  field: string;
  message: string;
  severity: "error" | "warning";
};

export type ValidationResult = {
  valid: boolean;
  errors: ValidationError[];
};

When there are severity: "error" errors, the pipeline halts. According to the retry rules in the reviewer step of the pipeline config, control falls back to the upstream agent based on error classification (scene-layout / narration / duration / rendering-bug).

severity: "warning" errors are recorded in final.json but rendering continues. After the reviewer checks the screenshots, if the quality score falls below the threshold, the same fallback flow runs.
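The halt-or-continue rule can be sketched as follows. ValidationError and ValidationResult are copied from the excerpt above; BlockRef, BlockCheck, validateBlocks, and shouldHalt are hypothetical, the check parameter stands in for validateBlockData, and treating valid as “no error-severity entries” is an assumption.

```typescript
type ValidationError = {
  sceneId: string;
  field: string;
  message: string;
  severity: "error" | "warning";
};
type ValidationResult = { valid: boolean; errors: ValidationError[] };

// Hypothetical shapes for this sketch.
type BlockCheck = { success: boolean; error?: string };
type BlockRef = { sceneId: string; blockId: string; type: string; data: unknown };

// Run a per-block check (e.g. validateBlockData) and collect failures.
function validateBlocks(
  blocks: BlockRef[],
  check: (type: string, data: unknown) => BlockCheck
): ValidationResult {
  const errors: ValidationError[] = [];
  for (const block of blocks) {
    const result = check(block.type, block.data);
    if (!result.success) {
      errors.push({
        sceneId: block.sceneId,
        field: `blocks.${block.blockId}.data`,
        message: result.error ?? "unknown validation error",
        severity: "error",
      });
    }
  }
  return { valid: errors.length === 0, errors };
}

// Any severity "error" halts the pipeline; warnings alone let rendering continue.
function shouldHalt(result: ValidationResult): boolean {
  for (const e of result.errors) {
    if (e.severity === "error") return true;
  }
  return false;
}
```

Carrying sceneId on each error gives the fallback step something concrete to hand back to the upstream agent, instead of a bare pass/fail flag.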

Zod Schema Design Patterns for Agent-Divided Bulk Video Production

Let’s review the design patterns seen in BarChartDataSchema.

Pattern 1: Explicit required vs. optional fields

const BarChartDataSchema = z.object({
  items: z.array(...).min(1),      // required, minimum 1 element
  title: z.string().optional(),    // optional
  valueFormat: z.enum([...]).optional(),  // optional (default applied when omitted)
});

LLMs tend to err in both directions: they fill in optional fields that should have been left out, and they omit required ones. Because the schema states which is which, the same information can be passed to the LLM through the composer agent’s prompt (“items is required, title is optional”). The schema itself functions as a spec document for the agent.

Pattern 2: Restrict valid values with enums

valueFormat: z.enum(["compact", "number", "jpy", "percent"]).optional(),
variant: z.enum(["vertical", "horizontal"]).optional(),

Using z.enum to enumerate valid values prevents LLMs from generating semantically close but invalid values like "percentage" or "yen".

Pattern 3: Distinguishing nullable from optional

color: z.string().nullable().optional(),
annotation: z.string().nullable().optional(),

optional() means “the field itself may be absent,” while nullable() means “null values are allowed.” Combining them accepts both undefined (no key) and null (explicitly cleared). This lets the Remotion component distinguish “no field → apply default” from “null → explicitly disabled.” BarChartDataSchema’s color and annotation fields use exactly this combination.
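On the component side, the distinction might be consumed like this. The default color value and the resolveBarColor helper are hypothetical; only the `color?: string | null` shape follows from the schema.

```typescript
// Hypothetical default; the real theme value is not given in the source.
const DEFAULT_BAR_COLOR = "#4f8cff";

// Matches what z.string().nullable().optional() accepts for `color`.
type BarItem = { label: string; value: number; color?: string | null };

// undefined (key absent) → fall back to the default
// null                   → caller asked for no explicit color; pass null through
// string                 → use the given color as is
function resolveBarColor(
  item: BarItem,
  fallback: string = DEFAULT_BAR_COLOR
): string | null {
  if (item.color === undefined) return fallback;
  return item.color;
}

// JSON never contains `undefined`; an absent key simply reads as undefined.
const itemsFromAgent = JSON.parse(
  '[{"label":"A","value":1},' +
    '{"label":"B","value":2,"color":null},' +
    '{"label":"C","value":3,"color":"#ff0000"}]'
) as BarItem[];
```

Here item A resolves to the default, item B stays null, and item C keeps its explicit color. Collapsing null and undefined into one case would make the second outcome impossible to express in agent output.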

Implementation Details of the 4 Agents

researcher — Topic Research and Information Gathering Design

researcher takes a ResearchInput (of the form { channelId, title, data }) and writes a ResearchOutput (input.json). Data sources are declared in profile JSON files under config/profiles/scrape/, switchable per channel. The equity profile for stock channels defines 5 sources with priority ordering: TDnet (timely disclosures), corporate IR pages, EDINET API, accumulated spreadsheets, and news articles.

researcher’s config is declared as follows:

{
  "phase": 1,
  "type": "agent",
  "name": "researcher",
  "inputType": "ResearchInput",
  "outputType": "ResearchOutput",
  "config": {
    "profiles": "config/profiles/scrape/",
    "source": "brands/{channelId}.research.researchers"
  },
  "output": "input.json",
  "intermediates": ["research-equity.json", "research-macro.json"],
  "shared": true
}

shared: true indicates that the same input.json can be referenced by multiple downstream agents. Both analyst and intermediate composer phases read the same file.

scriptwriter — Script Generation and Narrative Design

scriptwriter takes AnalystOutput (analysis.json) and writes ScriptwriterOutput (scripted.json).

// src/shared/pipeline/types.ts
/** Scriptwriter output: VideoSpec with narration + subtitle filled in. */
export type ScriptwriterOutput = VideoSpec;

ScriptwriterOutput is an alias of VideoSpec. At this phase each Scene’s narration and subtitle text are filled in, while the ContentBlock data inside layers may still be empty or hold placeholder values. scriptwriter decides “what to say”; composer decides “how to show it” and owns the concrete ContentBlock data.

composer — Block Schema Generation and Remotion Connection

composer is the most complex decision-making agent in the montage pipeline. It takes ScriptwriterOutput (scripted.json), decides “which ContentBlock to place in which Slot” for each scene, and writes ComposerOutput (composed.json).

composer’s config includes references to blockSchemas and layoutOptions:

{
  "phase": 4,
  "type": "agent",
  "name": "composer",
  "config": {
    "template": "specs/channels/{channelId}/templates/{contentType}.json",
    "directive": "brands/{channelId}.composer",
    "blockSchemas": "specs/system/block-schemas.md",
    "layoutOptions": "specs/system/layout-options.md"
  },
  "validation": {
    "blockNames": "src/blocks/registry.ts",
    "dataSchemas": "specs/system/block-schemas.md"
  }
}

blockSchemas: "specs/system/block-schemas.md" is the Markdown file the composer agent reads: a spec document describing the block types and the data fields each block accepts. It serves as the LLM’s spec sheet, and together with the Zod schemas it gives two layers of protection: the spec guides generation, and the schema checks the result.

validation.blockNames points to a registry of implemented blocks. If composer generates a non-existent block type, this validation catches it early.
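The block-name check amounts to a membership test against the registry. In the sketch below the registry is modeled as a plain list of type names; how src/blocks/registry.ts actually exposes its entries is not shown in this article, so that shape and the findUnknownBlockTypes helper are assumptions.

```typescript
type BlockRef = { id: string; type: string };

// Return the blocks whose type is not in the registry, so the composer
// can be asked to correct them before any data validation runs.
function findUnknownBlockTypes(
  blocks: BlockRef[],
  registeredTypes: readonly string[]
): BlockRef[] {
  const unknown: BlockRef[] = [];
  for (const block of blocks) {
    if (registeredTypes.indexOf(block.type) === -1) {
      unknown.push(block);
    }
  }
  return unknown;
}

// Example: "bar-graph" is a plausible-looking name an LLM might invent.
const composedBlocks: BlockRef[] = [
  { id: "b1", type: "bar-chart" },
  { id: "b2", type: "bar-graph" },
];
const unknownBlocks = findUnknownBlockTypes(composedBlocks, ["bar-chart", "line-chart"]);
```

In this example unknownBlocks contains only b2. Checking names before data shapes keeps the error message specific: “no such block” is a different fix from “wrong fields for this block.”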

reviewer — Quality Check and Fallback Design

reviewer takes ImplementerOutput (final.json) and {topic-id}.mp4, visually inspects screenshots, and makes a quality judgment.

reviewer writes its verdict as a ReviewResult (review/quality-score.json). On an NG verdict, control falls back according to the retry rules.

reviewer’s distinctive feature is that the fallback target changes dynamically with the cause. The full rule list extends the excerpt shown earlier with two more causes:

"rules": [
  { "cause": "scene-layout",       "backTo": "composer" },
  { "cause": "narration",          "backTo": "scriptwriter" },
  { "cause": "duration",           "backTo": "implement" },
  { "cause": "rendering-bug",      "backTo": "system-developer" },
  { "cause": "implementation-bug", "backTo": "system-developer" },
  { "cause": "brand-mismatch",     "backTo": "composer" }
]

backTo: "system-developer" is special, indicating a problem that requires fixing code rather than an agent. Rendering bugs in components cannot be resolved by re-running prompts, so they are judged as requiring engineer code fixes and delegated to a human.

reviewer’s postHook has the record-review skill registered, which records quality scores to reviews.jsonl and is designed to automatically trigger a learning-loop when a threshold is exceeded.

Quality score thresholds are defined in specs/quality/rubrics.json. Scores are numeric values from 1 to 5 and fall into 3 levels by lower bound: OK from 3.5 (professional quality or above), WARN from 2.5 (passing but needs improvement), and NG from 0 (regeneration required). reviewer scores each block as a weighted average of 4 criteria: data readability (0.30), visual balance (0.25), animation quality (0.20), and data accuracy (0.25). When the score falls below the threshold, the retry rules are triggered.
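The scoring arithmetic can be sketched as below. The weights and thresholds are the ones quoted above; the field names, the function names, and the choice to treat a score exactly at a threshold as belonging to the higher level are assumptions, since the layout of rubrics.json is not shown.

```typescript
// Per-criterion scores on the 1–5 scale (field names are hypothetical).
type CriterionScores = {
  dataReadability: number;
  visualBalance: number;
  animationQuality: number;
  dataAccuracy: number;
};

// Weights quoted in the article; they sum to 1.00.
const WEIGHTS: CriterionScores = {
  dataReadability: 0.3,
  visualBalance: 0.25,
  animationQuality: 0.2,
  dataAccuracy: 0.25,
};

const OK_THRESHOLD = 3.5;
const WARN_THRESHOLD = 2.5;

function weightedScore(s: CriterionScores): number {
  return (
    s.dataReadability * WEIGHTS.dataReadability +
    s.visualBalance * WEIGHTS.visualBalance +
    s.animationQuality * WEIGHTS.animationQuality +
    s.dataAccuracy * WEIGHTS.dataAccuracy
  );
}

// Assumption: a score exactly on a threshold counts as the higher level.
function classify(score: number): "OK" | "WARN" | "NG" {
  if (score >= OK_THRESHOLD) return "OK";
  if (score >= WARN_THRESHOLD) return "WARN";
  return "NG";
}
```

For example, per-criterion scores of 3, 3, 2, and 3 give 0.90 + 0.75 + 0.40 + 0.75 = 2.80, which lands in WARN: weak animation alone pulls an otherwise passable block below the OK line.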

Remotion Integration — Remotion AI Automation Pipeline

Type-Safe Connection Between Remotion Type Definitions and Agent Output

Remotion can type-annotate component props with TypeScript. In montage, the VideoSpec type serves as those props.

// src/video/types/video.ts (excerpt)
export type VideoSpec = {
  meta: {
    title: string;
    theme: string;
    format: FormatPreset;
    fps: number;
    width: number;
    height: number;
    estimatedDuration: number;
  };
  chapters: Chapter[];
  transitions: TransitionType[];
};

Remotion’s <Composition> component accepts a calculateMetadata callback, which montage uses to compute the duration from the VideoSpec props before the component renders. The path from the composer-written composed.json, through the implement-processed final.json, to the props that Remotion receives is described by a single set of TypeScript types.
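The duration derivation might look like the following pure function, whose return value has the kind of fields a calculateMetadata callback can return (durationInFrames, fps, width, height). It assumes that estimatedDuration is expressed in seconds, which the article does not state, and toCompositionMetadata is a hypothetical name; the Remotion import and the <Composition> wiring are left out to keep the sketch dependency-free.

```typescript
// Subset of VideoSpec["meta"] that the calculation needs.
type VideoMeta = {
  fps: number;
  width: number;
  height: number;
  estimatedDuration: number; // ASSUMPTION: seconds
};

type CompositionMetadata = {
  durationInFrames: number;
  fps: number;
  width: number;
  height: number;
};

// Convert seconds to whole frames, rounding up so the last partial
// frame of narration is not cut off, and never returning zero frames.
function toCompositionMetadata(meta: VideoMeta): CompositionMetadata {
  const frames = Math.ceil(meta.estimatedDuration * meta.fps);
  return {
    durationInFrames: Math.max(1, frames),
    fps: meta.fps,
    width: meta.width,
    height: meta.height,
  };
}
```

Deriving the frame count from the spec keeps the JSON as the single source of truth: the composition length follows the narration rather than a number hard-coded next to the component.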

This is the substance of “type-safe connection between agent output and Remotion components.” The connection point is consolidated into a single type, VideoSpec, and JSON that doesn’t match the type is rejected by validation before reaching the renderer.

renderMedia Call Design

render is a skill-type pipeline step executed after the implement skill:

{
  "phase": 5,
  "type": "skill",
  "name": "render",
  "inputType": "ImplementerOutput",
  "outputType": "MP4",
  "input": "final.json",
  "output": "{topic-id}.mp4"
}

The render skill executes the npx remotion render CLI. It passes the path to final.json via --props, and Remotion renders video starting from src/Root.tsx, generating output/{topic-id}/{topic-id}.mp4. For long-form videos, it renders in parallel by chapter (render-by-chapter.ts) and then concatenates.

# Example Remotion CLI command assembled by render-by-chapter.ts
npx remotion render \
  src/Root.tsx \
  <topic-id> \
  --output=output/<topic-id>/chapters/<chapter-id>.mp4 \
  --props='{"chapterIdFilter":"<chapter-id>"}' \
  --frames=0-<durationInFrames-1>
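Assembling that command per chapter could be sketched as below. The flags and path layout are taken from the example command above; ChapterRenderJob and buildChapterRenderArgs are hypothetical names, and how render-by-chapter.ts actually builds and launches the command is not shown in this article.

```typescript
type ChapterRenderJob = {
  topicId: string;
  chapterId: string;
  durationInFrames: number;
};

// Build the argument list for `npx`, one entry per argument.
// Passing an array to a process spawner (instead of one shell string)
// avoids having to quote the JSON in --props.
function buildChapterRenderArgs(job: ChapterRenderJob): string[] {
  if (job.durationInFrames < 1) {
    throw new Error(`chapter ${job.chapterId} has no frames to render`);
  }
  const props = JSON.stringify({ chapterIdFilter: job.chapterId });
  return [
    "remotion",
    "render",
    "src/Root.tsx",
    job.topicId,
    `--output=output/${job.topicId}/chapters/${job.chapterId}.mp4`,
    `--props=${props}`,
    `--frames=0-${job.durationInFrames - 1}`, // frame range is inclusive
  ];
}

const chapterArgs = buildChapterRenderArgs({
  topicId: "topic-1",
  chapterId: "ch-01",
  durationInFrames: 900,
});
```

Because each chapter gets its own output file and frame range, the jobs are independent and can be rendered in parallel before concatenation.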

Post-Rendering Quality Check Flow

After the render skill generates the MP4, the reviewer agent inspects screenshots. The check flow:

render skill
  └─→ {topic-id}.mp4 (output)
        └─→ reviewer agent
              ├─→ Screenshot capture per scene
              ├─→ Judge against rubrics.json (quality evaluation rubric)
              └─→ ReviewResult (quality-score.json)
                    ├─→ OK: proceed to thumbnailer / copywriter / record
                    └─→ NG: fall back per retry rules (max 3 times)

reviewer’s postHook calls the record-review skill, appending quality-score.json to reviews.jsonl. This is designed to serve as input for a future learning-loop (quality improvement feedback).

Design Decision Log — Why This Structure

Why 4-Agent Division (Why Not a Single Agent)

Instructing a single agent to “do everything from research to video composition” is technically possible. But there are 3 problems.

Problem 1: Context pollution

The amount of information needed to produce a single video (earnings data, macro indicators, channel settings, block specs, narration, subtitles) is enormous. Cramming everything into one agent’s context increases the risk that the model skips important constraints (e.g., “this block’s items needs at least 1 element”).

Problem 2: Fault tracing is difficult

“Narration is inconsistent,” “bar chart numbers are wrong,” “layout is broken”—these 3 types of problems have different root causes. With division of labor, you can pinpoint “is it a scriptwriter-phase problem or a composer-phase problem?” A single agent makes it impossible to trace where output problems originate.

Problem 3: Doesn’t scale

When generating multiple videos in parallel, researcher can run ahead for multiple channels, after which scriptwriters launch in parallel. This is why the batch and batch-research entry points exist. A single agent cannot parallelize at the level of individual process steps.

Why Remotion Was Chosen and Comparison with Alternatives

There are 2 main reasons for choosing Remotion.

Reason 1: Video can be described in TypeScript

Describing video as React components allows the JSON generated by agents (VideoSpec) to be passed as props to components in a type-safe manner. With an ffmpeg + template string approach, there is no type guarantee for the connection between agent output and video.

Reason 2: Component reuse and extension

60+ ContentBlock types are individually implemented as Remotion components, making them reusable. src/video/blocks/ is divided into 5 categories—charts/, data/, structured/, text/, and special/—with charts alone including 25 implementations such as BarChartBlock, LineChartBlock, AreaChartBlock, StackedBarBlock, and WaterfallBlock. Adding a new block type follows a consistent procedure: add a Zod schema to block-data.ts, add a component to src/video/blocks/, and register it in BLOCK_DATA_SCHEMAS.

In short, the deciding factor is that video structure and TypeScript types line up. Calling ffmpeg directly means specifying layout as parameter strings, which leaves no type guarantee between agent output and video; passing React component props lets VideoSpec carry that guarantee as a TypeScript type.

Block Schema Design Decisions

The most challenging question in Block Schema design was “how strictly to enforce types.”

Using data: Record<string, unknown> for everything is flexible, but you lose the ability to determine whether the LLM generated JSON in the correct shape.

Conversely, marking all fields as required risks the LLM generating unnatural values for fields that should be optional (e.g., annotation: "" as an empty string) in an attempt to fill them all in.

Yakumo’s chosen design is “strict for required fields, optional + enum-restricted for optional fields.” This increases the probability that JSON generated by the LLM passes validation, while preventing over-filling (filling fields that aren’t needed).

Additionally, distinguishing nullable() from optional() lets the Remotion component tell “no field → apply default value” apart from “null → explicitly disabled.” BarChartDataSchema applies this as color: z.string().nullable().optional().

Summary — montage Pipeline Implementation Checklist

montage’s design was organized from 3 perspectives.

Pipeline DAG: The 4-agent division researcher → scriptwriter → composer → reviewer is declared in JSON, and reviewer fallbacks are routed by cause classification. A single agent was rejected for 3 reasons: context pollution, difficult fault tracing, and inability to parallelize.

Block Schema: The hierarchy Video → Chapter → Scene → Layer → Layout → Slot → ContentBlock serves as the connection point between agents and Remotion components. The ContentBlock type definition (ContentBlockType + Zod schema) guarantees type safety for this connection.

Zod validation: validateBlockData validates LLM output immediately before Remotion rendering. Validation failures return control to the upstream agent based on cause via fallback rules.

montage Pipeline Implementation Checklist

Declare pipeline DAG in config/pipelines/*.json
Define input/output types and filenames for each stage in PIPELINE_STAGE_TYPES
Add new block types to ContentBlockType
Add Remotion components to src/video/blocks/
Add Zod schemas to block-data.ts and register in BLOCK_DATA_SCHEMAS
Define cause classification and fallback targets in reviewer retry rules
Update specs/system/block-schemas.md to keep the composer agent’s spec doc current
Define the reviewer’s quality evaluation criteria in specs/quality/rubrics.json
Verify in tests that the implement skill correctly returns ValidationResult

For the business decision, ROI, and operational cost perspective on bulk video production, see “Management Decisions and ROI for AI-Driven Bulk Video Production.”