Deep Knowledge Extraction — Post-Production Intelligence Mining

Purpose

Extract MAXIMUM structured intelligence from ALL project source files — not just knowledge.json, but the raw production files that contain data the knowledge tagging step may have missed or compressed. This skill governs the TypeScript extraction module (knowledge-extractor.ts) that feeds the GatorSquare platform database and knowledge graph.

This is the quality backbone of the knowledge graph. It stays private — never exposed to users.

The production pipeline's Step 4.7 (knowledge-tagging.md) tells Claude what to extract INTO knowledge.json. This skill tells the platform how to INGEST from ALL sources and cross-reference them to catch everything.

When This Runs

During database seeding (seed script)
During project import/re-import
After any project update that touches source files
Retroactively on completed projects to backfill missing intelligence

The 8 Source Files

Every complete GatorSquare project has up to 8 source files. The extractor reads ALL of them:

#	File	Type	What It Contains	What It Adds to the Graph
1	`investigation.json`	JSON	Assembled panel data — narration, images, acts, metadata	Title, subtitle, panel count, art style, narration text
2	`metadata.json`	JSON	Project metadata — tags, scope, period, figures	Tags, systems_mapped, geographic_scope, key_figures
3	`knowledge.json`	JSON	Pre-extracted entities, causality, emotions (Step 4.7 output)	Entities (8 categories), causal links/chains/trees, cross-project links, themes, emotion arc, visual elements, panel logic
4	`prompts.json`	JSON	Image generation prompts — characters, environments per panel	Character names → entities.people, Environment names → entities.places
5	`script.md`	Markdown	Raw narration text per panel — dialogue, arguments, stage directions	Per-panel narration text, dialogue extraction
6	`scene-plan.md`	Markdown	Visual composition per panel — composition, mood, camera	Mood descriptions, composition details, camera angles, additional visual elements
7	`brief.md`	Markdown	Deep research — facts, dates, sources, historical context	Research summary (investigation hook text)
8	`plot.md`	Markdown	Narrative structure — act breakdowns, moment descriptions	Act structure with panel ranges and descriptions
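Because a project has "up to" 8 files, the loader has to tolerate absent ones. A minimal sketch of that step follows, assuming a caller-supplied reader that returns `null` for a missing file. The names `SOURCE_FILES`, `SourceReader`, and `loadSources` are hypothetical and are not taken from knowledge-extractor.ts.

```typescript
// Hypothetical names throughout; only the eight file names come from the table above.
type SourceKind = "json" | "markdown";

const SOURCE_FILES: ReadonlyArray<{ name: string; kind: SourceKind }> = [
  { name: "investigation.json", kind: "json" },
  { name: "metadata.json", kind: "json" },
  { name: "knowledge.json", kind: "json" },
  { name: "prompts.json", kind: "json" },
  { name: "script.md", kind: "markdown" },
  { name: "scene-plan.md", kind: "markdown" },
  { name: "brief.md", kind: "markdown" },
  { name: "plot.md", kind: "markdown" },
];

// Returns the file contents, or null when the file does not exist.
type SourceReader = (fileName: string) => string | null;

interface LoadedSources {
  json: Record<string, unknown>;
  markdown: Record<string, string>;
  missing: string[];
}

function loadSources(read: SourceReader): LoadedSources {
  const out: LoadedSources = { json: {}, markdown: {}, missing: [] };
  for (const { name, kind } of SOURCE_FILES) {
    const raw = read(name);
    if (raw === null) {
      out.missing.push(name);
      continue;
    }
    if (kind === "markdown") {
      out.markdown[name] = raw;
      continue;
    }
    try {
      out.json[name] = JSON.parse(raw);
    } catch {
      // Assumption: unparseable JSON is reported as missing rather than aborting the run.
      out.missing.push(name);
    }
  }
  return out;
}
```

Injecting the reader keeps the sketch independent of any particular filesystem API, so the same function could serve the seed script and the import path.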

Critical Gap: prompts.json

The prompts.json file contains characters[] and environments[] arrays per panel that are NOT captured by knowledge.json. These are structured entity data — named characters and specific locations — that the knowledge tagging step doesn't extract because it runs before prompt generation.

Example from prompts.json:

{
  "panel_id": "panel-05",
  "characters": ["John Maynard Keynes"],
  "environments": ["hotel desk", "private office"],
  "prompt_text": "Close-medium portrait of an older British economist..."
}

The extractor merges:

characters[] → entities.people (only proper-noun names, not generic "delegates")
environments[] → entities.places (filtered, no abstracts like "blueprint")
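The merge above can be sketched as follows. The text does not specify the filter rules, so both heuristics here are assumptions: a name counts as a proper noun when every word starts with an uppercase letter, and abstract environments are removed with a denylist whose only entry taken from the text is "blueprint". The identifiers `PromptPanel`, `isProperNoun`, and `collectPromptEntities` are hypothetical.

```typescript
interface PromptPanel {
  panel_id: string;
  characters?: string[];
  environments?: string[];
}

// Assumption: a proper-noun name has every word capitalized ("John Maynard Keynes").
// This rejects generic labels such as "delegates", but would also reject names
// with lowercase particles, so a real filter likely needs more care.
function isProperNoun(name: string): boolean {
  const words = name.trim().split(/\s+/);
  return words.length > 0 && words.every((word) => /^[A-Z]/.test(word));
}

// Assumption: a denylist; only "blueprint" is taken from the document.
const ABSTRACT_ENVIRONMENTS = new Set<string>(["blueprint"]);

function collectPromptEntities(panels: PromptPanel[]): { people: string[]; places: string[] } {
  const people = new Set<string>();
  const places = new Set<string>();
  for (const panel of panels) {
    for (const character of panel.characters ?? []) {
      if (isProperNoun(character)) people.add(character.trim());
    }
    for (const environment of panel.environments ?? []) {
      const place = environment.trim();
      if (place.length > 0 && !ABSTRACT_ENVIRONMENTS.has(place.toLowerCase())) {
        places.add(place);
      }
    }
  }
  // Sets give per-file dedup; merging with knowledge.json entities happens later.
  return { people: Array.from(people), places: Array.from(places) };
}
```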

Structural Variants in knowledge.json

Two variants exist across projects:

Standard variant (8 of 9 projects)

knowledge.project_knowledge.entities → merged into entities
knowledge.project_knowledge.causal_chains → stored as causal_chains
knowledge.project_knowledge.themes → stored as themes
knowledge.project_knowledge.cross_project_links → stored as cross_project_links

Green-revolution variant (1 project)

knowledge.project_level.entities → merged into entities
knowledge.project_level.causal_chains → stored as causal_chains
knowledge.project_level.causal_trees → stored as causal_trees (branching cascades)
knowledge.project_level.cross_project_links → stored as cross_project_links
knowledge.project_level.emotion_arc → stored as emotion_arc
knowledge.project_level.dominant_emotions → merged into emotions

The extractor handles both variants and deduplicates across them.
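One way to handle both roots is to read `project_knowledge` and `project_level` through the same code path and fall back to empty collections for fields a variant lacks. The sketch below covers only that normalization step and leaves deduplication to a later pass. The type and function names are hypothetical, and field contents are typed as `unknown` because their shapes are not given here.

```typescript
// Hypothetical types; only the key names come from the two variants listed above.
interface KnowledgeBlock {
  entities?: Record<string, unknown[]>;
  causal_chains?: unknown[];
  causal_trees?: unknown[];
  themes?: unknown[];
  cross_project_links?: unknown[];
  emotion_arc?: unknown[];
  dominant_emotions?: string[];
}

interface KnowledgeFile {
  project_knowledge?: KnowledgeBlock; // standard variant
  project_level?: KnowledgeBlock; // green-revolution variant
}

interface NormalizedKnowledge {
  entities: Record<string, unknown[]>;
  causal_chains: unknown[];
  causal_trees: unknown[];
  themes: unknown[];
  cross_project_links: unknown[];
  emotion_arc: unknown[];
  emotions: string[];
}

function normalizeKnowledge(file: KnowledgeFile): NormalizedKnowledge {
  const out: NormalizedKnowledge = {
    entities: {}, causal_chains: [], causal_trees: [], themes: [],
    cross_project_links: [], emotion_arc: [], emotions: [],
  };
  for (const block of [file.project_knowledge, file.project_level]) {
    if (!block) continue; // this variant root is absent
    const entities = block.entities ?? {};
    for (const category of Object.keys(entities)) {
      out.entities[category] = (out.entities[category] ?? []).concat(entities[category] ?? []);
    }
    out.causal_chains = out.causal_chains.concat(block.causal_chains ?? []);
    out.causal_trees = out.causal_trees.concat(block.causal_trees ?? []);
    out.themes = out.themes.concat(block.themes ?? []);
    out.cross_project_links = out.cross_project_links.concat(block.cross_project_links ?? []);
    out.emotion_arc = out.emotion_arc.concat(block.emotion_arc ?? []);
    out.emotions = out.emotions.concat(block.dominant_emotions ?? []);
  }
  return out;
}
```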

Scene-Plan Format Variants

Four scene-plan.md formats exist across projects:

Format	Example	Mood Field
Standard	`**Mood:** Weight. Scale.`	`**Mood:**` (colon inside bold)
Opium variant	`**Mood**: Solemn. Decisive.`	`**Mood**:` (colon outside bold)
Banana-republic variant	`Palette: Deep amber dawn` + `Visual Metaphor: Democracy under bombardment`	Uses Palette and Visual Metaphor instead of Mood
Green-revolution variant	No Mood field — uses `Key elements:`	No mood extraction possible

The extractor handles all four. When no Mood field exists, the Visual Metaphor is used as the mood fallback. When neither field exists (the green-revolution variant), no mood is extracted for that panel.
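A tolerant field matcher is one way to cover these formats. The sketch below accepts the colon inside or outside the bold markers (or no bold at all), and falls back from Mood to Visual Metaphor. It assumes each field starts on its own line within a panel's section. The helper names `fieldValue` and `extractMood` are hypothetical.

```typescript
// Matches "**Label:** value", "**Label**: value", and "Label: value".
// Assumption: the field starts at the beginning of a line.
function fieldValue(section: string, label: string): string | null {
  const pattern = new RegExp(
    "^[ \\t]*\\*{0,2}" + label + "(?::\\*{0,2}|\\*{0,2}:)[ \\t]*(.+)$",
    "im",
  );
  const match = pattern.exec(section);
  return match?.[1]?.trim() ?? null;
}

// Mood first, then Visual Metaphor as the fallback; null when neither exists.
function extractMood(section: string): string | null {
  return fieldValue(section, "Mood") ?? fieldValue(section, "Visual Metaphor");
}

// Usage on one sample line per variant:
console.log(extractMood("**Mood:** Weight. Scale."));
console.log(extractMood("**Mood**: Solemn. Decisive."));
console.log(extractMood("**Palette:** Deep amber dawn\n**Visual Metaphor:** Democracy under bombardment"));
console.log(extractMood("Key elements: wheat field, laboratory"));
```

Requiring a colon next to the label keeps words such as "Moody" from matching, and restricting the whitespace after the colon to spaces and tabs keeps an empty field from capturing the next line.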

Deduplication Rules

Entities: Deduplicated by name within each category. If the same name appears in both knowledge.json and prompts.json, the richer record (with role/context) wins.
Causal links: Deduplicated by from|relation|to composite key.
Causal chains: Deduplicated by chain.join("|") key.
Cross-project links: Deduplicated by project + type composite key.
Visual elements: Stored in a Set — automatic dedup.
Emotions: Stored in a Set — automatic dedup.
Tags: Merged from metadata.json and investigation.json with Set dedup.
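The keyed rules above can share one helper. The following is a sketch under stated assumptions: "richer" is interpreted as having more of the `role` and `context` fields filled in, names are compared exactly, and the interfaces and function names are hypothetical.

```typescript
interface Entity { name: string; role?: string; context?: string }
interface CausalLink { from: string; relation: string; to: string }
interface CrossProjectLink { project: string; type: string }

// Keeps one item per key; `prefer` decides which of two colliding items survives.
function dedupeBy<T>(items: T[], keyOf: (item: T) => string, prefer?: (a: T, b: T) => T): T[] {
  const seen = new Map<string, T>();
  for (const item of items) {
    const key = keyOf(item);
    if (!seen.has(key)) {
      seen.set(key, item);
    } else if (prefer) {
      seen.set(key, prefer(seen.get(key) as T, item));
    }
  }
  return Array.from(seen.values());
}

// Assumption: richness is the number of optional descriptive fields present.
const richness = (e: Entity): number => (e.role ? 1 : 0) + (e.context ? 1 : 0);
const richer = (a: Entity, b: Entity): Entity => (richness(b) > richness(a) ? b : a);

// Call once per entity category, since names are deduplicated within a category.
const dedupeEntities = (entities: Entity[]): Entity[] =>
  dedupeBy(entities, (e) => e.name, richer);

const dedupeLinks = (links: CausalLink[]): CausalLink[] =>
  dedupeBy(links, (l) => [l.from, l.relation, l.to].join("|"));

const dedupeChains = (chains: string[][]): string[][] =>
  dedupeBy(chains, (chain) => chain.join("|"));

const dedupeCrossLinks = (links: CrossProjectLink[]): CrossProjectLink[] =>
  dedupeBy(links, (l) => l.project + "|" + l.type);
```

A `Map` preserves first-seen order, so the output order stays stable across re-imports even when a richer record replaces an earlier one.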

Tag Categories — The Second Layer

Tag categories are derived from entity names, grouped by the 8 entity categories. Six of them are shown in this example:

people: ["John Maynard Keynes", "Harry Dexter White"]
institutions: ["IMF", "World Bank", "WTO"]
concepts: ["reserve currency", "conditionality", "exorbitant privilege"]
mechanisms: ["structural adjustment", "dollar-gold peg"]
systems: ["Bretton Woods System"]
commodities: ["gold", "oil"]

These are displayed as color-coded pills on the project page, giving a quick visual index of what the investigation covers.
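Deriving this layer amounts to projecting each entity category down to its unique names. A minimal sketch follows, assuming entities are stored as objects with a `name` field keyed by category. The names `EntitiesByCategory` and `buildTagCategories` are hypothetical.

```typescript
type EntitiesByCategory = Record<string, Array<{ name: string }>>;

function buildTagCategories(entities: EntitiesByCategory): Record<string, string[]> {
  const tags: Record<string, string[]> = {};
  for (const category of Object.keys(entities)) {
    // A Set drops repeated names while keeping first-seen order.
    const names = Array.from(new Set((entities[category] ?? []).map((e) => e.name)));
    if (names.length > 0) tags[category] = names; // skip empty categories
  }
  return tags;
}

// Usage with part of the example above:
const tagExample = buildTagCategories({
  people: [{ name: "John Maynard Keynes" }, { name: "Harry Dexter White" }],
  commodities: [{ name: "gold" }, { name: "oil" }, { name: "gold" }],
  systems: [],
});
console.log(tagExample);
```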

Quality Metrics

After extraction, the stats should show:

Metric	Minimum Expected	Red Flag If
Entities	100+ for a 25-panel project	< 50
Causal links	40+	< 20
Causal chains	3+	0
Cross-project links	3+	0
Themes	3+	0
Emotions	5+ unique	< 3
Emotion arc	Same as panel count	Mismatched
Visual elements	50+	< 20
Panel logic	Same as panel count	< 50% of panels
Narrations	Same as panel count	0 (if script.md exists)
Scene directions	Same as panel count	0 (if scene-plan.md exists)
Moods	10+ unique	0 (if scene-plan has Mood field)
Research summary	200+ chars	0 (if brief.md exists)
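The table can be turned into an automated check. The sketch below covers the count-based rows only and leaves out the rows that depend on whether a source file exists. It applies the entity thresholds as fixed numbers, although the table states them for a 25-panel project. The names `ExtractionStats`, `grade`, and `checkQuality` are hypothetical.

```typescript
interface ExtractionStats {
  panelCount: number;
  entities: number;
  causalLinks: number;
  causalChains: number;
  crossProjectLinks: number;
  themes: number;
  uniqueEmotions: number;
  emotionArc: number;
  visualElements: number;
  panelLogic: number;
}

type Status = "ok" | "below_minimum" | "red_flag";

// A red flag always wins; otherwise compare against the expected minimum.
function grade(value: number, minimum: number, isRedFlag: boolean): Status {
  if (isRedFlag) return "red_flag";
  return value >= minimum ? "ok" : "below_minimum";
}

function checkQuality(s: ExtractionStats): Record<string, Status> {
  return {
    entities: grade(s.entities, 100, s.entities < 50),
    causalLinks: grade(s.causalLinks, 40, s.causalLinks < 20),
    causalChains: grade(s.causalChains, 3, s.causalChains === 0),
    crossProjectLinks: grade(s.crossProjectLinks, 3, s.crossProjectLinks === 0),
    themes: grade(s.themes, 3, s.themes === 0),
    uniqueEmotions: grade(s.uniqueEmotions, 5, s.uniqueEmotions < 3),
    // "Mismatched" is read here as any difference from the panel count.
    emotionArc: grade(s.emotionArc, s.panelCount, s.emotionArc !== s.panelCount),
    visualElements: grade(s.visualElements, 50, s.visualElements < 20),
    panelLogic: grade(s.panelLogic, s.panelCount, s.panelLogic < 0.5 * s.panelCount),
  };
}
```

The three-level status separates "worth a look" from "extraction likely failed", which mirrors the two threshold columns in the table.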

Architecture Note

The extraction module is TypeScript, not a Claude production skill. It runs at build/seed time, not during content production. The relationship is:

[Production Pipeline]
  Step 4.7: knowledge-tagging.md skill
    → Claude extracts → knowledge.json

[Platform Ingestion]
  deep-knowledge-extraction (this skill)
    → TypeScript module reads ALL 8 files
    → Cross-references, deduplicates, normalizes
    → Writes to SQLite database
    → Feeds knowledge graph + project pages

The production skill generates one file. The extraction module reads everything. Both are non-negotiable. Both stay private.

