Open-source tool that turns any topic into a structured knowledge graph stored in Apache AGE (PostgreSQL). The agent designs the ontology, discovers sources, crawls, chunks, loads, and parses — fully automated.
```
src/build_kg/     Main Python package
  config.py       Configuration from .env
  crawl.py        Web crawler (Crawl4AI)
  chunk.py        Document chunker (Unstructured)
  load.py         Database loader (PostgreSQL)
  parse.py        Sync LLM parser
  parse_batch.py  Batch LLM parser (50% cheaper)
  setup_graph.py  Apache AGE graph initialization
  verify.py       Setup verification
  domain.py       Domain profile system
  domains/        YAML domain profiles
db/               Docker Compose for PostgreSQL + AGE
docs/             Static HTML documentation
tests/            Test suite (pytest)
kg_builds/        Working directory for graph builds (gitignored)
```
```sh
make setup            # Creates venv, installs deps, starts DB, inits graph
cp .env.example .env  # Then set ANTHROPIC_API_KEY (or OPENAI_API_KEY)
make verify           # Confirm everything works
```

Always activate the venv before Python commands: `. venv/bin/activate && <command>`
Required: `DB_PASSWORD`, `ANTHROPIC_API_KEY` (or `OPENAI_API_KEY`)
| Variable | Default | Purpose |
|---|---|---|
| `LLM_PROVIDER` | `anthropic` | `anthropic` or `openai` |
| `ANTHROPIC_MODEL` | `claude-haiku-4-5-20251001` | Model for parsing |
| `OPENAI_MODEL` | `gpt-4o-mini` | Model for parsing (if `openai`) |
| `AGE_GRAPH_NAME` | `knowledge_graph` | Graph name in PostgreSQL |
| `DOMAIN` | `default` | Domain profile |
| `DB_HOST` | `localhost` | PostgreSQL host |
| `DB_PORT` | `5432` | PostgreSQL port |
| `DB_NAME` | `buildkg` | Database name |
| `DB_USER` | `buildkg` | Database user |
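A minimal `.env` sketch using the defaults above; the API key value is a placeholder, and `DB_PASSWORD` must be set to a real value:

```sh
DB_HOST=localhost
DB_PORT=5432
DB_NAME=buildkg
DB_USER=buildkg
# required
DB_PASSWORD=change-me
# or: openai (then set OPENAI_API_KEY instead)
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-placeholder
ANTHROPIC_MODEL=claude-haiku-4-5-20251001
AGE_GRAPH_NAME=knowledge_graph
DOMAIN=default
```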
```sh
make test                     # pytest tests/ -v
make lint                     # ruff check src/ tests/
ruff check --fix src/ tests/  # auto-fix lint issues
```

When the user asks to build a knowledge graph about a topic, follow this pipeline:
1. Choose a snake_case graph name for the topic, e.g. `kubernetes_networking`.
2. Create the working directory: `mkdir -p kg_builds/$GRAPH_NAME`
3. Set `AGE_GRAPH_NAME=$GRAPH_NAME` in `.env`.
4. Initialize the graph: `python -m build_kg.setup_graph`
5. Design 3-6 node types (PascalCase) and 3-8 edge types (UPPER_SNAKE_CASE) for the topic. Save as `kg_builds/$GRAPH_NAME/ontology.yaml`:
```yaml
description: "Ontology description"
nodes:
  - label: "NodeType"
    description: "What this represents"
    properties:
      name: "string"
      category: "string"
edges:
  - label: "RELATIONSHIP"
    source: "SourceNode"
    target: "TargetNode"
    description: "What this means"
root_node: "PrimaryNodeType"
json_schema: |
  {
    "entities": [{"_label": "NodeType", "name": "...", "category": "..."}],
    "relationships": [{"_label": "RELATIONSHIP", "_from_index": 0, "_to_index": 1}]
  }
```
Then: `python -m build_kg.setup_graph --ontology kg_builds/$GRAPH_NAME/ontology.yaml`
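Before running `setup_graph`, it can be worth sanity-checking the ontology against the naming rules above. This is a hypothetical helper, not part of the package; it assumes the YAML has already been parsed into a dict (e.g. with `yaml.safe_load`), and it checks only the conventions stated above: PascalCase node labels, UPPER_SNAKE_CASE edge labels, edge endpoints that name declared nodes, 3-6 node types, and a declared `root_node`.

```python
import re

def validate_ontology(ont: dict) -> list[str]:
    """Return a list of problems found in a parsed ontology.yaml dict."""
    problems = []
    node_labels = {n["label"] for n in ont.get("nodes", [])}
    for label in sorted(node_labels):
        if not re.fullmatch(r"[A-Z][A-Za-z0-9]*", label):
            problems.append(f"node label not PascalCase: {label}")
    for edge in ont.get("edges", []):
        if not re.fullmatch(r"[A-Z][A-Z0-9_]*", edge["label"]):
            problems.append(f"edge label not UPPER_SNAKE_CASE: {edge['label']}")
        for end in ("source", "target"):
            if edge[end] not in node_labels:
                problems.append(f"{edge['label']}.{end} references undeclared node: {edge[end]}")
    if not 3 <= len(node_labels) <= 6:
        problems.append(f"expected 3-6 node types, got {len(node_labels)}")
    if ont.get("root_node") not in node_labels:
        problems.append("root_node is not a declared node type")
    return problems
```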
Search the web for 5-15 authoritative sources. Create kg_builds/$GRAPH_NAME/manifest.json with source metadata (url, title, authority, priority tier P1/P2, crawl depth/pages).
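The manifest fields above (url, title, authority, P1/P2 tier, crawl depth/pages) can be sketched as a small builder. This shape is illustrative only — the exact schema `build-kg-load` expects is not spelled out here, so treat the nesting and defaults as assumptions:

```python
import json

def make_manifest(graph_name: str, sources: list[dict]) -> str:
    """Serialize source metadata in the shape described above (illustrative)."""
    manifest = {
        "graph": graph_name,
        "sources": [
            {
                "url": s["url"],
                "title": s["title"],
                "authority": s.get("authority", "unknown"),
                "tier": s.get("tier", "P2"),  # P1 = primary, P2 = secondary
                "crawl": {"depth": s.get("depth", 1), "pages": s.get("pages", 50)},
            }
            for s in sources
        ],
    }
    return json.dumps(manifest, indent=2)
```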
For each source:

```sh
build-kg-crawl --url "$URL" --output kg_builds/$GRAPH_NAME/crawled/$SOURCE_NAME \
  --depth $DEPTH --pages $MAX_PAGES --delay $DELAY --format markdown
```

Then chunk and load everything:

```sh
build-kg-chunk kg_builds/$GRAPH_NAME/crawled kg_builds/$GRAPH_NAME/chunks --strategy by_title --max-chars 1000
build-kg-load kg_builds/$GRAPH_NAME/chunks --manifest kg_builds/$GRAPH_NAME/manifest.json
```
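The `by_title` strategy with a character cap can be pictured with a toy sketch — this is not Unstructured's actual algorithm (the real tool delegates to that library), just an illustration of splitting at headings and then capping each section at `max-chars`:

```python
def chunk_by_title(markdown: str, max_chars: int = 1000) -> list[str]:
    """Toy by_title chunker: split at markdown headings, cap section size."""
    sections, current = [], []
    for line in markdown.splitlines():
        # A new heading closes the current section (if any).
        if line.startswith("#") and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    # Oversized sections are split into max_chars-sized pieces.
    chunks = []
    for sec in sections:
        for i in range(0, len(sec), max_chars):
            chunks.append(sec[i : i + max_chars])
    return chunks
```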
Sync parsing:

```sh
build-kg-parse --ontology kg_builds/$GRAPH_NAME/ontology.yaml
```

Or batch parsing (50% cheaper):

```sh
build-kg-parse-batch prepare --ontology kg_builds/$GRAPH_NAME/ontology.yaml --output kg_builds/$GRAPH_NAME/batch_requests.jsonl
build-kg-parse-batch submit kg_builds/$GRAPH_NAME/batch_requests.jsonl
build-kg-parse-batch status $BATCH_ID --watch
build-kg-parse-batch process $BATCH_ID --ontology kg_builds/$GRAPH_NAME/ontology.yaml
```
Query the graph with Cypher and present node/edge counts by type, example subgraphs, and cost estimate.
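Counts by type can be pulled with Apache AGE's `cypher()` SQL function. The helper below only builds the SQL string (so it is testable without a database); it assumes AGE's `label()`/`type()` functions, and the returned SQL is meant to be run through any PostgreSQL client after `LOAD 'age';` and setting `search_path` to include `ag_catalog`:

```python
def count_query(graph: str, kind: str = "nodes") -> str:
    """Build AGE SQL counting nodes per label or edges per type (sketch)."""
    if kind == "nodes":
        inner = "MATCH (n) RETURN label(n) AS l, count(*) AS c"
    else:
        inner = "MATCH ()-[r]->() RETURN type(r) AS l, count(*) AS c"
    return f"SELECT * FROM cypher('{graph}', $$ {inner} $$) AS (l agtype, c agtype);"
```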
All commands require the venv to be activated.
| Command | Purpose |
|---|---|
| `build-kg-crawl` | Crawl a URL (Crawl4AI) |
| `build-kg-chunk` | Chunk documents (Unstructured) |
| `build-kg-load` | Load chunks to PostgreSQL |
| `build-kg-parse` | Parse fragments with LLM (sync) |
| `build-kg-parse-batch` | Parse fragments with Batch API |
| `build-kg-setup` | Initialize AGE graph schema |
| `build-kg-verify` | Verify system readiness |
| `build-kg-domains` | List available domain profiles |