<h1 align="center">
<a href="https://prompts.chat">
AetherMind architecture from the research-agent plan — stack, build phases, router, guardrails, API/UI
<a href="https://prompts.chat">
Source of truth: .cursor/plans/aethermind_research_agent_plan_2dc943b3.plan.md — read it before starting or extending a phase.
backend/ (Python 3.12, uv, FastAPI, SQLAlchemy + Alembic, pytest), frontend/ (Next.js 15 App Router, Tailwind, shadcn/ui).

docker-compose: API, frontend, Chroma, optional self-hosted Langfuse.

llm_gateway + vram_router + embeddings_module → 3. schemas + db_layer → 4. tool_stubs → 5. langgraph_core + parallel_research + critic_loop → 6. guardrails + memory_service → 7. fastapi_endpoints → 8. frontend_* → 9. eval_harness → 10. observability + tests (stretch last).

… memory_writer.

backend/app/agent/graph.py, state in state.py, prompts in agent/prompts/. Checkpointer: SqliteSaver (resume, time-travel, HITL).

… asyncio.gather.

backend/app/llm/client.py; task-tagged routing in backend/app/llm/router.py, e.g. planner, synthesize, critic_inner, critic_final, pref_extract, source_summary, entailment, tool_format, eval_judge. Do not scatter provider/model calls outside client + router + env.

LOCALVRAM_MAX_GB=8: only small local models (Ollama 3B–7B Q4, bge-small / MiniLM / nomic-embed-text, bge-reranker-base, small cross-encoder/NLI). Heavier workloads → small API (e.g. gpt-4o-mini, Haiku) or skip. Use FORCE_API_FOR_HEAVY in CI / no-GPU dev.

backend/app/embeddings/: high volume; optional hosted override; never load >8GB local.

BaseTool + JSON schema for function calling; return ToolResult { content, source } with registry-backed source IDs.

web_search (Tavily, Brave fallback), arxiv_search, pdf_loader (pymupdf only; optional MinHash dedup before embed), fetch_url (httpx + readability), code_exec (E2B; local subprocess opt-in only).

memory_preferences, memory_reports (persistent); scratch_sources (per-job dedup). Planner calls memory.recall; memory_writer persists structured + semantic updates.
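
The task-tagged routing and the LOCALVRAM_MAX_GB cap described above could be sketched roughly as follows. This is a minimal illustration and not the contents of backend/app/llm/router.py: the `TaskRoute`, `LocalModel`, and `ModelChoice` types, the `heavy` flag, the model names, and the reading of FORCE_API_FOR_HEAVY as "send heavy-tagged tasks to the API even when a local model would fit" are all assumptions made for the example. Only the task tags and the two environment variable names come from the text.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Task tags listed in the plan summary.
TASKS = (
    "planner", "synthesize", "critic_inner", "critic_final", "pref_extract",
    "source_summary", "entailment", "tool_format", "eval_judge",
)


@dataclass(frozen=True)
class LocalModel:
    name: str        # hypothetical local model identifier
    vram_gb: float   # estimated VRAM footprint when loaded


@dataclass(frozen=True)
class TaskRoute:
    local: Optional[LocalModel]  # local candidate, if any
    api_model: Optional[str]     # small hosted fallback, if any
    heavy: bool = False          # assumption: tasks are tagged heavy by hand


@dataclass(frozen=True)
class ModelChoice:
    provider: str  # "local" or "api"
    model: str


def route(task: str,
          routes: Mapping[str, TaskRoute],
          env: Optional[Mapping[str, str]] = None) -> Optional[ModelChoice]:
    """Pick a model for a task tag; return None to signal 'skip'."""
    if task not in TASKS:
        raise ValueError(f"unknown task tag: {task}")
    env = os.environ if env is None else env
    max_gb = float(env.get("LOCALVRAM_MAX_GB", "8"))
    force_api = env.get("FORCE_API_FOR_HEAVY", "").lower() in {"1", "true", "yes"}

    entry = routes[task]
    fits_locally = entry.local is not None and entry.local.vram_gb <= max_gb
    if fits_locally and not (entry.heavy and force_api):
        return ModelChoice("local", entry.local.name)
    if entry.api_model is not None:
        return ModelChoice("api", entry.api_model)
    return None  # nothing fits and no hosted fallback: skip the step


# Hypothetical routing table; names and sizes are placeholders.
EXAMPLE_ROUTES = {
    "tool_format": TaskRoute(LocalModel("local-small", 4.0), "api-small"),
    "synthesize": TaskRoute(LocalModel("local-big", 14.0), "api-small", heavy=True),
    "entailment": TaskRoute(LocalModel("local-nli", 1.0), None, heavy=True),
}
```

Keeping the decision in one pure function that takes the environment as an argument makes the rule "no provider/model calls outside client + router + env" easy to test without a GPU.
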
Pref extraction / summary-for-embed goes through the router.

backend/app/eval/: LLM-as-judge + Ragas-style metrics; default cheap judge via router; trace to Langfuse when enabled.

POST /research, GET /research/{id}/stream (SSE), GET /reports/{id}, GET /reports/{id}/versions, POST /feedback, GET/POST /memory/preferences.

… app/page.tsx), report viewer (app/reports/[id]/page.tsx: trace, Markdown + citations, version diff, feedback), memory (app/memory/page.tsx). Use react-markdown + remark-gfm, diff-match-patch where needed.

.env.example (per-task MODEL_*, EMBEDDINGS_*, OLLAMA_*, API keys).
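
For the GET /research/{id}/stream (SSE) endpoint listed above, the wire format is the part most easily gotten wrong, so here is a small standard-library sketch of it. The event names (`status`, `done`), the payload fields, and the `sse_frame` / `stream_job` helpers are hypothetical and chosen only for illustration; the plan summary does not specify them. What is fixed by the SSE format itself is that each event is a group of `field: value` lines terminated by a blank line.

```python
import asyncio
import json
from typing import AsyncIterator, Iterable, Optional, Tuple


def sse_frame(event: str, data: dict, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event as text.

    json.dumps emits no raw newlines, so the payload fits on a single
    'data:' line; the trailing blank line terminates the event.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data, separators=(',', ':'))}")
    return "\n".join(lines) + "\n\n"


async def stream_job(steps: Iterable[Tuple[str, dict]]) -> AsyncIterator[str]:
    """Yield one SSE frame per (event, payload) step of a research job."""
    for index, (event, payload) in enumerate(steps):
        yield sse_frame(event, payload, event_id=str(index))
        await asyncio.sleep(0)  # hand control back to the event loop between frames


async def collect_frames(steps: Iterable[Tuple[str, dict]]) -> list:
    """Helper for local experiments: drain the stream into a list."""
    return [frame async for frame in stream_job(steps)]
```

In a FastAPI app, an async generator like `stream_job` would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`; that wiring is left out here so the sketch runs without third-party packages.
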