AGENTS.md

Guidance for AI agents working on this TypeScript codebase.

Published Feb 1, 2026

Project Overview

Promptfoo is an open-source framework for evaluating and testing LLM applications.

Project Structure

| Directory           | Purpose                         | Local Docs                   |
| ------------------- | ------------------------------- | ---------------------------- |
| `.agents/`          | Codex metadata and repo skills  | `.agents/AGENTS.md`          |
| `.github/`          | GitHub Actions and workflows    | `.github/AGENTS.md`          |
| `code-scan-action/` | Code scan GitHub Action wrapper | `code-scan-action/AGENTS.md` |
| `docs/agents/`      | Reusable coding-agent docs      | `docs/agents/AGENTS.md`      |
| `plugins/`          | Agent plugin bundles            | `plugins/AGENTS.md`          |
| `src/`              | Core library                    | -                            |
| `src/app/`          | Web UI (React 19/Vite/MUI v7)   | `src/app/AGENTS.md`          |
| `src/assertions/`   | Assertion handlers              | `src/assertions/AGENTS.md`   |
| `src/codeScan/`     | Code scan scanner               | `src/codeScan/AGENTS.md`     |
| `src/commands/`     | CLI commands                    | `src/commands/AGENTS.md`     |
| `src/contracts/`    | Public package contracts        | `src/contracts/AGENTS.md`    |
| `src/database/`     | SQLite/libSQL persistence       | `src/database/AGENTS.md`     |
| `src/matchers/`     | Assertion matcher helpers       | `src/matchers/AGENTS.md`     |
| `src/models/`       | Eval/result persistence models  | `src/models/AGENTS.md`       |
| `src/prompts/`      | Prompt loading & processors     | `src/prompts/AGENTS.md`      |
| `src/providers/`    | LLM providers                   | `src/providers/AGENTS.md`    |
| `src/redteam/`      | Security testing                | `src/redteam/AGENTS.md`      |
| `src/scheduler/`    | Concurrency & rate limits       | `src/scheduler/AGENTS.md`    |
| `src/server/`       | Backend server                  | `src/server/AGENTS.md`       |
| `src/tracing/`      | OpenTelemetry trace storage     | `src/tracing/AGENTS.md`      |
| `src/types/`        | Config/API types & Zod schemas  | `src/types/AGENTS.md`        |
| `src/util/`         | Shared utilities                | `src/util/AGENTS.md`         |
| `src/validators/`   | Config validation schemas       | `src/validators/AGENTS.md`   |
| `test/`             | Tests (Vitest)                  | `test/AGENTS.md`             |
| `site/`             | Docs site (Docusaurus)          | `site/AGENTS.md`             |
| `examples/`         | Example configs                 | `examples/AGENTS.md`         |
| `drizzle/`          | DB migrations                   | `drizzle/AGENTS.md`          |

Read the relevant AGENTS.md when working in that directory.

Build Commands

# Core commands
npm run build              # Build the project
npm run build:clean        # Clean the dist directory
npm run build:watch        # Watch and rebuild TypeScript files
npm test                   # Run all tests
npm run tsc                # Run TypeScript compiler

# Linting & Formatting
npm run lint               # Run Biome linter (alias for lint:src)
npm run lint:src           # Lint src directory
npm run lint:tests         # Lint test directory
npm run lint:site          # Lint site directory
npm run format             # Format all files (Biome + Prettier)
npm run format:check       # Check formatting without changes
npm run l                  # Lint only changed files
npm run f                  # Format only changed files

# Testing
npm run test:watch         # Run tests in watch mode
npm run test:integration   # Run integration tests
npm run test:redteam:integration  # Run red team integration tests
npm run test:app -- src/pages/path/to/test.test.tsx --run  # Run a specific frontend test file from repo root
npx vitest path/to/test    # Run a specific backend test file

# Development
npm run dev                # Start both server and app
npm run dev:app            # Start only frontend (localhost:3000)
npm run dev:server         # Start only server/API (localhost:15500)
npm run local -- eval      # Test with local build

# Database
npm run db:generate        # Generate Drizzle migrations
npm run db:migrate         # Run database migrations
npm run db:studio          # Open Drizzle studio

# Other
npm run jsonSchema:generate  # Generate JSON schema for config
npm run citation:generate    # Generate citation file

Testing in Development

When testing changes, use the local build:

npm run local -- eval -c path/to/config.yaml

Important: Always use -- before flags with npm run local:

npm run local -- eval --max-concurrency 1  # Correct
npm run local eval --max-concurrency 1     # Wrong - flags go to npm

Don't run npm run local -- view unless explicitly asked. Assume the user already has npm run dev running. The view command serves static production builds without hot reload.

When starting npm run dev, keep it attached in a live terminal session; backgrounding with &/nohup can exit silently in agent shells. The expected local URLs are http://localhost:3000/ for the Web UI and http://localhost:15500 for the server/API. Do not assume Vite's default 5173; confirm the actual ports from startup output or with lsof -nP -iTCP:3000 -iTCP:15500 -sTCP:LISTEN.

Using Environment Variables

The repository includes a .env file for API keys. To use it:

# Use --env-file flag to load environment variables
npm run local -- eval -c config.yaml --env-file .env

# Or set specific variables inline
OPENAI_API_KEY=sk-... npm run local -- eval -c config.yaml

# Disable remote generation for testing
PROMPTFOO_DISABLE_REMOTE_GENERATION=true npm run local -- eval -c config.yaml

Never commit the .env file or expose API keys in code or commit messages.

Running Evaluations

Always run from the repository root, not from subdirectories.

Always use --no-cache during development to ensure fresh results:

npm run local -- eval -c examples/my-example/promptfooconfig.yaml --no-cache

Export and inspect results to verify pass/fail/errors:

npm run local -- eval -c path/to/config.yaml -o output.json --no-cache

Add --env-file .env or another explicit env file only when the eval needs local secrets and the file exists.

Review the output file for success, score, and error fields. With the default pass-rate threshold, exit code 0 means the eval met the threshold; still inspect the JSON for per-test failures, errors, and scores, especially when the threshold has been lowered. This is the standard command for verifying a PR end-to-end.
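The review step above can be sketched as a small summarizer. This is a minimal illustration, not Promptfoo's own tooling: `EvalRow`, `EvalSummary`, and `summarizeRows` are hypothetical names, only the three fields named above are modeled, and locating the per-test array inside `output.json` is left out because its exact nesting is not stated here.

```typescript
// Hypothetical row shape: only the fields named in the guidance are modeled.
interface EvalRow {
  success: boolean;
  score: number;
  error?: string | null;
}

interface EvalSummary {
  total: number;
  passed: number;
  failed: number;
  errored: number;
  minScore: number | null;
}

// Count passes, failures, and errors so they stay visible even when the exit code is 0.
function summarizeRows(rows: EvalRow[]): EvalSummary {
  let passed = 0;
  let errored = 0;
  let minScore: number | null = null;
  for (const row of rows) {
    if (row.error) {
      errored += 1;
    } else if (row.success) {
      passed += 1;
    }
    if (minScore === null || row.score < minScore) {
      minScore = row.score;
    }
  }
  const failed = rows.length - passed - errored;
  return { total: rows.length, passed, failed, errored, minScore };
}

// Inline sample data for illustration only (not real eval output).
const sample: EvalRow[] = [
  { success: true, score: 1 },
  { success: false, score: 0.4 },
  { success: false, score: 0, error: 'provider timeout' },
];
console.log(summarizeRows(sample));
```

Rows with an `error` are counted separately from plain assertion failures, since the two usually call for different follow-up.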

Keep local secrets in the repo's gitignored .env (or another path the user points at with --env-file); never echo them into logs or commit messages.

End-to-End Work Expectations

When asked only to review or audit a PR, keep the work read-only: inspect the branch, diff, PR comments, and CI as needed; run non-mutating tests or QA when useful; and report findings without committing, pushing, or changing files unless the user explicitly asks for fixes.

When asked to fix, improve, or land a PR, own the full loop: check out the branch, inspect the diff and PR comments, merge or rebase on current origin/main when requested, run focused tests, run the relevant real workflow, commit, push, and watch CI until it is green or the remaining failure is clearly unrelated.

Standing commit/push authorization on feature branches. When the user has asked you to fix, improve, or land work on a non-main branch, you have durable authorization to git commit and git push to that branch's tracking remote without per-step confirmation. Do not pause to ask "want me to commit?" — committing and pushing is part of the requested work. The safety constraints in Git Workflow (CRITICAL) below (no commits to main, no --force without approval, no --no-verify, etc.) still apply.

After landing a PR, watch main until its CI is green. Merging is not the end of the loop. The squash commit kicks off a fresh CI run on main that can fail for reasons the PR's own checks never surfaced — a base that advanced, a flaky job, or a real regression from the merge. After you merge, follow that main run to completion and do not leave main red:

  • Classify before reacting. Read the failing job's logs and decide whether it is a flake or a real failure. A failure is a flake when the tests themselves pass and the job dies on infrastructure noise — e.g. a vitest Worker exited unexpectedly / Timeout terminating forks worker printed after Test Files N passed, a cache-cleanup step (Post Use Node …), or a transient Install Dependencies error. The signature of a flake is non-determinism: different jobs/files fail across consecutive commits. A real failure is deterministic and attributable to the change — the same test, build, or type error fails on re-run.
  • Flake → re-run. Re-run only the failed jobs (gh run rerun <run-id> --failed) and confirm the run goes green. Do not blame the just-merged change for a flake it cannot have caused (e.g. a config-only diff breaking a test worker).
  • Real regression → fix it. Open a follow-up PR (never commit to main directly). If the regression is yours and main is broken for everyone, prefer reverting the merge to get main green quickly, then re-land with the fix.
  • Recurring class of flake → fix the flake itself. If the same failure mode keeps reddening main across unrelated PRs, treat the flake as the bug: open a separate PR that fixes it at the source (a leaked handle keeping a test worker alive, an over-tight timeout, a fragile setup step) instead of re-running indefinitely.

Confirm the final main state is green, or that the only remaining failure is a pre-existing, clearly-unrelated issue you have explicitly flagged.
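The flake-versus-real classification above could be approximated by a log heuristic like the following. This is a rough sketch under stated assumptions: `classifyFailure` and `Verdict` are hypothetical names, only the two vitest worker signatures quoted above are checked, and the result is a hint for a human or agent reading the logs, not a substitute for reading them.

```typescript
type Verdict = 'flake' | 'real' | 'unclear';

// Infrastructure-noise signatures quoted in the guidance above.
const WORKER_NOISE = [/Worker exited unexpectedly/, /Timeout terminating forks worker/];
// Matches a summary line where every test file passed, e.g. "Test Files  12 passed (12)".
const TESTS_PASSED = /Test Files\s+\d+ passed/;

function classifyFailure(log: string, sameFailureOnRerun?: boolean): Verdict {
  // A failure that reproduces identically on re-run is deterministic, so treat it as real.
  if (sameFailureOnRerun === true) {
    return 'real';
  }
  const passedAt = log.search(TESTS_PASSED);
  if (passedAt >= 0) {
    // Only noise printed after the passing summary counts as a flake signature.
    const tail = log.slice(passedAt);
    if (WORKER_NOISE.some((pattern) => pattern.test(tail))) {
      return 'flake';
    }
  }
  return 'unclear';
}
```

Anything the heuristic cannot place returns `'unclear'` on purpose, so an unrecognized failure is never silently re-run as if it were a flake.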

For behavior changes, do not stop at unit tests. Run the actual CLI or example with the local build. For eval and redteam work, prefer:

npm run local -- eval -c path/to/promptfooconfig.yaml --no-cache -o output.json

Add --env-file .env only when the eval needs local credentials and the file exists.

Inspect exported JSON for success, score, error, provider outputs, traces, and redteam findings. If you claim a redteam ran, report the plugins, strategies, interesting failures, and the evidence reviewed.

Debugging & Troubleshooting

Before running tests or review checks, align Node with the repo version first:

nvm use

If you're using npm rather than pnpm/yarn, match the repo's npm major before treating install behavior as authoritative:

npm install -g npm@11

If Node-based tools fail with ERR_MODULE_NOT_FOUND or similar missing-package errors in a fresh worktree, run npm ci before treating the environment as blocked.

Verbose logging:

npm run local -- eval -c config.yaml --verbose
# Or set environment variable
LOG_LEVEL=debug npm run local -- eval -c config.yaml

Disable cache (results may be cached during development):

npm run local -- eval -c config.yaml --no-cache

View results in web UI: First check whether the Web UI is already running on port 3000, and ask the user before starting it. Use npm run dev for localhost:3000.

Cache: Located at ~/.promptfoo/cache by default, unless overridden with PROMPTFOO_CACHE_PATH or PROMPTFOO_CONFIG_DIR. NEVER delete or clear the cache without explicit permission. Use --no-cache flag instead.

Database: Located at ~/.promptfoo/promptfoo.db (SQLite). You may read from it but NEVER delete it.

Git Workflow (CRITICAL)

  • NEVER commit/push directly to main
  • NEVER use --force without explicit approval
  • NEVER comment on GitHub issues - only create PRs to address them
  • ALWAYS create new commits - never amend, squash, or rebase unless explicitly asked
  • All changes go through pull requests

Standard workflow:

git checkout main && git pull origin main   # Always start fresh
git checkout -b feature/your-branch-name    # New branch for changes
# Make changes...
git add <specific-files>                    # Never blindly add everything
npm run l && npm run f                      # Lint and format before commit/push
git commit -m "type(scope): description"    # Conventional commit format
git fetch origin main && git merge origin/main  # Sync with main
git push -u origin feature/your-branch-name # Push branch

Conventional commit types: feat, fix, chore, docs, test, refactor, ci, perf
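As a quick illustration of the `type(scope): description` format, a subject line can be checked against the types listed above. This is a sketch, not a hook that exists in the repo: `isConventionalSubject` is a hypothetical helper, it assumes the scope is optional, and it does not handle breaking-change markers or the scope rules in `docs/agents/pr-conventions.md`.

```typescript
// Types copied from the list above.
const COMMIT_TYPES = ['feat', 'fix', 'chore', 'docs', 'test', 'refactor', 'ci', 'perf'] as const;

// type, optional "(scope)", colon, one space, then a non-empty description.
const COMMIT_PATTERN = new RegExp(`^(${COMMIT_TYPES.join('|')})(\\([^()\\s]+\\))?: \\S.*$`);

function isConventionalSubject(subject: string): boolean {
  return COMMIT_PATTERN.test(subject);
}

console.log(isConventionalSubject('fix(redteam): handle empty plugin list')); // true
console.log(isConventionalSubject('update stuff')); // false
```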

See docs/agents/git-workflow.md for full workflow. See docs/agents/pr-conventions.md for PR title format and scope selection (especially THE REDTEAM RULE).

Pull Request Creation

  • Default to full (non-draft) PRs. Omit --draft from gh pr create unless the user explicitly asks for a draft, or the PR is for an unpublished security advisory (see "Security-Sensitive PRs" below). docs/agents/pr-conventions.md lists the full set of draft exceptions.
  • Never attribute commits or PR bodies to Claude / Claude Code. No Co-Authored-By: Claude… trailers, no "Generated with Claude Code" footers. Use your configured git identity only.
  • Update the existing PR instead of opening a new one when iterating on a branch that already has an open PR. Push to the same branch. Only run gh pr create if the user explicitly asks for a new PR or the existing PR is closed.
  • Don't let npm audit fix drift ride along with an unrelated change. If package-lock.json changes outside the scope of the PR, revert the drift and ship it separately so reviewers can reason about each change independently.

Security-Sensitive PRs

  • Before opening any public PR for a CVE/GHSA: confirm the advisory has been published and the coordinated-disclosure embargo has lifted. See SECURITY.md for the disclosure policy. If the advisory is still private, use the GHSA private collaboration flow (or a temporary private fork) until the release that contains the fix is cut.
  • Do not put the CVE/GHSA identifier, exploit description, or vulnerable-version range in a PR title, body, or branch name before disclosure.
  • Every security fix should land with a regression test that exercises the original attack vector.

Screenshots for Pull Requests

GitHub has no official API for uploading images to PR descriptions. When asked to add screenshots to a PR:

  1. Take the screenshot using browser tools or other methods
  2. Upload to freeimage.host (no API key required):
curl -s -X POST \
  -F "source=@/path/to/screenshot.png" \
  -F "type=file" \
  -F "action=upload" \
  "https://freeimage.host/api/1/upload?key=6d207e02198a847aa98d0a2a901485a5" \
  | jq -r '.image.url'
  3. Update the PR body with the returned URL:
gh pr edit <PR_NUMBER> --body "$(cat <<'EOF'
## Summary
...
## Screenshot
![Screenshot](https://iili.io/XXXXXXX.png)
...
EOF
)"

Do NOT:

  • Commit screenshots to the branch
  • Upload to GitHub release assets
  • Use GitHub's internal upload endpoints (require browser cookies, not PATs)

Code Style Guidelines

  • Use TypeScript with strict type checking
  • Keep tracked root-owned TypeScript files under the root compiler project unless they belong to an explicitly separate project such as src/app/ or test/code-scan-action/.
  • Follow consistent import order (Biome handles sorting)
  • Use consistent curly braces for all control statements
  • Prefer const over let; avoid var
  • Use object shorthand syntax whenever possible
  • Use async/await for asynchronous code
  • Use Vitest for all tests (both test/ and src/app/)
  • Use consistent error handling with proper type checks
  • Avoid re-exporting from files; import directly from the source module
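Several of these guidelines can be seen together in a short sketch. The names here (`LoadResult`, `toErrorMessage`, `loadMessage`) are hypothetical and written only for illustration; they are not functions from this codebase.

```typescript
interface LoadResult {
  ok: boolean;
  message: string;
}

// Error handling with a type check: narrow the unknown value before reading from it.
function toErrorMessage(error: unknown): string {
  if (error instanceof Error) {
    return error.message;
  }
  return String(error);
}

// `load` stands in for any async source, such as a file read or an HTTP call.
async function loadMessage(load: () => Promise<string>): Promise<LoadResult> {
  try {
    const message = await load(); // async/await instead of promise chains
    const ok = message.length > 0; // const by default, no let or var needed
    return { ok, message }; // object shorthand
  } catch (error) {
    return { ok: false, message: toErrorMessage(error) };
  }
}
```

Every control statement uses curly braces, including the single-statement `if`, which matches the brace guideline above.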

Before committing: npm run l && npm run f

Pre-commit hook: A pre-commit hook is installed automatically on npm install and runs Biome and Prettier on staged files.

Logging

Use the logger with object context (auto-sanitized):

logger.debug('[Component] Message', { headers, body, config });

See docs/agents/logging.md for details on sanitization patterns.

Testing

  • Vitest is the test framework for all tests
  • Frontend tests (src/app/): Vitest with explicit imports
  • Backend tests (test/): Vitest with globals enabled (describe, it, expect available without imports)

See test/AGENTS.md for testing patterns.

Project Conventions

  • ESM modules (type: "module" in package.json)
  • Node.js ^20.20.0 || >=22.22.0 - Before npm/vite/vitest, run source ~/.nvm/nvm.sh && nvm use so node -v matches .nvmrc. If you're using npm, upgrade to npm@11 so the repo's release-age policy is applied consistently. .npmrc sets engine-strict=true
  • Alternative package managers (pnpm, yarn) are supported
  • File structure: core logic in src/, tests in test/
  • Examples belong in examples/ with clear README.md
  • Drizzle ORM for database operations
  • Workspaces include src/app and site directories
  • Don't edit CHANGELOG.md - it's auto-generated
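To make the Node.js range above concrete, here is a hand-rolled check for `^20.20.0 || >=22.22.0`. It is a simplified sketch: `parseVersion` and `satisfiesRepoNodeRange` are hypothetical helpers, pre-release tags are ignored, and `engine-strict=true` together with `nvm use` remains the actual enforcement.

```typescript
// Parse "v22.22.0" or "22.22.0" into [major, minor, patch]; returns null if malformed.
function parseVersion(version: string): [number, number, number] | null {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) {
    return null;
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Check a version string against the range "^20.20.0 || >=22.22.0".
function satisfiesRepoNodeRange(version: string): boolean {
  const parsed = parseVersion(version);
  if (!parsed) {
    return false;
  }
  const [major, minor] = parsed;
  // ^20.20.0 means 20.20.0 up to, but not including, 21.0.0.
  const inCaretRange = major === 20 && minor >= 20;
  // >=22.22.0 means 22.22.0 or anything later, including newer majors.
  const inOpenRange = major > 22 || (major === 22 && minor >= 22);
  return inCaretRange || inOpenRange;
}

console.log(satisfiesRepoNodeRange('v20.20.1')); // true
console.log(satisfiesRepoNodeRange('v21.7.0')); // false
```

In a Node script, passing `process.version` would check the running interpreter.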

Before Writing Code

  • Search for existing implementations before creating new code
  • Check for existing utilities in src/util/ before adding helpers
  • Don't add dependencies without checking if functionality exists in current deps
  • Reuse patterns from similar files in the codebase
  • Test both success and error cases for all functionality
  • Document provider configurations following examples in existing code

Adversarial and Redteam Bias

For security, model scanning, redteam, and coding-agent work, test like an attacker first. Look for false negatives, bypasses, hidden payloads, unsafe tool use, prompt injection, exfiltration, cache misuse, and evidence gaps. When a bypass is found, add a focused regression test before or alongside the fix.

For demo/example apps used to show red teaming, do not harden away all interesting findings unless explicitly asked. A slightly vulnerable sample app is useful when the goal is to demonstrate Promptfoo's ability to find real breaks.

Review Guidelines

  • Prioritize security regressions first, especially injection risks, unsafe handling of user-controlled or adversarial content, credential exposure, SSRF, path traversal, unsafe deserialization, and authorization mistakes.
  • Then prioritize correctness issues that can break behavior, public APIs, data integrity, concurrency, or error handling.
  • Treat missing or ineffective tests as a P1 issue when a change adds security-sensitive behavior, changes public behavior, or fixes a bug without meaningful coverage.
  • Focus on the code changed by the pull request. Do not flag pre-existing issues outside the touched diff unless the pull request materially worsens them.
  • Avoid repeating findings that were already raised in the current pull request unless the new diff reintroduces them or leaves the same risk in newly changed code.
  • Verify findings on the current branch tip after syncing with the latest main.
  • Treat existing PR comments and bot reviews as hints; confirm they still apply before reporting them. If CI is failing, inspect the failing job logs and separate unrelated base-branch failures from PR regressions.
  • Ignore formatting, import ordering, naming, and other style-only issues already enforced by CI or repository tooling.
  • If a pull request is primarily about redteam functionality, verify the title follows THE REDTEAM RULE in docs/agents/pr-conventions.md and uses (redteam) scope. Incidental src/redteam/ touches in broad maintenance PRs do not require (redteam) scope.

Documentation Testing

When testing doc changes, speed up builds by skipping OG image generation:

cd site
SKIP_OG_GENERATION=true npm run build

See site/AGENTS.md for documentation guidelines.

Additional Documentation

Read these when relevant to your task:

| Document                               | When to Read                              |
| -------------------------------------- | ----------------------------------------- |
| `docs/agents/pr-conventions.md`        | Creating pull requests                    |
| `docs/agents/git-workflow.md`          | Git operations                            |
| `docs/agents/dependency-management.md` | Updating packages                         |
| `docs/agents/logging.md`               | Adding logging to code                    |
| `docs/agents/python.md`                | Python providers/scripts                  |
| `docs/agents/database-security.md`     | Writing database queries                  |
| `src/app/AGENTS.md`                    | Frontend React development                |
| `src/providers/AGENTS.md`              | Adding/modifying LLM providers            |
| `test/AGENTS.md`                       | Writing tests                             |
| `site/AGENTS.md`                       | Documentation site changes                |
| `.github/AGENTS.md`                    | GitHub Actions / release workflow changes |

Prompt Playground

1 Variable

Fill Variables

Preview

# AGENTS.md

Guidance for AI agents working on this TypeScript codebase.

## Project Overview

Promptfoo is an open-source framework for evaluating and testing LLM applications.

## Project Structure

| Directory           | Purpose                         | Local Docs                   |
| ------------------- | ------------------------------- | ---------------------------- |
| `.agents/`          | Codex metadata and repo skills  | `.agents/AGENTS.md`          |
| `.github/`          | GitHub Actions and workflows    | `.github/AGENTS.md`          |
| `code-scan-action/` | Code scan GitHub Action wrapper | `code-scan-action/AGENTS.md` |
| `docs/agents/`      | Reusable coding-agent docs      | `docs/agents/AGENTS.md`      |
| `plugins/`          | Agent plugin bundles            | `plugins/AGENTS.md`          |
| `src/`              | Core library                    | -                            |
| `src/app/`          | Web UI (React 19/Vite/MUI v7)   | `src/app/AGENTS.md`          |
| `src/assertions/`   | Assertion handlers              | `src/assertions/AGENTS.md`   |
| `src/codeScan/`     | Code scan scanner               | `src/codeScan/AGENTS.md`     |
| `src/commands/`     | CLI commands                    | `src/commands/AGENTS.md`     |
| `src/contracts/`    | Public package contracts        | `src/contracts/AGENTS.md`    |
| `src/database/`     | SQLite/libSQL persistence       | `src/database/AGENTS.md`     |
| `src/matchers/`     | Assertion matcher helpers       | `src/matchers/AGENTS.md`     |
| `src/models/`       | Eval/result persistence models  | `src/models/AGENTS.md`       |
| `src/prompts/`      | Prompt loading & processors     | `src/prompts/AGENTS.md`      |
| `src/providers/`    | LLM providers                   | `src/providers/AGENTS.md`    |
| `src/redteam/`      | Security testing                | `src/redteam/AGENTS.md`      |
| `src/scheduler/`    | Concurrency & rate limits       | `src/scheduler/AGENTS.md`    |
| `src/server/`       | Backend server                  | `src/server/AGENTS.md`       |
| `src/tracing/`      | OpenTelemetry trace storage     | `src/tracing/AGENTS.md`      |
| `src/types/`        | Config/API types & Zod schemas  | `src/types/AGENTS.md`        |
| `src/util/`         | Shared utilities                | `src/util/AGENTS.md`         |
| `src/validators/`   | Config validation schemas       | `src/validators/AGENTS.md`   |
| `test/`             | Tests (Vitest)                  | `test/AGENTS.md`             |
| `site/`             | Docs site (Docusaurus)          | `site/AGENTS.md`             |
| `examples/`         | Example configs                 | `examples/AGENTS.md`         |
| `drizzle/`          | DB migrations                   | `drizzle/AGENTS.md`          |

**Read the relevant AGENTS.md when working in that directory.**

## Build Commands

```bash
# Core commands
npm run build              # Build the project
npm run build:clean        # Clean the dist directory
npm run build:watch        # Watch and rebuild TypeScript files
npm test                   # Run all tests
npm run tsc                # Run TypeScript compiler

# Linting & Formatting
npm run lint               # Run Biome linter (alias for lint:src)
npm run lint:src           # Lint src directory
npm run lint:tests         # Lint test directory
npm run lint:site          # Lint site directory
npm run format             # Format all files (Biome + Prettier)
npm run format:check       # Check formatting without changes
npm run l                  # Lint only changed files
npm run f                  # Format only changed files

# Testing
npm run test:watch         # Run tests in watch mode
npm run test:integration   # Run integration tests
npm run test:redteam:integration  # Run red team integration tests
npm run test:app -- src/pages/path/to/test.test.tsx --run  # Run a specific frontend test file from repo root
npx vitest path/to/test    # Run a specific backend test file

# Development
npm run dev                # Start both server and app
npm run dev:app            # Start only frontend (localhost:3000)
npm run dev:server         # Start only server/API (localhost:15500)
npm run local -- eval      # Test with local build

# Database
npm run db:generate        # Generate Drizzle migrations
npm run db:migrate         # Run database migrations
npm run db:studio          # Open Drizzle studio

# Other
npm run jsonSchema:generate  # Generate JSON schema for config
npm run citation:generate    # Generate citation file
```

## Testing in Development

When testing changes, use the local build:

```bash
npm run local -- eval -c path/to/config.yaml
```

**Important:** Always use `--` before flags with `npm run local`:

```bash
npm run local -- eval --max-concurrency 1  # Correct
npm run local eval --max-concurrency 1     # Wrong - flags go to npm
```

**Don't run `npm run local -- view`** unless explicitly asked. Assume the user already has `npm run dev` running. The `view` command serves static production builds without hot reload.

When starting `npm run dev`, keep it attached in a live terminal session; backgrounding with `&`/`nohup` can exit silently in agent shells. The expected local URLs are `http://localhost:3000/` for the Web UI and `http://localhost:15500` for the server/API. Do not assume Vite's default `5173`; confirm the actual ports from startup output or with `lsof -nP -iTCP:3000 -iTCP:15500 -sTCP:LISTEN`.

### Using Environment Variables

The repository includes a `.env` file for API keys. To use it:

```bash
# Use --env-file flag to load environment variables
npm run local -- eval -c config.yaml --env-file .env

# Or set specific variables inline
OPENAI_API_KEY=sk-... npm run local -- eval -c config.yaml

# Disable remote generation for testing
PROMPTFOO_DISABLE_REMOTE_GENERATION=true npm run local -- eval -c config.yaml
```

**Never commit the `.env` file or expose API keys in code or commit messages.**

## Running Evaluations

**Always run from the repository root**, not from subdirectories.

**Always use `--no-cache` during development** to ensure fresh results:

```bash
npm run local -- eval -c examples/my-example/promptfooconfig.yaml --no-cache
```

**Export and inspect results** to verify pass/fail/errors:

```bash
npm run local -- eval -c path/to/config.yaml -o output.json --no-cache
```

Add `--env-file .env` or another explicit env file only when the eval needs local
secrets and the file exists.

Review the output file for `success`, `score`, and `error` fields. With the default
pass-rate threshold, exit code 0 means the eval met the threshold; still inspect the
JSON for per-test failures, errors, and scores, especially when the threshold has been
lowered. This is the standard command for verifying a PR end-to-end.

Keep local secrets in the repo's gitignored `.env` (or another path the user points at
with `--env-file`); never echo them into logs or commit messages.

## End-to-End Work Expectations

When asked only to review or audit a PR, keep the work read-only: inspect the branch, diff, PR comments, and CI as needed; run non-mutating tests or QA when useful; and report findings without committing, pushing, or changing files unless the user explicitly asks for fixes.

When asked to fix, improve, or land a PR, own the full loop: check out the branch, inspect the diff and PR comments, merge or rebase on current `origin/main` when requested, run focused tests, run the relevant real workflow, commit, push, and watch CI until it is green or the remaining failure is clearly unrelated.

**Standing commit/push authorization on feature branches.** When the user has asked you to fix, improve, or land work on a non-`main` branch, you have durable authorization to `git commit` and `git push` to that branch's tracking remote without per-step confirmation. Do not pause to ask "want me to commit?" — committing and pushing is part of the requested work. The safety constraints in _Git Workflow (CRITICAL)_ below (no commits to `main`, no `--force` without approval, no `--no-verify`, etc.) still apply.

**After landing a PR, watch `main` until its CI is green.** Merging is not the end of the loop. The squash commit kicks off a fresh CI run on `main` that can fail for reasons the PR's own checks never surfaced — a base that advanced, a flaky job, or a real regression from the merge. After you merge, follow that `main` run to completion and do not leave `main` red:

- **Classify before reacting.** Read the failing job's logs and decide whether it is a flake or a real failure. A failure is a _flake_ when the tests themselves pass and the job dies on infrastructure noise — e.g. a vitest `Worker exited unexpectedly` / `Timeout terminating forks worker` printed _after_ `Test Files N passed`, a cache-cleanup step (`Post Use Node …`), or a transient `Install Dependencies` error. The signature of a flake is non-determinism: different jobs/files fail across consecutive commits. A _real_ failure is deterministic and attributable to the change — the same test, build, or type error fails on re-run.
- **Flake → re-run.** Re-run only the failed jobs (`gh run rerun <run-id> --failed`) and confirm the run goes green. Do not blame the just-merged change for a flake it cannot have caused (e.g. a config-only diff breaking a test worker).
- **Real regression → fix it.** Open a follow-up PR (never commit to `main` directly). If the regression is yours and `main` is broken for everyone, prefer reverting the merge to get `main` green quickly, then re-land with the fix.
- **Recurring class of flake → fix the flake itself.** If the same failure mode keeps reddening `main` across unrelated PRs, treat the flake as the bug: open a separate PR that fixes it at the source (a leaked handle keeping a test worker alive, an over-tight timeout, a fragile setup step) instead of re-running indefinitely.

Confirm the final `main` state is green, or that the only remaining failure is a pre-existing, clearly-unrelated issue you have explicitly flagged.

For behavior changes, do not stop at unit tests. Run the actual CLI or example with the local build. For eval and redteam work, prefer:

```bash
npm run local -- eval -c path/to/promptfooconfig.yaml --no-cache -o output.json
```

Add `--env-file .env` only when the eval needs local credentials and the file exists.

Inspect exported JSON for `success`, `score`, `error`, provider outputs, traces, and redteam findings. If you claim a redteam ran, report the plugins, strategies, interesting failures, and the evidence reviewed.

## Debugging & Troubleshooting

**Before running tests or review checks, align Node with the repo version first:**

```bash
nvm use
```

If you're using npm rather than pnpm/yarn, match the repo's npm major before treating install behavior as authoritative:

```bash
npm install -g npm@11
```

If Node-based tools fail with `ERR_MODULE_NOT_FOUND` or similar missing-package errors in a fresh worktree, run `npm ci` before treating the environment as blocked.

**Verbose logging:**

```bash
npm run local -- eval -c config.yaml --verbose
# Or set environment variable
LOG_LEVEL=debug npm run local -- eval -c config.yaml
```

**Disable cache** (results may be cached during development):

```bash
npm run local -- eval -c config.yaml --no-cache
```

**View results in web UI:** First check if the Web UI is running on port 3000, then ask user before starting. Use `npm run dev` for localhost:3000.

**Cache:** Located at `~/.promptfoo/cache` by default, unless overridden with
`PROMPTFOO_CACHE_PATH` or `PROMPTFOO_CONFIG_DIR`. **NEVER delete or clear the cache
without explicit permission.** Use `--no-cache` flag instead.

**Database:** Located at `~/.promptfoo/promptfoo.db` (SQLite). You may read from it but **NEVER delete it**.

## Git Workflow (CRITICAL)

- **NEVER** commit/push directly to main
- **NEVER** use `--force` without explicit approval
- **NEVER** comment on GitHub issues - only create PRs to address them
- **ALWAYS create new commits** - never amend, squash, or rebase unless explicitly asked
- All changes go through pull requests

**Standard workflow:**

```bash
git checkout main && git pull origin main   # Always start fresh
git checkout -b feature/your-branch-name    # New branch for changes
# Make changes...
git add <specific-files>                    # Never blindly add everything
npm run l && npm run f                      # Lint and format before commit/push
git commit -m "type(scope): description"    # Conventional commit format
git fetch origin main && git merge origin/main  # Sync with main
git push -u origin feature/your-branch-name # Push branch
```

**Conventional commit types:** `feat`, `fix`, `chore`, `docs`, `test`, `refactor`, `ci`, `perf`
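
As a rough illustration of the `type(scope): description` format, the sketch below checks a commit subject against the types listed above. `isConventionalCommit` is a hypothetical helper, and treating the scope as optional is an assumption; `docs/agents/pr-conventions.md` remains the authority on scope rules.

```typescript
// Commit types listed in this document.
const COMMIT_TYPES = ['feat', 'fix', 'chore', 'docs', 'test', 'refactor', 'ci', 'perf'] as const;

// Matches "type(scope): description" or "type: description".
// Assumption: the scope is optional and contains no spaces or parentheses.
const COMMIT_PATTERN = new RegExp(`^(${COMMIT_TYPES.join('|')})(\\([^()\\s]+\\))?: \\S.*$`);

// Hypothetical helper: validates only the subject (first line) of a message.
function isConventionalCommit(message: string): boolean {
  const [subject = ''] = message.split('\n');
  return COMMIT_PATTERN.test(subject);
}
```

For example, `isConventionalCommit('fix(providers): handle empty response')` passes, while `isConventionalCommit('update stuff')` does not.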

See `docs/agents/git-workflow.md` for full workflow.
See `docs/agents/pr-conventions.md` for PR title format and scope selection (especially THE REDTEAM RULE).

## Pull Request Creation

- **Default to full (non-draft) PRs.** Omit `--draft` from `gh pr create` unless the
  user explicitly asks for a draft, or the PR is for an unpublished security advisory
  (see "Security-Sensitive PRs" below). `docs/agents/pr-conventions.md` lists the full
  set of draft exceptions.
- **Never attribute commits or PR bodies to Claude / Claude Code.** No
  `Co-Authored-By: Claude…` trailers, no "Generated with Claude Code" footers. Use
  your configured git identity only.
- **Update the existing PR instead of opening a new one** when iterating on a branch
  that already has an open PR. Push to the same branch. Only run `gh pr create` if the
  user explicitly asks for a new PR or the existing PR is closed.
- **Don't let `npm audit fix` drift ride along with an unrelated change.** If
  `package-lock.json` changes outside the scope of the PR, revert the drift and ship
  it separately so reviewers can reason about each change independently.

## Security-Sensitive PRs

- **Before opening any public PR for a CVE/GHSA:** confirm the advisory has been
  published and the coordinated-disclosure embargo has lifted. See `SECURITY.md` for
  the disclosure policy. If the advisory is still private, use the GHSA private
  collaboration flow (or a temporary private fork) until the release that contains the
  fix is cut.
- Do **not** put the CVE/GHSA identifier, exploit description, or vulnerable-version
  range in a PR title, body, or branch name before disclosure.
- Every security fix should land with a regression test that exercises the original
  attack vector.
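
A pre-disclosure check for leaked identifiers could look like the sketch below. The helper names are hypothetical, and the patterns only cover the identifier formats (`CVE-YYYY-NNNN…` and `GHSA-xxxx-xxxx-xxxx`); they cannot detect exploit descriptions or version ranges, which still need a manual read.

```typescript
// CVE IDs look like CVE-2024-12345; GHSA IDs look like GHSA-xxxx-xxxx-xxxx.
// The GHSA character class is deliberately broader than the real alphabet.
const ADVISORY_ID_PATTERN = /\b(CVE-\d{4}-\d{4,}|GHSA(-[0-9a-z]{4}){3})\b/i;

// Hypothetical helper: true when the text contains an advisory identifier.
function mentionsAdvisoryId(text: string): boolean {
  return ADVISORY_ID_PATTERN.test(text);
}

// Hypothetical helper: returns the names of the fields that leak an identifier.
function findAdvisoryLeaks(fields: Record<string, string>): string[] {
  return Object.entries(fields)
    .filter(([, value]) => mentionsAdvisoryId(value))
    .map(([name]) => name);
}
```

Passing the PR title, body, and branch name as `fields` returns the names of any that should be reworded before the PR is opened.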

## Screenshots for Pull Requests

GitHub has no official API for uploading images to PR descriptions. When asked to add screenshots to a PR:

1. **Take the screenshot** using browser tools or other methods
2. **Upload to freeimage.host** (no API key required):

```bash
curl -s -X POST \
  -F "source=@/path/to/screenshot.png" \
  -F "type=file" \
  -F "action=upload" \
  "https://freeimage.host/api/1/upload?key=6d207e02198a847aa98d0a2a901485a5" \
  | jq -r '.image.url'
```

3. **Update the PR body** with the returned URL:

```bash
gh pr edit <PR_NUMBER> --body "$(cat <<'EOF'
## Summary
...
## Screenshot
![Screenshot](https://iili.io/XXXXXXX.png)
...
EOF
)"
```

**Do NOT:**

- Commit screenshots to the branch
- Upload to GitHub release assets
- Use GitHub's internal upload endpoints (they require browser cookies and do not accept PATs)

## Code Style Guidelines

- Use TypeScript with strict type checking
- Keep tracked root-owned TypeScript files under the root compiler project unless they belong to an explicitly separate project such as `src/app/` or `test/code-scan-action/`.
- Follow consistent import order (Biome handles sorting)
- Use consistent curly braces for all control statements
- Prefer `const` over `let`; avoid `var`
- Use object shorthand syntax whenever possible
- Use `async/await` for asynchronous code
- Use Vitest for all tests (both `test/` and `src/app/`)
- Use consistent error handling with proper type checks
- Avoid re-exporting from files; import directly from the source module
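
Several of these guidelines can be seen together in a short sketch. The names here (`fakeRequest`, `checkUrl`, `describeError`, `CheckResult`) are hypothetical and do not exist in the codebase; `fakeRequest` stands in for a real network call.

```typescript
interface CheckResult {
  url: string;
  ok: boolean;
  error?: string;
}

// Hypothetical stand-in for a network call; rejects anything that is not HTTPS.
async function fakeRequest(url: string): Promise<number> {
  if (!url.startsWith('https://')) {
    throw new Error(`Unsupported URL: ${url}`);
  }
  return 200;
}

// Caught values are `unknown`, so narrow with a type check before reading fields.
function describeError(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

async function checkUrl(url: string): Promise<CheckResult> {
  try {
    // `const` for values that are never reassigned; `await` instead of `.then()`.
    const status = await fakeRequest(url);
    const ok = status < 400;
    return { url, ok }; // Object shorthand instead of `{ url: url, ok: ok }`.
  } catch (err) {
    const error = describeError(err);
    return { url, ok: false, error };
  }
}
```

Returning the failure as data rather than rethrowing is just one option; the point of the sketch is the narrowing step in `describeError`.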

**Before committing:** `npm run l && npm run f`

**Pre-commit hook:** A pre-commit hook is installed automatically on `npm install` and runs Biome and Prettier on staged files.

## Logging

Use the logger with object context (auto-sanitized):

```typescript
logger.debug('[Component] Message', { headers, body, config });
```

See `docs/agents/logging.md` for details on sanitization patterns.

## Testing

- **Vitest** is the test framework for all tests
- Frontend tests (`src/app/`): Vitest with explicit imports
- Backend tests (`test/`): Vitest with globals enabled (`describe`, `it`, `expect` available without imports)

See `test/AGENTS.md` for testing patterns.

## Project Conventions

- **ES modules (ESM)** (`"type": "module"` in `package.json`)
- **Node.js ^20.20.0 || >=22.22.0** - Before running `npm`, `vite`, or `vitest`, run `source ~/.nvm/nvm.sh && nvm use` so `node -v` matches `.nvmrc`. If you're using npm, upgrade to `npm@11` so the repo's release-age policy is applied consistently. `.npmrc` sets `engine-strict=true`, so installs fail on an unsupported Node version.
- **Alternative package managers** (pnpm, yarn) are supported
- **File structure:** core logic in `src/`, tests in `test/`
- **Examples** belong in `examples/` with a clear `README.md`
- **Drizzle ORM** for database operations
- **Workspaces** include `src/app` and `site` directories
- **Don't edit `CHANGELOG.md`** - it's auto-generated
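
To make the Node.js range above concrete: `^20.20.0` accepts 20.20.0 and later within the 20.x line, and `>=22.22.0` accepts 22.22.0 and anything newer. The sketch below hand-rolls that check for plain `major.minor.patch` strings. `isSupportedNode` is a hypothetical helper, and pre-release tags are not handled; a real check would more likely use a semver library.

```typescript
type Version = [number, number, number];

// Parses "v20.20.0" or "20.20.0"; returns undefined for anything else.
function parseVersion(raw: string): Version | undefined {
  const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(raw.trim());
  if (!match) {
    return undefined;
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative when a < b, zero when equal, positive when a > b.
function compareVersions(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Hypothetical helper mirroring the range "^20.20.0 || >=22.22.0".
function isSupportedNode(raw: string): boolean {
  const version = parseVersion(raw);
  if (!version) {
    return false;
  }
  const inNode20Line = version[0] === 20 && compareVersions(version, [20, 20, 0]) >= 0;
  const isNode22OrNewer = compareVersions(version, [22, 22, 0]) >= 0;
  return inNode20Line || isNode22OrNewer;
}
```

Note that Node 21.x and 22.x releases before 22.22.0 fall outside the range, which is why running `nvm use` first matters.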

## Before Writing Code

- **Search for existing implementations** before creating new code
- **Check for existing utilities** in `src/util/` before adding helpers
- **Don't add dependencies** without checking if functionality exists in current deps
- **Reuse patterns** from similar files in the codebase
- **Test both success and error cases** for all functionality
- **Document provider configurations** following examples in existing code

## Adversarial and Redteam Bias

For security, model scanning, redteam, and coding-agent work, test like an attacker first. Look for false negatives, bypasses, hidden payloads, unsafe tool use, prompt injection, exfiltration, cache misuse, and evidence gaps. When a bypass is found, add a focused regression test before or alongside the fix.

For demo/example apps used to show red teaming, do not harden away all interesting findings unless explicitly asked. A slightly vulnerable sample app is useful when the goal is to demonstrate Promptfoo's ability to find real breaks.

## Review Guidelines

- Prioritize security regressions first, especially injection risks, unsafe handling of user-controlled or adversarial content, credential exposure, SSRF, path traversal, unsafe deserialization, and authorization mistakes.
- Then prioritize correctness issues that can break behavior, public APIs, data integrity, concurrency, or error handling.
- Treat missing or ineffective tests as a P1 issue when a change adds security-sensitive behavior, changes public behavior, or fixes a bug without meaningful coverage.
- Focus on the code changed by the pull request. Do not flag pre-existing issues outside the touched diff unless the pull request materially worsens them.
- Avoid repeating findings that were already raised in the current pull request unless the new diff reintroduces them or leaves the same risk in newly changed code.
- Verify findings on the current branch tip after syncing with the latest `main`.
- Treat existing PR comments and bot reviews as hints; confirm they still apply before reporting them. If CI is failing, inspect the failing job logs and separate unrelated base-branch failures from PR regressions.
- Ignore formatting, import ordering, naming, and other style-only issues already enforced by CI or repository tooling.
- If a pull request is primarily about redteam functionality, verify the title follows THE REDTEAM RULE in `docs/agents/pr-conventions.md` and uses `(redteam)` scope. Incidental `src/redteam/` touches in broad maintenance PRs do not require `(redteam)` scope.

## Documentation Testing

When testing doc changes, speed up builds by skipping OG image generation:

```bash
cd site
SKIP_OG_GENERATION=true npm run build
```

See `site/AGENTS.md` for documentation guidelines.

## Additional Documentation

Read these when relevant to your task:

| Document                               | When to Read                              |
| -------------------------------------- | ----------------------------------------- |
| `docs/agents/pr-conventions.md`        | Creating pull requests                    |
| `docs/agents/git-workflow.md`          | Git operations                            |
| `docs/agents/dependency-management.md` | Updating packages                         |
| `docs/agents/logging.md`               | Adding logging to code                    |
| `docs/agents/python.md`                | Python providers/scripts                  |
| `docs/agents/database-security.md`     | Writing database queries                  |
| `src/app/AGENTS.md`                    | Frontend React development                |
| `src/providers/AGENTS.md`              | Adding/modifying LLM providers            |
| `test/AGENTS.md`                       | Writing tests                             |
| `site/AGENTS.md`                       | Documentation site changes                |
| `.github/AGENTS.md`                    | GitHub Actions / release workflow changes |