<h1 align="center">
<a href="https://prompts.chat">
Fine-tuning framework for LLMs. Config-driven: every training run is defined by a single YAML file.
Loading actions...
<a href="https://prompts.chat">
TypeScript and ESLint rules that MUST be followed when creating, modifying, or reviewing any file under apps/frontend/, including .ts, .tsx, .js, and .jsx files. Also apply when discussing frontend linting, type safety, or ESLint configuration.
risks
Fine-tuning framework for LLMs. Config-driven: every training run is defined by a single YAML file.
Python, PyTorch, HuggingFace Transformers, TRL, PEFT (LoRA/QLoRA), DeepSpeed, FSDP, vLLM (for GRPO generation).
axolotl train config.yaml # Train (single or multi-GPU, auto-detected)
axolotl preprocess config.yaml # Tokenize dataset and validate config
axolotl preprocess config.yaml --debug # Inspect tokenized samples and label masking
axolotl inference config.yaml # Interactive inference
axolotl merge-lora config.yaml # Merge LoRA adapter into base model
axolotl vllm-serve config.yaml # Start vLLM server for GRPO/EBFT training
axolotl fetch examples # Download example configs
axolotl agent-docs # Show agent-optimized docs (bundled with pip package)
axolotl agent-docs grpo # Topic-specific agent reference
axolotl config-schema # Dump config JSON schema
| Method | Config Key | When to Use |
|---|---|---|
| SFT | (default) | Input-output pairs, instruction tuning |
| DPO/IPO | rl: dpo / rl: dpo, dpo_loss_type: ["ipo"] | Paired preference data (chosen vs rejected) |
| KTO | rl: kto | Unpaired binary preference labels |
| ORPO | rl: orpo | Single-stage alignment, no ref model |
| GRPO | rl: grpo | RL with verifiable reward functions (math, code) |
| EBFT | rl: ebft | Feature-matching rewards from internal representations |
Agent-specific references:
All training is config-driven. A YAML file specifies model, adapter, dataset(s), and hyperparameters:
base_model: meta-llama/Llama-3.1-8B-Instruct
adapter: lora # or qlora, or omit for full fine-tune
datasets:
- path: my_dataset
type: chat_template # prompt strategy (see docs/dataset-formats/)
output_dir: ./outputs/lora-out
Config schema: src/axolotl/utils/schemas/config.py (AxolotlInputConfig).
src/axolotl/
cli/ # CLI entry points (train, preprocess, inference, merge_lora, vllm_serve)
core/
builders/ # TrainerBuilder classes (causal.py for SFT, rl.py for RLHF)
trainers/ # Trainer classes, mixins (optimizer, scheduler, packing)
dpo/ # DPO trainer and config
grpo/ # GRPO trainer and sampler
loaders/ # Model, tokenizer, adapter, processor loading
prompt_strategies/ # Dataset format handlers (chat_template, alpaca, dpo/, kto/, orpo/)
utils/schemas/ # Pydantic config schemas (config, model, training, peft, trl, fsdp)
integrations/ # Plugins (liger, cut_cross_entropy, swanlab, nemo_gym)
monkeypatch/ # Runtime patches for HF transformers
examples/ # Example YAML configs by model (llama-3/, qwen2/, mistral/, ebft/)
deepspeed_configs/ # DeepSpeed JSON configs (zero2, zero3)
docs/ # Quarto documentation site
src/axolotl/prompt_strategies/ — each type: value maps to a functionplugins: list in config loads integration modulescore/trainers/mixins/ for composable trainer behaviorsutils/schemas/