# Awesome Prompt Engineering

A hand-curated collection of resources for Prompt Engineering and Context Engineering, covering papers, tools, models, APIs, benchmarks, courses, and communities for working with Large Language Models.

> Master Prompt Engineering: join the course at https://promptslab.github.io
New to prompt engineering? Start with the foundational papers below, then work through the tools, models, and benchmarks in the sections that follow.

## 📄 Papers
These papers established the core concepts that modern prompt engineering builds on:

## 🔧 Tools & Frameworks

### Prompt Development & Testing

| Name | Description | Link |
|---|---|---|
| Promptfoo | Open-source CLI for testing, evaluating, and red-teaming LLM prompts. YAML configs, CI/CD integration, adversarial testing. ~9K+ ⭐ | GitHub |
| Promptify | Library for solving NLP problems with LLMs; generates prompts for NLP tasks (NER, QA, summarization) for models like GPT and PaLM. | GitHub |
| Agenta | Open-source LLM developer platform for prompt management, evaluation, human feedback, and deployment. | GitHub |
| PromptLayer | Version, test, and monitor every prompt and agent with robust evals, tracing, and regression sets. | Website |
| Helicone | Production prompt monitoring and optimization platform. | Website |
| LangGPT | Framework for structured and meta-prompt design. 10K+ ⭐ | GitHub |
| ChainForge | Visual toolkit for building, testing, and comparing LLM prompt responses without code. | GitHub |
| LMQL | A query language for LLMs making complex prompt logic programmable. | GitHub |
| Promptotype | Platform for developing, testing, and managing structured LLM prompts. | Website |
| PromptPanda | AI-powered prompt management system for streamlining prompt workflows. | Website |
| Promptimize AI | Browser extension to automatically improve user prompts for any AI model. | Website |
| PROMPTMETHEUS | Web-based "Prompt Engineering IDE" for iteratively creating and running prompts. | Website |
| Better Prompt | Test suite for LLM prompts before pushing to production. | GitHub |
| OpenPrompt | Open-source framework for prompt-learning research. | GitHub |
| Prompt Source | Toolkit for creating, sharing, and using natural language prompts. | GitHub |
| Prompt Engine | NPM utility library for creating and maintaining prompts for LLMs (Microsoft). | GitHub |
| PromptInject | Framework for quantitative analysis of LLM robustness to adversarial prompt attacks. | GitHub |

### Evaluation & Observability

| Name | Description | Link |
|---|---|---|
| DeepEval | Open-source evaluation framework covering RAG, agents, and conversations with CI/CD integration. ~7K+ ⭐ | GitHub |
| Ragas | RAG evaluation with knowledge-graph-based test set generation and 30+ metrics. ~8K+ ⭐ | GitHub |
| LangSmith | LangChain's platform for debugging, testing, evaluating, and monitoring LLM applications. | Website |
| Langfuse | Open-source LLM observability with tracing, prompt management, and human annotation. ~7K+ ⭐ | GitHub |
| Braintrust | End-to-end AI evaluation platform, SOC2 Type II certified. | Website |
| Arize AI / Phoenix | Real-time LLM monitoring with drift detection and tracing. | GitHub |
| TruLens | Evaluating and explaining LLM apps; tracks hallucinations, relevance, groundedness. | GitHub |
| InspectAI | Purpose-built for evaluating agents against benchmarks (UK AISI). | GitHub |
| Opik | Evaluate, test, and ship LLM applications across dev and production lifecycles. | GitHub |

### Agent Frameworks

| Name | Description | Link |
|---|---|---|
| LangChain / LangGraph | Most widely adopted LLM app framework; LangGraph adds graph-based multi-step agent workflows. ~100K+ / ~10K+ ⭐ | GitHub · LangGraph |
| CrewAI | Role-playing AI agent orchestration with 700+ integrations. ~44K+ ⭐ | GitHub |
| AutoGen (AG2) | Microsoft's multi-agent conversational framework. ~40K+ ⭐ | GitHub |
| DSPy | Stanford's framework for programming LLMs with automatic prompt/weight optimization. ~22K+ ⭐ | GitHub |
| OpenAI Agents SDK | Official agent framework with function calling, guardrails, and handoffs. ~10K+ ⭐ | GitHub |
| Semantic Kernel | Microsoft's AI framework powering M365 Copilot; C#, Python, Java. ~24K+ ⭐ | GitHub |
| LlamaIndex | Data framework for RAG and agent capabilities. ~40K+ ⭐ | GitHub |
| Haystack | Open-source NLP framework with pipeline architecture for RAG and agents. ~20K+ ⭐ | GitHub |
| Agno (formerly Phidata) | Python agent framework with microsecond instantiation. ~20K+ ⭐ | GitHub |
| Smolagents | Hugging Face's minimalist code-centric agent framework (~1,000 LOC). ~15K+ ⭐ | GitHub |
| Pydantic AI | Type-safe agent framework using Pydantic for structured validation. ~8K+ ⭐ | GitHub |
| Mastra | TypeScript AI agent framework with assistants, RAG, and observability. ~20K+ ⭐ | GitHub |
| Google ADK | Agent Development Kit deeply integrated with Gemini and Google Cloud. | GitHub |
| Strands Agents (AWS) | Model-agnostic framework with deep AWS integrations. | GitHub |
| Langflow | Node-based visual agent builder with drag-and-drop. ~50K+ ⭐ | GitHub |
| n8n | Workflow automation with AI agent capabilities and 400+ integrations. ~60K+ ⭐ | GitHub |
| Dify | All-in-one backend for agentic workflows with tool-using agents and RAG. | GitHub |
| PraisonAI | Multi-AI Agents framework with 100+ LLM support, MCP integration, and built-in memory. | GitHub |
| Neurolink | Multi-provider AI agent framework unifying 12+ providers with workflow orchestration. | GitHub |
| Composio | Connect 100+ tools to AI agents with zero setup. | GitHub |

### Prompt Optimization

| Name | Description | Link |
|---|---|---|
| DSPy | Multiple optimizers (MIPROv2, BootstrapFewShot, COPRO) for automatic prompt tuning. ~22K+ ⭐ | GitHub |
| TextGrad | Automatic differentiation via text (Stanford). ~2K+ ⭐ | GitHub |
| OPRO | Google DeepMind's optimization by prompting. | GitHub |

### Security & Red-Teaming

| Name | Description | Link |
|---|---|---|
| Garak (NVIDIA) | LLM vulnerability scanner for hallucination, injection, and jailbreaks; the "nmap for LLMs." ~3K+ ⭐ | GitHub |
| PyRIT (Microsoft) | Python Risk Identification Tool for automated red-teaming. ~3K+ ⭐ | GitHub |
| DeepTeam | 40+ vulnerabilities, 10+ attack methods, OWASP Top 10 support. | GitHub |
| LLM Guard | Security toolkit for LLM I/O validation. ~2K+ ⭐ | GitHub |
| NeMo Guardrails (NVIDIA) | Programmable guardrails for conversational systems. ~5K+ ⭐ | GitHub |
| Guardrails AI | Define strict output formats (JSON schemas) to ensure system reliability. | Website |
| Lakera | AI security platform for real-time prompt injection detection. | Website |
| Purple Llama (Meta) | Open-source LLM safety evaluation including CyberSecEval. | GitHub |
| GPTFuzz | Automated jailbreak template generation achieving >90% success rates. | GitHub |
| Rebuff | Open-source tool for detection and prevention of prompt injection. | GitHub |
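
Output-format guardrails like those from Guardrails AI or Structured Outputs boil down to rejecting any model reply that doesn't match an expected schema. A stdlib-only sketch of the idea (the `validate_reply` helper is made up for illustration, not any library's API):

```python
import json

def validate_reply(raw: str, required: dict) -> dict:
    """Parse raw model output and check required {field: type} pairs."""
    data = json.loads(raw)  # raises ValueError on non-JSON output
    for field, typ in required.items():
        if not isinstance(data.get(field), typ):
            raise ValueError(f"field {field!r} missing or not {typ.__name__}")
    return data

# Accepts a well-formed reply, rejects anything malformed or mistyped.
reply = '{"sentiment": "positive", "confidence": 0.93}'
print(validate_reply(reply, {"sentiment": str, "confidence": float}))
```

In production you would retry or re-prompt on failure instead of raising; the listed libraries add exactly that loop plus richer schema languages.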

### Model Context Protocol (MCP)

MCP is an open standard developed by Anthropic (Nov 2024, donated to the Linux Foundation in Dec 2025) for connecting AI assistants to external data sources and tools through a standardized interface. It has 97M+ monthly SDK downloads and has been adopted by GitHub, Google, and most major AI providers.
| Name | Description | Link |
|---|---|---|
| MCP Specification | The core protocol specification and SDKs. ~15K+ ⭐ | GitHub |
| MCP Reference Servers | Official implementations: fetch, filesystem, GitHub, Slack, Postgres. | GitHub |
| FastMCP (Python) | High-level Pythonic framework for building MCP servers. ~5K+ ⭐ | GitHub |
| GitHub MCP Server | GitHub's official MCP server for repo, issue, PR, and Actions interaction. ~15K+ ⭐ | GitHub |
| Awesome MCP Servers | Curated list of 10,000+ community MCP servers. ~30K+ ⭐ | GitHub |
| Context7 | MCP server providing version-specific documentation to reduce code hallucination. | GitHub |
| GitMCP | Creates remote MCP servers for any GitHub repo by changing the domain. | Website |
| MCP Inspector | Visual testing tool for MCP server development. | GitHub |
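
Under the hood, MCP messages are JSON-RPC 2.0. The sketch below shows roughly what a `tools/call` request and response look like on the wire; the method and field names follow the public spec, but treat this as an illustration, not a client implementation:

```python
import json

def make_tool_call(call_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC request asking an MCP server to invoke a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": call_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

def parse_tool_result(raw: str) -> list:
    """Extract the content blocks from an MCP tools/call response."""
    msg = json.loads(raw)
    return msg["result"]["content"]

request = make_tool_call(1, "fetch", {"url": "https://example.com"})
# A server would answer with a result carrying typed content blocks:
response = '{"jsonrpc": "2.0", "id": 1, "result": {"content": [{"type": "text", "text": "ok"}]}}'
print(parse_tool_result(response)[0]["text"])  # ok
```

The SDKs listed above handle transport (stdio or HTTP), capability negotiation, and schema validation on top of this message shape.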

### AI Coding Tools

| Name | Description | Link |
|---|---|---|
| Claude Code | Anthropic's command-line AI coding tool; widely considered one of the best AI coding assistants (2026). | Docs |
| Cursor | AI-native code editor; Composer feature generates entire applications from natural language. | Website |
| Windsurf (Codeium) | "First agentic IDE" with multi-file editing and project-wide context. | Website |
| GitHub Copilot | AI pair programmer; ~30% of new GitHub code comes from Copilot. | Website |
| Aider | Open-source terminal AI pair programmer with Git integration. ~25K+ ⭐ | GitHub |
| Cline | Open-source VS Code AI assistant connecting editor and terminal through MCP. ~20K+ ⭐ | GitHub |
| Continue | Open-source IDE extensions for custom AI code assistants. ~22K+ ⭐ | GitHub |
| OpenAI Codex CLI | Lightweight terminal coding agent. | GitHub |
| Gemini CLI | Google's open-source terminal AI agent. | GitHub |
| Bolt.new | Browser-based prompt-to-app generation with one-click deployment. | Website |
| Lovable | Full-stack apps from natural language descriptions. | Website |
| v0 (Vercel) | AI assistant for building Next.js frontend components from text. | Website |
| Firebase Studio | Google's agentic cloud-based development environment. | Website |

### Guides, Tutorials & Resources

| Name | Description | Link |
|---|---|---|
| Prompt Engineering Guide (DAIR.AI) | The definitive open-source guide and resource hub. 3M+ learners. ~55K+ ⭐ | GitHub |
| Awesome ChatGPT Prompts / Prompts.chat | World's largest open-source prompt library, with thousands of prompts for all major models. | GitHub |
| 12-Factor Agents | Principles for building production-grade LLM-powered software. ~17K+ ⭐ | GitHub |
| NirDiamant/Prompt_Engineering | 22 hands-on Jupyter Notebook tutorials. ~3K+ ⭐ | GitHub |
| Context Engineering Repository | First-principles handbook for moving beyond prompt engineering to context design. | GitHub |
| AI Agent System Prompts Library | Collection of system prompts from production AI coding agents (Claude Code, Gemini CLI, Cline, Aider, Roo Code). | GitHub |
| Awesome Vibe Coding | Curated list of 245+ tools and resources for building software through natural language prompts. | GitHub |
| OpenAI Cookbook | Official recipes for prompts, tools, RAG, and evaluations. | GitHub |
| Embedchain | Framework to create ChatGPT-like bots over your dataset. | GitHub |
| ThoughtSource | Framework for the science of machine thinking. | GitHub |
| Promptext | Extracts and formats code context for AI prompts with token counting. | GitHub |
| Price Per Token | Compare LLM API pricing across 200+ models. | Website |

## 💻 Models & APIs

### OpenAI

| Model | Context | Price (Input/Output per 1M tokens) | Key Feature |
|---|---|---|---|
| GPT-5.2 / 5.2 Thinking | 400K | $1.75 / $14 | Latest flagship, 90% cached discount, configurable reasoning |
| GPT-5.1 | 400K | $1.25 / $10 | Previous generation flagship |
| GPT-4.1 / 4.1 mini / nano | 1M | $2 / $8 | Best non-reasoning model, 40% faster and 80% cheaper than GPT-4o |
| o3 / o3-pro | 200K | Varies | Reasoning models with native tool use |
| o4-mini | 200K | Cost-efficient | Fast reasoning, best on AIME at its cost class |
| GPT-OSS-120B / 20B | 128K | $0.03 / $0.30 | First open-weight models, Apache 2.0 |
Key features: Responses API, Agents SDK, Structured Outputs, function calling, prompt caching (90% discount), Batch API (50% discount), MCP support. Platform Docs
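
The cached-input discount compounds quickly for long prompts reused across calls. A back-of-the-envelope calculator using the GPT-5.2 list prices from the table above (prices change often, so verify against the provider's pricing page before relying on them):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0,
                 in_price: float = 1.75, out_price: float = 14.0,
                 cache_discount: float = 0.90) -> float:
    """Cost in dollars for one request at per-1M-token list prices."""
    uncached = input_tokens - cached_tokens
    cost = uncached * in_price / 1e6                            # fresh input
    cost += cached_tokens * in_price * (1 - cache_discount) / 1e6  # cached input
    cost += output_tokens * out_price / 1e6                     # output
    return cost

# A 50K-token system prompt reused across calls: caching cuts the
# input portion of the bill by 90%.
cold = request_cost(50_000, 1_000)
warm = request_cost(50_000, 1_000, cached_tokens=50_000)
print(f"cold: ${cold:.5f}  warm: ${warm:.5f}")
```

The same arithmetic applies to any provider in this section; only the per-million rates and discount factors differ.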

### Anthropic

| Model | Context | Price (Input/Output per 1M tokens) | Key Feature |
|---|---|---|---|
| Claude Opus 4.6 | 1M (beta) | $5 / $25 | Most powerful, state-of-the-art coding and agentic tasks |
| Claude Sonnet 4.5 | 200K | $3 / $15 | Best coding model, 61.4% OSWorld (computer use) |
| Claude Haiku 4.5 | 200K | Fast tier | Near-frontier, fastest model class |
| Claude Opus 4 / Sonnet 4 | 200K | $15/$75 (Opus) | Opus: 72.5% SWE-bench, Sonnet 4 powers GitHub Copilot |
Key features: Extended Thinking with tool use, Computer Use, MCP (originated here), prompt caching, Claude Code CLI, available on AWS Bedrock and Google Vertex AI. API Docs

### Google

| Model | Context | Price (Input/Output per 1M tokens) | Key Feature |
|---|---|---|---|
| Gemini 3 Pro Preview | 1M | $2 / $12 | Most intelligent Google model, deployed to 2B+ Search users |
| Gemini 2.5 Pro | 1M | $1.25 / $10 | Best for coding/agentic tasks, thinking model |
| Gemini 2.5 Flash / Flash-Lite | 1M | $0.30/$1.50 · $0.10/$0.40 | Price-performance leaders |
Key features: Thinking (all 2.5+ models), Google Search grounding, code execution, Live API (real-time audio/video), context caching. Google AI Studio

### Meta (Llama)

| Model | Architecture | Context | Key Feature |
|---|---|---|---|
| Llama 4 Scout | 109B MoE / 17B active | 10M | Fits single H100, multimodal, open-weight |
| Llama 4 Maverick | 400B MoE / 17B active, 128 experts | 1M | Beats GPT-4o, open-weight |
| Llama 3.3 70B | Dense | 128K | Matches Llama 3.1 405B |
Available on 25+ cloud partners, Hugging Face, and inference APIs. Llama

### Other Providers & Inference Platforms

| Provider | Description | Link |
|---|---|---|
| Mistral AI | Mistral Large 3 (675B MoE), Devstral 2, Ministral 3. Apache 2.0. | Website |
| DeepSeek | V3.2 (671B MoE), R1 (reasoning, MIT license). $0.15/$0.75 per 1M tokens. | Website |
| xAI (Grok) | Grok 4.1 Fast: 2M context, $0.20/$0.50 per 1M tokens. | Website |
| Cohere | Command A (111B, 256K context), Embed v4, Rerank 4.0. Excels at RAG. | Website |
| Together AI | 200+ open models with sub-100ms latency. | Website |
| Groq | LPU hardware with ~300+ tokens/sec inference. | Website |
| Fireworks AI | Fast inference with HIPAA + SOC2 compliance. | Website |
| OpenRouter | Unified API for 300+ models from all providers. | Website |
| Cerebras | Wafer-scale chips with best total response time. | Website |
| Perplexity AI | Search-augmented API with citations. | Website |
| Amazon Bedrock | Managed multi-model service with Claude, Llama, Mistral, Cohere. | Website |
| Hugging Face Inference | Access to open models via API. | Website |

## 💾 Benchmarks & Datasets

### Capability Benchmarks

| Name | Description | Link |
|---|---|---|
| Chatbot Arena / LM Arena | 6M+ user votes for Elo-rated pairwise LLM comparisons. De facto standard for human preference. | Website |
| MMLU-Pro | 12,000+ graduate-level questions across 14 domains. NeurIPS 2024 Spotlight. | GitHub |
| GPQA | 448 "Google-proof" STEM questions; non-expert validators achieve only 34%. | arXiv |
| SWE-bench Verified | Human-validated 500-task subset for real-world GitHub issue resolution. | Website |
| SWE-bench Pro | 1,865 tasks across 41 professional repos; best models score only ~23%. | Leaderboard |
| Humanity's Last Exam (HLE) | 2,500 expert-vetted questions; top AI scores only ~10–30%. | Website |
| BigCodeBench | 1,140 coding tasks across 7 domains; AI achieves ~35.5% vs. 97% human success. | Leaderboard |
| LiveBench | Contamination-resistant with frequently updated questions. | Paper |
| FrontierMath | Research-level math; AI solves only ~2% of problems. | Research |
| ARC-AGI v2 | Abstract reasoning measuring fluid intelligence. | Research |
| IFEval | Instruction-following evaluation with formatting/content constraints. | arXiv |
| MLE-bench | OpenAI's ML engineering evaluation via Kaggle-style tasks. | GitHub |
| PaperBench | Evaluates AI's ability to replicate 20 ICML 2024 papers from scratch. | GitHub |
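
Arena-style leaderboards such as Chatbot Arena rank models from pairwise human votes with an Elo-style update. A minimal stdlib sketch of the update rule (the K-factor and 400-point scale are the classic chess defaults, used here for illustration; they are not LMArena's exact parameters):

```python
def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0):
    """Return updated (r_a, r_b) after one pairwise comparison."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * ((1.0 if a_wins else 0.0) - expected_a)
    return r_a + delta, r_b - delta  # zero-sum update

# Equal ratings: a win moves each side by k/2 = 16 points.
print(elo_update(1500, 1500, a_wins=True))  # (1516.0, 1484.0)
```

An upset (the lower-rated model winning) moves ratings further than an expected result, which is why millions of votes are needed before the rankings stabilize.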

### Leaderboards

| Name | Description | Link |
|---|---|---|
| Hugging Face Open LLM Leaderboard v2 | Evaluates open models on MMLU-Pro, GPQA, IFEval, MATH. | Leaderboard |
| Artificial Analysis Intelligence Index v3 | Aggregates 10 evaluations. | Website |
| SEAL by Scale AI | Hosts SWE-bench Pro and agentic evaluations. | Leaderboard |

### Prompt & Instruction Datasets

| Name | Description | Link |
|---|---|---|
| P3 (Public Pool of Prompts) | Prompt templates for 270+ NLP tasks used to train T0 and similar models. | HuggingFace |
| System Prompts Dataset | 944 system prompt templates for agent workflows (by Daniel Rosehill, Aug 2025). | HuggingFace |
| OpenAssistant Conversations (OASST) | 161,443 messages in 35 languages with 461,292 quality ratings. | HuggingFace |
| UltraChat / UltraFeedback | Large-scale synthetic instruction and preference datasets for alignment training. | HuggingFace |
| SoftAge Prompt Engineering Dataset | 1,000 diverse prompts across 10 categories for benchmarking prompt performance. | HuggingFace |
| Text Transformation Prompt Library | Comprehensive collection of text transformation prompts (May 2025). | HuggingFace |
| Writing Prompts | ~300K human-written stories paired with prompts from r/WritingPrompts. | Kaggle |
| Midjourney Prompts | Text prompts and image URLs scraped from MidJourney's public Discord. | HuggingFace |
| CodeAlpaca-20k | 20,000 programming instruction-output pairs. | HuggingFace |
| ProPEX-RAG | Dataset for prompt optimization in RAG workflows. | HuggingFace |
| NanoBanana Trending Prompts | 1,000+ curated AI image prompts from X/Twitter, ranked by engagement. | GitHub |

### Safety & Red-Teaming Datasets

| Name | Description | Link |
|---|---|---|
| HarmBench | 510 harmful behaviors across standard, contextual, copyright, and multimodal categories. | Website |
| JailbreakBench | Open robustness benchmark for jailbreaking with 100 prompts. | Research |
| AgentHarm | 110 malicious agent tasks across 11 harm categories. | arXiv |
| DecodingTrust | 243,877 prompts evaluating trustworthiness across 8 perspectives. | Research |
| SafetyPrompts.com | Aggregator tracking 50+ safety/red-teaming datasets. | Website |

## 🧠 State-of-the-Art Models

### Flagship Models

| Model | Provider | Context | Key Strength |
|---|---|---|---|
| GPT-5.2 | OpenAI | 400K | General intelligence, 100% AIME 2025 |
| Claude Opus 4.6 | Anthropic | 1M (beta) | Coding, agentic tasks, extended thinking |
| Gemini 3 Pro | Google | 1M | #1 LMArena (~1500 Elo), multimodal |
| Grok 4.1 | xAI | 2M | #2 LMArena (1483 Elo), low hallucination |
| Mistral Large 3 | Mistral AI | 256K | Best open-weight (675B MoE/41B active), Apache 2.0 |
| DeepSeek-V3.2 | DeepSeek | 128K | Best value (671B MoE/37B active), MIT license |
| Llama 4 Maverick | Meta | 1M | Beats GPT-4o (400B MoE/17B active), open-weight |

### Reasoning Models

| Model | Key Detail |
|---|---|
| OpenAI o3 / o3-pro | 87.7% GPQA Diamond. Native tool use. |
| OpenAI o4-mini | Best AIME at its cost class with visual reasoning. |
| DeepSeek-R1 / R1-0528 | Open-weight, RL-trained. 87.5% on AIME 2025. MIT license. |
| QwQ (Qwen with Questions) | 32B reasoning model. Apache 2.0. Comparable to R1. |
| Gemini 2.5 Pro/Flash (Thinking) | Built-in reasoning with configurable thinking budget. |
| Claude Extended Thinking | Hybrid mode with visible chain-of-thought and tool use. |
| Phi-4 Reasoning / Plus | 14B reasoning models rivaling much larger models. Open-weight. |
| GPT-OSS-120B | OpenAI's open-weight with CoT. Near-parity with o4-mini. Apache 2.0. |

### Open Models

| Model | Provider | Key Detail |
|---|---|---|
| Qwen3-235B-A22B | Alibaba | Flagship MoE. Strong reasoning/code/multilingual. Apache 2.0. Most downloaded family on HuggingFace. |
| Gemma 3 | Google | 270M to 27B. Multimodal. 128K context. 140+ languages. |
| OLMo 2/3 | Allen AI | Fully open (data, code, weights, logs). OLMo 2 32B surpasses GPT-3.5. Apache 2.0. |
| SmolLM3-3B | Hugging Face | Outperforms Llama-3.2-3B. Dual-mode reasoning. 128K context. |
| Kimi K2 | Moonshot AI | 32B active. Open-weight. Tailored for coding/agentic use. |
| Llama 4 Scout | Meta | 109B MoE/17B active. 10M token context. Fits single H100. |

### Coding Models

| Model | Key Detail |
|---|---|
| Qwen3-Coder (480B-A35B) | 69.6% SWE-bench, a milestone for open-source coding. 256K context. Apache 2.0. |
| Devstral 2 (123B) | 72.2% SWE-bench Verified. 7x more cost-efficient than Claude Sonnet. |
| Codestral 25.01 | Mistral's code model. 80+ languages. Fill-in-the-Middle support. |
| DeepSeek-Coder-V2 | 236B MoE / 21B active. 338 programming languages. |
| Qwen 2.5-Coder | 7B/32B. 92 programming languages. 88.4% HumanEval. Apache 2.0. |

### Historical Models

These models established key concepts but are largely superseded for practical use:
| Model | Provider | Significance |
|---|---|---|
| BLOOM 176B | BigScience | First major open multilingual LLM (2022) |
| GLM-130B | Tsinghua | Open bilingual English/Chinese LLM (2023) |
| Falcon 180B | TII | Large open generative model (2023) |
| Mixtral 8x7B | Mistral AI | Pioneered MoE architecture for open models (2023) |
| GPT-NeoX-20B | EleutherAI | Early open autoregressive LLM |
| GPT-J-6B | EleutherAI | Early open causal language model |

## 🔍 AI Content Detection

### Commercial Detectors

| Name | Accuracy | Key Feature | Link |
|---|---|---|---|
| GPTZero | 99% claimed | 10M+ users, #1 on G2 (2025). Detects GPT-4/5, Gemini, Claude, Llama. Free tier available. | Website |
| Originality.ai | 98–100% (peer-reviewed) | Consistently rated most accurate. Combines AI detection + plagiarism + fact checking. From $14.95/month. | Website |
| Turnitin AI Detection | 98%+ on unmodified AI text | Dominant in academia. Launched AI bypasser/humanizer detection (Aug 2025). Institutional licensing. | Website |
| Copyleaks | 99%+ claimed | Enterprise tool detecting AI in 30+ languages. LMS integrations. | Website |
| Winston AI | 99.98% claimed | OCR for scanned documents, AI image/deepfake detection. 11 languages. | Website |
| Pangram Labs | 99.3% (COLING 2025) | Highest score in COLING 2025 Shared Task. 100% TPR on "humanized" text. 97.7% adversarial robustness. | Website |

### Free & Research Detectors

| Name | Description | Link |
|---|---|---|
| Binoculars | Open-source research detector using cross-perplexity between two LLMs. | arXiv |
| DetectGPT / Fast-DetectGPT | Statistical method comparing log-probabilities of original text vs. perturbations. | arXiv |
| OpenAI Detector | Python wrapper around OpenAI's AI text classifier for flagging AI-written text. | GitHub |
| Sapling AI Detector | Free browser-based detector (up to 2,000 chars). 97% accuracy in some studies. | Website |
| QuillBot AI Detector | Free, no sign-up required. | Website |
| Writer AI Content Detector | Free tool with color-coded results. | Website |
| ZeroGPT | Popular free detector evaluated in multiple academic studies. | Website |

### Watermarking

| Name | Description | Link |
|---|---|---|
| SynthID (Google DeepMind) | Watermarking for AI text, images, and audio via statistical token sampling. Deployed in Google products. | Website |
| OpenAI Text Watermarking | Developed but still experimental as of 2025. Research shows fragility concerns. | Experimental |
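
Statistical text watermarking generally works by biasing generation toward a keyed "green" subset of the vocabulary and then counting green hits at detection time (the scheme popularized by Kirchenbauer et al.; SynthID's exact mechanism differs and operates on model logits). A toy stdlib sketch of just the detection statistic, with made-up helper names:

```python
import hashlib

def is_green(prev_token: str, token: str, key: str = "secret") -> bool:
    """Keyed hash of the previous token decides which half of the
    vocabulary counts as 'green' for the next position."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(tokens: list) -> float:
    """Fraction of bigrams landing in the green set; watermarked text
    drifts well above the ~0.5 expected for unwatermarked text."""
    hits = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)
```

A real detector turns this count into a z-score against the null hypothesis of unbiased sampling; paraphrasing attacks work precisely by scrambling the token sequence this statistic depends on.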
Important caveat: No detector claims 100% accuracy. Mixed human/AI text remains hardest to detect (50–70% accuracy). Adversarial robustness varies widely. The AI detection market is projected to grow from ~$2.3B (2025) to $15B by 2035.

## 📚 Books

### Prompt Engineering

| Title | Author(s) | Publisher | Year |
|---|---|---|---|
| Prompt Engineering for LLMs | John Berryman & Albert Ziegler | O'Reilly | 2024 |
| Prompt Engineering for Generative AI | James Phoenix & Mike Taylor | O'Reilly | 2024 |
| Prompt Engineering for LLMs | Thomas R. Caldwell | Independent | 2025 |

### LLM Engineering

| Title | Author(s) | Publisher | Year |
|---|---|---|---|
| AI Engineering: Building Applications with Foundation Models | Chip Huyen | O'Reilly | 2025 |
| Build a Large Language Model (From Scratch) | Sebastian Raschka | Manning | 2024 |
| Building LLMs for Production | Louis-François Bouchard & Louie Peters | O'Reilly | 2024 |
| LLM Engineer's Handbook | Paul Iusztin & Maxime Labonne | Packt | 2024 |
| The Hundred-Page Language Models Book | Andriy Burkov | Self-Published | 2025 |

### AI Agents

| Title | Author(s) | Publisher | Year |
|---|---|---|---|
| Building Applications with AI Agents | Michael Albada | O'Reilly | 2025 |
| AI Agents and Applications | Roberto Infante | Manning | 2025 |
| AI Agents in Action | Micheal Lanham | Manning | 2025 |

### Production & Security

| Title | Author(s) | Publisher | Year |
|---|---|---|---|
| LLMs in Production | Christopher Brousseau & Matthew Sharp | Manning | 2025 |
| Building Reliable AI Systems | Rush Shahani | Manning | 2025 |
| The Developer's Playbook for LLM Security | Steve Wilson | O'Reilly | 2024 |

## 🤝 Contributing
We welcome contributions to this list! Before contributing, please take a moment to review our contribution guidelines; they help ensure your contribution aligns with our objectives and meets our standards for quality and relevance.
What we're looking for:
Quality standards:
Thank you for your interest in contributing to this project!
Maintained by PromptsLab · Star this repo if you find it useful!