Published Feb 1, 2026

[🏠Home](README.md)

# Image Generation

## Models

text2image:

- [karlo](https://github.com/kakaobrain/karlo) text2image model
- [DeepFloyd IF by Stability AI](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0) open-source text-to-image model with photorealism and language understanding, [code](https://github.com/deep-floyd/IF)
- [Kandinsky](https://github.com/ai-forever/Kandinsky-2) multilingual text2image latent diffusion model
- [stable diffusion 1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5)
- [stable diffusion 2.0](https://huggingface.co/stabilityai/stable-diffusion-2)
- [stable diffusion 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1)
- stable diffusion xl (SDXL) [base 0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) & [refiner 0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9)
- [AnimateDiff](https://github.com/guoyww/AnimateDiff) Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
- [PixArt-alpha](https://github.com/PixArt-alpha/PixArt-alpha) Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis, [paper](https://arxiv.org/abs/2310.00426)
- [Latent Consistency Models](https://github.com/luosiallen/latent-consistency-model) LoRAs for high quality few step image generation
- [OnnxStream](https://github.com/vitoplantamura/OnnxStream) runs Stable Diffusion XL 1.0 Base in 298MB of RAM
- [StreamDiffusion](https://github.com/cumulo-autumn/streamdiffusion) A Pipeline-Level Solution for Real-Time Interactive Generation
- [AnyText](https://github.com/tyxsspa/AnyText) Code and Model for a diffusion pipeline covering a latent module and text embedding to generate and manipulate text in images
- [InstantID](https://github.com/InstantID/InstantID) Zero-shot Identity-Preserving Generation in Seconds, [ComfyUI plugin](https://github.com/ZHO-ZHO-ZHO/ComfyUI-InstantID)
- [PhotoMaker](https://github.com/TencentARC/PhotoMaker) Rapid customization within seconds, with no additional LoRA training preserving ID with high fidelity and text controllability which can serve as an adapter for other models
- [StableCascade](https://github.com/Stability-AI/StableCascade) successor to Stable Diffusion by Stability AI with smaller latent space, higher speeds and better quality
- [IDM-VTON](https://github.com/yisol/IDM-VTON) Virtual Try-on for clothes and fashion
- [ConsistentID](https://github.com/JackAILab/ConsistentID) Portrait Generation with Multimodal Fine-Grained Identity Preservation
- [Flux](https://huggingface.co/black-forest-labs) Black Forest Labs, founded by former Stability AI staff, built the SOTA text-to-image models Flux dev and Flux schnell, 12B-parameter transformers capable of rendering text and following complex prompts; Flux schnell is released under the Apache 2.0 license
- [Lumina-mGPT](https://github.com/Alpha-VLLM/Lumina-mGPT) multimodal autoregressive LLMs capable of generating flexible and photorealistic images from text descriptions 

text to 3d:

- [OpenAI shap-E](https://github.com/openai/shap-e) a text/image to 3D model
- [shap-e local](https://github.com/kedzkiest/shap-e-local) run text-to-3d locally
- [stable-dreamfusion](https://github.com/ashawkey/stable-dreamfusion) A PyTorch implementation of the text-to-3D model Dreamfusion using the Stable Diffusion text-to-2D model

image to 3d:

- [Wonder3D](https://github.com/xxlong0/Wonder3D) A cross-domain diffusion model for 3D reconstruction from a single image
- [DreamCraft3D](https://github.com/deepseek-ai/DreamCraft3D) Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
- [Spann3R](https://github.com/HengyiWang/spann3r) transformer-based model for dense 3D reconstruction from images, using spatial memory to track and predict 3D structures, capable of real-time processing

image to text (OCR):

- [pix2tex](https://github.com/lukas-blecher/LaTeX-OCR) LaTeX OCR

other:

- [facebookresearch/segment-anything](https://github.com/facebookresearch/segment-anything) image segmentation
  - [YOLOv8](https://github.com/ultralytics/ultralytics) SOTA object detection, segmentation, classification and tracking
  - [DINOv2](https://github.com/facebookresearch/dinov2) 1B-parameter ViT model to generate robust all-purpose visual features that outperform OpenCLIP benchmarks at image and pixel levels
  - [segment-anything-fast](https://github.com/pytorch-labs/segment-anything-fast) A batched offline inference oriented version of segment-anything
- [Final2x](https://github.com/Tohrusky/Final2x) Image super-resolution through interpolation supporting multiple models like RealCUGAN, ESRGAN, Waifu2x, SRMD
- [text-to-room](https://lukashoel.github.io/text-to-room/) text to room
- [DragGAN](https://github.com/XingangPan/DragGAN) Interactive Point-based Manipulation on Generative Images, [demo](https://vcai.mpi-inf.mpg.de/projects/DragGAN/)
- [DragDiffusion](https://github.com/Yujun-Shi/DragDiffusion) Harnessing Diffusion Models for Interactive Point-based Image Editing
- [HQTrack](https://github.com/jiawen-zhu/hqtrack) Tracking Anything in High Quality (HQTrack) is a framework for high performance video object tracking and segmentation
- [CoTracker](https://github.com/facebookresearch/co-tracker) It is Better to Track Together. A fast transformer-based model that can track any point in a video
- [ZeroNVS](https://arxiv.org/pdf/2310.17994.pdf) Zero-shot 360-degree view synthesis from a single image
- [x-stable-diffusion](https://github.com/stochasticai/x-stable-diffusion) Real-time inference for Stable Diffusion - 0.88s latency
- [Depth-Anything](https://github.com/LiheYoung/Depth-Anything) Better depth estimation including a ControlNet for ComfyUI and ONNX and TensorRT versions
- [SUPIR](https://github.com/Fanghua-Yu/SUPIR) Super Resolution and Image Restoration
- [RMBG](https://huggingface.co/briaai/RMBG-2.0) BRIA Background Removal model [hf demo space](https://huggingface.co/spaces/briaai/BRIA-RMBG-2.0)

## Wrappers & GUIs

- [ComfyUI](https://github.com/comfyanonymous/ComfyUI) powerful and modular stable diffusion pipelines using a graph/nodes/flowchart based interface, runs SDXL 0.9, SD2.1, SD2.0, SD1.5
  - [ComfyUI-Manager](https://github.com/ltdrdata/ComfyUI-Manager) installs missing custom nodes automatically
  - [SeargeSDXL](https://github.com/SeargeDP/SeargeSDXL) Custom SDXL Node for easier SDXL usage and img2img workflow that utilizes base & refiner
  - [Sytan ComfyUI SDXL workflow](https://github.com/SytanSD/Sytan-SDXL-ComfyUI/tree/main) with txt2img using base and refiner
- [Automatic1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) well known UI for Stable Diffusion
  - [sd-webui-cloud-inference](https://github.com/omniinfer/sd-webui-cloud-inference) extension via omniinfer.io
  - [stable-diffusion-webui-forge](https://github.com/lllyasviel/stable-diffusion-webui-forge) platform on top of SDWebUI to make development easier, optimize resource management, and speed up inference
- [SD.Next](https://github.com/vladmandic/automatic) vladmandic/automatic, a fork of automatic1111's web UI that seems to be under more active development than the original repo
- [Fooocus](https://github.com/lllyasviel/Fooocus) Midjourney-like GUI for SDXL that lets you focus on prompting and generating
  - [RuinedFooocus](https://github.com/runew0lf/RuinedFooocus) A Fooocus fork
  - [Fooocus-MRE](https://github.com/MoonRide303/Fooocus-MRE) A Fooocus fork
- [stable-diffusion-xl-demo](https://github.com/FurkanGozukara/stable-diffusion-xl-demo) runs SDXL 0.9 in a basic interface
- [imaginAIry](https://github.com/brycedrennan/imaginAIry/blob/master/README.md) a Stable Diffusion UI
- [InvokeAI](https://github.com/invoke-ai/InvokeAI) Alternative, polished Stable Diffusion UI with fewer features than automatic1111
- [mlc-ai/web-stable-diffusion](https://github.com/mlc-ai/web-stable-diffusion)
- [anapnoe/stable-diffusion-webui-ux](https://github.com/anapnoe/stable-diffusion-webui-ux) Redesigned from automatic1111's UI, adding mobile and desktop layouts and UX improvements
- [refacer](https://github.com/xaviviro/refacer) One-Click Deepfake Multi-Face Swap Tool
- [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) CPU inference of Stable Diffusion in pure C/C++ with huge performance gains, supporting ggml, 16/32 bit float, 4/5/8 bit quantization, AVX/AVX2/AVX512, SD1.x, SD2.x, txt2img/img2img
- [FaceFusion](https://github.com/facefusion/facefusion) Next generation face swapper and enhancer
- [OneFlow](https://github.com/Oneflow-Inc/diffusers) Backend for diffusers and ComfyUI
- [StabilityMatrix](https://github.com/LykosAI/StabilityMatrix) portable package manager and UI for GUIs like Forge, SD.Next and ComfyUI, supporting multiple packages, with built-in Git and Python dependencies and features like syntax highlighting, workspace management, and model browsing
- [OneDiff](https://github.com/siliconflow/onediff) PyTorch-based acceleration library for diffusion models, offering out-of-the-box speedups, GPU optimization, and broad model and NVIDIA GPU support

## Fine Tuning

- [JoePenna/Dreambooth-Stable-Diffusion](https://github.com/JoePenna/Dreambooth-Stable-Diffusion)
- [fast-stable-diffusion](https://github.com/TheLastBen/fast-stable-diffusion) TheLastBen's Repo for SD, SDXL fine-tuning and DreamBooth on RunPod, Paperspace, Colab and others
- [ShivamShrirao/diffusers DreamBooth example](https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth)
- [cloneofsimo/lora](https://github.com/cloneofsimo/lora)
- [OneTrainer](https://github.com/Nerogar/OneTrainer) all in one training for SD, SDXL and inpainting models supporting fine-tuning, LoRA, embeddings
- [sd-scripts](https://github.com/kohya-ss/sd-scripts) by kohya-ss
  - [LoRA Easy Training Scripts](https://github.com/derrian-distro/LoRA_Easy_Training_Scripts) GUI for Kohya's Scripts
  - [Kohya_ss](https://github.com/bmaltais/kohya_ss) Windows-focused Gradio GUI for Kohya's Stable Diffusion trainers, [experimental](https://github.com/bmaltais/kohya_ss/tree/sdxl) SDXL support, [reddit thread](https://www.reddit.com/r/StableDiffusion/comments/14xhpxm/dreambooth_sdxl_09/)
- [Fine tuning concepts explained visually](https://github.com/cloneofsimo/lora/discussions/67)
- [text2image-gui](https://github.com/n00mkrad/text2image-gui) a Stable Diffusion GUI by NMKD
- [sd-webui-EasyPhoto](https://github.com/aigc-apps/sd-webui-EasyPhoto) / [easyphoto](https://github.com/aigc-apps/easyphoto) plugin for generating AI portraits that can be used to train digital doppelgangers with 5-10 photos and a quick LoRA fine tune, [paper](https://arxiv.org/abs/2310.04672v1)
- [StableTuner](https://github.com/devilismyfriend/StableTuner) Windows GUI for Finetuning / Dreambooth Stable Diffusion models (abandoned)
- [SimpleTuner](https://github.com/bghira/SimpleTuner) fine-tuning for Stable Diffusion, PixArt and Flux with LoRA and full U-Net training, multi-GPU support, DeepSpeed
- [x-flux](https://github.com/XLabs-AI/x-flux) LoRA and ControlNet training scripts for Flux model by Black Forest Labs using DeepSpeed
- [ai-toolkit](https://github.com/ostris/ai-toolkit) Flux LoRA training locally and on RunPod

## Research

- [Speed Is All You Need](https://arxiv.org/abs/2304.11267) up to 50% speed increase for Latent Diffusion Models
- [ORCa](https://arxiv.org/abs/2212.04531) converts glossy objects into radiance-field cameras, enabling depth estimation and novel-view synthesis, [project](https://ktiwary2.github.io/objectsascam/), [code](https://github.com/ktiwary2/orca)
- [cocktail](https://mhh0318.github.io/cocktail/) Mixing Multi-Modality Controls for Text-Conditional Image Generation, [project](https://mhh0318.github.io/cocktail/), [code](https://github.com/mhh0318/Cocktail)
- [SnapFusion](https://snap-research.github.io/SnapFusion/) Fast text-to-image diffusion on mobile phones in 2 seconds
- [Objaverse-xl](https://objaverse.allenai.org/objaverse-xl-paper.pdf) dataset of 10 million annotated high quality 3D objects, [hf](https://huggingface.co/datasets/allenai/objaverse)
- [LightGlue](https://github.com/cvg/LightGlue) Local Feature Matching at Light Speed, a lightweight feature matcher with high accuracy and blazing fast inference. It takes as input a set of keypoints and descriptors for each image and returns the indices of corresponding points
- [ml-mgie](https://github.com/apple/ml-mgie) Guiding Instruction-based Image Editing via Multimodal Large Language Models
- [VAR](https://github.com/FoundationVision/VAR) Visual Autoregressive Modeling: scalable image generation via next-scale prediction ("GPT beats diffusion")
- [InstantStyle](https://github.com/InstantStyle/InstantStyle) towards Style-Preserving in Text-to-Image Generation