# SimpleTuner πŸ’Ή

A general fine-tuning kit geared toward image/video/audio diffusion models.

> ℹ️ No data is sent to any third parties except through the opt-in flags `report_to`, `push_to_hub`, or webhooks, which must be manually configured.

**SimpleTuner** is geared towards simplicity, with a focus on making the code easily understood. This codebase serves as a shared academic exercise, and contributions are welcome.

If you'd like to join our community, we can be found [on Discord](https://discord.gg/JGkSwEbjRb) via Terminus Research Group. If you have any questions, please feel free to reach out to us there.

<img width="1944" height="1657" alt="image" src="https://github.com/user-attachments/assets/af3a24ec-7347-4ddf-8edf-99818a246de1" />

## Table of Contents

- [Design Philosophy](#design-philosophy)
- [Tutorial](#tutorial)
- [Features](#features)
  - [Core Training Features](#core-training-features)
  - [Model Architecture Support](#model-architecture-support)
  - [Advanced Training Techniques](#advanced-training-techniques)
  - [Model-Specific Features](#model-specific-features)
  - [Quickstart Guides](#quickstart-guides)
- [Hardware Requirements](#hardware-requirements)
- [Toolkit](#toolkit)
- [Setup](#setup)
- [Troubleshooting](#troubleshooting)

## Design Philosophy

- **Simplicity**: Aiming to have good default settings for most use cases, so less tinkering is required.
- **Versatility**: Designed to handle a wide range of image quantities - from small datasets to extensive collections.
- **Cutting-Edge Features**: Only incorporates features that have proven efficacy, avoiding the addition of untested options.

## Tutorial

Please fully explore this README before embarking on the [new web UI tutorial](/documentation/webui/TUTORIAL.md) or [the classic command-line tutorial](/documentation/TUTORIAL.md), as this document contains vital information that you might need to know first.

For a manually configured quick start without reading the full documentation or using any web interfaces, you can use the [Quick Start](/documentation/QUICKSTART.md) guide.

For memory-constrained systems, see the [DeepSpeed document](/documentation/DEEPSPEED.md), which explains how to use πŸ€—Accelerate to configure Microsoft's DeepSpeed for optimiser state offload.

For DTensor-based sharding and context parallelism, read the [FSDP2 guide](/documentation/FSDP2.md), which covers the new FullyShardedDataParallel v2 workflow inside SimpleTuner.

For multi-node distributed training, [this guide](/documentation/DISTRIBUTED.md) will help you adapt the configurations from the INSTALL and Quickstart guides for multi-node training and for optimising image datasets numbering in the billions of samples.
---

## Features

SimpleTuner provides comprehensive training support across multiple diffusion model architectures with consistent feature availability:

### Core Training Features

- **User-friendly web UI** - Manage your entire training lifecycle through a sleek dashboard
- **Multi-modal training** - Unified pipeline for **Image, Video, and Audio** generative models
- **Multi-GPU training** - Distributed training across multiple GPUs with automatic optimization
- **Advanced caching** - Image, video, audio, and caption embeddings cached to disk for faster training
- **Aspect bucketing** - Support for varied image/video sizes and aspect ratios
- **Concept sliders** - Slider-friendly targeting for LoRA/LyCORIS/full (via LyCORIS `full`) with positive/negative/neutral sampling and per-prompt strength; see the [Slider LoRA guide](/documentation/SLIDER_LORA.md)
- **Memory optimization** - Most models trainable on 24G GPUs, many on 16G with optimizations
- **DeepSpeed & FSDP2 integration** - Train large models on smaller GPUs with optimizer/gradient/parameter sharding, context-parallel attention, gradient checkpointing, and optimizer state offload
- **S3 training** - Train directly from cloud storage (Cloudflare R2, Wasabi S3)
- **EMA support** - Exponential moving average weights for improved stability and quality
- **Custom experiment trackers** - Drop an `accelerate.GeneralTracker` into `simpletuner/custom-trackers` and use `--report_to=custom-tracker --custom_tracker=<name>`; a minimal sketch follows this list
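The custom tracker hook only needs a subclass of Accelerate's `GeneralTracker`. The example below is a minimal sketch built on the tracker interface documented by πŸ€—Accelerate; the class, its constructor arguments, and the JSONL output are illustrative assumptions rather than SimpleTuner code, so check the [options documentation](/documentation/OPTIONS.md) for the exact keyword arguments SimpleTuner passes when it instantiates your tracker.

```python
# Hypothetical custom tracker that appends every logged metrics dict as one
# JSON line. Only the accelerate GeneralTracker interface is assumed; the
# file name, constructor arguments, and output format are illustrative.
import json
import os
from typing import Optional

from accelerate.tracking import GeneralTracker, on_main_process


class JSONLinesTracker(GeneralTracker):
    name = "jsonl"                       # value to pass via --custom_tracker
    requires_logging_directory = False   # this sketch manages its own output path

    @on_main_process
    def __init__(self, run_name: str = "simpletuner-run", logging_dir: str = "."):
        super().__init__()
        self.log_path = os.path.join(logging_dir, f"{run_name}.jsonl")

    @property
    def tracker(self):
        # Expose the underlying handle; here it is simply the output path.
        return self.log_path

    @on_main_process
    def store_init_configuration(self, values: dict):
        # Record the hyperparameters/config once at the start of the run.
        with open(self.log_path, "a") as handle:
            handle.write(json.dumps({"config": values}) + "\n")

    @on_main_process
    def log(self, values: dict, step: Optional[int] = None):
        # Append each metrics dict with its global step.
        with open(self.log_path, "a") as handle:
            handle.write(json.dumps({"step": step, **values}) + "\n")
```

Saved under a hypothetical `simpletuner/custom-trackers/jsonl_tracker.py`, it would then be selected with `--report_to=custom-tracker --custom_tracker=jsonl`.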
### Model Architecture Support

| Model | Parameters | PEFT LoRA | Lycoris | Full-Rank | ControlNet | Quantization | Flow Matching | Text Encoders |
|-------|------------|-----------|---------|-----------|------------|--------------|---------------|---------------|
| **Stable Diffusion XL** | 3.5B | βœ“ | βœ“ | βœ“ | βœ“ | int8/nf4 | βœ— | CLIP-L/G |
| **Stable Diffusion 3** | 2B-8B | βœ“ | βœ“ | βœ“* | βœ“ | int8/fp8/nf4 | βœ“ | CLIP-L/G + T5-XXL |
| **Flux.1** | 12B | βœ“ | βœ“ | βœ“* | βœ“ | int8/fp8/nf4 | βœ“ | CLIP-L + T5-XXL |
| **Flux.2** | 32B | βœ“ | βœ“ | βœ“* | βœ— | int8/fp8/nf4 | βœ“ | Mistral-3 Small |
| **ACE-Step** | 3.5B | βœ“ | βœ“ | βœ“* | βœ— | int8 | βœ“ | UMT5 |
| **Chroma 1** | 8.9B | βœ“ | βœ“ | βœ“* | βœ— | int8/fp8/nf4 | βœ“ | T5-XXL |
| **Auraflow** | 6.8B | βœ“ | βœ“ | βœ“* | βœ“ | int8/fp8/nf4 | βœ“ | UMT5-XXL |
| **PixArt Sigma** | 0.6B-0.9B | βœ— | βœ“ | βœ“ | βœ“ | int8 | βœ— | T5-XXL |
| **Sana** | 0.6B-4.8B | βœ— | βœ“ | βœ“ | βœ— | int8 | βœ“ | Gemma2-2B |
| **Lumina2** | 2B | βœ“ | βœ“ | βœ“ | βœ— | int8 | βœ“ | Gemma2 |
| **Kwai Kolors** | 5B | βœ“ | βœ“ | βœ“ | βœ— | βœ— | βœ— | ChatGLM-6B |
| **LTX Video** | 5B | βœ“ | βœ“ | βœ“ | βœ— | int8/fp8 | βœ“ | T5-XXL |
| **Wan Video** | 1.3B-14B | βœ“ | βœ“ | βœ“* | βœ— | int8 | βœ“ | UMT5 |
| **HiDream** | 17B (8.5B MoE) | βœ“ | βœ“ | βœ“* | βœ“ | int8/fp8/nf4 | βœ“ | CLIP-L + T5-XXL + Llama |
| **Cosmos2** | 2B-14B | βœ— | βœ“ | βœ“ | βœ— | int8 | βœ“ | T5-XXL |
| **OmniGen** | 3.8B | βœ“ | βœ“ | βœ“ | βœ— | int8/fp8 | βœ“ | T5-XXL |
| **Qwen Image** | 20B | βœ“ | βœ“ | βœ“* | βœ— | int8/nf4 (req.) | βœ“ | T5-XXL |
| **SD 1.x/2.x (Legacy)** | 0.9B | βœ“ | βœ“ | βœ“ | βœ“ | int8/nf4 | βœ— | CLIP-L |

*βœ“ = Supported, βœ— = Not supported, \* = Requires DeepSpeed for full-rank training*

### Advanced Training Techniques

- **TREAD** - Token-wise dropout for transformer models, including Kontext training
- **Masked loss training** - Superior convergence with segmentation/depth guidance
- **Prior regularization** - Enhanced training stability for character consistency
- **Gradient checkpointing** - Configurable intervals for memory/speed optimization
- **Loss functions** - L2, Huber, Smooth L1 with scheduling support
- **SNR weighting** - Min-SNR gamma weighting for improved training dynamics; a sketch of the weighting follows this list
- **Group offloading** - Diffusers v0.33+ module-group CPU/disk staging with optional CUDA streams
- **Validation adapter sweeps** - Temporarily attach LoRA adapters (single or JSON presets) during validation to measure adapter-only or comparison renders without touching the training loop
- **External validation hooks** - Swap the built-in validation pipeline or post-upload steps for your own scripts, so you can run checks on another GPU or forward artifacts to any cloud provider of your choice ([details](/documentation/OPTIONS.md#validation_method))
- **CREPA regularization** - Cross-frame representation alignment for video DiTs ([guide](/documentation/experimental/VIDEO_CREPA.md))
- **LoRA I/O formats** - Load/save PEFT LoRAs in standard Diffusers layout or ComfyUI-style `diffusion_model.*` keys (Flux/Flux2/Lumina2/Z-Image auto-detect ComfyUI inputs)
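Min-SNR gamma weighting caps each timestep's signal-to-noise ratio at a constant Ξ³ before turning it into a per-sample loss weight, which keeps low-noise (high-SNR) timesteps from dominating the objective. The snippet below is a generic sketch of the technique for an epsilon-prediction objective with a DDPM-style `alphas_cumprod` schedule; it is illustrative only and not SimpleTuner's internal implementation (the feature is enabled through the SNR options described in [OPTIONS.md](/documentation/OPTIONS.md)).

```python
# Generic sketch of min-SNR gamma loss weighting (Hang et al., 2023) for an
# epsilon-prediction diffusion objective; illustrative, not SimpleTuner code.
import torch


def min_snr_weights(
    timesteps: torch.Tensor,       # (batch,) sampled integer timesteps
    alphas_cumprod: torch.Tensor,  # (num_train_timesteps,) cumulative alphas
    snr_gamma: float = 5.0,
) -> torch.Tensor:
    """Per-sample weights min(SNR, gamma) / SNR."""
    alpha_bar = alphas_cumprod[timesteps]
    snr = alpha_bar / (1.0 - alpha_bar)           # signal-to-noise ratio per sample
    return torch.clamp(snr, max=snr_gamma) / snr


# Typical use in a training step (image latents, eps-prediction):
#   per_pixel = torch.nn.functional.mse_loss(model_pred, noise, reduction="none")
#   per_sample = per_pixel.mean(dim=tuple(range(1, per_pixel.ndim)))
#   loss = (min_snr_weights(timesteps, scheduler.alphas_cumprod) * per_sample).mean()
```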
### Model-Specific Features

- **Flux Kontext** - Edit conditioning and image-to-image training for Flux models
- **PixArt two-stage** - eDiff training pipeline support for PixArt Sigma
- **Flow matching models** - Advanced scheduling with beta/uniform distributions
- **HiDream MoE** - Mixture of Experts gate loss augmentation
- **T5 masked training** - Enhanced fine details for Flux and compatible models
- **QKV fusion** - Memory and speed optimizations (Flux, Lumina2)
- **TREAD integration** - Selective token routing for most models
- **Wan 2.x I2V** - High/low stage presets plus a 2.1 time-embedding fallback (see the Wan quickstart)
- **Classifier-free guidance** - Optional CFG reintroduction for distilled models

### Quickstart Guides

Detailed quickstart guides are available for all supported models:

- **[TwinFlow Few-Step (RCGM) Guide](/documentation/distillation/TWINFLOW.md)** - Enable RCGM auxiliary loss for few-step/one-step generation (flow models or diffusion via diff2flow)
- **[Flux.1 Guide](/documentation/quickstart/FLUX.md)** - Includes Kontext editing support and QKV fusion
- **[Flux.2 Guide](/documentation/quickstart/FLUX2.md)** - **NEW!** Latest enormous Flux model with Mistral-3 text encoder
- **[Z-Image Guide](/documentation/quickstart/ZIMAGE.md)** - Base/Turbo LoRA with assistant adapter + TREAD acceleration
- **[ACE-Step Guide](/documentation/quickstart/ACE_STEP.md)** - **NEW!** Audio generation model training (text-to-music)
- **[Chroma Guide](/documentation/quickstart/CHROMA.md)** - Lodestone's flow-matching transformer with Chroma-specific schedules
- **[Stable Diffusion 3 Guide](/documentation/quickstart/SD3.md)** - Full and LoRA training with ControlNet
- **[Stable Diffusion XL Guide](/documentation/quickstart/SDXL.md)** - Complete SDXL training pipeline
- **[Auraflow Guide](/documentation/quickstart/AURAFLOW.md)** - Flow-matching model training
- **[PixArt Sigma Guide](/documentation/quickstart/SIGMA.md)** - DiT model with two-stage support
- **[Sana Guide](/documentation/quickstart/SANA.md)** - Lightweight flow-matching model
- **[Lumina2 Guide](/documentation/quickstart/LUMINA2.md)** - 2B parameter flow-matching model
- **[Kwai Kolors Guide](/documentation/quickstart/KOLORS.md)** - SDXL-based with ChatGLM encoder
- **[LongCat-Video Guide](/documentation/quickstart/LONGCAT_VIDEO.md)** - Flow-matching text-to-video and image-to-video with Qwen-2.5-VL
- **[LongCat-Video Edit Guide](/documentation/quickstart/LONGCAT_VIDEO_EDIT.md)** - Conditioning-first flavour (image-to-video)
- **[LongCat-Image Guide](/documentation/quickstart/LONGCAT_IMAGE.md)** - 6B bilingual flow-matching model with Qwen-2.5-VL encoder
- **[LongCat-Image Edit Guide](/documentation/quickstart/LONGCAT_EDIT.md)** - Image editing flavour requiring reference latents
- **[LTX Video Guide](/documentation/quickstart/LTXVIDEO.md)** - Video diffusion training
- **[Hunyuan Video 1.5 Guide](/documentation/quickstart/HUNYUANVIDEO.md)** - 8.3B flow-matching T2V/I2V with SR stages
- **[Wan Video Guide](/documentation/quickstart/WAN.md)** - Video flow-matching with TREAD support
- **[HiDream Guide](/documentation/quickstart/HIDREAM.md)** - MoE model with advanced features
- **[Cosmos2 Guide](/documentation/quickstart/COSMOS2IMAGE.md)** - Multi-modal image generation
- **[OmniGen Guide](/documentation/quickstart/OMNIGEN.md)** - Unified image generation model
- **[Qwen Image Guide](/documentation/quickstart/QWEN_IMAGE.md)** - 20B parameter large-scale training
- **[Stable Cascade Stage C Guide](/quickstart/STABLE_CASCADE_C.md)** - Prior LoRAs with combined prior+decoder validation
- **[Kandinsky 5.0 Image Guide](/documentation/quickstart/KANDINSKY5_IMAGE.md)** - Image generation with Qwen2.5-VL + Flux VAE
- **[Kandinsky 5.0 Video Guide](/documentation/quickstart/KANDINSKY5_VIDEO.md)** - Video generation with HunyuanVideo VAE

---

## Hardware Requirements

### General Requirements

- **NVIDIA**: RTX 3080+ recommended (tested up to H200)
- **AMD**: 7900 XTX 24GB and MI300X verified (higher memory usage vs NVIDIA)
- **Apple**: M3 Max+ with 24GB+ unified memory for LoRA training

### Memory Guidelines by Model Size

- **Large models (12B+)**: A100-80G for full-rank, 24G+ for LoRA/Lycoris
- **Medium models (2B-8B)**: 16G+ for LoRA, 40G+ for full-rank training
- **Small models (<2B)**: 12G+ sufficient for most training types

**Note**: Quantization (int8/fp8/nf4) significantly reduces memory requirements. See the individual [quickstart guides](#quickstart-guides) for model-specific requirements.

## Setup

SimpleTuner can be installed via pip for most users:

```bash
# Base installation (CPU-only PyTorch)
pip install simpletuner

# CUDA users (NVIDIA GPUs)
pip install simpletuner[cuda]

# ROCm users (AMD GPUs)
pip install simpletuner[rocm]

# Apple Silicon users (M1/M2/M3/M4 Macs)
pip install simpletuner[apple]
```

For manual installation or development setup, see the [installation documentation](/documentation/INSTALL.md).

## Troubleshooting

Enable debug logs for more detailed insight by adding `export SIMPLETUNER_LOG_LEVEL=DEBUG` to your environment file (`config/config.env`).

For performance analysis of the training loop, setting `SIMPLETUNER_TRAINING_LOOP_LOG_LEVEL=DEBUG` will add timestamps that highlight any issues in your configuration.

For a comprehensive list of options available, consult [this documentation](/documentation/OPTIONS.md).
", Assign "at most 3 tags" to the expected json: {"id":"11388","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"