# [NeurIPS 2024] Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models

Official implementation of "Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models" (NeurIPS 2024).
<div align="center">
<img src="./doc/demo.jpg" alt="demo" style="zoom:150%;" />
<br>
<em>
Our approach can easily be combined with various diffusion-model-based tasks (such as text-to-image, personalized generation, and video generation) and various sampling strategies (such as 50-step DDIM and 20-step DPM-Solver) to achieve training-free acceleration.
</em>
</div>
<br>
<br>
## Stellar Features
+ Training-free acceleration, plug-and-play;
+ Supports popular text-to-image models such as Stable Diffusion, DeepFloyd-IF, and Civitai community models like [Realistic Vision V6.0](https://civitai.com/models/4201/realistic-vision-v60-b1) and [ReV Animated](https://civitai.com/models/7371), as well as ControlNet;
+ Compatible with various schedulers and timesteps, such as DDIM (50 steps), DPM-Solver++ (20 steps), and more.
## TODO List
- [x] Release code that combines our method with [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5) ;
- [x] Release code that combines our method with [DeepFloyd-IF](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0);
- [x] Release code that combines our method with [ControlNet](https://github.com/lllyasviel/ControlNet) (we have released code supporting the canny condition; other conditions can be supported by modifying the code in the same way);
- [x] Release code that combines our method with custom community models, such as [Realistic Vision V6.0](https://civitai.com/models/4201/realistic-vision-v60-b1), [ReV Animated](https://civitai.com/models/7371), etc. Please see the [demo code](https://github.com/hutaiHang/Faster-Diffusion/blob/main/custom_demo.py) and [example image](https://github.com/hutaiHang/Faster-Diffusion/blob/main/images/custom_demo.png).
- [ ] Release code that combines our method with [Text2Video-zero](https://github.com/Picsart-AI-Research/Text2Video-Zero) and [VideoDiffusion](https://modelscope.cn/models/damo/text-to-video-synthesis/summary);
## Introduction
> **Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models**
>
> [Senmao Li](https://github.com/sen-mao)\*, [Taihang Hu](https://github.com/hutaiHang)\*, [Fahad Khan](https://sites.google.com/view/fahadkhans/home), [Linxuan Li](https://github.com/Potato-lover), [Shiqi Yang](https://www.shiqiyang.xyz/), [Yaxing Wang](https://yaxingwang.netlify.app/author/yaxing-wang/), [Ming-Ming Cheng](https://mmcheng.net/), [Jian Yang](https://scholar.google.com.hk/citations?user=6CIDtZQAAAAJ&hl=en)
>
> [arXiv](https://arxiv.org/abs/2312.09608) | [Project Page](https://sen-mao.github.io/FasterDiffusion/)

**\*Denotes equal contribution.**
We propose FasterDiffusion, a training-free diffusion-model acceleration scheme that can be widely integrated with various generative tasks and sampling strategies. Quantitative metrics such as FID and CLIP Score, as well as user studies, all indicate that our approach is on par with the original model in terms of generated-image quality. Specifically, we observe that the internal features of the UNet encoder are highly similar at adjacent time steps of the diffusion process. Consequently, encoder features from previous time steps can be reused at certain time steps to reduce the computational load. We propose a feature-propagation scheme for accelerated generation; this propagation also makes certain time steps independent of one another, allowing us to further leverage GPU acceleration through a parallel strategy. Additionally, we introduce a prior noise injection method to improve the texture details of generated images.
Our method is not only suitable for standard text-to-image tasks (**~1.8x acceleration for [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5) and ~1.3x acceleration for [DeepFloyd-IF](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0)**), but can also be applied to diverse tasks such as text-to-video (**~1.5x acceleration for [VideoDiffusion](https://modelscope.cn/models/damo/text-to-video-synthesis/summary)**), personalized generation (**~1.8x acceleration for [DreamBooth](https://github.com/XavierXiao/Dreambooth-Stable-Diffusion) and [Custom Diffusion](https://github.com/adobe-research/custom-diffusion)**), and reference-guided generation (**~2.1x acceleration for [ControlNet](https://github.com/lllyasviel/ControlNet)**), among others.
<img src=".\doc\method.png" alt="method" />
<div align="center">
<em>Method Overview. For more details, please see our paper.
</em>
</div>
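To make the core idea concrete, the following is a minimal, illustrative Python sketch of encoder propagation. It is not the repository implementation; `encoder`, `decoder`, and `key_steps` are hypothetical placeholders. Encoder features are recomputed only at key time steps and reused in between, while the decoder still runs at every step.

```python
# Minimal illustrative sketch of encoder propagation (hypothetical names, not the repo code).
def sample_with_encoder_propagation(x, timesteps, key_steps, encoder, decoder):
    """Denoising loop that recomputes UNet encoder features only at key time steps."""
    cached_features = None
    for t in timesteps:
        if cached_features is None or t in key_steps:
            # Full encoder pass only at key time steps (or the very first step).
            cached_features = encoder(x, t)
        # The decoder (with skip connections) runs at every step,
        # reusing the cached encoder features at non-key steps.
        x = decoder(cached_features, x, t)
    return x
```

Because non-key steps no longer require a fresh encoder pass, they become independent of one another and can be evaluated in parallel on the GPU, which is what the parallel strategy exploits.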
## Quick Start
- Create environment:
```shell
conda create -n fastersd python=3.9
conda activate fastersd
pip install -r requirements.txt
```
- Execute:
```shell
# if using `stable diffusion`
python sd_demo.py
# if using `deepfloyd if`
python if_demo.py
# if using ControlNet (canny condition)
python controlnet_demo.py
```
`sd_demo.py` output:
```
Origin Pipeline: 2.524 seconds
Faster Diffusion: 1.476 seconds
```
`controlnet_demo.py` output:
```
Origin Pipeline: 3.264 seconds
Faster Diffusion: 1.526 seconds
```
The above results were obtained on an NVIDIA RTX 3090 GPU.
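For reference, timings like those above can be reproduced with a simple wall-clock measurement around the two pipelines. The sketch below is a hedged, minimal example (it assumes the `utils_sd` helpers introduced in the Usage section below; absolute numbers will vary with GPU, driver, and library versions):

```python
# Hedged timing sketch; absolute numbers depend on your GPU and environment.
import time
import torch
from diffusers import StableDiffusionPipeline
from utils_sd import register_parallel_pipeline, register_faster_forward, seed_everything

def timed(fn):
    """Time a CUDA workload with proper synchronization."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn()
    torch.cuda.synchronize()
    return out, time.perf_counter() - start

seed_everything(2023)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
prompt = "a cat wearing sunglasses"

_, t_origin = timed(lambda: pipe(prompt))          # original pipeline
register_parallel_pipeline(pipe, mod='50ls')       # enable parallel strategy
register_faster_forward(pipe.unet, mod='50ls')     # enable encoder propagation
_, t_faster = timed(lambda: pipe.call(prompt))     # accelerated pipeline

print(f"Origin Pipeline: {t_origin:.3f} seconds")
print(f"Faster Diffusion: {t_faster:.3f} seconds")
```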
+ Usage
Our method can be easily integrated with the [diffusers](https://huggingface.co/docs/diffusers/index) library. Below is an example of integration with [stable-diffusion v1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5).
<details>
<summary>For Stable Diffusion</summary>

```python
from diffusers import StableDiffusionPipeline
import torch
from utils_sd import register_normal_pipeline, register_faster_forward, register_parallel_pipeline, seed_everything # 1.import package
seed_everything(2023)
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
#------------------------------
# 2. enable parallel. If memory is limited, replace it with `register_normal_pipeline(pipe)`
register_parallel_pipeline(pipe, mod = '50ls')
# 3. encoder propagation
register_faster_forward(pipe.unet, mod = '50ls')
#------------------------------
prompt = "a cat wearing sunglasses"
image = pipe.call(prompt).images[0]
image.save("cat.png")
```
</details>
When the hyperparameter `mod` is set to `'50ls'`, the key time steps follow the schedule described in [our paper](https://arxiv.org/abs/2312.09608). When `mod` is set to a constant, such as `4`, the key time steps are placed uniformly at a 1:4 ratio. For Civitai community models, we recommend setting `mod` to `4`.
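As an illustration, a minimal sketch of the uniform setting for a community checkpoint might look like the following (the model path is a placeholder; see the [demo code](https://github.com/hutaiHang/Faster-Diffusion/blob/main/custom_demo.py) for the actual community-model example):

```python
# Sketch: uniform key-time schedule (1:4) as recommended for Civitai community models.
import torch
from diffusers import StableDiffusionPipeline
from utils_sd import register_parallel_pipeline, register_faster_forward, seed_everything

seed_everything(2023)
# Placeholder path; load your own community checkpoint (e.g. Realistic Vision, ReV Animated).
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/community-model", torch_dtype=torch.float16
).to("cuda")

register_parallel_pipeline(pipe, mod=4)    # key time steps placed uniformly at a 1:4 ratio
register_faster_forward(pipe.unet, mod=4)

image = pipe.call("a portrait photo, highly detailed").images[0]
image.save("portrait.png")
```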
## Qualitative results
### Text to Image
<div align="center">
<b>
~1.8x acceleration for Stable Diffusion, 50 DDIM steps
</b>
</div>
<img src=".\doc\sd-ddim50.png" alt="sd-ddim50" />
<div align="center">
<b>
~1.8x acceleration for Stable Diffusion, 20 DPM-Solver++ steps
</b>
</div>
<img src=".\doc\sd-dpm++20.png" alt="sd-dpm++20" />
<div align="center">
<b>
~1.3x acceleration for DeepFloyd-IF
</b>
</div>
<img src=".\doc\if-demo.png" alt="if-demo" />
### Text to Video
<div align="center">
<b>
~1.4x acceleration for Text2Video-Zero
</b>
</div>
<img src=".\doc\t2v-zero.png" alt="t2v-zero" />
<p align="center">
<img src="./doc/videofusion-origin-demo1.gif" alt="origin" style="width: 95%;" /><img src="./doc/videofusion-ours-demo1.gif" alt="ours" style="width: 95%;" />
<div align="center">
<b>~1.5x acceleration for VideoFusion, original video (left) and ours (right)</b>
</div>
</p>
### ControlNet
<div align="center">
<b>
~2.1x acceleration for ControlNet
</b>
</div>
<img src=".\doc\controlnet-demo.png" alt="controlnet-demo" style="zoom:50%;" />
### Personalized Generation
<div align="center">
<b>
~1.8x acceleration for DreamBooth and Custom Diffusion
</b>
</div>
<img src=".\doc\personalized-demo.png" alt="personalized-demo" style="zoom:50%;" />
### Other tasks based on Diffusion Model
<img src=".\doc\other-task.png" alt="other-task" style="zoom: 43%;" />
<div align="center">
<b>
Our method integrated with other tasks, such as Image Editing (<a href="https://github.com/google/prompt-to-prompt">P2P</a>) and <a href="https://github.com/ziqihuangg/ReVersion">ReVersion</a>
</b>
</div>
## Quantitative results
<p align="center">
<img src="./doc/rst1.png" alt="origin" style="width: 45%;margin-right: 20px;" /> <img src="./doc/rst2.png" alt="ours" style="width: 45%;" />
</p>
## Citation
```
@misc{li2023faster,
title={Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models},
author={Senmao Li and Taihang Hu and Fahad Shahbaz Khan and Linxuan Li and Shiqi Yang and Yaxing Wang and Ming-Ming Cheng and Jian Yang},
year={2023},
eprint={2312.09608},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
", Assign "at most 3 tags" to the expected json: {"id":"6146","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"