Trendshift - Ask AI

base on Let us democratise high-resolution generation! (CVPR 2024) # DemoFusion [![Project Page](https://img.shields.io/badge/Project-Page-green.svg)](https://ruoyidu.github.io/demofusion/demofusion.html) [![arXiv](https://img.shields.io/badge/arXiv-2311.16973-b31b1b.svg)](https://arxiv.org/pdf/2311.16973.pdf) [![Replicate](https://img.shields.io/badge/Demo-%F0%9F%9A%80%20Replicate-blue)](https://replicate.com/lucataco/demofusion) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/camenduru/DemoFusion-colab/blob/main/DemoFusion_colab.ipynb) [![Hugging Face](https://img.shields.io/badge/i2i-%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/radames/Enhance-This-DemoFusion-SDXL) [![Page Views Count](https://badges.toozhao.com/badges/01HFMAPCVTA1T32KN2PASNYGYK/blue.svg)](https://badges.toozhao.com/stats/01HFMAPCVTA1T32KN2PASNYGYK "Get your own page views count badge on badges.toozhao.com") Code release for "DemoFusion: Democratising High-Resolution Image Generation With No 💰" <img src="figures/illustration.jpg" width="800"/> **Abstract**: High-resolution image generation with Generative Artificial Intelligence (GenAI) has immense potential but, due to the enormous capital investment required for training, it is increasingly centralised to a few large corporations, and hidden behind paywalls. This paper aims to democratise high-resolution GenAI by advancing the frontier of high-resolution generation while remaining accessible to a broad audience. We demonstrate that existing Latent Diffusion Models (LDMs) possess untapped potential for higher-resolution image generation. Our novel DemoFusion framework seamlessly extends open-source GenAI models, employing Progressive Upscaling, Skip Residual, and Dilated Sampling mechanisms to achieve higher-resolution image generation. The progressive nature of DemoFusion requires more passes, but the intermediate results can serve as "previews", facilitating rapid prompt iteration. # News - **2024.02.27**: 🔥 DemoFusion has been accepted to CVPR'24! - **2023.12.15**: 🚀 A [ComfyUI Demofusion Custom Node](https://github.com/deroberon/demofusion-comfyui) is available! Thank [Andre](https://github.com/deroberon) for the implementation! - **2023.12.12**: ✨ DemoFusion with ControNet is availabe now! Check it out at `pipeline_demofusion_sdxl_controlnet`! The local [Gradio Demo](https://github.com/PRIS-CV/DemoFusion#DemoFusionControlNet-with-local-Gradio-demo) is also available. - **2023.12.10**: ✨ Image2Image is supported by `pipeline_demofusion_sdxl` now! The local [Gradio Demo](https://github.com/PRIS-CV/DemoFusion#Image2Image-with-local-Gradio-demo) is also available. - **2023.12.08**: 🚀 A HuggingFace Demo for Img2Img is now available! [![Hugging Face](https://img.shields.io/badge/i2i-%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/radames/Enhance-This-DemoFusion-SDXL) Thank [Radamés](https://github.com/radames) for the implementation and [![Hugging Face](https://img.shields.io/badge/Hugging%20Face-Diffusers-orange.svg)](https://huggingface.co/docs/diffusers/index) for the support! - **2023.12.07**: 🚀 Add Colab demo [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/camenduru/DemoFusion-colab/blob/main/DemoFusion_colab.ipynb). Check it out! Thank [camenduru](https://github.com/camenduru) for the implementation! - **2023.12.06**: ✨ The local [Gradio Demo](https://github.com/PRIS-CV/DemoFusion#Text2Image-with-local-Gradio-demo) is now available! Better interaction and presentation! - **2023.12.04**: ✨ A [low-vram version](https://github.com/PRIS-CV/DemoFusion#Text2Image-on-Windows-with-8-GB-of-VRAM) of DemoFusion is available! Thank [klimaleksus](https://github.com/klimaleksus) for the implementation! - **2023.12.01**: 🚀 Integrated to [Replicate](https://replicate.com/explore). Check out the online demo: [![Replicate](https://img.shields.io/badge/Demo-%F0%9F%9A%80%20Replicate-blue)](https://replicate.com/lucataco/demofusion) Thank [Luis C.](https://github.com/lucataco) for the implementation! - **2023.11.29**: 💰 `pipeline_demofusion_sdxl` is released. # Usage ## A quick try with integrated demos - HuggingFace Space: Try Text2Image generation at [![Hugging Face](https://img.shields.io/badge/t2i-%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/fffiloni/DemoFusion) and Image2Image enhancement at [![Hugging Face](https://img.shields.io/badge/i2i-%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/radames/Enhance-This-DemoFusion-SDXL). - Colab: Try Text2Image generation at [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/camenduru/DemoFusion-colab/blob/main/DemoFusion_colab.ipynb) and Image2Image enhancement at [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/camenduru/DemoFusion-colab/blob/main/DemoFusion_img2img_colab.ipynb). - Replicate: Try Text2Image generation at [![Replicate](https://img.shields.io/badge/Demo-%F0%9F%9A%80%20Replicate-blue)](https://replicate.com/lucataco/demofusion) and Image2Image enhancement at [![Replicate](https://img.shields.io/badge/Demo-%F0%9F%9A%80%20Replicate-blue)](https://replicate.com/lucataco/demofusion-enhance). ## Starting with our code ### Hyper-parameters - `view_batch_size` (`int`, defaults to 16): The batch size for multiple denoising paths. Typically, a larger batch size can result in higher efficiency but comes with increased GPU memory requirements. - `stride` (`int`, defaults to 64): The stride of moving local patches. A smaller stride is better for alleviating seam issues, but it also introduces additional computational overhead and inference time. - `cosine_scale_1` (`float`, defaults to 3): Control the decreasing rate of skip-residual. A smaller value results in better consistency with low-resolution results, but it may lead to more pronounced upsampling noise. Please refer to Appendix C in the DemoFusion paper. - `cosine_scale_2` (`float`, defaults to 1): Control the decreasing rate of dilated sampling. A smaller value can better address the repetition issue, but it may lead to grainy images. For specific impacts, please refer to Appendix C in the DemoFusion paper. - `cosine_scale_3` (`float`, defaults to 1): Control the decrease rate of the Gaussian filter. A smaller value results in less grainy images, but it may lead to over-smoothing images. Please refer to Appendix C in the DemoFusion paper. - `sigma` (`float`, defaults to 1): The standard value of the Gaussian filter. A larger sigma promotes the global guidance of dilated sampling, but it has the potential of over-smoothing. - `multi_decoder` (`bool`, defaults to True): Determine whether to use a tiled decoder. Generally, a tiled decoder becomes necessary when the resolution exceeds 3072*3072 on an RTX 3090 GPU. - `show_image` (`bool`, defaults to False): Determine whether to show intermediate results during generation. ### Text2Image (will take about 17 GB of VRAM) - Set up the dependencies as: ``` conda create -n demofusion python=3.9 conda activate demofusion pip install -r requirements.txt ``` - Download `pipeline_demofusion_sdxl.py` and run it as follows. A use case can be found in `demo.ipynb`. ``` from pipeline_demofusion_sdxl import DemoFusionSDXLPipeline import torch model_ckpt = "stabilityai/stable-diffusion-xl-base-1.0" pipe = DemoFusionSDXLPipeline.from_pretrained(model_ckpt, torch_dtype=torch.float16) pipe = pipe.to("cuda") prompt = "Envision a portrait of an elderly woman, her face a canvas of time, framed by a headscarf with muted tones of rust and cream. Her eyes, blue like faded denim. Her attire, simple yet dignified." negative_prompt = "blurry, ugly, duplicate, poorly drawn, deformed, mosaic" images = pipe(prompt, negative_prompt=negative_prompt, height=3072, width=3072, view_batch_size=16, stride=64, num_inference_steps=50, guidance_scale=7.5, cosine_scale_1=3, cosine_scale_2=1, cosine_scale_3=1, sigma=0.8, multi_decoder=True, show_image=True ) for i, image in enumerate(images): image.save('image_' + str(i) + '.png') ``` - ⚠️ When you have enough VRAM (e.g., generating 2048*2048 images on hardware with more than 18GB RAM), you can set `multi_decoder=False`, which can make the decoding process faster. - Please feel free to try different prompts and resolutions. - Default hyper-parameters are recommended, but they may not be optimal for all cases. For specific impacts of each hyper-parameter, please refer to Appendix C in the DemoFusion paper. - The code was cleaned before the release. If you encounter any issues, please contact us. ### Text2Image on Windows with 8 GB of VRAM - Set up the environment as: ``` cmd git clone "https://github.com/PRIS-CV/DemoFusion" cd DemoFusion python -m venv venv venv\Scripts\activate pip install -U "xformers==0.0.22.post7+cu118" --index-url https://download.pytorch.org/whl/cu118 pip install "diffusers==0.21.4" "matplotlib==3.8.2" "transformers==4.35.2" "accelerate==0.25.0" ``` - Launch DemoFusion as follows. The use case can be found in `demo_lowvram.py`. ``` python from pipeline_demofusion_sdxl import DemoFusionSDXLPipeline import torch from diffusers.models import AutoencoderKL vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16) model_ckpt = "stabilityai/stable-diffusion-xl-base-1.0" pipe = DemoFusionSDXLPipeline.from_pretrained(model_ckpt, torch_dtype=torch.float16, vae=vae) pipe = pipe.to("cuda") prompt = "Envision a portrait of an elderly woman, her face a canvas of time, framed by a headscarf with muted tones of rust and cream. Her eyes, blue like faded denim. Her attire, simple yet dignified." negative_prompt = "blurry, ugly, duplicate, poorly drawn, deformed, mosaic" images = pipe(prompt, negative_prompt=negative_prompt, height=2048, width=2048, view_batch_size=4, stride=64, num_inference_steps=40, guidance_scale=7.5, cosine_scale_1=3, cosine_scale_2=1, cosine_scale_3=1, sigma=0.8, multi_decoder=True, show_image=False, lowvram=True ) for i, image in enumerate(images): image.save('image_' + str(i) + '.png') ``` ### Text2Image with local Gradio demo - Make sure you have installed `gradio` and `gradio_imageslider`. - Launch DemoFusion via Gradio demo now -- try `python gradio_demo.py`! Better Interaction and Presentation！ <img src="figures/gradio_demo.png" width="600"/> ### Image2Image with local Gradio demo - Make sure you have installed `gradio` and `gradio_imageslider`. - Launch DemoFusion Image2Image by `python gradio_demo_img2img.py`. <img src="figures/gradio_demo_img2img.png" width="600"/> - ⚠️ Please note that, as a tuning-free framework, DemoFusion's Image2Image capability is strongly correlated with the SDXL's training data distribution and will show a significant bias. An accurate prompt to describe the content and style of the input also significantly improves performance. Have fun and regard it as a side application of text+image based generation. ### DemoFusion+ControlNet with local Gradio demo - Make sure you have installed `gradio` and `gradio_imageslider`. - Launch DemoFusion+ControNet Text2Image by `python gradio_demo.py`. - <img src="figures/gradio_demo_controlnet.png" width="600"/> - Launch DemoFusion+ControNet Image2Image by `python gradio_demo_img2img.py`. - <img src="figures/gradio_demo_controlnet_img2img.png" width="600"/> ## Citation If you find this paper useful in your research, please consider citing: ``` @inproceedings{du2024demofusion, title={DemoFusion: Democratising High-Resolution Image Generation With No \$\$\$}, author={Du, Ruoyi and Chang, Dongliang and Hospedales, Timothy and Song, Yi-Zhe and Ma, Zhanyu}, booktitle={CVPR}, year={2024} } ``` ", Assign "at most 3 tags" to the expected json: {"id":"5707","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"

AI prompts

AI prompts