# <img src="images/tangram.png" alt="title" width="4%"> Multi-LoRA Composition for Image Generation
<p align="center">
<a href="https://maszhongming.github.io/Multi-LoRA-Composition/"><img src="https://img.shields.io/badge/š-Website-red" height="25"></a>
<a href="https://arxiv.org/abs/2402.16843"><img src="https://img.shields.io/badge/š-Paper-blue" height="25"></a>
<a href="https://drive.google.com/file/d/1SuwRgV1LtEud8dfjftnw-zxBMgzSCwIT/view?usp=sharing" ><img src="https://img.shields.io/badge/šØ-ComposLoRA-green" height="25"></a>
<a href="https://colab.research.google.com/drive/1eSTj6qGOtSY5NaazwwN3meXOzEZxgaZq?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="25"></a>
</p>
✍️ **Authors:** [Ming Zhong](https://maszhongming.github.io/), [Yelong Shen](https://scholar.google.com/citations?user=S6OFEFEAAAAJ&hl=en), [Shuohang Wang](https://www.microsoft.com/en-us/research/people/shuowa/), [Yadong Lu](https://adamlu123.github.io/), [Yizhu Jiao](https://yzjiao.github.io/), [Siru Ouyang](https://ozyyshr.github.io/), [Donghan Yu](https://plusross.github.io/), [Jiawei Han](https://hanj.cs.illinois.edu/), [Weizhu Chen](https://www.microsoft.com/en-us/research/people/wzchen/)
## 🔍 Overview
Low-Rank Adaptation (LoRA) is widely used in text-to-image models to render specific elements, such as distinct characters or unique styles, in generated images.
This project introduces two training-free methods, **LoRA Switch** and **LoRA Composite**, for integrating any number of elements into an image through multi-LoRA composition.
The figure below illustrates the differences between the traditional LoRA Merge approach and our newly proposed techniques:
<p align="center">
<img src="images/intro_fig.png" width="100%" alt="intro_case">
</p>
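As a rough intuition for how the two methods differ: LoRA Switch keeps exactly one LoRA active at each denoising step and rotates through them, while LoRA Composite keeps every LoRA involved at each step and aggregates their guidance. The sketch below illustrates the switch schedule only; it is a toy illustration, not the repository's implementation:

```python
def lora_switch_schedule(loras, num_steps, switch_step=2):
    """Illustrative only: which LoRA is active at each denoising step under LoRA Switch.

    One LoRA is active at a time; every `switch_step` steps the next LoRA in the
    list takes over, cycling until denoising finishes.
    """
    return [loras[(step // switch_step) % len(loras)] for step in range(num_steps)]

print(lora_switch_schedule(["character", "clothing"], num_steps=8))
# ['character', 'character', 'clothing', 'clothing', 'character', 'character', 'clothing', 'clothing']
```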
## 🚀 Getting Started
### Setting Up the Environment
To begin, set up your environment with the necessary packages:
```bash
conda create --name multi-lora python=3.10
conda activate multi-lora
pip install -r requirements.txt
```
### Downloading Pre-trained LoRAs
Our **ComposLoRA** testbed collects 22 pre-trained LoRAs, spanning characters, clothing, styles, backgrounds, and objects. Download `ComposLoRA.zip` from [this link](https://drive.google.com/file/d/1SuwRgV1LtEud8dfjftnw-zxBMgzSCwIT/view?usp=sharing), put it in the [models](./models) folder, and unzip it.
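If you prefer the command line, one possible way to fetch and unpack the archive is via the third-party `gdown` package (the file ID below is taken from the link above):

```bash
pip install gdown
gdown 1SuwRgV1LtEud8dfjftnw-zxBMgzSCwIT -O models/ComposLoRA.zip
unzip models/ComposLoRA.zip -d models/
```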
## 🖼️ Image Generation with Multi-LoRA Composition
To compose multiple LoRAs using different methods during image generation, follow these steps:
First, load the base model:
```python
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    'SG161222/Realistic_Vision_V5.1_noVAE',
    custom_pipeline="MingZhong/StableDiffusionPipeline-with-LoRA-C",
    use_safetensors=True
).to("cuda")
```
This model from Hugging Face is selected for realistic-style image generation. Additionally, our custom pipeline integrates the LoRA composite method into the standard Stable Diffusion pipeline.
Next, choose a character LoRA and a clothing LoRA from ComposLoRA for composition:
```python
# Load LoRAs
lora_path = 'models/lora/reality'
pipeline.load_lora_weights(lora_path, weight_name="character_2.safetensors", adapter_name="character")
pipeline.load_lora_weights(lora_path, weight_name="clothing_2.safetensors", adapter_name="clothing")
# List of LoRAs to be composed
cur_loras = ["character", "clothing"]
```
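To sanity-check what was loaded, recent diffusers versions (with the PEFT backend) expose adapter introspection helpers:

```python
# Which adapters each pipeline component carries, and which are currently active
print(pipeline.get_list_adapters())    # e.g. {'unet': ['character', 'clothing'], ...}
print(pipeline.get_active_adapters())  # adapters that will be used on the next call
```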
Next, select a composition method: "switch" and "composite" are our new proposals, while "merge" is the traditional baseline:
```python
from callbacks import make_callback

method = 'switch'
switch_step = 2  # interval (in denoising steps) between LoRA switches; tune as needed

# Initialize based on the selected composition method
if method == "merge":
    pipeline.set_adapters(cur_loras)
    switch_callback = None
elif method == "switch":
    pipeline.set_adapters([cur_loras[0]])
    switch_callback = make_callback(switch_step=switch_step, loras=cur_loras)
else:
    pipeline.set_adapters(cur_loras)
    switch_callback = None
```
Finally, set your prompt and generate the image:
```python
# Set the prompts for image generation
prompt = "RAW photo, subject, 8k uhd, dslr, high quality, Fujifilm XT3, half-length portrait from knees up, scarlett, short red hair, blue eyes, school uniform, white shirt, red tie, blue pleated microskirt"
negative_prompt = "extra heads, nsfw, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck"
# Generate and save the image
generator = torch.manual_seed(11)
image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=768,
    num_inference_steps=100,
    guidance_scale=7,
    generator=generator,
    cross_attention_kwargs={"scale": 0.8},
    callback_on_step_end=switch_callback,
    lora_composite=(method == "composite")
).images[0]
image.save('example.png')
```
Refer to `example.py` for the full code, and adjust the following command to see results from different composition methods:
```bash
python example.py --method switch
```
Images generated by each of the three methods are showcased below:
| Merge | Switch | Composite |
|-------|--------|-----------|
| <p align="center"><img src="images/merge_example.png" alt="merge_example"></p> | <p align="center"><img src="images/switch_example.png" alt="switch_example"></p> | <p align="center"><img src="images/composite_example.png" alt="composite_example"></p> |
### Examples for LCM and LCM-LoRA
Our methods integrate seamlessly with both [LCM and LCM-LoRA](https://github.com/luosiallen/latent-consistency-model), significantly accelerating image generation by reducing the number of required denoising steps to just 2-8. Below are example commands and the generated images:
```bash
# LCM
python lcm_example.py --method switch
```
| Merge | Switch | Composite |
|-------|--------|-----------|
| <p align="center"><img src="images/merge_lcm_example.png" alt="merge_lcm_example"></p> | <p align="center"><img src="images/switch_lcm_example.png" alt="switch_lcm_example"></p> | <p align="center"><img src="images/composite_lcm_example.png" alt="composite_lcm_example"></p> |
```bash
# LCM-LoRA
python lcm_lora_example.py --method composite
```
| Merge | Switch | Composite |
|-------|--------|-----------|
| <p align="center"><img src="images/merge_lcm_lora_example.png" alt="merge_lcm_lora_example"></p> | <p align="center"><img src="images/switch_lcm_lora_example.png" alt="switch_lcm_lora_example"></p> | <p align="center"><img src="images/composite_lcm_lora_example.png" alt="composite_lcm_lora_example"></p> |
### Example for SDXL
In addition to SD1.5-based backbones, our method can also be applied to SDXL. The following example demonstrates how to combine [Pikachu](https://huggingface.co/TheLastBen/Pikachu_SDXL) and [Vision Pro](https://huggingface.co/fofr/sdxl-vision-pro) in an image:
```bash
python sdxl_example.py --method switch
```
| Merge | Switch | Composite |
|-------|--------|-----------|
| <p align="center"><img src="images/merge_sdxl_example.png" alt="merge_sdxl_example"></p> | <p align="center"><img src="images/switch_sdxl_example.png" alt="switch_sdxl_example"></p> | <p align="center"><img src="images/composite_sdxl_example.png" alt="composite_sdxl_example"></p> |
## 🎨 Experiments on ComposLoRA
**ComposLoRA** features 22 LoRAs and 480 different composition sets, allowing for the generation of images with any composition of 2-5 LoRAs, including at least one character LoRA.
### Image Generation
To generate anime-style images incorporating 2 LoRAs using the LoRA Composite method, run the following command:
```bash
export CUDA_VISIBLE_DEVICES=0
python compose_lora.py \
--method composite \
--compos_num 2 \
--save_path output \
--lora_scale 0.8 \
--image_style anime \
--denoise_steps 200 \
  --cfg_scale 10
```
Adjust the parameters in `compose_reality.sh` and `compose_anime.sh` for different compositions.
### Comparative Evaluation with GPT-4V
For comparative evaluation of composition efficacy and image quality, we use GPT-4V. Set your OpenAI API key first:
```bash
export OPENAI_API_KEY='your_openai_api_key_here'
```
Then, compare the composite and merge methods with this command:
```bash
python evaluate.py \
--base_method merge \
--comp_method composite \
--compos_num 2 \
--image_style anime \
--image_path output \
  --save_path eval_result
```
Modify `eval.sh` to run comparative evaluations under different conditions. Note that GPT-4V exhibits a position bias, as discussed in our paper, so it is essential to input the images in both orders and average the scores for a fair final assessment.
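One simple way to neutralize this bias is to query the judge twice with the image order swapped and average the per-image scores. An illustrative sketch (the `judge` function is a hypothetical stand-in for the GPT-4V call in `evaluate.py`):

```python
def debiased_scores(judge, image_a, image_b):
    """Average GPT-4V scores over both presentation orders to offset position bias.

    `judge(first, second)` is assumed to return a dict of scores keyed by position.
    """
    s_ab = judge(first=image_a, second=image_b)  # A shown first
    s_ba = judge(first=image_b, second=image_a)  # B shown first
    score_a = (s_ab["first"] + s_ba["second"]) / 2
    score_b = (s_ab["second"] + s_ba["first"]) / 2
    return score_a, score_b
```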
## 👥 Human Evaluation
We also conduct human evaluations on 120 generated images to assess composition and image quality from a human perspective. These evaluations offer additional insights into the performance of our Multi-LoRA Composition methods and metrics. For detailed information on the evaluation process and results, please visit the [human_eval](./human_eval) folder.
## 📝 Citation
If you find this work useful, please kindly cite our paper:
```bibtex
@article{zhong2024multi,
title={Multi-LoRA Composition for Image Generation},
author={Zhong, Ming and Shen, Yelong and Wang, Shuohang and Lu, Yadong and Jiao, Yizhu and Ouyang, Siru and Yu, Donghan and Han, Jiawei and Chen, Weizhu},
journal={arXiv preprint arXiv:2402.16843},
year={2024}
}
```
", Assign "at most 3 tags" to the expected json: {"id":"8205","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"