Official PyTorch implementation of "Visual Style Prompting with Swapping Self-Attention"

## 🎨 Visual Style Prompting with Swapping Self-Attention
### : Training-free Text-to-Stylized Image Generation
### ArXiv | 📖 [Paper](https://arxiv.org/abs/2402.12974) | ✨ [Project page](https://curryjung.github.io/VisualStylePrompt)
> #### Authors    [Jaeseok Jeong](https://drive.google.com/file/d/19I3s70cfQ45dC_JiD2kmkv0MZ8yu4kBZ/view)<sup>1,2*</sup>, [Junho Kim](https://github.com/taki0112)<sup>1*</sup>, [Yunjey Choi](https://www.linkedin.com/in/yunjey-choi-27b347175/?originalSubdomain=kr)<sup>1</sup>, [Gayoung Lee](https://www.linkedin.com/in/gayoung-lee-0824548a/?originalSubdomain=kr)<sup>1</sup>, [Youngjung Uh](https://vilab.yonsei.ac.kr/member)<sup>2†</sup> <br> <sub>          <sup>1</sup>NAVER AI Lab, <sup>2</sup>Yonsei University</sub> <br> <sub>          <sup>*</sup>Equal Contribution, <sup>†</sup>Corresponding author</sub>
![teaser](./assets/git_image/teaser.png)
> #### 🔆 Abstract
> *In the evolving domain of text-to-image generation, diffusion models have emerged as powerful tools in content creation. Despite their remarkable capability, existing models still face challenges in achieving controlled generation with a consistent style, requiring costly fine-tuning or often inadequately transferring the visual elements due to content leakage. ***To address these challenges, we propose a novel approach, visual style prompting, to produce a diverse range of images while maintaining specific style elements and nuances. During the denoising process, we keep the query from original features while swapping the key and value with those from reference features in the late self-attention layers.*** This approach allows for the visual style prompting without any fine-tuning, ensuring that generated images maintain a faithful style. Through extensive evaluation across various styles and text prompts, our method demonstrates superiority over existing approaches, best reflecting the style of the references and ensuring that resulting images match the text prompts most accurately.*
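
The swap described in the abstract can be illustrated with a toy, single-head NumPy sketch. This is not the repository's PyTorch code: the function names, the shared projection weights, and the four-layer schedule below are assumptions made purely for illustration. The query is computed from the features of the image being generated, while the key and value are computed from the reference features in the later layers only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention for arrays of shape (tokens, dim)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def swapped_self_attention(x_gen, x_ref, w_q, w_k, w_v, swap=True):
    """Self-attention for the generated branch.

    The query always comes from the generated features. With swap=True,
    the key and value come from the reference features instead.
    """
    q = x_gen @ w_q
    kv_source = x_ref if swap else x_gen
    k = kv_source @ w_k
    v = kv_source @ w_v
    return attention(q, k, v)


rng = np.random.default_rng(0)
dim = 8
x_gen = rng.normal(size=(16, dim))  # features of the image being generated
x_ref = rng.normal(size=(16, dim))  # features of the style reference
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

# Hypothetical schedule: swap only in the later layers of a 4-layer toy stack.
# A real model has separate weights and features per layer.
num_layers, first_swapped_layer = 4, 2
for layer in range(num_layers):
    out = swapped_self_attention(
        x_gen, x_ref, w_q, w_k, w_v, swap=layer >= first_swapped_layer
    )
    print(layer, layer >= first_swapped_layer, out.shape)
```
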
---
### 🔥 To do
* [x] color calibration to use a real image as reference
* [x] user image in demo
* [x] GPU upgrade in demo (thanks to HF)
---
### 🤗 HuggingFace Demo
* 👉 [Default](https://huggingface.co/spaces/naver-ai/VisualStylePrompting)
* 👉 [w/ ControlNet](https://huggingface.co/spaces/naver-ai/VisualStylePrompting_Controlnet)
---
### ✨ Requirements
```
> pytorch 1.13.1
> pip install --upgrade diffusers accelerate transformers einops kornia gradio triton xformers==0.0.16
```
### ✨ Usage
#### w/ Predefined styles in config file
```
> python vsp_script.py --style fire
```
![vsp_img](./assets/git_image/vsp.png)
#### 👉 w/ ControlNet
```
> python vsp_control-edge_script.py --style fire --controlnet_scale 0.5 --canny_img_path assets/edge_dir
> python vsp_control-depth_script.py --style fire --controlnet_scale 0.5 --depth_img_path assets/depth_dir
```
![control_img](./assets/git_image/vsp_control.png)
#### 👉 w/ User image
```
> python vsp_real_script.py --img_path assets/real_dir --tar_obj cat --output_num 5 --color_cal_start_t 150 --color_cal_window_size 50
```
* For better results, you can add a more detailed style description to the inference image's prompt only, by editing the code directly.
  * `vsp_real_script.py -> def create_prompt`
* Save your images in the `style_name.png` format.
  * e.g., `The starry night.png`
![real_img](./assets/git_image/vsp_real.png)
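
The naming convention above can be checked before running the script. The helper below is a hypothetical sketch, not part of the repository: it assumes the file stem is what gets read as the style description, so `The starry night.png` would map to the style "The starry night". It lists each PNG in a reference directory together with the style name it would carry under that assumption.

```python
from pathlib import Path


def list_style_references(img_dir):
    """Return (style_name, path) pairs for every PNG in img_dir.

    Assumption: the file stem is used as the style description,
    e.g. "The starry night.png" -> "The starry night".
    """
    pairs = []
    for path in sorted(Path(img_dir).glob("*.png")):
        pairs.append((path.stem, path))
    return pairs
```

For example, calling `list_style_references("assets/real_dir")` before `vsp_real_script.py` makes it easy to spot files whose names would not read well as a style description.
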
---
### ✨ Misc
#### 👉 How to visualize the attention map?
1. Save the attention map.
```
> python visualize_attention_src/save_attn_map_script.py
```
2. Visualize the attention map.
```
> python visualize_attention_src/visualize_attn_map_script.py
```
<div align="center">
<img src="./assets/git_image/attention_map.png" width="394" height="469">
</div>
---
### 📚 Citation
```bibtex
@article{jeong2024visual,
title={Visual Style Prompting with Swapping Self-Attention},
author={Jeong, Jaeseok and Kim, Junho and Choi, Yunjey and Lee, Gayoung and Uh, Youngjung},
journal={arXiv preprint arXiv:2402.12974},
year={2024}
}
```
---
### ✨ License
```
Visual Style Prompting with Swapping Self-Attention
Copyright (c) 2024-present NAVER Cloud Corp.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```