# CVPR 2024 | PIA: Personalized Image Animator
[**PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models**](https://arxiv.org/abs/2312.13964)
[Yiming Zhang*](https://github.com/ymzhang0319), [Zhening Xing*](https://github.com/LeoXing1996/), [Yanhong Zeng†](https://zengyh1900.github.io/), [Youqing Fang](https://github.com/FangYouqing), [Kai Chen†](https://chenkai.site/)
(*equal contribution, †corresponding author)
[![arXiv](https://img.shields.io/badge/arXiv-2312.13964-b31b1b.svg)](https://arxiv.org/abs/2312.13964)
[![Project Page](https://img.shields.io/badge/PIA-Website-green)](https://pi-animator.github.io)
[![Open in OpenXLab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/zhangyiming/PiaPia)
[![Third Party Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/camenduru/PIA-colab/blob/main/PIA_colab.ipynb)
[![HuggingFace Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue)](https://huggingface.co/Leoxing/PIA)
<a target="_blank" href="https://huggingface.co/spaces/Leoxing/PIA">
<img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in Hugging Face Spaces"/>
</a>
[![Replicate](https://replicate.com/cjwbw/pia/badge)](https://replicate.com/cjwbw/pia)
PIA is a personalized image animation method that generates videos with **high motion controllability** and **strong text and image alignment**.
If you find our project helpful, please give it a star :star: or [cite](#bibtex) it. We would be very grateful :sparkling_heart:.
<img src="__assets__/image_animation/teaser/teaser.gif">
## What's New
- [x] `2024/01/03` [Replicate Demo & API](https://replicate.com/cjwbw/pia) support!
- [x] `2024/01/03` [Colab](https://github.com/camenduru/PIA-colab) support from [camenduru](https://github.com/camenduru)!
- [x] `2023/12/28` Support `scaled_dot_product_attention` for 1024x1024 images with just 16GB of GPU memory.
- [x] `2023/12/25` HuggingFace demo is available now! [🤗 Hub](https://huggingface.co/spaces/Leoxing/PIA/)
- [x] `2023/12/22` Release the demo of PIA on [OpenXLab](https://openxlab.org.cn/apps/detail/zhangyiming/PiaPia) and checkpoints on [Google Drive](https://drive.google.com/file/d/1RL3Fp0Q6pMD8PbGPULYUnvjqyRQXGHwN/view?usp=drive_link) or [![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/zhangyiming/PIA)
## Setup
### Prepare Environment
Use the following commands to create a conda environment for PIA from scratch:
```shell
conda env create -f pia.yml
conda activate pia
```
If you prefer to build on an existing environment, you can use `environment-pt2.yaml` for PyTorch 2.0.0. If you want to use a lower PyTorch version (e.g., 1.13.1), use the following commands instead:
```shell
conda env create -f environment.yaml
conda activate pia
```
We strongly recommend using PyTorch 2.0.0, which supports `scaled_dot_product_attention` for memory-efficient image animation.
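If you are unsure which setup you ended up with, the short snippet below (a quick check we suggest, not part of the repo) prints the installed PyTorch version and whether `scaled_dot_product_attention` is available:

```python
# Optional sanity check: confirm PyTorch version and availability of
# memory-efficient scaled_dot_product_attention (introduced in PyTorch 2.0).
import torch
import torch.nn.functional as F

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("scaled_dot_product_attention available:",
      hasattr(F, "scaled_dot_product_attention"))
```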
### Download Checkpoints
- Download Stable Diffusion v1-5:
```shell
conda install git-lfs
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 models/StableDiffusion/
```
- Download PIA:
```shell
git clone https://huggingface.co/Leoxing/PIA models/PIA/
```
- Download personalized models:
```shell
bash download_bashscripts/1-RealisticVision.sh
bash download_bashscripts/2-RcnzCartoon.sh
bash download_bashscripts/3-MajicMix.sh
```
You can also download *pia.ckpt* manually via the links on [Google Drive](https://drive.google.com/file/d/1RL3Fp0Q6pMD8PbGPULYUnvjqyRQXGHwN/view?usp=drive_link) or [Hugging Face](https://huggingface.co/Leoxing/PIA).
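Alternatively, a minimal Python sketch using `huggingface_hub` can fetch the checkpoint programmatically (this assumes `pip install huggingface_hub` and that the file is published as `pia.ckpt` in the `Leoxing/PIA` repository):

```python
# Minimal sketch: download pia.ckpt from the Hugging Face Hub.
# The filename "pia.ckpt" is an assumption based on the layout below.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="Leoxing/PIA",
    filename="pia.ckpt",
    local_dir="models/PIA",  # matches the directory layout expected below
)
print("Checkpoint saved to:", ckpt_path)
```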
Place the checkpoints as follows:
```
└── models
    ├── DreamBooth_LoRA
    │   ├── ...
    ├── PIA
    │   ├── pia.ckpt
    └── StableDiffusion
        ├── vae
        ├── unet
        └── ...
```
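As a quick sanity check (ours, not part of the repo), you can verify that the expected files and folders from the layout above are in place:

```python
# Sanity check: verify the checkpoint layout described above.
from pathlib import Path

expected = [
    "models/DreamBooth_LoRA",
    "models/PIA/pia.ckpt",
    "models/StableDiffusion/vae",
    "models/StableDiffusion/unet",
]
for p in expected:
    status = "OK     " if Path(p).exists() else "MISSING"
    print(f"{status} {p}")
```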
## Inference
### Image Animation
Image-to-video results can be obtained by running:
```shell
python inference.py --config=example/config/lighthouse.yaml
python inference.py --config=example/config/harry.yaml
python inference.py --config=example/config/majic_girl.yaml
```
After running the commands above, you can find the results in `example/result`:
<table class="center">
<tr>
<td><p style="text-align: center">Input Image</p></td>
<td><p style="text-align: center">lightning, lighthouse</p></td>
<td><p style="text-align: center">sun rising, lighthouse</p></td>
<td><p style="text-align: center">fireworks, lighthouse</p></td>
</tr>
<tr>
<td><img src="example/img/lighthouse.jpg"></td>
<td><img src="__assets__/image_animation/real/1.gif"></td>
<td><img src="__assets__/image_animation/real/2.gif"></td>
<td><img src="__assets__/image_animation/real/3.gif"></td>
</tr>
<tr>
<td><p style="text-align: center">Input Image</p></td>
<td><p style="text-align: center">1boy smiling</p></td>
<td><p style="text-align: center">1boy playing the magic fire</p></td>
<td><p style="text-align: center">1boy is waving hands</p></td>
</tr>
<tr>
<td><img src="example/img/harry.png"></td>
<td><img src="__assets__/image_animation/rcnz/1.gif"></td>
<td><img src="__assets__/image_animation/rcnz/2.gif"></td>
<td><img src="__assets__/image_animation/rcnz/3.gif"></td>
</tr>
<tr>
<td><p style="text-align: center">Input Image</p></td>
<td><p style="text-align: center">1girl is smiling</p></td>
<td><p style="text-align: center">1girl is crying</p></td>
<td><p style="text-align: center">1girl, snowing </p></td>
</tr>
<tr>
<td><img src="example/img/majic_girl.jpg"></td>
<td><img src="__assets__/image_animation/majic/1.gif"></td>
<td><img src="__assets__/image_animation/majic/2.gif"></td>
<td><img src="__assets__/image_animation/majic/3.gif"></td>
</tr>
</table>
### Motion Magnitude
You can control the motion magnitude through the `--magnitude` parameter:
```sh
python inference.py --config=example/config/xxx.yaml --magnitude=0 # Small Motion
python inference.py --config=example/config/xxx.yaml --magnitude=1 # Moderate Motion
python inference.py --config=example/config/xxx.yaml --magnitude=2 # Large Motion
```
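If you want to compare all three settings for a single config in one go, a small Python sketch (just a convenience wrapper around the CLI above; the config path is a placeholder) could look like this:

```python
# Convenience sketch: run inference at all three motion magnitudes
# for one config file, using the CLI shown above.
import subprocess

config = "example/config/lighthouse.yaml"  # replace with your config
for magnitude in (0, 1, 2):  # small / moderate / large motion
    subprocess.run(
        ["python", "inference.py", f"--config={config}", f"--magnitude={magnitude}"],
        check=True,
    )
```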
Examples:
```sh
python inference.py --config=example/config/labrador.yaml
python inference.py --config=example/config/bear.yaml
python inference.py --config=example/config/genshin.yaml
```
<table class="center">
<tr>
<td><p style="text-align: center">Input Image<br>& Prompt</p></td>
<td><p style="text-align: center">Small Motion</p></td>
<td><p style="text-align: center">Moderate Motion</p></td>
<td><p style="text-align: center">Large Motion</p></td>
</tr>
<tr>
<td><img src="example/img/labrador.png" style="width: 220px">a golden labrador is running</td>
<td><img src="__assets__/image_animation/magnitude/labrador/1.gif"></td>
<td><img src="__assets__/image_animation/magnitude/labrador/2.gif"></td>
<td><img src="__assets__/image_animation/magnitude/labrador/3.gif"></td>
</tr>
<tr>
<td><img src="example/img/bear.jpg" style="width: 220px">1bear is walking, ...</td>
<td><img src="__assets__/image_animation/magnitude/bear/1.gif"></td>
<td><img src="__assets__/image_animation/magnitude/bear/2.gif"></td>
<td><img src="__assets__/image_animation/magnitude/bear/3.gif"></td>
</tr>
<tr>
<td><img src="example/img/genshin.jpg" style="width: 220px">cherry blossom, ...</td>
<td><img src="__assets__/image_animation/magnitude/genshin/1.gif"></td>
<td><img src="__assets__/image_animation/magnitude/genshin/2.gif"></td>
<td><img src="__assets__/image_animation/magnitude/genshin/3.gif"></td>
</tr>
</table>
### Style Transfer
To achieve style transfer, set the base model in the corresponding `xxx.yaml` and run with the `--style_transfer` flag.
Examples:
```sh
python inference.py --config example/config/concert.yaml --style_transfer
python inference.py --config example/config/anya.yaml --style_transfer
```
<table class="center">
<tr>
<td><p style="text-align: center">Input Image<br> & Base Model</p></td>
<td><p style="text-align: center">1man is smiling</p></td>
<td><p style="text-align: center">1man is crying</p></td>
<td><p style="text-align: center">1man is singing</p></td>
</tr>
<tr>
<td style="text-align: center"><img src="example/img/concert.png" style="width:220px">Realistic Vision</td>
<td><img src="__assets__/image_animation/style_transfer/concert/1.gif"></td>
<td><img src="__assets__/image_animation/style_transfer/concert/2.gif"></td>
<td><img src="__assets__/image_animation/style_transfer/concert/3.gif"></td>
</tr>
<tr>
<td style="text-align: center"><img src="example/img/concert.png" style="width:220px">RCNZ Cartoon 3d</td>
<td><img src="__assets__/image_animation/style_transfer/concert/4.gif"></td>
<td><img src="__assets__/image_animation/style_transfer/concert/5.gif"></td>
<td><img src="__assets__/image_animation/style_transfer/concert/6.gif"></td>
</tr>
<tr>
<td><p style="text-align: center"></p></td>
<td><p style="text-align: center">1girl smiling</p></td>
<td><p style="text-align: center">1girl open mouth</p></td>
<td><p style="text-align: center">1girl is crying, pout</p></td>
</tr>
<tr>
<td style="text-align: center"><img src="example/img/anya.jpg" style="width:220px">RCNZ Cartoon 3d</td>
<td><img src="__assets__/image_animation/style_transfer/anya/1.gif"></td>
<td><img src="__assets__/image_animation/style_transfer/anya/2.gif"></td>
<td><img src="__assets__/image_animation/style_transfer/anya/3.gif"></td>
</tr>
</table>
### Loop Video
You can generate a looping video with the `--loop` flag:
```sh
python inference.py --config=example/config/xxx.yaml --loop
```
Examples:
```sh
python inference.py --config=example/config/lighthouse.yaml --loop
python inference.py --config=example/config/labrador.yaml --loop
```
<table>
<tr>
<td><p style="text-align: center">Input Image</p></td>
<td><p style="text-align: center">lightning, lighthouse</p></td>
<td><p style="text-align: center">sun rising, lighthouse</p></td>
<td><p style="text-align: center">fireworks, lighthouse</p></td>
</tr>
<tr>
<td style="text-align: center"><img src="example/img/lighthouse.jpg" style="width:auto"></td>
<td><img src="__assets__/image_animation/loop/lighthouse/1.gif"></td>
<td><img src="__assets__/image_animation/loop/lighthouse/2.gif"></td>
<td><img src="__assets__/image_animation/loop/lighthouse/3.gif"></td>
</tr>
<tr>
<td><p style="text-align: center">Input Image</p></td>
<td><p style="text-align: center">labrador jumping</p></td>
<td><p style="text-align: center">labrador walking</p></td>
<td><p style="text-align: center">labrador running</p></td>
</tr>
<tr>
<td style="text-align: center"><img src="example/img/labrador.png" style="width:auto"></td>
<td><img src="__assets__/image_animation/loop/labrador/1.gif"></td>
<td><img src="__assets__/image_animation/loop/labrador/2.gif"></td>
<td><img src="__assets__/image_animation/loop/labrador/3.gif"></td>
</tr>
</table>
## Training
We provide a [training script](train.py) for PIA. It borrows heavily from [AnimateDiff](https://github.com/guoyww/AnimateDiff/tree/main), so please prepare the dataset and configuration files according to its [training guideline](https://github.com/guoyww/AnimateDiff/blob/main/__assets__/docs/animatediff.md#steps-for-training).
After preparation, you can train the model with `torchrun`:
```shell
torchrun --nnodes=1 --nproc_per_node=1 train.py --config example/config/train.yaml
```
or with Slurm:
```shell
srun --quotatype=reserved --job-name=pia --gres=gpu:8 --ntasks-per-node=8 --ntasks=8 --cpus-per-task=4 --kill-on-bad-exit=1 python train.py --config example/config/train.yaml
```
## AnimateBench
We have open-sourced AnimateBench on [Hugging Face](https://huggingface.co/datasets/ymzhang319/AnimateBench), which includes the images, prompts, and configs used to evaluate PIA and other image animation methods.
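To pull the benchmark locally, a minimal sketch using `huggingface_hub` (assuming `pip install huggingface_hub`) is:

```python
# Minimal sketch: download the AnimateBench dataset from the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="ymzhang319/AnimateBench",
    repo_type="dataset",
    local_dir="AnimateBench",
)
print("AnimateBench downloaded to:", local_path)
```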
## BibTeX
```bibtex
@inproceedings{zhang2024pia,
title={Pia: Your personalized image animator via plug-and-play modules in text-to-image models},
author={Zhang, Yiming and Xing, Zhening and Zeng, Yanhong and Fang, Youqing and Chen, Kai},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7747--7756},
year={2024}
}
```
## Contact Us
**Yiming Zhang**:
[email protected]
**Zhening Xing**:
[email protected]
**Yanhong Zeng**:
[email protected]
## Acknowledgements
The code is built upon [AnimateDiff](https://github.com/guoyww/AnimateDiff), [Tune-a-Video](https://github.com/showlab/Tune-A-Video), and [PySceneDetect](https://github.com/Breakthrough/PySceneDetect).
You may also want to try other projects from our team:
<a target="_blank" href="https://github.com/open-mmlab/mmagic">
<img src="https://github.com/open-mmlab/mmagic/assets/28132635/15aab910-f5c4-4b76-af9d-fe8eead1d930" height=20 alt="MMagic"/>
</a>
", Assign "at most 3 tags" to the expected json: {"id":"6310","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"