<h1 align='Center'>Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance</h1>
<div align='Center'>
<a href='https://github.com/ShenhaoZhu' target='_blank'>Shenhao Zhu</a><sup>*1</sup> 
<a href='https://github.com/Leoooo333' target='_blank'>Junming Leo Chen</a><sup>*2</sup> 
<a href='https://github.com/daizuozhuo' target='_blank'>Zuozhuo Dai</a><sup>3</sup> 
<a href='https://ai3.fudan.edu.cn/info/1088/1266.htm' target='_blank'>Yinghui Xu</a><sup>2</sup> 
<a href='https://cite.nju.edu.cn/People/Faculty/20190621/i5054.html' target='_blank'>Xun Cao</a><sup>1</sup> 
<a href='https://yoyo000.github.io/' target='_blank'>Yao Yao</a><sup>1</sup> 
<a href='http://zhuhao.cc/home/' target='_blank'>Hao Zhu</a><sup>+1</sup> 
<a href='https://sites.google.com/site/zhusiyucs/home' target='_blank'>Siyu Zhu</a><sup>+2</sup>
</div>
<div align='Center'>
<sup>1</sup>Nanjing University <sup>2</sup>Fudan University <sup>3</sup>Alibaba Group
</div>
<div align='Center'>
<i><strong><a href='https://eccv2024.ecva.net' target='_blank'>ECCV 2024</a></strong></i>
</div>
<div align='Center'>
<a href='https://fudan-generative-vision.github.io/champ/#/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
<a href='https://arxiv.org/abs/2403.14781'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
<a href='https://youtu.be/2XVsy9tQRAY'><img src='https://badges.aleen42.com/src/youtube.svg'></a>
<a href='assets/wechat.jpeg'><img src='https://badges.aleen42.com/src/wechat.svg'></a>
</div>
https://github.com/fudan-generative-vision/champ/assets/82803297/b4571be6-dfb0-4926-8440-3db229ebd4aa
# Framework
![framework](assets/framework.jpg)
# News
- **`2024/05/05`**: [Sample training data on HuggingFace](https://huggingface.co/datasets/fudan-generative-ai/champ_trainning_sample) released.
- **`2024/05/02`**: Training source code released [#99](https://github.com/fudan-generative-vision/champ/pull/99).
- **`2024/04/28`**: Method for smoothing SMPLs in Blender released [#96](https://github.com/fudan-generative-vision/champ/pull/96).
- **`2024/04/26`**: Great Blender add-on [CEB Studios](https://www.patreon.com/cebstudios/posts) for various SMPL processing!
- **`2024/04/12`**: SMPL & rendering scripts released! Champ your dance videos now. See the [docs](https://github.com/fudan-generative-vision/champ/blob/master/docs/data_process.md).
- **`2024/03/30`**: Amazing [ComfyUI Wrapper](https://github.com/kijai/ComfyUI-champWrapper) by the community. Here is the [video tutorial](https://www.youtube.com/watch?app=desktop&v=cbElsTBv2-A). Thanks to [@kijai](https://github.com/kijai)!
- **`2024/03/27`**: Cool demo on [Replicate](https://replicate.com/camenduru/champ). Thanks to [@camenduru](https://github.com/camenduru)!
- **`2024/03/27`**: Visit our [roadmap](#roadmap) to preview the future of Champ.
# Installation
- System requirements: Ubuntu 20.04 / Windows 11, CUDA 12.1
- Tested GPUs: A100, RTX 3090
Create a conda environment:
```bash
conda create -n champ python=3.10
conda activate champ
```
Install packages with `pip`:
```bash
pip install -r requirements.txt
```
Or install packages with [poetry](https://python-poetry.org/):
> If you want to run this project on a Windows device, we strongly recommend using `poetry`.
```shell
poetry install --no-root
```
# Inference
The inference entrypoint script is `${PROJECT_ROOT}/inference.py`. Before testing your own cases, complete the following steps:
1. [Download all required pretrained models](#download-pretrained-models).
2. [Prepare your guidance motions](#prepare-your-guidance-motions).
3. [Run inference](#run-inference).
## Download pretrained models
You can easily get all pretrained models required by inference from our [HuggingFace repo](https://huggingface.co/fudan-generative-ai/champ).
Clone the pretrained models into the `${PROJECT_ROOT}/pretrained_models` directory with the commands below:
```shell
git lfs install
git clone https://huggingface.co/fudan-generative-ai/champ pretrained_models
```
Or you can download them separately from their source repos:
- [Champ ckpts](https://huggingface.co/fudan-generative-ai/champ/tree/main): Consists of the denoising UNet, guidance encoders, reference UNet, and motion module.
- [StableDiffusion V1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5): Initialized and fine-tuned from Stable-Diffusion-v1-2. (*Thanks to runwayml*)
- [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse): Weights are intended to be used with the diffusers library. (*Thanks to stabilityai*)
- [image_encoder](https://huggingface.co/lambdalabs/sd-image-variations-diffusers/tree/main/image_encoder): Fine-tuned from CompVis/stable-diffusion-v1-4-original to accept CLIP image embeddings rather than text embeddings. (*Thanks to lambdalabs*)
Finally, these pretrained models should be organized as follows:
```text
./pretrained_models/
|-- champ
| |-- denoising_unet.pth
| |-- guidance_encoder_depth.pth
| |-- guidance_encoder_dwpose.pth
| |-- guidance_encoder_normal.pth
| |-- guidance_encoder_semantic_map.pth
| |-- reference_unet.pth
| `-- motion_module.pth
|-- image_encoder
| |-- config.json
| `-- pytorch_model.bin
|-- sd-vae-ft-mse
| |-- config.json
| |-- diffusion_pytorch_model.bin
| `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
|-- feature_extractor
| `-- preprocessor_config.json
|-- model_index.json
|-- unet
| |-- config.json
| `-- diffusion_pytorch_model.bin
`-- v1-inference.yaml
```
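As a convenience, the snippet below (an illustrative sketch, not part of the repository) checks that the Champ checkpoints listed in the tree above are actually in place before you run inference:
```python
# Illustrative check that the Champ checkpoints from the tree above exist
# under ./pretrained_models/champ. Not part of the official codebase.
from pathlib import Path

champ_dir = Path("pretrained_models/champ")
expected = [
    "denoising_unet.pth",
    "guidance_encoder_depth.pth",
    "guidance_encoder_dwpose.pth",
    "guidance_encoder_normal.pth",
    "guidance_encoder_semantic_map.pth",
    "reference_unet.pth",
    "motion_module.pth",
]

missing = [name for name in expected if not (champ_dir / name).exists()]
print("Missing checkpoints:", missing if missing else "none")
```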
## Prepare your guidance motions
Guidance motion data, produced via SMPL & rendering, is required for inference.
You can download our pre-rendered samples from our [HuggingFace repo](https://huggingface.co/datasets/fudan-generative-ai/champ_motions_example) and place them into the `${PROJECT_ROOT}/example_data` directory:
```shell
git lfs install
git clone https://huggingface.co/datasets/fudan-generative-ai/champ_motions_example example_data
```
Or you can follow the [SMPL & Rendering doc](https://github.com/fudan-generative-vision/champ/blob/master/docs/data_process.md) to produce your own motion data.
Finally, `${PROJECT_ROOT}/example_data` should look like this:
```
./example_data/
|-- motions/ # Directory containing motion samples, one per subfolder
| |-- motion-01/ # A motion sample
| | |-- depth/ # Depth frame sequence
| | |-- dwpose/ # DWPose frame sequence
| | |-- mask/ # Mask frame sequence
| | |-- normal/ # Normal map frame sequence
| | `-- semantic_map/ # Semantic map frame sequence
| |-- motion-02/
| | |-- ...
| | `-- ...
| `-- motion-N/
| |-- ...
| `-- ...
`-- ref_images/ # Reference image samples (optional)
|-- ref-01.png
|-- ...
`-- ref-N.png
```
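If you render your own motions, it is easy to end up with guidance folders whose frame counts drift out of sync. The sketch below (an informal check based on the layout above, not an official tool) counts frames per modality for one motion:
```python
# Illustrative consistency check: every guidance modality of a motion should
# contain the same number of frames. Folder names follow the layout above.
from pathlib import Path

motion_dir = Path("example_data/motions/motion-01")  # point this at your motion
modalities = ["depth", "dwpose", "mask", "normal", "semantic_map"]

counts = {m: len(list((motion_dir / m).glob("*"))) for m in modalities}
print(counts)
if len(set(counts.values())) > 1:
    print("Warning: guidance modalities have mismatched frame counts.")
```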
## Run inference
Now the pretrained models and guidance motions are prepared in `${PROJECT_ROOT}/pretrained_models` and `${PROJECT_ROOT}/example_data`, respectively.
Here is the command for inference:
```bash
python inference.py --config configs/inference/inference.yaml
```
If using `poetry`, the command is:
```shell
poetry run python inference.py --config configs/inference/inference.yaml
```
Animation results will be saved in the `${PROJECT_ROOT}/results` folder. You can change the reference image or the guidance motion by modifying `inference.yaml`.
The default motion (motion-02) in `inference.yaml` has about 250 frames and requires ~20 GB of VRAM.
**Note**: If your VRAM is insufficient, you can switch to a shorter motion sequence or cut out a segment from a long sequence. We provide a frame range selector in `inference.yaml`, which you can set to a `[min_frame_index, max_frame_index]` list to conveniently cut a segment out of the sequence.
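If you prefer not to edit the config, another option is to cut a shorter copy of a long motion on disk and point `inference.yaml` at it. The sketch below is one way to do that; the destination folder name is arbitrary and chosen here only for illustration:
```python
# Illustrative helper: copy frames [start, end) of every guidance modality
# from a long motion into a new, shorter motion folder.
import shutil
from pathlib import Path

src = Path("example_data/motions/motion-02")
dst = Path("example_data/motions/motion-02-short")  # hypothetical name
start, end = 0, 100  # frame range to keep

for modality_dir in sorted(p for p in src.iterdir() if p.is_dir()):
    out_dir = dst / modality_dir.name
    out_dir.mkdir(parents=True, exist_ok=True)
    for frame in sorted(modality_dir.iterdir())[start:end]:
        shutil.copy(frame, out_dir / frame.name)
```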
# Train the Model
The training process consists of two distinct stages. For more information, refer to the `Training Section` in the [paper on arXiv](https://arxiv.org/abs/2403.14781).
## Prepare Datasets
Prepare your own training videos with human motion (or use [our sample training data on HuggingFace](https://huggingface.co/datasets/fudan-generative-ai/champ_trainning_sample)) and modify the `data.video_folder` value in the training config YAML.
All training videos need to be processed into SMPL & DWPose format. Refer to the [Data Process doc](https://github.com/fudan-generative-vision/champ/blob/master/docs/data_process.md).
The directory structure should look like this:
```txt
/training_data/
|-- video01/ # A processed training video
| |-- depth/ # Depth frame sequence
| |-- dwpose/ # DWPose frame sequence
| |-- mask/ # Mask frame sequence
| |-- normal/ # Normal map frame sequence
| `-- semantic_map/ # Semantic map frame sequence
|-- video02/
| |-- ...
| `-- ...
`-- videoN/
|-- ...
`-- ...
```
Select another small batch of data as the validation set, and modify the `validation.ref_images` and `validation.guidance_folders` paths in the training config YAML.
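If you would rather script these config edits, the sketch below updates the keys named above with PyYAML. It is only an assumption about the config layout; check `configs/train/stage1.yaml` for the actual structure and value types before relying on it:
```python
# Illustrative PyYAML edit of the training config keys mentioned above.
# The real config layout and value types may differ; verify against
# configs/train/stage1.yaml before using this.
import yaml

config_path = "configs/train/stage1.yaml"

with open(config_path) as f:
    cfg = yaml.safe_load(f)

cfg.setdefault("data", {})["video_folder"] = "/training_data"
validation = cfg.setdefault("validation", {})
validation["ref_images"] = ["/path/to/validation/ref-01.png"]       # placeholder
validation["guidance_folders"] = ["/path/to/validation/motion-01"]  # placeholder

with open(config_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```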
## Run Training Scripts
To train the Champ model, use the following commands:
```shell
# Run the stage 1 training script
accelerate launch train_s1.py --config configs/train/stage1.yaml
# Modify the `stage1_ckpt_dir` value in the stage 2 YAML, then run the stage 2 training script
accelerate launch train_s2.py --config configs/train/stage2.yaml
```
# Datasets
| Type | HuggingFace | Release Date |
| :----: | :----------------------------------------------------------------------------------------- | :-------------: |
| Inference | **[SMPL motion samples](https://huggingface.co/datasets/fudan-generative-ai/champ_motions_example)** | Thu Apr 18 2024 |
| Training | **[Sample datasets for Training](https://huggingface.co/datasets/fudan-generative-ai/champ_trainning_sample)** | Sun May 05 2024 |
# Roadmap
| Status | Milestone | ETA |
| :----: | :----------------------------------------------------------------------------------------- | :-------------: |
| ✅ | **[Inference source code released on GitHub](https://github.com/fudan-generative-vision/champ)** | Sun Mar 24 2024 |
| ✅ | **[Model and test data released on HuggingFace](https://huggingface.co/fudan-generative-ai/champ)** | Tue Mar 26 2024 |
| ✅ | **[Optimize dependencies and Windows support](https://github.com/fudan-generative-vision/champ?tab=readme-ov-file#installation)** | Sun Mar 31 2024 |
| ✅ | **[Data preprocessing code release](https://github.com/fudan-generative-vision/champ/blob/master/docs/data_process.md)** | Fri Apr 12 2024 |
| ✅ | **[Training code release](https://github.com/fudan-generative-vision/champ/pull/99)** | Thu May 02 2024 |
| ✅ | **[Sample training data released on HuggingFace](https://huggingface.co/datasets/fudan-generative-ai/champ_trainning_sample)** | Sun May 05 2024 |
| ✅ | **[Smoothing SMPL motion](https://github.com/fudan-generative-vision/champ/pull/96)** | Sun Apr 28 2024 |
| 🚀 | **[Gradio demo on HuggingFace]()** | TBD |
# Citation
If you find our work useful for your research, please consider citing the paper:
```
@inproceedings{zhu2024champ,
title={Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance},
author={Shenhao Zhu and Junming Leo Chen and Zuozhuo Dai and Yinghui Xu and Xun Cao and Yao Yao and Hao Zhu and Siyu Zhu},
booktitle={European Conference on Computer Vision (ECCV)},
year={2024}
}
```
# Opportunities available
Multiple research positions are open at the **Generative Vision Lab, Fudan University**! Open positions include:
- Research assistant
- Postdoctoral researcher
- PhD candidate
- Master's students
Interested individuals are encouraged to contact us at [[email protected]](mailto:[email protected]) for further information.
", Assign "at most 3 tags" to the expected json: {"id":"8869","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"