# Sonic

Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation", CVPR 2025.

<a href='https://jixiaozhong.github.io/Sonic/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
<a href="http://demo.sonic.jixiaozhong.online/" style="margin: 0 2px;">
  <img src='https://img.shields.io/badge/Demo-Gradio-gold?style=flat&logo=Gradio&logoColor=red' alt='Demo'>
</a>
<a href='https://openaccess.thecvf.com/content/CVPR2025/papers/Ji_Sonic_Shifting_Focus_to_Global_Audio_Perception_in_Portrait_Animation_CVPR_2025_paper.pdf'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
<a href="https://huggingface.co/spaces/xiaozhongji/Sonic" style="margin: 0 2px;">
  <img src='https://img.shields.io/badge/Space-ZeroGPU-orange?style=flat&logo=Gradio&logoColor=red' alt='Demo'>
</a>
<a href="https://raw.githubusercontent.com/jixiaozhong/Sonic/refs/heads/main/LICENSE" style="margin: 0 2px;">
  <img src='https://img.shields.io/badge/License-CC BY--NC--SA--4.0-lightgreen?style=flat&logo=Lisence' alt='License'>
</a>

<p align="center">
  👋 Join our <a href="examples/image/QQ2.jpg" target="_blank">QQ Chat Group</a>
</p>

## 🔥🔥🔥 NEWS

**`2025/05/06`**: We have open-sourced [**DICE-Talk**](https://github.com/toto222/DICE-Talk), a portrait-driven system with emotional expression. Welcome to try it out!

**`2025/03/14`**: Super stoked to share that Sonic has been accepted to CVPR 2025! See you in Nashville!

**`2025/02/08`**: Many thanks to the open-source community contributors for making the ComfyUI version of Sonic a reality. Your efforts are truly appreciated! [**ComfyUI version of Sonic**](https://github.com/smthemex/ComfyUI_Sonic)

**`2025/02/06`**: Commercialization: note that our license is **non-commercial**. If commercial use is required, please use the Tencent Cloud Video Creation Large Model: [**Introduction**](https://cloud.tencent.com/product/vclm) / [**API documentation**](https://cloud.tencent.com/document/api/1616/109378)

**`2025/01/17`**: Our [**online Hugging Face demo**](https://huggingface.co/spaces/xiaozhongji/Sonic/) is released.

**`2025/01/17`**: Thank you to NewGenAI for promoting Sonic and creating a Windows-based tutorial on [**YouTube**](https://www.youtube.com/watch?v=KiDDtcvQyS0).

**`2024/12/16`**: Our [**online demo**](http://demo.sonic.jixiaozhong.online/) is released.

## 🎥 Demo

| Input | Output | Input | Output |
|----------------------|-----------------------|----------------------|-----------------------|
|<img src="examples/image/anime1.png" width="360">|<video src="https://github.com/user-attachments/assets/636c3ff5-210e-44b8-b901-acf828071133" width="360"> </video>|<img src="examples/image/female_diaosu.png" width="360">|<video src="https://github.com/user-attachments/assets/e8207300-2569-47d1-9ad4-4b4c9b0f0bd4" width="360"> </video>|
|<img src="examples/image/hair.png" width="360">|<video src="https://github.com/user-attachments/assets/dcb755c1-de01-4afe-8b4f-0e0b2c2439c1" width="360"> </video>|<img src="examples/image/leonnado.jpg" width="360">|<video src="https://github.com/user-attachments/assets/b50e61bb-62d4-469d-b402-b37cda3fbd27" width="360"> </video>|

For more visual demos, please visit our [**project page**](https://jixiaozhong.github.io/Sonic/).

## 🧩 Community Contributions

If you develop/use Sonic in your projects, welcome to let us know.
- ComfyUI version of Sonic: [**ComfyUI_Sonic**](https://github.com/smthemex/ComfyUI_Sonic)

## 📑 Updates

**`2025/01/14`**: Our inference code and weights are released. Stay tuned, we will continue to polish the model.

## 📜 Requirements

* An NVIDIA GPU with CUDA support is required.
* The model has been tested on a single GPU with 32 GB of memory.
* Tested operating system: Linux

## 🔑 Inference

### Installation

- Install PyTorch, then install the remaining dependencies:

```shell
pip3 install -r requirements.txt
```

- All models are stored in `checkpoints` by default, with the following file structure:

```shell
Sonic
├── checkpoints
│   ├── Sonic
│   │   ├── audio2bucket.pth
│   │   ├── audio2token.pth
│   │   ├── unet.pth
│   ├── stable-video-diffusion-img2vid-xt
│   │   ├── ...
│   ├── whisper-tiny
│   │   ├── ...
│   ├── RIFE
│   │   ├── flownet.pkl
│   ├── yoloface_v5m.pt
├── ...
```

Download the weights with `huggingface-cli` as follows:

```shell
python3 -m pip install "huggingface_hub[cli]"
huggingface-cli download LeonJoe13/Sonic --local-dir checkpoints
huggingface-cli download stabilityai/stable-video-diffusion-img2vid-xt --local-dir checkpoints/stable-video-diffusion-img2vid-xt
huggingface-cli download openai/whisper-tiny --local-dir checkpoints/whisper-tiny
```

or manually download the [pretrained model](https://drive.google.com/drive/folders/1oe8VTPUy0-MHHW2a_NJ1F8xL-0VN5G7W?usp=drive_link), [svd-xt](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt) and [whisper-tiny](https://huggingface.co/openai/whisper-tiny) to `checkpoints/`.

### Run demo

```shell
python3 demo.py \
  '/path/to/input_image' \
  '/path/to/input_audio' \
  '/path/to/output_video'
```

An end-to-end quickstart sketch is included at the end of this README.

## 🔗 Citation

If you find our work helpful for your research, please consider citing our work.

```bibtex
@inproceedings{ji2025sonic,
  title={Sonic: Shifting focus to global audio perception in portrait animation},
  author={Ji, Xiaozhong and Hu, Xiaobin and Xu, Zhihong and Zhu, Junwei and Lin, Chuming and He, Qingdong and Zhang, Jiangning and Luo, Donghao and Chen, Yi and Lin, Qin and others},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={193--203},
  year={2025}
}
@article{ji2024realtalk,
  title={Realtalk: Real-time and realistic audio-driven face generation with 3d facial prior-guided identity alignment network},
  author={Ji, Xiaozhong and Lin, Chuming and Ding, Zhonggan and Tai, Ying and Zhu, Junwei and Hu, Xiaobin and Luo, Donghao and Ge, Yanhao and Wang, Chengjie},
  journal={arXiv preprint arXiv:2406.18284},
  year={2024}
}
@article{tan2025dicetalk,
  title={Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation},
  author={Tan, Weipeng and Lin, Chuming and Xu, Chengming and Xu, FeiFan and Hu, Xiaobin and Ji, Xiaozhong and Zhu, Junwei and Wang, Chengjie and Fu, Yanwei},
  journal={arXiv preprint arXiv:2504.18087},
  year={2025}
}
```

## 📜 Related Works

Explore our related research:
- **[Super-fast talk: real-time with less GPU computation]** [Realtalk: Real-time and realistic audio-driven face generation with 3d facial prior-guided identity alignment network](https://arxiv.org/pdf/2406.18284)

## 📈 Star History

[![Star History Chart](https://api.star-history.com/svg?repos=jixiaozhong/Sonic&type=Date)](https://star-history.com/#jixiaozhong/Sonic&Date)
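## 🧪 Quickstart Sketch

The installation, download, and demo steps above can be chained into a single run. The sketch below is a minimal example, assuming a fresh clone on a Linux machine with an NVIDIA GPU; `my_audio.wav` and `out/result.mp4` are placeholder paths you supply yourself, while the portrait image is one of the examples shipped in `examples/image/` (shown in the demo table above).

```shell
# Minimal end-to-end run (fresh clone).
git clone https://github.com/jixiaozhong/Sonic.git
cd Sonic
pip3 install -r requirements.txt

# Fetch the weights into ./checkpoints (same commands as in the Inference section).
python3 -m pip install "huggingface_hub[cli]"
huggingface-cli download LeonJoe13/Sonic --local-dir checkpoints
huggingface-cli download stabilityai/stable-video-diffusion-img2vid-xt --local-dir checkpoints/stable-video-diffusion-img2vid-xt
huggingface-cli download openai/whisper-tiny --local-dir checkpoints/whisper-tiny

# Animate one of the bundled example portraits.
# 'my_audio.wav' and 'out/result.mp4' are placeholder paths -- substitute your own files.
mkdir -p out
python3 demo.py \
  examples/image/anime1.png \
  my_audio.wav \
  out/result.mp4
```

If Hugging Face is unreachable, the Google Drive mirror linked in the Inference section can be used instead for the Sonic weights.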