# Mora: More like Sora for Generalist Video Generation
> 📑 See our newest video generation paper: [**"Mora: Enabling Generalist Video Generation via A Multi-Agent Framework"**](http://arxiv.org/abs/2403.13248) [![Paper](https://img.shields.io/badge/Paper-%F0%9F%8E%93-lightblue?style=flat-square)](http://arxiv.org/abs/2403.13248) [![GitHub](https://img.shields.io/badge/GitHub-%F0%9F%8E%93-lightblue?style=flat-square)](https://github.com/lichao-sun/Mora) [![Project](https://img.shields.io/badge/Project-%F0%9F%8E%93-lightblue?style=flat-square)](https://llizhaoxu.github.io/moraframework/)
>
> 📧 Please let us know if you find a mistake or have any suggestions by e-mail:
[email protected]
## 📰 News
🗓️ Oct 9: Our Mora v2 [paper](https://arxiv.org/pdf/2403.13248) update and training code are coming soon.
🗓️ Jun 13: Our code is released!
🗓️ Mar 20: Our paper "[Mora: Enabling Generalist Video Generation via A Multi-Agent Framework](https://arxiv.org/abs/2403.13248)" is released!
## What is Mora
Mora is a multi-agent framework designed to facilitate generalist video generation tasks, leveraging a collaborative approach with multiple visual agents. It aims to replicate and extend the capabilities of OpenAI's Sora.
![Task](/image/task.jpg)
## 📹 Demo for Artist Creation
Inspired by OpenAI's [Sora: First Impressions](https://openai.com/blog/sora-first-impressions), we use Mora to generate a Shy Kids-style video. Although Mora matches Sora in video duration (80 seconds), a significant gap remains in resolution, object consistency, motion smoothness, and more.
https://github.com/JHL328/test/assets/55661930/abe276f7-12d3-4d24-aff3-7474296e854e
## 🔥 Demo (1024×576 resolution, 12 seconds and more!)
<p align="left">
<img src="./image/demo1.gif" width="49%" height="auto" />
<img src="./image/demo2.gif" width="49%" height="auto" />
<img src="./image/demo3.gif" width="49%" height="auto" />
<img src="./image/demo4.gif" width="49%" height="auto" />
</p>
## Mora: A Multi-Agent Framework for Video Generation
![test image](/image/method.jpg)
- **Multi-Agent Collaboration**: Utilizes several advanced visual AI agents, each specializing in different aspects of the video generation process, to achieve high-quality outcomes across various tasks.
- **Broad Spectrum of Tasks**: Capable of performing text-to-video generation, text-conditional image-to-video generation, extending generated videos, video-to-video editing, connecting videos, and simulating digital worlds, thereby covering an extensive range of video generation applications.
- **Open-Source and Extendable**: Mora's open-source nature fosters innovation and collaboration within the community, allowing for continuous improvement and customization.
- **Proven Performance**: Experimental results demonstrate Mora's ability to achieve performance that is close to that of Sora in various tasks, making it a compelling open-source alternative for the video generation domain.
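The multi-agent decomposition above can be sketched as a simple pipeline dispatcher. This is a minimal illustration only; the function and task names are hypothetical stand-ins, not the repository's actual API, and real agents would wrap diffusion models rather than string transforms.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for Mora's visual agents (illustrative names only).
def text_to_image(x: str) -> str:
    return f"image({x})"

def image_to_video(x: str) -> str:
    return f"video({x})"

def extend_video(x: str) -> str:
    return f"extended({x})"

# Each task is handled by a chain of specialized agents, mirroring
# the paper's decomposition of video generation into sub-steps.
PIPELINES: Dict[str, List[Callable[[str], str]]] = {
    "text-to-video": [text_to_image, image_to_video],
    "image-to-video": [image_to_video],
    "extend-video": [extend_video],
}

def run(task: str, prompt: str) -> str:
    """Dispatch a prompt through the agent chain registered for `task`."""
    out = prompt
    for agent in PIPELINES[task]:
        out = agent(out)
    return out

print(run("text-to-video", "a coral reef"))  # video(image(a coral reef))
```

New tasks can be supported by registering a new agent chain, which is the extensibility argument behind the multi-agent design.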
## Results
### Text-to-video generation
<table class="left">
<tr>
<th align="left"><b>Input prompt</b></th>
<th align="left"><b>Output video</b></th>
</tr>
<tr>
<td>A vibrant coral reef teeming with life under the crystal-clear blue ocean, with colorful fish swimming among the coral, rays of sunlight filtering through the water, and a gentle current moving the sea plants. </td>
<td><img src="./image/task_1_demo_1.gif" width=480 height="auto"></td>
</tr>
<tr>
<td>A majestic mountain range covered in snow, with the peaks touching the clouds and a crystal-clear lake at its base, reflecting the mountains and the sky, creating a breathtaking natural mirror.</td>
<td><img src="./image/task_1_demo_2.gif" width=480 height="auto"></td>
</tr>
<tr>
<td>In the middle of a vast desert, a golden desert city appears on the horizon, its architecture a blend of ancient Egyptian and futuristic elements. The city is surrounded by a radiant energy barrier, while in the air, seve</td>
<td><img src="./image/task_1_demo_3.gif" width=480 height="auto"></td>
</tr>
</table>
### Text-conditional image-to-video generation
<table class="left">
<tr>
<th align="left"><b>Input prompt</b></th>
<th align="left"><b>Input image</b></th>
<th align="left"><b>Mora-generated video</b></th>
<th align="left"><b>Sora-generated video</b></th>
</tr>
<tr>
<td>Monster Illustration in the flat design style of a diverse family of monsters. The group includes a furry brown monster, a sleek black monster with antennas, a spotted green monster, and a tiny polka-dotted monster, all interacting in a playful environment. </td>
<td><img src="./image/input1.jpg" width=600 height=90></td>
<td><img src="./image/task2_demo1.gif" width=160 height=90></td>
<td><img src="./image/sora_demo1.gif" width=160 height=90></td>
</tr>
<tr>
<td>An image of a realistic cloud that spells "SORA".</td>
<td><img src="./image/input2.jpg" width=600 height=90></td>
<td><img src="./image/task2_demo2.gif" width=160 height=90></td>
<td><img src="./image/sora_demo2.gif" width=160 height=90></td>
</tr>
</table>
### Extend generated video
<table class="left">
<tr>
<th align="left"><b>Original video</b></th>
<th align="left"><b>Mora extended video</b></th>
<th align="left"><b>Sora extended video</b></th>
</tr>
<tr>
<td><img src="./image/original video.gif" width=330 height="auto"></td>
<td><img src="./image/mora_task3.gif" width=330 height="auto"></td>
<td><img src="./image/task3_sora.gif" width=330 height="auto"></td>
</tr>
</table>
### Video-to-video editing
<table class="left">
<tr>
<th align="left"><b>Instruction</b></th>
<th align="left"><b>Original video</b></th>
<th align="left"><b>Mora-edited video</b></th>
<th align="left"><b>Sora-edited video</b></th>
</tr>
<tr>
<td>Change the setting to the 1920s with an old school car. make sure to keep the red color.</td>
<td><img src="./image/task4_original.gif" width=240 height="auto"></td>
<td><img src="./image/task4_mora_1920.gif" width=240 height="auto"></td>
<td><img src="./image/task4_sora_1920.gif" width=240 height="auto"></td>
</tr>
<tr>
<td>Put the video in space with a rainbow road</td>
<td><img src="./image/task4_original.gif" width=240 height="auto"></td>
<td><img src="./image/task4_mora_rainbow.gif" width=240 height="auto"></td>
<td><img src="./image/task4_sora_rainbow.gif" width=240 height="auto"></td>
</tr>
</table>
### Connect videos
<table class="left">
<tr>
<th align="left"><b>Input previous video</b></th>
<th align="left"><b>Input next video</b></th>
<th align="left"><b>Output connected video</b></th>
</tr>
<tr>
<td><img src="./image/task5_mora1.gif" width=300 height="auto"></td>
<td><img src="./image/task5_mora2.gif" width=300 height="auto"></td>
<td><img src="./image/task5_mora.gif" width=300 height="auto"></td>
</tr>
<tr>
<td><img src="./image/task5_sora1.gif" width=300 height="auto"></td>
<td><img src="./image/task5_sora2.gif" width=300 height="auto"></td>
<td><img src="./image/task5_sora.gif" width=300 height="auto"></td>
</tr>
</table>
### Simulate digital worlds
<table class="left">
<tr>
<th align="left"><b>Mora simulating video</b></th>
<th align="left"><b>Sora simulating video</b></th>
</tr>
<tr>
<td><img src="./image/task6_mora1.gif" width="100%" height="auto"></td>
<td><img src="./image/task6_sora1.gif" width="100%" height="auto"></td>
</tr>
<tr>
<td><img src="./image/task6_mora2.gif" width="100%" height="auto"></td>
<td><img src="./image/task6_sora2.gif" width="100%" height="auto"></td>
</tr>
</table>
## Getting Started
Our code is available in the [GitHub repository](https://github.com/lichao-sun/Mora); see the News section above for release updates.
## Citation
```
@article{yuan2024mora,
title={Mora: Enabling Generalist Video Generation via A Multi-Agent Framework},
author={Yuan, Zhengqing and Chen, Ruoxi and Li, Zhaoxu and Jia, Haolong and He, Lifang and Wang, Chi and Sun, Lichao},
journal={arXiv preprint arXiv:2403.13248},
year={2024}
}
```
```
@article{liu2024sora,
title={Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models},
author={Liu, Yixin and Zhang, Kai and Li, Yuan and Yan, Zhiling and Gao, Chujie and Chen, Ruoxi and Yuan, Zhengqing and Huang, Yue and Sun, Hanchi and Gao, Jianfeng and others},
journal={arXiv preprint arXiv:2402.17177},
year={2024}
}
```
```
@misc{openai2024sorareport,
title={Video generation models as world simulators},
author={OpenAI},
year={2024},
howpublished={https://openai.com/research/video-generation-models-as-world-simulators},
}
```
", Assign "at most 3 tags" to the expected json: {"id":"8806","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine 
"},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"