# Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
## CVPR 2024
[Project Page](https://oppo-us-research.github.io/SpacetimeGaussians-website/) | [Paper](https://arxiv.org/abs/2312.16812) | [Video](https://youtu.be/YsPPmf-E6Lg) | [Viewer & Pre-Trained Models](https://huggingface.co/stack93/spacetimegaussians/tree/main)
This is an official implementation of the paper "Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis".</br>
[Zhan Li](https://lizhan17.github.io/web/)<sup>1,2</sup>,
[Zhang Chen](https://zhangchen8.github.io/)<sup>1,†</sup>,
[Zhong Li](https://sites.google.com/site/lizhong19900216)<sup>1,†</sup>,
[Yi Xu](https://www.linkedin.com/in/yi-xu-42654823/)<sup>1</sup> </br>
<sup>1</sup> OPPO US Research Center, <sup>2</sup> Portland State University </br>
<sup>†</sup> Corresponding authors </br>
<img src="assets/output.gif" width="100%"/></br>
## Updates and News
- `Jun 16, 2024`: Added a fully fused MLP for testing the ours-full models on the Technicolor and Neural 3D datasets (40 FPS improvement compared to the paper).
- `Jun 13, 2024`: Fixed minor issues for reproducibility on the scenes ```coffee_martini``` and ```flame_salmon_1``` (~0.1 PSNR).
- `Jun 9, 2024`: Added support for lazy loading and for storing ground-truth images as int8 on the GPU.
- `Dec 28, 2023`: Paper and Code are released.
## Table of Contents
1. [Installation](#installation)
1. [Preprocess Datasets](#processing-datasets)
1. [Training](#training)
1. [Testing](#testing)
1. [Real-Time Viewer](#real-time-viewer)
1. [Creating Your Gaussians](#create-your-new-representations-and-rendering-pipeline)
1. [License Information](#license-information)
1. [Acknowledgement](#acknowledgement)
1. [Citations](#citations)
## Installation
### Windows users with WSL2:
Please first refer to [here](./script/wsl.md) to install WSL2 (Windows Subsystem for Linux 2) and the dependencies inside it. Then you can set up our repo inside the Linux subsystem in the same way as other Linux users.
### Linux users:
Clone the source code of this repo.
```
git clone https://github.com/oppo-us-research/SpacetimeGaussians.git --recursive
cd SpacetimeGaussians
```
Then run the following command to set up the environments with conda.
Note that we will create two environments: one for preprocessing with COLMAP (```colmapenv```) and one for training and testing (```feature_splatting```). Training, testing, and preprocessing have been tested on Ubuntu 20.04. </br>
```
bash script/setup.sh
```
Note that you may need to manually install the following packages if you encounter errors while running the above command. </br>
```
conda activate feature_splatting
pip install thirdparty/gaussian_splatting/submodules/gaussian_rasterization_ch9
pip install thirdparty/gaussian_splatting/submodules/gaussian_rasterization_ch3
pip install thirdparty/gaussian_splatting/submodules/forward_lite
pip install thirdparty/gaussian_splatting/submodules/forward_full
```
## Processing Datasets
Note: our paper uses sparse points that follow 3DGS. Our per-frame SfM points are obtained with COLMAP's ```point_triangulator``` only, rather than dense reconstruction.
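For reference, the sketch below shows how per-frame sparse triangulation with COLMAP's ```point_triangulator``` could be invoked. The preprocessing scripts below already handle this for you, and the folder and file names used here (```input.db```, ```images```, ```sparse_known```) are placeholders rather than the scripts' actual layout.
```
import subprocess
from pathlib import Path

def triangulate_frame(frame_dir: Path) -> None:
    """Triangulate sparse points for one per-frame folder (e.g. colmap_0),
    assuming a COLMAP database and known camera poses already exist there."""
    out_dir = frame_dir / "sparse" / "0"
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run([
        "colmap", "point_triangulator",
        "--database_path", str(frame_dir / "input.db"),   # placeholder name
        "--image_path", str(frame_dir / "images"),        # placeholder name
        "--input_path", str(frame_dir / "sparse_known"),  # known poses, placeholder name
        "--output_path", str(out_dir),
    ], check=True)

# e.g. triangulate_frame(Path("/home/Neural3D/cook_spinach/colmap_0"))
```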
### Neural 3D Dataset
Download the dataset from [here](https://github.com/facebookresearch/Neural_3D_Video.git).
After downloading the dataset, you can run the following command to preprocess the dataset. </br>
```
conda activate colmapenv
python script/pre_n3d.py --videopath <location>/<scene>
```
```<location>``` is the path to the dataset root folder, and ```<scene>``` is the name of a scene in the dataset. </br>
- For example, if you put the dataset at ```/home/Neural3D``` and want to preprocess the ```cook_spinach``` scene, you can run the following command:
```
conda activate colmapenv
python script/pre_n3d.py --videopath /home/Neural3D/cook_spinach/
```
Our codebase expects the following directory structure for the Neural 3D Dataset after preprocessing:
```
<location>
|---cook_spinach
| |---colmap_<0>
| |---colmap_<...>
| |---colmap_<299>
|---flame_salmon_1
```
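If you want to sanity-check a preprocessed scene, a small hypothetical helper (not part of this repo) could verify that all 300 per-frame ```colmap_<i>``` folders exist:
```
from pathlib import Path

def check_n3d_scene(scene_dir: str, num_frames: int = 300) -> bool:
    """Verify that colmap_0 .. colmap_{num_frames-1} exist after preprocessing."""
    scene = Path(scene_dir)
    missing = [i for i in range(num_frames) if not (scene / f"colmap_{i}").is_dir()]
    if missing:
        print(f"{scene.name}: missing {len(missing)} frame folders, e.g. colmap_{missing[0]}")
    return not missing

# e.g. check_n3d_scene("/home/Neural3D/cook_spinach")
```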
### Technicolor Dataset
Please reach out to the authors of the paper "Dataset and Pipeline for Multi-View Light-Field Video" for access to the Technicolor dataset. </br>
Our codebase expects the following directory structure for this dataset before preprocessing:
```
<location>
|---Fabien
| |---Fabien_undist_<00257>_<08>.png
| |---Fabien_undist_<.....>_<..>.png
|---Birthday
```
Then run the following command to preprocess the dataset. </br>
```
conda activate colmapenv
python script/pre_technicolor.py --videopath <location>/<scene>
```
### Google Immersive Dataset
Download the dataset from [here](https://github.com/augmentedperception/deepview_video_dataset).
After downloading and unzipping the dataset, you can run the following command to preprocess the dataset. </br>
```
conda activate colmapenv
python script/pre_immersive_distorted.py --videopath <location>/<scene>
python script/pre_immersive_undistorted.py --videopath <location>/<scene>
```
```<location>``` is the path to the dataset root folder, and ```<scene>``` is the name of a scene in the dataset. Please rename the original files to match the name list ```Immersiveseven``` in [here](./script/pre_immersive_distorted.py) (a hypothetical rename sketch is shown after the example below).
- For example, if you put the dataset at ```/home/immersive``` and want to preprocess the ```02_Flames``` scene, you can run the following command:
```
conda activate colmapenv
python script/pre_immersive_distorted.py --videopath /home/immersive/02_Flames/
```
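If your downloaded folders are named differently, a hypothetical rename helper could look like the sketch below. The keys in ```rename_map``` are placeholders; the authoritative target names are in the ```Immersiveseven``` list in ```script/pre_immersive_distorted.py```.
```
from pathlib import Path

# Hypothetical mapping from downloaded folder names to the scene names
# expected by the Immersiveseven list in script/pre_immersive_distorted.py.
rename_map = {
    "<downloaded_flames_folder>": "02_Flames",
    "<downloaded_alexa_folder>": "09_Alexa",
}

def rename_scenes(root: str) -> None:
    """Rename downloaded immersive scene folders to the expected names."""
    for old, new in rename_map.items():
        src, dst = Path(root) / old, Path(root) / new
        if src.is_dir() and not dst.exists():
            src.rename(dst)

# e.g. rename_scenes("/home/immersive")
```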
1. Our codebase expects the following directory structure for the immersive dataset before preprocessing:
```
<location>
|---02_Flames
| |---camera_0001.mp4
| |---camera_0002.mp4
|---09_Alexa
```
2. Our codebase expects the following directory structure for the immersive dataset (raw videos, decoded images, distorted and undistorted) after preprocessing:
```
<location>
|---02_Flames
| |---camera_0001
| |---camera_0001.mp4
| |---camera_<...>
|---02_Flames_dist
| |---colmap_<0>
| |---colmap_<...>
| |---colmap_<299>
|---02_Flames_undist
| |---colmap_<0>
| |---colmap_<...>
| |---colmap_<299>
|---09_Alexa
|---09_Alexa_dist
|---09_Alexa_undist
```
3. Copy the picked-views file to the scene directory. The views are generated by running inference with our model initialized with ```duration=1``` points, without any training. We provide the generated views as ```.pkl``` files for reproducibility and simplicity.
- For example, for the scene ```09_Alexa``` with the distorted camera model, copy ```configs/im_view/09_Alexa/pickview.pkl``` to ```<location>/09_Alexa_dist/pickview.pkl```, as sketched below.
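A minimal sketch of that copy step in Python, using the paths from the example above (the repository root path is a placeholder):
```
import shutil
from pathlib import Path

def copy_pick_view(repo_root: str, location: str, scene: str = "09_Alexa") -> None:
    """Copy the provided picked-views file into the preprocessed (distorted) scene folder."""
    src = Path(repo_root) / "configs" / "im_view" / scene / "pickview.pkl"
    dst = Path(location) / f"{scene}_dist" / "pickview.pkl"
    shutil.copy(src, dst)

# e.g. copy_pick_view("/home/SpacetimeGaussians", "/home/immersive")
```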
## Training
You can train our model by running the following command: </br>
```
conda activate feature_splatting
python train.py --quiet --eval --config configs/<dataset>_<lite|full>/<scene>.json --model_path <path to save model> --source_path <location>/<scene>/colmap_0
```
In the argument to ```--config```, ```<dataset>``` can be ```n3d``` (for the Neural 3D Dataset) or ```techni``` (for the Technicolor Dataset), and you can choose between the ```full``` and ```lite``` models. </br>
You need 24GB GPU memory to train on the Neural 3D Dataset. </br>
You need 48GB GPU memory to train on the Technicolor Dataset. </br>
The large memory requirement is because training images are loaded into GPU memory. </br>
- For example, if you want to train the **lite** model on the first 50 frames of the ```cook_spinach``` scene in the Neural 3D Dataset, you can run the following command </br>
```
python train.py --quiet --eval --config configs/n3d_lite/cook_spinach.json --model_path log/cook_spinach_lite --source_path <location>/cook_spinach/colmap_0
```
- If you want to train the **full** model, you can run the following command </br>
```
python train.py --quiet --eval --config configs/n3d_full/cook_spinach.json --model_path log/cook_spinach/colmap_0 --source_path <location>/cook_spinach/colmap_0
```
Please refer to the .json config files for more options.
- If you want to train the **full** model with **distorted** immersive dataset, you can run the following command </br>
```
PYTHONDONTWRITEBYTECODE=1 python train_imdist.py --quiet --eval --config configs/im_distort_full/02_Flames.json --model_path log/02_Flames/colmap_0 --source_path <location>/02_Flames_dist/colmap_0
```
Note: stale ```__pycache__``` files can sometimes affect the results. Please remove all ```__pycache__``` files and retrain the model, setting ```PYTHONDONTWRITEBYTECODE=1``` so that no bytecode is generated.
- If you want to train the **lite** model on the **undistorted** immersive dataset, run the following command.
Note that we remove ```--eval``` to reuse the Technicolor loader and to train with all cameras. ```--gtmask 1``` is specifically for training with undistorted fisheye images that contain black pixels (see the illustrative mask sketch at the end of this section).
```
python train.py --quiet --gtmask 1 --config configs/im_undistort_lite/02_Flames.json --model_path log/02_Flames/colmap_0 --source_path <location>/02_Flames_undist/colmap_0
```
Please refer to the .json config files for more options.
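For intuition on the ```--gtmask 1``` option mentioned above, the sketch below shows one way a validity mask could be derived from the black border pixels of an undistorted fisheye image. This is only illustrative and may differ from the codebase's actual masking logic.
```
import numpy as np

def valid_pixel_mask(gt_image: np.ndarray, threshold: int = 0) -> np.ndarray:
    """Return a boolean HxW mask that is True for valid (non-black) pixels.

    gt_image is an HxWx3 uint8 ground-truth image; pixels whose channels are
    all <= threshold (the black border left by fisheye undistortion) are
    masked out so a photometric loss can ignore them.
    """
    return gt_image.max(axis=-1) > threshold

# Illustrative use: zero out the loss on black border pixels.
# mask = valid_pixel_mask(gt)[..., None].astype(np.float32)
# loss = (mask * (render - gt.astype(np.float32)) ** 2).mean()
```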
## Testing
- Test model on Neural 3D Dataset
```
python test.py --quiet --eval --skip_train --valloader colmapvalid --configpath configs/n3d_<lite|full>/<scene>.json --model_path <path to model> --source_path <location>/<scene>/colmap_0
```
- Test model on Technicolor Dataset
```
python test.py --quiet --eval --skip_train --valloader technicolorvalid --configpath configs/techni_<lite|full>/<scene>.json --model_path <path to model> --source_path <location>/<scenename>/colmap_0
```
- Test on the Google Immersive Dataset with the distorted camera model
First, install the fused MLP layer:
```
pip install thirdparty/gaussian_splatting/submodules/forward_full
```
```
PYTHONDONTWRITEBYTECODE=1 CUDA_VISIBLE_DEVICES=0 python test.py --quiet --eval --skip_train --valloader immersivevalidss --configpath configs/im_distort_<lite|full>/<scene>.json --model_path <path to model> --source_path <location>/<scenename>/colmap_0
```
## Real-Time Viewer
The viewer is based on [SIBR](https://sibr.gitlabpages.inria.fr/) and [Gaussian Splatting](https://github.com/graphdeco-inria/gaussian-splatting).
### Pre-built Windows Binary
Download the viewer binary from [this link](https://huggingface.co/stack93/spacetimegaussians/tree/main) and unzip it. The binary works for Windows with CUDA >= 11.0.
We also provide pre-trained models in the link. For example, [n3d_sear_steak_lite_allcam.zip](https://huggingface.co/stack93/spacetimegaussians/blob/main/n3d_sear_steak_lite_allcam.zip) contains the lite model that uses all views during training for the sear_steak scene in the Neural 3D Dataset.
### Installation from Source
Please see the commented text at the bottom of [script/setup.sh](./script/setup.sh).
### Running the Real-Time Viewer
After downloading the pre-built binary or installing from source, you can use the following command to run the real-time viewer. Adjust ```--iteration``` to match the training iterations of the model. </br>
```
./<SIBR install dir>/bin/SIBR_gaussianViewer_app_rwdi.exe --iteration 25000 -m <path to trained model>
```
The above command has been tested on an Nvidia RTX 3050 Laptop GPU + Windows 10.
- For 8K rendering, you can use the following command. </br>
```
./<SIBR install dir>/bin/SIBR_gaussianViewer_app_rwdi.exe --iteration 25000 --rendering-size 8000 4000 --force-aspect-ratio 1 -m <path to trained model>
```
8K rendering has been tested on Nvidia RTX 4090 + Windows 11.
### Third Party Implemented Web Viewer
We thank Kevin Kwok (antimatter15) for the amazing web viewer of our method, splaTV. The web viewer is released on [GitHub](https://github.com/antimatter15/splaTV).
You can view one of our scenes in the [web viewer](http://antimatter15.com/splaTV/).
## Create Your New Representations and Rendering Pipeline
If you want to customize our codebase for your own models, you can follow the steps below. </br>
- Step 1: Create a new Gaussian representation in this [folder](./thirdparty/gaussian_splatting/scene/). You can use ```oursfull.py``` or ```ourslite.py``` as a template. </br>
- Step 2: Create a new rendering pipeline in this [file](./thirdparty/gaussian_splatting/renderer/__init__.py). You can use the ```train_ours_full``` function as a template. </br>
- Step 3 (For new dataset, optional): Create a new dataloader in this [file](./thirdparty/gaussian_splatting/scene/__init__.py) and this [file](./thirdparty/gaussian_splatting/scene/dataset_readers.py). </br>
- Step 4: Update the intermediate API in the ```getmodel``` (for Step 1) and ```getrenderpip``` (for Step 2) functions in ```helper_train.py```; a rough dispatch sketch is shown below.</br>
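For example, the dispatch could be extended roughly as in the sketch below. The option strings, class names, and exact signatures are placeholders; check the real ```getmodel``` and ```getrenderpip``` implementations in ```helper_train.py```.
```
# Sketch of registering a new representation and renderer in helper_train.py.
# "my_new_model" and the imported class/function names are placeholders.

def getmodel(model: str):
    if model == "ours_full":
        from thirdparty.gaussian_splatting.scene.oursfull import GaussianModel
    elif model == "ours_lite":
        from thirdparty.gaussian_splatting.scene.ourslite import GaussianModel
    elif model == "my_new_model":  # Step 1: your new representation
        from thirdparty.gaussian_splatting.scene.my_new_model import GaussianModel
    else:
        raise ValueError(f"unknown model: {model}")
    return GaussianModel

def getrenderpip(option: str):
    if option == "train_ours_full":
        from thirdparty.gaussian_splatting.renderer import train_ours_full as render_fn
    elif option == "train_my_new_model":  # Step 2: your new rendering pipeline
        from thirdparty.gaussian_splatting.renderer import train_my_new_model as render_fn
    else:
        raise ValueError(f"unknown rendering pipeline: {option}")
    return render_fn
```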
## License Information
The code in this repository (except the thirdparty folder) is licensed under the MIT license, see [LICENSE](LICENSE). thirdparty/gaussian_splatting is licensed under the Gaussian-Splatting license, see [thirdparty/gaussian_splatting/LICENSE.md](thirdparty/gaussian_splatting/LICENSE.md). thirdparty/colmap is licensed under the new BSD license, see [thirdparty/colmap/LICENSE.txt](thirdparty/colmap/LICENSE.txt).
## Acknowledgement
We sincerely thank the owners of the following source code repositories, which are used by our released code:
[Gaussian Splatting](https://github.com/graphdeco-inria/gaussian-splatting),
[Colmap](https://github.com/colmap/colmap).
Some parts of our code referenced the following repos:
[Gaussian Splatting with Depth](https://github.com/JonathonLuiten/diff-gaussian-rasterization-w-depth),
[SIBR](https://gitlab.inria.fr/sibr/sibr_core.git),
[fisheye-distortion](https://github.com/Synthesis-AI-Dev/fisheye-distortion).
We sincerely thank the anonymous reviewers of CVPR 2024 for their helpful feedback.
We also thank Michael Rubloff for his post on [radiancefields](https://radiancefields.com/splatv-dynamic-gaussian-splatting-viewer/).
We also want to thank MrNeRF for [posts](https://x.com/janusch_patas/status/1740621964480217113?s=20) about our paper and other Gaussian Splatting based papers.
## Citations
Please cite our paper if you find it useful for your research.
```
@InProceedings{Li_STG_2024_CVPR,
author = {Li, Zhan and Chen, Zhang and Li, Zhong and Xu, Yi},
title = {Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {8508-8520}
}
```
Please also cite the following paper if you use Gaussian Splatting.
```
@Article{kerbl3Dgaussians,
author = {Kerbl, Bernhard and Kopanas, Georgios and Leimk{\"u}hler, Thomas and Drettakis, George},
title = {3D Gaussian Splatting for Real-Time Radiance Field Rendering},
journal = {ACM Transactions on Graphics},
number = {4},
volume = {42},
month = {July},
year = {2023},
url = {https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/}
}
```