# Tracking Everything Everywhere All at Once
PyTorch implementation of the paper [Tracking Everything Everywhere All at Once](https://omnimotion.github.io/), ICCV 2023.
[Qianqian Wang](https://www.cs.cornell.edu/~qqw/) <sup>1,2</sup>,
[Yen-Yu Chang](https://yuyuchang.github.io/) <sup>1</sup>,
[Ruojin Cai](https://www.cs.cornell.edu/~ruojin/) <sup>1</sup>,
[Zhengqi Li](https://zhengqili.github.io/) <sup>2</sup>,
[Bharath Hariharan](https://www.cs.cornell.edu/~bharathh/) <sup>1</sup>,
[Aleksander Holynski](https://holynski.org/) <sup>2,3</sup>,
[Noah Snavely](https://www.cs.cornell.edu/~snavely/) <sup>1,2</sup>
<br>
<sup>1</sup>Cornell University, <sup>2</sup>Google Research, <sup>3</sup>UC Berkeley
#### [Project Page](https://omnimotion.github.io/) | [Paper](https://arxiv.org/pdf/2306.05422.pdf) | [Video](https://www.youtube.com/watch?v=KHoAG3gA024)
## Installation
The code is tested with `python=3.8` and `torch=1.10.0+cu111` on an A100 GPU.
```
git clone --recurse-submodules https://github.com/qianqianwang68/omnimotion/
cd omnimotion/
conda create -n omnimotion python=3.8
conda activate omnimotion
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install matplotlib tensorboard scipy opencv-python tqdm tensorboardX configargparse ipdb kornia imageio[ffmpeg]
```
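After installation, a quick sanity check can confirm that the CUDA build of PyTorch sees your GPU (this step is optional and not part of the original setup):
```
# Should print the torch version and "True" if the GPU is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```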
## Training
1. Please refer to the [preprocessing instructions](preprocessing/README.md) for preparing input data
for training OmniMotion. We also provide some processed [data](https://omnimotion.cs.cornell.edu/dataset/)
that you can download, unzip, and directly train on; an end-to-end example is shown at the end of this section.
(Note that depending on your network speed, it may be faster to run the processing script locally than to download the processed data.)
2. With processed input data, run the following command to start training:
```
python train.py --config configs/default.txt --data_dir {sequence_directory}
```
You can view visualizations in TensorBoard by running `tensorboard --logdir logs/`.
By default, the script trains for 100k iterations, which takes 8-9 hours on an A100 GPU and 12-13 hours on an RTX 4090.
If you want to skip the optimization and see what the results/formats look like, we provide the weights
for a few sequences [here](https://drive.google.com/drive/folders/16ekLy-4LTkYAavYrWaKk2qUpJ9TyMXlO?usp=sharing).
You can use `viz.py` to visualize the correspondences produced by the models. Please refer to the next section for more details.
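For reference, an end-to-end run on one of the provided processed sequences might look like the following sketch. The archive name `butterfly.zip` and the `data/` destination are assumptions; use whatever filename the dataset page provides and any directory you like.
```
# Download a processed sequence from https://omnimotion.cs.cornell.edu/dataset/
# (archive name assumed), unzip it, and start training.
unzip butterfly.zip -d data/
python train.py --config configs/default.txt --data_dir data/butterfly
# In a separate terminal, monitor training progress:
tensorboard --logdir logs/
```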
## Visualization
The training pipeline generates visualizations (correspondences, pseudo-depth maps, etc.) at regular intervals during training (saved in `args.out_dir/vis`).
You can also visualize grid points / trails after training by running:
```
python viz.py --config configs/default.txt --data_dir {sequence_directory}
```
Make sure `expname` and `data_dir` are correctly specified so that the
model and data can be loaded. When `expname` is specified, the latest checkpoint matching that `expname`
will be loaded. Alternatively, you can specify `ckpt_path` to select a particular checkpoint.
To generate the motion trail visualization, a foreground/background segmentation mask is required.
For DAVIS videos you can simply use the mask annotations provided by the dataset. For custom videos that don't come with
foreground segmentation masks, you can use [remove.bg](https://www.remove.bg/) to remove the background
of the query frame, download the masked image, and set `foreground_mask_path` to its path.
[Here](https://omnimotion.cs.cornell.edu/dataset/mask_0.png) is an example of the masked image for the first frame
of the `butterfly` sequence.
```
python viz.py --config configs/default.txt --data_dir {sequence_directory} --foreground_mask_path {mask_file_path}
```
If you download the provided model weights for a sequence from [here](https://drive.google.com/drive/folders/16ekLy-4LTkYAavYrWaKk2qUpJ9TyMXlO?usp=sharing),
you can visualize the correspondences by running the `viz.py` script and
setting `data_dir` to the unzipped directory, `ckpt_path` to the path for
`model_100000.pth` in the directory, and optionally
`foreground_mask_path` as the path to `mask_0.png`
(only required for non-DAVIS sequences `butterfly`, `kangaroo`, and `swing_tire` if you want to visualize their motion trails).
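Concretely, the command might look like this, where `{unzipped_sequence_directory}` and `{mask_file_path}` are placeholders for the paths on your machine:
```
python viz.py --config configs/default.txt \
    --data_dir {unzipped_sequence_directory} \
    --ckpt_path {unzipped_sequence_directory}/model_100000.pth \
    --foreground_mask_path {mask_file_path}
```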
## Troubleshooting
- The training code uses approximately 22GB of CUDA memory. If you encounter CUDA out-of-memory errors,
you may consider reducing the number of sampled points `num_pts` and the chunk size `chunk_size` (see the example command after this list).
- Due to the highly non-convex nature of the underlying optimization problem, we observe that the optimization process
can be sensitive to initialization for certain difficult videos. If significant inaccuracies in surface
ordering (visible in the pseudo-depth maps) persist after 40k steps,
it is very unlikely that training will recover from them. You may consider restarting the training with a
different `loader_seed` to change the initialization.
If surfaces are incorrectly placed at the nearest depth planes (when they are not supposed to be the closest),
we found that using `mask_near` to disable near samples at the beginning of training can help in some cases.
- Another common failure mode we noticed is that instead of reconstructing a single object in the canonical space with
the correct motion, the method creates duplicated objects in the canonical space, each with short-range motion.
This stems from the input correspondences on the object being sparse and short-ranged,
combined with the optimization getting stuck in a local minimum. The issue may be alleviated with better and longer-range input correspondences,
such as those from [TAPIR](https://deepmind-tapir.github.io/) and [CoTracker](https://co-tracker.github.io/).
Alternatively, you may consider adjusting `loader_seed` or the learning rates.
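The parameters mentioned above (`num_pts`, `chunk_size`, `loader_seed`, `mask_near`) live in the config file; since the training script reads its options with configargparse, they can likely also be overridden on the command line. A sketch, assuming the flag names match the config keys (if an override is not accepted, set the same keys in `configs/default.txt` instead):
```
# Assumed CLI overrides (configargparse convention); values are illustrative.
# Reduce memory usage:
python train.py --config configs/default.txt --data_dir {sequence_directory} \
    --num_pts 128 --chunk_size 8192
# Restart with a different initialization:
python train.py --config configs/default.txt --data_dir {sequence_directory} \
    --loader_seed 1
```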
## Citation
```
@article{wang2023omnimotion,
title = {Tracking Everything Everywhere All at Once},
author = {Wang, Qianqian and Chang, Yen-Yu and Cai, Ruojin and Li, Zhengqi and Hariharan, Bharath and Holynski, Aleksander and Snavely, Noah},
journal = {ICCV},
year = {2023}
}
```
", Assign "at most 3 tags" to the expected json: {"id":"3107","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"