Trendshift - Ask AI

base on [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation <div align="center"> <h1>Depth Anything V2</h1> [**Lihe Yang**](https://liheyoung.github.io/)<sup>1</sup> · [**Bingyi Kang**](https://bingykang.github.io/)<sup>2&dagger;</sup> · [**Zilong Huang**](http://speedinghzl.github.io/)<sup>2</sup> <br> [**Zhen Zhao**](http://zhaozhen.me/) · [**Xiaogang Xu**](https://xiaogang00.github.io/) · [**Jiashi Feng**](https://sites.google.com/site/jshfeng/)<sup>2</sup> · [**Hengshuang Zhao**](https://hszhao.github.io/)<sup>1*</sup> <sup>1</sup>HKU&emsp;&emsp;&emsp;<sup>2</sup>TikTok <br> &dagger;project lead&emsp;*corresponding author <a href="https://arxiv.org/abs/2406.09414"><img src='https://img.shields.io/badge/arXiv-Depth Anything V2-red' alt='Paper PDF'></a> <a href='https://depth-anything-v2.github.io'><img src='https://img.shields.io/badge/Project_Page-Depth Anything V2-green' alt='Project Page'></a> <a href='https://huggingface.co/spaces/depth-anything/Depth-Anything-V2'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'></a> <a href='https://huggingface.co/datasets/depth-anything/DA-2K'><img src='https://img.shields.io/badge/Benchmark-DA--2K-yellow' alt='Benchmark'></a> </div> This work presents Depth Anything V2. It significantly outperforms [V1](https://github.com/LiheYoung/Depth-Anything) in fine-grained details and robustness. Compared with SD-based models, it enjoys faster inference speed, fewer parameters, and higher depth accuracy. ![teaser](assets/teaser.png) ## News - **2025-01-22:** [Video Depth Anything](https://videodepthanything.github.io) has been released. It generates consistent depth maps for super-long videos (e.g., over 5 minutes). - **2024-12-22:** [Prompt Depth Anything](https://promptda.github.io/) has been released. It supports 4K resolution metric depth estimation when low-res LiDAR is used to prompt the DA models. - **2024-07-06:** Depth Anything V2 is supported in [Transformers](https://github.com/huggingface/transformers/). See the [instructions](https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything_v2) for convenient usage. - **2024-06-25:** Depth Anything is integrated into [Apple Core ML Models](https://developer.apple.com/machine-learning/models/). See the instructions ([V1](https://huggingface.co/apple/coreml-depth-anything-small), [V2](https://huggingface.co/apple/coreml-depth-anything-v2-small)) for usage. - **2024-06-22:** We release [smaller metric depth models](https://github.com/DepthAnything/Depth-Anything-V2/tree/main/metric_depth#pre-trained-models) based on Depth-Anything-V2-Small and Base. - **2024-06-20:** Our repository and project page are flagged by GitHub and removed from the public for 6 days. Sorry for the inconvenience. - **2024-06-14:** Paper, project page, code, models, demo, and benchmark are all released. ## Pre-trained Models We provide **four models** of varying scales for robust relative depth estimation: | Model | Params | Checkpoint | |:-|-:|:-:| | Depth-Anything-V2-Small | 24.8M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth?download=true) | | Depth-Anything-V2-Base | 97.5M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Base/resolve/main/depth_anything_v2_vitb.pth?download=true) | | Depth-Anything-V2-Large | 335.3M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true) | | Depth-Anything-V2-Giant | 1.3B | Coming soon | ## Usage ### Prepraration ```bash git clone https://github.com/DepthAnything/Depth-Anything-V2 cd Depth-Anything-V2 pip install -r requirements.txt ``` Download the checkpoints listed [here](#pre-trained-models) and put them under the `checkpoints` directory. ### Use our models ```python import cv2 import torch from depth_anything_v2.dpt import DepthAnythingV2 DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu' model_configs = { 'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]}, 'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]}, 'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]}, 'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]} } encoder = 'vitl' # or 'vits', 'vitb', 'vitg' model = DepthAnythingV2(**model_configs[encoder]) model.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_{encoder}.pth', map_location='cpu')) model = model.to(DEVICE).eval() raw_img = cv2.imread('your/image/path') depth = model.infer_image(raw_img) # HxW raw depth map in numpy ``` If you do not want to clone this repository, you can also load our models through [Transformers](https://github.com/huggingface/transformers/). Below is a simple code snippet. Please refer to the [official page](https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything_v2) for more details. - Note 1: Make sure you can connect to Hugging Face and have installed the latest Transformers. - Note 2: Due to the [upsampling difference](https://github.com/huggingface/transformers/pull/31522#issuecomment-2184123463) between OpenCV (we used) and Pillow (HF used), predictions may differ slightly. So you are more recommended to use our models through the way introduced above. ```python from transformers import pipeline from PIL import Image pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf") image = Image.open('your/image/path') depth = pipe(image)["depth"] ``` ### Running script on *images* ```bash python run.py \ --encoder <vits | vitb | vitl | vitg> \ --img-path <path> --outdir <outdir> \ [--input-size <size>] [--pred-only] [--grayscale] ``` Options: - `--img-path`: You can either 1) point it to an image directory storing all interested images, 2) point it to a single image, or 3) point it to a text file storing all image paths. - `--input-size` (optional): By default, we use input size `518` for model inference. ***You can increase the size for even more fine-grained results.*** - `--pred-only` (optional): Only save the predicted depth map, without raw image. - `--grayscale` (optional): Save the grayscale depth map, without applying color palette. For example: ```bash python run.py --encoder vitl --img-path assets/examples --outdir depth_vis ``` ### Running script on *videos* ```bash python run_video.py \ --encoder <vits | vitb | vitl | vitg> \ --video-path assets/examples_video --outdir video_depth_vis \ [--input-size <size>] [--pred-only] [--grayscale] ``` ***Our larger model has better temporal consistency on videos.*** ### Gradio demo To use our gradio demo locally: ```bash python app.py ``` You can also try our [online demo](https://huggingface.co/spaces/Depth-Anything/Depth-Anything-V2). ***Note: Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this [issue](https://github.com/LiheYoung/Depth-Anything/issues/81)).*** In V1, we *unintentionally* used features from the last four layers of DINOv2 for decoding. In V2, we use [intermediate features](https://github.com/DepthAnything/Depth-Anything-V2/blob/2cbc36a8ce2cec41d38ee51153f112e87c8e42d8/depth_anything_v2/dpt.py#L164-L169) instead. Although this modification did not improve details or accuracy, we decided to follow this common practice. ## Fine-tuned to Metric Depth Estimation Please refer to [metric depth estimation](./metric_depth). ## DA-2K Evaluation Benchmark Please refer to [DA-2K benchmark](./DA-2K.md). ## Community Support **We sincerely appreciate all the community support for our Depth Anything series. Thank you a lot!** - Apple Core ML: - https://developer.apple.com/machine-learning/models - https://huggingface.co/apple/coreml-depth-anything-v2-small - https://huggingface.co/apple/coreml-depth-anything-small - Transformers: - https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything_v2 - https://huggingface.co/docs/transformers/main/en/model_doc/depth_anything - TensorRT: - https://github.com/spacewalk01/depth-anything-tensorrt - https://github.com/zhujiajian98/Depth-Anythingv2-TensorRT-python - ONNX: https://github.com/fabio-sim/Depth-Anything-ONNX - ComfyUI: https://github.com/kijai/ComfyUI-DepthAnythingV2 - Transformers.js (real-time depth in web): https://huggingface.co/spaces/Xenova/webgpu-realtime-depth-estimation - Android: - https://github.com/shubham0204/Depth-Anything-Android - https://github.com/FeiGeChuanShu/ncnn-android-depth_anything ## Acknowledgement We are sincerely grateful to the awesome Hugging Face team ([@Pedro Cuenca](https://huggingface.co/pcuenq), [@Niels Rogge](https://huggingface.co/nielsr), [@Merve Noyan](https://huggingface.co/merve), [@Amy Roberts](https://huggingface.co/amyeroberts), et al.) for their huge efforts in supporting our models in Transformers and Apple Core ML. We also thank the [DINOv2](https://github.com/facebookresearch/dinov2) team for contributing such impressive models to our community. ## LICENSE Depth-Anything-V2-Small model is under the Apache-2.0 license. Depth-Anything-V2-Base/Large/Giant models are under the CC-BY-NC-4.0 license. ## Citation If you find this project useful, please consider citing: ```bibtex @article{depth_anything_v2, title={Depth Anything V2}, author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang}, journal={arXiv:2406.09414}, year={2024} } @inproceedings{depth_anything_v1, title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang}, booktitle={CVPR}, year={2024} } ``` ", Assign "at most 3 tags" to the expected json: {"id":"12804","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"

AI prompts