
[ECCV 2024] The official code of the paper "Open-Vocabulary SAM".

# Open-Vocabulary SAM [ECCV-2024]

[Haobo Yuan<sup>1</sup>](https://yuanhaobo.me),
[Xiangtai Li<sup>1</sup>](https://lxtgh.github.io),
[Chong Zhou<sup>1</sup>](https://chongzhou96.github.io),
[Yining Li<sup>2</sup>](https://scholar.google.com/citations?user=y_cp1sUAAAAJ),
[Kai Chen<sup>2</sup>](https://chenkai.site),
[Chen Change Loy<sup>1</sup>](https://www.mmlab-ntu.com/person/ccloy/).

[<sup>1</sup>S-Lab, Nanyang Technological University](https://www.mmlab-ntu.com/),
[<sup>2</sup>Shanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/)

[![arXiv](https://img.shields.io/badge/arXiv-2401.02955-b31b1b.svg)](https://arxiv.org/abs/2401.02955)
[![Project Page](https://img.shields.io/badge/OVSAM-Project%20Page-green)](https://www.mmlab-ntu.com/project/ovsam)
[![HuggingFace Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-App-blue)](https://huggingface.co/spaces/HarborYuan/ovsam)
[![Open in OpenXLab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/houshaowei/Open-Vocabulary_SAM)

# RWKV-SAM

[arXiv](https://arxiv.org/abs/2406.19369)

[Haobo Yuan<sup>1</sup>](https://yuanhaobo.me),
[Xiangtai Li<sup>2,1</sup>](https://lxtgh.github.io),
[Tao Zhang<sup>2</sup>](https://zhang-tao-whu.github.io/),
[Lu Qi<sup>3</sup>](http://luqi.info/),
[Ming-Hsuan Yang<sup>3</sup>](http://faculty.ucmerced.edu/mhyang/),
[Shuicheng Yan<sup>2</sup>](https://yanshuicheng.info/),
[Chen Change Loy<sup>1</sup>](https://www.mmlab-ntu.com/person/ccloy/).

[<sup>1</sup>S-Lab, Nanyang Technological University](https://www.mmlab-ntu.com/),
<sup>2</sup>SkyworkAI,
<sup>3</sup>UC Merced

## 📰 News

* **` Jul. 2, 2024`:** Open-Vocabulary SAM has been accepted by [ECCV 2024](https://eccv2024.ecva.net).
* **` Jun. 27, 2024`:** Release RWKV-SAM code and model [Paper](https://arxiv.org/abs/2406.19369). Please check out the [folder](https://github.com/HarborYuan/ovsam/tree/main/projects/rwkvsam).

## 👀 Overview

We introduce the Open-Vocabulary SAM, a SAM-inspired model designed for simultaneous interactive segmentation and recognition, leveraging two unique knowledge transfer modules: SAM2CLIP and CLIP2SAM. The former adapts SAM's knowledge into the CLIP via distillation and learnable transformer adapters, while the latter transfers CLIP knowledge into SAM, enhancing its recognition capabilities.

<img src="https://www.mmlab-ntu.com/project/ovsam/img/ovsam_teaser.jpg" alt="OVSAM overview">

## 🔧Usage

To play with Open-Vocabulary SAM, you can:

1. Try the online demo on the [🤗Hugging Face Space](https://huggingface.co/spaces/HarborYuan/ovsam). Thanks for the generous support of the Hugging Face team.
2. Run the gradio demo locally by cloning and running the [repo](https://huggingface.co/spaces/HarborYuan/ovsam/tree/main) on 🤗Hugging Face:

   ```commandline
   git lfs install
   git clone https://huggingface.co/spaces/HarborYuan/ovsam ovsam_demo
   cd ovsam_demo
   conda create -n ovsam_demo python=3.10 && conda activate ovsam_demo
   python -m pip install gradio==4.7.1
   python -m pip install -r requirements.txt
   python main.py
   ```

3. Try to train or evaluate in this repo following the instructions below.

## ⚙️ Installation

We use conda to manage the environment.

PyTorch installation:

```commandline
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 cuda -c pytorch -c "nvidia/label/cuda-12.1.0" -c "nvidia/label/cuda-12.1.1"
```

mmengine installation:

```commandline
python -m pip install https://github.com/open-mmlab/mmengine/archive/refs/tags/v0.8.5.zip
```

mmcv installation (note that mmcv versions older than this commit may cause bugs):

```commandline
TORCH_CUDA_ARCH_LIST="{COMCAP}" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" CUDA_HOME=$(dirname $(dirname $(which nvcc))) LD_LIBRARY_PATH=$(dirname $(dirname $(which nvcc)))/lib MMCV_WITH_OPS=1 FORCE_CUDA=1 python -m pip install git+https://github.com/open-mmlab/mmcv.git@4f65f91db6502d990ce2ee5de0337441fb69dd10
```

Replace `{COMCAP}` with the compute capability of your GPU. You can ask ChatGPT to get it:

```text
What is the `Compute Capability` of NVIDIA {YOUR GPU MODEL}? Please only output the number, without text.
```

Other OpenMMLab packages:

```commandline
python -m pip install \
https://github.com/open-mmlab/mmdetection/archive/refs/tags/v3.1.0.zip \
https://github.com/open-mmlab/mmsegmentation/archive/refs/tags/v1.1.1.zip \
https://github.com/open-mmlab/mmpretrain/archive/refs/tags/v1.0.1.zip
```

Extra packages:

```commandline
python -m pip install git+https://github.com/cocodataset/panopticapi.git \
git+https://github.com/HarborYuan/lvis-api.git \
tqdm terminaltables pycocotools scipy tqdm ftfy regex timm scikit-image kornia
```

## 📈 Datasets

Datasets should be put in the `data/` folder of this project, similar to [mmdet](https://mmdetection.readthedocs.io/en/latest/user_guides/tracking_dataset_prepare.html). Please prepare the datasets in the following format.

### COCO dataset

```text
├── coco
│   ├── annotations
│   │   ├── panoptic_{train,val}2017.json
│   │   ├── instance_{train,val}2017.json
│   ├── train2017
│   ├── val2017
│   ├── panoptic_{train,val}2017/  # png annotations
```

### SAM dataset

```text
├── sam
│   ├── train.txt
│   ├── val.txt
│   ├── sa_000020
│   │   ├── sa_223750.jpg
│   │   ├── sa_223750.json
│   │   ├── ...
│   ├── ...
```

`train.txt` and `val.txt` should contain all the folders you need:

```text
sa_000020
sa_000021
...
```

## 🚀 Training

Please extract the language embeddings first.

```commandline
bash tools/dist.sh gen_cls seg/configs/ovsam/ovsam_coco_rn50x16_point.py 8
```

### SAM2CLIP

SAM feature extraction:

```commandline
bash tools/dist.sh test seg/configs/sam2clip/sam_vith_dump.py 8
```

SAM2CLIP training:

```commandline
bash tools/dist.sh train seg/configs/sam2clip/sam2clip_vith_rn50x16.py 8
```

### CLIP2SAM

CLIP2SAM training:

```commandline
bash tools/dist.sh train seg/configs/clip2sam/clip2sam_coco_rn50x16.py 8
```

## 🏃‍♀️Inference

```commandline
bash tools/dist.sh test seg/configs/ovsam/ovsam_coco_rn50x16_point.py 8
```

Please refer to [🤗Hugging Face](https://huggingface.co/HarborYuan/ovsam_models) to get the pre-trained weights:

```commandline
git clone https://huggingface.co/HarborYuan/ovsam_models models
```

## RWKV-SAM

See [readme.md](./projects/rwkvsam/README.md) for the details.

## 📚 Citation

If you find our codebase and work useful for your research, please consider citing us:

```bibtex
@inproceedings{yuan2024ovsam,
    title={Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively},
    author={Yuan, Haobo and Li, Xiangtai and Zhou, Chong and Li, Yining and Chen, Kai and Loy, Chen Change},
    booktitle={ECCV},
    year={2024}
}

@article{yuan2024mamba,
    title={Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model},
    author={Yuan, Haobo and Li, Xiangtai and Qi, Lu and Zhang, Tao and Yang, Ming-Hsuan and Yan, Shuicheng and Loy, Chen Change},
    journal={arXiv preprint},
    year={2024}
}
```

## License <a name="license"></a>

This project is licensed under <a rel="license" href="https://github.com/HarborYuan/ovsam/blob/master/LICENSE">NTU S-Lab License 1.0</a>. Redistribution and use should follow this license.

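A note on the SAM dataset layout from the Datasets section: `train.txt` and `val.txt` list one folder name per line. As a minimal sketch (not part of the official repo), the Python helper below collects the `sa_*` folders under a dataset root and writes such a list. The function names `list_sam_folders` and `write_split`, and the choice to filter on the `sa_` prefix, are assumptions made for illustration.

```python
from pathlib import Path


def list_sam_folders(sam_root: Path, prefix: str = "sa_") -> list[str]:
    """Return the sorted names of sub-folders of `sam_root` starting with `prefix`."""
    return sorted(
        p.name
        for p in sam_root.iterdir()
        if p.is_dir() and p.name.startswith(prefix)
    )


def write_split(sam_root: Path, split_file: str, folders: list[str]) -> Path:
    """Write one folder name per line to `sam_root / split_file` and return its path."""
    out = sam_root / split_file
    out.write_text("\n".join(folders) + "\n")
    return out
```

With the layout above, `folders = list_sam_folders(Path("data/sam"))` followed by `write_split(Path("data/sam"), "train.txt", folders[:-1])` and `write_split(Path("data/sam"), "val.txt", folders[-1:])` would hold out the last folder for validation. That split is only an example; pick whichever folders your experiments need.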