BAAI

Image-to-Text
PhyscalX committed on
Commit
2e4d43b
1 parent: 666cd0a

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -11,10 +11,11 @@ pipeline_tag: image-to-text
 <sup>1</sup>[ICT-CAS](http://english.ict.cas.cn/), &nbsp; <sup>2</sup>[BAAI](https://www.baai.ac.cn/english.html)<br>
 <sup>*</sup> Equal Contribution, <sup>¶</sup>Project Lead
 
+[[`Paper`](https://arxiv.org/pdf/2312.09128.pdf)] [[`🤗 Demo`](https://huggingface.co/spaces/BAAI/tokenize-anything)]
+
 </div>
 
-We present **T**okenize **A**nything via **P**rompting, a unified and promptable model capable of simultaneously segmenting, recognizing, and captioning objects within arbitrary regions, only relaying on visual prompts (point, box and sketch). The model is trained with exhaustive segmentation
-masks sourced from SA-1B, coupled with semantic priors from a pre-trained EVA-CLIP with 5 billion parameters.
+We present **T**okenize **A**nything via **P**rompting, a unified and promptable model capable of simultaneously segmenting, recognizing, and captioning arbitrary regions, with flexible visual prompts (point, box and sketch). The model is trained with exhaustive segmentation masks sourced from SA-1B, coupled with semantic priors from a pre-trained EVA-CLIP with 5 billion parameters.
 
 ## Installation
 
@@ -50,11 +51,11 @@ Two versions of the model are available with different image encoders.
 @article{pan2023tap,
 title={Tokenize Anything via Prompting},
 author={Pan, Ting and Tang, Lulu and Wang, Xinlong and Shan, Shiguang},
-journal={arXiv preprint arXiv:2312.yyyyy},
+journal={arXiv preprint arXiv:2312.09128},
 year={2023}
 }
 ```
 
 ## Acknowledgement
 
-We thank the repositories: [SAM](https://github.com/facebookresearch/segment-anything), [EVA](https://github.com/baaivision/EVA), [LLaMA](https://github.com/facebookresearch/llama), [FlashAttention](https://github.com/Dao-AILab/flash-attention), [Gradio](https://github.com/gradio-app/gradio), [Detectron2](https://github.com/facebookresearch/detectron2) and [CodeWithGPU](https://github.com/seetacloud/codewithgpu).
+We thank the repositories: [SAM](https://github.com/facebookresearch/segment-anything), [EVA](https://github.com/baaivision/EVA), [LLaMA](https://github.com/facebookresearch/llama), [FlashAttention](https://github.com/Dao-AILab/flash-attention), [Gradio](https://github.com/gradio-app/gradio), [Detectron2](https://github.com/facebookresearch/detectron2) and [CodeWithGPU](https://github.com/seetacloud/codewithgpu).