BAAI

Image-to-Text
PhyscalX committed on
Commit
2e4d43b
1 parent: 666cd0a

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -11,10 +11,11 @@ pipeline_tag: image-to-text
 <sup>1</sup>[ICT-CAS](http://english.ict.cas.cn/), &nbsp; <sup>2</sup>[BAAI](https://www.baai.ac.cn/english.html)<br>
 <sup>*</sup> Equal Contribution, <sup>¶</sup>Project Lead
 
+[[`Paper`](https://arxiv.org/pdf/2312.09128.pdf)] [[`🤗 Demo`](https://huggingface.co/spaces/BAAI/tokenize-anything)]
+
 </div>
 
-We present **T**okenize **A**nything via **P**rompting, a unified and promptable model capable of simultaneously segmenting, recognizing, and captioning objects within arbitrary regions, only relaying on visual prompts (point, box and sketch). The model is trained with exhaustive segmentation
-masks sourced from SA-1B, coupled with semantic priors from a pre-trained EVA-CLIP with 5 billion parameters.
+We present **T**okenize **A**nything via **P**rompting, a unified and promptable model capable of simultaneously segmenting, recognizing, and captioning arbitrary regions, with flexible visual prompts (point, box and sketch). The model is trained with exhaustive segmentation masks sourced from SA-1B, coupled with semantic priors from a pre-trained EVA-CLIP with 5 billion parameters.
 
 ## Installation
 
@@ -50,11 +51,11 @@ Two versions of the model are available with different image encoders.
 @article{pan2023tap,
 title={Tokenize Anything via Prompting},
 author={Pan, Ting and Tang, Lulu and Wang, Xinlong and Shan, Shiguang},
-journal={arXiv preprint arXiv:2312.yyyyy},
+journal={arXiv preprint arXiv:2312.09128},
 year={2023}
 }
 ```
 
 ## Acknowledgement
 
-We thank the repositories: [SAM](https://github.com/facebookresearch/segment-anything), [EVA](https://github.com/baaivision/EVA), [LLaMA](https://github.com/facebookresearch/llama), [FlashAttention](https://github.com/Dao-AILab/flash-attention), [Gradio](https://github.com/gradio-app/gradio), [Detectron2](https://github.com/facebookresearch/detectron2) and [CodeWithGPU](https://github.com/seetacloud/codewithgpu).
+We thank the repositories: [SAM](https://github.com/facebookresearch/segment-anything), [EVA](https://github.com/baaivision/EVA), [LLaMA](https://github.com/facebookresearch/llama), [FlashAttention](https://github.com/Dao-AILab/flash-attention), [Gradio](https://github.com/gradio-app/gradio), [Detectron2](https://github.com/facebookresearch/detectron2) and [CodeWithGPU](https://github.com/seetacloud/codewithgpu).