xichen98cn committed on
Commit
05e7aff
1 Parent(s): 3dac99f

Update README.md

Files changed (1)
  1. README.md +11 -133
README.md CHANGED
@@ -1,133 +1,11 @@
- # FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation
-
- This repository is the official implementation of FrozenSeg introduced in the paper:
- >[**FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation**](https://arxiv.org/abs/2409.03525)
-
-
- ## Abstract
-
- >Open-vocabulary segmentation is challenging, with the need of segmenting and recognizing objects for an open set of categories in unconstrained environments. Building on the success of powerful vision-language (ViL) foundation models like CLIP, recent efforts sought to harness their zero-short capabilities to recognize unseen categories. Despite demonstrating strong performances, they still face a fundamental challenge of generating precise mask proposals for unseen categories and scenarios, resulting in inferior segmentation performance eventually. To address this, we introduce a novel approach, FrozenSeg, designed to integrate spatial knowledge from a localization foundation model (e.g., SAM) and semantic knowledge extracted from a ViL model (e.g., CLIP), in a synergistic framework. Taking the ViL model's visual encoder as the feature backbone, we inject the space-aware feature into learnable query and CLIP feature in the transformer decoder. In addition, we devise a mask proposal ensemble strategy for further improving the recall rate and mask quality. To fully exploit pre-trained knowledge while minimizing training overhead, we freeze both foundation models, focusing optimization efforts solely on a light transformer decoder for mask proposal generation – the performance bottleneck. Extensive experiments show that FrozenSeg advances state-of-the-art results across various segmentation benchmarks, trained exclusively on COCO panoptic data and tested in a zero-shot manner.
-
- ![FrozenSeg design](images/frozenseg.png)
-
- ## Dependencies and Installation
- See [installation instructions](INSTALL.md).
-
- ## Getting Started
- See [Preparing Datasets](datasets/README.md).
-
- See [Getting Started](GETTING_STARTED.md).
-
-
- ## Models
- <table>
- <thead>
- <tr>
- <th align="center"></th>
- <th align="center" style="text-align:center" colspan="4"><a href="logs/testing/ade20k.log">ADE20K(A-150)</th>
- <th align="center" style="text-align:center" colspan="3"><a href="logs/testing/cityscapes.log">Cityscapes</th>
- <th align="center" style="text-align:center" colspan="2"><a href="logs/testing/mapillary_vistas.log">Mapillary Vistas</th>
- <th align="center" style="text-align:center" colspan="2"><a href="logs/testing/bdd100k.log">BDD 100K</th>
- <th align="center" style="text-align:center" colspan="2"><a href="logs/testing/a-847.log"> A-847 </th>
- <th align="center" style="text-align:center" colspan="2"><a href="logs/testing/pc-459.log"> PC-459 </th>
- <th align="center" style="text-align:center" colspan="2"><a href="logs/testing/pas-21.log">PAS-21 </th>
- <th align="center" style="text-align:center" ><a href="logs/testing/lvis.log">Lvis </th>
- <th align="center" style="text-align:center" colspan="3"><a href="logs/testing/coco.log">COCO <br> (training dataset)</th>
- <th align="center" style="text-align:center">download </th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td align="center"></td>
- <td align="center">PQ</td>
- <td align="center">mAP</td>
- <td align="center">mIoU</td>
- <td align="center">FWIoU</td>
- <td align="center">PQ</td>
- <td align="center">mAP</td>
- <td align="center">mIoU</td>
- <td align="center">PQ</td>
- <td align="center">mIoU</td>
- <td align="center">PQ</td>
- <td align="center">mIoU</td>
- <td align="center">mIoU</td>
- <td align="center">FWIoU</td>
- <td align="center">mIoU</td>
- <td align="center">FWIoU</td>
- <td align="center">mIoU</td>
- <td align="center">FWIoU</td>
- <td align="center">APr</td>
- <td align="center">PQ</td>
- <td align="center">mAP</td>
- <td align="center">mIoU</td>
- <td></td>
- </tr>
- <td align="center"><a href="configs/coco/frozenseg/r50x64_eval_ade20k.yaml"> FrozenSeg (ResNet50x64) </a></td>
- <td align="center">23.1</td>
- <td align="center">13.5</td>
- <td align="center">30.7</td>
- <td align="center">56.6</td>
- <td align="center">45.2</td>
- <td align="center">28.9</td>
- <td align="center">56.0</td>
- <td align="center">18.1</td>
- <td align="center">27.7</td>
- <td align="center">12.9</td>
- <td align="center">46.2</td>
- <td align="center">11.8</td>
- <td align="center">52.8</td>
- <td align="center">18.7</td>
- <td align="center">60.1</td>
- <td align="center">82.3</td>
- <td align="center">92.1</td>
- <td align="center">23.5</td>
- <td align="center">55.7</td>
- <td align="center">47.4</td>
- <td align="center">65.4</td>
- <td align="center"><a href=""> checkpoint </a></td>
- </tr>
- <tr>
- <td align="center"><a href="configs/coco/frozenseg/convnext_large_eval_ade20k.yaml"> FrozenSeg (ConvNeXt-Large) </a></td>
- <td align="center">25.9</td>
- <td align="center">16.4</td>
- <td align="center">34.4</td>
- <td align="center">59.9</td>
- <td align="center">45.8</td>
- <td align="center">28.4</td>
- <td align="center">56.8</td>
- <td align="center">18.5</td>
- <td align="center">27.3</td>
- <td align="center">19.3</td>
- <td align="center">52.3</td>
- <td align="center">14.8</td>
- <td align="center">51.4</td>
- <td align="center">19.7</td>
- <td align="center">60.2</td>
- <td align="center">82.5</td>
- <td align="center">92.1</td>
- <td align="center">25.6</td>
- <td align="center">56.2</td>
- <td align="center">47.3</td>
- <td align="center">65.5</td>
- <td align="center"><a href="https://drive.google.com/file/d/1ThjVgY7nawm1AAP1LhrmGVlI3zr1EYMG/view?usp=drive_link"> checkpoint </a></td>
- </tr>
- </tbody>
- </table>
-
-
-
- ## Citing
-
- If you use FrozenSeg in your research, please use the following BibTeX entry.
-
- ```BibTeX
- @misc{FrozenSeg,
- title={FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation},
- author={Xi Chen and Haosen Yang and Sheng Jin and Xiatian Zhu and Hongxun Yao},
- publisher={arXiv:5835590},
- year={2024}
- }
- ```
-
- ## Acknowledgement
- [Detectron2](https://github.com/facebookresearch/detectron2), [Mask2Former](https://github.com/facebookresearch/Mask2Former) and [OpenCLIP](https://github.com/mlfoundations/open_clip)
 
+ ---
+ title: FrozenSeg
+ emoji: 🏢
+ colorFrom: purple
+ colorTo: green
+ sdk: gradio
+ sdk_version: 3.35.2
+ app_file: app.py
+ pinned: false
+ ---
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference