FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
Abstract
Recently, open-vocabulary learning has emerged to accomplish segmentation for arbitrary categories given text-based descriptions, extending segmentation systems to more general-purpose application scenarios. However, existing methods focus on designing specialized architectures or parameters for specific segmentation tasks. These customized design paradigms lead to fragmentation between the various segmentation tasks, thus hindering the uniformity of segmentation models. Hence, in this paper, we propose FreeSeg, a generic framework for Unified, Universal and Open-Vocabulary Image Segmentation. FreeSeg optimizes an all-in-one network via one-shot training and employs the same architecture and parameters to handle diverse segmentation tasks seamlessly during inference. Additionally, adaptive prompt learning enables the unified model to capture task-aware and category-sensitive concepts, improving model robustness in multi-task and varied scenarios. Extensive experimental results demonstrate that FreeSeg establishes new state-of-the-art performance and generalization on three segmentation tasks, outperforming the best task-specific architectures by a large margin: 5.5% mIoU on semantic segmentation, 17.6% mAP on instance segmentation, and 20.1% PQ on panoptic segmentation for unseen classes on COCO.
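To make the adaptive prompt learning idea concrete, below is a minimal sketch of how task-aware and category-sensitive prompts could be assembled, assuming CoOp-style learnable context tokens wrapped around frozen class-name embeddings (e.g., from CLIP). All identifiers here (`AdaptivePrompt`, `task_tokens`, `cat_ctx`) are illustrative assumptions, not names from the paper or its released code.

```python
# Hypothetical sketch of adaptive prompt learning for a unified
# open-vocabulary segmentation model. The real FreeSeg implementation
# may differ; this only illustrates the task-aware + category-sensitive
# prompt structure described in the abstract.
import torch
import torch.nn as nn

TASKS = ["semantic", "instance", "panoptic"]

class AdaptivePrompt(nn.Module):
    """Builds a prompt = [task tokens | category context | class-name
    tokens], so one set of parameters serves all three tasks."""
    def __init__(self, embed_dim: int = 512, n_ctx: int = 8):
        super().__init__()
        # One learnable token sequence per segmentation task (task-aware).
        self.task_tokens = nn.ParameterDict({
            t: nn.Parameter(torch.randn(n_ctx, embed_dim) * 0.02)
            for t in TASKS
        })
        # Shared learnable context prepended to every category name
        # (category-sensitive).
        self.cat_ctx = nn.Parameter(torch.randn(n_ctx, embed_dim) * 0.02)

    def forward(self, task: str, name_embeds: torch.Tensor) -> torch.Tensor:
        # name_embeds: (num_classes, n_name_tokens, embed_dim), e.g.
        # frozen CLIP token embeddings of the category names.
        n_cls = name_embeds.shape[0]
        task_ctx = self.task_tokens[task].expand(n_cls, -1, -1)
        cat_ctx = self.cat_ctx.expand(n_cls, -1, -1)
        # A frozen text encoder would consume this sequence downstream
        # to produce per-category text features for mask classification.
        return torch.cat([task_ctx, cat_ctx, name_embeds], dim=1)

# Usage: one forward pass per task over the same open-vocabulary
# category set; only the prompt changes, not the network.
prompter = AdaptivePrompt()
names = torch.randn(80, 4, 512)  # stand-in for frozen name embeddings
for task in TASKS:
    prompts = prompter(task, names)
    print(task, tuple(prompts.shape))  # (80, 20, 512)
```

The design point this sketch tries to capture is that switching tasks swaps only a small set of learnable prompt tokens, which is what lets a single architecture with shared parameters serve semantic, instance, and panoptic segmentation at inference time.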