Commit b92a792 • committed by liangfeng • 1 parent: 0839f49

clean up
Files changed:

- CODE_OF_CONDUCT.md +0 -80
- CONTRIBUTING.md +0 -32
- GETTING_STARTED.md +0 -99
- INSTALL.md +0 -33
- LICENSE +0 -399
- README.md +1 -57
- app.py +21 -9
- configs/ovseg_swinB_vitL_bs32_120k.yaml +0 -100
- datasets/DATASETS.md +0 -122
- datasets/prepare_ade20k_full_sem_seg.py +0 -1011
- datasets/prepare_ade20k_sem_seg.py +0 -35
- datasets/prepare_coco_stuff_sem_seg.py +0 -219
- datasets/prepare_pascal_context.py +0 -69
- datasets/prepare_voc_sem_seg.py +0 -71
- open_vocab_seg/.DS_Store +0 -0
- open_vocab_seg/modeling/.DS_Store +0 -0
- open_vocab_seg/modeling/clip_adapter/__init__.py +2 -0
- open_vocab_seg/modeling/clip_adapter/clip/__init__.py +1 -0
- open_vocab_seg/modeling/clip_adapter/clip/bpe_simple_vocab_16e6.txt.gz +3 -0
- open_vocab_seg/modeling/clip_adapter/clip/clip.py +285 -0
- open_vocab_seg/modeling/clip_adapter/clip/model.py +613 -0
- open_vocab_seg/modeling/clip_adapter/clip/simple_tokenizer.py +150 -0
- open_vocab_seg/modeling/clip_adapter/text_template.py +3 -2
- open_vocab_seg/modeling/clip_adapter/utils.py +3 -3
- configs/ovseg_swinB_vitL_demo.yaml → ovseg_swinB_vitL_demo.yaml +1 -1
- requirements.txt +8 -2
- resources/demo_samples/sample_01.jpeg +3 -0
- resources/demo_samples/sample_02.jpeg +3 -0
- tools/convert-pretrained-clip-model-to-d2.py +0 -69
- tools/convert-pretrained-swin-model-to-d2.py +0 -30
- tools/convert-torchvision-to-d2.py +0 -54
- tools/ovseg_replace_clip.py +0 -30
- tools/search_thr_ensemble_w.sh +0 -11
- tools/web_demo.py +0 -76
- train_net.py +0 -309
CODE_OF_CONDUCT.md
DELETED
@@ -1,80 +0,0 @@
# Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to make participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.

## Scope

This Code of Conduct applies within all project spaces, and it also applies when an individual is representing the project or its community in public spaces. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.

This Code of Conduct also applies outside the project spaces when there is a reasonable belief that an individual's behavior may have a negative impact on the project or its community.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at <[email protected]>. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see https://www.contributor-covenant.org/faq
CONTRIBUTING.md
DELETED
@@ -1,32 +0,0 @@
# Contributing to OVSeg

We want to make contributing to this project as easy and transparent as possible.

## Pull Requests

We actively welcome your pull requests.

1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. If you haven't already, complete the Contributor License Agreement ("CLA").

## Contributor License Agreement ("CLA")

In order to accept your pull request, we need you to submit a CLA. You only need to do this once to work on any of Meta's open source projects.

Complete your CLA here: <https://code.facebook.com/cla>

## Issues

We use GitHub issues to track public bugs. Please ensure your description is clear and has sufficient instructions to be able to reproduce the issue.

Meta has a [bounty program](https://www.facebook.com/whitehat/) for the safe disclosure of security bugs. In those cases, please go through the process outlined on that page and do not file a public issue.

## License

By contributing to OVSeg, you agree that your contributions will be licensed under the LICENSE file in the root directory of this source tree.
GETTING_STARTED.md
DELETED
@@ -1,99 +0,0 @@
## Getting started with OVSeg

### Try demo

We release our largest model (Swin-Base + CLIP-ViT-L/14) [ovseg_swinbase_vitL14_ft_mpt.pth](https://drive.google.com/file/d/1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy/view?usp=sharing) (md5: <tt>526080</tt>).

- Test on a sample image
```bash
python demo.py --config-file configs/ovseg_swinB_vitL_demo.yaml --class-names 'Oculus' 'Ukulele' --input ./resources/demo_samples/sample_03.jpeg --output ./pred --opts MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_ft_mpt.pth
```

### Evaluation with pre-trained weights

We release our largest model (Swin-Base + CLIP-ViT-L/14) [ovseg_swinbase_vitL14_ft_mpt.pth](https://drive.google.com/file/d/1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy/view?usp=sharing) (md5: <tt>526080</tt>).

- Test on ADE20K-150 and ADE20K-847
```bash
python train_net.py --num-gpu 8 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_ft_mpt.pth DATASETS.TEST \(\"ade20k_sem_seg_val\",\"ade20k_full_sem_seg_val\"\)
```

- Test on PascalContext-59 and PascalContext-459
```bash
python train_net.py --num-gpu 8 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_ft_mpt.pth MODEL.CLIP_ADAPTER.CLIP_ENSEMBLE_WEIGHT 0.6 DATASETS.TEST \(\"pascal_context_59_sem_seg_val\",\"pascal_context_459_sem_seg_val\",\)
```

- Test on PascalVOC-20
```bash
python train_net.py --num-gpu 8 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_ft_mpt.pth MODEL.CLIP_ADAPTER.CLIP_ENSEMBLE_WEIGHT 0.45 DATASETS.TEST \(\"pascalvoc20_sem_seg_val\",\)
```

#### Performance benchmark

| method | backbone | training dataset | A-847 | PC-459 | A-150 | PC-59 | PAS-20 |
|------------------------------------|----------|------------------|:-----:|:------:|:-----:|:-----:|:------:|
| Open-vocabulary generalist models. | | | | | | | |
| SPNet | R-101 | PASCAL-15 | - | - | - | 24.3 | 18.3 |
| ZS3Net | R-101 | PASCAL-15 | - | - | - | 19.4 | 38.3 |
| LSeg | R-101 | PASCAL-15 | - | - | - | - | 47.4 |
| LSeg+ | R-101 | COCO Panoptic | 2.5 | 5.2 | 13.0 | 36.0 | 59.0 |
| SimBaseline | R-101c | COCO-Stuff-156 | - | - | 15.3 | - | 74.5 |
| ZegFormer | R-50 | COCO-Stuff-156 | - | - | 16.4 | - | 80.7 |
| OpenSeg | R-101 | COCO Panoptic | 4.0 | 6.5 | 15.3 | 36.9 | 60.0 |
| OVSeg (Ours) | R-101c | COCO-Stuff-171 | 7.1 | 11.0 | 24.8 | 53.3 | 92.6 |
| LSeg+ | Eff-B7 | COCO Panoptic | 3.8 | 7.8 | 18.0 | 46.5 | - |
| OpenSeg | Eff-B7 | COCO Panoptic | 6.3 | 9.0 | 21.1 | 42.1 | - |
| OVSeg (Ours) | Swin-B | COCO-Stuff-171 | 9.0 | 12.4 | 29.6 | 55.7 | 94.5 |
| Supervised specialist models. | | | | | | | |
| FCN | FCN-8s | Same as test | - | - | 29.4 | 37.8 | - |
| Deeplab | R-101 | Same as test | - | - | - | 45.7 | 77.7 |
| SelfTrain | Eff-L2 | Same as test | - | - | - | - | 90.0 |

#### Ablation study

- Mask prompt tuning brings a significant improvement without changing the CLIP weights (Table 3 in the [paper](https://arxiv.org/pdf/2210.04150.pdf)).

Download the checkpoint with mask prompt tuning only: [ovseg_swinbase_vitL14_mpt_only.pt](https://drive.google.com/file/d/1LJGWFjHw76OGDNy9r9KQIaACfIm9KMhQ/view?usp=sharing) (md5: <tt>2dd495</tt>).

```bash
python train_net.py --num-gpu 8 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_mpt_only.pt DATASETS.TEST \(\"ade20k_sem_seg_val\",\"ade20k_full_sem_seg_val\"\)
```

- Mask prompt tuning can improve over a fully finetuned model (Table 3 in the [paper](https://arxiv.org/pdf/2210.04150.pdf)).

With the same [ovseg_swinbase_vitL14_ft_mpt.pth](https://drive.google.com/file/d/1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy/view?usp=sharing) checkpoint, set `MASK_PROMPT_FWD` to `False`:

```bash
python train_net.py --num-gpu 8 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.CLIP_ADAPTER.MASK_PROMPT_FWD False MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_ft_mpt.pth DATASETS.TEST \(\"ade20k_sem_seg_val\",\"ade20k_full_sem_seg_val\"\)
```

- The effect of the class-prediction ensemble (Table 6 in the [paper](https://arxiv.org/pdf/2210.04150.pdf)).

With the same [ovseg_swinbase_vitL14_ft_mpt.pth](https://drive.google.com/file/d/1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy/view?usp=sharing) checkpoint, set `CLIP_ENSEMBLE` to `False`:

```bash
python train_net.py --num-gpu 8 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.CLIP_ADAPTER.CLIP_ENSEMBLE False MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_ft_mpt.pth DATASETS.TEST \(\"ade20k_sem_seg_val\",\"ade20k_full_sem_seg_val\"\)
```

### Training the segmentation model

Our model is trained on COCO-Stuff.

- Train the baseline with the original CLIP:
```
python train_net.py --num-gpu 8 --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.CLIP_ADAPTER.MASK_PROMPT_FWD False
```

To reproduce our final results, you may want to use our mask-adapted CLIP.

- Train OVSeg with the mask-adapted CLIP:
```
python train_net.py --num-gpu 8 --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml MODEL.CLIP_ADAPTER.CLIP_MODEL_NAME #PATH_TO_MASKADAPTED_CLIP
```

CAUTION: The final results are sensitive to the ensemble (appendix A.5 in the [paper](https://arxiv.org/pdf/2210.04150.pdf)), so you may want to use ```tools/search_thr_ensemble_w.sh``` to find the best ensemble hyper-parameters.

### Fine-tuning CLIP with collected mask-category pairs

We are still working on this part, stay tuned!
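The evaluation commands above hand trailing key-value pairs (for example `MODEL.WEIGHTS <path>` and `DATASETS.TEST (...)`) straight to the config. The removed train_net.py is not shown in this commit view, so the following is only a hedged sketch of how a standard detectron2 script folds such overrides into its config; the repo's own config-extension step (registering `MODEL.CLIP_ADAPTER.*` and friends) is assumed and not reproduced here.

```python
# Sketch (not the removed train_net.py): how trailing "opts" key-value pairs from the
# evaluation commands are applied to a detectron2 config.
from detectron2.config import get_cfg
from detectron2.projects.deeplab import add_deeplab_config

cfg = get_cfg()
add_deeplab_config(cfg)
# The repo's own YAML and extra keys (MODEL.CLIP_ADAPTER.*, etc.) would be registered
# and merged here; omitted because that code is not part of this commit view.
cfg.merge_from_list([
    "MODEL.WEIGHTS", "/path/to/ovseg_swinbase_vitL14_ft_mpt.pth",  # hypothetical path
    "DATASETS.TEST", "('ade20k_sem_seg_val', 'ade20k_full_sem_seg_val')",
])
print(cfg.MODEL.WEIGHTS, cfg.DATASETS.TEST)
```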
INSTALL.md
DELETED
@@ -1,33 +0,0 @@
## Installation

### Requirements
- Linux with Python ≥ 3.6
- PyTorch ≥ 1.8 and [torchvision](https://github.com/pytorch/vision/) that matches the PyTorch installation. Install them together at [pytorch.org](https://pytorch.org) to make sure of this. Also check that your PyTorch version matches the one required by Detectron2.
- Detectron2: follow the [Detectron2 installation instructions](https://detectron2.readthedocs.io/tutorials/install.html).

### Usage

Install the required packages:

```bash
conda create --name ovseg python=3.8
conda activate ovseg
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
```

Install `detectron2==0.6` following the [instructions](https://detectron2.readthedocs.io/en/latest/tutorials/install.html):

```bash
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
```

Furthermore, install the modified CLIP package:

```bash
cd third_party/CLIP
python -m pip install -Ue .
```
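As a quick sanity check of the environment described above (a minimal sketch, not part of the removed INSTALL.md; the `clip` import refers to the modified package installed from third_party/CLIP):

```python
# Minimal environment check for the install steps above.
import torch
import torchvision
import detectron2
import clip  # modified CLIP package installed from third_party/CLIP

print("torch:", torch.__version__, "| torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
print("detectron2:", detectron2.__version__)
print("CLIP models:", clip.available_models())
```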
LICENSE
DELETED
@@ -1,399 +0,0 @@
Attribution-NonCommercial 4.0 International

=======================================================================

Creative Commons Corporation ("Creative Commons") is not a law firm and does not provide legal services or legal advice. Distribution of Creative Commons public licenses does not create a lawyer-client or other relationship. Creative Commons makes its licenses and related information available on an "as-is" basis. Creative Commons gives no warranties regarding its licenses, any material licensed under their terms and conditions, or any related information. Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible.

Using Creative Commons Public Licenses

Creative Commons public licenses provide a standard set of terms and conditions that creators and other rights holders may use to share original works of authorship and other material subject to copyright and certain other rights specified in the public license below. The following considerations are for informational purposes only, are not exhaustive, and do not form part of our licenses.

Considerations for licensors: Our public licenses are intended for use by those authorized to give the public permission to use material in ways otherwise restricted by copyright and certain other rights. Our licenses are irrevocable. Licensors should read and understand the terms and conditions of the license they choose before applying it. Licensors should also secure all rights necessary before applying our licenses so that the public can reuse the material as expected. Licensors should clearly mark any material not subject to the license. This includes other CC-licensed material, or material used under an exception or limitation to copyright. More considerations for licensors: wiki.creativecommons.org/Considerations_for_licensors

Considerations for the public: By using one of our public licenses, a licensor grants the public permission to use the licensed material under specified terms and conditions. If the licensor's permission is not necessary for any reason--for example, because of any applicable exception or limitation to copyright--then that use is not regulated by the license. Our licenses grant only permissions under copyright and certain other rights that a licensor has authority to grant. Use of the licensed material may still be restricted for other reasons, including because others have copyright or other rights in the material. A licensor may make special requests, such as asking that all changes be marked or described. Although not required by our licenses, you are encouraged to respect those requests where reasonable. More considerations for the public: wiki.creativecommons.org/Considerations_for_licensees

=======================================================================

Creative Commons Attribution-NonCommercial 4.0 International Public License

By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-NonCommercial 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions.

Section 1 -- Definitions.

a. Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image.

b. Adapter's License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License.

c. Copyright and Similar Rights means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights.

d. Effective Technological Measures means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements.

e. Exceptions and Limitations means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material.

f. Licensed Material means the artistic or literary work, database, or other material to which the Licensor applied this Public License.

g. Licensed Rights means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license.

h. Licensor means the individual(s) or entity(ies) granting rights under this Public License.

i. NonCommercial means not primarily intended for or directed towards commercial advantage or monetary compensation. For purposes of this Public License, the exchange of the Licensed Material for other material subject to Copyright and Similar Rights by digital file-sharing or similar means is NonCommercial provided there is no payment of monetary compensation in connection with the exchange.

j. Share means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them.

k. Sui Generis Database Rights means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world.

l. You means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning.

Section 2 -- Scope.

a. License grant.

   1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to:

      a. reproduce and Share the Licensed Material, in whole or in part, for NonCommercial purposes only; and

      b. produce, reproduce, and Share Adapted Material for NonCommercial purposes only.

   2. Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions.

   3. Term. The term of this Public License is specified in Section 6(a).

   4. Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material.

   5. Downstream recipients.

      a. Offer from the Licensor -- Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License.

      b. No downstream restrictions. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.

   6. No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i).

b. Other rights.

   1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise.

   2. Patent and trademark rights are not licensed under this Public License.

   3. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties, including when the Licensed Material is used other than for NonCommercial purposes.

Section 3 -- License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the following conditions.

a. Attribution.

   1. If You Share the Licensed Material (including in modified form), You must:

      a. retain the following if it is supplied by the Licensor with the Licensed Material:

         i. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);

         ii. a copyright notice;

         iii. a notice that refers to this Public License;

         iv. a notice that refers to the disclaimer of warranties;

         v. a URI or hyperlink to the Licensed Material to the extent reasonably practicable;

      b. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and

      c. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License.

   2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information.

   3. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable.

   4. If You Share Adapted Material You produce, the Adapter's License You apply must not prevent recipients of the Adapted Material from complying with this Public License.

Section 4 -- Sui Generis Database Rights.

Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material:

a. for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database for NonCommercial purposes only;

b. if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material; and

c. You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database.

For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights.

Section 5 -- Disclaimer of Warranties and Limitation of Liability.

a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.

b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.

c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability.

Section 6 -- Term and Termination.

a. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically.

b. Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates:

   1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or

   2. upon express reinstatement by the Licensor.

   For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License.

c. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License.

d. Sections 1, 5, 6, 7, and 8 survive termination of this Public License.

Section 7 -- Other Terms and Conditions.

a. The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed.

b. Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License.

Section 8 -- Interpretation.

a. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License.

b. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions.

c. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor.

d. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority.

=======================================================================

Creative Commons is not a party to its public licenses. Notwithstanding, Creative Commons may elect to apply one of its public licenses to material it publishes and in those instances will be considered the "Licensor." The text of the Creative Commons public licenses is dedicated to the public domain under the CC0 Public Domain Dedication. Except for the limited purpose of indicating that material is shared under a Creative Commons public license or as otherwise permitted by the Creative Commons policies published at creativecommons.org/policies, Creative Commons does not authorize the use of the trademark "Creative Commons" or any other trademark or logo of Creative Commons without its prior written consent including, without limitation, in connection with any unauthorized modifications to any of its public licenses or any other arrangements, understandings, or agreements concerning use of licensed material. For the avoidance of doubt, this paragraph does not form part of the public licenses.

Creative Commons may be contacted at creativecommons.org.
README.md
CHANGED
@@ -10,60 +10,4 @@ pinned: false
 license: cc-by-nc-4.0
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
-
-# [OVSeg] Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
-
-<img src="resources/pytorch-logo-dark.png" width="10%">
-
-This is the official PyTorch implementation of our paper: <br>
-**Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP**<br>
-[Feng Liang](https://jeff-liangf.github.io/), [Bichen Wu](https://www.linkedin.com/in/bichenwu), [Xiaoliang Dai](https://sites.google.com/view/xiaoliangdai/), [Kunpeng Li](https://kunpengli1994.github.io/), [Yinan Zhao](https://yinan-zhao.github.io/), [Hang Zhang](https://hangzhang.org/), [Peizhao Zhang](https://www.linkedin.com/in/peizhao-zhang-14846042/), [Peter Vajda](https://sites.google.com/site/vajdap), [Diana Marculescu](https://www.ece.utexas.edu/people/faculty/diana-marculescu)
-
-[[arXiv](https://arxiv.org/abs/2210.04150)] [[Project](https://jeff-liangf.github.io/projects/ovseg/)]
-
-<p align="center">
-  <img src="resources/ovseg.gif" width="100%">
-</p>
-
-
-## Installation
-
-Please see [installation guide](./INSTALL.md).
-
-## Data Preparation
-
-Please see [datasets preparation](./datasets/DATASETS.md).
-
-## Getting started
-
-Please see [getting started instruction](./GETTING_STARTED.md).
-
-## LICENSE
-
-Shield: [![CC BY-NC 4.0][cc-by-nc-shield]][cc-by-nc]
-
-The majority of OVSeg is licensed under a
-[Creative Commons Attribution-NonCommercial 4.0 International License](LICENSE).
-
-[![CC BY-NC 4.0][cc-by-nc-image]][cc-by-nc]
-
-[cc-by-nc]: http://creativecommons.org/licenses/by-nc/4.0/
-[cc-by-nc-image]: https://licensebuttons.net/l/by-nc/4.0/88x31.png
-[cc-by-nc-shield]: https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg
-
-However portions of the project are under separate license terms: CLIP and ZSSEG are licensed under the [MIT license](https://github.com/openai/CLIP/blob/main/LICENSE); MaskFormer is licensed under the [CC-BY-NC](https://github.com/facebookresearch/MaskFormer/blob/main/LICENSE); openclip is licensed under the license at [its repo](https://github.com/mlfoundations/open_clip/blob/main/LICENSE).
-
-
-## Citing OVSeg :pray:
-
-If you use OVSeg in your research or wish to refer to the baseline results published in the paper, please use the following BibTeX entry.
-
-```BibTeX
-@article{liang2022open,
-  title={Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP},
-  author={Liang, Feng and Wu, Bichen and Dai, Xiaoliang and Li, Kunpeng and Zhao, Yinan and Zhang, Hang and Zhang, Peizhao and Vajda, Peter and Marculescu, Diana},
-  journal={arXiv preprint arXiv:2210.04150},
-  year={2022}
-}
-```
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py
CHANGED
@@ -6,6 +6,13 @@ import multiprocessing as mp
 import numpy as np
 from PIL import Image
 
+
+try:
+    import detectron2
+except:
+    import os
+    os.system('pip install git+https://github.com/facebookresearch/detectron2.git')
+
 from detectron2.config import get_cfg
 
 from detectron2.projects.deeplab import add_deeplab_config
@@ -15,6 +22,12 @@ from open_vocab_seg.utils import VisualizationDemo
 
 import gradio as gr
 
+import gdown
+
+ckpt_url = 'https://drive.google.com/uc?id=1cn-ohxgXDrDfkzC1QdO-fi8IjbjXmgKy'
+output = './ovseg_swinbase_vitL14_ft_mpt.pth'
+gdown.download(ckpt_url, output, quiet=False)
+
 def setup_cfg(config_file):
     # load config from file and command-line arguments
     cfg = get_cfg()
@@ -27,7 +40,7 @@ def setup_cfg(config_file):
 
 def inference(class_names, input_img):
     mp.set_start_method("spawn", force=True)
-    config_file = './configs/ovseg_swinB_vitL_demo.yaml'
+    config_file = './ovseg_swinB_vitL_demo.yaml'
     cfg = setup_cfg(config_file)
 
     demo = VisualizationDemo(cfg)
@@ -38,19 +51,18 @@ def inference(class_names, input_img):
 
     return Image.fromarray(np.uint8(visualized_output.get_image())).convert('RGB')
 
-# demo = gr.Interface(fn=greet, inputs="text", outputs="text")
-# demo.launch()
-
 
-examples = [['Oculus, Ukulele', './resources/demo_samples/sample_03.jpeg'],
+examples = [['Oculus, Ukulele', './resources/demo_samples/sample_03.jpeg'],
+            ['Saturn V, toys, blossom', './resources/demo_samples/sample_01.jpeg'],
+            ['Golden gate, yacht', './resources/demo_samples/sample_02.jpeg'],]
 output_labels = ['segmentation map']
 
 title = 'OVSeg'
 
 description = """
-Gradio Demo for Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP \n
-You may click on of the examples or upload your own image. \n
-
+Gradio Demo for Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP. \n
+OVSeg could perform open vocabulary segmentation, you may input more classes (seperate by comma). You may click on of the examples or upload your own image. \n
+It might take some time to process. Cheers!
 """
 
 article = """
@@ -59,7 +71,7 @@ article = """
 Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
 </a>
 
-<a href='https://github.com' target='_blank'>Github Repo</a></p>
+<a href='https://github.com/facebookresearch/ov-seg' target='_blank'>Github Repo</a></p>
 """
 
 gr.Interface(
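The last hunk stops at the opening `gr.Interface(` call, so the full launcher is not visible in this commit view. Below is only a hedged sketch of one plausible completion, reusing the `inference`, `examples`, `title`, `description` and `article` names defined in app.py; the components and argument values in the real file may differ.

```python
# Sketch only: a plausible completion of the gr.Interface(...) call begun above.
# Assumes inference, examples, title, description and article from app.py are in scope.
import gradio as gr

gr.Interface(
    fn=inference,
    inputs=[
        gr.Textbox(lines=1, label="class names (comma separated)"),
        gr.Image(type="filepath", label="input image"),
    ],
    outputs=gr.Image(type="pil", label="segmentation map"),
    examples=examples,
    title=title,
    description=description,
    article=article,
).launch()
```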
configs/ovseg_swinB_vitL_bs32_120k.yaml
DELETED
@@ -1,100 +0,0 @@
MODEL:
  META_ARCHITECTURE: "OVSeg"
  BACKBONE:
    FREEZE_AT: 0
    NAME: "D2SwinTransformer"
  SWIN:
    EMBED_DIM: 128
    DEPTHS: [2, 2, 18, 2]
    NUM_HEADS: [4, 8, 16, 32]
    WINDOW_SIZE: 12
    APE: False
    DROP_PATH_RATE: 0.3
    PATCH_NORM: True
    PRETRAIN_IMG_SIZE: 384
  WEIGHTS: "swin_base_patch4_window12_384_22k.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  SEM_SEG_HEAD:
    NAME: "OpenVocabMaskFormerHead"
    IN_FEATURES: ["res2", "res3", "res4", "res5"]
    IGNORE_VALUE: 255
    NUM_CLASSES: 171 # number of categories in training set
    EMBEDDING_DIM: 768
    EMBED_LAYERS: 2
    COMMON_STRIDE: 4 # not used, hard-coded
    LOSS_WEIGHT: 1.0
    CONVS_DIM: 256
    MASK_DIM: 256
    NORM: "GN"
  MASK_FORMER:
    TRANSFORMER_IN_FEATURE: "res5"
    DEEP_SUPERVISION: True
    NO_OBJECT_WEIGHT: 0.1
    DICE_WEIGHT: 1.0
    MASK_WEIGHT: 20.0
    HIDDEN_DIM: 256
    NUM_OBJECT_QUERIES: 100
    NHEADS: 8
    DROPOUT: 0.1
    DIM_FEEDFORWARD: 2048
    ENC_LAYERS: 0
    DEC_LAYERS: 6
    PRE_NORM: False
  CLIP_ADAPTER:
    TEXT_TEMPLATES: "vild"
    CLIP_MODEL_NAME: "ViT-L/14"
    MASK_FILL: "mean"
    MASK_EXPAND_RATIO: 1.0
    MASK_THR: 0.4 # choose the foreground objects
    MASK_MATTING: False # use soft background, default not used
    MASK_PROMPT_DEPTH: 3
    MASK_PROMPT_FWD: True # use mask prompt during forward
    REGION_RESIZED: True # resize to the input of clip, e.g., 224
    CLIP_ENSEMBLE: True # use ensemble of two classification branches
    CLIP_ENSEMBLE_WEIGHT: 0.7
DATASETS:
  TRAIN: ("coco_2017_train_stuff_sem_seg",)
  TEST: ("ade20k_sem_seg_val",)
SOLVER:
  IMS_PER_BATCH: 32
  BASE_LR: 0.00006
  MAX_ITER: 120000
  WARMUP_FACTOR: 1e-6
  WARMUP_ITERS: 1500
  LR_SCHEDULER_NAME: "WarmupPolyLR"
  WEIGHT_DECAY: 0.01
  WEIGHT_DECAY_NORM: 0.0
  WEIGHT_DECAY_EMBED: 0.0
  BACKBONE_MULTIPLIER: 1.0
  TEST_IMS_PER_BATCH: 1
  CLIP_GRADIENTS:
    ENABLED: True
    CLIP_TYPE: "full_model"
    CLIP_VALUE: 0.01
    NORM_TYPE: 2.0
INPUT:
  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 640) for x in range(5, 21)]"]
  MIN_SIZE_TRAIN_SAMPLING: "choice"
  MIN_SIZE_TEST: 640
  MAX_SIZE_TRAIN: 2560
  MAX_SIZE_TEST: 2560
  CROP:
    ENABLED: True
    TYPE: "absolute"
    SIZE: (640, 640)
    SINGLE_CATEGORY_MAX_AREA: 1.0
  COLOR_AUG_SSD: True
  SIZE_DIVISIBILITY: 640 # used in dataset mapper
  FORMAT: "RGB"
TEST:
  EVAL_PERIOD: 5000
  AUG:
    ENABLED: False
    MIN_SIZES: [256, 384, 512, 640, 768, 896]
    MAX_SIZE: 3584
    FLIP: True
DATALOADER:
  FILTER_EMPTY_ANNOTATIONS: True
  NUM_WORKERS: 4
VERSION: 2
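One detail worth noting in the deleted config above: `INPUT.MIN_SIZE_TRAIN` is stored as a Python expression that detectron2 evaluates when loading the YAML. Expanding it by hand shows the actual training scales:

```python
# What the MIN_SIZE_TRAIN expression in the config above evaluates to.
min_sizes = [int(x * 0.1 * 640) for x in range(5, 21)]
print(min_sizes)
# [320, 384, 448, 512, 576, 640, 704, 768, 832, 896, 960, 1024, 1088, 1152, 1216, 1280]
# i.e. the shorter image side is picked at random ("choice" sampling) from 0.5x to 2.0x
# of the 640-pixel test resolution, in steps of 64.
```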
datasets/DATASETS.md
DELETED
@@ -1,122 +0,0 @@
## Prepare Datasets for OVSeg

This doc is a modification/extension of [MaskFormer](https://github.com/facebookresearch/MaskFormer/blob/main/datasets/README.md) following the [Detectron2 format](https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html).

A dataset can be used by accessing [DatasetCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.DatasetCatalog) for its data, or [MetadataCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.MetadataCatalog) for its metadata (class names, etc). This document explains how to set up the builtin datasets so they can be used by the above APIs. [Use Custom Datasets](https://detectron2.readthedocs.io/tutorials/datasets.html) gives a deeper dive on how to use `DatasetCatalog` and `MetadataCatalog`, and how to add new datasets to them.

OVSeg has builtin support for a few datasets. The datasets are assumed to exist in a directory specified by the environment variable `DETECTRON2_DATASETS`. Under this directory, detectron2 will look for datasets in the structure described below, if needed.
```
$DETECTRON2_DATASETS/
  coco/                 # COCOStuff-171
  ADEChallengeData2016/ # ADE20K-150
  ADE20K_2021_17_01/    # ADE20K-847
  VOCdevkit/
    VOC2012/            # PASCALVOC-20
    VOC2010/            # PASCALContext-59, PASCALContext-459
```

You can set the location for builtin datasets by `export DETECTRON2_DATASETS=/path/to/datasets`. If left unset, the default is `./datasets` relative to your current working directory.

Unless otherwise specified, our model is trained on COCOStuff-171 and evaluated on ADE20K-150, ADE20K-847, PASCALVOC-20, PASCALContext-59 and PASCALContext-459.

| dataset | split | # images | # categories |
|:--------------:|:---------:|:--------:|:------------:|
| COCO Stuff | train2017 | 118K | 171 |
| ADE20K | val | 2K | 150/847 |
| Pascal VOC | val | 1.5K | 20 |
| Pascal Context | val | 5K | 59/459 |

### Expected dataset structure for [COCO Stuff](https://github.com/nightrome/cocostuff):
```
coco/
  train2017/    # http://images.cocodataset.org/zips/train2017.zip
  annotations/  # http://images.cocodataset.org/annotations/annotations_trainval2017.zip
  stuffthingmaps/
    stuffthingmaps_trainval2017.zip # http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip
    train2017/
  # below are generated
  stuffthingmaps_detectron2/
    train2017/
```

The directory `stuffthingmaps_detectron2` is generated by running `python datasets/prepare_coco_stuff_sem_seg.py`.

### Expected dataset structure for [ADE20k Scene Parsing (ADE20K-150)](http://sceneparsing.csail.mit.edu/):
```
ADEChallengeData2016/
  annotations/
  images/
  objectInfo150.txt
  # below are generated
  annotations_detectron2/
```
The directory `annotations_detectron2` is generated by running `python datasets/prepare_ade20k_sem_seg.py`.

### Expected dataset structure for [ADE20k-Full (ADE20K-847)](https://github.com/CSAILVision/ADE20K#download):
```
ADE20K_2021_17_01/
  images/
  index_ade20k.pkl
  objects.txt
  # below are generated
  images_detectron2/
  annotations_detectron2/
```
The directories `images_detectron2` and `annotations_detectron2` are generated by running `python datasets/prepare_ade20k_full_sem_seg.py`.

### Expected dataset structure for [Pascal VOC 2012 (PASCALVOC-20)](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit):
```
VOCdevkit/VOC2012/
  Annotations/
  ImageSets/
  JPEGImages/
  SegmentationClass/
  SegmentationObject/
  SegmentationClassAug/ # https://github.com/kazuto1011/deeplab-pytorch/blob/master/data/datasets/voc12/README.md
  # below are generated
  images_detectron2/
  annotations_detectron2/
```

It starts with a tar file `VOCtrainval_11-May-2012.tar`.

We use the SBD-augmented training data as `SegmentationClassAug`, following [Deeplab](https://github.com/kazuto1011/deeplab-pytorch/blob/master/data/datasets/voc12/README.md).

The directories `images_detectron2` and `annotations_detectron2` are generated by running `python datasets/prepare_voc_sem_seg.py`.

### Expected dataset structure for [Pascal Context](https://www.cs.stanford.edu/~roozbeh/pascal-context/):

```
VOCdevkit/VOC2010/
  Annotations/
  ImageSets/
  JPEGImages/
  SegmentationClass/
  SegmentationObject/
  # below are from https://www.cs.stanford.edu/~roozbeh/pascal-context/trainval.tar.gz
  trainval/
  labels.txt
  59_labels.txt         # https://www.cs.stanford.edu/~roozbeh/pascal-context/59_labels.txt
  pascalcontext_val.txt # https://drive.google.com/file/d/1BCbiOKtLvozjVnlTJX51koIveUZHCcUh/view?usp=sharing
  # below are generated
  annotations_detectron2/
    pc459_val
    pc59_val
```
It starts with a tar file `VOCtrainval_03-May-2010.tar`. You may want to download the 5K validation set [here](https://drive.google.com/file/d/1BCbiOKtLvozjVnlTJX51koIveUZHCcUh/view?usp=sharing).

The directory `annotations_detectron2` is generated by running `python datasets/prepare_pascal_context.py`.
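DATASETS.md above relies on detectron2's `DatasetCatalog`/`MetadataCatalog` registry. As a minimal sketch only (assuming the repo's dataset registration code has already been imported, which is what registers names such as `ade20k_sem_seg_val`), this is how a prepared split can be inspected:

```python
# Sketch: inspecting a registered semantic-segmentation split. The split name follows
# the eval commands in GETTING_STARTED.md; registration itself happens in the repo's
# dataset code, which is assumed to have run.
import os
os.environ.setdefault("DETECTRON2_DATASETS", "/path/to/datasets")  # hypothetical path

from detectron2.data import DatasetCatalog, MetadataCatalog

records = DatasetCatalog.get("ade20k_sem_seg_val")   # list of per-image dicts
meta = MetadataCatalog.get("ade20k_sem_seg_val")     # class names, ignore label, ...
print(len(records), "images,", len(meta.stuff_classes), "classes")
print(records[0]["file_name"], "->", records[0]["sem_seg_file_name"])
```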
datasets/prepare_ade20k_full_sem_seg.py DELETED
@@ -1,1011 +0,0 @@

# Copyright (c) Facebook, Inc. and its affiliates.
# Copyright (c) Meta Platforms, Inc. All Rights Reserved

import os
import pickle as pkl
from pathlib import Path

import cv2
import numpy as np
import tqdm
from PIL import Image

ADE20K_SEM_SEG_FULL_CATEGORIES = [
|
14 |
-
{"name": "wall", "id": 2978, "trainId": 0},
|
15 |
-
{"name": "building, edifice", "id": 312, "trainId": 1},
|
16 |
-
{"name": "sky", "id": 2420, "trainId": 2},
|
17 |
-
{"name": "tree", "id": 2855, "trainId": 3},
|
18 |
-
{"name": "road, route", "id": 2131, "trainId": 4},
|
19 |
-
{"name": "floor, flooring", "id": 976, "trainId": 5},
|
20 |
-
{"name": "ceiling", "id": 447, "trainId": 6},
|
21 |
-
{"name": "bed", "id": 165, "trainId": 7},
|
22 |
-
{"name": "sidewalk, pavement", "id": 2377, "trainId": 8},
|
23 |
-
{"name": "earth, ground", "id": 838, "trainId": 9},
|
24 |
-
{"name": "cabinet", "id": 350, "trainId": 10},
|
25 |
-
{"name": "person, individual, someone, somebody, mortal, soul", "id": 1831, "trainId": 11},
|
26 |
-
{"name": "grass", "id": 1125, "trainId": 12},
|
27 |
-
{"name": "windowpane, window", "id": 3055, "trainId": 13},
|
28 |
-
{"name": "car, auto, automobile, machine, motorcar", "id": 401, "trainId": 14},
|
29 |
-
{"name": "mountain, mount", "id": 1610, "trainId": 15},
|
30 |
-
{"name": "plant, flora, plant life", "id": 1910, "trainId": 16},
|
31 |
-
{"name": "table", "id": 2684, "trainId": 17},
|
32 |
-
{"name": "chair", "id": 471, "trainId": 18},
|
33 |
-
{"name": "curtain, drape, drapery, mantle, pall", "id": 687, "trainId": 19},
|
34 |
-
{"name": "door", "id": 774, "trainId": 20},
|
35 |
-
{"name": "sofa, couch, lounge", "id": 2473, "trainId": 21},
|
36 |
-
{"name": "sea", "id": 2264, "trainId": 22},
|
37 |
-
{"name": "painting, picture", "id": 1735, "trainId": 23},
|
38 |
-
{"name": "water", "id": 2994, "trainId": 24},
|
39 |
-
{"name": "mirror", "id": 1564, "trainId": 25},
|
40 |
-
{"name": "house", "id": 1276, "trainId": 26},
|
41 |
-
{"name": "rug, carpet, carpeting", "id": 2178, "trainId": 27},
|
42 |
-
{"name": "shelf", "id": 2329, "trainId": 28},
|
43 |
-
{"name": "armchair", "id": 57, "trainId": 29},
|
44 |
-
{"name": "fence, fencing", "id": 907, "trainId": 30},
|
45 |
-
{"name": "field", "id": 913, "trainId": 31},
|
46 |
-
{"name": "lamp", "id": 1395, "trainId": 32},
|
47 |
-
{"name": "rock, stone", "id": 2138, "trainId": 33},
|
48 |
-
{"name": "seat", "id": 2272, "trainId": 34},
|
49 |
-
{"name": "river", "id": 2128, "trainId": 35},
|
50 |
-
{"name": "desk", "id": 724, "trainId": 36},
|
51 |
-
{"name": "bathtub, bathing tub, bath, tub", "id": 155, "trainId": 37},
|
52 |
-
{"name": "railing, rail", "id": 2053, "trainId": 38},
|
53 |
-
{"name": "signboard, sign", "id": 2380, "trainId": 39},
|
54 |
-
{"name": "cushion", "id": 689, "trainId": 40},
|
55 |
-
{"name": "path", "id": 1788, "trainId": 41},
|
56 |
-
{"name": "work surface", "id": 3087, "trainId": 42},
|
57 |
-
{"name": "stairs, steps", "id": 2530, "trainId": 43},
|
58 |
-
{"name": "column, pillar", "id": 581, "trainId": 44},
|
59 |
-
{"name": "sink", "id": 2388, "trainId": 45},
|
60 |
-
{"name": "wardrobe, closet, press", "id": 2985, "trainId": 46},
|
61 |
-
{"name": "snow", "id": 2454, "trainId": 47},
|
62 |
-
{"name": "refrigerator, icebox", "id": 2096, "trainId": 48},
|
63 |
-
{"name": "base, pedestal, stand", "id": 137, "trainId": 49},
|
64 |
-
{"name": "bridge, span", "id": 294, "trainId": 50},
|
65 |
-
{"name": "blind, screen", "id": 212, "trainId": 51},
|
66 |
-
{"name": "runway", "id": 2185, "trainId": 52},
|
67 |
-
{"name": "cliff, drop, drop-off", "id": 524, "trainId": 53},
|
68 |
-
{"name": "sand", "id": 2212, "trainId": 54},
|
69 |
-
{"name": "fireplace, hearth, open fireplace", "id": 943, "trainId": 55},
|
70 |
-
{"name": "pillow", "id": 1869, "trainId": 56},
|
71 |
-
{"name": "screen door, screen", "id": 2251, "trainId": 57},
|
72 |
-
{"name": "toilet, can, commode, crapper, pot, potty, stool, throne", "id": 2793, "trainId": 58},
|
73 |
-
{"name": "skyscraper", "id": 2423, "trainId": 59},
|
74 |
-
{"name": "grandstand, covered stand", "id": 1121, "trainId": 60},
|
75 |
-
{"name": "box", "id": 266, "trainId": 61},
|
76 |
-
{"name": "pool table, billiard table, snooker table", "id": 1948, "trainId": 62},
|
77 |
-
{"name": "palm, palm tree", "id": 1744, "trainId": 63},
|
78 |
-
{"name": "double door", "id": 783, "trainId": 64},
|
79 |
-
{"name": "coffee table, cocktail table", "id": 571, "trainId": 65},
|
80 |
-
{"name": "counter", "id": 627, "trainId": 66},
|
81 |
-
{"name": "countertop", "id": 629, "trainId": 67},
|
82 |
-
{"name": "chest of drawers, chest, bureau, dresser", "id": 491, "trainId": 68},
|
83 |
-
{"name": "kitchen island", "id": 1374, "trainId": 69},
|
84 |
-
{"name": "boat", "id": 223, "trainId": 70},
|
85 |
-
{"name": "waterfall, falls", "id": 3016, "trainId": 71},
|
86 |
-
{
|
87 |
-
"name": "stove, kitchen stove, range, kitchen range, cooking stove",
|
88 |
-
"id": 2598,
|
89 |
-
"trainId": 72,
|
90 |
-
},
|
91 |
-
{"name": "flower", "id": 978, "trainId": 73},
|
92 |
-
{"name": "bookcase", "id": 239, "trainId": 74},
|
93 |
-
{"name": "controls", "id": 608, "trainId": 75},
|
94 |
-
{"name": "book", "id": 236, "trainId": 76},
|
95 |
-
{"name": "stairway, staircase", "id": 2531, "trainId": 77},
|
96 |
-
{"name": "streetlight, street lamp", "id": 2616, "trainId": 78},
|
97 |
-
{
|
98 |
-
"name": "computer, computing machine, computing device, data processor, electronic computer, information processing system",
|
99 |
-
"id": 591,
|
100 |
-
"trainId": 79,
|
101 |
-
},
|
102 |
-
{
|
103 |
-
"name": "bus, autobus, coach, charabanc, double-decker, jitney, motorbus, motorcoach, omnibus, passenger vehicle",
|
104 |
-
"id": 327,
|
105 |
-
"trainId": 80,
|
106 |
-
},
|
107 |
-
{"name": "swivel chair", "id": 2679, "trainId": 81},
|
108 |
-
{"name": "light, light source", "id": 1451, "trainId": 82},
|
109 |
-
{"name": "bench", "id": 181, "trainId": 83},
|
110 |
-
{"name": "case, display case, showcase, vitrine", "id": 420, "trainId": 84},
|
111 |
-
{"name": "towel", "id": 2821, "trainId": 85},
|
112 |
-
{"name": "fountain", "id": 1023, "trainId": 86},
|
113 |
-
{"name": "embankment", "id": 855, "trainId": 87},
|
114 |
-
{
|
115 |
-
"name": "television receiver, television, television set, tv, tv set, idiot box, boob tube, telly, goggle box",
|
116 |
-
"id": 2733,
|
117 |
-
"trainId": 88,
|
118 |
-
},
|
119 |
-
{"name": "van", "id": 2928, "trainId": 89},
|
120 |
-
{"name": "hill", "id": 1240, "trainId": 90},
|
121 |
-
{"name": "awning, sunshade, sunblind", "id": 77, "trainId": 91},
|
122 |
-
{"name": "poster, posting, placard, notice, bill, card", "id": 1969, "trainId": 92},
|
123 |
-
{"name": "truck, motortruck", "id": 2880, "trainId": 93},
|
124 |
-
{"name": "airplane, aeroplane, plane", "id": 14, "trainId": 94},
|
125 |
-
{"name": "pole", "id": 1936, "trainId": 95},
|
126 |
-
{"name": "tower", "id": 2828, "trainId": 96},
|
127 |
-
{"name": "court", "id": 631, "trainId": 97},
|
128 |
-
{"name": "ball", "id": 103, "trainId": 98},
|
129 |
-
{
|
130 |
-
"name": "aircraft carrier, carrier, flattop, attack aircraft carrier",
|
131 |
-
"id": 3144,
|
132 |
-
"trainId": 99,
|
133 |
-
},
|
134 |
-
{"name": "buffet, counter, sideboard", "id": 308, "trainId": 100},
|
135 |
-
{"name": "hovel, hut, hutch, shack, shanty", "id": 1282, "trainId": 101},
|
136 |
-
{"name": "apparel, wearing apparel, dress, clothes", "id": 38, "trainId": 102},
|
137 |
-
{"name": "minibike, motorbike", "id": 1563, "trainId": 103},
|
138 |
-
{"name": "animal, animate being, beast, brute, creature, fauna", "id": 29, "trainId": 104},
|
139 |
-
{"name": "chandelier, pendant, pendent", "id": 480, "trainId": 105},
|
140 |
-
{"name": "step, stair", "id": 2569, "trainId": 106},
|
141 |
-
{"name": "booth, cubicle, stall, kiosk", "id": 247, "trainId": 107},
|
142 |
-
{"name": "bicycle, bike, wheel, cycle", "id": 187, "trainId": 108},
|
143 |
-
{"name": "doorframe, doorcase", "id": 778, "trainId": 109},
|
144 |
-
{"name": "sconce", "id": 2243, "trainId": 110},
|
145 |
-
{"name": "pond", "id": 1941, "trainId": 111},
|
146 |
-
{"name": "trade name, brand name, brand, marque", "id": 2833, "trainId": 112},
|
147 |
-
{"name": "bannister, banister, balustrade, balusters, handrail", "id": 120, "trainId": 113},
|
148 |
-
{"name": "bag", "id": 95, "trainId": 114},
|
149 |
-
{"name": "traffic light, traffic signal, stoplight", "id": 2836, "trainId": 115},
|
150 |
-
{"name": "gazebo", "id": 1087, "trainId": 116},
|
151 |
-
{"name": "escalator, moving staircase, moving stairway", "id": 868, "trainId": 117},
|
152 |
-
{"name": "land, ground, soil", "id": 1401, "trainId": 118},
|
153 |
-
{"name": "board, plank", "id": 220, "trainId": 119},
|
154 |
-
{"name": "arcade machine", "id": 47, "trainId": 120},
|
155 |
-
{"name": "eiderdown, duvet, continental quilt", "id": 843, "trainId": 121},
|
156 |
-
{"name": "bar", "id": 123, "trainId": 122},
|
157 |
-
{"name": "stall, stand, sales booth", "id": 2537, "trainId": 123},
|
158 |
-
{"name": "playground", "id": 1927, "trainId": 124},
|
159 |
-
{"name": "ship", "id": 2337, "trainId": 125},
|
160 |
-
{"name": "ottoman, pouf, pouffe, puff, hassock", "id": 1702, "trainId": 126},
|
161 |
-
{
|
162 |
-
"name": "ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin",
|
163 |
-
"id": 64,
|
164 |
-
"trainId": 127,
|
165 |
-
},
|
166 |
-
{"name": "bottle", "id": 249, "trainId": 128},
|
167 |
-
{"name": "cradle", "id": 642, "trainId": 129},
|
168 |
-
{"name": "pot, flowerpot", "id": 1981, "trainId": 130},
|
169 |
-
{
|
170 |
-
"name": "conveyer belt, conveyor belt, conveyer, conveyor, transporter",
|
171 |
-
"id": 609,
|
172 |
-
"trainId": 131,
|
173 |
-
},
|
174 |
-
{"name": "train, railroad train", "id": 2840, "trainId": 132},
|
175 |
-
{"name": "stool", "id": 2586, "trainId": 133},
|
176 |
-
{"name": "lake", "id": 1393, "trainId": 134},
|
177 |
-
{"name": "tank, storage tank", "id": 2704, "trainId": 135},
|
178 |
-
{"name": "ice, water ice", "id": 1304, "trainId": 136},
|
179 |
-
{"name": "basket, handbasket", "id": 146, "trainId": 137},
|
180 |
-
{"name": "manhole", "id": 1494, "trainId": 138},
|
181 |
-
{"name": "tent, collapsible shelter", "id": 2739, "trainId": 139},
|
182 |
-
{"name": "canopy", "id": 389, "trainId": 140},
|
183 |
-
{"name": "microwave, microwave oven", "id": 1551, "trainId": 141},
|
184 |
-
{"name": "barrel, cask", "id": 131, "trainId": 142},
|
185 |
-
{"name": "dirt track", "id": 738, "trainId": 143},
|
186 |
-
{"name": "beam", "id": 161, "trainId": 144},
|
187 |
-
{"name": "dishwasher, dish washer, dishwashing machine", "id": 747, "trainId": 145},
|
188 |
-
{"name": "plate", "id": 1919, "trainId": 146},
|
189 |
-
{"name": "screen, crt screen", "id": 3109, "trainId": 147},
|
190 |
-
{"name": "ruins", "id": 2179, "trainId": 148},
|
191 |
-
{"name": "washer, automatic washer, washing machine", "id": 2989, "trainId": 149},
|
192 |
-
{"name": "blanket, cover", "id": 206, "trainId": 150},
|
193 |
-
{"name": "plaything, toy", "id": 1930, "trainId": 151},
|
194 |
-
{"name": "food, solid food", "id": 1002, "trainId": 152},
|
195 |
-
{"name": "screen, silver screen, projection screen", "id": 2254, "trainId": 153},
|
196 |
-
{"name": "oven", "id": 1708, "trainId": 154},
|
197 |
-
{"name": "stage", "id": 2526, "trainId": 155},
|
198 |
-
{"name": "beacon, lighthouse, beacon light, pharos", "id": 160, "trainId": 156},
|
199 |
-
{"name": "umbrella", "id": 2901, "trainId": 157},
|
200 |
-
{"name": "sculpture", "id": 2262, "trainId": 158},
|
201 |
-
{"name": "aqueduct", "id": 44, "trainId": 159},
|
202 |
-
{"name": "container", "id": 597, "trainId": 160},
|
203 |
-
{"name": "scaffolding, staging", "id": 2235, "trainId": 161},
|
204 |
-
{"name": "hood, exhaust hood", "id": 1260, "trainId": 162},
|
205 |
-
{"name": "curb, curbing, kerb", "id": 682, "trainId": 163},
|
206 |
-
{"name": "roller coaster", "id": 2151, "trainId": 164},
|
207 |
-
{"name": "horse, equus caballus", "id": 3107, "trainId": 165},
|
208 |
-
{"name": "catwalk", "id": 432, "trainId": 166},
|
209 |
-
{"name": "glass, drinking glass", "id": 1098, "trainId": 167},
|
210 |
-
{"name": "vase", "id": 2932, "trainId": 168},
|
211 |
-
{"name": "central reservation", "id": 461, "trainId": 169},
|
212 |
-
{"name": "carousel", "id": 410, "trainId": 170},
|
213 |
-
{"name": "radiator", "id": 2046, "trainId": 171},
|
214 |
-
{"name": "closet", "id": 533, "trainId": 172},
|
215 |
-
{"name": "machine", "id": 1481, "trainId": 173},
|
216 |
-
{"name": "pier, wharf, wharfage, dock", "id": 1858, "trainId": 174},
|
217 |
-
{"name": "fan", "id": 894, "trainId": 175},
|
218 |
-
{"name": "inflatable bounce game", "id": 1322, "trainId": 176},
|
219 |
-
{"name": "pitch", "id": 1891, "trainId": 177},
|
220 |
-
{"name": "paper", "id": 1756, "trainId": 178},
|
221 |
-
{"name": "arcade, colonnade", "id": 49, "trainId": 179},
|
222 |
-
{"name": "hot tub", "id": 1272, "trainId": 180},
|
223 |
-
{"name": "helicopter", "id": 1229, "trainId": 181},
|
224 |
-
{"name": "tray", "id": 2850, "trainId": 182},
|
225 |
-
{"name": "partition, divider", "id": 1784, "trainId": 183},
|
226 |
-
{"name": "vineyard", "id": 2962, "trainId": 184},
|
227 |
-
{"name": "bowl", "id": 259, "trainId": 185},
|
228 |
-
{"name": "bullring", "id": 319, "trainId": 186},
|
229 |
-
{"name": "flag", "id": 954, "trainId": 187},
|
230 |
-
{"name": "pot", "id": 1974, "trainId": 188},
|
231 |
-
{"name": "footbridge, overcrossing, pedestrian bridge", "id": 1013, "trainId": 189},
|
232 |
-
{"name": "shower", "id": 2356, "trainId": 190},
|
233 |
-
{"name": "bag, traveling bag, travelling bag, grip, suitcase", "id": 97, "trainId": 191},
|
234 |
-
{"name": "bulletin board, notice board", "id": 318, "trainId": 192},
|
235 |
-
{"name": "confessional booth", "id": 592, "trainId": 193},
|
236 |
-
{"name": "trunk, tree trunk, bole", "id": 2885, "trainId": 194},
|
237 |
-
{"name": "forest", "id": 1017, "trainId": 195},
|
238 |
-
{"name": "elevator door", "id": 851, "trainId": 196},
|
239 |
-
{"name": "laptop, laptop computer", "id": 1407, "trainId": 197},
|
240 |
-
{"name": "instrument panel", "id": 1332, "trainId": 198},
|
241 |
-
{"name": "bucket, pail", "id": 303, "trainId": 199},
|
242 |
-
{"name": "tapestry, tapis", "id": 2714, "trainId": 200},
|
243 |
-
{"name": "platform", "id": 1924, "trainId": 201},
|
244 |
-
{"name": "jacket", "id": 1346, "trainId": 202},
|
245 |
-
{"name": "gate", "id": 1081, "trainId": 203},
|
246 |
-
{"name": "monitor, monitoring device", "id": 1583, "trainId": 204},
|
247 |
-
{
|
248 |
-
"name": "telephone booth, phone booth, call box, telephone box, telephone kiosk",
|
249 |
-
"id": 2727,
|
250 |
-
"trainId": 205,
|
251 |
-
},
|
252 |
-
{"name": "spotlight, spot", "id": 2509, "trainId": 206},
|
253 |
-
{"name": "ring", "id": 2123, "trainId": 207},
|
254 |
-
{"name": "control panel", "id": 602, "trainId": 208},
|
255 |
-
{"name": "blackboard, chalkboard", "id": 202, "trainId": 209},
|
256 |
-
{"name": "air conditioner, air conditioning", "id": 10, "trainId": 210},
|
257 |
-
{"name": "chest", "id": 490, "trainId": 211},
|
258 |
-
{"name": "clock", "id": 530, "trainId": 212},
|
259 |
-
{"name": "sand dune", "id": 2213, "trainId": 213},
|
260 |
-
{"name": "pipe, pipage, piping", "id": 1884, "trainId": 214},
|
261 |
-
{"name": "vault", "id": 2934, "trainId": 215},
|
262 |
-
{"name": "table football", "id": 2687, "trainId": 216},
|
263 |
-
{"name": "cannon", "id": 387, "trainId": 217},
|
264 |
-
{"name": "swimming pool, swimming bath, natatorium", "id": 2668, "trainId": 218},
|
265 |
-
{"name": "fluorescent, fluorescent fixture", "id": 982, "trainId": 219},
|
266 |
-
{"name": "statue", "id": 2547, "trainId": 220},
|
267 |
-
{
|
268 |
-
"name": "loudspeaker, speaker, speaker unit, loudspeaker system, speaker system",
|
269 |
-
"id": 1474,
|
270 |
-
"trainId": 221,
|
271 |
-
},
|
272 |
-
{"name": "exhibitor", "id": 877, "trainId": 222},
|
273 |
-
{"name": "ladder", "id": 1391, "trainId": 223},
|
274 |
-
{"name": "carport", "id": 414, "trainId": 224},
|
275 |
-
{"name": "dam", "id": 698, "trainId": 225},
|
276 |
-
{"name": "pulpit", "id": 2019, "trainId": 226},
|
277 |
-
{"name": "skylight, fanlight", "id": 2422, "trainId": 227},
|
278 |
-
{"name": "water tower", "id": 3010, "trainId": 228},
|
279 |
-
{"name": "grill, grille, grillwork", "id": 1139, "trainId": 229},
|
280 |
-
{"name": "display board", "id": 753, "trainId": 230},
|
281 |
-
{"name": "pane, pane of glass, window glass", "id": 1747, "trainId": 231},
|
282 |
-
{"name": "rubbish, trash, scrap", "id": 2175, "trainId": 232},
|
283 |
-
{"name": "ice rink", "id": 1301, "trainId": 233},
|
284 |
-
{"name": "fruit", "id": 1033, "trainId": 234},
|
285 |
-
{"name": "patio", "id": 1789, "trainId": 235},
|
286 |
-
{"name": "vending machine", "id": 2939, "trainId": 236},
|
287 |
-
{"name": "telephone, phone, telephone set", "id": 2730, "trainId": 237},
|
288 |
-
{"name": "net", "id": 1652, "trainId": 238},
|
289 |
-
{
|
290 |
-
"name": "backpack, back pack, knapsack, packsack, rucksack, haversack",
|
291 |
-
"id": 90,
|
292 |
-
"trainId": 239,
|
293 |
-
},
|
294 |
-
{"name": "jar", "id": 1349, "trainId": 240},
|
295 |
-
{"name": "track", "id": 2830, "trainId": 241},
|
296 |
-
{"name": "magazine", "id": 1485, "trainId": 242},
|
297 |
-
{"name": "shutter", "id": 2370, "trainId": 243},
|
298 |
-
{"name": "roof", "id": 2155, "trainId": 244},
|
299 |
-
{"name": "banner, streamer", "id": 118, "trainId": 245},
|
300 |
-
{"name": "landfill", "id": 1402, "trainId": 246},
|
301 |
-
{"name": "post", "id": 1957, "trainId": 247},
|
302 |
-
{"name": "altarpiece, reredos", "id": 3130, "trainId": 248},
|
303 |
-
{"name": "hat, chapeau, lid", "id": 1197, "trainId": 249},
|
304 |
-
{"name": "arch, archway", "id": 52, "trainId": 250},
|
305 |
-
{"name": "table game", "id": 2688, "trainId": 251},
|
306 |
-
{"name": "bag, handbag, pocketbook, purse", "id": 96, "trainId": 252},
|
307 |
-
{"name": "document, written document, papers", "id": 762, "trainId": 253},
|
308 |
-
{"name": "dome", "id": 772, "trainId": 254},
|
309 |
-
{"name": "pier", "id": 1857, "trainId": 255},
|
310 |
-
{"name": "shanties", "id": 2315, "trainId": 256},
|
311 |
-
{"name": "forecourt", "id": 1016, "trainId": 257},
|
312 |
-
{"name": "crane", "id": 643, "trainId": 258},
|
313 |
-
{"name": "dog, domestic dog, canis familiaris", "id": 3105, "trainId": 259},
|
314 |
-
{"name": "piano, pianoforte, forte-piano", "id": 1849, "trainId": 260},
|
315 |
-
{"name": "drawing", "id": 791, "trainId": 261},
|
316 |
-
{"name": "cabin", "id": 349, "trainId": 262},
|
317 |
-
{
|
318 |
-
"name": "ad, advertisement, advertizement, advertising, advertizing, advert",
|
319 |
-
"id": 6,
|
320 |
-
"trainId": 263,
|
321 |
-
},
|
322 |
-
{"name": "amphitheater, amphitheatre, coliseum", "id": 3114, "trainId": 264},
|
323 |
-
{"name": "monument", "id": 1587, "trainId": 265},
|
324 |
-
{"name": "henhouse", "id": 1233, "trainId": 266},
|
325 |
-
{"name": "cockpit", "id": 559, "trainId": 267},
|
326 |
-
{"name": "heater, warmer", "id": 1223, "trainId": 268},
|
327 |
-
{"name": "windmill, aerogenerator, wind generator", "id": 3049, "trainId": 269},
|
328 |
-
{"name": "pool", "id": 1943, "trainId": 270},
|
329 |
-
{"name": "elevator, lift", "id": 853, "trainId": 271},
|
330 |
-
{"name": "decoration, ornament, ornamentation", "id": 709, "trainId": 272},
|
331 |
-
{"name": "labyrinth", "id": 1390, "trainId": 273},
|
332 |
-
{"name": "text, textual matter", "id": 2748, "trainId": 274},
|
333 |
-
{"name": "printer", "id": 2007, "trainId": 275},
|
334 |
-
{"name": "mezzanine, first balcony", "id": 1546, "trainId": 276},
|
335 |
-
{"name": "mattress", "id": 1513, "trainId": 277},
|
336 |
-
{"name": "straw", "id": 2600, "trainId": 278},
|
337 |
-
{"name": "stalls", "id": 2538, "trainId": 279},
|
338 |
-
{"name": "patio, terrace", "id": 1790, "trainId": 280},
|
339 |
-
{"name": "billboard, hoarding", "id": 194, "trainId": 281},
|
340 |
-
{"name": "bus stop", "id": 326, "trainId": 282},
|
341 |
-
{"name": "trouser, pant", "id": 2877, "trainId": 283},
|
342 |
-
{"name": "console table, console", "id": 594, "trainId": 284},
|
343 |
-
{"name": "rack", "id": 2036, "trainId": 285},
|
344 |
-
{"name": "notebook", "id": 1662, "trainId": 286},
|
345 |
-
{"name": "shrine", "id": 2366, "trainId": 287},
|
346 |
-
{"name": "pantry", "id": 1754, "trainId": 288},
|
347 |
-
{"name": "cart", "id": 418, "trainId": 289},
|
348 |
-
{"name": "steam shovel", "id": 2553, "trainId": 290},
|
349 |
-
{"name": "porch", "id": 1951, "trainId": 291},
|
350 |
-
{"name": "postbox, mailbox, letter box", "id": 1963, "trainId": 292},
|
351 |
-
{"name": "figurine, statuette", "id": 918, "trainId": 293},
|
352 |
-
{"name": "recycling bin", "id": 2086, "trainId": 294},
|
353 |
-
{"name": "folding screen", "id": 997, "trainId": 295},
|
354 |
-
{"name": "telescope", "id": 2731, "trainId": 296},
|
355 |
-
{"name": "deck chair, beach chair", "id": 704, "trainId": 297},
|
356 |
-
{"name": "kennel", "id": 1365, "trainId": 298},
|
357 |
-
{"name": "coffee maker", "id": 569, "trainId": 299},
|
358 |
-
{"name": "altar, communion table, lord's table", "id": 3108, "trainId": 300},
|
359 |
-
{"name": "fish", "id": 948, "trainId": 301},
|
360 |
-
{"name": "easel", "id": 839, "trainId": 302},
|
361 |
-
{"name": "artificial golf green", "id": 63, "trainId": 303},
|
362 |
-
{"name": "iceberg", "id": 1305, "trainId": 304},
|
363 |
-
{"name": "candlestick, candle holder", "id": 378, "trainId": 305},
|
364 |
-
{"name": "shower stall, shower bath", "id": 2362, "trainId": 306},
|
365 |
-
{"name": "television stand", "id": 2734, "trainId": 307},
|
366 |
-
{
|
367 |
-
"name": "wall socket, wall plug, electric outlet, electrical outlet, outlet, electric receptacle",
|
368 |
-
"id": 2982,
|
369 |
-
"trainId": 308,
|
370 |
-
},
|
371 |
-
{"name": "skeleton", "id": 2398, "trainId": 309},
|
372 |
-
{"name": "grand piano, grand", "id": 1119, "trainId": 310},
|
373 |
-
{"name": "candy, confect", "id": 382, "trainId": 311},
|
374 |
-
{"name": "grille door", "id": 1141, "trainId": 312},
|
375 |
-
{"name": "pedestal, plinth, footstall", "id": 1805, "trainId": 313},
|
376 |
-
{"name": "jersey, t-shirt, tee shirt", "id": 3102, "trainId": 314},
|
377 |
-
{"name": "shoe", "id": 2341, "trainId": 315},
|
378 |
-
{"name": "gravestone, headstone, tombstone", "id": 1131, "trainId": 316},
|
379 |
-
{"name": "shanty", "id": 2316, "trainId": 317},
|
380 |
-
{"name": "structure", "id": 2626, "trainId": 318},
|
381 |
-
{"name": "rocking chair, rocker", "id": 3104, "trainId": 319},
|
382 |
-
{"name": "bird", "id": 198, "trainId": 320},
|
383 |
-
{"name": "place mat", "id": 1896, "trainId": 321},
|
384 |
-
{"name": "tomb", "id": 2800, "trainId": 322},
|
385 |
-
{"name": "big top", "id": 190, "trainId": 323},
|
386 |
-
{"name": "gas pump, gasoline pump, petrol pump, island dispenser", "id": 3131, "trainId": 324},
|
387 |
-
{"name": "lockers", "id": 1463, "trainId": 325},
|
388 |
-
{"name": "cage", "id": 357, "trainId": 326},
|
389 |
-
{"name": "finger", "id": 929, "trainId": 327},
|
390 |
-
{"name": "bleachers", "id": 209, "trainId": 328},
|
391 |
-
{"name": "ferris wheel", "id": 912, "trainId": 329},
|
392 |
-
{"name": "hairdresser chair", "id": 1164, "trainId": 330},
|
393 |
-
{"name": "mat", "id": 1509, "trainId": 331},
|
394 |
-
{"name": "stands", "id": 2539, "trainId": 332},
|
395 |
-
{"name": "aquarium, fish tank, marine museum", "id": 3116, "trainId": 333},
|
396 |
-
{"name": "streetcar, tram, tramcar, trolley, trolley car", "id": 2615, "trainId": 334},
|
397 |
-
{"name": "napkin, table napkin, serviette", "id": 1644, "trainId": 335},
|
398 |
-
{"name": "dummy", "id": 818, "trainId": 336},
|
399 |
-
{"name": "booklet, brochure, folder, leaflet, pamphlet", "id": 242, "trainId": 337},
|
400 |
-
{"name": "sand trap", "id": 2217, "trainId": 338},
|
401 |
-
{"name": "shop, store", "id": 2347, "trainId": 339},
|
402 |
-
{"name": "table cloth", "id": 2686, "trainId": 340},
|
403 |
-
{"name": "service station", "id": 2300, "trainId": 341},
|
404 |
-
{"name": "coffin", "id": 572, "trainId": 342},
|
405 |
-
{"name": "drawer", "id": 789, "trainId": 343},
|
406 |
-
{"name": "cages", "id": 358, "trainId": 344},
|
407 |
-
{"name": "slot machine, coin machine", "id": 2443, "trainId": 345},
|
408 |
-
{"name": "balcony", "id": 101, "trainId": 346},
|
409 |
-
{"name": "volleyball court", "id": 2969, "trainId": 347},
|
410 |
-
{"name": "table tennis", "id": 2692, "trainId": 348},
|
411 |
-
{"name": "control table", "id": 606, "trainId": 349},
|
412 |
-
{"name": "shirt", "id": 2339, "trainId": 350},
|
413 |
-
{"name": "merchandise, ware, product", "id": 1533, "trainId": 351},
|
414 |
-
{"name": "railway", "id": 2060, "trainId": 352},
|
415 |
-
{"name": "parterre", "id": 1782, "trainId": 353},
|
416 |
-
{"name": "chimney", "id": 495, "trainId": 354},
|
417 |
-
{"name": "can, tin, tin can", "id": 371, "trainId": 355},
|
418 |
-
{"name": "tanks", "id": 2707, "trainId": 356},
|
419 |
-
{"name": "fabric, cloth, material, textile", "id": 889, "trainId": 357},
|
420 |
-
{"name": "alga, algae", "id": 3156, "trainId": 358},
|
421 |
-
{"name": "system", "id": 2683, "trainId": 359},
|
422 |
-
{"name": "map", "id": 1499, "trainId": 360},
|
423 |
-
{"name": "greenhouse", "id": 1135, "trainId": 361},
|
424 |
-
{"name": "mug", "id": 1619, "trainId": 362},
|
425 |
-
{"name": "barbecue", "id": 125, "trainId": 363},
|
426 |
-
{"name": "trailer", "id": 2838, "trainId": 364},
|
427 |
-
{"name": "toilet tissue, toilet paper, bathroom tissue", "id": 2792, "trainId": 365},
|
428 |
-
{"name": "organ", "id": 1695, "trainId": 366},
|
429 |
-
{"name": "dishrag, dishcloth", "id": 746, "trainId": 367},
|
430 |
-
{"name": "island", "id": 1343, "trainId": 368},
|
431 |
-
{"name": "keyboard", "id": 1370, "trainId": 369},
|
432 |
-
{"name": "trench", "id": 2858, "trainId": 370},
|
433 |
-
{"name": "basket, basketball hoop, hoop", "id": 145, "trainId": 371},
|
434 |
-
{"name": "steering wheel, wheel", "id": 2565, "trainId": 372},
|
435 |
-
{"name": "pitcher, ewer", "id": 1892, "trainId": 373},
|
436 |
-
{"name": "goal", "id": 1103, "trainId": 374},
|
437 |
-
{"name": "bread, breadstuff, staff of life", "id": 286, "trainId": 375},
|
438 |
-
{"name": "beds", "id": 170, "trainId": 376},
|
439 |
-
{"name": "wood", "id": 3073, "trainId": 377},
|
440 |
-
{"name": "file cabinet", "id": 922, "trainId": 378},
|
441 |
-
{"name": "newspaper, paper", "id": 1655, "trainId": 379},
|
442 |
-
{"name": "motorboat", "id": 1602, "trainId": 380},
|
443 |
-
{"name": "rope", "id": 2160, "trainId": 381},
|
444 |
-
{"name": "guitar", "id": 1151, "trainId": 382},
|
445 |
-
{"name": "rubble", "id": 2176, "trainId": 383},
|
446 |
-
{"name": "scarf", "id": 2239, "trainId": 384},
|
447 |
-
{"name": "barrels", "id": 132, "trainId": 385},
|
448 |
-
{"name": "cap", "id": 394, "trainId": 386},
|
449 |
-
{"name": "leaves", "id": 1424, "trainId": 387},
|
450 |
-
{"name": "control tower", "id": 607, "trainId": 388},
|
451 |
-
{"name": "dashboard", "id": 700, "trainId": 389},
|
452 |
-
{"name": "bandstand", "id": 116, "trainId": 390},
|
453 |
-
{"name": "lectern", "id": 1425, "trainId": 391},
|
454 |
-
{"name": "switch, electric switch, electrical switch", "id": 2676, "trainId": 392},
|
455 |
-
{"name": "baseboard, mopboard, skirting board", "id": 141, "trainId": 393},
|
456 |
-
{"name": "shower room", "id": 2360, "trainId": 394},
|
457 |
-
{"name": "smoke", "id": 2449, "trainId": 395},
|
458 |
-
{"name": "faucet, spigot", "id": 897, "trainId": 396},
|
459 |
-
{"name": "bulldozer", "id": 317, "trainId": 397},
|
460 |
-
{"name": "saucepan", "id": 2228, "trainId": 398},
|
461 |
-
{"name": "shops", "id": 2351, "trainId": 399},
|
462 |
-
{"name": "meter", "id": 1543, "trainId": 400},
|
463 |
-
{"name": "crevasse", "id": 656, "trainId": 401},
|
464 |
-
{"name": "gear", "id": 1088, "trainId": 402},
|
465 |
-
{"name": "candelabrum, candelabra", "id": 373, "trainId": 403},
|
466 |
-
{"name": "sofa bed", "id": 2472, "trainId": 404},
|
467 |
-
{"name": "tunnel", "id": 2892, "trainId": 405},
|
468 |
-
{"name": "pallet", "id": 1740, "trainId": 406},
|
469 |
-
{"name": "wire, conducting wire", "id": 3067, "trainId": 407},
|
470 |
-
{"name": "kettle, boiler", "id": 1367, "trainId": 408},
|
471 |
-
{"name": "bidet", "id": 188, "trainId": 409},
|
472 |
-
{
|
473 |
-
"name": "baby buggy, baby carriage, carriage, perambulator, pram, stroller, go-cart, pushchair, pusher",
|
474 |
-
"id": 79,
|
475 |
-
"trainId": 410,
|
476 |
-
},
|
477 |
-
{"name": "music stand", "id": 1633, "trainId": 411},
|
478 |
-
{"name": "pipe, tube", "id": 1885, "trainId": 412},
|
479 |
-
{"name": "cup", "id": 677, "trainId": 413},
|
480 |
-
{"name": "parking meter", "id": 1779, "trainId": 414},
|
481 |
-
{"name": "ice hockey rink", "id": 1297, "trainId": 415},
|
482 |
-
{"name": "shelter", "id": 2334, "trainId": 416},
|
483 |
-
{"name": "weeds", "id": 3027, "trainId": 417},
|
484 |
-
{"name": "temple", "id": 2735, "trainId": 418},
|
485 |
-
{"name": "patty, cake", "id": 1791, "trainId": 419},
|
486 |
-
{"name": "ski slope", "id": 2405, "trainId": 420},
|
487 |
-
{"name": "panel", "id": 1748, "trainId": 421},
|
488 |
-
{"name": "wallet", "id": 2983, "trainId": 422},
|
489 |
-
{"name": "wheel", "id": 3035, "trainId": 423},
|
490 |
-
{"name": "towel rack, towel horse", "id": 2824, "trainId": 424},
|
491 |
-
{"name": "roundabout", "id": 2168, "trainId": 425},
|
492 |
-
{"name": "canister, cannister, tin", "id": 385, "trainId": 426},
|
493 |
-
{"name": "rod", "id": 2148, "trainId": 427},
|
494 |
-
{"name": "soap dispenser", "id": 2465, "trainId": 428},
|
495 |
-
{"name": "bell", "id": 175, "trainId": 429},
|
496 |
-
{"name": "canvas", "id": 390, "trainId": 430},
|
497 |
-
{"name": "box office, ticket office, ticket booth", "id": 268, "trainId": 431},
|
498 |
-
{"name": "teacup", "id": 2722, "trainId": 432},
|
499 |
-
{"name": "trellis", "id": 2857, "trainId": 433},
|
500 |
-
{"name": "workbench", "id": 3088, "trainId": 434},
|
501 |
-
{"name": "valley, vale", "id": 2926, "trainId": 435},
|
502 |
-
{"name": "toaster", "id": 2782, "trainId": 436},
|
503 |
-
{"name": "knife", "id": 1378, "trainId": 437},
|
504 |
-
{"name": "podium", "id": 1934, "trainId": 438},
|
505 |
-
{"name": "ramp", "id": 2072, "trainId": 439},
|
506 |
-
{"name": "tumble dryer", "id": 2889, "trainId": 440},
|
507 |
-
{"name": "fireplug, fire hydrant, plug", "id": 944, "trainId": 441},
|
508 |
-
{"name": "gym shoe, sneaker, tennis shoe", "id": 1158, "trainId": 442},
|
509 |
-
{"name": "lab bench", "id": 1383, "trainId": 443},
|
510 |
-
{"name": "equipment", "id": 867, "trainId": 444},
|
511 |
-
{"name": "rocky formation", "id": 2145, "trainId": 445},
|
512 |
-
{"name": "plastic", "id": 1915, "trainId": 446},
|
513 |
-
{"name": "calendar", "id": 361, "trainId": 447},
|
514 |
-
{"name": "caravan", "id": 402, "trainId": 448},
|
515 |
-
{"name": "check-in-desk", "id": 482, "trainId": 449},
|
516 |
-
{"name": "ticket counter", "id": 2761, "trainId": 450},
|
517 |
-
{"name": "brush", "id": 300, "trainId": 451},
|
518 |
-
{"name": "mill", "id": 1554, "trainId": 452},
|
519 |
-
{"name": "covered bridge", "id": 636, "trainId": 453},
|
520 |
-
{"name": "bowling alley", "id": 260, "trainId": 454},
|
521 |
-
{"name": "hanger", "id": 1186, "trainId": 455},
|
522 |
-
{"name": "excavator", "id": 871, "trainId": 456},
|
523 |
-
{"name": "trestle", "id": 2859, "trainId": 457},
|
524 |
-
{"name": "revolving door", "id": 2103, "trainId": 458},
|
525 |
-
{"name": "blast furnace", "id": 208, "trainId": 459},
|
526 |
-
{"name": "scale, weighing machine", "id": 2236, "trainId": 460},
|
527 |
-
{"name": "projector", "id": 2012, "trainId": 461},
|
528 |
-
{"name": "soap", "id": 2462, "trainId": 462},
|
529 |
-
{"name": "locker", "id": 1462, "trainId": 463},
|
530 |
-
{"name": "tractor", "id": 2832, "trainId": 464},
|
531 |
-
{"name": "stretcher", "id": 2617, "trainId": 465},
|
532 |
-
{"name": "frame", "id": 1024, "trainId": 466},
|
533 |
-
{"name": "grating", "id": 1129, "trainId": 467},
|
534 |
-
{"name": "alembic", "id": 18, "trainId": 468},
|
535 |
-
{"name": "candle, taper, wax light", "id": 376, "trainId": 469},
|
536 |
-
{"name": "barrier", "id": 134, "trainId": 470},
|
537 |
-
{"name": "cardboard", "id": 407, "trainId": 471},
|
538 |
-
{"name": "cave", "id": 434, "trainId": 472},
|
539 |
-
{"name": "puddle", "id": 2017, "trainId": 473},
|
540 |
-
{"name": "tarp", "id": 2717, "trainId": 474},
|
541 |
-
{"name": "price tag", "id": 2005, "trainId": 475},
|
542 |
-
{"name": "watchtower", "id": 2993, "trainId": 476},
|
543 |
-
{"name": "meters", "id": 1545, "trainId": 477},
|
544 |
-
{
|
545 |
-
"name": "light bulb, lightbulb, bulb, incandescent lamp, electric light, electric-light bulb",
|
546 |
-
"id": 1445,
|
547 |
-
"trainId": 478,
|
548 |
-
},
|
549 |
-
{"name": "tracks", "id": 2831, "trainId": 479},
|
550 |
-
{"name": "hair dryer", "id": 1161, "trainId": 480},
|
551 |
-
{"name": "skirt", "id": 2411, "trainId": 481},
|
552 |
-
{"name": "viaduct", "id": 2949, "trainId": 482},
|
553 |
-
{"name": "paper towel", "id": 1769, "trainId": 483},
|
554 |
-
{"name": "coat", "id": 552, "trainId": 484},
|
555 |
-
{"name": "sheet", "id": 2327, "trainId": 485},
|
556 |
-
{"name": "fire extinguisher, extinguisher, asphyxiator", "id": 939, "trainId": 486},
|
557 |
-
{"name": "water wheel", "id": 3013, "trainId": 487},
|
558 |
-
{"name": "pottery, clayware", "id": 1986, "trainId": 488},
|
559 |
-
{"name": "magazine rack", "id": 1486, "trainId": 489},
|
560 |
-
{"name": "teapot", "id": 2723, "trainId": 490},
|
561 |
-
{"name": "microphone, mike", "id": 1549, "trainId": 491},
|
562 |
-
{"name": "support", "id": 2649, "trainId": 492},
|
563 |
-
{"name": "forklift", "id": 1020, "trainId": 493},
|
564 |
-
{"name": "canyon", "id": 392, "trainId": 494},
|
565 |
-
{"name": "cash register, register", "id": 422, "trainId": 495},
|
566 |
-
{"name": "leaf, leafage, foliage", "id": 1419, "trainId": 496},
|
567 |
-
{"name": "remote control, remote", "id": 2099, "trainId": 497},
|
568 |
-
{"name": "soap dish", "id": 2464, "trainId": 498},
|
569 |
-
{"name": "windshield, windscreen", "id": 3058, "trainId": 499},
|
570 |
-
{"name": "cat", "id": 430, "trainId": 500},
|
571 |
-
{"name": "cue, cue stick, pool cue, pool stick", "id": 675, "trainId": 501},
|
572 |
-
{"name": "vent, venthole, vent-hole, blowhole", "id": 2941, "trainId": 502},
|
573 |
-
{"name": "videos", "id": 2955, "trainId": 503},
|
574 |
-
{"name": "shovel", "id": 2355, "trainId": 504},
|
575 |
-
{"name": "eaves", "id": 840, "trainId": 505},
|
576 |
-
{"name": "antenna, aerial, transmitting aerial", "id": 32, "trainId": 506},
|
577 |
-
{"name": "shipyard", "id": 2338, "trainId": 507},
|
578 |
-
{"name": "hen, biddy", "id": 1232, "trainId": 508},
|
579 |
-
{"name": "traffic cone", "id": 2834, "trainId": 509},
|
580 |
-
{"name": "washing machines", "id": 2991, "trainId": 510},
|
581 |
-
{"name": "truck crane", "id": 2879, "trainId": 511},
|
582 |
-
{"name": "cds", "id": 444, "trainId": 512},
|
583 |
-
{"name": "niche", "id": 1657, "trainId": 513},
|
584 |
-
{"name": "scoreboard", "id": 2246, "trainId": 514},
|
585 |
-
{"name": "briefcase", "id": 296, "trainId": 515},
|
586 |
-
{"name": "boot", "id": 245, "trainId": 516},
|
587 |
-
{"name": "sweater, jumper", "id": 2661, "trainId": 517},
|
588 |
-
{"name": "hay", "id": 1202, "trainId": 518},
|
589 |
-
{"name": "pack", "id": 1714, "trainId": 519},
|
590 |
-
{"name": "bottle rack", "id": 251, "trainId": 520},
|
591 |
-
{"name": "glacier", "id": 1095, "trainId": 521},
|
592 |
-
{"name": "pergola", "id": 1828, "trainId": 522},
|
593 |
-
{"name": "building materials", "id": 311, "trainId": 523},
|
594 |
-
{"name": "television camera", "id": 2732, "trainId": 524},
|
595 |
-
{"name": "first floor", "id": 947, "trainId": 525},
|
596 |
-
{"name": "rifle", "id": 2115, "trainId": 526},
|
597 |
-
{"name": "tennis table", "id": 2738, "trainId": 527},
|
598 |
-
{"name": "stadium", "id": 2525, "trainId": 528},
|
599 |
-
{"name": "safety belt", "id": 2194, "trainId": 529},
|
600 |
-
{"name": "cover", "id": 634, "trainId": 530},
|
601 |
-
{"name": "dish rack", "id": 740, "trainId": 531},
|
602 |
-
{"name": "synthesizer", "id": 2682, "trainId": 532},
|
603 |
-
{"name": "pumpkin", "id": 2020, "trainId": 533},
|
604 |
-
{"name": "gutter", "id": 1156, "trainId": 534},
|
605 |
-
{"name": "fruit stand", "id": 1036, "trainId": 535},
|
606 |
-
{"name": "ice floe, floe", "id": 1295, "trainId": 536},
|
607 |
-
{"name": "handle, grip, handgrip, hold", "id": 1181, "trainId": 537},
|
608 |
-
{"name": "wheelchair", "id": 3037, "trainId": 538},
|
609 |
-
{"name": "mousepad, mouse mat", "id": 1614, "trainId": 539},
|
610 |
-
{"name": "diploma", "id": 736, "trainId": 540},
|
611 |
-
{"name": "fairground ride", "id": 893, "trainId": 541},
|
612 |
-
{"name": "radio", "id": 2047, "trainId": 542},
|
613 |
-
{"name": "hotplate", "id": 1274, "trainId": 543},
|
614 |
-
{"name": "junk", "id": 1361, "trainId": 544},
|
615 |
-
{"name": "wheelbarrow", "id": 3036, "trainId": 545},
|
616 |
-
{"name": "stream", "id": 2606, "trainId": 546},
|
617 |
-
{"name": "toll plaza", "id": 2797, "trainId": 547},
|
618 |
-
{"name": "punching bag", "id": 2022, "trainId": 548},
|
619 |
-
{"name": "trough", "id": 2876, "trainId": 549},
|
620 |
-
{"name": "throne", "id": 2758, "trainId": 550},
|
621 |
-
{"name": "chair desk", "id": 472, "trainId": 551},
|
622 |
-
{"name": "weighbridge", "id": 3028, "trainId": 552},
|
623 |
-
{"name": "extractor fan", "id": 882, "trainId": 553},
|
624 |
-
{"name": "hanging clothes", "id": 1189, "trainId": 554},
|
625 |
-
{"name": "dish, dish aerial, dish antenna, saucer", "id": 743, "trainId": 555},
|
626 |
-
{"name": "alarm clock, alarm", "id": 3122, "trainId": 556},
|
627 |
-
{"name": "ski lift", "id": 2401, "trainId": 557},
|
628 |
-
{"name": "chain", "id": 468, "trainId": 558},
|
629 |
-
{"name": "garage", "id": 1061, "trainId": 559},
|
630 |
-
{"name": "mechanical shovel", "id": 1523, "trainId": 560},
|
631 |
-
{"name": "wine rack", "id": 3059, "trainId": 561},
|
632 |
-
{"name": "tramway", "id": 2843, "trainId": 562},
|
633 |
-
{"name": "treadmill", "id": 2853, "trainId": 563},
|
634 |
-
{"name": "menu", "id": 1529, "trainId": 564},
|
635 |
-
{"name": "block", "id": 214, "trainId": 565},
|
636 |
-
{"name": "well", "id": 3032, "trainId": 566},
|
637 |
-
{"name": "witness stand", "id": 3071, "trainId": 567},
|
638 |
-
{"name": "branch", "id": 277, "trainId": 568},
|
639 |
-
{"name": "duck", "id": 813, "trainId": 569},
|
640 |
-
{"name": "casserole", "id": 426, "trainId": 570},
|
641 |
-
{"name": "frying pan", "id": 1039, "trainId": 571},
|
642 |
-
{"name": "desk organizer", "id": 727, "trainId": 572},
|
643 |
-
{"name": "mast", "id": 1508, "trainId": 573},
|
644 |
-
{"name": "spectacles, specs, eyeglasses, glasses", "id": 2490, "trainId": 574},
|
645 |
-
{"name": "service elevator", "id": 2299, "trainId": 575},
|
646 |
-
{"name": "dollhouse", "id": 768, "trainId": 576},
|
647 |
-
{"name": "hammock", "id": 1172, "trainId": 577},
|
648 |
-
{"name": "clothes hanging", "id": 537, "trainId": 578},
|
649 |
-
{"name": "photocopier", "id": 1847, "trainId": 579},
|
650 |
-
{"name": "notepad", "id": 1664, "trainId": 580},
|
651 |
-
{"name": "golf cart", "id": 1110, "trainId": 581},
|
652 |
-
{"name": "footpath", "id": 1014, "trainId": 582},
|
653 |
-
{"name": "cross", "id": 662, "trainId": 583},
|
654 |
-
{"name": "baptismal font", "id": 121, "trainId": 584},
|
655 |
-
{"name": "boiler", "id": 227, "trainId": 585},
|
656 |
-
{"name": "skip", "id": 2410, "trainId": 586},
|
657 |
-
{"name": "rotisserie", "id": 2165, "trainId": 587},
|
658 |
-
{"name": "tables", "id": 2696, "trainId": 588},
|
659 |
-
{"name": "water mill", "id": 3005, "trainId": 589},
|
660 |
-
{"name": "helmet", "id": 1231, "trainId": 590},
|
661 |
-
{"name": "cover curtain", "id": 635, "trainId": 591},
|
662 |
-
{"name": "brick", "id": 292, "trainId": 592},
|
663 |
-
{"name": "table runner", "id": 2690, "trainId": 593},
|
664 |
-
{"name": "ashtray", "id": 65, "trainId": 594},
|
665 |
-
{"name": "street box", "id": 2607, "trainId": 595},
|
666 |
-
{"name": "stick", "id": 2574, "trainId": 596},
|
667 |
-
{"name": "hangers", "id": 1188, "trainId": 597},
|
668 |
-
{"name": "cells", "id": 456, "trainId": 598},
|
669 |
-
{"name": "urinal", "id": 2913, "trainId": 599},
|
670 |
-
{"name": "centerpiece", "id": 459, "trainId": 600},
|
671 |
-
{"name": "portable fridge", "id": 1955, "trainId": 601},
|
672 |
-
{"name": "dvds", "id": 827, "trainId": 602},
|
673 |
-
{"name": "golf club", "id": 1111, "trainId": 603},
|
674 |
-
{"name": "skirting board", "id": 2412, "trainId": 604},
|
675 |
-
{"name": "water cooler", "id": 2997, "trainId": 605},
|
676 |
-
{"name": "clipboard", "id": 528, "trainId": 606},
|
677 |
-
{"name": "camera, photographic camera", "id": 366, "trainId": 607},
|
678 |
-
{"name": "pigeonhole", "id": 1863, "trainId": 608},
|
679 |
-
{"name": "chips", "id": 500, "trainId": 609},
|
680 |
-
{"name": "food processor", "id": 1001, "trainId": 610},
|
681 |
-
{"name": "post box", "id": 1958, "trainId": 611},
|
682 |
-
{"name": "lid", "id": 1441, "trainId": 612},
|
683 |
-
{"name": "drum", "id": 809, "trainId": 613},
|
684 |
-
{"name": "blender", "id": 210, "trainId": 614},
|
685 |
-
{"name": "cave entrance", "id": 435, "trainId": 615},
|
686 |
-
{"name": "dental chair", "id": 718, "trainId": 616},
|
687 |
-
{"name": "obelisk", "id": 1674, "trainId": 617},
|
688 |
-
{"name": "canoe", "id": 388, "trainId": 618},
|
689 |
-
{"name": "mobile", "id": 1572, "trainId": 619},
|
690 |
-
{"name": "monitors", "id": 1584, "trainId": 620},
|
691 |
-
{"name": "pool ball", "id": 1944, "trainId": 621},
|
692 |
-
{"name": "cue rack", "id": 674, "trainId": 622},
|
693 |
-
{"name": "baggage carts", "id": 99, "trainId": 623},
|
694 |
-
{"name": "shore", "id": 2352, "trainId": 624},
|
695 |
-
{"name": "fork", "id": 1019, "trainId": 625},
|
696 |
-
{"name": "paper filer", "id": 1763, "trainId": 626},
|
697 |
-
{"name": "bicycle rack", "id": 185, "trainId": 627},
|
698 |
-
{"name": "coat rack", "id": 554, "trainId": 628},
|
699 |
-
{"name": "garland", "id": 1066, "trainId": 629},
|
700 |
-
{"name": "sports bag", "id": 2508, "trainId": 630},
|
701 |
-
{"name": "fish tank", "id": 951, "trainId": 631},
|
702 |
-
{"name": "towel dispenser", "id": 2822, "trainId": 632},
|
703 |
-
{"name": "carriage", "id": 415, "trainId": 633},
|
704 |
-
{"name": "brochure", "id": 297, "trainId": 634},
|
705 |
-
{"name": "plaque", "id": 1914, "trainId": 635},
|
706 |
-
{"name": "stringer", "id": 2619, "trainId": 636},
|
707 |
-
{"name": "iron", "id": 1338, "trainId": 637},
|
708 |
-
{"name": "spoon", "id": 2505, "trainId": 638},
|
709 |
-
{"name": "flag pole", "id": 955, "trainId": 639},
|
710 |
-
{"name": "toilet brush", "id": 2786, "trainId": 640},
|
711 |
-
{"name": "book stand", "id": 238, "trainId": 641},
|
712 |
-
{"name": "water faucet, water tap, tap, hydrant", "id": 3000, "trainId": 642},
|
713 |
-
{"name": "ticket office", "id": 2763, "trainId": 643},
|
714 |
-
{"name": "broom", "id": 299, "trainId": 644},
|
715 |
-
{"name": "dvd", "id": 822, "trainId": 645},
|
716 |
-
{"name": "ice bucket", "id": 1288, "trainId": 646},
|
717 |
-
{"name": "carapace, shell, cuticle, shield", "id": 3101, "trainId": 647},
|
718 |
-
{"name": "tureen", "id": 2894, "trainId": 648},
|
719 |
-
{"name": "folders", "id": 992, "trainId": 649},
|
720 |
-
{"name": "chess", "id": 489, "trainId": 650},
|
721 |
-
{"name": "root", "id": 2157, "trainId": 651},
|
722 |
-
{"name": "sewing machine", "id": 2309, "trainId": 652},
|
723 |
-
{"name": "model", "id": 1576, "trainId": 653},
|
724 |
-
{"name": "pen", "id": 1810, "trainId": 654},
|
725 |
-
{"name": "violin", "id": 2964, "trainId": 655},
|
726 |
-
{"name": "sweatshirt", "id": 2662, "trainId": 656},
|
727 |
-
{"name": "recycling materials", "id": 2087, "trainId": 657},
|
728 |
-
{"name": "mitten", "id": 1569, "trainId": 658},
|
729 |
-
{"name": "chopping board, cutting board", "id": 503, "trainId": 659},
|
730 |
-
{"name": "mask", "id": 1505, "trainId": 660},
|
731 |
-
{"name": "log", "id": 1468, "trainId": 661},
|
732 |
-
{"name": "mouse, computer mouse", "id": 1613, "trainId": 662},
|
733 |
-
{"name": "grill", "id": 1138, "trainId": 663},
|
734 |
-
{"name": "hole", "id": 1256, "trainId": 664},
|
735 |
-
{"name": "target", "id": 2715, "trainId": 665},
|
736 |
-
{"name": "trash bag", "id": 2846, "trainId": 666},
|
737 |
-
{"name": "chalk", "id": 477, "trainId": 667},
|
738 |
-
{"name": "sticks", "id": 2576, "trainId": 668},
|
739 |
-
{"name": "balloon", "id": 108, "trainId": 669},
|
740 |
-
{"name": "score", "id": 2245, "trainId": 670},
|
741 |
-
{"name": "hair spray", "id": 1162, "trainId": 671},
|
742 |
-
{"name": "roll", "id": 2149, "trainId": 672},
|
743 |
-
{"name": "runner", "id": 2183, "trainId": 673},
|
744 |
-
{"name": "engine", "id": 858, "trainId": 674},
|
745 |
-
{"name": "inflatable glove", "id": 1324, "trainId": 675},
|
746 |
-
{"name": "games", "id": 1055, "trainId": 676},
|
747 |
-
{"name": "pallets", "id": 1741, "trainId": 677},
|
748 |
-
{"name": "baskets", "id": 149, "trainId": 678},
|
749 |
-
{"name": "coop", "id": 615, "trainId": 679},
|
750 |
-
{"name": "dvd player", "id": 825, "trainId": 680},
|
751 |
-
{"name": "rocking horse", "id": 2143, "trainId": 681},
|
752 |
-
{"name": "buckets", "id": 304, "trainId": 682},
|
753 |
-
{"name": "bread rolls", "id": 283, "trainId": 683},
|
754 |
-
{"name": "shawl", "id": 2322, "trainId": 684},
|
755 |
-
{"name": "watering can", "id": 3017, "trainId": 685},
|
756 |
-
{"name": "spotlights", "id": 2510, "trainId": 686},
|
757 |
-
{"name": "post-it", "id": 1960, "trainId": 687},
|
758 |
-
{"name": "bowls", "id": 265, "trainId": 688},
|
759 |
-
{"name": "security camera", "id": 2282, "trainId": 689},
|
760 |
-
{"name": "runner cloth", "id": 2184, "trainId": 690},
|
761 |
-
{"name": "lock", "id": 1461, "trainId": 691},
|
762 |
-
{"name": "alarm, warning device, alarm system", "id": 3113, "trainId": 692},
|
763 |
-
{"name": "side", "id": 2372, "trainId": 693},
|
764 |
-
{"name": "roulette", "id": 2166, "trainId": 694},
|
765 |
-
{"name": "bone", "id": 232, "trainId": 695},
|
766 |
-
{"name": "cutlery", "id": 693, "trainId": 696},
|
767 |
-
{"name": "pool balls", "id": 1945, "trainId": 697},
|
768 |
-
{"name": "wheels", "id": 3039, "trainId": 698},
|
769 |
-
{"name": "spice rack", "id": 2494, "trainId": 699},
|
770 |
-
{"name": "plant pots", "id": 1908, "trainId": 700},
|
771 |
-
{"name": "towel ring", "id": 2827, "trainId": 701},
|
772 |
-
{"name": "bread box", "id": 280, "trainId": 702},
|
773 |
-
{"name": "video", "id": 2950, "trainId": 703},
|
774 |
-
{"name": "funfair", "id": 1044, "trainId": 704},
|
775 |
-
{"name": "breads", "id": 288, "trainId": 705},
|
776 |
-
{"name": "tripod", "id": 2863, "trainId": 706},
|
777 |
-
{"name": "ironing board", "id": 1342, "trainId": 707},
|
778 |
-
{"name": "skimmer", "id": 2409, "trainId": 708},
|
779 |
-
{"name": "hollow", "id": 1258, "trainId": 709},
|
780 |
-
{"name": "scratching post", "id": 2249, "trainId": 710},
|
781 |
-
{"name": "tricycle", "id": 2862, "trainId": 711},
|
782 |
-
{"name": "file box", "id": 920, "trainId": 712},
|
783 |
-
{"name": "mountain pass", "id": 1607, "trainId": 713},
|
784 |
-
{"name": "tombstones", "id": 2802, "trainId": 714},
|
785 |
-
{"name": "cooker", "id": 610, "trainId": 715},
|
786 |
-
{"name": "card game, cards", "id": 3129, "trainId": 716},
|
787 |
-
{"name": "golf bag", "id": 1108, "trainId": 717},
|
788 |
-
{"name": "towel paper", "id": 2823, "trainId": 718},
|
789 |
-
{"name": "chaise lounge", "id": 476, "trainId": 719},
|
790 |
-
{"name": "sun", "id": 2641, "trainId": 720},
|
791 |
-
{"name": "toilet paper holder", "id": 2788, "trainId": 721},
|
792 |
-
{"name": "rake", "id": 2070, "trainId": 722},
|
793 |
-
{"name": "key", "id": 1368, "trainId": 723},
|
794 |
-
{"name": "umbrella stand", "id": 2903, "trainId": 724},
|
795 |
-
{"name": "dartboard", "id": 699, "trainId": 725},
|
796 |
-
{"name": "transformer", "id": 2844, "trainId": 726},
|
797 |
-
{"name": "fireplace utensils", "id": 942, "trainId": 727},
|
798 |
-
{"name": "sweatshirts", "id": 2663, "trainId": 728},
|
799 |
-
{
|
800 |
-
"name": "cellular telephone, cellular phone, cellphone, cell, mobile phone",
|
801 |
-
"id": 457,
|
802 |
-
"trainId": 729,
|
803 |
-
},
|
804 |
-
{"name": "tallboy", "id": 2701, "trainId": 730},
|
805 |
-
{"name": "stapler", "id": 2540, "trainId": 731},
|
806 |
-
{"name": "sauna", "id": 2231, "trainId": 732},
|
807 |
-
{"name": "test tube", "id": 2746, "trainId": 733},
|
808 |
-
{"name": "palette", "id": 1738, "trainId": 734},
|
809 |
-
{"name": "shopping carts", "id": 2350, "trainId": 735},
|
810 |
-
{"name": "tools", "id": 2808, "trainId": 736},
|
811 |
-
{"name": "push button, push, button", "id": 2025, "trainId": 737},
|
812 |
-
{"name": "star", "id": 2541, "trainId": 738},
|
813 |
-
{"name": "roof rack", "id": 2156, "trainId": 739},
|
814 |
-
{"name": "barbed wire", "id": 126, "trainId": 740},
|
815 |
-
{"name": "spray", "id": 2512, "trainId": 741},
|
816 |
-
{"name": "ear", "id": 831, "trainId": 742},
|
817 |
-
{"name": "sponge", "id": 2503, "trainId": 743},
|
818 |
-
{"name": "racket", "id": 2039, "trainId": 744},
|
819 |
-
{"name": "tins", "id": 2774, "trainId": 745},
|
820 |
-
{"name": "eyeglasses", "id": 886, "trainId": 746},
|
821 |
-
{"name": "file", "id": 919, "trainId": 747},
|
822 |
-
{"name": "scarfs", "id": 2240, "trainId": 748},
|
823 |
-
{"name": "sugar bowl", "id": 2636, "trainId": 749},
|
824 |
-
{"name": "flip flop", "id": 963, "trainId": 750},
|
825 |
-
{"name": "headstones", "id": 1218, "trainId": 751},
|
826 |
-
{"name": "laptop bag", "id": 1406, "trainId": 752},
|
827 |
-
{"name": "leash", "id": 1420, "trainId": 753},
|
828 |
-
{"name": "climbing frame", "id": 526, "trainId": 754},
|
829 |
-
{"name": "suit hanger", "id": 2639, "trainId": 755},
|
830 |
-
{"name": "floor spotlight", "id": 975, "trainId": 756},
|
831 |
-
{"name": "plate rack", "id": 1921, "trainId": 757},
|
832 |
-
{"name": "sewer", "id": 2305, "trainId": 758},
|
833 |
-
{"name": "hard drive", "id": 1193, "trainId": 759},
|
834 |
-
{"name": "sprinkler", "id": 2517, "trainId": 760},
|
835 |
-
{"name": "tools box", "id": 2809, "trainId": 761},
|
836 |
-
{"name": "necklace", "id": 1647, "trainId": 762},
|
837 |
-
{"name": "bulbs", "id": 314, "trainId": 763},
|
838 |
-
{"name": "steel industry", "id": 2560, "trainId": 764},
|
839 |
-
{"name": "club", "id": 545, "trainId": 765},
|
840 |
-
{"name": "jack", "id": 1345, "trainId": 766},
|
841 |
-
{"name": "door bars", "id": 775, "trainId": 767},
|
842 |
-
{
|
843 |
-
"name": "control panel, instrument panel, control board, board, panel",
|
844 |
-
"id": 603,
|
845 |
-
"trainId": 768,
|
846 |
-
},
|
847 |
-
{"name": "hairbrush", "id": 1163, "trainId": 769},
|
848 |
-
{"name": "napkin holder", "id": 1641, "trainId": 770},
|
849 |
-
{"name": "office", "id": 1678, "trainId": 771},
|
850 |
-
{"name": "smoke detector", "id": 2450, "trainId": 772},
|
851 |
-
{"name": "utensils", "id": 2915, "trainId": 773},
|
852 |
-
{"name": "apron", "id": 42, "trainId": 774},
|
853 |
-
{"name": "scissors", "id": 2242, "trainId": 775},
|
854 |
-
{"name": "terminal", "id": 2741, "trainId": 776},
|
855 |
-
{"name": "grinder", "id": 1143, "trainId": 777},
|
856 |
-
{"name": "entry phone", "id": 862, "trainId": 778},
|
857 |
-
{"name": "newspaper stand", "id": 1654, "trainId": 779},
|
858 |
-
{"name": "pepper shaker", "id": 1826, "trainId": 780},
|
859 |
-
{"name": "onions", "id": 1689, "trainId": 781},
|
860 |
-
{
|
861 |
-
"name": "central processing unit, cpu, c p u , central processor, processor, mainframe",
|
862 |
-
"id": 3124,
|
863 |
-
"trainId": 782,
|
864 |
-
},
|
865 |
-
{"name": "tape", "id": 2710, "trainId": 783},
|
866 |
-
{"name": "bat", "id": 152, "trainId": 784},
|
867 |
-
{"name": "coaster", "id": 549, "trainId": 785},
|
868 |
-
{"name": "calculator", "id": 360, "trainId": 786},
|
869 |
-
{"name": "potatoes", "id": 1982, "trainId": 787},
|
870 |
-
{"name": "luggage rack", "id": 1478, "trainId": 788},
|
871 |
-
{"name": "salt", "id": 2203, "trainId": 789},
|
872 |
-
{"name": "street number", "id": 2612, "trainId": 790},
|
873 |
-
{"name": "viewpoint", "id": 2956, "trainId": 791},
|
874 |
-
{"name": "sword", "id": 2681, "trainId": 792},
|
875 |
-
{"name": "cd", "id": 437, "trainId": 793},
|
876 |
-
{"name": "rowing machine", "id": 2171, "trainId": 794},
|
877 |
-
{"name": "plug", "id": 1933, "trainId": 795},
|
878 |
-
{"name": "andiron, firedog, dog, dog-iron", "id": 3110, "trainId": 796},
|
879 |
-
{"name": "pepper", "id": 1824, "trainId": 797},
|
880 |
-
{"name": "tongs", "id": 2803, "trainId": 798},
|
881 |
-
{"name": "bonfire", "id": 234, "trainId": 799},
|
882 |
-
{"name": "dog dish", "id": 764, "trainId": 800},
|
883 |
-
{"name": "belt", "id": 177, "trainId": 801},
|
884 |
-
{"name": "dumbbells", "id": 817, "trainId": 802},
|
885 |
-
{"name": "videocassette recorder, vcr", "id": 3145, "trainId": 803},
|
886 |
-
{"name": "hook", "id": 1262, "trainId": 804},
|
887 |
-
{"name": "envelopes", "id": 864, "trainId": 805},
|
888 |
-
{"name": "shower faucet", "id": 2359, "trainId": 806},
|
889 |
-
{"name": "watch", "id": 2992, "trainId": 807},
|
890 |
-
{"name": "padlock", "id": 1725, "trainId": 808},
|
891 |
-
{"name": "swimming pool ladder", "id": 2667, "trainId": 809},
|
892 |
-
{"name": "spanners", "id": 2484, "trainId": 810},
|
893 |
-
{"name": "gravy boat", "id": 1133, "trainId": 811},
|
894 |
-
{"name": "notice board", "id": 1667, "trainId": 812},
|
895 |
-
{"name": "trash bags", "id": 2847, "trainId": 813},
|
896 |
-
{"name": "fire alarm", "id": 932, "trainId": 814},
|
897 |
-
{"name": "ladle", "id": 1392, "trainId": 815},
|
898 |
-
{"name": "stethoscope", "id": 2573, "trainId": 816},
|
899 |
-
{"name": "rocket", "id": 2140, "trainId": 817},
|
900 |
-
{"name": "funnel", "id": 1046, "trainId": 818},
|
901 |
-
{"name": "bowling pins", "id": 264, "trainId": 819},
|
902 |
-
{"name": "valve", "id": 2927, "trainId": 820},
|
903 |
-
{"name": "thermometer", "id": 2752, "trainId": 821},
|
904 |
-
{"name": "cups", "id": 679, "trainId": 822},
|
905 |
-
{"name": "spice jar", "id": 2493, "trainId": 823},
|
906 |
-
{"name": "night light", "id": 1658, "trainId": 824},
|
907 |
-
{"name": "soaps", "id": 2466, "trainId": 825},
|
908 |
-
{"name": "games table", "id": 1057, "trainId": 826},
|
909 |
-
{"name": "slotted spoon", "id": 2444, "trainId": 827},
|
910 |
-
{"name": "reel", "id": 2093, "trainId": 828},
|
911 |
-
{"name": "scourer", "id": 2248, "trainId": 829},
|
912 |
-
{"name": "sleeping robe", "id": 2432, "trainId": 830},
|
913 |
-
{"name": "desk mat", "id": 726, "trainId": 831},
|
914 |
-
{"name": "dumbbell", "id": 816, "trainId": 832},
|
915 |
-
{"name": "hammer", "id": 1171, "trainId": 833},
|
916 |
-
{"name": "tie", "id": 2766, "trainId": 834},
|
917 |
-
{"name": "typewriter", "id": 2900, "trainId": 835},
|
918 |
-
{"name": "shaker", "id": 2313, "trainId": 836},
|
919 |
-
{"name": "cheese dish", "id": 488, "trainId": 837},
|
920 |
-
{"name": "sea star", "id": 2265, "trainId": 838},
|
921 |
-
{"name": "racquet", "id": 2043, "trainId": 839},
|
922 |
-
{"name": "butane gas cylinder", "id": 332, "trainId": 840},
|
923 |
-
{"name": "paper weight", "id": 1771, "trainId": 841},
|
924 |
-
{"name": "shaving brush", "id": 2320, "trainId": 842},
|
925 |
-
{"name": "sunglasses", "id": 2646, "trainId": 843},
|
926 |
-
{"name": "gear shift", "id": 1089, "trainId": 844},
|
927 |
-
{"name": "towel rail", "id": 2826, "trainId": 845},
|
928 |
-
{"name": "adding machine, totalizer, totaliser", "id": 3148, "trainId": 846},
|
929 |
-
]
|
930 |
-
|
931 |
-
|
932 |
-
def loadAde20K(file):
|
933 |
-
fileseg = file.replace(".jpg", "_seg.png")
|
934 |
-
with Image.open(fileseg) as io:
|
935 |
-
seg = np.array(io)
|
936 |
-
|
937 |
-
R = seg[:, :, 0]
|
938 |
-
G = seg[:, :, 1]
|
939 |
-
ObjectClassMasks = (R / 10).astype(np.int32) * 256 + (G.astype(np.int32))
|
940 |
-
|
941 |
-
return {"img_name": file, "segm_name": fileseg, "class_mask": ObjectClassMasks}
|
942 |
-
|
943 |
-
|
944 |
-
if __name__ == "__main__":
|
945 |
-
dataset_dir = Path(os.getenv("DETECTRON2_DATASETS", "datasets"))
|
946 |
-
index_file = dataset_dir / "ADE20K_2021_17_01" / "index_ade20k.pkl"
|
947 |
-
print('Caution: we only generate the validation set!')
|
948 |
-
with open(index_file, "rb") as f:
|
949 |
-
index_ade20k = pkl.load(f)
|
950 |
-
|
951 |
-
id_map = {}
|
952 |
-
for cat in ADE20K_SEM_SEG_FULL_CATEGORIES:
|
953 |
-
id_map[cat["id"]] = cat["trainId"]
|
954 |
-
|
955 |
-
# make output dir
|
956 |
-
for name in ["training", "validation"]:
|
957 |
-
image_dir = dataset_dir / "ADE20K_2021_17_01" / "images_detectron2" / name
|
958 |
-
image_dir.mkdir(parents=True, exist_ok=True)
|
959 |
-
annotation_dir = dataset_dir / "ADE20K_2021_17_01" / "annotations_detectron2" / name
|
960 |
-
annotation_dir.mkdir(parents=True, exist_ok=True)
|
961 |
-
|
962 |
-
# process image and gt
|
963 |
-
for i, (folder_name, file_name) in tqdm.tqdm(
|
964 |
-
enumerate(zip(index_ade20k["folder"], index_ade20k["filename"])),
|
965 |
-
total=len(index_ade20k["filename"]),
|
966 |
-
):
|
967 |
-
split = "validation" if file_name.split("_")[1] == "val" else "training"
|
968 |
-
if split == 'training':
|
969 |
-
# FIXME: If you want to generate training set, delete this condition
|
970 |
-
continue
|
971 |
-
info = loadAde20K(str(dataset_dir / folder_name / file_name))
|
972 |
-
|
973 |
-
# resize image and label
|
974 |
-
img = np.asarray(Image.open(info["img_name"]))
|
975 |
-
lab = np.asarray(info["class_mask"])
|
976 |
-
|
977 |
-
h, w = img.shape[0], img.shape[1]
|
978 |
-
max_size = 512
|
979 |
-
resize = True
|
980 |
-
if w >= h > max_size:
|
981 |
-
h_new, w_new = max_size, round(w / float(h) * max_size)
|
982 |
-
elif h >= w > max_size:
|
983 |
-
h_new, w_new = round(h / float(w) * max_size), max_size
|
984 |
-
else:
|
985 |
-
resize = False
|
986 |
-
|
987 |
-
if resize:
|
988 |
-
img = cv2.resize(img, (w_new, h_new), interpolation=cv2.INTER_LINEAR)
|
989 |
-
lab = cv2.resize(lab, (w_new, h_new), interpolation=cv2.INTER_NEAREST)
|
990 |
-
|
991 |
-
assert img.dtype == np.uint8
|
992 |
-
assert lab.dtype == np.int32
|
993 |
-
|
994 |
-
# apply label conversion and save into uint16 images
|
995 |
-
output = np.zeros_like(lab, dtype=np.uint16) + 65535
|
996 |
-
for obj_id in np.unique(lab):
|
997 |
-
if obj_id in id_map:
|
998 |
-
output[lab == obj_id] = id_map[obj_id]
|
999 |
-
|
1000 |
-
output_img = dataset_dir / "ADE20K_2021_17_01" / "images_detectron2" / split / file_name
|
1001 |
-
output_lab = (
|
1002 |
-
dataset_dir
|
1003 |
-
/ "ADE20K_2021_17_01"
|
1004 |
-
/ "annotations_detectron2"
|
1005 |
-
/ split
|
1006 |
-
/ file_name.replace(".jpg", ".tif")
|
1007 |
-
)
|
1008 |
-
Image.fromarray(img).save(output_img)
|
1009 |
-
|
1010 |
-
assert output.dtype == np.uint16
|
1011 |
-
Image.fromarray(output).save(output_lab)
|
|
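
Note on the deleted `loadAde20K` helper above: ADE20K-full stores the class index in the red and green channels of the `*_seg.png` annotation (R//10 supplies the high byte, G the low byte). A minimal standalone sketch of that decoding on a single made-up pixel value:

```python
import numpy as np

# One synthetic "pixel" of an ADE20K _seg.png annotation (values are illustrative).
R = np.array([[130]], dtype=np.uint8)   # 130 / 10 -> 13, the high part
G = np.array([[7]], dtype=np.uint8)     # the low part

# Same arithmetic as loadAde20K in the deleted script.
class_mask = (R / 10).astype(np.int32) * 256 + G.astype(np.int32)
print(class_mask)  # [[3335]] -> raw ADE20K object class id = 13 * 256 + 7
```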

datasets/prepare_ade20k_sem_seg.py
DELETED
@@ -1,35 +0,0 @@
-# Copyright (c) Facebook, Inc. and its affiliates.
-# Copyright (c) Meta Platforms, Inc. All Rights Reserved
-
-import os
-from pathlib import Path
-
-import numpy as np
-import tqdm
-from PIL import Image
-
-
-def convert(input, output, index=None):
-    img = np.asarray(Image.open(input))
-    assert img.dtype == np.uint8
-    img = img - 1  # 0 (ignore) becomes 255. others are shifted by 1
-    if index is not None:
-        mapping = {i: k for k, i in enumerate(index)}
-        img = np.vectorize(lambda x: mapping[x] if x in mapping else 255)(
-            img.astype(np.float)
-        ).astype(np.uint8)
-    Image.fromarray(img).save(output)
-
-
-if __name__ == "__main__":
-    dataset_dir = (
-        Path(os.getenv("DETECTRON2_DATASETS", "datasets")) / "ADEChallengeData2016"
-    )
-    print('Caution: we only generate the validation set!')
-    for name in ["validation"]:
-        annotation_dir = dataset_dir / "annotations" / name
-        output_dir = dataset_dir / "annotations_detectron2" / name
-        output_dir.mkdir(parents=True, exist_ok=True)
-        for file in tqdm.tqdm(list(annotation_dir.iterdir())):
-            output_file = output_dir / file.name
-            convert(file, output_file)
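
The key trick in the deleted `convert` above is that subtracting 1 from a uint8 label map wraps the ADE20K "ignore" value 0 around to 255 while shifting the real classes to a 0-based range. A quick standalone sketch of that behaviour (not part of the repo):

```python
import numpy as np

labels = np.array([0, 1, 2, 150], dtype=np.uint8)  # 0 = ignore in ADE20K annotations
shifted = labels - 1                                # uint8 arithmetic wraps 0 -> 255
print(shifted)                                      # [255   0   1 149]
```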

datasets/prepare_coco_stuff_sem_seg.py
DELETED
@@ -1,219 +0,0 @@
-# Copyright (c) Facebook, Inc. and its affiliates.
-# Copyright (c) Meta Platforms, Inc. All Rights Reserved
-# Modified by Feng Liang from
-# https://github.com/MendelXu/zsseg.baseline/blob/master/datasets/prepare_coco_stuff_164k_sem_seg.py
-
-import os
-import os.path as osp
-from pathlib import Path
-import tqdm
-from glob import glob
-
-import numpy as np
-from PIL import Image
-
-
-full_clsID_to_trID = {
-    0: 0,
-    1: 1,
-    2: 2,
-    3: 3,
-    4: 4,
-    5: 5,
-    6: 6,
-    7: 7,
-    8: 8,
-    9: 9,
-    10: 10,
-    12: 11,
-    13: 12,
-    14: 13,
-    15: 14,
-    16: 15,
-    17: 16,
-    18: 17,
-    19: 18,
-    20: 19,
-    21: 20,
-    22: 21,
-    23: 22,
-    24: 23,
-    26: 24,
-    27: 25,
-    30: 26,
-    31: 27,
-    32: 28,
-    33: 29,
-    34: 30,
-    35: 31,
-    36: 32,
-    37: 33,
-    38: 34,
-    39: 35,
-    40: 36,
-    41: 37,
-    42: 38,
-    43: 39,
-    45: 40,
-    46: 41,
-    47: 42,
-    48: 43,
-    49: 44,
-    50: 45,
-    51: 46,
-    52: 47,
-    53: 48,
-    54: 49,
-    55: 50,
-    56: 51,
-    57: 52,
-    58: 53,
-    59: 54,
-    60: 55,
-    61: 56,
-    62: 57,
-    63: 58,
-    64: 59,
-    66: 60,
-    69: 61,
-    71: 62,
-    72: 63,
-    73: 64,
-    74: 65,
-    75: 66,
-    76: 67,
-    77: 68,
-    78: 69,
-    79: 70,
-    80: 71,
-    81: 72,
-    83: 73,
-    84: 74,
-    85: 75,
-    86: 76,
-    87: 77,
-    88: 78,
-    89: 79,
-    91: 80,
-    92: 81,
-    93: 82,
-    94: 83,
-    95: 84,
-    96: 85,
-    97: 86,
-    98: 87,
-    99: 88,
-    100: 89,
-    101: 90,
-    102: 91,
-    103: 92,
-    104: 93,
-    105: 94,
-    106: 95,
-    107: 96,
-    108: 97,
-    109: 98,
-    110: 99,
-    111: 100,
-    112: 101,
-    113: 102,
-    114: 103,
-    115: 104,
-    116: 105,
-    117: 106,
-    118: 107,
-    119: 108,
-    120: 109,
-    121: 110,
-    122: 111,
-    123: 112,
-    124: 113,
-    125: 114,
-    126: 115,
-    127: 116,
-    128: 117,
-    129: 118,
-    130: 119,
-    131: 120,
-    132: 121,
-    133: 122,
-    134: 123,
-    135: 124,
-    136: 125,
-    137: 126,
-    138: 127,
-    139: 128,
-    140: 129,
-    141: 130,
-    142: 131,
-    143: 132,
-    144: 133,
-    145: 134,
-    146: 135,
-    147: 136,
-    148: 137,
-    149: 138,
-    150: 139,
-    151: 140,
-    152: 141,
-    153: 142,
-    154: 143,
-    155: 144,
-    156: 145,
-    157: 146,
-    158: 147,
-    159: 148,
-    160: 149,
-    161: 150,
-    162: 151,
-    163: 152,
-    164: 153,
-    165: 154,
-    166: 155,
-    167: 156,
-    168: 157,
-    169: 158,
-    170: 159,
-    171: 160,
-    172: 161,
-    173: 162,
-    174: 163,
-    175: 164,
-    176: 165,
-    177: 166,
-    178: 167,
-    179: 168,
-    180: 169,
-    181: 170,
-    255: 255,
-}
-
-def convert_to_trainID(
-    maskpath, out_mask_dir, is_train, clsID_to_trID=full_clsID_to_trID, suffix=""
-):
-    mask = np.array(Image.open(maskpath))
-    mask_copy = np.ones_like(mask, dtype=np.uint8) * 255
-    for clsID, trID in clsID_to_trID.items():
-        mask_copy[mask == clsID] = trID
-    seg_filename = (
-        osp.join(out_mask_dir, "train2017" + suffix, osp.basename(maskpath))
-        if is_train
-        else osp.join(out_mask_dir, "val2017" + suffix, osp.basename(maskpath))
-    )
-    if len(np.unique(mask_copy)) == 1 and np.unique(mask_copy)[0] == 255:
-        return
-    Image.fromarray(mask_copy).save(seg_filename, "PNG")
-
-
-
-if __name__ == "__main__":
-    dataset_dir = Path(os.getenv("DETECTRON2_DATASETS", "datasets"))
-    print('Caution: we only generate the training set!')
-    coco_path = dataset_dir / "coco"
-    mask_dir = coco_path / "stuffthingmaps"
-    out_mask_dir = coco_path / "stuffthingmaps_detectron2"
-    for name in ["train2017"]:
-        os.makedirs((out_mask_dir / name), exist_ok=True)
-    train_list = glob(osp.join(mask_dir, "train2017", "*.png"))
-    for file in tqdm.tqdm(train_list):
-        convert_to_trainID(file, out_mask_dir, is_train=True)
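
The deleted COCO-Stuff script relabels each raw stuff/thing id to a contiguous train id and skips any mask that ends up entirely "ignore". A tiny self-contained sketch of the same remapping logic (the mini mapping here is an illustrative subset, not the full table above):

```python
import numpy as np

mini_map = {0: 0, 12: 11, 255: 255}                 # illustrative subset of the table
mask = np.array([[0, 11], [12, 255]], dtype=np.uint8)

remapped = np.full_like(mask, 255)                  # default everything to ignore
for cls_id, train_id in mini_map.items():
    remapped[mask == cls_id] = train_id
print(remapped)  # [[  0 255]
                 #  [ 11 255]] -> unmapped id 11 falls back to ignore
```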

datasets/prepare_pascal_context.py
DELETED
@@ -1,69 +0,0 @@
-# Copyright (c) Facebook, Inc. and its affiliates.
-# Copyright (c) Meta Platforms, Inc. All Rights Reserved
-
-import tqdm
-import os
-import os.path as osp
-from pathlib import Path
-
-import numpy as np
-from PIL import Image
-import scipy.io
-
-def convert_pc59(mask_path, new_mask_path, pc59_dict):
-    mat = scipy.io.loadmat(mask_path)
-    mask = mat['LabelMap']
-
-    mask_copy = np.ones_like(mask, dtype=np.uint8) * 255
-    for trID, clsID in pc59_dict.items():
-        mask_copy[mask == clsID] = trID
-
-    min_value = np.amin(mask_copy)
-    assert min_value >= 0, print(min_value)
-    Image.fromarray(mask_copy).save(new_mask_path, "PNG")
-
-def convert_pc459(mask_path, new_mask_path):
-    mat = scipy.io.loadmat(mask_path)
-    mask = mat['LabelMap']
-    mask = mask - 1
-    min_value = np.amin(mask)
-    assert min_value >= 0, print(min_value)
-    Image.fromarray(mask).save(new_mask_path, "TIFF")
-
-
-if __name__ == "__main__":
-    dataset_dir = Path(os.getenv("DETECTRON2_DATASETS", "datasets"))
-    print('Caution: we only generate the validation set!')
-    pc_path = dataset_dir / "VOCdevkit/VOC2010"
-
-    val_list = open(pc_path / "pascalcontext_val.txt", "r")
-    pc459_labels = open(pc_path / "labels.txt", "r")
-    pc59_labels = open(pc_path / "59_labels.txt", "r")
-
-    pc459_dict = {}
-    for line in pc459_labels.readlines():
-        if ':' in line:
-            idx, name = line.split(':')
-            idx = int(idx.strip())
-            name = name.strip()
-            pc459_dict[name] = idx
-
-    pc59_dict = {}
-    for i, line in enumerate(pc59_labels.readlines()):
-        name = line.split(':')[-1].strip()
-        if name is not '':
-            pc59_dict[i] = pc459_dict[name]
-
-    pc459_dir = pc_path / "annotations_detectron2" / "pc459_val"
-    pc459_dir.mkdir(parents=True, exist_ok=True)
-    pc59_dir = pc_path / "annotations_detectron2" / "pc59_val"
-    pc59_dir.mkdir(parents=True, exist_ok=True)
-
-    for line in tqdm.tqdm(val_list.readlines()):
-        fileid = line.strip()
-        ori_mask = f'{pc_path}/trainval/{fileid}.mat'
-        pc459_dst = f'{pc459_dir}/{fileid}.tif'
-        pc59_dst = f'{pc59_dir}/{fileid}.png'
-        if osp.exists(ori_mask):
-            convert_pc459(ori_mask, pc459_dst)
-            convert_pc59(ori_mask, pc59_dst, pc59_dict)
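
The deleted Pascal-Context script builds its PC-59 mapping by looking each PC-59 class name up in the PC-459 name-to-id table parsed from the label files. A minimal sketch of that lookup on made-up label lines (file contents here are illustrative, not the real label lists):

```python
pc459_lines = ["1: aeroplane", "2: bag", "3: bed"]   # "id: name", as in labels.txt
pc59_lines = ["1: aeroplane", "3: bed"]              # the subset kept for PC-59

pc459_dict = {}
for line in pc459_lines:
    idx, name = line.split(":")
    pc459_dict[name.strip()] = int(idx.strip())

# PC-59 train id -> PC-459 class id, mirroring the loop in the deleted script.
pc59_dict = {i: pc459_dict[l.split(":")[-1].strip()] for i, l in enumerate(pc59_lines)}
print(pc59_dict)  # {0: 1, 1: 3}
```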

datasets/prepare_voc_sem_seg.py
DELETED
@@ -1,71 +0,0 @@
-# Copyright (c) Facebook, Inc. and its affiliates.
-# Copyright (c) Meta Platforms, Inc. All Rights Reserved
-# Modified by Feng Liang from https://github.com/MendelXu/zsseg.baseline/blob/master/datasets/prepare_voc_sem_seg.py
-
-import os
-import os.path as osp
-from pathlib import Path
-import tqdm
-
-import numpy as np
-from PIL import Image
-
-
-clsID_to_trID = {
-    0: 255,
-    1: 0,
-    2: 1,
-    3: 2,
-    4: 3,
-    5: 4,
-    6: 5,
-    7: 6,
-    8: 7,
-    9: 8,
-    10: 9,
-    11: 10,
-    12: 11,
-    13: 12,
-    14: 13,
-    15: 14,
-    16: 15,
-    17: 16,
-    18: 17,
-    19: 18,
-    20: 19,
-    255: 255,
-}
-
-def convert_to_trainID(
-    maskpath, out_mask_dir, is_train, clsID_to_trID=clsID_to_trID, suffix=""
-):
-    mask = np.array(Image.open(maskpath))
-    mask_copy = np.ones_like(mask, dtype=np.uint8) * 255
-    for clsID, trID in clsID_to_trID.items():
-        mask_copy[mask == clsID] = trID
-    seg_filename = (
-        osp.join(out_mask_dir, "train" + suffix, osp.basename(maskpath))
-        if is_train
-        else osp.join(out_mask_dir, "val" + suffix, osp.basename(maskpath))
-    )
-    if len(np.unique(mask_copy)) == 1 and np.unique(mask_copy)[0] == 255:
-        return
-    Image.fromarray(mask_copy).save(seg_filename, "PNG")
-
-
-
-if __name__ == "__main__":
-    dataset_dir = Path(os.getenv("DETECTRON2_DATASETS", "datasets"))
-    print('Caution: we only generate the validation set!')
-    voc_path = dataset_dir / "VOCdevkit" / "VOC2012"
-    out_mask_dir = voc_path / "annotations_detectron2"
-    out_image_dir = voc_path / "images_detectron2"
-    for name in ["val"]:
-        os.makedirs((out_mask_dir / name), exist_ok=True)
-        os.makedirs((out_image_dir / name), exist_ok=True)
-        val_list = [
-            osp.join(voc_path, "SegmentationClassAug", f + ".png")
-            for f in np.loadtxt(osp.join(voc_path, "ImageSets/Segmentation/val.txt"), dtype=np.str).tolist()
-        ]
-        for file in tqdm.tqdm(val_list):
-            convert_to_trainID(file, out_mask_dir, is_train=False)
open_vocab_seg/.DS_Store
CHANGED
Binary files a/open_vocab_seg/.DS_Store and b/open_vocab_seg/.DS_Store differ
open_vocab_seg/modeling/.DS_Store
CHANGED
Binary files a/open_vocab_seg/modeling/.DS_Store and b/open_vocab_seg/modeling/.DS_Store differ
open_vocab_seg/modeling/clip_adapter/__init__.py
CHANGED
@@ -21,3 +21,5 @@ def build_text_prompt(cfg):
             "Prompt learner {} is not supported".format(cfg.TEXT_TEMPLATES)
         )
     return text_templates
+
+from .clip import tokenize
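
With this change the adapter package re-exports CLIP's tokenizer, so callers no longer have to reach into the vendored `clip` module directly. A hedged sketch of how downstream code might use it (the import path assumes the repo layout shown in this commit, and the bpe vocab file added below must be present):

```python
from open_vocab_seg.modeling.clip_adapter import tokenize

tokens = tokenize(["a photo of a dog", "a photo of a cat"])
print(tokens.shape)  # torch.Size([2, 77]) -- 77 is CLIP's context length
```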

open_vocab_seg/modeling/clip_adapter/clip/__init__.py
ADDED
@@ -0,0 +1 @@
+from .clip import *

open_vocab_seg/modeling/clip_adapter/clip/bpe_simple_vocab_16e6.txt.gz
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:924691ac288e54409236115652ad4aa250f48203de50a9e4722a6ecd48d6804a
+size 1356917

open_vocab_seg/modeling/clip_adapter/clip/clip.py
ADDED
@@ -0,0 +1,285 @@
+import hashlib
+import os
+import urllib
+import warnings
+from collections import OrderedDict
+from typing import Union, List
+
+import torch
+from PIL import Image
+from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize
+from tqdm import tqdm
+
+from .model import build_model
+from .simple_tokenizer import SimpleTokenizer as _Tokenizer
+
+try:
+    from torchvision.transforms import InterpolationMode
+
+    BICUBIC = InterpolationMode.BICUBIC
+except ImportError:
+    BICUBIC = Image.BICUBIC
+
+
+if torch.__version__.split(".") < ["1", "7", "1"]:
+    warnings.warn("PyTorch version 1.7.1 or higher is recommended")
+
+
+__all__ = ["available_models", "load", "tokenize"]
+_tokenizer = _Tokenizer()
+
+_MODELS = {
+    "RN50": "https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt",
+    "RN101": "https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt",
+    "RN50x4": "https://openaipublic.azureedge.net/clip/models/7e526bd135e493cef0776de27d5f42653e6b4c8bf9e0f653bb11773263205fdd/RN50x4.pt",
+    "RN50x16": "https://openaipublic.azureedge.net/clip/models/52378b407f34354e150460fe41077663dd5b39c54cd0bfd2b27167a4a06ec9aa/RN50x16.pt",
+    "ViT-B/32": "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt",
+    "ViT-B/16": "https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt",
+    "ViT-L/14": "https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt",
+    "ViT-L/14@336px": "https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt",
+}
+
+
+def _download(url: str, root: str = os.path.expanduser("~/.cache/clip")):
+    os.makedirs(root, exist_ok=True)
+    filename = os.path.basename(url)
+
+    expected_sha256 = url.split("/")[-2]
+    download_target = os.path.join(root, filename)
+
+    if os.path.exists(download_target) and not os.path.isfile(download_target):
+        raise RuntimeError(f"{download_target} exists and is not a regular file")
+
+    if os.path.isfile(download_target):
+        if (
+            hashlib.sha256(open(download_target, "rb").read()).hexdigest()
+            == expected_sha256
+        ):
+            return download_target
+        else:
+            warnings.warn(
+                f"{download_target} exists, but the SHA256 checksum does not match; re-downloading the file"
+            )
+
+    with urllib.request.urlopen(url) as source, open(download_target, "wb") as output:
+        with tqdm(
+            total=int(source.info().get("Content-Length")),
+            ncols=80,
+            unit="iB",
+            unit_scale=True,
+        ) as loop:
+            while True:
+                buffer = source.read(8192)
+                if not buffer:
+                    break
+
+                output.write(buffer)
+                loop.update(len(buffer))
+
+    if (
+        hashlib.sha256(open(download_target, "rb").read()).hexdigest()
+        != expected_sha256
+    ):
+        raise RuntimeError(
+            f"Model has been downloaded but the SHA256 checksum does not not match"
+        )
+
+    return download_target
+
+
+def _transform(n_px):
+    return Compose(
+        [
+            Resize(n_px, interpolation=BICUBIC),
+            CenterCrop(n_px),
+            lambda image: image.convert("RGB"),
+            ToTensor(),
+            Normalize(
+                (0.48145466, 0.4578275, 0.40821073),
+                (0.26862954, 0.26130258, 0.27577711),
+            ),
+        ]
+    )
+
+
+def available_models() -> List[str]:
+    """Returns the names of available CLIP models"""
+    return list(_MODELS.keys())
+
+
+def load(
+    name: str,
+    mask_prompt_depth: int = 0,
+    device: Union[str, torch.device] = "cuda" if torch.cuda.is_available() else "cpu",
+    jit=False,
+):
+    """Load a CLIP model
+
+    Parameters
+    ----------
+    name : str
+        A model name listed by `clip.available_models()`, or the path to a model checkpoint containing the state_dict
+
+    device : Union[str, torch.device]
+        The device to put the loaded model
+
+    jit : bool
+        Whether to load the optimized JIT model or more hackable non-JIT model (default).
+
+    Returns
+    -------
+    model : torch.nn.Module
+        The CLIP model
+
+    preprocess : Callable[[PIL.Image], torch.Tensor]
+        A torchvision transform that converts a PIL image into a tensor that the returned model can take as its input
+    """
+    if name in _MODELS:
+        model_path = _download(_MODELS[name])
+    elif os.path.isfile(name):
+        model_path = name
+    else:
+        raise RuntimeError(
+            f"Model {name} not found; available models = {available_models()}"
+        )
+
+    try:
+        # loading JIT archive
+        model = torch.jit.load(model_path, map_location=device if jit else "cpu").eval()
+        state_dict = None
+    except RuntimeError:
+        # loading saved state dict
+        if jit:
+            warnings.warn(
+                f"File {model_path} is not a JIT archive. Loading as a state dict instead"
+            )
+            jit = False
+        state_dict = torch.load(model_path, map_location="cpu")
+        if 'state_dict' in state_dict:
+            new_state_dict = OrderedDict()
+            for k, v in state_dict['state_dict'].items():
+                if k.startswith('module.'):
+                    name = k[7:]  # remove `module.`
+                    new_state_dict[name] = v
+            state_dict = new_state_dict
+
+    if not jit:
+        model = build_model(state_dict or model.state_dict(), mask_prompt_depth).to(device)
+        if str(device) == "cpu":
+            model.float()
+        return model, _transform(model.visual.input_resolution)
+
+    # patch the device names
+    device_holder = torch.jit.trace(
+        lambda: torch.ones([]).to(torch.device(device)), example_inputs=[]
+    )
+    device_node = [
+        n
+        for n in device_holder.graph.findAllNodes("prim::Constant")
+        if "Device" in repr(n)
+    ][-1]
+
+    def patch_device(module):
+        try:
+            graphs = [module.graph] if hasattr(module, "graph") else []
+        except RuntimeError:
+            graphs = []
+
+        if hasattr(module, "forward1"):
+            graphs.append(module.forward1.graph)
+
+        for graph in graphs:
+            for node in graph.findAllNodes("prim::Constant"):
+                if "value" in node.attributeNames() and str(node["value"]).startswith(
+                    "cuda"
+                ):
+                    node.copyAttributes(device_node)
+
+    model.apply(patch_device)
+    patch_device(model.encode_image)
+    patch_device(model.encode_text)
+
+    # patch dtype to float32 on CPU
+    if str(device) == "cpu":
+        float_holder = torch.jit.trace(
+            lambda: torch.ones([]).float(), example_inputs=[]
+        )
+        float_input = list(float_holder.graph.findNode("aten::to").inputs())[1]
+        float_node = float_input.node()
+
+        def patch_float(module):
+            try:
+                graphs = [module.graph] if hasattr(module, "graph") else []
+            except RuntimeError:
+                graphs = []
+
+            if hasattr(module, "forward1"):
+                graphs.append(module.forward1.graph)
+
+            for graph in graphs:
+                for node in graph.findAllNodes("aten::to"):
+                    inputs = list(node.inputs())
+                    for i in [
+                        1,
+                        2,
+                    ]:  # dtype can be the second or third argument to aten::to()
+                        if inputs[i].node()["value"] == 5:
+                            inputs[i].node().copyAttributes(float_node)
+
+        model.apply(patch_float)
+        patch_float(model.encode_image)
+        patch_float(model.encode_text)
+
+        model.float()
+
+    return model, _transform(model.input_resolution.item())
+
+
+def tokenize(
+    texts: Union[str, List[str]],
+    context_length: int = 77,
+    truncate: bool = False,
+    return_length: bool = False,
+) -> torch.LongTensor:
+    """
+    Returns the tokenized representation of given input string(s)
+
+    Parameters
+    ----------
+    texts : Union[str, List[str]]
+        An input string or a list of input strings to tokenize
+
+    context_length : int
+        The context length to use; all CLIP models use 77 as the context length
+
+    truncate: bool
+        Whether to truncate the text in case its encoding is longer than the context length
+
+    Returns
+    -------
+    A two-dimensional tensor containing the resulting tokens, shape = [number of input strings, context_length]
+    """
+    if isinstance(texts, str):
+        texts = [texts]
+
+    sot_token = _tokenizer.encoder["<|startoftext|>"]
+    eot_token = _tokenizer.encoder["<|endoftext|>"]
+    all_tokens = [[sot_token] + _tokenizer.encode(text) + [eot_token] for text in texts]
+    result = torch.zeros(len(all_tokens), context_length, dtype=torch.long)
+    length = []
+    for i, tokens in enumerate(all_tokens):
+        if len(tokens) > context_length:
+            if truncate:
+                tokens = tokens[:context_length]
+                tokens[-1] = eot_token
+                length.append(context_length)
+            else:
+                raise RuntimeError(
+                    f"Input {texts[i]} is too long for context length {context_length}"
+                )
+        else:
+            length.append(len(tokens))
+        result[i, : len(tokens)] = torch.tensor(tokens)
+    if return_length:
+        return result, length
+    return result
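
For context on how this vendored loader is typically driven, here is a hedged usage sketch. The extra `mask_prompt_depth` argument is the modification this copy adds over upstream CLIP; the checkpoint name below is just an example and running it would download the ViT-B/16 weights:

```python
import torch
from PIL import Image
from open_vocab_seg.modeling.clip_adapter.clip import load, tokenize

model, preprocess = load("ViT-B/16", mask_prompt_depth=0, device="cpu")
image = preprocess(Image.new("RGB", (224, 224))).unsqueeze(0)   # dummy blank image
text = tokenize(["a photo of grass", "a photo of sky"])

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)
print(probs.shape)  # torch.Size([1, 2]) -- one row of class probabilities per image
```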
open_vocab_seg/modeling/clip_adapter/clip/model.py
ADDED
@@ -0,0 +1,613 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# Copyright (c) Facebook, Inc. and its affiliates.
|
2 |
+
# Copyright (c) Meta Platforms, Inc. All Rights Reserved
|
3 |
+
# Modified by Feng Liang from https://github.com/openai/CLIP/blob/main/clip/model.py
|
4 |
+
|
5 |
+
from collections import OrderedDict
|
6 |
+
from typing import Tuple, Union
|
7 |
+
|
8 |
+
import numpy as np
|
9 |
+
import torch
|
10 |
+
import torch.nn.functional as F
|
11 |
+
from torch import nn
|
12 |
+
|
13 |
+
|
14 |
+
class Bottleneck(nn.Module):
|
15 |
+
expansion = 4
|
16 |
+
|
17 |
+
def __init__(self, inplanes, planes, stride=1):
|
18 |
+
super().__init__()
|
19 |
+
|
20 |
+
# all conv layers have stride 1. an avgpool is performed after the second convolution when stride > 1
|
21 |
+
self.conv1 = nn.Conv2d(inplanes, planes, 1, bias=False)
|
22 |
+
self.bn1 = nn.BatchNorm2d(planes)
|
23 |
+
|
24 |
+
self.conv2 = nn.Conv2d(planes, planes, 3, padding=1, bias=False)
|
25 |
+
self.bn2 = nn.BatchNorm2d(planes)
|
26 |
+
|
27 |
+
self.avgpool = nn.AvgPool2d(stride) if stride > 1 else nn.Identity()
|
28 |
+
|
29 |
+
self.conv3 = nn.Conv2d(planes, planes * self.expansion, 1, bias=False)
|
30 |
+
self.bn3 = nn.BatchNorm2d(planes * self.expansion)
|
31 |
+
|
32 |
+
self.relu = nn.ReLU(inplace=True)
|
33 |
+
self.downsample = None
|
34 |
+
self.stride = stride
|
35 |
+
|
36 |
+
if stride > 1 or inplanes != planes * Bottleneck.expansion:
|
37 |
+
# downsampling layer is prepended with an avgpool, and the subsequent convolution has stride 1
|
38 |
+
self.downsample = nn.Sequential(
|
39 |
+
OrderedDict(
|
40 |
+
[
|
41 |
+
("-1", nn.AvgPool2d(stride)),
|
42 |
+
(
|
43 |
+
"0",
|
44 |
+
nn.Conv2d(
|
45 |
+
inplanes,
|
46 |
+
planes * self.expansion,
|
47 |
+
1,
|
48 |
+
stride=1,
|
49 |
+
bias=False,
|
50 |
+
),
|
51 |
+
),
|
52 |
+
("1", nn.BatchNorm2d(planes * self.expansion)),
|
53 |
+
]
|
54 |
+
)
|
55 |
+
)
|
56 |
+
|
57 |
+
def forward(self, x: torch.Tensor):
|
58 |
+
identity = x
|
59 |
+
|
60 |
+
out = self.relu(self.bn1(self.conv1(x)))
|
61 |
+
out = self.relu(self.bn2(self.conv2(out)))
|
62 |
+
out = self.avgpool(out)
|
63 |
+
out = self.bn3(self.conv3(out))
|
64 |
+
|
65 |
+
if self.downsample is not None:
|
66 |
+
identity = self.downsample(x)
|
67 |
+
|
68 |
+
out += identity
|
69 |
+
out = self.relu(out)
|
70 |
+
return out
|
71 |
+
|
72 |
+
|
73 |
+
class AttentionPool2d(nn.Module):
|
74 |
+
def __init__(
|
75 |
+
self, spacial_dim: int, embed_dim: int, num_heads: int, output_dim: int = None
|
76 |
+
):
|
77 |
+
super().__init__()
|
78 |
+
self.positional_embedding = nn.Parameter(
|
79 |
+
torch.randn(spacial_dim ** 2 + 1, embed_dim) / embed_dim ** 0.5
|
80 |
+
)
|
81 |
+
self.k_proj = nn.Linear(embed_dim, embed_dim)
|
82 |
+
self.q_proj = nn.Linear(embed_dim, embed_dim)
|
83 |
+
self.v_proj = nn.Linear(embed_dim, embed_dim)
|
84 |
+
self.c_proj = nn.Linear(embed_dim, output_dim or embed_dim)
|
85 |
+
self.num_heads = num_heads
|
86 |
+
self.grid_size = spacial_dim
|
87 |
+
|
88 |
+
def forward(self, x, mask=None, return_cls=True):
|
89 |
+
b, c, gh, gw = x.shape
|
90 |
+
# remove irrelated feature
|
91 |
+
if mask is not None:
|
92 |
+
mask = F.interpolate(mask[:, None, ...], size=(gh, gw)).squeeze(
|
93 |
+
1
|
94 |
+
) # [N,H,W] -> [N,grid,grid]
|
95 |
+
mask = (mask > 0.5).reshape(mask.shape[0], -1)
|
96 |
+
mask = torch.cat([mask, mask.new_ones(mask.shape[0], 1)], dim=1)
|
97 |
+
if x.size()[0] == 1:
|
98 |
+
x = x.expand(mask.shape[0], c, gh, gw)
|
99 |
+
|
100 |
+
x = x.reshape(x.shape[0], c, gh * gw).permute(2, 0, 1) # NCHW -> (HW)NC
|
101 |
+
|
102 |
+
x = torch.cat([x.mean(dim=0, keepdim=True), x], dim=0) # (HW+1)NC
|
103 |
+
positional_embedding = self.positional_embedding
|
104 |
+
if not (self.positional_embedding.shape[0] == x.shape[0]):
|
105 |
+
cls_pos = positional_embedding[0:1, :]
|
106 |
+
per_pos_embedding = (
|
107 |
+
F.interpolate(
|
108 |
+
positional_embedding[1:, :]
|
109 |
+
.permute(1, 0)
|
110 |
+
.view(1, -1, self.grid_size, self.grid_size),
|
111 |
+
size=(gh, gw),
|
112 |
+
mode="bicubic",
|
113 |
+
)
|
114 |
+
.reshape(-1, gh * gw)
|
115 |
+
.permute(1, 0)
|
116 |
+
)
|
117 |
+
positional_embedding = torch.cat([cls_pos, per_pos_embedding])
|
118 |
+
|
119 |
+
x = x + positional_embedding[:, None, :].to(x.dtype) # (HW+1)NC
|
120 |
+
x, _ = F.multi_head_attention_forward(
|
121 |
+
query=x,
|
122 |
+
key=x,
|
123 |
+
value=x,
|
124 |
+
embed_dim_to_check=x.shape[-1],
|
125 |
+
num_heads=self.num_heads,
|
126 |
+
q_proj_weight=self.q_proj.weight,
|
127 |
+
k_proj_weight=self.k_proj.weight,
|
128 |
+
v_proj_weight=self.v_proj.weight,
|
129 |
+
in_proj_weight=None,
|
130 |
+
in_proj_bias=torch.cat(
|
131 |
+
[self.q_proj.bias, self.k_proj.bias, self.v_proj.bias]
|
132 |
+
),
|
133 |
+
bias_k=None,
|
134 |
+
bias_v=None,
|
135 |
+
add_zero_attn=False,
|
136 |
+
dropout_p=0,
|
137 |
+
out_proj_weight=self.c_proj.weight,
|
138 |
+
out_proj_bias=self.c_proj.bias,
|
139 |
+
use_separate_proj_weight=True,
|
140 |
+
training=self.training,
|
141 |
+
need_weights=False,
|
142 |
+
key_padding_mask=mask,
|
143 |
+
)
|
144 |
+
|
145 |
+
if return_cls:
|
146 |
+
return x[0]
|
147 |
+
else:
|
148 |
+
return x
|
149 |
+
|
150 |
+
|
151 |
+
class ModifiedResNet(nn.Module):
|
152 |
+
"""
|
153 |
+
A ResNet class that is similar to torchvision's but contains the following changes:
|
154 |
+
- There are now 3 "stem" convolutions as opposed to 1, with an average pool instead of a max pool.
|
155 |
+
- Performs anti-aliasing strided convolutions, where an avgpool is prepended to convolutions with stride > 1
|
156 |
+
- The final pooling layer is a QKV attention instead of an average pool
|
157 |
+
"""
|
158 |
+
|
159 |
+
def __init__(self, layers, output_dim, heads, input_resolution=224, width=64):
|
160 |
+
super().__init__()
|
161 |
+
self.output_dim = output_dim
|
162 |
+
self.input_resolution = input_resolution
|
163 |
+
|
164 |
+
# the 3-layer stem
|
165 |
+
self.conv1 = nn.Conv2d(
|
166 |
+
3, width // 2, kernel_size=3, stride=2, padding=1, bias=False
|
167 |
+
)
|
168 |
+
self.bn1 = nn.BatchNorm2d(width // 2)
|
169 |
+
self.conv2 = nn.Conv2d(
|
170 |
+
width // 2, width // 2, kernel_size=3, padding=1, bias=False
|
171 |
+
)
|
172 |
+
self.bn2 = nn.BatchNorm2d(width // 2)
|
173 |
+
self.conv3 = nn.Conv2d(width // 2, width, kernel_size=3, padding=1, bias=False)
|
174 |
+
self.bn3 = nn.BatchNorm2d(width)
|
175 |
+
self.avgpool = nn.AvgPool2d(2)
|
176 |
+
self.relu = nn.ReLU(inplace=True)
|
177 |
+
|
178 |
+
# residual layers
|
179 |
+
self._inplanes = width # this is a *mutable* variable used during construction
|
180 |
+
self.layer1 = self._make_layer(width, layers[0])
|
181 |
+
self.layer2 = self._make_layer(width * 2, layers[1], stride=2)
|
182 |
+
self.layer3 = self._make_layer(width * 4, layers[2], stride=2)
|
183 |
+
self.layer4 = self._make_layer(width * 8, layers[3], stride=2)
|
184 |
+
|
185 |
+
embed_dim = width * 32 # the ResNet feature dimension
|
186 |
+
self.attnpool = AttentionPool2d(
|
187 |
+
input_resolution // 32, embed_dim, heads, output_dim
|
188 |
+
)
|
189 |
+
|
190 |
+
def _make_layer(self, planes, blocks, stride=1):
|
191 |
+
layers = [Bottleneck(self._inplanes, planes, stride)]
|
192 |
+
|
193 |
+
self._inplanes = planes * Bottleneck.expansion
|
194 |
+
for _ in range(1, blocks):
|
195 |
+
layers.append(Bottleneck(self._inplanes, planes))
|
196 |
+
|
197 |
+
return nn.Sequential(*layers)
|
198 |
+
|
199 |
+
def forward(self, x, mask: torch.Tensor = None, return_cls=True):
|
200 |
+
def stem(x):
|
201 |
+
for conv, bn in [
|
202 |
+
(self.conv1, self.bn1),
|
203 |
+
(self.conv2, self.bn2),
|
204 |
+
(self.conv3, self.bn3),
|
205 |
+
]:
|
206 |
+
x = self.relu(bn(conv(x)))
|
207 |
+
x = self.avgpool(x)
|
208 |
+
return x
|
209 |
+
|
210 |
+
x = x.type(self.conv1.weight.dtype)
|
211 |
+
x = stem(x) # 1/4,1/4
|
212 |
+
x = self.layer1(x)
|
213 |
+
x = self.layer2(x) # 1/8,1/8
|
214 |
+
x = self.layer3(x) # 1/16,1/16
|
215 |
+
x = self.layer4(x) # 1/32,1/32
|
216 |
+
b, c, gh, gw = x.shape
|
217 |
+
x = self.attnpool(x, mask, return_cls)
|
218 |
+
if not return_cls:
|
219 |
+
return x[1:].permute(1, 0, 2).reshape(b, gh, gw, x.shape[-1]) # N,L,C
|
220 |
+
return x
|
221 |
+
|
222 |
+
|
223 |
+
class LayerNorm(nn.LayerNorm):
|
224 |
+
"""Subclass torch's LayerNorm to handle fp16."""
|
225 |
+
|
226 |
+
def forward(self, x: torch.Tensor):
|
227 |
+
orig_type = x.dtype
|
228 |
+
ret = super().forward(x.type(torch.float32))
|
229 |
+
return ret.type(orig_type)
|
230 |
+
|
231 |
+
|
232 |
+
class QuickGELU(nn.Module):
|
233 |
+
def forward(self, x: torch.Tensor):
|
234 |
+
return x * torch.sigmoid(1.702 * x)
|
235 |
+
|
236 |
+
|
237 |
+
class ResidualAttentionBlock(nn.Module):
|
238 |
+
def __init__(self, d_model: int, n_head: int, attn_mask: torch.Tensor = None):
|
239 |
+
super().__init__()
|
240 |
+
|
241 |
+
self.attn = nn.MultiheadAttention(d_model, n_head)
|
242 |
+
self.ln_1 = LayerNorm(d_model)
|
243 |
+
self.mlp = nn.Sequential(
|
244 |
+
OrderedDict(
|
245 |
+
[
|
246 |
+
("c_fc", nn.Linear(d_model, d_model * 4)),
|
247 |
+
("gelu", QuickGELU()),
|
248 |
+
("c_proj", nn.Linear(d_model * 4, d_model)),
|
249 |
+
]
|
250 |
+
)
|
251 |
+
)
|
252 |
+
self.ln_2 = LayerNorm(d_model)
|
253 |
+
self.attn_mask = attn_mask
|
254 |
+
|
255 |
+
def attention(self, x: torch.Tensor, **kwargs):
|
256 |
+
self.attn_mask = (
|
257 |
+
self.attn_mask.to(dtype=x.dtype, device=x.device)
|
258 |
+
if self.attn_mask is not None
|
259 |
+
else None
|
260 |
+
)
|
261 |
+
return self.attn(
|
262 |
+
x, x, x, need_weights=False, attn_mask=self.attn_mask, **kwargs
|
263 |
+
)[0]
|
264 |
+
|
265 |
+
def forward(self, x: torch.Tensor, **kwargs):
|
266 |
+
x = x + self.attention(self.ln_1(x), **kwargs)
|
267 |
+
x = x + self.mlp(self.ln_2(x))
|
268 |
+
return x
|
269 |
+
|
270 |
+
|
271 |
+
class Transformer(nn.Module):
|
272 |
+
def __init__(
|
273 |
+
self, width: int, layers: int, heads: int, attn_mask: torch.Tensor = None
|
274 |
+
):
|
275 |
+
super().__init__()
|
276 |
+
self.width = width
|
277 |
+
self.layers = layers
|
278 |
+
self.resblocks = nn.Sequential(
|
279 |
+
*[ResidualAttentionBlock(width, heads, attn_mask) for _ in range(layers)]
|
280 |
+
)
|
281 |
+
|
282 |
+
def forward(self, x: torch.Tensor, **kwargs):
|
283 |
+
for block in self.resblocks:
|
284 |
+
x = block(x, **kwargs)
|
285 |
+
return x
|
286 |
+
|
287 |
+
|
288 |
+
class VisionTransformer(nn.Module):
|
289 |
+
def __init__(
|
290 |
+
self,
|
291 |
+
input_resolution: int,
|
292 |
+
patch_size: int,
|
293 |
+
mask_prompt_depth: int,
|
294 |
+
width: int,
|
295 |
+
layers: int,
|
296 |
+
heads: int,
|
297 |
+
output_dim: int,
|
298 |
+
):
|
299 |
+
super().__init__()
|
300 |
+
self.input_resolution = input_resolution
|
301 |
+
self.output_dim = output_dim
|
302 |
+
self.conv1 = nn.Conv2d(
|
303 |
+
in_channels=3,
|
304 |
+
out_channels=width,
|
305 |
+
kernel_size=patch_size,
|
306 |
+
stride=patch_size,
|
307 |
+
bias=False,
|
308 |
+
)
|
309 |
+
|
310 |
+
scale = width ** -0.5
|
311 |
+
self.class_embedding = nn.Parameter(scale * torch.randn(width))
|
312 |
+
self.positional_embedding = nn.Parameter(
|
313 |
+
scale * torch.randn((input_resolution // patch_size) ** 2 + 1, width)
|
314 |
+
)
|
315 |
+
self.grid_size = input_resolution // patch_size
|
316 |
+
self.ln_pre = LayerNorm(width)
|
317 |
+
|
318 |
+
self.transformer = Transformer(width, layers, heads)
|
319 |
+
|
320 |
+
self.ln_post = LayerNorm(width)
|
321 |
+
self.proj = nn.Parameter(scale * torch.randn(width, output_dim))
|
322 |
+
|
323 |
+
self.mask_pool = nn.AvgPool2d(patch_size, stride=patch_size)
|
324 |
+
self.mask_prompt_depth = mask_prompt_depth
|
325 |
+
self.mask_embedding = nn.Parameter(torch.zeros(self.mask_prompt_depth, self.grid_size * self.grid_size, width))
|
326 |
+
|
327 |
+
def forward(self, x: torch.Tensor, m: torch.Tensor = None):
|
328 |
+
x = self.conv1(x) # shape = [*, width, grid, grid]
|
329 |
+
x = x.reshape(x.shape[0], x.shape[1], -1) # shape = [*, width, grid ** 2]
|
330 |
+
x = x.permute(0, 2, 1) # shape = [*, grid ** 2, width]
|
331 |
+
if m is not None:
|
332 |
+
m = self.mask_pool(m.to(torch.float).squeeze()).reshape(m.shape[0], -1).unsqueeze(-1)
|
333 |
+
m = torch.ceil(m)
|
334 |
+
if self.mask_embedding.shape[1] == 1:
|
335 |
+
mask_embedding = self.mask_embedding.to(x.dtype).repeat(1, x.shape[1], 1)
|
336 |
+
else:
|
337 |
+
mask_embedding = self.mask_embedding.to(x.dtype)
|
338 |
+
x = x * m + mask_embedding[0].unsqueeze(0) * (1 - m)
|
339 |
+
|
340 |
+
x = torch.cat([self.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1, x.shape[-1], dtype=x.dtype, device=x.device), x], dim=1) # shape = [*, grid ** 2 + 1, width]
|
341 |
+
x = x + self.positional_embedding.to(x.dtype)
|
342 |
+
x = self.ln_pre(x)
|
343 |
+
|
344 |
+
x = x.permute(1, 0, 2) # NLD -> LND
|
345 |
+
if m is not None:
|
346 |
+
for i, blk in enumerate(self.transformer.resblocks):
|
347 |
+
d = i + 1
|
348 |
+
x = blk(x)
|
349 |
+
if d < self.mask_prompt_depth:
|
350 |
+
masked_x = x[1:, :, :] * m.permute(1, 0, 2) + \
|
351 |
+
mask_embedding[d].unsqueeze(0).permute(1, 0, 2) * (1 - m.permute(1, 0, 2))
|
352 |
+
x = torch.cat([x[:1, :, :], masked_x], dim=0)
|
353 |
+
else:
|
354 |
+
x = self.transformer(x)
|
355 |
+
x = x.permute(1, 0, 2) # LND -> NLD
|
356 |
+
|
357 |
+
x = self.ln_post(x[:, 0, :])
|
358 |
+
|
359 |
+
if self.proj is not None:
|
360 |
+
x = x @ self.proj
|
361 |
+
|
362 |
+
return x
|
363 |
+
|
364 |
+
|
365 |
+
|
366 |
+
class CLIP(nn.Module):
|
367 |
+
def __init__(
|
368 |
+
self,
|
369 |
+
embed_dim: int,
|
370 |
+
# vision
|
371 |
+
image_resolution: int,
|
372 |
+
vision_layers: Union[Tuple[int, int, int, int], int],
|
373 |
+
vision_width: int,
|
374 |
+
vision_patch_size: int,
|
375 |
+
mask_prompt_depth: int,
|
376 |
+
# text
|
377 |
+
context_length: int,
|
378 |
+
vocab_size: int,
|
379 |
+
transformer_width: int,
|
380 |
+
transformer_heads: int,
|
381 |
+
transformer_layers: int,
|
382 |
+
):
|
383 |
+
super().__init__()
|
384 |
+
|
385 |
+
self.context_length = context_length
|
386 |
+
|
387 |
+
if isinstance(vision_layers, (tuple, list)):
|
388 |
+
vision_heads = vision_width * 32 // 64
|
389 |
+
self.visual = ModifiedResNet(
|
390 |
+
layers=vision_layers,
|
391 |
+
output_dim=embed_dim,
|
392 |
+
heads=vision_heads,
|
393 |
+
input_resolution=image_resolution,
|
394 |
+
width=vision_width,
|
395 |
+
)
|
396 |
+
else:
|
397 |
+
vision_heads = vision_width // 64
|
398 |
+
self.visual = VisionTransformer(
|
399 |
+
input_resolution=image_resolution,
|
400 |
+
patch_size=vision_patch_size,
|
401 |
+
mask_prompt_depth=mask_prompt_depth,
|
402 |
+
width=vision_width,
|
403 |
+
layers=vision_layers,
|
404 |
+
heads=vision_heads,
|
405 |
+
output_dim=embed_dim,
|
406 |
+
)
|
407 |
+
|
408 |
+
self.transformer = Transformer(
|
409 |
+
width=transformer_width,
|
410 |
+
layers=transformer_layers,
|
411 |
+
heads=transformer_heads,
|
412 |
+
attn_mask=self.build_attention_mask(),
|
413 |
+
)
|
414 |
+
|
415 |
+
self.vocab_size = vocab_size
|
416 |
+
self.token_embedding = nn.Embedding(vocab_size, transformer_width)
|
417 |
+
self.positional_embedding = nn.Parameter(
|
418 |
+
torch.empty(self.context_length, transformer_width)
|
419 |
+
)
|
420 |
+
self.ln_final = LayerNorm(transformer_width)
|
421 |
+
|
422 |
+
self.text_projection = nn.Parameter(torch.empty(transformer_width, embed_dim))
|
423 |
+
self.logit_scale = nn.Parameter(torch.ones([]) * np.log(1 / 0.07))
|
424 |
+
|
425 |
+
self.initialize_parameters()
|
426 |
+
|
427 |
+
def initialize_parameters(self):
|
428 |
+
nn.init.normal_(self.token_embedding.weight, std=0.02)
|
429 |
+
nn.init.normal_(self.positional_embedding, std=0.01)
|
430 |
+
|
431 |
+
if isinstance(self.visual, ModifiedResNet):
|
432 |
+
if self.visual.attnpool is not None:
|
433 |
+
std = self.visual.attnpool.c_proj.in_features ** -0.5
|
434 |
+
nn.init.normal_(self.visual.attnpool.q_proj.weight, std=std)
|
435 |
+
nn.init.normal_(self.visual.attnpool.k_proj.weight, std=std)
|
436 |
+
nn.init.normal_(self.visual.attnpool.v_proj.weight, std=std)
|
437 |
+
nn.init.normal_(self.visual.attnpool.c_proj.weight, std=std)
|
438 |
+
|
439 |
+
for resnet_block in [
|
440 |
+
self.visual.layer1,
|
441 |
+
self.visual.layer2,
|
442 |
+
self.visual.layer3,
|
443 |
+
self.visual.layer4,
|
444 |
+
]:
|
445 |
+
for name, param in resnet_block.named_parameters():
|
446 |
+
if name.endswith("bn3.weight"):
|
447 |
+
nn.init.zeros_(param)
|
448 |
+
|
449 |
+
proj_std = (self.transformer.width ** -0.5) * (
|
450 |
+
(2 * self.transformer.layers) ** -0.5
|
451 |
+
)
|
452 |
+
attn_std = self.transformer.width ** -0.5
|
453 |
+
fc_std = (2 * self.transformer.width) ** -0.5
|
454 |
+
for block in self.transformer.resblocks:
|
455 |
+
nn.init.normal_(block.attn.in_proj_weight, std=attn_std)
|
456 |
+
nn.init.normal_(block.attn.out_proj.weight, std=proj_std)
|
457 |
+
nn.init.normal_(block.mlp.c_fc.weight, std=fc_std)
|
458 |
+
nn.init.normal_(block.mlp.c_proj.weight, std=proj_std)
|
459 |
+
|
460 |
+
if self.text_projection is not None:
|
461 |
+
nn.init.normal_(self.text_projection, std=self.transformer.width ** -0.5)
|
462 |
+
|
463 |
+
def build_attention_mask(self):
|
464 |
+
# lazily create causal attention mask, with full attention between the vision tokens
|
465 |
+
# pytorch uses additive attention mask; fill with -inf
|
466 |
+
mask = torch.empty(self.context_length, self.context_length)
|
467 |
+
mask.fill_(float("-inf"))
|
468 |
+
mask.triu_(1) # zero out the lower diagonal
|
469 |
+
return mask
|
470 |
+
|
471 |
+
@property
|
472 |
+
def dtype(self):
|
473 |
+
return self.visual.conv1.weight.dtype
|
474 |
+
|
475 |
+
def encode_image(self, image, **kwargs):
|
476 |
+
return self.visual(image.type(self.dtype), **kwargs)
|
477 |
+
|
478 |
+
def encode_text(self, text):
|
479 |
+
x = self.token_embedding(text).type(self.dtype) # [batch_size, n_ctx, d_model]
|
480 |
+
|
481 |
+
x = x + self.positional_embedding.type(self.dtype)
|
482 |
+
x = x.permute(1, 0, 2) # NLD -> LND
|
483 |
+
x = self.transformer(x)
|
484 |
+
x = x.permute(1, 0, 2) # LND -> NLD
|
485 |
+
x = self.ln_final(x).type(self.dtype)
|
486 |
+
|
487 |
+
# x.shape = [batch_size, n_ctx, transformer.width]
|
488 |
+
# take features from the eot embedding (eot_token is the highest number in each sequence)
|
489 |
+
x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection
|
490 |
+
|
491 |
+
return x
|
492 |
+
|
493 |
+
def forward(self, image, text):
|
494 |
+
image_features = self.encode_image(image)
|
495 |
+
text_features = self.encode_text(text)
|
        # normalized features
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)

        # cosine similarity as logits
        logit_scale = self.logit_scale.exp()
        logits_per_image = logit_scale * image_features @ text_features.t()
        logits_per_text = logit_scale * text_features @ image_features.t()

        # shape = [global_batch_size, global_batch_size]
        return logits_per_image, logits_per_text


def convert_weights(model: nn.Module):
    """Convert applicable model parameters to fp16"""

    def _convert_weights_to_fp16(l):
        if isinstance(l, (nn.Conv1d, nn.Conv2d, nn.Linear)):
            l.weight.data = l.weight.data.half()
            if l.bias is not None:
                l.bias.data = l.bias.data.half()

        if isinstance(l, nn.MultiheadAttention):
            for attr in [
                *[f"{s}_proj_weight" for s in ["in", "q", "k", "v"]],
                "in_proj_bias",
                "bias_k",
                "bias_v",
            ]:
                tensor = getattr(l, attr)
                if tensor is not None:
                    tensor.data = tensor.data.half()

        for name in ["text_projection", "proj"]:
            if hasattr(l, name):
                attr = getattr(l, name)
                if attr is not None:
                    attr.data = attr.data.half()

    model.apply(_convert_weights_to_fp16)


def build_model(state_dict: dict, mask_prompt_depth: int = 0):
    vit = "visual.proj" in state_dict

    if vit:
        vision_width = state_dict["visual.conv1.weight"].shape[0]
        vision_layers = len(
            [
                k
                for k in state_dict.keys()
                if k.startswith("visual.") and k.endswith(".attn.in_proj_weight")
            ]
        )
        vision_patch_size = state_dict["visual.conv1.weight"].shape[-1]
        grid_size = round(
            (state_dict["visual.positional_embedding"].shape[0] - 1) ** 0.5
        )
        image_resolution = vision_patch_size * grid_size
    else:
        assert mask_prompt_depth == 0, 'ResNets do not support mask prompt tuning'
        counts: list = [
            len(
                set(
                    k.split(".")[2]
                    for k in state_dict
                    if k.startswith(f"visual.layer{b}")
                )
            )
            for b in [1, 2, 3, 4]
        ]
        vision_layers = tuple(counts)
        vision_width = state_dict["visual.layer1.0.conv1.weight"].shape[0]
        output_width = round(
            (state_dict["visual.attnpool.positional_embedding"].shape[0] - 1) ** 0.5
        )
        vision_patch_size = None
        assert (
            output_width ** 2 + 1
            == state_dict["visual.attnpool.positional_embedding"].shape[0]
        )
        image_resolution = output_width * 32

    embed_dim = state_dict["text_projection"].shape[1]
    context_length = state_dict["positional_embedding"].shape[0]
    vocab_size = state_dict["token_embedding.weight"].shape[0]
    transformer_width = state_dict["ln_final.weight"].shape[0]
    transformer_heads = transformer_width // 64
    transformer_layers = len(
        set(
            k.split(".")[2]
            for k in state_dict
            if k.startswith(f"transformer.resblocks")
        )
    )

    model = CLIP(
        embed_dim,
        image_resolution,
        vision_layers,
        vision_width,
        vision_patch_size,
        mask_prompt_depth,
        context_length,
        vocab_size,
        transformer_width,
        transformer_heads,
        transformer_layers,
    )

    for key in ["input_resolution", "context_length", "vocab_size"]:
        if key in state_dict:
            del state_dict[key]

    convert_weights(model)
    model.load_state_dict(state_dict, strict=False)
    return model.eval()
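As an aside (not part of the diff): a minimal sketch of how build_model is usually driven. The checkpoint filename and mask_prompt_depth value are assumptions; OpenAI CLIP checkpoints ship as TorchScript archives, hence torch.jit.load.

    import torch

    # Hypothetical local checkpoint; build_model infers the architecture from the state-dict keys.
    jit_archive = torch.jit.load("ViT-L-14.pt", map_location="cpu")
    clip_model = build_model(jit_archive.state_dict(), mask_prompt_depth=3)  # fp16 weights, eval mode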
open_vocab_seg/modeling/clip_adapter/clip/simple_tokenizer.py
ADDED
@@ -0,0 +1,150 @@
import gzip
import html
import os
from functools import lru_cache

import ftfy
import regex as re


@lru_cache()
def default_bpe():
    return os.path.join(
        os.path.dirname(os.path.abspath(__file__)), "bpe_simple_vocab_16e6.txt.gz"
    )


@lru_cache()
def bytes_to_unicode():
    """
    Returns list of utf-8 byte and a corresponding list of unicode strings.
    The reversible bpe codes work on unicode strings.
    This means you need a large # of unicode characters in your vocab if you want to avoid UNKs.
    When you're at something like a 10B token dataset you end up needing around 5K for decent coverage.
    This is a signficant percentage of your normal, say, 32K bpe vocab.
    To avoid that, we want lookup tables between utf-8 bytes and unicode strings.
    And avoids mapping to whitespace/control characters the bpe code barfs on.
    """
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(2 ** 8):
        if b not in bs:
            bs.append(b)
            cs.append(2 ** 8 + n)
            n += 1
    cs = [chr(n) for n in cs]
    return dict(zip(bs, cs))


def get_pairs(word):
    """Return set of symbol pairs in a word.
    Word is represented as tuple of symbols (symbols being variable-length strings).
    """
    pairs = set()
    prev_char = word[0]
    for char in word[1:]:
        pairs.add((prev_char, char))
        prev_char = char
    return pairs


def basic_clean(text):
    text = ftfy.fix_text(text)
    text = html.unescape(html.unescape(text))
    return text.strip()


def whitespace_clean(text):
    text = re.sub(r"\s+", " ", text)
    text = text.strip()
    return text


class SimpleTokenizer(object):
    def __init__(self, bpe_path: str = default_bpe()):
        self.byte_encoder = bytes_to_unicode()
        self.byte_decoder = {v: k for k, v in self.byte_encoder.items()}
        merges = gzip.open(bpe_path).read().decode("utf-8").split("\n")
        merges = merges[1 : 49152 - 256 - 2 + 1]
        merges = [tuple(merge.split()) for merge in merges]
        vocab = list(bytes_to_unicode().values())
        vocab = vocab + [v + "</w>" for v in vocab]
        for merge in merges:
            vocab.append("".join(merge))
        vocab.extend(["<|startoftext|>", "<|endoftext|>"])
        self.encoder = dict(zip(vocab, range(len(vocab))))
        self.decoder = {v: k for k, v in self.encoder.items()}
        self.bpe_ranks = dict(zip(merges, range(len(merges))))
        self.cache = {
            "<|startoftext|>": "<|startoftext|>",
            "<|endoftext|>": "<|endoftext|>",
        }
        self.pat = re.compile(
            r"""<\|startoftext\|>|<\|endoftext\|>|'s|'t|'re|'ve|'m|'ll|'d|[\p{L}]+|[\p{N}]|[^\s\p{L}\p{N}]+""",
            re.IGNORECASE,
        )

    def bpe(self, token):
        if token in self.cache:
            return self.cache[token]
        word = tuple(token[:-1]) + (token[-1] + "</w>",)
        pairs = get_pairs(word)

        if not pairs:
            return token + "</w>"

        while True:
            bigram = min(pairs, key=lambda pair: self.bpe_ranks.get(pair, float("inf")))
            if bigram not in self.bpe_ranks:
                break
            first, second = bigram
            new_word = []
            i = 0
            while i < len(word):
                try:
                    j = word.index(first, i)
                    new_word.extend(word[i:j])
                    i = j
                except:
                    new_word.extend(word[i:])
                    break

                if word[i] == first and i < len(word) - 1 and word[i + 1] == second:
                    new_word.append(first + second)
                    i += 2
                else:
                    new_word.append(word[i])
                    i += 1
            new_word = tuple(new_word)
            word = new_word
            if len(word) == 1:
                break
            else:
                pairs = get_pairs(word)
        word = " ".join(word)
        self.cache[token] = word
        return word

    def encode(self, text):
        bpe_tokens = []
        text = whitespace_clean(basic_clean(text)).lower()
        for token in re.findall(self.pat, text):
            token = "".join(self.byte_encoder[b] for b in token.encode("utf-8"))
            bpe_tokens.extend(
                self.encoder[bpe_token] for bpe_token in self.bpe(token).split(" ")
            )
        return bpe_tokens

    def decode(self, tokens):
        text = "".join([self.decoder[token] for token in tokens])
        text = (
            bytearray([self.byte_decoder[c] for c in text])
            .decode("utf-8", errors="replace")
            .replace("</w>", " ")
        )
        return text
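As an aside: a small usage sketch of the vendored tokenizer (the input string is arbitrary).

    tokenizer = SimpleTokenizer()
    ids = tokenizer.encode("a photo of a ukulele")   # BPE token ids (no start/end tokens added here)
    text = tokenizer.decode(ids)                     # round-trips to "a photo of a ukulele " (trailing space from "</w>")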
open_vocab_seg/modeling/clip_adapter/text_template.py
CHANGED
@@ -6,7 +6,8 @@
 
 from typing import List
 
-import clip
+# import clip
+from .clip import tokenize
 import torch
 from torch import nn
 

@@ -130,7 +131,7 @@ class PredefinedPromptExtractor(PromptExtractor):
     def forward(self, noun_list: List[str], clip_model: nn.Module):
         text_features_bucket = []
         for template in self.templates:
-            noun_tokens = [clip.tokenize(template.format(noun)) for noun in noun_list]
+            noun_tokens = [tokenize(template.format(noun)) for noun in noun_list]
             text_inputs = torch.cat(noun_tokens).to(
                 clip_model.text_projection.data.device
             )
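In effect, prompt tokenization now goes through the vendored clip package rather than the pip-installed one. A rough sketch of what the new line computes; the template string and the absolute import path are assumptions based on the repo layout:

    from open_vocab_seg.modeling.clip_adapter.clip import tokenize

    template = "a photo of a {}."
    noun_tokens = [tokenize(template.format(noun)) for noun in ["cat", "ukulele"]]
    # each entry is a (1, 77) LongTensor of BPE ids, ready to be stacked with torch.cat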
open_vocab_seg/modeling/clip_adapter/utils.py
CHANGED
@@ -4,7 +4,7 @@
 from typing import Tuple
 import numpy as np
 import torch
-import clip
+from .clip import load as clip_load
 from detectron2.utils.comm import get_local_rank, synchronize
 

@@ -70,10 +70,10 @@ def build_clip_model(model: str, mask_prompt_depth: int = 0, frozen: bool = True
     rank = get_local_rank()
     if rank == 0:
         # download on rank 0 only
-        model, _ =
+        model, _ = clip_load(model, mask_prompt_depth=mask_prompt_depth, device="cpu")
     synchronize()
     if rank != 0:
-        model, _ =
+        model, _ = clip_load(model, mask_prompt_depth=mask_prompt_depth, device="cpu")
     synchronize()
     if frozen:
         for param in model.parameters():
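The call-site effect, sketched with the signature shown in the hunk header. The model name, the mask_prompt_depth value, and the assumption that the frozen model is what gets returned are all illustrative, not taken from this diff:

    # Runs once per rank; rank 0 downloads first, the other ranks then load from cache.
    clip_model = build_clip_model("ViT-L/14", mask_prompt_depth=3, frozen=True)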
configs/ovseg_swinB_vitL_demo.yaml → ovseg_swinB_vitL_demo.yaml
RENAMED
@@ -12,7 +12,7 @@ MODEL:
     DROP_PATH_RATE: 0.3
     PATCH_NORM: True
     PRETRAIN_IMG_SIZE: 384
-  WEIGHTS: "
+  WEIGHTS: "./ovseg_swinbase_vitL14_ft_mpt.pth"
   PIXEL_MEAN: [123.675, 116.280, 103.530]
   PIXEL_STD: [58.395, 57.120, 57.375]
   SEM_SEG_HEAD:
requirements.txt
CHANGED
@@ -7,8 +7,14 @@ wandb
 fire
 opencv-python
 pandas
-
-
+ftfy
+regex
+tqdm
+gdown
+# Torch
+--find-links https://download.pytorch.org/whl/cu113/torch_stable.html
+torch==1.10.1+cu113
+torchvision==0.11.2+cu113
 
 # Detectron
 --find-links https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
resources/demo_samples/sample_01.jpeg
ADDED
Git LFS Details
resources/demo_samples/sample_02.jpeg
ADDED
Git LFS Details
tools/convert-pretrained-clip-model-to-d2.py
DELETED
@@ -1,69 +0,0 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# Copyright (c) Meta Platforms, Inc. All Rights Reserved

import pickle as pkl
import sys

import torch

"""
Usage:
  # download pretrained swin model:
  wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth
  # run the conversion
  ./convert-pretrained-model-to-d2.py swin_tiny_patch4_window7_224.pth swin_tiny_patch4_window7_224.pkl
  # Then, use swin_tiny_patch4_window7_224.pkl with the following changes in config:
MODEL:
  WEIGHTS: "/path/to/swin_tiny_patch4_window7_224.pkl"
INPUT:
  FORMAT: "RGB"
"""


def transform(path):
    model = torch.load(path, map_location="cpu")
    print(f"loading {path}......")
    state_dict = model["model"]
    state_dict = {
        k.replace("visual_model.", ""): v
        for k, v in state_dict.items()
        if k.startswith("visual_model")
    }
    source_keys = [k for k in state_dict.keys() if "relative_coords" in k]
    for k in source_keys:
        state_dict[
            k.replace("relative_coords", "relative_position_index")
        ] = state_dict[k]
        del state_dict[k]

    source_keys = [k for k in state_dict.keys() if "atten_mask_matrix" in k]
    for k in source_keys:
        state_dict[k.replace("atten_mask_matrix", "attn_mask")] = state_dict[k]
        del state_dict[k]

    source_keys = [k for k in state_dict.keys() if "rel_pos_embed_table" in k]
    for k in source_keys:
        state_dict[
            k.replace("rel_pos_embed_table", "relative_position_bias_table")
        ] = state_dict[k]
        del state_dict[k]

    source_keys = [k for k in state_dict.keys() if "channel_reduction" in k]
    for k in source_keys:
        state_dict[k.replace("channel_reduction", "reduction")] = state_dict[k]
        del state_dict[k]
    return {
        k if k.startswith("backbone.") else "backbone." + k: v
        for k, v in state_dict.items()
    }


if __name__ == "__main__":
    input = sys.argv[1]
    res = {
        "model": transform(input),
        "__author__": "third_party",
        "matching_heuristics": True,
    }
    with open(sys.argv[2], "wb") as f:
        pkl.dump(res, f)
tools/convert-pretrained-swin-model-to-d2.py
DELETED
@@ -1,30 +0,0 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# Copyright (c) Meta Platforms, Inc. All Rights Reserved

import pickle as pkl
import sys

import torch

"""
Usage:
  # download pretrained swin model:
  wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth
  # run the conversion
  ./convert-pretrained-model-to-d2.py swin_tiny_patch4_window7_224.pth swin_tiny_patch4_window7_224.pkl
  # Then, use swin_tiny_patch4_window7_224.pkl with the following changes in config:
MODEL:
  WEIGHTS: "/path/to/swin_tiny_patch4_window7_224.pkl"
INPUT:
  FORMAT: "RGB"
"""

if __name__ == "__main__":
    input = sys.argv[1]

    obj = torch.load(input, map_location="cpu")["model"]

    res = {"model": obj, "__author__": "third_party", "matching_heuristics": True}

    with open(sys.argv[2], "wb") as f:
        pkl.dump(res, f)
tools/convert-torchvision-to-d2.py
DELETED
@@ -1,54 +0,0 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# Copyright (c) Meta Platforms, Inc. All Rights Reserved

import pickle as pkl
import sys

import torch

"""
Usage:
  # download one of the ResNet{18,34,50,101,152} models from torchvision:
  wget https://download.pytorch.org/models/resnet50-19c8e357.pth -O r50.pth
  # run the conversion
  ./convert-torchvision-to-d2.py r50.pth r50.pkl
  # Then, use r50.pkl with the following changes in config:
MODEL:
  WEIGHTS: "/path/to/r50.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  RESNETS:
    DEPTH: 50
    STRIDE_IN_1X1: False
INPUT:
  FORMAT: "RGB"
These models typically produce slightly worse results than the
pre-trained ResNets we use in official configs, which are the
original ResNet models released by MSRA.
"""

if __name__ == "__main__":
    input = sys.argv[1]

    obj = torch.load(input, map_location="cpu")

    newmodel = {}
    for k in list(obj.keys()):
        old_k = k
        if "layer" not in k:
            k = "stem." + k
        for t in [1, 2, 3, 4]:
            k = k.replace("layer{}".format(t), "res{}".format(t + 1))
        for t in [1, 2, 3]:
            k = k.replace("bn{}".format(t), "conv{}.norm".format(t))
        k = k.replace("downsample.0", "shortcut")
        k = k.replace("downsample.1", "shortcut.norm")
        print(old_k, "->", k)
        newmodel[k] = obj.pop(old_k).detach().numpy()

    res = {"model": newmodel, "__author__": "torchvision", "matching_heuristics": True}

    with open(sys.argv[2], "wb") as f:
        pkl.dump(res, f)
    if obj:
        print("Unconverted keys:", obj.keys())
tools/ovseg_replace_clip.py
DELETED
@@ -1,30 +0,0 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# Copyright (c) Meta Platforms, Inc. All Rights Reserved

import torch
from collections import OrderedDict


# PATH to new clip model
clip_ckpt = torch.load('xx/open_clip/src/logs/2022_xx/checkpoints/epoch_x.pt')

new_model = OrderedDict()
state_dict = clip_ckpt['state_dict']

for k, v in state_dict.items():
    new_key = k.replace('module.', '')
    new_model[new_key] = v

# PATH to trained ovseg model
ovseg_model = torch.load('xx/ovseg/output/model_final.pth', 'cpu')

for k, v in new_model.items():
    new_k = 'clip_adapter.clip_model.' + k
    if new_k in ovseg_model['model'].keys():
        ovseg_model['model'][new_k] = v
    else:
        print(f'{new_k} does not exist in ckpt')

# ovseg_model['model']['clip_adapter.clip_model.visual.mask_embedding'] = new_model['visual.mask_embedding']

torch.save(ovseg_model, 'xx/ovseg/output/ovseg_ft_mpt.pth')
tools/search_thr_ensemble_w.sh
DELETED
@@ -1,11 +0,0 @@
for MASK_THR in 0.35 0.4 0.45
do
for ENSEMBLE_WEIGHT in 0.6 0.65 0.7 0.75 0.8
do
python train_net.py --num-gpu 8 --eval-only --config-file configs/ovseg_swinB_vitL_bs32_120k.yaml \
 MODEL.WEIGHTS #PATH_of_ovseg_swinbase_vitL14_ft_mpt.pth DATASETS.TEST \(\"ade20k_sem_seg_val\"\) \
 MODEL.CLIP_ADAPTER.CLIP_ENSEMBLE_WEIGHT $ENSEMBLE_WEIGHT MODEL.CLIP_ADAPTER.MASK_THR $MASK_THR
done
done
tools/web_demo.py
DELETED
@@ -1,76 +0,0 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# Copyright (c) Meta Platforms, Inc. All Rights Reserved

import multiprocessing as mp

import numpy as np
from PIL import Image

from detectron2.config import get_cfg

from detectron2.projects.deeplab import add_deeplab_config
from detectron2.data.detection_utils import read_image
from open_vocab_seg import add_ovseg_config
from open_vocab_seg.utils import VisualizationDemo

import gradio as gr


def setup_cfg(config_file):
    # load config from file and command-line arguments
    cfg = get_cfg()
    add_deeplab_config(cfg)
    add_ovseg_config(cfg)
    cfg.merge_from_file(config_file)
    cfg.freeze()
    return cfg


def inference(class_names, input_img):
    mp.set_start_method("spawn", force=True)
    config_file = './configs/ovseg_swinB_vitL_demo.yaml'
    cfg = setup_cfg(config_file)

    demo = VisualizationDemo(cfg)

    class_names = class_names.split(',')
    img = read_image(input_img, format="BGR")
    _, visualized_output = demo.run_on_image(img, class_names)

    return Image.fromarray(np.uint8(visualized_output.get_image())).convert('RGB')

# demo = gr.Interface(fn=greet, inputs="text", outputs="text")
# demo.launch()


examples = [['Oculus, Ukulele', './resources/demo_samples/sample_03.jpeg'],]
output_labels = ['segmentation map']

title = 'OVSeg'

description = """
Gradio Demo for Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP \n
You may click one of the examples or upload your own image. \n
OVSeg can perform open-vocabulary segmentation; you may input more classes (separated by commas).
"""

article = """
<p style='text-align: center'>
<a href='https://arxiv.org/abs/2210.04150' target='_blank'>
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
</a>
|
<a href='https://github.com' target='_blank'>Github Repo</a></p>
"""

gr.Interface(
    inference,
    inputs=[
        gr.inputs.Textbox(
            lines=1, placeholder=None, default='', label='class names'),
        gr.inputs.Image(type='filepath')
    ],
    outputs=gr.outputs.Image(label='segmentation map'),
    title=title,
    description=description,
    article=article,
    examples=examples).launch(enable_queue=True)
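The removed script uses the Gradio 2.x-style gr.inputs/gr.outputs namespaces. The same Textbox-plus-Image wiring, sketched with the newer component API (a rough equivalent written for this note, not code taken from this repo):

    import gradio as gr

    gr.Interface(
        fn=inference,
        inputs=[gr.Textbox(lines=1, label="class names"), gr.Image(type="filepath")],
        outputs=gr.Image(label="segmentation map"),
        title=title, description=description, article=article, examples=examples,
    ).launch()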
train_net.py
DELETED
@@ -1,309 +0,0 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# Copyright (c) Meta Platforms, Inc. All Rights Reserved
# Modified by Feng Liang from https://github.com/MendelXu/zsseg.baseline/blob/master/train_net.py

"""
OVSeg Training Script.

This script is a simplified version of the training script in detectron2/tools.
"""
import copy
import itertools
import logging
import os
from collections import OrderedDict
from typing import Any, Dict, List, Set

import detectron2.utils.comm as comm
import torch
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.engine import (
    DefaultTrainer,
    default_argument_parser,
    default_setup,
    launch,
)
from detectron2.evaluation import (
    DatasetEvaluator,
    CityscapesSemSegEvaluator,
    COCOEvaluator,
    DatasetEvaluators,
    verify_results,
)
from detectron2.projects.deeplab import add_deeplab_config, build_lr_scheduler
from detectron2.solver.build import maybe_add_gradient_clipping
from detectron2.utils.logger import setup_logger
from detectron2.utils.events import CommonMetricPrinter, JSONWriter

# OVSeg
from open_vocab_seg import SemanticSegmentorWithTTA, add_ovseg_config
from open_vocab_seg.data import (
    MaskFormerSemanticDatasetMapper,
)

from open_vocab_seg.data import (
    build_detection_test_loader,
    build_detection_train_loader,
)
from open_vocab_seg.evaluation import (
    GeneralizedSemSegEvaluator,
)
from open_vocab_seg.utils.events import WandbWriter, setup_wandb
from open_vocab_seg.utils.post_process_utils import dense_crf_post_process


class Trainer(DefaultTrainer):
    """
    Extension of the Trainer class adapted to DETR.
    """

    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        """
        Create evaluator(s) for a given dataset.
        This uses the special metadata "evaluator_type" associated with each
        builtin dataset. For your own dataset, you can simply create an
        evaluator manually in your script and do not have to worry about the
        hacky if-else logic here.
        """
        if output_folder is None:
            output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
        evaluator_list = []
        evaluator_type = MetadataCatalog.get(dataset_name).evaluator_type
        if evaluator_type in ["sem_seg"]:
            evaluator = GeneralizedSemSegEvaluator
            evaluator_list.append(
                evaluator(
                    dataset_name,
                    distributed=True,
                    output_dir=output_folder,
                    post_process_func=dense_crf_post_process
                    if cfg.TEST.DENSE_CRF
                    else None,
                )
            )

        if len(evaluator_list) == 0:
            raise NotImplementedError(
                "no Evaluator for the dataset {} with the type {}".format(
                    dataset_name, evaluator_type
                )
            )
        elif len(evaluator_list) == 1:
            return evaluator_list[0]
        return DatasetEvaluators(evaluator_list)

    @classmethod
    def build_train_loader(cls, cfg):
        dataset = None
        # Semantic segmentation dataset mapper
        if cfg.INPUT.DATASET_MAPPER_NAME == "mask_former_semantic":
            mapper = MaskFormerSemanticDatasetMapper(cfg, True)
        else:
            raise NotImplementedError
        return build_detection_train_loader(cfg, mapper=mapper, dataset=dataset)

    @classmethod
    def build_test_loader(cls, cfg, dataset_name):
        """
        Returns:
            iterable
        It now calls :func:`detectron2.data.build_detection_test_loader`.
        Overwrite it if you'd like a different data loader.
        """
        return build_detection_test_loader(cfg, dataset_name, mapper=None)

    def build_writers(self):
        """
        Build a list of writers to be used. By default it contains
        writers that write metrics to the screen,
        a json file, and a tensorboard event file respectively.
        If you'd like a different list of writers, you can overwrite it in
        your trainer.

        Returns:
            list[EventWriter]: a list of :class:`EventWriter` objects.

        It is now implemented by:
        ::
            return [
                CommonMetricPrinter(self.max_iter),
                JSONWriter(os.path.join(self.cfg.OUTPUT_DIR, "metrics.json")),
                TensorboardXWriter(self.cfg.OUTPUT_DIR),
            ]

        """
        # Here the default print/log frequency of each writer is used.
        return [
            # It may not always print what you want to see, since it prints "common" metrics only.
            CommonMetricPrinter(self.max_iter),
            JSONWriter(os.path.join(self.cfg.OUTPUT_DIR, "metrics.json")),
            WandbWriter(),
        ]

    @classmethod
    def build_lr_scheduler(cls, cfg, optimizer):
        """
        It now calls :func:`detectron2.solver.build_lr_scheduler`.
        Overwrite it if you'd like a different scheduler.
        """
        return build_lr_scheduler(cfg, optimizer)

    @classmethod
    def build_optimizer(cls, cfg, model):
        weight_decay_norm = cfg.SOLVER.WEIGHT_DECAY_NORM
        weight_decay_embed = cfg.SOLVER.WEIGHT_DECAY_EMBED

        defaults = {}
        defaults["lr"] = cfg.SOLVER.BASE_LR
        defaults["weight_decay"] = cfg.SOLVER.WEIGHT_DECAY

        norm_module_types = (
            torch.nn.BatchNorm1d,
            torch.nn.BatchNorm2d,
            torch.nn.BatchNorm3d,
            torch.nn.SyncBatchNorm,
            # NaiveSyncBatchNorm inherits from BatchNorm2d
            torch.nn.GroupNorm,
            torch.nn.InstanceNorm1d,
            torch.nn.InstanceNorm2d,
            torch.nn.InstanceNorm3d,
            torch.nn.LayerNorm,
            torch.nn.LocalResponseNorm,
        )

        params: List[Dict[str, Any]] = []
        memo: Set[torch.nn.parameter.Parameter] = set()
        for module_name, module in model.named_modules():
            for module_param_name, value in module.named_parameters(recurse=False):
                if not value.requires_grad:
                    continue
                # Avoid duplicating parameters
                if value in memo:
                    continue
                memo.add(value)

                hyperparams = copy.copy(defaults)
                if "backbone" in module_name:
                    hyperparams["lr"] = (
                        hyperparams["lr"] * cfg.SOLVER.BACKBONE_MULTIPLIER
                    )
                if (
                    "relative_position_bias_table" in module_param_name
                    or "absolute_pos_embed" in module_param_name
                ):
                    print(module_param_name)
                    hyperparams["weight_decay"] = 0.0
                if isinstance(module, norm_module_types):
                    hyperparams["weight_decay"] = weight_decay_norm
                if isinstance(module, torch.nn.Embedding):
                    hyperparams["weight_decay"] = weight_decay_embed
                params.append({"params": [value], **hyperparams})

        def maybe_add_full_model_gradient_clipping(optim):
            # detectron2 doesn't have full model gradient clipping now
            clip_norm_val = cfg.SOLVER.CLIP_GRADIENTS.CLIP_VALUE
            enable = (
                cfg.SOLVER.CLIP_GRADIENTS.ENABLED
                and cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE == "full_model"
                and clip_norm_val > 0.0
            )

            class FullModelGradientClippingOptimizer(optim):
                def step(self, closure=None):
                    all_params = itertools.chain(
                        *[x["params"] for x in self.param_groups]
                    )
                    torch.nn.utils.clip_grad_norm_(all_params, clip_norm_val)
                    super().step(closure=closure)

            return FullModelGradientClippingOptimizer if enable else optim

        optimizer_type = cfg.SOLVER.OPTIMIZER
        if optimizer_type == "SGD":
            optimizer = maybe_add_full_model_gradient_clipping(torch.optim.SGD)(
                params, cfg.SOLVER.BASE_LR, momentum=cfg.SOLVER.MOMENTUM
            )
        elif optimizer_type == "ADAMW":
            optimizer = maybe_add_full_model_gradient_clipping(torch.optim.AdamW)(
                params, cfg.SOLVER.BASE_LR
            )
        else:
            raise NotImplementedError(f"no optimizer type {optimizer_type}")
        if not cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE == "full_model":
            optimizer = maybe_add_gradient_clipping(cfg, optimizer)
        return optimizer

    @classmethod
    def test_with_TTA(cls, cfg, model):
        logger = logging.getLogger("detectron2.trainer")
        # In the end of training, run an evaluation with TTA.
        logger.info("Running inference with test-time augmentation ...")
        model = SemanticSegmentorWithTTA(cfg, model)
        evaluators = [
            cls.build_evaluator(
                cfg, name, output_folder=os.path.join(cfg.OUTPUT_DIR, "inference_TTA")
            )
            for name in cfg.DATASETS.TEST
        ]
        res = cls.test(cfg, model, evaluators)
        res = OrderedDict({k + "_TTA": v for k, v in res.items()})
        return res


def setup(args):
    """
    Create configs and perform basic setups.
    """
    cfg = get_cfg()
    # for poly lr schedule
    add_deeplab_config(cfg)
    add_ovseg_config(cfg)
    cfg.merge_from_file(args.config_file)
    cfg.merge_from_list(args.opts)
    cfg.freeze()
    default_setup(cfg, args)
    # Setup logger for "ovseg" module
    if not args.eval_only:
        setup_wandb(cfg, args)
    setup_logger(
        output=cfg.OUTPUT_DIR, distributed_rank=comm.get_rank(), name="ovseg"
    )
    return cfg


def main(args):
    cfg = setup(args)

    if args.eval_only:
        model = Trainer.build_model(cfg)
        DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load(
            cfg.MODEL.WEIGHTS, resume=args.resume
        )

        if cfg.TEST.AUG.ENABLED:
            res = Trainer.test_with_TTA(cfg, model)
        else:
            res = Trainer.test(cfg, model)
        if comm.is_main_process():
            verify_results(cfg, res)
        return res

    trainer = Trainer(cfg)
    trainer.resume_or_load(resume=args.resume)
    return trainer.train()


if __name__ == "__main__":
    args = default_argument_parser().parse_args()
    print("Command Line Args:", args)
    launch(
        main,
        args.num_gpus,
        num_machines=args.num_machines,
        machine_rank=args.machine_rank,
        dist_url=args.dist_url,
        args=(args,),
    )
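For the record, the removed script's eval-only path could also be driven programmatically through the same entry points; a sketch under assumptions (config path taken from tools/search_thr_ensemble_w.sh, all other flags left at their argparse defaults):

    # Hypothetical eval-only run, mirroring the __main__ block above.
    args = default_argument_parser().parse_args(
        ["--eval-only", "--config-file", "configs/ovseg_swinB_vitL_bs32_120k.yaml"]
    )
    launch(main, args.num_gpus, num_machines=args.num_machines,
           machine_rank=args.machine_rank, dist_url=args.dist_url, args=(args,))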