---
license: mit
language:
- en
tags:
- zero-shot-image-classification
- OpenCLIP
- clip
- biology
- biodiversity
- agronomy
- CV
- images
- animals
- species
- taxonomy
- rare species
- endangered species
- evolutionary biology
- multimodal
- knowledge-guided
datasets:
- ChihHsuan-Yang/Arboretum
- EOL
base_model:
- openai/clip-vit-base-patch16
- openai/clip-vit-large-patch14
pipeline_tag: zero-shot-image-classification
---


# Model Card for BioTrove

<!-- Banner links -->
<div style="text-align:center;">
  <a href="https://baskargroup.github.io/Arboretum/" target="_blank">
    <img src="https://img.shields.io/badge/Project%20Page-Visit-blue" alt="Project Page" style="margin-right:10px;">
  </a>
  <a href="https://github.com/baskargroup/Arboretum" target="_blank">
    <img src="https://img.shields.io/badge/GitHub-Visit-lightgrey" alt="GitHub" style="margin-right:10px;">
  </a>
  <a href="https://pypi.org/project/arbor-process/" target="_blank">
    <img src="https://img.shields.io/badge/PyPI-arbor--process%200.1.0-orange" alt="PyPI arbor-process 0.1.0">
  </a>
</div>


BioTrove is a suite of vision-language foundation models for biodiversity. These CLIP-style models were trained on [BioTrove-40M](https://baskargroup.github.io/Arboretum/), a large-scale dataset of 40 million images spanning 33K species of plants and animals, and are evaluated on zero-shot image classification tasks.

- **Model type:** Vision Transformer (ViT-B/16, ViT-L/14)
- **License:** MIT
- **Fine-tuned from model:** [OpenAI CLIP](https://github.com/mlfoundations/open_clip), [MetaCLIP](https://github.com/facebookresearch/MetaCLIP), [BioCLIP](https://github.com/Imageomics/BioCLIP)

These models were developed as an open-source product for the benefit of the AI community; we therefore request that any derivative products also be released as open source.


### Model Description

BioTrove is based on OpenAI's [CLIP](https://openai.com/research/clip) model. 
The models were trained on [BioTrove-40M](https://baskargroup.github.io/Arboretum/) in the following configurations (a minimal loading sketch follows the list):

- **BioTrove-O:** A ViT-B/16 backbone initialized from the [OpenCLIP](https://github.com/mlfoundations/open_clip) checkpoint and trained for 40 epochs.
- **BioTrove-B:** A ViT-B/16 backbone initialized from the [BioCLIP](https://github.com/Imageomics/BioCLIP) checkpoint and trained for 8 epochs.
- **BioTrove-M:** A ViT-L/14 backbone initialized from the [MetaCLIP](https://github.com/facebookresearch/MetaCLIP) checkpoint and trained for 12 epochs.
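
As a quick illustration, the released checkpoints can be loaded with the [OpenCLIP](https://github.com/mlfoundations/open_clip) library for zero-shot classification. The snippet below is a minimal sketch rather than an official loading script: the checkpoint path, image file, and candidate species are placeholders, and the model name must match the backbone of the checkpoint you download.

```python
import torch
import open_clip
from PIL import Image

# Load a ViT-B/16 CLIP backbone and point it at a downloaded BioTrove checkpoint.
# The checkpoint path is a placeholder; adjust it to your local file.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="/path/to/biotrove-b-vit-b-16.pt"
)
tokenizer = open_clip.get_tokenizer("ViT-B-16")
model.eval()

# Candidate labels (placeholder species); scientific or common names can both be used as prompts.
labels = ["Danaus plexippus", "Apis mellifera", "Quercus alba"]
text = tokenizer([f"a photo of {name}." for name in labels])
image = preprocess(Image.open("example.jpg")).unsqueeze(0)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image and each candidate label, softmaxed into probabilities.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for name, p in zip(labels, probs[0].tolist()):
    print(f"{name}: {p:.3f}")
```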

### Model Training
**See the [Model Training](https://github.com/baskargroup/Arboretum?tab=readme-ov-file#model-training) section of the [GitHub](https://github.com/baskargroup/Arboretum) repository for examples of how to use BioTrove models in zero-shot image classification tasks.**

We train three models using a modified version of the [BioCLIP / OpenCLIP](https://github.com/Imageomics/bioclip/tree/main/src/training) codebase. Each model is trained on BioTrove-40M on 2 nodes with 8×H100 GPUs on NYU's [Greene](https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/greene) high-performance computing cluster. We publicly release all code needed to reproduce our results on the [GitHub](https://github.com/baskargroup/Arboretum) page.

We optimize our hyperparameters prior to training with [Ray](https://docs.ray.io/en/latest/index.html); a minimal sweep sketch follows the parameter list below. Our standard training parameters are as follows:

```
--dataset-type webdataset 
--pretrained openai 
--text_type random 
--dataset-resampled 
--warmup 5000 
--batch-size 4096 
--accum-freq 1 
--epochs 40
--workers 8 
--model ViT-B-16 
--lr 0.0005 
--wd 0.0004 
--precision bf16 
--beta1 0.98 
--beta2 0.99 
--eps 1.0e-6 
--local-loss 
--gather-with-grad 
--ddp-static-graph 
--grad-checkpointing
```
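
To give a concrete sense of the Ray-based search mentioned above, here is a minimal, hypothetical Ray Tune sketch. The search space, number of trials, and the `run_short_training` stub are illustrative placeholders, not our actual sweep configuration.

```python
from ray import tune


def run_short_training(lr: float, wd: float) -> float:
    # Placeholder for an abbreviated CLIP training run that returns a validation score.
    # In practice this would launch the trainer with the sampled learning rate / weight decay.
    return 1.0 / (1.0 + abs(lr - 5e-4) + abs(wd - 4e-4))


def objective(config):
    # Ray Tune calls this once per trial with a sampled config.
    val_accuracy = run_short_training(lr=config["lr"], wd=config["wd"])
    return {"val_accuracy": val_accuracy}


tuner = tune.Tuner(
    objective,
    param_space={
        "lr": tune.loguniform(1e-5, 1e-3),
        "wd": tune.loguniform(1e-5, 1e-3),
    },
    tune_config=tune.TuneConfig(metric="val_accuracy", mode="max", num_samples=16),
)
results = tuner.fit()
print(results.get_best_result().config)
```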

For more extensive documentation of the training process and the significance of each hyperparameter, we recommend the [OpenCLIP](https://github.com/mlfoundations/open_clip) and [BioCLIP](https://github.com/Imageomics/BioCLIP) documentation.

### Model Validation

To validate the zero-shot accuracy of our trained models and compare against other benchmarks, we use the [VLHub](https://github.com/penfever/vlhub) repository with some slight modifications.

#### Pre-Run

After cloning the [GitHub](https://github.com/baskargroup/Arboretum) repository and navigating to the `BioTrove/model_validation` directory, we recommend installing all project requirements into a conda environment with `pip install -r requirements.txt`. Before executing a command in VLHub, also add `BioTrove/model_validation/src` to your `PYTHONPATH`:

```bash
export PYTHONPATH="$PYTHONPATH:$PWD/src";
```
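
Putting these steps together, a minimal end-to-end setup might look like the following; the environment name and Python version are illustrative placeholders.

```bash
# Illustrative setup; the environment name and Python version are placeholders.
conda create -n biotrove-val python=3.10 -y
conda activate biotrove-val
cd BioTrove/model_validation
pip install -r requirements.txt
export PYTHONPATH="$PYTHONPATH:$PWD/src"
```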

#### Base Command

A basic BioTrove model evaluation command can be launched as follows. This example evaluates a CLIP ResNet-50 checkpoint, whose weights reside at the path passed via the `--resume` flag, on the ImageNet validation set and reports the results to Weights & Biases.

```bash
python src/training/main.py --batch-size=32 --workers=8 \
    --imagenet-val "/imagenet/val/" --model="resnet50" \
    --zeroshot-frequency=1 --image-size=224 \
    --resume "/PATH/TO/WEIGHTS.pth" --report-to wandb
```
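
To evaluate one of the released BioTrove checkpoints instead, the same command can point at a ViT backbone. The variant below is hypothetical: the checkpoint path is a placeholder, and the `--model` string must match the backbone naming used by the VLHub fork.

```bash
python src/training/main.py --batch-size=32 --workers=8 \
    --imagenet-val "/imagenet/val/" --model="ViT-B-16" \
    --zeroshot-frequency=1 --image-size=224 \
    --resume "/PATH/TO/biotrove-b-vit-b-16.pt" --report-to wandb
```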

### Training Dataset
- **Dataset Repository:** [BioTrove](https://github.com/baskargroup/Arboretum)
- **Dataset Paper:** BioTrove: A Large Multimodal Dataset Enabling AI for Biodiversity ([arXiv](https://arxiv.org/abs/2406.17720))
- **HF Dataset card:** [BioTrove](https://huggingface.co./datasets/ChihHsuan-Yang/Arboretum)


### Model's Limitation
All `BioTrove` models were evaluated on the challenging [CONFOUNDING-SPECIES](https://arxiv.org/abs/2306.02507) benchmark, but every model performed at or below random chance. This is an interesting avenue for follow-up work that could further expand the models' capabilities.

In general, we found that models trained on web-scraped data performed better with common names, whereas models trained on specialist datasets performed better with scientific names. Models trained on web-scraped data also excel at classification at the highest taxonomic level (kingdom), while specialist datasets such as [BioTrove-40M](https://baskargroup.github.io/Arboretum/) and [Tree-of-Life-10M](https://huggingface.co./datasets/imageomics/TreeOfLife-10M) become more beneficial at the lower taxonomic levels (order and species). From a practical standpoint, `BioTrove` is highly accurate at the species level, and higher-level taxa can be deterministically derived from lower ones.

Addressing these limitations will further enhance the applicability of models like `BioTrove` in real-world biodiversity monitoring tasks.

### Acknowledgements
This work was supported by the AI Research Institutes program of the NSF and USDA-NIFA through the [AI Institute for Resilient Agriculture](https://aiira.iastate.edu/), Award No. 2021-67021-35329, and in part by the NSF under CPS Frontier grant CNS-1954556. We also gratefully acknowledge the support of NYU IT [High Performance Computing](https://www.nyu.edu/life/information-technology/research-computing-services/high-performance-computing.html) resources, services, and staff expertise.

<!--BibTex citation -->
<section class="section" id="BibTeX">
  <div class="container is-max-widescreen content">
      <h2 class="title">Citation</h2>
      If you find the models and datasets useful in your research, please consider citing our paper:
      <pre><code>@misc{yang2024arboretumlargemultimodaldataset,
        title={BioTrove: A Large Multimodal Dataset Enabling AI for Biodiversity},
        author={Chih-Hsuan Yang and Benjamin Feuer and Zaki Jubery and Zi K. Deng and Andre Nakkab and
          Md Zahid Hasan and Shivani Chiranjeevi and Kelly Marshall and Nirmal Baishnab and Asheesh K Singh and
          Arti Singh and Soumik Sarkar and Nirav Merchant and Chinmay Hegde and Baskar Ganapathysubramanian},
        year={2024},
        eprint={2406.17720},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2406.17720},
  }</code></pre>
  </div>
</section>
<!--End BibTex citation -->

---

For more details and access to the BioTrove dataset, please visit the [Project Page](https://baskargroup.github.io/Arboretum/).