---
license: mit
language:
- en
tags:
- zero-shot-image-classification
- OpenCLIP
- clip
- biology
- biodiversity
- agronomy
- CV
- images
- animals
- species
- taxonomy
- rare species
- endangered species
- evolutionary biology
- multimodal
- knowledge-guided
datasets:
- ChihHsuan-Yang/Arboretum
- EOL
base_model:
- openai/clip-vit-base-patch16
- openai/clip-vit-large-patch14
pipeline_tag: zero-shot-image-classification
---
# Model Card for BioTrove
<!-- Banner links -->
<div style="text-align:center;">
<a href="https://baskargroup.github.io/Arboretum/" target="_blank">
<img src="https://img.shields.io/badge/Project%20Page-Visit-blue" alt="Project Page" style="margin-right:10px;">
</a>
<a href="https://github.com/baskargroup/Arboretum" target="_blank">
<img src="https://img.shields.io/badge/GitHub-Visit-lightgrey" alt="GitHub" style="margin-right:10px;">
</a>
<a href="https://pypi.org/project/arbor-process/" target="_blank">
<img src="https://img.shields.io/badge/PyPI-arbor--process%200.1.0-orange" alt="PyPI arbor-process 0.1.0">
</a>
</div>
BIOTROVE is a suite of CLIP-style vision-language foundation models for biodiversity, trained on [BIOTROVE-40M](https://baskargroup.github.io/Arboretum/), a large-scale dataset of 40 million images spanning 33K species of plants and animals. The models are evaluated on zero-shot image classification tasks.
- **Model type:** Vision Transformer (ViT-B/16, ViT-L/14)
- **License:** MIT
- **Fine-tuned from model:** [OpenAI CLIP](https://github.com/mlfoundations/open_clip), [MetaCLIP](https://github.com/facebookresearch/MetaCLIP), [BioCLIP](https://github.com/Imageomics/BioCLIP)
These models were developed for the benefit of the AI community as an open-source product; we therefore request that any derivative products also be open-source.
### Model Description
BioTrove is based on OpenAI's [CLIP](https://openai.com/research/clip) model.
The models were trained on [BIOTROVE-40M](https://baskargroup.github.io/Arboretum/) for the following configurations:
- **BIOTROVE-O:** A ViT-B/16 backbone initialized from an [OpenCLIP](https://github.com/mlfoundations/open_clip) checkpoint and trained for 40 epochs.
- **BIOTROVE-B:** A ViT-B/16 backbone initialized from the [BioCLIP](https://github.com/Imageomics/BioCLIP) checkpoint and trained for 8 epochs.
- **BIOTROVE-M:** A ViT-L/14 backbone initialized from a [MetaCLIP](https://github.com/facebookresearch/MetaCLIP) checkpoint and trained for 12 epochs.
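As a quick illustration, the following is a minimal sketch of zero-shot inference with a BioTrove checkpoint via [OpenCLIP](https://github.com/mlfoundations/open_clip). The checkpoint path and candidate labels below are hypothetical placeholders, not released artifacts:
```python
# Minimal zero-shot inference sketch using open_clip.
# The checkpoint path and label list are hypothetical placeholders.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="/path/to/biotrove-b.pt"  # hypothetical local checkpoint
)
tokenizer = open_clip.get_tokenizer("ViT-B-16")
model.eval()

labels = ["Danaus plexippus", "Apis mellifera", "Quercus alba"]  # example taxa
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer([f"a photo of {name}" for name in labels])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```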
### Model Training
**See the [Model Training](https://github.com/baskargroup/Arboretum?tab=readme-ov-file#model-training) section on the [Github](https://github.com/baskargroup/Arboretum) for examples of how to use BioTrove models in zero-shot image classification tasks.**
We train three models using a modified version of the [BioCLIP / OpenCLIP](https://github.com/Imageomics/bioclip/tree/main/src/training) codebase. Each model is trained on BioTrove-40M across 2 nodes of 8×H100 GPUs on NYU's [Greene](https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/greene) high-performance computing cluster. We publicly release all code needed to reproduce our results on the [GitHub](https://github.com/baskargroup/Arboretum) page.
We optimize our hyperparameters prior to training with [Ray](https://docs.ray.io/en/latest/index.html). Our standard training parameters are as follows:
```
--dataset-type webdataset
--pretrained openai
--text_type random
--dataset-resampled
--warmup 5000
--batch-size 4096
--accum-freq 1
--epochs 40
--workers 8
--model ViT-B-16
--lr 0.0005
--wd 0.0004
--precision bf16
--beta1 0.98
--beta2 0.99
--eps 1.0e-6
--local-loss
--gather-with-grad
--ddp-static-graph
--grad-checkpointing
```
For more extensive documentation of the training process and the significance of each hyperparameter, we recommend the [OpenCLIP](https://github.com/mlfoundations/open_clip) and [BioCLIP](https://github.com/Imageomics/BioCLIP) documentation.
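For intuition about the Ray-based hyperparameter search mentioned above, the following is a minimal sketch of a [Ray Tune](https://docs.ray.io/en/latest/tune/index.html) sweep over learning rate and weight decay. The objective function and search ranges are illustrative assumptions, not the exact search space we used:
```python
# Illustrative Ray Tune sweep over learning rate and weight decay.
from ray import tune

def objective(config):
    # Stand-in objective: a real sweep would launch an OpenCLIP training
    # run with these values and return the measured validation loss.
    loss = (config["lr"] - 5e-4) ** 2 + (config["wd"] - 4e-4) ** 2
    return {"val_loss": loss}

tuner = tune.Tuner(
    objective,
    param_space={
        "lr": tune.loguniform(1e-5, 1e-3),
        "wd": tune.loguniform(1e-5, 1e-3),
    },
    tune_config=tune.TuneConfig(metric="val_loss", mode="min", num_samples=16),
)
best = tuner.fit().get_best_result()
print(best.config)
```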
### Model Validation
For validating the zero-shot accuracy of our trained models and comparing to other benchmarks, we use the [VLHub](https://github.com/penfever/vlhub) repository with some slight modifications.
#### Pre-Run
After cloning the [GitHub](https://github.com/baskargroup/Arboretum) repository and navigating to the `BioTrove/model_validation` directory, we recommend installing the project requirements into a conda environment with `pip install -r requirements.txt`. Before executing a command in VLHub, also add `BioTrove/model_validation/src` to your `PYTHONPATH`:
```bash
export PYTHONPATH="$PYTHONPATH:$PWD/src"
```
#### Base Command
A basic BioTrove model evaluation command can be launched as follows. This example evaluates a CLIP ResNet-50 checkpoint, whose weights reside at the path passed via the `--resume` flag, on the ImageNet validation set, and reports the results to Weights & Biases.
```bash
python src/training/main.py \
    --batch-size=32 \
    --workers=8 \
    --imagenet-val "/imagenet/val/" \
    --model="resnet50" \
    --zeroshot-frequency=1 \
    --image-size=224 \
    --resume "/PATH/TO/WEIGHTS.pth" \
    --report-to wandb
```
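For intuition, the following is a minimal sketch of the zero-shot top-1 accuracy computation such an evaluation performs under the hood. The model, tokenizer, preprocessing transform, and class names are assumed to come from a loader like the one sketched earlier:
```python
# Sketch of zero-shot top-1 accuracy over an ImageFolder-style validation set.
import torch
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

def zero_shot_accuracy(model, tokenizer, preprocess, val_dir, class_names, device="cpu"):
    # One text embedding per class from a simple prompt template.
    text = tokenizer([f"a photo of {name}" for name in class_names]).to(device)
    with torch.no_grad():
        text_features = model.encode_text(text)
        text_features /= text_features.norm(dim=-1, keepdim=True)

    loader = DataLoader(ImageFolder(val_dir, transform=preprocess), batch_size=32)
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            image_features = model.encode_image(images.to(device))
            image_features /= image_features.norm(dim=-1, keepdim=True)
            preds = (image_features @ text_features.T).argmax(dim=-1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total
```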
### Training Dataset
- **Dataset Repository:** [BioTrove](https://github.com/baskargroup/Arboretum)
- **Dataset Paper:** BioTrove: A Large Multimodal Dataset Enabling AI for Biodiversity ([arXiv](https://arxiv.org/abs/2406.17720))
- **HF Dataset card:** [BioTrove](https://huggingface.co./datasets/ChihHsuan-Yang/Arboretum)
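Because training uses `--dataset-type webdataset`, the data is consumed as sharded `.tar` archives. The snippet below is a minimal sketch of streaming such shards with the [webdataset](https://github.com/webdataset/webdataset) library; the shard pattern and field names are assumptions for illustration, not the dataset's documented schema:
```python
# Minimal sketch of streaming WebDataset shards into (image, caption) pairs.
import webdataset as wds

shards = "/data/biotrove/shard-{000000..000099}.tar"  # hypothetical shard pattern
dataset = (
    wds.WebDataset(shards, resampled=True)  # mirrors --dataset-resampled
    .decode("pil")                          # decode images to PIL
    .to_tuple("jpg", "txt")                 # assumed image/caption field names
)

for image, caption in dataset:
    print(image.size, caption)
    break
```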
### Model Limitations
All `BioTrove` models were evaluated on the challenging [CONFOUNDING-SPECIES](https://arxiv.org/abs/2306.02507) benchmark, and all performed at or below random chance. Improving performance on this benchmark is an interesting avenue for follow-up work and could further expand the models' capabilities.
In general, we found that models trained on web-scraped data performed better with common names, whereas models trained on specialist datasets performed better when using scientific names. Additionally, models trained on web-scraped data excel at classifying at the highest taxonomic level (kingdom), while models begin to benefit from specialist datasets like [BIOTROVE-40M](https://baskargroup.github.io/Arboretum/) and [Tree-of-Life-10M](https://huggingface.co/datasets/imageomics/TreeOfLife-10M) at lower taxonomic levels (order and species). From a practical standpoint, `BioTrove` is highly accurate at the species level, and higher-level taxa can be deterministically derived from lower ones, as the sketch below illustrates.
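As a concrete illustration of lifting species-level predictions to coarser ranks, a simple lookup table suffices; the entries below are illustrative examples, not part of the released models:
```python
# Deriving higher taxonomic ranks from a species-level prediction.
# The table entries are illustrative examples.
TAXONOMY = {
    "Danaus plexippus": {"genus": "Danaus", "family": "Nymphalidae",
                         "order": "Lepidoptera", "kingdom": "Animalia"},
    "Quercus alba": {"genus": "Quercus", "family": "Fagaceae",
                     "order": "Fagales", "kingdom": "Plantae"},
}

def lift_prediction(species: str, rank: str) -> str:
    """Map a species-level prediction to a coarser taxonomic rank."""
    return TAXONOMY[species][rank]

print(lift_prediction("Danaus plexippus", "order"))  # Lepidoptera
```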
Addressing these limitations will further enhance the applicability of models like `BioTrove` in real-world biodiversity monitoring tasks.
### Acknowledgements
This work was supported by the AI Research Institutes program, supported by the NSF and USDA-NIFA under the [AI Institute for Resilient Agriculture](https://aiira.iastate.edu/), Award No. 2021-67021-35329, and partly supported by the NSF under CPS Frontier grant CNS-1954556. We also gratefully acknowledge the support of NYU IT [High Performance Computing](https://www.nyu.edu/life/information-technology/research-computing-services/high-performance-computing.html) resources, services, and staff expertise.
### Citation
If you find the models and datasets useful in your research, please consider citing our paper:
```bibtex
@misc{yang2024arboretumlargemultimodaldataset,
      title={BioTrove: A Large Multimodal Dataset Enabling AI for Biodiversity},
      author={Chih-Hsuan Yang and Benjamin Feuer and Zaki Jubery and Zi K. Deng and Andre Nakkab and Md Zahid Hasan and Shivani Chiranjeevi and Kelly Marshall and Nirmal Baishnab and Asheesh K Singh and Arti Singh and Soumik Sarkar and Nirav Merchant and Chinmay Hegde and Baskar Ganapathysubramanian},
      year={2024},
      eprint={2406.17720},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2406.17720},
}
```
---
For more details and access to the BioTrove dataset, please visit the [Project Page](https://baskargroup.github.io/Arboretum/). |