---
license: mit
language:
- en
tags:
- zero-shot-image-classification
- OpenCLIP
- clip
- biology
- biodiversity
- agronomy
- CV
- images
- animals
- species
- taxonomy
- rare species
- endangered species
- evolutionary biology
- multimodal
- knowledge-guided
datasets:
- ChihHsuan-Yang/Arboretum
- EOL
base_model:
- openai/clip-vit-base-patch16
- openai/clip-vit-large-patch14
pipeline_tag: zero-shot-image-classification
---

# Model Card for BioTrove

<!-- Banner links -->
<div style="text-align:center;">
  <a href="https://baskargroup.github.io/Arboretum/" target="_blank">
    <img src="https://img.shields.io/badge/Project%20Page-Visit-blue" alt="Project Page" style="margin-right:10px;">
  </a>
  <a href="https://github.com/baskargroup/Arboretum" target="_blank">
    <img src="https://img.shields.io/badge/GitHub-Visit-lightgrey" alt="GitHub" style="margin-right:10px;">
  </a>
  <a href="https://pypi.org/project/arbor-process/" target="_blank">
    <img src="https://img.shields.io/badge/PyPI-arbor--process%200.1.0-orange" alt="PyPI arbor-process 0.1.0">
  </a>
</div>

BioTrove is a new suite of vision-language foundation models for biodiversity. These CLIP-style models were trained on [BioTrove-40M](https://baskargroup.github.io/Arboretum/), a large-scale dataset of 40 million images spanning 33K species of plants and animals, and are evaluated on zero-shot image classification tasks.

- **Model type:** Vision Transformer (ViT-B/16, ViT-L/14)
- **License:** MIT
- **Fine-tuned from model:** [OpenCLIP](https://github.com/mlfoundations/open_clip), [MetaCLIP](https://github.com/facebookresearch/MetaCLIP), [BioCLIP](https://github.com/Imageomics/BioCLIP)

These models were developed for the benefit of the AI community as an open-source product, so we request that any derivative products also be open-source.

### Model Description

BioTrove is based on OpenAI's [CLIP](https://openai.com/research/clip) model.
The models were trained on [BioTrove-40M](https://baskargroup.github.io/Arboretum/) in the following configurations (a minimal usage sketch follows the list):

- **BioTrove-O:** a ViT-B/16 backbone initialized from the [OpenCLIP](https://github.com/mlfoundations/open_clip) checkpoint and trained for 40 epochs.
- **BioTrove-B:** a ViT-B/16 backbone initialized from the [BioCLIP](https://github.com/Imageomics/BioCLIP) checkpoint and trained for 8 epochs.
- **BioTrove-M:** a ViT-L/14 backbone initialized from the [MetaCLIP](https://github.com/facebookresearch/MetaCLIP) checkpoint and trained for 12 epochs.
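
As a quick illustration of how these checkpoints can be used, the snippet below runs zero-shot classification with the `open_clip` library. This is a minimal sketch, not our evaluation pipeline: the checkpoint filename, image path, and candidate species names are placeholders, and the model name must match the backbone of the checkpoint you download (ViT-B-16 for BioTrove-O/B, ViT-L-14 for BioTrove-M).

```python
import torch
import open_clip
from PIL import Image

# Load the backbone and the fine-tuned BioTrove weights.
# "biotrove_vitb16.pt" is a placeholder for a locally downloaded checkpoint.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="biotrove_vitb16.pt"
)
tokenizer = open_clip.get_tokenizer("ViT-B-16")
model.eval()

# Candidate labels; scientific or common names both work as free-text prompts.
labels = ["Apis mellifera", "Bombus terrestris", "Danaus plexippus"]
text = tokenizer([f"a photo of {name}" for name in labels])
image = preprocess(Image.open("example.jpg")).unsqueeze(0)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(labels, probs.squeeze(0).tolist())))
```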

### Model Training

**See the [Model Training](https://github.com/baskargroup/Arboretum?tab=readme-ov-file#model-training) section on [GitHub](https://github.com/baskargroup/Arboretum) for examples of how to train BioTrove models and use them for zero-shot image classification.**

We train three models using a modified version of the [BioCLIP / OpenCLIP](https://github.com/Imageomics/bioclip/tree/main/src/training) codebase. Each model is trained on BioTrove-40M across 2 nodes with 8×H100 GPUs on NYU's [Greene](https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/greene) high-performance computing cluster. We publicly release all code needed to reproduce our results on the [GitHub](https://github.com/baskargroup/Arboretum) page.

We optimize our hyperparameters prior to training with [Ray](https://docs.ray.io/en/latest/index.html). Our standard training parameters are as follows:

```
--dataset-type webdataset
--pretrained openai
--text_type random
--dataset-resampled
--warmup 5000
--batch-size 4096
--accum-freq 1
--epochs 40
--workers 8
--model ViT-B-16
--lr 0.0005
--wd 0.0004
--precision bf16
--beta1 0.98
--beta2 0.99
--eps 1.0e-6
--local-loss
--gather-with-grad
--ddp-static-graph
--grad-checkpointing
```
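
The learning rate and weight decay above were selected with this search. Purely as a hypothetical sketch of what such a Ray Tune sweep can look like (the `run_proxy_training` function is a stand-in for a short training-and-validation run, not part of our codebase):

```python
from ray import tune

def run_proxy_training(config):
    # Stand-in: train for a small number of steps with the sampled
    # hyperparameters and compute a validation loss; replace with a real
    # short training run in practice.
    val_loss = (config["lr"] - 5e-4) ** 2 + (config["wd"] - 4e-4) ** 2
    # Returning a dict reports it as the trial's final result to Tune.
    return {"val_loss": val_loss}

tuner = tune.Tuner(
    run_proxy_training,
    param_space={
        "lr": tune.loguniform(1e-5, 1e-3),
        "wd": tune.loguniform(1e-5, 1e-3),
    },
    tune_config=tune.TuneConfig(metric="val_loss", mode="min", num_samples=16),
)
results = tuner.fit()
print(results.get_best_result().config)
```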

For more extensive documentation of the training process and the significance of each hyperparameter, we recommend consulting the [OpenCLIP](https://github.com/mlfoundations/open_clip) and [BioCLIP](https://github.com/Imageomics/BioCLIP) documentation.

### Model Validation

For validating the zero-shot accuracy of our trained models and comparing them to other benchmarks, we use the [VLHub](https://github.com/penfever/vlhub) repository with some slight modifications.

#### Pre-Run

After cloning the [GitHub](https://github.com/baskargroup/Arboretum) repository and navigating to the `BioTrove/model_validation` directory, we recommend installing all the project requirements into a conda environment with `pip install -r requirements.txt`. Also, before executing a command in VLHub, please add `BioTrove/model_validation/src` to your PYTHONPATH:

```bash
export PYTHONPATH="$PYTHONPATH:$PWD/src"
```

#### Base Command

A basic BioTrove model evaluation command can be launched as follows. This example evaluates a CLIP-ResNet50 checkpoint, whose weights reside at the path designated via the `--resume` flag, on the ImageNet validation set and reports the results to Weights & Biases.

```bash
python src/training/main.py --batch-size=32 --workers=8 --imagenet-val "/imagenet/val/" --model="resnet50" --zeroshot-frequency=1 --image-size=224 --resume "/PATH/TO/WEIGHTS.pth" --report-to wandb
```
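
For a quick accuracy check outside VLHub, a minimal, self-contained zero-shot top-1 accuracy loop over an `ImageFolder`-style directory might look like the sketch below; the checkpoint file and validation directory are placeholders.

```python
import torch
import open_clip
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

# Minimal zero-shot top-1 accuracy loop, independent of VLHub.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16", pretrained="biotrove_vitb16.pt"
)
tokenizer = open_clip.get_tokenizer("ViT-B-16")
model = model.to(device).eval()

dataset = ImageFolder("/path/to/val", transform=preprocess)
loader = DataLoader(dataset, batch_size=64, num_workers=8)

with torch.no_grad():
    # One text embedding per class folder name.
    text = tokenizer([f"a photo of {c}" for c in dataset.classes]).to(device)
    text_features = model.encode_text(text)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    correct = total = 0
    for images, targets in loader:
        image_features = model.encode_image(images.to(device))
        image_features /= image_features.norm(dim=-1, keepdim=True)
        preds = (image_features @ text_features.T).argmax(dim=-1).cpu()
        correct += (preds == targets).sum().item()
        total += targets.numel()

print(f"Zero-shot top-1 accuracy: {correct / total:.3f}")
```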

### Training Dataset

- **Dataset Repository:** [BioTrove](https://github.com/baskargroup/Arboretum)
- **Dataset Paper:** BioTrove: A Large Multimodal Dataset Enabling AI for Biodiversity ([arXiv](https://arxiv.org/abs/2406.17720))
- **HF Dataset Card:** [BioTrove](https://huggingface.co./datasets/ChihHsuan-Yang/Arboretum)

### Model Limitations

All `BioTrove` models were evaluated on the challenging [CONFOUNDING-SPECIES](https://arxiv.org/abs/2306.02507) benchmark, and all of them performed at or below random chance. Addressing this is an interesting avenue for follow-up work that could further expand the models' capabilities.

In general, we found that models trained on web-scraped data performed better with common names, whereas models trained on specialist datasets performed better with scientific names. Additionally, models trained on web-scraped data excel at classifying at the highest taxonomic level (kingdom), while models begin to benefit from specialist datasets such as [BioTrove-40M](https://baskargroup.github.io/Arboretum/) and [Tree-of-Life-10M](https://huggingface.co./datasets/imageomics/TreeOfLife-10M) at lower taxonomic levels (order and species). From a practical standpoint, `BioTrove` is highly accurate at the species level, and higher-level taxa can be deterministically derived from lower ones (see the sketch below).
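
To make that last point concrete, here is a toy illustration of rolling a species-level prediction up to higher ranks. The lookup table is hypothetical; in practice it would be built from the taxonomy metadata shipped with BioTrove-40M.

```python
# Toy example: derive higher taxonomic ranks from a species-level prediction.
SPECIES_TO_TAXONOMY = {
    "Apis mellifera": {
        "genus": "Apis", "family": "Apidae",
        "order": "Hymenoptera", "kingdom": "Animalia",
    },
    "Quercus alba": {
        "genus": "Quercus", "family": "Fagaceae",
        "order": "Fagales", "kingdom": "Plantae",
    },
}

def roll_up(species_prediction: str, rank: str) -> str:
    """Map a species-level prediction to the requested higher rank."""
    return SPECIES_TO_TAXONOMY[species_prediction][rank]

print(roll_up("Apis mellifera", "order"))  # -> Hymenoptera
```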

Addressing these limitations will further enhance the applicability of models like `BioTrove` in real-world biodiversity monitoring tasks.

### Acknowledgements

This work was supported by the AI Research Institutes program of the NSF and USDA-NIFA under the [AI Institute for Resilient Agriculture](https://aiira.iastate.edu/), Award No. 2021-67021-35329, and partly supported by the NSF under CPS Frontier grant CNS-1954556. We also gratefully acknowledge the support of NYU IT [High Performance Computing](https://www.nyu.edu/life/information-technology/research-computing-services/high-performance-computing.html) resources, services, and staff expertise.

<!--BibTex citation -->
<section class="section" id="BibTeX">
  <div class="container is-max-widescreen content">
    <h2 class="title">Citation</h2>
    If you find the models and datasets useful in your research, please consider citing our paper:
    <pre><code>@misc{yang2024arboretumlargemultimodaldataset,
      title={BioTrove: A Large Multimodal Dataset Enabling AI for Biodiversity},
      author={Chih-Hsuan Yang and Benjamin Feuer and Zaki Jubery and Zi K. Deng and Andre Nakkab and
      Md Zahid Hasan and Shivani Chiranjeevi and Kelly Marshall and Nirmal Baishnab and Asheesh K Singh and
      Arti Singh and Soumik Sarkar and Nirav Merchant and Chinmay Hegde and Baskar Ganapathysubramanian},
      year={2024},
      eprint={2406.17720},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2406.17720},
}</code></pre>
  </div>
</section>
<!--End BibTex citation -->

---

For more details and access to the BioTrove dataset, please visit the [Project Page](https://baskargroup.github.io/Arboretum/).