Update README.md

01df22c verified 7 months ago

4.42 kB

	---
	datasets:
	- Lin-Chen/ShareGPT4V
	pipeline_tag: image-text-to-text
	library_name: xtuner
	license: llama3
	---

	! Notice: This version of the `llava-llama-3-8b-v1_1-hf` model has been manually modified to ensure compatibility with the pure Transformers library. The original model faced loading issues which have been addressed in this update. For users seeking to deploy this model using the Transformers library, please ensure you are using this modified version for optimal performance and compatibility.

	<div align="center">
	<img src="https://github.com/InternLM/lmdeploy/assets/36994684/0cf8d00f-e86b-40ba-9b54-dc8f1bc6c8d8" width="600"/>


	[![Generic badge](https://img.shields.io/badge/GitHub-%20XTuner-black.svg)](https://github.com/InternLM/xtuner)


	</div>

	## Model

	llava-llama-3-8b-v1_1-hf is a LLaVA model fine-tuned from [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co./meta-llama/Meta-Llama-3-8B-Instruct) and [CLIP-ViT-Large-patch14-336](https://huggingface.co./openai/clip-vit-large-patch14-336) with [ShareGPT4V-PT](https://huggingface.co./datasets/Lin-Chen/ShareGPT4V) and [InternVL-SFT](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat#prepare-training-datasets) by [XTuner](https://github.com/InternLM/xtuner).


	## Details

	\| Model \| Visual Encoder \| Projector \| Resolution \| Pretraining Strategy \| Fine-tuning Strategy \| Pretrain Dataset \| Fine-tune Dataset \|
	\| :-------------------- \| ------------------: \| --------: \| ---------: \| ---------------------: \| ------------------------: \| ------------------------: \| -----------------------: \|
	\| LLaVA-v1.5-7B \| CLIP-L \| MLP \| 336 \| Frozen LLM, Frozen ViT \| Full LLM, Frozen ViT \| LLaVA-PT (558K) \| LLaVA-Mix (665K) \|
	\| LLaVA-Llama-3-8B \| CLIP-L \| MLP \| 336 \| Frozen LLM, Frozen ViT \| Full LLM, LoRA ViT \| LLaVA-PT (558K) \| LLaVA-Mix (665K) \|
	\| LLaVA-Llama-3-8B-v1.1 \| CLIP-L \| MLP \| 336 \| Frozen LLM, Frozen ViT \| Full LLM, LoRA ViT \| ShareGPT4V-PT (1246K) \| InternVL-SFT (1268K) \|

	## Results

	<div align="center">
	<img src="https://github.com/InternLM/xtuner/assets/36994684/a157638c-3500-44ed-bfab-d8d8249f91bb" alt="Image" width=500" />
	</div>

	\| Model \| MMBench Test (EN) \| MMBench Test (CN) \| CCBench Dev \| MMMU Val \| SEED-IMG \| AI2D Test \| ScienceQA Test \| HallusionBench aAcc \| POPE \| GQA \| TextVQA \| MME \| MMStar \|
	\| :-------------------- \| :---------------: \| :---------------: \| :---------: \| :-------: \| :------: \| :-------: \| :------------: \| :-----------------: \| :--: \| :--: \| :-----: \| :------: \| :----: \|
	\| LLaVA-v1.5-7B \| 66.5 \| 59.0 \| 27.5 \| 35.3 \| 60.5 \| 54.8 \| 70.4 \| 44.9 \| 85.9 \| 62.0 \| 58.2 \| 1511/348 \| 30.3 \|
	\| LLaVA-Llama-3-8B \| 68.9 \| 61.6 \| 30.4 \| 36.8 \| 69.8 \| 60.9 \| 73.3 \| 47.3 \| 87.2 \| 63.5 \| 58.0 \| 1506/295 \| 38.2 \|
	\| LLaVA-Llama-3-8B-v1.1 \| 72.3 \| 66.4 \| 31.6 \| 36.8 \| 70.1 \| 70.0 \| 72.9 \| 47.7 \| 86.4 \| 62.6 \| 59.0 \| 1469/349 \| 45.1 \|


	## QuickStart

	### Chat with lmdeploy

	1. Installation
	```
	pip install 'lmdeploy>=0.4.0'
	pip install git+https://github.com/haotian-liu/LLaVA.git
	```

	2. Run

	```python
	from lmdeploy import pipeline, ChatTemplateConfig
	from lmdeploy.vl import load_image
	pipe = pipeline('xtuner/llava-llama-3-8b-v1_1-hf',
	chat_template_config=ChatTemplateConfig(model_name='llama3'))

	image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
	response = pipe(('describe this image', image))
	print(response)
	```

	More details can be found on [inference](https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html) and [serving](https://lmdeploy.readthedocs.io/en/latest/serving/api_server_vl.html) docs.

	### Chat with CLI

	See [here](https://huggingface.co./xtuner/llava-llama-3-8b-v1_1-hf/discussions/1)!


	## Citation

	```bibtex
	@misc{2023xtuner,
	title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
	author={XTuner Contributors},
	howpublished = {\url{https://github.com/InternLM/xtuner}},
	year={2023}
	}
	```