---
language:
- ko
- en
license: cc-by-nc-sa-4.0
library_name: transformers
---
# Llama-3-KoEn-8B-xtuner-llava-preview 🌋
Llama-3-KoEn-8B-xtuner-llava-preview 🌋 is a Korean multimodal model based on the LLaVA architecture. It was built by merging two models with the [Chat Vector](https://arxiv.org/abs/2310.04799) method:
1) [beomi/Llama-3-KoEn-8B-preview](https://huggingface.co/beomi/Llama-3-KoEn-8B-preview)
2) [xtuner/llava-llama-3-8b-transformers](https://huggingface.co/xtuner/llava-llama-3-8b-transformers)
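
As a rough illustration of the weight arithmetic behind a Chat Vector style merge, the sketch below adds the difference between a tuned model and its base onto a separate target model. This is a toy example on plain Python lists, not the script used to build this checkpoint: the function name `chat_vector_merge`, the `ratio` parameter, and the mapping of checkpoints to the `base`, `tuned`, and `target` roles are all assumptions made for illustration.

```python
# Toy illustration of Chat Vector style weight arithmetic.
# Plain Python lists stand in for real weight tensors; all names are hypothetical.

def chat_vector_merge(base, tuned, target, ratio=1.0):
    """Return target + ratio * (tuned - base) for every shared parameter."""
    merged = {}
    for name, target_w in target.items():
        if name in base and name in tuned:
            merged[name] = [
                t + ratio * (f - b)
                for t, f, b in zip(target_w, tuned[name], base[name])
            ]
        else:
            # Parameters without a counterpart in both other models are copied unchanged.
            merged[name] = list(target_w)
    return merged


base = {"layer.weight": [1.0, 2.0, 3.0]}      # original pretrained weights
tuned = {"layer.weight": [1.5, 2.0, 2.0]}     # weights after task tuning
target = {"layer.weight": [2.0, 4.0, 6.0],    # continually pretrained weights
          "extra.weight": [9.0]}

merged = chat_vector_merge(base, tuned, target)
print(merged)
```

With real checkpoints the same loop would run over `state_dict()` tensors, and shape mismatches (for example, resized embeddings) would need separate handling.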
## Model Details
### Model Description
- **Developed by:** Junbum Lee (Beomi)
- **Model type:** HuggingFace Llava 🌋
- **Language(s) (NLP):** Korean, English
- **License:** cc-by-nc-sa-4.0, subject to the Llama 3 License
- **Merged from models:** [beomi/Llama-3-KoEn-8B-preview](https://huggingface.co/beomi/Llama-3-KoEn-8B-preview) & [xtuner/llava-llama-3-8b-transformers](https://huggingface.co/xtuner/llava-llama-3-8b-transformers)
### Direct Use
![Cat walking on frozen Han-River, Seoul](https://cdn-uploads.huggingface.co/production/uploads/5e56829137cb5b49818287ea/NWfoArWI4UPAxpEnolkwT.jpeg)
> Two versions are recommended:
>
> v1. `revision='a38aac3'`: Basic ChatVector, with the [25B+ trained KoEn ckpt (rev. d4d25a2)](https://huggingface.co/beomi/Llama-3-KoEn-8B-preview/commit/d4d25a2).
>
> v1-1. `revision='0224971'`: Basic ChatVector, with the [40B+ trained KoEn ckpt (rev. ad39b32)](https://huggingface.co/beomi/Llama-3-KoEn-8B-preview/commit/ad39b32cd4207f37f61f16e79d3f4020c5b744ef).
>
> v2. `revision='4f04d1e'`: Model-diff-based merging (ref. https://huggingface.co/blog/maywell/llm-feature-transfer), with the [25B+ trained KoEn ckpt (rev. d4d25a2)](https://huggingface.co/beomi/Llama-3-KoEn-8B-preview/commit/d4d25a2).
```python
import requests
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
model_id = "beomi/Llama-3-KoEn-8B-xtuner-llava-preview"
# Pick a revision: 'a38aac3' (v1) or '0224971' (v1-1) for basic ChatVector,
# '4f04d1e' (v2) for model-diff-based merging
# (ref. https://huggingface.co/blog/maywell/llm-feature-transfer).
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype='auto',
    device_map='auto',
    revision='a38aac3',
)
processor = AutoProcessor.from_pretrained(model_id)
# Reuse the tokenizer bundled with the processor instead of loading one from a local path.
tokenizer = processor.tokenizer
# Stop generation at either the EOS token or the Llama 3 end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]
# Korean prompt: "Please describe this image."; the assistant turn is pre-filled with "In this image,".
prompt = ("<|start_header_id|>user<|end_header_id|>\n\n<image>\n이 이미지에 대해서 설명해주세요.<|eot_id|>"
          "<|start_header_id|>assistant<|end_header_id|>\n\n이 이미지에는")
image_file = "https://cdn-uploads.huggingface.co/production/uploads/5e56829137cb5b49818287ea/NWfoArWI4UPAxpEnolkwT.jpeg"
raw_image = Image.open(requests.get(image_file, stream=True).raw)
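# Keyword arguments keep the call independent of the processor's positional order.
inputs = processor(text=prompt, images=raw_image, return_tensors='pt').to(model.device, model.dtype)
output = model.generate(**inputs, max_new_tokens=400, do_sample=True, eos_token_id=terminators)
# Drop the first two tokens and keep special tokens visible in the decoded text.
print(processor.decode(output[0][2:], skip_special_tokens=False))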
# --- Example Output [Chat Vector] ---
user<|end_header_id|>

<image>
이 이미지에 대해서 설명해주세요.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

이 이미지에는 고양이 한 마리가 강물 위를 걸어가는 모습이 보여집니다. 고양이는 강물의 잔물결에 미끄럼을 타고 강 가로를 지나는 데 능숙하게 보입니다. 고양이의 발은 강물로 잘 들어가, 그것을 즐기며 걸어갑니다.
또한 이 이미지도 음성 녹음을 하거나 녹화된 자료로 제작되었으며, 주로 고양이의 모습을 강하게 보여줍니다. 소리 효과도 여러 가지로 추가하여 고양이의 스토리를 다양하게 전달합니다. 강물은 잔물결을 나타내며 강물 위를 걷는 고양이의 모습을 더욱 강렬하게 강조하기 위해 잔물결을 통해 더 디테일한 장면을 보여줍니다.<|eot_id|>
# English translation: This image shows a cat walking across river water. The cat rides the river's ripples like a slide and looks skilled at crossing along the river. The cat's paws dip well into the water, and it walks along enjoying it.
# Also, this image was produced with audio recording or recorded material, and mainly shows the cat strongly. Various sound effects are added to convey the cat's story in diverse ways. The river shows ripples, and to emphasize the cat walking on the water more intensely, it shows a more detailed scene through the ripples.
# --- Example Output [Model diff based merging] ---
user<|end_header_id|>

<image>
이 이미지에 대해서 설명해주세요.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

이 이미지에는 한국어 자막과 함께 고양이가 물에 발을 디디고 걷는 모습이 담겨 있습니다. 고양이는 오른쪽 발을 물에 담그고 걷는 중이며, 한국어 자막은 "고양이는 물을 좋아합니다"라는 문장을 포함하고 있습니다. 이 자막은 고양이가 물을 좋아하는 것을 강조하고 있습니다.<|eot_id|>
# English translation: This image shows a cat stepping into water and walking, along with Korean subtitles. The cat is walking with its right paw dipped in the water, and the Korean subtitle contains the sentence "Cats like water." This subtitle emphasizes that the cat likes water.
```
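
The prompt above follows the Llama 3 chat layout with an `<image>` placeholder in the user turn, and the revision hashes come from the list of recommended versions. A small helper can make both easier to reuse. This is a minimal sketch: `build_prompt` and `REVISIONS` are hypothetical names introduced here, not part of the model repository or of `transformers`.

```python
# Hypothetical helpers for reusing the prompt layout and revision hashes shown above.

# Version labels and hashes copied from the recommended revisions.
REVISIONS = {
    "v1": "a38aac3",    # basic ChatVector, 25B+ trained KoEn ckpt
    "v1-1": "0224971",  # basic ChatVector, 40B+ trained KoEn ckpt
    "v2": "4f04d1e",    # model-diff-based merging, 25B+ trained KoEn ckpt
}


def build_prompt(question, answer_prefix=""):
    """Build a single-turn prompt with one image placeholder.

    `answer_prefix` optionally pre-fills the start of the assistant's reply.
    """
    return (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"<image>\n{question}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{answer_prefix}"
    )


prompt = build_prompt("이 이미지에 대해서 설명해주세요.", "이 이미지에는")
print(repr(prompt))
```

The resulting `prompt` string can be passed to the processor as in the example above, and `REVISIONS["v2"]` can be passed as the `revision` argument of `from_pretrained`.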