---
language:
- ko
- en
license: cc-by-nc-sa-4.0
library_name: transformers
---
# Llama-3-KoEn-8B-xtuner-llava-preview 🌋
<!-- Provide a quick summary of what the model is/does. -->
Llama-3-KoEn-8B-xtuner-llava-preview 🌋 is a Korean/English multimodal model based on the LLaVA architecture, built by merging two models with the [Chat Vector](https://arxiv.org/abs/2310.04799) method:
1) [beomi/Llama-3-KoEn-8B-preview](https://huggingface.co./beomi/Llama-3-KoEn-8B-preview),
2) [xtuner/llava-llama-3-8b-transformers](https://huggingface.co./xtuner/llava-llama-3-8b-transformers)
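The Chat Vector idea is to take the weight delta between an instruction-tuned model and its base model and add that delta to a differently pretrained model. A minimal sketch of that merge on toy state dicts (the function name and tensors are illustrative, not the card's actual merge script; in practice the delta between xtuner's LLaVA language model and the Llama 3 base is applied to the KoEn weights):

```python
import torch

def apply_chat_vector(base_sd, tuned_sd, target_sd):
    """Chat Vector merge: target + (tuned - base) for every shared weight.

    base_sd   : state dict of the original base model (e.g. Llama 3 8B)
    tuned_sd  : state dict of the tuned model (e.g. the LLaVA language model)
    target_sd : state dict to transplant the tuning onto (e.g. Llama-3-KoEn-8B)
    """
    merged = {}
    for name, target_w in target_sd.items():
        if name in base_sd and name in tuned_sd:
            # add the "tuning direction" learned on the base model
            merged[name] = target_w + (tuned_sd[name] - base_sd[name])
        else:
            # weights without a counterpart are copied unchanged
            merged[name] = target_w.clone()
    return merged

# Toy demonstration with one 2-element weight
base   = {"w": torch.tensor([1.0, 1.0])}
tuned  = {"w": torch.tensor([1.5, 0.5])}   # base + instruction-tuning delta
korean = {"w": torch.tensor([2.0, 2.0])}   # differently pretrained weights
merged = apply_chat_vector(base, tuned, korean)
```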
## Model Details
### Model Description
- **Developed by:** Junbum Lee (Beomi)
- **Model type:** Hugging Face LLaVA 🌋
- **Language(s) (NLP):** Korean, English
- **License:** cc-by-nc-sa-4.0, additionally subject to the Llama 3 license
- **Merged from model:** [beomi/Llama-3-KoEn-8B-preview](https://huggingface.co./beomi/Llama-3-KoEn-8B-preview) / [xtuner/llava-llama-3-8b-transformers](https://huggingface.co./xtuner/llava-llama-3-8b-transformers)
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
![Cat walking on frozen Han-River, Seoul](https://cdn-uploads.huggingface.co/production/uploads/5e56829137cb5b49818287ea/NWfoArWI4UPAxpEnolkwT.jpeg)
```python
import requests
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
model_id = "beomi/Llama-3-KoEn-8B-xtuner-llava-preview"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype='auto',
    device_map='auto',
)
processor = AutoProcessor.from_pretrained(model_id)
# Reuse the tokenizer bundled with the processor (the original card loaded it
# from a local checkpoint directory that downstream users will not have)
tokenizer = processor.tokenizer
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]
# "이 이미지에 대해서 설명해주세요." = "Please describe this image."
# The assistant turn is seeded with "이 이미지에는" ("This image shows ...")
prompt = ("<|start_header_id|>user<|end_header_id|>\n\n<image>\n이 이미지에 대해서 설명해주세요.<|eot_id|>"
          "<|start_header_id|>assistant<|end_header_id|>\n\n이 이미지에는")
image_file = "https://cdn-uploads.huggingface.co/production/uploads/5e56829137cb5b49818287ea/NWfoArWI4UPAxpEnolkwT.jpeg"
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(prompt, raw_image, return_tensors='pt').to(model.device, torch.float16)  # match the model's device and dtype
output = model.generate(**inputs, max_new_tokens=400, do_sample=True, eos_token_id=terminators)
# output[0][2:] drops the leading special tokens before decoding
print(processor.decode(output[0][2:], skip_special_tokens=False))
# --- Example Output ---
user<|end_header_id|>
<image>
이 이미지에 대해서 설명해주세요.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
이 이미지에는 고양이 한 마리가 강물 위를 걸어가는 모습이 보여집니다. 고양이는 강물의 잔물결에 미끄럼을 타고 강 가로를 지나는 데 능숙하게 보입니다. 고양이의 발은 강물로 잘 들어가, 그것을 즐기며 걸어갑니다.
또한 이 이미지도 음성 녹음을 하거나 녹화된 자료로 제작되었으며, 주로 고양이의 모습을 강하게 보여줍니다. 소리 효과도 여러 가지로 추가하여 고양이의 스토리를 다양하게 전달합니다. 강물은 잔물결을 나타내며 강물 위를 걷는 고양이의 모습을 더욱 강렬하게 강조하기 위해 잔물결을 통해 더 디테일한 장면을 보여줍니다.<|eot_id|>

# (English: "This image shows a cat walking across the river. The cat looks adept at
# gliding over the river's ripples as it crosses. Its paws dip neatly into the water,
# and it walks along enjoying it. This image was also produced as an audio recording
# or recorded footage, mainly highlighting the cat. Various sound effects are added
# to convey the cat's story in diverse ways. The river shows ripples, and to emphasize
# the cat walking on the water more strikingly, the ripples reveal a more detailed scene.")
```
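The prompt string in the example above interleaves the Llama 3 chat-template special tokens by hand. A small helper (hypothetical, not part of this card) makes the template structure explicit and reduces the chance of typos in the header tokens:

```python
def build_llama3_prompt(messages, completion_prefix=""):
    """Assemble a raw Llama 3 chat prompt from (role, content) pairs.

    completion_prefix optionally seeds the assistant's reply, as the
    example above does with "이 이미지에는".
    """
    parts = []
    for role, content in messages:
        parts.append(f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>")
    # open the assistant turn; generation continues from completion_prefix
    parts.append(f"<|start_header_id|>assistant<|end_header_id|>\n\n{completion_prefix}")
    return "".join(parts)

prompt = build_llama3_prompt(
    [("user", "<image>\n이 이미지에 대해서 설명해주세요.")],
    completion_prefix="이 이미지에는",
)
```

This produces exactly the prompt used in the example; for text-only chat, `tokenizer.apply_chat_template` is the more idiomatic route.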