---
language:
- ko
- en
license: cc-by-nc-sa-4.0
library_name: transformers
---
# Llama-3-KoEn-8B-xtuner-llava-preview
<!-- Provide a quick summary of what the model is/does. -->
Llama-3-KoEn-8B-xtuner-llava-preview is a Korean multimodal model built on the LLaVA architecture. It was created by merging two models with the [Chat Vector](https://arxiv.org/abs/2310.04799) method (a sketch of the merge follows the list):
1) [beomi/Llama-3-KoEn-8B-preview](https://huggingface.co./beomi/Llama-3-KoEn-8B-preview),
2) [xtuner/llava-llama-3-8b-transformers](https://huggingface.co./xtuner/llava-llama-3-8b-transformers)
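
The Chat Vector method treats a fine-tune as a weight delta that can be transplanted onto another model sharing the same base. The following is a minimal, hypothetical sketch of such a merge; the exact recipe used for this checkpoint is not published here, and the use of `meta-llama/Meta-Llama-3-8B` as the common base is an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, LlavaForConditionalGeneration

# Assumed common ancestor of both fine-tunes (not confirmed by this card).
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16)
koen = AutoModelForCausalLM.from_pretrained(
    "beomi/Llama-3-KoEn-8B-preview", torch_dtype=torch.bfloat16)
llava = LlavaForConditionalGeneration.from_pretrained(
    "xtuner/llava-llama-3-8b-transformers", torch_dtype=torch.bfloat16)

base_sd = base.state_dict()
koen_sd = koen.state_dict()

# new_lm = koen + (llava_lm - base): carry the LLaVA tuning delta
# ("chat vector") over to the Korean language model; the vision tower
# and projector are left untouched. Shapes that differ (e.g. resized
# embeddings) are skipped.
with torch.no_grad():
    for name, param in llava.language_model.state_dict().items():
        if name in base_sd and param.shape == base_sd[name].shape == koen_sd[name].shape:
            param.copy_(koen_sd[name] + (param - base_sd[name]))

llava.save_pretrained("./llava-llama-3-koen-8b-chatvector")  # hypothetical output dir
```

Note this loads three 8B checkpoints at once, so it is illustrative rather than memory-efficient.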
## Model Details
### Model Description
- **Developed by:** Junbum Lee (Beomi)
- **Model type:** HuggingFace Llava 🤗
- **Language(s) (NLP):** Korean, English
- **License:** cc-by-nc-sa-4.0, subject to the Llama 3 License
- **Merged from model:** [beomi/Llama-3-KoEn-8B-preview](https://huggingface.co./beomi/Llama-3-KoEn-8B-preview) / [xtuner/llava-llama-3-8b-transformers](https://huggingface.co./xtuner/llava-llama-3-8b-transformers)
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
![Cat walking on frozen Han-River, Seoul](https://cdn-uploads.huggingface.co/production/uploads/5e56829137cb5b49818287ea/NWfoArWI4UPAxpEnolkwT.jpeg)
```python
import requests
from PIL import Image
import torch
from transformers import AutoProcessor, AutoTokenizer, LlavaForConditionalGeneration
model_id = "beomi/Llama-3-KoEn-8B-xtuner-llava-preview"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype='auto',   # load in the checkpoint's native precision
    device_map='auto',    # place weights on available devices automatically
)
processor = AutoProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# stop generation at either the default EOS or Llama-3's end-of-turn token
terminators = [
tokenizer.eos_token_id,
tokenizer.convert_tokens_to_ids("<|eot_id|>")
]
prompt = ("<|start_header_id|>user<|end_header_id|>\n\n<image>\n์ด ์ด๋ฏธ์ง์ ๋ํด์ ์ค๋ช
ํด์ฃผ์ธ์.<|eot_id|>"
"<|start_header_id|>assistant<|end_header_id|>\n\n์ด ์ด๋ฏธ์ง์๋")
image_file = "https://cdn-uploads.huggingface.co/production/uploads/5e56829137cb5b49818287ea/NWfoArWI4UPAxpEnolkwT.jpeg"
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(text=prompt, images=raw_image, return_tensors='pt').to(model.device, model.dtype)
output = model.generate(**inputs, max_new_tokens=400, do_sample=True, eos_token_id=terminators)
print(processor.decode(output[0][2:], skip_special_tokens=False))  # [2:] skips the first two prompt tokens
```

Example output:

```text
user<|end_header_id|>

<image>
이 이미지에 대해서 설명해주세요.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

이 이미지에는 고양이 한 마리가 강물 위를 걸어가는 모습이 보여집니다. 고양이는 강물의 잔물결을 미끄럼을 타고 강 가로를 지나는 데 능숙하게 보입니다. 고양이의 발은 강물로 젖어들어가, 그것을 즐기며 걸어갑니다.

또한 이 이미지는 음성 비음을 하거나 비화된 자료로 제작되었으며, 주로 고양이의 모습을 강하게 보여줍니다. 소리 효과는 여러 가지로 추가하여 고양이의 스토리를 다양하게 전달합니다. 강물의 잔물결을 나타내며 강물 위를 걷는 고양이의 모습을 더욱 강렬하게 강조하기 위해 잔물결을 통해 더 디테일한 장면을 보여줍니다.<|eot_id|>
```

Rough English translation: "This image shows a cat walking on the river. The cat appears adept at crossing the river, gliding over its ripples, and its paws sink into the water as it walks along, enjoying it. The image was also produced with voice sounds or narrated material, mainly showing the cat prominently. Various sound effects are added to convey the cat's story in different ways. The ripples of the river are depicted, and to emphasize the cat walking on the water even more strongly, a more detailed scene is shown through the ripples."
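
To watch the answer appear token by token instead of waiting for the full generation, transformers' `TextStreamer` can be attached to `generate`. A small sketch reusing `model`, `inputs`, `tokenizer`, and `terminators` from the example above:

```python
from transformers import TextStreamer

# Prints decoded text to stdout as tokens are produced; the prompt itself is skipped.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    **inputs,
    max_new_tokens=400,
    do_sample=True,
    eos_token_id=terminators,
    streamer=streamer,
)
```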