---
language:
- ko
- en
license: cc-by-nc-sa-4.0
library_name: transformers
---

# Llama-3-KoEn-8B-xtuner-llava-preview 🌋

Llama-3-KoEn-8B-xtuner-llava-preview 🌋 is a Korean multimodal model built on the LLaVA architecture, merged via the [Chat Vector](https://arxiv.org/abs/2310.04799) method from two models:

1. [beomi/Llama-3-KoEn-8B-preview](https://huggingface.co./beomi/Llama-3-KoEn-8B-preview)
2. [xtuner/llava-llama-3-8b-transformers](https://huggingface.co./xtuner/llava-llama-3-8b-transformers)

A sketch of the merge recipe appears at the end of this card.

## Model Details

### Model Description

- **Developed by:** Junbum Lee (Beomi)
- **Model type:** HuggingFace Llava 🌋
- **Language(s) (NLP):** Korean, English
- **License:** cc-by-nc-sa-4.0 under the Llama 3 License
- **Merged from models:** [beomi/Llama-3-KoEn-8B-preview](https://huggingface.co./beomi/Llama-3-KoEn-8B-preview) / [xtuner/llava-llama-3-8b-transformers](https://huggingface.co./xtuner/llava-llama-3-8b-transformers)

### Direct Use

![Cat walking on the frozen Han River, Seoul](https://cdn-uploads.huggingface.co/production/uploads/5e56829137cb5b49818287ea/NWfoArWI4UPAxpEnolkwT.jpeg)

```python
import requests
from PIL import Image

import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "beomi/Llama-3-KoEn-8B-xtuner-llava-preview"

model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype='auto',
    device_map='auto',
)
processor = AutoProcessor.from_pretrained(model_id)

# The processor already bundles the tokenizer; use it to resolve the
# Llama-3 stop tokens (both the EOS token and <|eot_id|> end a turn).
tokenizer = processor.tokenizer
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

# Llama-3 chat template with an <image> placeholder. The user turn asks,
# in Korean, "Please describe this image."; the assistant turn is primed
# with "이 이미지에는" ("This image shows ...").
prompt = ("<|start_header_id|>user<|end_header_id|>\n\n<image>\n"
          "이 이미지에 대해서 설명해주세요.<|eot_id|>"
          "<|start_header_id|>assistant<|end_header_id|>\n\n이 이미지에는")
image_file = "https://cdn-uploads.huggingface.co/production/uploads/5e56829137cb5b49818287ea/NWfoArWI4UPAxpEnolkwT.jpeg"

raw_image = Image.open(requests.get(image_file, stream=True).raw)
# Match the model's device and dtype to avoid mismatch errors.
inputs = processor(prompt, raw_image, return_tensors='pt').to(model.device, model.dtype)

output = model.generate(**inputs, max_new_tokens=400, do_sample=True, eos_token_id=terminators)
print(processor.decode(output[0][2:], skip_special_tokens=False))
```

Example output:

```
user<|end_header_id|>

이 이미지에 대해서 설명해주세요.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

이 이미지에는 고양이 한 마리가 강물 위를 걸어가는 모습이 보여집니다. 고양이는 강물의 잔물결에 미끄럼을 타고 강 가로를 지나는 데 능숙하게 보입니다. 고양이의 발은 강물로 잘 들어가, 그것을 즐기며 걸어갑니다.
또한 이 이미지도 음성 녹음을 하거나 녹화된 자료로 제작되었으며, 주로 고양이의 모습을 강하게 보여줍니다. 소리 효과도 여러 가지로 추가하여 고양이의 스토리를 다양하게 전달합니다. 강물은 잔물결을 나타내며 강물 위를 걷는 고양이의 모습을 더욱 강렬하게 강조하기 위해 잔물결을 통해 더 디테일한 장면을 보여줍니다.<|eot_id|>
```

Roughly translated, the sample says: "This image shows a cat walking over a river. The cat looks adept at gliding over the ripples as it crosses the water, its paws dipping into the river as it walks along, enjoying it." (The remainder of the sample drifts into remarks about sound effects and recordings.)
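### Merge Recipe (sketch)

The exact merge script is not published with this card. The snippet below is a minimal sketch of a Chat Vector-style merge under two assumptions that are not confirmed by the source: that `meta-llama/Meta-Llama-3-8B` serves as the subtraction base, and that parameter names and shapes line up across the three checkpoints. The output path is likewise hypothetical.

```python
# Minimal sketch of a Chat Vector-style merge; NOT the exact script used
# for this model. Assumptions (hypothetical): the subtraction base is
# meta-llama/Meta-Llama-3-8B, and parameter names/shapes match across
# checkpoints. Loading three 8B checkpoints needs roughly 50 GB of CPU RAM.
import torch
from transformers import AutoModelForCausalLM, LlavaForConditionalGeneration

llava = LlavaForConditionalGeneration.from_pretrained(
    "xtuner/llava-llama-3-8b-transformers", torch_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16)
koen = AutoModelForCausalLM.from_pretrained(
    "beomi/Llama-3-KoEn-8B-preview", torch_dtype=torch.bfloat16)

base_sd = base.state_dict()
koen_sd = koen.state_dict()

# merged = llava_lm + (koen - base): transplant the Korean
# continual-pretraining delta into the LLaVA language tower, leaving the
# vision tower and multimodal projector untouched.
with torch.no_grad():
    for name, param in llava.language_model.named_parameters():
        if name in koen_sd and koen_sd[name].shape == param.shape:
            param += koen_sd[name] - base_sd[name]
        # Parameters with mismatched shapes (e.g., embeddings resized for
        # added image tokens) are left as-is.

llava.save_pretrained("./llava-llama-3-KoEn-8b-merged")  # hypothetical path
```

In this framing, the vector `koen - base` carries the Korean continual-pretraining knowledge, and adding it onto the LLaVA language tower preserves the visual instruction tuning while importing the Korean capability, which is the core idea of the Chat Vector method.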