metadata
language:
  - ko
  - en
license: cc-by-nc-sa-4.0
library_name: transformers

Llama-3-KoEn-8B-xtuner-llava-preview 🌋

Llama-3-KoEn-8B-xtuner-llava-preview 🌋 is a Korean-based multimodal model built on the LLaVA architecture, merged with the ChatVector method from two models:

  1. beomi/Llama-3-KoEn-8B-preview,
  2. xtuner/llava-llama-3-8b-transformers
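The ChatVector idea mentioned above can be sketched as plain weight arithmetic: take the difference between a fine-tuned model and the base it started from, then add that difference to another model derived from the same base. The sketch below uses toy lists in place of real tensors. The function name `chat_vector_merge`, the `ratio` parameter, the parameter name `layer.weight`, and the roles assigned to each checkpoint are illustrative assumptions, not the author's actual merge script, and the source does not state which base checkpoint or scaling was used.

```python
def chat_vector_merge(base, chat, target, ratio=1.0):
    """Return target + ratio * (chat - base) for every shared parameter.

    base, chat and target map parameter names to flat lists of floats.
    Parameters missing from base or chat are copied from target unchanged.
    """
    merged = {}
    for name, target_w in target.items():
        if name in base and name in chat:
            merged[name] = [
                t + ratio * (c - b)
                for t, c, b in zip(target_w, chat[name], base[name])
            ]
        else:
            merged[name] = list(target_w)
    return merged


# Toy weights (assumed values, for illustration only).
base = {"layer.weight": [0.0, 1.0, 2.0]}    # shared pretrained base
chat = {"layer.weight": [0.5, 1.0, 1.5]}    # model fine-tuned from the base
target = {"layer.weight": [1.0, 1.0, 1.0]}  # model that receives the delta

merged = chat_vector_merge(base, chat, target)
print(merged)  # {'layer.weight': [1.5, 1.0, 0.5]}
```

With real checkpoints the same arithmetic would run over the matching tensors of each `state_dict`, usually skipping layers whose shapes differ, such as embeddings with different vocabulary sizes.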

Model Details

Model Description

Direct Use

Cat walking on frozen Han-River, Seoul

Two revisions are recommended:

v1. revision='a38aac3': Basic ChatVector
v2. revision='4f04d1e': Model-diff-based merging (ref. https://huggingface.co./blog/maywell/llm-feature-transfer)

import requests
from PIL import Image

import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "beomi/Llama-3-KoEn-8B-xtuner-llava-preview"

model = LlavaForConditionalGeneration.from_pretrained(
    model_id, 
    torch_dtype='auto', 
    device_map='auto',
    revision='a38aac3', # 'a38aac3' for basic ChatVector, '4f04d1e' for Model diff based merging(ref. https://huggingface.co./blog/maywell/llm-feature-transfer)
)

processor = AutoProcessor.from_pretrained(model_id)

# Reuse the tokenizer bundled with the processor to look up the stop-token IDs
tokenizer = processor.tokenizer
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

# Korean prompt: "Please describe this image."; the assistant turn is pre-filled with "In this image"
prompt = ("<|start_header_id|>user<|end_header_id|>\n\n<image>\n이 이미지에 대해서 설명해주세요.<|eot_id|>"
          "<|start_header_id|>assistant<|end_header_id|>\n\n이 이미지에는")
image_file = "https://cdn-uploads.huggingface.co/production/uploads/5e56829137cb5b49818287ea/NWfoArWI4UPAxpEnolkwT.jpeg"

raw_image = Image.open(requests.get(image_file, stream=True).raw)
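# Keyword arguments avoid depending on positional order; match the model's device and dtype
inputs = processor(text=prompt, images=raw_image, return_tensors='pt').to(model.device, model.dtype)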

output = model.generate(**inputs, max_new_tokens=400, do_sample=True, eos_token_id=terminators,)
print(processor.decode(output[0][2:], skip_special_tokens=False))

# --- Example Output [Chat Vector] ---
user<|end_header_id|>

<image>
이 이미지에 대해서 설명해주세요.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

이 이미지에는 고양이 한 마리가 강물 위를 걸어가는 모습이 보여집니다. 고양이는 강물의 잔물결에 미끄럼을 타고 강 가로를 지나는 데 능숙하게 보입니다. 고양이의 발은 강물로 잘 들어가, 그것을 즐기며 걸어갑니다.

또한 이 이미지도 음성 녹음을 하거나 녹화된 자료로 제작되었으며, 주로 고양이의 모습을 강하게 보여줍니다. 소리 효과도 여러 가지로 추가하여 고양이의 스토리를 다양하게 전달합니다. 강물은 잔물결을 나타내며 강물 위를 걷는 고양이의 모습을 더욱 강렬하게 강조하기 위해 잔물결을 통해 더 디테일한 장면을 보여줍니다.<|eot_id|>
# English gloss: This image shows a cat walking on river water. The cat appears skilled at sliding over the river's ripples and passing along the riverside. The cat's paws sink well into the water, and it walks on, enjoying it.
# This image was also produced as voice-recorded or video-recorded material and mainly gives a strong view of the cat. Various sound effects are added to convey the cat's story in different ways. The river shows ripples, and the ripples give a more detailed scene that emphasizes the cat walking on the water.

# --- Example Output [Model diff based merging] ---
user<|end_header_id|>

<image>
이 이미지에 대해서 설명해주세요.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

이 이미지에는 한국어 자막과 함께 고양이가 물에 발을 디디고 걷는 모습이 담겨 있습니다. 고양이는 오른쪽 발을 물에 담그고 걷는 중이며, 한국어 자막은 "고양이는 물을 좋아합니다"라는 문장을 포함하고 있습니다. 이 자막은 고양이가 물을 좋아하는 것을 강조하고 있습니다.<|eot_id|>
# English gloss: This image shows a cat stepping and walking in water, together with a Korean subtitle. The cat is walking with its right paw dipped in the water, and the Korean subtitle contains the sentence "Cats like water." The subtitle emphasizes that the cat likes water.
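The prompt string and the raw decoded output shown above both follow the Llama-3 header layout, so building the prompt and pulling out only the assistant's reply can be done with plain string handling. The following is a minimal sketch under that assumption. The helper names `build_prompt` and `extract_reply` are hypothetical and are not part of transformers or of this model's repository.

```python
USER_HEADER = "<|start_header_id|>user<|end_header_id|>\n\n"
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>\n\n"
EOT = "<|eot_id|>"


def build_prompt(question, answer_prefix=""):
    """Build a single-turn prompt with one <image> placeholder.

    answer_prefix pre-fills the start of the assistant turn, as the
    example in this card does.
    """
    return f"{USER_HEADER}<image>\n{question}{EOT}{ASSISTANT_HEADER}{answer_prefix}"


def extract_reply(decoded):
    """Return the text of the last assistant turn, without special tokens."""
    _, found, tail = decoded.rpartition(ASSISTANT_HEADER)
    if not found:
        # No assistant header present: return the input, stripped.
        return decoded.strip()
    return tail.split(EOT, 1)[0].strip()


prompt = build_prompt("Describe this image.", "This image shows")
# Simulate a decoded generation: the prompt followed by the model's continuation.
raw_output = prompt + " a cat.<|eot_id|>"
print(extract_reply(raw_output))  # This image shows a cat.
```

Because `extract_reply` keys on the assistant header, it also works on output whose first tokens were sliced off, as with `output[0][2:]` in the snippet above.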