Cool, but I have a quick question, please.

#1
by ctranslate2-4you - opened

When you say that it's not perfect, what do you mean exactly? I've been looking for this to be made for the longest time, so thanks first of all...

Can it use bitsandbytes and "better transformer" just like the llava 1.5-hf models can? Here is a portion of a script of mine where you can see that I give users the option to use bitsandbytes depending on what they choose within my GUI, and I'd like to do the same thing with this model...as long as the issues you've mentioned can be fixed with a little code!

import torch
from transformers import LlavaForConditionalGeneration

class loader_llava:
    def initialize_model_and_tokenizer(self, config):
        # Read the user's GUI selections from the config dict
        chosen_model = config['vision']['chosen_model']
        chosen_size = config['vision']['chosen_size']
        chosen_quant = config['vision']['chosen_quant']
        
        model_id = ""
        if chosen_model == 'llava' and chosen_size == '7b':
            model_id = "llava-hf/llava-1.5-7b-hf"
        elif chosen_model == 'bakllava':
            model_id = "llava-hf/bakLlava-v1-hf"
        elif chosen_model == 'llava' and chosen_size == '13b':
            model_id = "llava-hf/llava-1.5-13b-hf"

        print(f"Selected model: {chosen_model}")
        print(f"Selected size: {chosen_size}")
        print(f"Selected quant: {chosen_quant}")

        device = get_best_device()  # helper defined elsewhere in the script (returns e.g. "cuda", "mps", or "cpu")
        print(f"Using device: {device}")

        # Load the selected model at the chosen precision / bitsandbytes quantization
        if chosen_model == 'llava' and chosen_quant == 'float16':
            model = LlavaForConditionalGeneration.from_pretrained(
                model_id,
                torch_dtype=torch.float16,
                low_cpu_mem_usage=True,
                resume_download=True
            ).to(device)
        elif chosen_model == 'llava' and chosen_quant == '8-bit':
            model = LlavaForConditionalGeneration.from_pretrained(
                model_id,
                torch_dtype=torch.float16,
                low_cpu_mem_usage=True,
                load_in_8bit=True,
                resume_download=True
            )
        elif chosen_model == 'llava' and chosen_quant == '4-bit':
            model = LlavaForConditionalGeneration.from_pretrained(
                model_id,
                torch_dtype=torch.float32,
                low_cpu_mem_usage=True,
                load_in_4bit=True,
                resume_download=True
            )
        elif chosen_model == 'bakllava' and chosen_quant == 'float16':
            model = LlavaForConditionalGeneration.from_pretrained(
                model_id,
                torch_dtype=torch.float16,
                low_cpu_mem_usage=True,
                resume_download=True
            ).to(device)
        elif chosen_model == 'bakllava' and chosen_quant == '8-bit':
            model = LlavaForConditionalGeneration.from_pretrained(
                model_id,
                torch_dtype=torch.float16,
                low_cpu_mem_usage=True,
                load_in_8bit=True,
                resume_download=True
            )
        elif chosen_model == 'bakllava' and chosen_quant == '4-bit':
            model = LlavaForConditionalGeneration.from_pretrained(
                model_id,
                torch_dtype=torch.float32,
                low_cpu_mem_usage=True,
                load_in_4bit=True,
                resume_download=True
            )
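
For reference, the "better transformer" part of my question just refers to the optimum conversion, which with the llava-1.5-hf models looks roughly like this (assuming optimum is installed; whether this model's architecture is supported is exactly what I'm asking):

from optimum.bettertransformer import BetterTransformer

# Convert the already-loaded model in place to use fused attention kernels
model = BetterTransformer.transform(model)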

What I mean by "not perfect" comes down to three reasons.

  1. As noted in the README, this model keeps generating "\n" even after the answer is complete (a possible workaround is sketched below).
  2. During the conversion process there were many tokenizer errors. I tried to fix them, but I'm not sure the fix is perfect.
  3. The results are not as good as the original llava GitHub version (repetitions, hallucinations, etc.).
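
For issue 1, a possible workaround (just a sketch, I haven't fully tested it) is a custom stopping criterion that ends generation once several newline tokens appear in a row:

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnRepeatedNewlines(StoppingCriteria):
    def __init__(self, tokenizer, max_newlines=3):
        # encoding "\n" can include a leading space token on some tokenizers, so take the last id
        self.newline_id = tokenizer.encode("\n", add_special_tokens=False)[-1]
        self.max_newlines = max_newlines

    def __call__(self, input_ids, scores, **kwargs):
        tail = input_ids[0, -self.max_newlines:].tolist()
        return len(tail) == self.max_newlines and all(t == self.newline_id for t in tail)

stopping = StoppingCriteriaList([StopOnRepeatedNewlines(processor.tokenizer)])
# then: output = model.generate(**inputs, stopping_criteria=stopping)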

As for quantization, "load_in_4bit=True" works. If you run the README code with "load_in_4bit=True", it uses less than 24GB of VRAM:

model_id = "PerRing/llava-v1.6-34b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    low_cpu_mem_usage=True, 
    load_in_4bit=True
)
processor = AutoProcessor.from_pretrained(model_id)
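
If you want to confirm the memory usage on your own card, a minimal check after loading could look like this (the prompt template below is the llava-1.5 "USER:/ASSISTANT:" style and may need adjusting for this model; the image path is just a placeholder):

from PIL import Image

image = Image.open("example.jpg")  # placeholder path
prompt = "USER: <image>\nDescribe this image. ASSISTANT:"  # assumed prompt format

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output[0], skip_special_tokens=True))

print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GB")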

Thanks, I'll try that code snippet. Can you paste any examples of the poor output here? I'd hate to spend two hours writing a script for my larger program if it's too bad, ya know? I have no idea why the quality would be bad... the "\n" issue seems like it could be addressed, but the output just being "not as good" is harder to fix. Would you say it's currently worse than llava 1.5, even?

Also, do you plan on doing other sizes besides the 34b version?

Are you affiliated with these guys by chance? If not, maybe you could look at their code base if they're willing to share to troubleshoot...

https://huggingface.co./llava-hf
