General feedback and discussion.

You can also ask any questions you feel like here.

Can I use the model and its vision capabilities with Oobabooga as a backend?

Will there be an exl2 quant?

@Leaf45 I only use KoboldCpp, so I'm not sure how Ooba would handle this.

@Nitral-AI might know more about this?

@mjh657 I only use and make GGUF quants at the moment. Maybe you can request them from someone like @LoneStriker, who might pick them up.

@mjh657 @Lewdiculous Until we have support for projecting CLIP into the matrices with exl2, the vision aspect will not work. I can otherwise do 5bpw exl2 or GPTQ (without vision).

Is an 8bpw version possible?

@mjh657 It wouldn't have vision, but yes.

@Nitral-AI - Convince me to migrate to EXL2.

;'3

Not until vision is added (CLIP projection into the matrices); I'm using KCPP myself at the moment, lmao.

@mjh657 I will upload this weekend if that's OK?

I am a filthy EXL2 hater/doomer; I need to stay strong.

I want this on KoboldCpp.

Then we can be EXL2 haters with pride.

@Virt-io, seeing that would actually make me aroused. Amen.

I'll leave it to the brave explorers before me to, um, beta test these.

@Lewdiculous

```
slices:
  - sources:
      - model: NousResearch/Yarn-Mistral-7b-128k
        layer_range: [0, 32]
      - model: Nitral-AI/Eris_PrimeV3.05-Vision-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: NousResearch/Yarn-Mistral-7b-128k
parameters:
  t:
    - filter: self_attn
      value: [0.25, 0.25, 0.25, 0.25, 0.25]
    - filter: mlp
      value: [0.75, 0.75, 0.75, 0.75, 0.75]
    - value: 0.5
dtype: bfloat16
```

Trying it as a single 25/75 slerp for science.
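For anyone who wants to try the same thing, a config in that format is just fed to mergekit from the command line. A minimal sketch, assuming the YAML above is saved as config.yaml (the install method, output folder name, and flags are only an illustration, not the exact invocation used here):

```
# Assumes the YAML above is saved as config.yaml in the current directory
pip install git+https://github.com/arcee-ai/mergekit.git
# Write the merged model to ./slerp-test-output; --cuda runs the merge math on GPU if available
mergekit-yaml config.yaml ./slerp-test-output --cuda
```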

Alright.

@Nitral-AI You are the best! How do you make quantizations?

@mjh657 I use the cloned exllamav2 repo via git.
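For anyone curious, that route usually looks roughly like this once the repo is cloned; a sketch only, with placeholder paths and the 8.0 bpw target from the request above:

```
git clone https://github.com/turboderp/exllamav2
cd exllamav2
pip install -r requirements.txt
# Quantize a local fp16 copy of the model to ~8.0 bits per weight
python convert.py -i /path/to/model-fp16 -o ./work-dir -cf /path/to/model-8bpw-exl2 -b 8.0
```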

@Nitral-AI I'm surprised you're getting it to work; I only get errors using their convert.py.

```
Traceback (most recent call last):
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\convert.py", line 1, in <module>
    from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Tokenizer
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\model.py", line 23, in <module>
    from exllamav2.config import ExLlamaV2Config
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\config.py", line 2, in <module>
    from exllamav2.fasttensors import STFile
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\fasttensors.py", line 5, in <module>
    from exllamav2.ext import exllamav2_ext as ext_c
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\ext.py", line 153, in <module>
    exllamav2_ext = load \
                    ^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 1306, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 1710, in _jit_compile
    _write_ninja_file_and_build_library(
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 1800, in _write_ninja_file_and_build_library
    extra_ldflags = _prepare_ldflags(
                    ^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 1887, in _prepare_ldflags
    extra_ldflags.append(f'/LIBPATH:{_join_cuda_home("lib", "x64")}')
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 2407, in _join_cuda_home
    raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
```

Tried reinstalling Torch and no luck.

It can't find the NVIDIA toolkit, possibly a PATH issue? "OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root."
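If the CUDA toolkit is actually installed and only the variable is missing, setting it usually gets convert.py past that error. A sketch for Windows, assuming a v12.1 toolkit at the default install path (adjust to whatever version is on your machine):

```
:: Set CUDA_HOME for the current command prompt session
set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
:: Or persist it for future sessions
setx CUDA_HOME "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1"
```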

Nitral-AI/Nitral-AI-Eris_PrimeV3.05-Vision-7B-8bpw-exl2: did this up last night for @mjh657 without issue.

How does this Vision thing work exactly? Can I just load it normally in KoboldCpp? How do I send photos in KoboldCpp, and is it also for making pictures with something like Stable Diffusion so I don't have to write the prompts?

@WesPro

Loading the model and functionality example (click the sections to expand):
https://huggingface.co./Lewdiculous/Eris_PrimeV3-Vision-7B-GGUF-IQ-Imatrix#visionmultimodal-capabilities

> How does this Vision thing work exactly? Can I just load it normally in KoboldCpp?
> How do I send photos in KoboldCpp, and is it also for making pictures with something like Stable Diffusion so I don't have to write the prompts?

This is only for it to 'see' and caption images you send in chat. I don't use the Kobold interface, but in SillyTavern you just need to configure Image Captioning as shown in the screenshot in the README, and you can use the hamburger menu in the bottom-left corner to add an image for captioning.

image.png

Yes, you can use multimodal captioning via KCPP in ST to make SD prompts.

OK, I have SillyTavern installed, but I don't understand how to connect KoboldCpp with it. I can send pics with the KoboldCpp UI and also generate with image-to-text models now, but when I add the standard port in SillyTavern it doesn't seem to connect. In KoboldCpp the vision capability is pretty underwhelming, though. I asked "What's on the picture?" about a Google image of an apple, and it talked about my smile, the lighting on my face, etc.

This is how you connect kcpp to ST:
image.png

Did you make sure to select the LLaVA projector included in the repo when loading KCPP? Because it sounds like the model didn't caption the image properly, so the LM just stuck with the existing context in chat.
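If you start KCPP from the command line rather than the launcher, the projector is passed with --mmproj alongside the model; a sketch with placeholder filenames and context size:

```
:: Load the GGUF model together with its LLaVA projector so image captioning works
koboldcpp.exe --model Eris_PrimeV3.05-Vision-7B-Q5_K_M.gguf --mmproj mmproj-model-f16.gguf --contextsize 8192
```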

Examples:
image.png

image.png

What am I doing wrong? I loaded the LLaVA mmproj and Eris Prime Vision in KoboldCpp and tried to copy your settings, but it doesn't work.

image.png

image.png

image.png

Btw, is there a way to already see the beginning of the message while it's still being generated, like in LM Studio or Faraday? I like it better to start reading right away instead of having to wait until the complete answer is generated. I hope you know what I mean. And thanks for taking the time to explain it to me :). I always picked KoboldAI Classic and not Text Completion in the API settings; that's why it never worked.

Generate caption, not just attach file.
image.png

Text Completion presets: streaming mode, for text as it's generated.
image.png

Thanks for your help, I really appreciate it.

image.png

Time to make a whole wiki for multimodal usage, haha.

Yes, this thread is already really helpful, so maybe it's not a bad idea @Lewdiculous.
I tried to find something that would help me, but I haven't found anything, especially not about multimodal models.
I'm also interested in converting models when they're only available as safetensors or PyTorch weights and there isn't already a GGUF uploaded, and maybe even in merging models. So if there are any good tutorials out there, I would appreciate the links. I have an i5-12500H with 64GB RAM but only an RTX 3050 with 4GB VRAM, so I need GGUF files if I want to run a model, and I don't know how to run safetensors or PyTorch weights anyway, so it would be good to know how to convert.

Converting to GGUF is pretty simple... If you want to make GGUF-Imatrix quants, my script should work very well for you.

https://huggingface.co./FantasiaFoundry/GGUF-Quantization-Script

Change -ngl in line 120 to 0; this will load the model only into RAM, which is fine for imatrix data generation.

Adjust the list of quants in line 133 to the ones you need.

The manual way is downloading the latest CUDA DLL and CUDA Windows binaries from the llama.cpp releases page and using the convert.py script.

For regular, non-imatrix GGUF quants, you can follow this tutorial:

https://github.com/ggerganov/llama.cpp/discussions/2948
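Condensed, that manual route looks roughly like the following; the model path, output names, and the Q4_K_M target are only examples:

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt
# Convert the safetensors/PyTorch checkpoint to an fp16 GGUF
python convert.py /path/to/hf-model --outtype f16 --outfile model-f16.gguf
# Quantize the fp16 GGUF down to something that fits your RAM
./quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

On Windows, the quantize step uses the quantize.exe from the release zips instead of a locally built binary.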

A wiki for KCPP multimodal usage would be helpful for new users, for sure. Which reminds me, I really should do some tutorials about merging so I can just link those instead of info-dumping in these threads, lmao.

> instead of info-dumping in these threads

It happens more than one would think, haha. Look, maybe you'll even get me merging if you do. Mistral 0.2 watching silently...
