General feedback and discussion.
You can also ask any questions you feel like here.
Can I use the model and its vision capabilities using Oobabooga as a back end?
Will there be an exl2 quant?
@Leaf45 I only use KoboldCpp; I'm not sure how Ooba would handle this.
@Nitral-AI might know more about this?
@mjh657 I only use and make GGUF quants at the moment. Maybe you can request it from someone like @LoneStriker who might pick them up.
@mjh657 @Lewdiculous Until we have support for projecting CLIP into the matrices with exl2, the vision aspect will not work. I can otherwise do 5bpw exl2, or GPTQ (without vision).
Is an 8bpw version possible?
@Nitral-AI - Convince me to migrate to EXL2.
;'3
Not until vision is added (CLIP projection into the matrices); I'm using KCPP myself at the moment. lmao
I am a filthy EXL2 hater/doomer, I need to stay strong.
@Virt-io, seeing that, that would actually make me aroused. Amen.
I'll leave it to the brave explorers before me to, um, beta test these.
```
slices:
  - sources:
      - model: NousResearch/Yarn-Mistral-7b-128k
        layer_range: [0, 32]
      - model: Nitral-AI/Eris_PrimeV3.05-Vision-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: NousResearch/Yarn-Mistral-7b-128k
parameters:
  t:
    - filter: self_attn
      value: [0.25, 0.25, 0.25, 0.25, 0.25]
    - filter: mlp
      value: [0.75, 0.75, 0.75, 0.75, 0.75]
    - value: 0.5
dtype: bfloat16
```
Trying it as a single 25/75 slerp, for science.
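For anyone who wants to try the same thing, a minimal sketch of running a config like the one above with mergekit's CLI; the config filename and output folder are just placeholders, and --cuda is optional:

```
import subprocess

# Run the slerp config above with mergekit's CLI (pip install mergekit).
# "slerp-config.yaml" and "./merged-model" are placeholder names;
# --cuda only helps if you have the VRAM for it.
subprocess.run(
    ["mergekit-yaml", "slerp-config.yaml", "./merged-model", "--cuda"],
    check=True,
)
```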
Alright.
@Nitral-AI I'm surprised you're getting it to work; I only get errors using their convert.py:
```
Traceback (most recent call last):
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\convert.py", line 1, in <module>
    from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Tokenizer
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\model.py", line 23, in <module>
    from exllamav2.config import ExLlamaV2Config
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\config.py", line 2, in <module>
    from exllamav2.fasttensors import STFile
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\fasttensors.py", line 5, in <module>
    from exllamav2.ext import exllamav2_ext as ext_c
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\ext.py", line 153, in <module>
    exllamav2_ext = load \
                    ^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 1306, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 1710, in _jit_compile
    _write_ninja_file_and_build_library(
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 1800, in _write_ninja_file_and_build_library
    extra_ldflags = _prepare_ldflags(
                    ^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 1887, in _prepare_ldflags
    extra_ldflags.append(f'/LIBPATH:{_join_cuda_home("lib", "x64")}')
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 2407, in _join_cuda_home
    raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
```
Tried reinstalling Torch and no luck.
It can't find the NVIDIA toolkit, possibly a PATH issue? "OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root."
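If the toolkit is actually installed, a minimal workaround sketch is pointing CUDA_HOME at it before torch tries to JIT-compile the extension; the version folder below is just an assumption, match it to whatever CUDA Toolkit you have:

```
import os

# Point CUDA_HOME at the CUDA Toolkit root before torch's cpp_extension is imported.
# The v12.1 folder is only an example; use your installed toolkit version.
os.environ.setdefault(
    "CUDA_HOME", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1"
)

from torch.utils.cpp_extension import CUDA_HOME  # resolved when this module is imported
print("torch sees CUDA at:", CUDA_HOME)
```

Setting it persistently through the Windows environment variables dialog (or `setx CUDA_HOME ...`) works too.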
Nitral-AI/Nitral-AI-Eris_PrimeV3.05-Vision-7B-8bpw-exl2, did this up last night for @mjh657 without issue.
How does this Vision thing work exactly? Can I just load it normally in KoboldCpp? How do I send photos in KoboldCpp, and is it also for making pictures with something like Stable Diffusion so I don't have to write the prompts?
Loading the model and functionality example (click the sections to expand):
https://huggingface.co./Lewdiculous/Eris_PrimeV3-Vision-7B-GGUF-IQ-Imatrix#visionmultimodal-capabilities
> How does this Vision thing work exactly? Can I just load it normally in KoboldCpp?
>
> How do I send photos in KoboldCpp and is it also for making pictures with something like Stable Diffusion so I don't have to write the prompts?
This is only for it to 'see' and caption images you send in chat. I don't use the Kobold interface, but in SillyTavern you just need to configure Image Captioning as shown in the screenshot in the README, and you can use the hamburger menu in the bottom left corner to add an image for captioning.
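If you want to sanity-check the captioning outside of any frontend, here is a rough sketch against KoboldCpp's native API. The endpoint, port, and the `images` field are assumptions based on newer multimodal-capable builds (model loaded together with the mmproj file), so check the API docs for your version:

```
import base64
import requests

# Send a base64-encoded image alongside the prompt to a local KoboldCpp instance.
# Endpoint path, port, and the "images" field are assumptions; verify against your build.
with open("apple.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

payload = {
    "prompt": "### Instruction:\nWhat is in this picture?\n### Response:\n",
    "max_length": 200,
    "images": [img_b64],
}
r = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=300)
print(r.json()["results"][0]["text"])
```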
Ok, I have SillyTavern installed, but I don't understand how to connect KoboldCpp with it. I can send pics with the KoboldCpp UI and also generate with img2text models now, but when I add the standard port in SillyTavern it doesn't seem to connect. In KoboldCpp the vision capability is pretty underwhelming though. I asked "What's in the picture?" about a Google image of an apple, and it talked about my smile, the lighting on my face, etc.
What am I doing wrong? I loaded the LLaVA mmproj and Eris Prime Vision in KoboldCpp and tried to copy your settings, but it doesn't work.
Btw, is there a way to already see the beginning of the message while it's still being generated, like in LM Studio or Faraday? I like being able to start reading right away instead of having to wait until the complete answer is generated. I hope you know what I mean. And thanks for taking the time to explain it to me :). I always picked KoboldAI Classic and not Text Completion in the API settings; that's why it never worked.
Time to make a whole wiki for multimodal usage, haha.
Yes, this thread is already really helpful, so maybe it's not a bad idea.
@Lewdiculous
I tried to find something that would help me, but haven't found anything, especially not about multimodal models.
I'm also interested in converting models when they're only available as safetensors or PyTorch checkpoints and there isn't already a GGUF uploaded, and maybe even in merging models. So if there are any good tutorials out there, I would appreciate the links. I have an i5-12500H with 64GB RAM but only an RTX 3050 with 4GB VRAM, so I need GGUF files if I want to run a model, and I don't know how to run safetensors or PyTorch anyway, so it would be good to know how to convert.
Converting to GGUF is pretty simple... If you want to make GGUF-Imatrix quants, my script should work very well for you:
https://huggingface.co./FantasiaFoundry/GGUF-Quantization-Script
Change -ngl in line 120 to 0; this will load the model only into RAM, which is fine for imatrix data generation.
Adjust the list of quants in line 133 to the ones you need.
The manual way is downloading the latest CUDA DLLs and the CUDA Windows binaries from the llama.cpp releases page and using the convert.py script.
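If it helps, here is roughly what that manual route looks like. Script and binary names vary between llama.cpp releases (convert.py vs. convert-hf-to-gguf.py, quantize.exe vs. llama-quantize.exe), so treat this as a sketch and adjust to what your copy ships with:

```
import subprocess

# 1. Convert the HF safetensors/PyTorch model folder to a full-precision GGUF file.
#    "path/to/model" is a placeholder for the downloaded model directory.
subprocess.run(
    ["python", "convert.py", "path/to/model", "--outtype", "f16", "--outfile", "model-f16.gguf"],
    check=True,
)

# 2. Quantize the f16 GGUF down to something that fits in RAM, e.g. Q4_K_M.
subprocess.run(
    ["quantize.exe", "model-f16.gguf", "model-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```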
For regular non imatrix GGUF quants, you can follow this tutorial:
A wiki for KCPP multimodal would be helpful to the new users for sure. Which reminds me, I really should do some tutorials about merging so I can just link those instead of info dumping in these threads lmao.
> instead of info dumping in these threads
It happens more than one would think, haha. Look, maybe you'll even get me merging if you do. Mistral 0.2 watching silently...