General feedback and discussion.
You can also ask any questions you feel like here.
Can I use the model and its vision capabilities using Oobabooga as a back end?
Will there be an exl2 quant?
@Leaf45 I only use KoboldCpp; I'm not sure how Ooba would handle this.
@Nitral-AI might know more about this?
@mjh657 I only use and make GGUF quants at the moment. Maybe you can request it from someone like @LoneStriker who might pick them up.
@mjh657 @Lewdiculous Until we have support for projecting CLIP into the matrices with exl2, the vision aspect will not work. I can otherwise do 5bpw exl2, or GPTQ (without vision).
Is an 8bpw version possible?
@Nitral-AI - Convince me to migrate to EXL2.
;'3
Not until vision is added (CLIP projection into the matrices); I'm using KCPP myself at the moment. lmao
I am a filthy EXL2 hater/doomer, I need to stay strong.
@Virt-io, seeing that, that would actually make me aroused. Amen.
I'll leave it to the brave explorers before me to, um, beta test these.
```
slices:
  - sources:
      - model: NousResearch/Yarn-Mistral-7b-128k
        layer_range: [0, 32]
      - model: Nitral-AI/Eris_PrimeV3.05-Vision-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: NousResearch/Yarn-Mistral-7b-128k
parameters:
  t:
    - filter: self_attn
      value: [0.25, 0.25, 0.25, 0.25, 0.25]
    - filter: mlp
      value: [0.75, 0.75, 0.75, 0.75, 0.75]
    - value: 0.5
dtype: bfloat16
```
Trying it as a single 25/75 slerp, for science.
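For anyone who wants to try the same thing, a minimal sketch of running a config like the one above with mergekit's CLI; the config filename and output folder are just placeholders, and --cuda is optional:

```
import subprocess

# Run the slerp config above with mergekit's CLI (pip install mergekit).
# "slerp-config.yaml" and "./merged-model" are placeholder names;
# --cuda only helps if you have the VRAM for it.
subprocess.run(
    ["mergekit-yaml", "slerp-config.yaml", "./merged-model", "--cuda"],
    check=True,
)
```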
Alright.
@Nitral-AI I'm surprised you're getting it to work; I only get errors using their convert.py:
```
Traceback (most recent call last):
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\convert.py", line 1, in <module>
    from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Tokenizer
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\model.py", line 23, in <module>
    from exllamav2.config import ExLlamaV2Config
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\config.py", line 2, in <module>
    from exllamav2.fasttensors import STFile
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\fasttensors.py", line 5, in <module>
    from exllamav2.ext import exllamav2_ext as ext_c
  File "D:\sillytavern\exllamav2\exllamav2-0.0.16\exllamav2\ext.py", line 153, in <module>
    exllamav2_ext = load \
                    ^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 1306, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 1710, in _jit_compile
    _write_ninja_file_and_build_library(
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 1800, in _write_ninja_file_and_build_library
    extra_ldflags = _prepare_ldflags(
                    ^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 1887, in _prepare_ldflags
    extra_ldflags.append(f'/LIBPATH:{_join_cuda_home("lib", "x64")}')
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\cpp_extension.py", line 2407, in _join_cuda_home
    raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
```
Tried reinstalling Torch and no luck.
It can't find the NVIDIA toolkit, possibly a PATH issue? "OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root."
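If the toolkit is actually installed, a minimal workaround sketch is pointing CUDA_HOME at it before torch tries to JIT-compile the extension; the version folder below is just an assumption, match it to whatever CUDA Toolkit you have:

```
import os

# Point CUDA_HOME at the CUDA Toolkit root before torch's cpp_extension is imported.
# The v12.1 folder is only an example; use your installed toolkit version.
os.environ.setdefault(
    "CUDA_HOME", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1"
)

from torch.utils.cpp_extension import CUDA_HOME  # resolved when this module is imported
print("torch sees CUDA at:", CUDA_HOME)
```

Setting it persistently through the Windows environment variables dialog (or `setx CUDA_HOME ...`) works too.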
Nitral-AI/Nitral-AI-Eris_PrimeV3.05-Vision-7B-8bpw-exl2, did this up last night for @mjh657 without issue.
How does this Vision thing work exactly? Can I just load it normally in KoboldCpp? How do I send photos in KoboldCpp, and is it also for making pictures with something like Stable Diffusion so I don't have to write the prompts?
Loading the model and functionality example (click the sections to expand):
https://huggingface.co./Lewdiculous/Eris_PrimeV3-Vision-7B-GGUF-IQ-Imatrix#visionmultimodal-capabilities
> How does this Vision thing work exactly? Can I just load it normally in KoboldCpp?
>
> How do I send photos in KoboldCpp and is it also for making pictures with something like Stable Diffusion so I don't have to write the prompts?
This is only for it to 'see' and caption images you send in chat. I don't use the Kobold interface, but in SillyTavern you just need to configure Image Captioning as shown in the screenshot in the README, and you can use the hamburger menu in the bottom left corner to add an image for captioning.
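If you want to sanity-check the captioning outside of any frontend, here is a rough sketch against KoboldCpp's native API. The endpoint, port, and the `images` field are assumptions based on newer multimodal-capable builds (model loaded together with the mmproj file), so check the API docs for your version:

```
import base64
import requests

# Send a base64-encoded image alongside the prompt to a local KoboldCpp instance.
# Endpoint path, port, and the "images" field are assumptions; verify against your build.
with open("apple.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

payload = {
    "prompt": "### Instruction:\nWhat is in this picture?\n### Response:\n",
    "max_length": 200,
    "images": [img_b64],
}
r = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=300)
print(r.json()["results"][0]["text"])
```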
Ok, I have SillyTavern installed, but I don't understand how to connect KoboldCpp with it. I can send pics with the KoboldCpp UI and also generate with img2text models now, but when I add the standard port in SillyTavern it doesn't seem to connect. In KoboldCpp the vision capability is pretty underwhelming though. I asked "What's in the picture?" about a Google image of an apple, and it talked about my smile, the lighting on my face, etc.
What am I doing wrong? I loaded the LLaVA mmproj and Eris Prime Vision in KoboldCpp and tried to copy your settings, but it doesn't work.
Btw, is there a way to already see the beginning of the message while it's still being generated, like in LM Studio or Faraday? I like being able to start reading right away instead of having to wait until the complete answer is generated. I hope you know what I mean. And thanks for taking the time to explain it to me :). I always picked KoboldAI Classic and not Text Completion in the API settings; that's why it never worked.
Time to make a whole wiki for multimodal usage, haha.
Yes, this thread is already really helpful, so maybe it's not a bad idea.
@Lewdiculous
I tried to find something that would help me, but haven't found anything, especially not about multimodal models.
I'm also interested in converting models when they're only available as safetensors or PyTorch checkpoints and there isn't already a GGUF uploaded, and maybe even in merging models. So if there are any good tutorials out there, I would appreciate the links. I have an i5-12500H with 64GB RAM but only an RTX 3050 with 4GB VRAM, so I need GGUF files if I want to run a model, and I don't know how to run safetensors or PyTorch anyway, so it would be good to know how to convert.
Converting to GGUF is pretty simple... If you want to make GGUF-Imatrix quants, my script should work very well for you:
https://huggingface.co./FantasiaFoundry/GGUF-Quantization-Script
Change -ngl in line 120 to 0; this will load the model only into RAM, which is fine for imatrix data generation.
Adjust the list of quants in line 133 to the ones you need.
The manual way is downloading the latest CUDA DLLs and the CUDA Windows binaries from the llama.cpp releases page and using the convert.py script.
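If it helps, here is roughly what that manual route looks like. Script and binary names vary between llama.cpp releases (convert.py vs. convert-hf-to-gguf.py, quantize.exe vs. llama-quantize.exe), so treat this as a sketch and adjust to what your copy ships with:

```
import subprocess

# 1. Convert the HF safetensors/PyTorch model folder to a full-precision GGUF file.
#    "path/to/model" is a placeholder for the downloaded model directory.
subprocess.run(
    ["python", "convert.py", "path/to/model", "--outtype", "f16", "--outfile", "model-f16.gguf"],
    check=True,
)

# 2. Quantize the f16 GGUF down to something that fits in RAM, e.g. Q4_K_M.
subprocess.run(
    ["quantize.exe", "model-f16.gguf", "model-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```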
For regular non imatrix GGUF quants, you can follow this tutorial:
A wiki for KCPP multimodal would be helpful to the new users for sure. Which reminds me, I really should do some tutorials about merging so I can just link those instead of info dumping in these threads lmao.
> instead of info dumping in these threads
It happens more than one would think, haha. Look, maybe you'll even get me merging if you do. Mistral 0.2 watching silently...