Really dumb question
How the heck do I upload images in SillyTavern? This is the first time I have used a vision model and I can't find anything useful on how to do it exactly. I was able to find this and upload an image, but I don't think it's being passed to the model at all: when I check what Kobold sees, the prompt shows up with the image removed, and the model says something completely unrelated.
Seems like it should maybe be here?
I don't see it
Use Generate Caption, and make sure your caption source is set to kcpp.
Generate caption isn't inline? Is there no way to do it inline in text completion mode?
I don't handle how ST does CSS elements for chat, I make models, chief.
That isn't even what I was talking about, but sure... That just captions the image and sends the caption to the model, so a lot of information is likely lost. I wasn't talking about CSS at all.
Usually when people talk about inline elements regarding ST, the first thing I think of is how the chat displays.
The image portion of the model lives inside the LLaVA projector, and it is upcast into the matrix of the main model during inference via text completions with kcpp/lcpp (this is why you need the projector).
There are no multimodal tensors in the hathor weights uploaded here.
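For anyone who wants to check whether the image actually reaches the backend without going through ST, here is a rough sketch of building a text-completion request body with the image attached as base64. Treat the details as assumptions: the `images` field name and the overall payload shape are my understanding of how kcpp accepts images, not something confirmed in this thread, and `build_generate_payload` is just a made-up helper name. It only builds the body, it does not send anything.

```python
import base64
import json


def build_generate_payload(prompt: str, image_bytes: bytes, max_length: int = 200) -> dict:
    """Build a JSON-serializable request body with one attached image.

    Hypothetical helper: the "images" key is an assumption about the
    backend's text-completion API, so check your backend's API docs.
    """
    # Images are sent as base64 text because JSON cannot carry raw bytes.
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "prompt": prompt,
        "max_length": max_length,
        "images": [image_b64],  # assumed field name for attached images
    }


if __name__ == "__main__":
    # Placeholder bytes standing in for a real image file's contents.
    fake_image = b"\x89PNG\r\n\x1a\n"
    payload = build_generate_payload("Describe the image.", fake_image)
    # Show which keys would be sent, without dumping the whole base64 blob.
    print(sorted(payload.keys()))
    print(len(json.dumps(payload)), "bytes of JSON")
```

If the backend was started with the projector loaded and the reply is still unrelated to the picture, that would point at the model weights or the projector rather than at the frontend.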
Whatever I give up lol