Really dumb question

by nonetrix - opened Jun 14

How the heck do I upload images in SillyTavern? This is the first time I've used a vision model, and I can't find anything useful on exactly how to do it. I was able to find this and upload an image, but I don't think it's being passed to the model at all: when I check what Kobold receives, the image has been removed, and the model replies with something completely unrelated.

Seems like it should maybe be here?

I don't see it

Nitral-AI

The Chaotic Neutrals org Jun 14

Use Generate Caption, and make sure your caption source is set to kcpp (KoboldCpp).

nonetrix

Jun 14

Generate Caption isn't inline? Is there no way to send the image inline in text completion mode?

Nitral-AI

The Chaotic Neutrals org Jun 14

I don't handle how ST does CSS elements for chat; I make models, chief.

nonetrix

Jun 14

That isn't even what I was talking about, but sure... That just captions the image and sends the caption to the model, so a lot of information is likely lost. I wasn't talking about CSS at all.

Nitral-AI

The Chaotic Neutrals org Jun 14

Usually, when people talk about inline elements in ST, the first thing I think of is how the chat is displayed.

The image portion of the model lives in the LLaVA projector, which maps image features into the main model's embedding space during inference via text completions with kcpp/lcpp (KoboldCpp/llama.cpp). This is why you need the projector.

There are no multimodal tensors in the hathor weights uploaded here.
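To illustrate the point about the projector, here is a toy sketch of the general LLaVA-style idea. It is not KoboldCpp's or llama.cpp's actual code, and it assumes a 2-layer MLP projector with GELU (the LLaVA-1.5 design). Features from a vision encoder are projected into the language model's embedding dimension, and those vectors are spliced into the token embedding sequence where the image goes. The dimensions, random weights and the `IMAGE_PLACEHOLDER` id are hypothetical. If the projector file isn't loaded, the main weights alone have nothing that performs this step.

```python
import math
import random

# Hypothetical token id marking where the image belongs in the prompt.
IMAGE_PLACEHOLDER = -1


def gelu(x: float) -> float:
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def linear(vec, weights, bias):
    # weights holds one row per output dimension
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def random_layer(in_dim, out_dim, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


class Projector:
    """Toy 2-layer MLP mapping vision-encoder features into the LLM embedding space."""

    def __init__(self, vision_dim, text_dim, seed=0):
        rng = random.Random(seed)
        self.l1 = random_layer(vision_dim, text_dim, rng)
        self.l2 = random_layer(text_dim, text_dim, rng)

    def __call__(self, patch_feature):
        hidden = [gelu(x) for x in linear(patch_feature, *self.l1)]
        return linear(hidden, *self.l2)


def build_input_embeddings(token_ids, embed_table, image_patches, projector):
    """Replace the placeholder token with one projected embedding per image patch."""
    sequence = []
    for tok in token_ids:
        if tok == IMAGE_PLACEHOLDER:
            sequence.extend(projector(p) for p in image_patches)
        else:
            sequence.append(embed_table[tok])
    return sequence


if __name__ == "__main__":
    rng = random.Random(1)
    vision_dim, text_dim, vocab = 8, 4, 10
    embed_table = [[rng.uniform(-1, 1) for _ in range(text_dim)] for _ in range(vocab)]
    patches = [[rng.uniform(-1, 1) for _ in range(vision_dim)] for _ in range(3)]
    tokens = [2, IMAGE_PLACEHOLDER, 5, 7]
    seq = build_input_embeddings(tokens, embed_table, patches, Projector(vision_dim, text_dim))
    print(len(seq), len(seq[0]))  # 6 4
```

In this example, three text tokens plus three image patches give six embeddings, each with the text model's width. A caption-only route instead turns the image into ordinary text tokens, which is why detail can be lost.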

nonetrix

Jun 14

Whatever, I give up lol

nonetrix changed discussion status to closed Jun 14