llasa gguf?

by supercharge19 - opened 23 days ago

Discussion

supercharge19

23 days ago

How do I use Llasa?

concedo

Koboldcpp org 21 days ago

wut

supercharge19

20 days ago

@concedo the GGUF model is: NikolayKozloff/Llasa-1B-Q8_0-GGUF
and this is the full model: https://huggingface.co./HKUSTAudio/Llasa-1B

The problem with the full model is that it requires Python 3.9 and other dependencies, which consume too much RAM/VRAM and disk space. Also, I just could not run it on my current GPUs (an old AMD RX 480 or an old Nvidia GTX 1050). So I thought of using the GGUF instead; however, loading the GGUF directly did not work with llama.cpp. koboldcpp does have an option to use text-to-speech models, but that requires specific models. I thought perhaps this text-to-speech (or voice cloning) model could be used with koboldcpp, but I could not find a way to do that. Could you please extend support to this model?

concedo

Koboldcpp org 20 days ago

xcodec2 decoding is currently not supported; someone would need to implement the decoder. Currently, only WavTokenizer (used by OuteTTS) is supported. You can try that here: https://huggingface.co./koboldcpp/tts/tree/main
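
For illustration, a minimal sketch of a launch command for the OuteTTS route. The file names are hypothetical placeholders for whatever you download from the link above, and the `--ttsmodel` and `--ttswavtokenizer` flag names are an assumption based on koboldcpp's TTS options, so check `--help` on your build before relying on them.

```python
import shlex


def build_tts_command(tts_model, wavtokenizer, launcher=("python", "koboldcpp.py")):
    """Build an argv list that starts koboldcpp with a TTS model and its decoder.

    The flag names are assumed, not confirmed in this thread.
    """
    return [*launcher, "--ttsmodel", tts_model, "--ttswavtokenizer", wavtokenizer]


# Hypothetical file names; substitute the files you actually downloaded.
tts_cmd = build_tts_command("OuteTTS-model.gguf", "WavTokenizer-decoder.gguf")
print(shlex.join(tts_cmd))
```

Building the argument list first and printing it lets you inspect the command before running it, for example with `subprocess.run(tts_cmd)`.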

supercharge19

20 days ago

Thank you for the response. How about MiniCPM?
gguf: openbmb/MiniCPM-o-2_6-gguf
full model: https://huggingface.co./openbmb/MiniCPM-o-2_6

MiniCPM is supported by llama.cpp. How can it be used?

concedo

Koboldcpp org 20 days ago

edited 20 days ago

Yes, MiniCPM can be used. Grab the MiniCPM GGUF model here: https://huggingface.co./openbmb/MiniCPM-V-2_6-gguf/tree/main and load it as the main model. Then grab the vision projector (.mmproj) for MiniCPM here: https://huggingface.co./koboldcpp/mmproj/tree/main and load it with --mmproj or by selecting the Vision MMPROJ under the Model Files tab.
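
As a rough sketch of the command-line route, the snippet below assembles a launch command that passes the main model and the vision projector. The file names are hypothetical placeholders for the files you download, and the `python koboldcpp.py` launcher is an assumption (a prebuilt binary would replace it); `--mmproj` is the flag mentioned above.

```python
import shlex


def build_vision_command(model_path, mmproj_path, launcher=("python", "koboldcpp.py")):
    """Build an argv list that loads a main GGUF model plus its vision projector."""
    return [*launcher, "--model", model_path, "--mmproj", mmproj_path]


# Hypothetical file names; substitute the files you actually downloaded.
vision_cmd = build_vision_command("MiniCPM-V-2_6-Q4_K_M.gguf", "minicpm-v-2_6-mmproj.gguf")
print(shlex.join(vision_cmd))
```

The printed line can be pasted into a terminal, or the list can be passed to `subprocess.run(vision_cmd)` from the koboldcpp directory.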

supercharge19

19 days ago

Thank you for the quick response. It's also good to know that you are a hardworking person (or people, if there are more of you), working on a Sunday. Anyway, the model above is a vision model. Is it possible to use an omni model (all modalities, such as audio, text, and vision, simultaneously)? For example, could I ask the model to read the content of an image aloud in various voices? Say the image contains pictures of two characters, one male and one female: when it reads the female character's utterance, a female voice is synthesized, and likewise a male voice for the male character. For this, the https://huggingface.co./openbmb/MiniCPM-o-2_6-gguf model would be used, not just the vision model. If it is possible, could you verify and explain how?

supercharge19

19 days ago

Thank you for the link: https://huggingface.co./koboldcpp/mmproj/tree/main

concedo

Koboldcpp org 19 days ago

Yes. Each modality requires different model files. You can get a quick start at https://github.com/LostRuins/koboldcpp/wiki#getting-an-ai-model-file
