llasa gguf?

#1
by supercharge19 - opened

how to use llasa?

Koboldcpp org

wut

@concedo The GGUF model is: NikolayKozloff/Llasa-1B-Q8_0-GGUF
The full model is: https://huggingface.co./HKUSTAudio/Llasa-1B

The problem with the full model is that it requires Python 3.9 and other dependencies, which consume too much RAM/VRAM and disk space. Also, I just could not run it on my current GPUs (an old AMD RX 480 and an old Nvidia GTX 1050). So I thought of using the GGUF; however, loading the GGUF directly did not work with llama.cpp. KoboldCpp has an option to use text-to-speech models, but that requires specific models. I thought perhaps this text-to-speech (or voice cloning) model could be used with KoboldCpp, but I could not find a way to do that. Could you please extend support to this model?

Koboldcpp org

xcodec2 decoding is currently not supported; someone would need to implement the decoder. Currently, only WavTokenizer (used by OuteTTS) is supported. You can try that here: https://huggingface.co./koboldcpp/tts/tree/main

Thank you for the response. How about MiniCPM?
gguf: openbmb/MiniCPM-o-2_6-gguf
full model: https://huggingface.co./openbmb/MiniCPM-o-2_6

MiniCPM is supported by llama.cpp. How can it be used?

Yes, MiniCPM can be used. Grab the MiniCPM GGUF model here: https://huggingface.co./openbmb/MiniCPM-V-2_6-gguf/tree/main and load that as the main model. Then grab the MiniCPM vision projector (.mmproj) here: https://huggingface.co./koboldcpp/mmproj/tree/main and load it with --mmproj or by selecting it as the Vision MMPROJ under the Model Files tab.
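
For anyone scripting the launch, here is a minimal Python sketch of the command line described above. It only assembles the arguments. The `--mmproj` flag comes from the reply; `--model`, the `koboldcpp.py` launcher name, and both file names are assumptions for illustration, so substitute the binary and the files you actually downloaded.

```python
import shlex


def build_koboldcpp_command(model_path, mmproj_path, launcher="koboldcpp.py"):
    """Assemble an argument list that loads a main model plus a vision projector.

    `launcher` is a hypothetical default; a prebuilt KoboldCpp executable
    would be invoked directly rather than through `python`.
    """
    return [
        "python", launcher,
        "--model", model_path,    # main GGUF model (assumed flag name)
        "--mmproj", mmproj_path,  # vision projector, as named in the reply
    ]


if __name__ == "__main__":
    # Hypothetical file names; use the ones from the repositories linked above.
    cmd = build_koboldcpp_command("minicpm-v-2_6.gguf", "minicpm-v-2_6-mmproj.gguf")
    print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd)` as-is, which avoids shell quoting problems with paths that contain spaces.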

Thank you for the quick response, and it is good to know that you are a hardworking person (or people, if there are several of you) for working on a Sunday. Anyway, the model above is a vision model. Is it possible to use an omni model (all modalities, such as audio, text, and vision, simultaneously)? For example, ask the model to read out the content of an image in various voices: the image contains two characters, one male and one female, and when the model reads the female character's utterance a female voice is synthesized, and likewise a male voice for the male character. For this, the https://huggingface.co./openbmb/MiniCPM-o-2_6-gguf model would be used, not just the vision model. If it is possible, can you verify and guide me on how?

Koboldcpp org

Yes. Each modality requires different model files; you can get a quick start at https://github.com/LostRuins/koboldcpp/wiki#getting-an-ai-model-file
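
To illustrate the "one set of files per modality" point, here is a rough Python sketch that maps each modality to a launch flag and builds the argument list from whichever files you have. The flag names are written from memory of KoboldCpp's command-line help and should be treated as assumptions (check them against `--help` for your version); the modality keys and the `build_args` helper are hypothetical names made up for this example.

```python
# Assumed mapping from modality to KoboldCpp flag; verify against --help.
MODALITY_FLAGS = {
    "text": "--model",                # main GGUF language model
    "vision": "--mmproj",             # vision projector
    "tts": "--ttsmodel",              # text-to-speech model (e.g. OuteTTS)
    "tts_codec": "--ttswavtokenizer", # WavTokenizer audio decoder
    "speech_to_text": "--whispermodel",
}


def build_args(files):
    """Turn {modality: file path} into a flat list of flag/path pairs."""
    args = []
    for modality, path in files.items():
        if modality not in MODALITY_FLAGS:
            raise ValueError(f"no known flag for modality: {modality}")
        args += [MODALITY_FLAGS[modality], path]
    return args


if __name__ == "__main__":
    # Hypothetical file names for illustration only.
    print(build_args({"text": "main-model.gguf", "vision": "projector.gguf"}))
```

Keeping the mapping in one dictionary makes it easy to see which file is still missing for a given modality before launching.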
