Apply for community grant: Personal project (gpu)
CataLlama is a fine-tune of Llama-3 8B on the Catalan language.
CataLlama was trained on roughly 445 million new tokens in three separate stages:
- Language enhancement with raw text.
- Supervised fine-tuning on instructions consisting of 70% Catalan and 30% English.
- DPO fine-tuning on preferences consisting of 70% Catalan and 30% English.
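The DPO stage above optimizes a preference objective over chosen/rejected answer pairs. A minimal sketch of the per-pair DPO loss, assuming the log-probabilities of each answer under the policy and a frozen reference model are already available (function and variable names here are illustrative, not taken from the actual training code):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin))."""
    # How much more (or less) the policy likes each answer than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # Numerically plain sigmoid + negative log-likelihood of preferring "chosen".
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy already prefers the chosen answer more strongly than the
# reference does, the loss drops below -log(0.5) ~= 0.693.
loss = dpo_loss(-10.0, -20.0, -12.0, -18.0)  # ~0.513
```

In practice this objective is what libraries such as TRL's `DPOTrainer` compute batch-wise over tokenized completions; the sketch only shows the scalar math for one preference pair.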
This is my first open-source model.
It is not intended to beat benchmarks, but to demonstrate techniques for augmenting LLMs with new languages and for preserving rare languages as part of our world heritage.
I would love for people to be able to try it in the chat, but as a personal project it is too large a budget commitment to pay for inference, especially since the GPUs for training were already quite expensive.
I appreciate your time and your decision, whatever you see fit.
Thank you,
Laurentiu
Hi @laurentiubp , we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
thanks @hysts that's awesome!