A llama2.c model based on Karpathy's llama2.c project: https://github.com/karpathy/llama2.c

Vocab size of 4096, trained on TinyStories and my custom littlestories dataset (currently unreleased).

This version was further fine-tuned to follow instructions... somewhat... using https://github.com/mlabonne/llm-course/blob/main/Fine_tune_Llama_2_in_Google_Colab.ipynb

The model uses ↨ as a shift key instead of capital letters. This simplifies the tokenizer by avoiding duplicate tokens for uppercase variants.

To convert normal text to the right format I use:

def add_caseifer(text):
    # Prefix each uppercase letter with the shift token ↨ and lowercase it
    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])
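For example, "Once upon a Time" becomes "↨once upon a ↨time" (a quick self-contained sketch; the sample sentence is just an illustration):

```python
def add_caseifer(text):
    # Prefix each uppercase letter with the shift token ↨ and lowercase it
    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])

print(add_caseifer("Once upon a Time"))  # ↨once upon a ↨time
```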

To return the text to human format I use:

def remove_caseifer(text):
    new_text = ""
    i = 0
    while i < len(text):
        if text[i] == "↨":
            if i + 1 < len(text):
                new_text += text[i + 1].upper()
                i += 1  # skip the character we just consumed
            # else: a trailing ↨ with nothing after it is dropped
        else:
            new_text += text[i]
        i += 1
    return new_text
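The decoder inverts the shift-token encoding, so the shifted form above round-trips back to normal text (a minimal sketch; the sample string is illustrative):

```python
def remove_caseifer(text):
    # Replace each ↨-prefixed character with its uppercase form
    new_text = ""
    i = 0
    while i < len(text):
        if text[i] == "↨":
            if i + 1 < len(text):
                new_text += text[i + 1].upper()
                i += 1  # skip the character we just consumed
            # a trailing ↨ with nothing after it is dropped
        else:
            new_text += text[i]
        i += 1
    return new_text

print(remove_caseifer("↨once upon a ↨time"))  # Once upon a Time
```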
