A llama2.c model based on Karpathy's llama2.c project: https://github.com/karpathy/llama2.c
Vocabulary size of 4096. Trained on TinyStories and my custom littlestories dataset (currently unreleased).
The model uses ↨ as a shift key instead of capital letters. This simplifies the tokenizer, because it no longer needs separate uppercase duplicates of tokens.
To convert normal text to this format, I use:
```python
def add_caseifer(text):
    # Using list comprehension for more efficient concatenation
    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])
```
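As a quick illustration (a sketch, not part of the original training pipeline), the snippet below applies `add_caseifer` to a sample sentence and checks that the 52 ASCII letters collapse to 26 lowercase letters plus the `↨` marker. That collapse is the duplication the shift key avoids. The function is repeated so the snippet runs on its own.

```python
import string

def add_caseifer(text):
    # Same transformation as above: each uppercase char becomes '↨' + its lowercase form.
    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])

print(add_caseifer("Once upon a time, Lily met Tom."))
# ↨once upon a time, ↨lily met ↨tom.

# 52 distinct ASCII letters before; 26 lowercase letters plus '↨' after.
before = set(string.ascii_letters)
after = set(add_caseifer(string.ascii_letters))
print(len(before), len(after))  # 52 27
```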
To convert the text back to normal capitalization, I use:
```python
def remove_caseifer(text):
    new_text = ""
    i = 0
    while i < len(text):
        if text[i] == "↨":
            if i+1 < len(text):
                new_text += text[i+1].upper()
                i += 1
            else:
                pass  # skip this index
        else:
            new_text += text[i]
        i += 1
    return new_text
```
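For a more compact decoder, here is a regex-based variant. The name `remove_caseifer_re` is hypothetical and not from the original. It is meant to behave like `remove_caseifer`: each `↨` uppercases the character after it, and a trailing `↨` is dropped. Neither version can round-trip text that already contains a literal `↨`. For example, `add_caseifer("a↨b")` returns `"a↨b"`, which decodes to `"aB"`.

```python
import re

# Matches '↨' followed by any single character (DOTALL lets '.' match newlines),
# or a lone '↨' at the very end of the string.
_SHIFT = re.compile(r'↨(.)|↨\Z', re.DOTALL)

def remove_caseifer_re(text):
    # group(1) is None for a trailing '↨', so that marker is replaced with ''.
    return _SHIFT.sub(lambda m: (m.group(1) or '').upper(), text)

print(remove_caseifer_re("↨once upon a time, ↨lily met ↨tom."))
# Once upon a time, Lily met Tom.
```

Using a single `re.sub` pass avoids building the result with repeated string concatenation.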