metadata
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
language:
- en
- am
- ar
- as
- az
- be
- bg
- bn
- br
- bs
- ca
- cs
- cy
- da
- de
- el
- eo
- es
- et
- eu
- fa
- ff
- fi
- fr
- fy
- ga
- gd
- gl
- gn
- gu
- ha
- he
- hi
- hr
- ht
- hu
- hy
- id
- ig
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lg
- li
- ln
- lo
- lt
- lv
- mg
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- 'no'
- ns
- om
- or
- pa
- pl
- ps
- pt
- qu
- rm
- ro
- ru
- sa
- si
- sc
- sd
- sk
- sl
- so
- sq
- sr
- ss
- su
- sv
- sw
- ta
- te
- th
- tl
- tn
- tr
- ug
- uk
- ur
- uz
- vi
- wo
- xh
- yi
- yo
- zu
datasets:
- Replete-AI/Everything_Instruct_Multilingual
- HuggingFaceH4/ultrachat_200k
- HuggingFaceH4/no_robots
- datatab/ultrachat_200k_serbian
- datatab/ultrafeedback_binarized_serbian
- datatab/alpaca-cleaned-serbian-full
- datatab/orca_math_world_problem_200k_serbian
- datatab/open-orca-slim-serbian
tags:
- litgpt
- litdata
tangled-llama-33m-32k-instruct-v0.1
A pretrained language model based on the Llama model with about 33M parameters. This model has been trained on 4.2B (4,252,334,823) tokens from more than 6.2M (6,271,145) dataset rows.
This model isn't designed for immediate use but rather for Continued Pretraining and Finetuning on a downstream task. It can handle a context length of up to 32K (32,768) tokens, and it was pretrained with sequences of that same length.
The objective is to streamline the cognitive or reasoning core, eliminating any redundant knowledge from the model.
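To put the figures above in proportion, here is a minimal Python sketch that derives the average number of tokens per dataset row and how many 32,768-token sequences the training tokens would fill. The helper names `tokens_per_row` and `packed_sequences` are hypothetical, and the packing of rows back to back into fixed-length sequences is an assumption made for illustration, not a detail confirmed by this card.

```python
# Figures quoted in this model card.
TOTAL_TOKENS = 4_252_334_823
DATASET_ROWS = 6_271_145
SEQUENCE_LENGTH = 32_768  # 32K context length


def tokens_per_row(total_tokens: int, rows: int) -> float:
    """Average number of tokens contributed by one dataset row."""
    return total_tokens / rows


def packed_sequences(total_tokens: int, sequence_length: int) -> tuple[int, int]:
    """Return (full sequences, leftover tokens).

    Assumes rows are packed back to back into fixed-length sequences,
    which is an illustrative assumption, not a documented training detail.
    """
    return divmod(total_tokens, sequence_length)


if __name__ == "__main__":
    avg = tokens_per_row(TOTAL_TOKENS, DATASET_ROWS)
    full, leftover = packed_sequences(TOTAL_TOKENS, SEQUENCE_LENGTH)
    print(f"average tokens per row: {avg:.1f}")
    print(f"full {SEQUENCE_LENGTH}-token sequences: {full:,} (leftover: {leftover:,})")
```

Under that assumption, most rows are far shorter than a single training sequence, so one sequence would typically span many rows.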
lm-evaluation-harness
litgpt evaluate --tasks 'leaderboard' --out_dir 'evaluate-0/' --batch_size 4 --dtype 'bfloat16' out/contrain/final/
litgpt evaluate --tasks 'hellaswag,gsm8k,truthfulqa_mc2,mmlu,winogrande,arc_challenge' --out_dir 'evaluate-1/' --batch_size 4 --dtype 'bfloat16' out/contrain/final/