tangled-alpha-0.3-core
Prepare the core datasets:
time python -B prepare_core_datasets.py
i=0, min_len=0, max_len=1048576, block_size=2049, chunk_size=16392000, len(dataset)=3134311, len(dataset) * block_size=6422203239
Total number of tokens in the optimized dataset '../core-data-0-0-1048576-2049-8000' is 6422203239
i=1, min_len=2049, max_len=8193, block_size=8193, chunk_size=16386000, len(dataset)=179944, len(dataset) * block_size=1474281192
Total number of tokens in the optimized dataset '../core-data-1-2049-8193-8193-2000' is 1474281192
i=2, min_len=8193, max_len=1048577, block_size=32769, chunk_size=16384500, len(dataset)=48261, len(dataset) * block_size=1581464709
Total number of tokens in the optimized dataset '../core-data-2-8193-1048577-32769-500' is 1581464709
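The prepare script itself is not reproduced here; the sketch below shows how length-bucketed, litdata-optimized token datasets like these are typically produced. The bucket tuples come straight from the log above (note that each chunk_size is block_size times the trailing count in the directory name, e.g. 2049 * 8000 = 16392000). The tokenize_fn helper, the tokenizer path, and the input file list are assumptions, not the original script.

# Hypothetical sketch of prepare_core_datasets.py: bucket documents by
# token length, then write each bucket as a litdata-optimized dataset.
from functools import partial

import numpy as np
from litdata import optimize, TokensLoader
from litgpt.tokenizer import Tokenizer

def tokenize_fn(path, tokenizer=None, min_len=0, max_len=2**20):
    # Tokenize one document; keep it only if its length falls in the bucket.
    text = open(path, encoding="utf-8").read()
    tokens = tokenizer.encode(text, eos=True)
    if min_len <= len(tokens) < max_len:
        yield tokens.numpy().astype(np.uint16)  # assumes the vocab fits in uint16

# (min_len, max_len, block_size, n) per bucket, matching the log above.
buckets = [
    (0, 1048576, 2049, 8000),
    (2049, 8193, 8193, 2000),
    (8193, 1048577, 32769, 500),
]

input_files = [...]  # the source corpus; elided

for i, (min_len, max_len, block_size, n) in enumerate(buckets):
    optimize(
        fn=partial(tokenize_fn, tokenizer=Tokenizer("../tokenizer"),
                   min_len=min_len, max_len=max_len),
        inputs=input_files,
        output_dir=f"../core-data-{i}-{min_len}-{max_len}-{block_size}-{n}",
        chunk_size=block_size * n,  # e.g. 2049 * 8000 = 16392000 tokens per chunk
        item_loader=TokensLoader(),
    )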
Pretrain the core model:
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model-0.yaml
# ...
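pretrain-core-model-0.yaml is not reproduced here. During pretraining, litgpt streams fixed-size blocks from the optimized directories; the snippet below is an assumed sanity check (not part of the litgpt run) that a bucket streams blocks of the expected shape, using the first bucket's path and block_size:

from litdata import StreamingDataset, TokensLoader

# Stream the first bucket as contiguous 2049-token blocks.
dataset = StreamingDataset(
    "../core-data-0-0-1048576-2049-8000",
    item_loader=TokensLoader(block_size=2049),
)

print(len(dataset))      # 3134311 blocks, as in the log above
print(dataset[0].shape)  # torch.Size([2049])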
Back up wandb:
mv wandb wandb-pretrain-core
Chat with the model:
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core-0/final
Evaluate the model:
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core-0/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core-0/final'
# ...
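To inspect the scores afterwards, the harness summary can be read back. This assumes litgpt evaluate writes its lm-evaluation-harness output to results.json under --out_dir; adjust the path if your litgpt version names the file differently.

import json

with open("../evaluate/pretrain-core-0/leaderboard/results.json") as f:
    results = json.load(f)

# Print each leaderboard task with its reported metrics.
for task, metrics in results["results"].items():
    print(task, metrics)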