---
library_name: transformers
license: mit
base_model:
  - meta-llama/Meta-Llama-3-8B
---

# Model Details

A meta-llama/Meta-Llama-3-8B model fine-tuned on 100,000 CLRS-Text examples.
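
A minimal inference sketch using transformers. The repo id is a placeholder (the card does not state it); substitute this model's actual Hugging Face id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-repo-id>"  # placeholder: replace with this model's repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "..."  # a CLRS-Text question
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```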

# Training Details

- Learning rate: 1e-4 with 150 warmup steps, then cosine-decayed to 5e-6, using the AdamW optimiser (see the schedule sketch after this list)
- Batch size: 128
- Loss computed over the answer tokens only, not the question (see the masking sketch below)
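
A minimal sketch of the described schedule in plain PyTorch: linear warmup for 150 steps, then cosine decay from 1e-4 down to a floor of 5e-6 with AdamW. The total step count is an assumption; the card does not state it:

```python
import math
import torch

peak_lr, min_lr = 1e-4, 5e-6
warmup_steps = 150
total_steps = 2000  # placeholder: total step count is not stated on the card

model = torch.nn.Linear(8, 8)  # stand-in for the model being fine-tuned
optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr)

def lr_lambda(step):
    # Linear warmup from 0 to peak_lr over the first warmup_steps steps.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
    return (min_lr + (peak_lr - min_lr) * cosine) / peak_lr

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```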
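
The exact preprocessing is not shown on the card; one common way to implement answer-only loss masking is to set the question tokens' labels to -100, which cross-entropy ignores. A sketch (the helper name and EOS handling are assumptions):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

def build_example(question, answer):
    # Tokenise question and answer separately so the answer boundary is known.
    q_ids = tokenizer(question, add_special_tokens=False)["input_ids"]
    a_ids = tokenizer(answer + tokenizer.eos_token, add_special_tokens=False)["input_ids"]
    input_ids = q_ids + a_ids
    # -100 is ignored by the loss, so only answer tokens contribute.
    labels = [-100] * len(q_ids) + a_ids
    return {"input_ids": input_ids, "labels": labels}
```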