
vllm (pretrained=/root/autodl-tmp/output,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

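| Tasks | Version | Filter           | n-shot | Metric      | Value | Stderr   |
|-------|--------:|------------------|-------:|-------------|------:|---------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | 0.556 | ± 0.0315 |
|       |         | strict-match     |      5 | exact_match | 0.832 | ± 0.0237 |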

vllm (pretrained=/root/autodl-tmp/Replete-LLM-V2.5-Qwen-14b,add_bos_token=true,tensor_parallel_size=2,max_model_len=2048), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

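| Tasks | Version | Filter           | n-shot | Metric      | Value | Stderr   |
|-------|--------:|------------------|-------:|-------------|------:|---------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | 0.528 | ± 0.0316 |
|       |         | strict-match     |      5 | exact_match | 0.844 | ± 0.0230 |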
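
To help read the two runs side by side, here is a minimal sketch that recomputes each standard error from the reported accuracy and the 250-sample limit, then compares the runs with a simple z-score. The helper names `binomial_stderr` and `z_score` are hypothetical, and the comparison treats the two runs as independent samples. That is an assumption: both runs most likely scored the same 250 questions, so a paired test on per-question results would be more appropriate if those were available.

```python
import math

N = 250  # from "limit: 250.0" in both run headers

# Reported exact_match values, copied from the two tables.
# Keys are the last path component of each `pretrained=` argument.
RESULTS = {
    "output": {"flexible-extract": 0.556, "strict-match": 0.832},
    "Replete-LLM-V2.5-Qwen-14b": {"flexible-extract": 0.528, "strict-match": 0.844},
}


def binomial_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the (n - 1) sample variance."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


def z_score(a: float, se_a: float, b: float, se_b: float) -> float:
    """z-score of the difference a - b, ASSUMING the two estimates are independent."""
    return (a - b) / math.sqrt(se_a**2 + se_b**2)


if __name__ == "__main__":
    for filt in ("flexible-extract", "strict-match"):
        a = RESULTS["output"][filt]
        b = RESULTS["Replete-LLM-V2.5-Qwen-14b"][filt]
        se_a, se_b = binomial_stderr(a, N), binomial_stderr(b, N)
        z = z_score(a, se_a, b, se_b)
        print(f"{filt}: diff={a - b:+.3f}, se=({se_a:.4f}, {se_b:.4f}), z={z:+.2f}")
```

Under these assumptions the recomputed standard errors agree with the reported ones to four decimal places, and both differences come out at less than one combined standard error, so 250 samples are not enough to separate the two models on either filter.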