Evaluation Details of QwQ-32B-Preview on LiveCodeBench Benchmark
#43 opened by survivi
Hi, we saw the LiveCodeBench result (50.0) reported in the QwQ-32B-Preview blog post, but we have not been able to reproduce it. QwQ-32B-Preview seems to have trouble following the instructions for outputting code, both in whether it outputs the code at all and in how the code is formatted. We did not encounter these problems when testing other models.
Could you share more details about how QwQ-32B-Preview was evaluated on LiveCodeBench? Specifically, could you provide the prompt given to the model during evaluation, i.e. the one used to standardize the output code and its formatting requirements?
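To make the format issue concrete, here is a minimal sketch of the kind of extraction step an evaluation script might apply to a model response. It is an illustration under our own assumptions, not the LiveCodeBench harness and not necessarily what the Qwen team used: the function name `extract_last_code_block` is hypothetical, and taking the last fenced block is just one possible convention.

```python
import re
from typing import Optional

# Build the fence marker programmatically so this snippet can itself sit
# inside a fenced block without confusing a markdown renderer.
FENCE = "`" * 3

# Opening fence, optional language tag, newline, body (non-greedy), closing fence.
FENCE_RE = re.compile(
    FENCE + r"[ \t]*([\w+-]*)[ \t]*\n(.*?)" + FENCE,
    re.DOTALL,
)


def extract_last_code_block(response: str) -> Optional[str]:
    """Return the body of the last fenced code block, or None if there is none.

    Hypothetical helper: a model that reasons at length may emit several
    draft blocks, so this sketch assumes the final block is the answer.
    """
    blocks = FENCE_RE.findall(response)
    if not blocks:
        return None  # the model never produced a fenced code block
    _lang, body = blocks[-1]
    return body.strip()
```

With an extractor like this, a response that never closes its fence, or that gives the final code as plain text, yields `None` and is scored as a failure regardless of whether the reasoning was correct. That is why knowing the exact prompt and extraction rule matters for reproducing the reported number.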