feihu.hf committed · Commit a4e8fc8 · 1 parent: da58b03

update README
Files changed (1):
  1. README.md +4 -2
README.md CHANGED
@@ -91,11 +91,13 @@ To achieve optimal performance, we recommend the following settings:
 - Use Temperature=0.6 and TopP=0.95 instead of greedy decoding to avoid endless repetitions.
 - Use TopK between 20 and 40 to filter out rare token occurrences while maintaining the diversity of the generated output.
 
-3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
+3. **No Thinking Content in History**: In multi-turn conversations, the historical model output should include only the final output part, not the thinking content. This is already implemented in `apply_chat_template`.
+
+4. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
    - **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
    - **Multiple-Choice Questions**: Add the following instruction to the prompt to standardize responses: "Please show your choice in the `answer` field with only the choice letter, e.g., `\"answer\": \"C\"`."
 
-4. **Handle Long Inputs**: For inputs exceeding 32,768 tokens, enable [YaRN](https://arxiv.org/abs/2309.00071) to improve the model's ability to capture long-sequence information effectively. Currently, only vLLM supports YaRN for length extrapolation. If you want to process sequences up to 131,072 tokens, please refer to the non-GGUF models.
+5. **Handle Long Inputs**: For inputs exceeding 32,768 tokens, enable [YaRN](https://arxiv.org/abs/2309.00071) to improve the model's ability to capture long-sequence information effectively. Currently, only vLLM supports YaRN for length extrapolation. If you want to process sequences up to 131,072 tokens, please refer to the non-GGUF models.
 
 ## Evaluation & Performance
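For context, a minimal sketch of the recommended sampling settings applied through llama-cpp-python (the model filename and prompt are placeholders, not files from this repo):

```python
from llama_cpp import Llama

# Placeholder path: substitute the GGUF file you downloaded from this repo.
llm = Llama(model_path="qwen-model.gguf", n_ctx=32768)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Briefly explain YaRN."}],
    temperature=0.6,  # recommended instead of greedy decoding
    top_p=0.95,
    top_k=20,         # 20-40 recommended to filter rare tokens
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```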
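The "no thinking content in history" rule can be illustrated with a hypothetical helper. The `<think>...</think>` tag convention and the `strip_thinking` name are assumptions for illustration; when you use `apply_chat_template`, this stripping is already handled for you:

```python
import re

def strip_thinking(text: str) -> str:
    """Drop any <think>...</think> segment, keeping only the final output."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

history = [{"role": "user", "content": "What is 17 * 23?"}]
raw_reply = "<think>17*23 = 17*20 + 17*3 = 340 + 51</think>391"

# Store only the final output part in the multi-turn history.
history.append({"role": "assistant", "content": strip_thinking(raw_reply)})
history.append({"role": "user", "content": "And plus 9?"})
```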
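Similarly, a small sketch of how the output-standardization strings might be appended when benchmarking; `build_prompt` and the `kind` labels are hypothetical helpers, only the quoted suffixes come from the README:

```python
MATH_SUFFIX = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)
MCQ_SUFFIX = (
    "Please show your choice in the `answer` field with only the choice "
    'letter, e.g., "answer": "C".'
)

def build_prompt(question: str, kind: str) -> str:
    """Append the recommended standardization suffix for the task type."""
    suffix = MATH_SUFFIX if kind == "math" else MCQ_SUFFIX
    return f"{question}\n{suffix}"

print(build_prompt("Compute 2 + 2.", "math"))
```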
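For the long-input note, one way YaRN is typically enabled in vLLM's offline Python API is sketched below. The `rope_scaling` keys mirror vLLM's `--rope-scaling` JSON flag, the model name is a placeholder for a non-GGUF checkpoint (per the note above), and the factor value is an assumption; check the non-GGUF model card for the exact configuration:

```python
from vllm import LLM

# Sketch only: YaRN length extrapolation in vLLM. Key names and values are
# assumptions taken from vLLM's --rope-scaling convention, not this repo.
llm = LLM(
    model="Qwen/non-gguf-checkpoint",  # placeholder; see the non-GGUF models
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,  # 32768 * 4 = 131072-token target context
        "original_max_position_embeddings": 32768,
    },
    max_model_len=131072,
)
```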