remove part about long context modifications
This was probably copied from an old model card; this model has a default context length of 131k without YaRN.
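The 131k figure is straightforward to verify against the repo's `config.json`. Below is a minimal sketch assuming the standard transformers config layout; the repo id is a placeholder, not taken from this PR:

```python
# Sketch: check the model's default context window and RoPE scaling setting.
# "Qwen/model-name" is a placeholder; substitute the actual model repository.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/model-name")

# A value of 131072 corresponds to the "131k default" noted above.
print(config.max_position_embeddings)

# None here means no YaRN/rope scaling is applied out of the box.
print(getattr(config, "rope_scaling", None))
```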
README.md
CHANGED
@@ -106,19 +106,6 @@ To achieve optimal performance, we recommend the following settings:
 - **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
 - **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: "Please show your choice in the `answer` field with only the choice letter, e.g., `\"answer\": \"C\"`." in the prompt.
 
-5. **Handle Long Inputs**: For inputs exceeding 32,768 tokens, enable [YaRN](https://arxiv.org/abs/2309.00071) to improve the model's ability to capture long-sequence information effectively.
-
-For supported frameworks, you could add the following to `config.json` to enable YaRN:
-```json
-{
-    ...,
-    "rope_scaling": {
-        "factor": 4.0,
-        "original_max_position_embeddings": 32768,
-        "type": "yarn"
-    }
-}
-```
 
 For deployment, we recommend using vLLM. Please refer to our [Documentation](https://qwen.readthedocs.io/en/latest/deployment/vllm.html) for usage if you are not familiar with vLLM.
 Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**.
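Note that the removed snippet's scaling works out to 4.0 × 32768 = 131072 tokens, i.e. exactly the 131k window the model already ships with, which is why the YaRN block is redundant here. For the vLLM deployment path the README still recommends, here is a minimal offline-inference sketch using vLLM's Python API; the repo id is again a placeholder, and `max_model_len` only caps KV-cache allocation:

```python
# Sketch: offline inference with vLLM; no rope_scaling override is needed
# when the full 131k window is already the model's default.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/model-name", max_model_len=131072)  # placeholder id

params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)
outputs = llm.generate(["Summarize the YaRN paper in two sentences."], params)
print(outputs[0].outputs[0].text)
```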
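The prompt-formatting guidance kept in the hunk's context lines also translates directly into a request. A sketch for the math case, assuming an OpenAI-compatible endpoint such as one launched with `vllm serve`; the base URL and model name are placeholders:

```python
# Sketch: apply the README's recommended math-prompt suffix through an
# OpenAI-compatible endpoint; base_url and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

suffix = "Please reason step by step, and put your final answer within \\boxed{}."
resp = client.chat.completions.create(
    model="Qwen/model-name",
    messages=[{"role": "user", "content": f"What is 12 * 34? {suffix}"}],
)
print(resp.choices[0].message.content)
```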