Different number of attention heads makes rotary_ndims vs. rope scaling factors wrong?
In configuration_phi3.py, it has:
rotary_ndims = int(self.hidden_size // self.num_attention_heads * self.partial_rotary_factor)
so rotary_ndims would be 3072 // 24 * 1.0 = 128
Then rope_scaling_short_factor is a list of length 48
It then raises an error if
len(rope_scaling_short_factor) != rotary_ndims // 2
and since 48 != 64, this is an error (and I get a similar one in llama.cpp).
The question is: is the number of heads incorrect? In both Phi-3 mini and Phi-3.5 mini, num_attention_heads is 32, which would give a rotary_ndims of 96, which divided by 2 gives the 48 we expect.
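Spelled out as a quick sketch (hidden_size 3072 comes from the config; the head count is the open question):

```python
hidden_size = 3072

# As the config ships (24 heads), with partial_rotary_factor treated as 1.0:
print(int(hidden_size // 24 * 1.0) // 2)  # 64 -> rejects the 48-element short_factor

# With 32 heads, as in Phi-3 / Phi-3.5 mini:
print(int(hidden_size // 32 * 1.0) // 2)  # 48 -> matches the list length
```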
Any idea what's incorrect?
Thanks for your interest!
In the config, the partial_rotary_factor is 0.75.
Could you share how you are loading the config?
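One way to confirm is to inspect the loaded config directly. A sketch, assuming a transformers release whose Phi3Config understands partial_rotary_factor (on older releases the attribute is missing, the factor effectively defaults to 1.0, and loading fails with the 64-vs-48 mismatch before you get this far):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("microsoft/Phi-4-mini-instruct")
factor = getattr(cfg, "partial_rotary_factor", 1.0)  # 0.75 in the shipped config
rotary_ndims = int(cfg.hidden_size // cfg.num_attention_heads * factor)

# 3072 // 24 * 0.75 = 96, so short_factor is expected to have 96 // 2 = 48 entries.
print(rotary_ndims, rotary_ndims // 2)
print(len(cfg.rope_scaling["short_factor"]))
```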
I was attempting to quantize this to an 8-bit EXL2 quant this morning, which also failed for what I assume are similar reasons. Looks like it's missing the check for partial_rotary_factor.
Very cool to see 128K context on Phi-4. Will work to get the associated infrastructure in place.
Maybe there is the same issue with SGLang?
When I run the following command:
python3 -m sglang.launch_server --model-path microsoft/Phi-4-mini-instruct --host 0.0.0.0 --port 30000 --dp 4 --enable-p2p-check --mem-fraction-static 0.95
I get this error:
File "/usr/local/lib/python3.10/dist-packages/transformers/models/phi3/configuration_phi3.py", line 159, in __init__
self._rope_scaling_validation()
File "/usr/local/lib/python3.10/dist-packages/transformers/models/phi3/configuration_phi3.py", line 208, in _rope_scaling_validation
raise ValueError(
ValueError: `rope_scaling`'s short_factor field must have length 64, got 48```
Same issue with vLLM, even with version 0.7.2 in OpenAI server mode.
Hi @leflak ,
Thanks for your interest!
We have already integrated it into vLLM, and it will be available from v0.7.3.
https://github.com/vllm-project/vllm/pull/12718
Thanks.
Getting the same error when GRPO training with Unsloth:
ValueError: `rope_scaling`'s short_factor field must have length 64, got 48
Same error when running SFT with Hugging Face TRL:
ValueError: `rope_scaling`'s short_factor field must have length 64, got 48
This error is raised because the length of your rope_scaling dictionary’s short_factor list doesn’t match what the model configuration expects. In the validation method, the code calculates:
rotary_ndims = int(self.hidden_size // self.num_attention_heads * self.partial_rotary_factor)
Then it requires that the length of rope_scaling["short_factor"] be exactly rotary_ndims // 2. In your case, the error message indicates that it expected a length of 64, meaning:
rotary_ndims = 128, and 128 // 2 = 64.
But your provided list has only 48 elements.
To resolve this issue, you have two options:
Update the rope_scaling dictionary:
Modify your rope_scaling["short_factor"] (and similarly the long_factor, if applicable) so that its length is 64, matching the computed expectation.
Adjust model parameters:
If the list of 48 elements is what you intend to use, then you’ll need to adjust your model’s configuration (for example, by changing hidden_size, num_attention_heads, or partial_rotary_factor) so that the computed value of rotary_ndims // 2 equals 48.
Review your model configuration settings and ensure that the dimensions in rope_scaling align with the derived value from your model parameters.
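As a self-contained sketch of the length check described above (not the exact Hugging Face code, just the same test), showing that the 48-element list is only accepted once partial_rotary_factor is taken into account:

```python
def check_short_factor(hidden_size, num_attention_heads, partial_rotary_factor, short_factor):
    # The validation requires exactly rotary_ndims // 2 entries in the list.
    rotary_ndims = int(hidden_size // num_attention_heads * partial_rotary_factor)
    if len(short_factor) != rotary_ndims // 2:
        raise ValueError(
            f"`rope_scaling`'s short_factor field must have length {rotary_ndims // 2}, "
            f"got {len(short_factor)}"
        )

short_factor = [1.0] * 48  # placeholder values; the real config ships 48 tuned floats

check_short_factor(3072, 24, 0.75, short_factor)  # passes: 3072 // 24 * 0.75 = 96 -> 48
try:
    check_short_factor(3072, 24, 1.0, short_factor)  # factor ignored -> expects 64
except ValueError as e:
    print(e)
```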
Is there a fix from Microsoft for that?
Hi @legolasyiu.
Thanks for your interest!
Yes, the new model support has already been added to the latest Hugging Face Transformers (v4.49.0) and vLLM (v0.7.3).
vLLM: https://github.com/vllm-project/vllm/pull/12718
HF: https://github.com/huggingface/transformers/pull/35947