Different number of attention heads makes rotary_ndims vs. rope scaling factors wrong?

#1
by bartowski

In configuration_phi3.py, it has:

rotary_ndims = int(self.hidden_size // self.num_attention_heads * self.partial_rotary_factor)

so rotary_ndims would be 3072 // 24 * 1.0 = 128

Then rope_scaling_short_factor is a list of length 48

It then raises an error if

len(rope_scaling_short_factor) != rotary_ndims // 2

and since 48 != 64, this is an error (and I get a similar one in llama.cpp)
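
To make the mismatch concrete, here's the arithmetic as a quick Python sketch (the 1.0 for partial_rotary_factor is my assumption about what the failing tools are using):

```python
# Values from Phi-4-mini-instruct's config.json
hidden_size = 3072
num_attention_heads = 24
partial_rotary_factor = 1.0  # assumed default; see discussion below

rotary_ndims = int(hidden_size // num_attention_heads * partial_rotary_factor)
print(rotary_ndims)       # 128
print(rotary_ndims // 2)  # 64, but short_factor has 48 entries -> ValueError
```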

The question is: is the number of heads incorrect? In both Phi-3 and Phi-3.5 mini, num_attention_heads is 32, which would give a rotary_ndims of 96, and 96 divided by 2 gives the 48 we expect.

Any idea what's incorrect?

Microsoft org

Thanks for your interest!

In the config, the partial_rotary_factor is 0.75.
Could you share how you are loading the config?

https://huggingface.co/microsoft/Phi-4-mini-instruct/blob/4b00ec8714b0cb224e4fb33380cbf0919f177f3e/config.json#L31
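
For reference, with that factor the numbers line up; a one-line check using the values from the linked config:

```python
# head_dim = 3072 // 24 = 128; 128 * 0.75 = 96 rotary dims; 96 // 2 = 48 factors
print(int(3072 // 24 * 0.75) // 2)  # 48, matching the length of short_factor
```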

I was attempting to quantize this to an 8-bit EXL2 quant this morning, which also failed for what I assume are similar reasons. It looks like the tooling is missing the handling for partial_rotary_factor. Very cool to see 128K context on Phi-4. Will work to get the associated infrastructure in place.
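
For anyone patching a loader in the meantime, the missing handling is roughly this shape (a sketch only, not ExLlamaV2's actual code; the 1.0 fallback is an assumption for configs that omit the field):

```python
import json

with open("config.json") as f:
    cfg = json.load(f)

head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
# Respect partial_rotary_factor instead of assuming full-width rotary embeddings
partial_rotary_factor = cfg.get("partial_rotary_factor", 1.0)
rotary_ndims = int(head_dim * partial_rotary_factor)  # 96 for Phi-4-mini

# Only the first rotary_ndims dims of each head get RoPE, and there is one
# scaling factor per frequency pair, hence rotary_ndims // 2 entries (48)
assert len(cfg["rope_scaling"]["short_factor"]) == rotary_ndims // 2
```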

Maybe there is the same issue with SGLang?

When I run the following command:

python3 -m sglang.launch_server --model-path microsoft/Phi-4-mini-instruct --host 0.0.0.0 --port 30000 --dp 4 --enable-p2p-check --mem-fraction-static 0.95

I get this error:

  File "/usr/local/lib/python3.10/dist-packages/transformers/models/phi3/configuration_phi3.py", line 159, in __init__
self._rope_scaling_validation()
File "/usr/local/lib/python3.10/dist-packages/transformers/models/phi3/configuration_phi3.py", line 208, in _rope_scaling_validation
raise ValueError(
ValueError: `rope_scaling`'s short_factor field must have length 64, got 48```
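
The failure reproduces without SGLang at all, since it comes from the config validation in transformers (a minimal sketch; assumes a transformers version from before the Phi-4-mini support landed):

```python
from transformers import AutoConfig

# On transformers < 4.49.0 this raises:
# ValueError: `rope_scaling`'s short_factor field must have length 64, got 48
AutoConfig.from_pretrained("microsoft/Phi-4-mini-instruct")
```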

Yes it seems likely that most of these tools are ignoring the 0.75 scaling, thanks for pointing that out @ykim362 ! Will investigate

Same issue with vLLM, even with version 0.7.2, in OpenAI-compatible server mode.

Hi @leflak ,

Thanks for your interest!
We have already integrated it into vLLM, and it will be available from v0.7.3.
https://github.com/vllm-project/vllm/pull/12718
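
Once v0.7.3 is available, a minimal offline smoke test might look like this (a sketch, not an official example):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/Phi-4-mini-instruct")
outputs = llm.generate(["Hello, Phi-4-mini!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```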

Thanks.

Getting the same error when GRPO training with Unsloth: ValueError: `rope_scaling`'s short_factor field must have length 64, got 48

Same error when running SFT with Hugging Face TRL:

ValueError: `rope_scaling`'s short_factor field must have length 64, got 48

This error is raised because the length of your rope_scaling dictionary’s short_factor list doesn’t match what the model configuration expects. In the validation method, the code calculates:

rotary_ndims = int(self.hidden_size // self.num_attention_heads * self.partial_rotary_factor)

Then it requires that the length of rope_scaling["short_factor"] be exactly rotary_ndims // 2. In your case, the error message indicates that it expected a length of 64, meaning rotary_ndims = 128 and 128 // 2 = 64, but your provided list has only 48 elements.

To resolve this issue, you have two options:

1. Update the rope_scaling dictionary: modify rope_scaling["short_factor"] (and similarly long_factor, if applicable) so that its length is 64, matching the computed expectation.

2. Adjust model parameters: if the list of 48 elements is what you intend to use, adjust the model configuration (for example, hidden_size, num_attention_heads, or partial_rotary_factor) so that the computed value of rotary_ndims // 2 equals 48.

Review your model configuration and ensure that the rope_scaling lengths align with the value derived from your model parameters.
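
To make the constraint concrete, here is a sketch of the consistency check being enforced (mirroring the transformers validator, not copied from it):

```python
def check_rope_scaling(hidden_size, num_attention_heads,
                       partial_rotary_factor, rope_scaling):
    # Number of dimensions per head that receive rotary embeddings
    rotary_ndims = int(hidden_size // num_attention_heads * partial_rotary_factor)
    # One scaling factor per rotary frequency pair
    expected = rotary_ndims // 2
    for key in ("short_factor", "long_factor"):
        got = len(rope_scaling[key])
        if got != expected:
            raise ValueError(
                f"`rope_scaling`'s {key} field must have length {expected}, got {got}"
            )

# Passes with Phi-4-mini's settings: 3072 hidden, 24 heads, factor 0.75 -> 48
check_rope_scaling(3072, 24, 0.75, {"short_factor": [1.0] * 48,
                                    "long_factor": [1.0] * 48})
```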

Is there a fix from Microsoft for that?

Microsoft org

Hi @legolasyiu .
Thanks for your interest!
Yes, support for the new model has already been added to the latest HF Transformers (v4.49.0) and vLLM (v0.7.3).

vLLM: https://github.com/vllm-project/vllm/pull/12718
HF: https://github.com/huggingface/transformers/pull/35947
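
With those versions, loading the config should work out of the box; a quick verification sketch (assumes transformers >= 4.49.0):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("microsoft/Phi-4-mini-instruct")
print(cfg.partial_rotary_factor)              # 0.75
print(len(cfg.rope_scaling["short_factor"]))  # 48
```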
