Weird fine-tuning problem
#2 opened by joorei
Hello,
I am fine-tuning dolphin-mixtral with axolotl. Inspired by your config, I chose the QLoRA target modules. What is interesting is that I can fine-tune dolphin-2.5-mixtral-8x7b, but when I just change the "5" to "6" and otherwise keep the config the same (and remove and recreate the output directory), I get the following error:
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1854, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2735, in training_step
loss = self.compute_loss(model, inputs)
File "/workspace/axolotl/src/axolotl/core/trainer_builder.py", line 291, in compute_loss
return super().compute_loss(model, inputs, return_outputs=return_outputs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2758, in compute_loss
outputs = model(**inputs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 659, in forward
return model_forward(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 647, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/peft/peft_model.py", line 977, in forward
return self.base_model(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 106, in forward
return self.model.forward(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 1258, in forward
loss += self.router_aux_loss_coef * aux_loss
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:0!
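If it helps, my reading of the traceback is that with the model sharded across several GPUs, the router auxiliary loss ends up on a different device than the main language-modelling loss, so the in-place addition at that line fails. Here is a minimal illustration of the same failure mode, just a sketch assuming at least two visible CUDA devices (this is not my actual training code, and 0.02 is only a placeholder coefficient):

```python
import torch

# Reproduce the failure mode in isolation: adding tensors that live on
# different GPUs raises the same RuntimeError as the
# `loss += self.router_aux_loss_coef * aux_loss` line in modeling_mixtral.py.
loss = torch.tensor(1.0, device="cuda:0")      # main LM loss on one GPU
aux_loss = torch.tensor(0.1, device="cuda:1")  # router aux loss on another GPU

try:
    loss += 0.02 * aux_loss                    # same kind of in-place add as the failing line
except RuntimeError as err:
    print(err)  # "Expected all tensors to be on the same device, ..."

# Moving the aux loss onto the main loss's device first avoids the error.
loss += 0.02 * aux_loss.to(loss.device)
print(loss)
```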
Any idea what could be different about 2.5 vs 2.6 that could cause this?
2.7 behaves the same way, btw.
I've tried setting output_router_logits to false, but it does not help.
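In case it helps with debugging, this is how I'm planning to compare what the two checkpoints actually ship in their config.json. The repo ids below are my guesses, so substitute the real hub paths or local checkpoint directories:

```python
from transformers import AutoConfig

# Diagnostic sketch: compare the shipped configs of the 2.5 and 2.6 checkpoints.
# The repo ids are assumptions -- replace them with the actual hub paths or
# the local directories being fine-tuned.
cfg_25 = AutoConfig.from_pretrained("cognitivecomputations/dolphin-2.5-mixtral-8x7b")
cfg_26 = AutoConfig.from_pretrained("cognitivecomputations/dolphin-2.6-mixtral-8x7b")

for key in ("output_router_logits", "router_aux_loss_coef", "num_local_experts"):
    print(key, getattr(cfg_25, key, None), getattr(cfg_26, key, None))
```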