gabrielmbmb/Upcycled-Qwen1.5-MoE2.7B
Text Generation
Models I pre-trained by initialising SMoE models from dense model weights, using the upcycling process applied for Qwen1.5-MoE-A2.7B (or something similar)
Note This model hasn't been trained, just initialised using the upcycling process and the weights from Qwen/Qwen1.5-1.8B.
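For reference, the core idea of upcycling — replicating a dense FFN's weights into every expert while initialising a fresh router — can be sketched as below. This is a toy illustration, not the actual initialisation script: the dimensions, helper names, and plain-list weight representation are all mine.

```python
import copy
import random

# Toy dimensions for illustration only (not the real Qwen1.5 sizes).
hidden, inter, n_experts = 8, 16, 4

def rand_matrix(rows, cols):
    """A small randomly initialised weight matrix as nested lists."""
    return [[random.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

# Dense FFN weights, named after the Qwen-style projections.
dense = {
    "up_proj": rand_matrix(inter, hidden),
    "gate_proj": rand_matrix(inter, hidden),
    "down_proj": rand_matrix(hidden, inter),
}

# Upcycling: every expert starts as an exact copy of the dense FFN
# weights, while the router ("gate") is newly, randomly initialised.
experts = [copy.deepcopy(dense) for _ in range(n_experts)]
router = rand_matrix(n_experts, hidden)
```

After this initialisation the MoE computes the same function as the dense model (up to routing), which is why such a model still needs training before it is useful.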
Note Trained with LoRA targeting up_proj, gate_proj, down_proj, gate, and shared_expert_gate, with rank 8 (about 126M trainable parameters). The dataset used was wiki_demo from LLaMA-Factory.
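A PEFT LoraConfig matching this note might look as follows. The rank and target modules come from the note above; lora_alpha and the remaining fields are illustrative assumptions, not values confirmed by the training run.

```python
from peft import LoraConfig

# Rank and target modules as described in the note; lora_alpha is an
# assumed common choice (2 * r), not stated in the note.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "up_proj",
        "gate_proj",
        "down_proj",
        "gate",
        "shared_expert_gate",
    ],
    task_type="CAUSAL_LM",
)
```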
Note Trained with LoRA targeting all the layers, with rank 32. About 500M trainable parameters.