anthracite-org/magnum-v2-123b


kalomaze committed on Aug 19, 2024

Commit 73ca724 · verified · 1 Parent(s): 66193b2

Update README.md

Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -49,10 +49,10 @@ In addition to this, we noticed that Mistral Large models seemed much more sensi
 We hypothesize this is primarily due to the particularly narrow and low variance weight distributions typical of Mistral derived models regardless of their scale.
-In the end, due to the costs that would be involved in training another full 2 epochs run ($600) on an even lower rate, we settled on our third attempt: 2e-6 with an effective batch size of 64, stopped earlier than the target 2 epochs.
+In the end, due to the costs that would be involved in training another full 2 epochs run ($600) on an even lower rate, we settled on our third attempt: 2e-6 with an effective batch size of 64. We chose to publish the 1.5 epoch run after manually testing and comparing it.
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6491e00e057b0928b3e07b75/d9_cBy-DuWrdnoVBbAvRV.png)
-We notice a correlation between the significance of the 2nd epoch loss drop and the strength of the learning rate, implying 4e-6 leads to more catastrophic forgetting.
+Also, we notice a correlation between the significance of the 2nd epoch loss drop and the strength of the learning rate, implying 4e-6 leads to more catastrophic forgetting.
 [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)