24/08/16 v2 Init

Files changed (8)

README.md +85 -22
model.safetensors +1 -1
optimizer.pt +1 -1
pytorch_model.bin +1 -1
rng_state.pth +1 -1
scheduler.pt +1 -1
trainer_state.json +0 -0
training_args.bin +2 -2

README.md CHANGED

@@ -4,7 +4,7 @@ base_model: keeeeenw/MicroLlama
 tags:
 - generated_from_trainer
 model-index:
-- name: MicroLlama_305M_stage1
+- name: medusa-microllama_305M_stage1_v2
   results: []
 ---
@@ -12,12 +12,12 @@ model-index:
 should probably proofread and complete it, then remove this comment. -->
 [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/momorami-kaist/medusa_test/runs/3ou8zlhf)
-# MicroLlama_305M_stage1
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/momorami-kaist/medusa_test/runs/nmejq8y2)
+# medusa-microllama_305M_stage1_v2
 This model is a fine-tuned version of [keeeeenw/MicroLlama](https://huggingface.co/keeeeenw/MicroLlama) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.9645
+- Loss: 2.5107
 ## Model description
@@ -51,24 +51,87 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 3.0309        | 0.0242 | 40   | 3.0794          |
-| 3.0295        | 0.0483 | 80   | 2.9479          |
-| 2.8731        | 0.0725 | 120  | 2.9168          |
-| 2.8009        | 0.0966 | 160  | 2.9328          |
-| 2.9132        | 0.1208 | 200  | 2.9119          |
-| 2.7718        | 0.1449 | 240  | 2.8891          |
-| 2.884         | 0.1691 | 280  | 2.8912          |
-| 2.7161        | 0.1932 | 320  | 2.8810          |
-| 2.6438        | 0.2174 | 360  | 2.8815          |
-| 2.6152        | 0.2415 | 400  | 2.8896          |
-| 2.7979        | 0.2657 | 440  | 2.8809          |
-| 2.7002        | 0.2899 | 480  | 2.8611          |
-| 2.6969        | 0.3140 | 520  | 2.8598          |
-| 2.8914        | 0.3382 | 560  | 2.8694          |
-| 2.7377        | 0.3623 | 600  | 2.8479          |
-| 2.7575        | 0.3865 | 640  | 2.8544          |
-| 2.8079        | 0.4106 | 680  | 2.8756          |
-| 3.2421        | 0.4348 | 720  | 2.9645          |
+| 3.0312        | 0.0244 | 40   | 3.0649          |
+| 3.026         | 0.0489 | 80   | 2.9528          |
+| 2.8781        | 0.0733 | 120  | 2.9163          |
+| 2.8075        | 0.0978 | 160  | 2.9268          |
+| 2.9164        | 0.1222 | 200  | 2.9027          |
+| 2.7724        | 0.1467 | 240  | 2.8815          |
+| 2.8856        | 0.1711 | 280  | 2.8871          |
+| 2.718         | 0.1955 | 320  | 2.8749          |
+| 2.6479        | 0.2200 | 360  | 2.8815          |
+| 2.6194        | 0.2444 | 400  | 2.8872          |
+| 2.7954        | 0.2689 | 440  | 2.8773          |
+| 2.7008        | 0.2933 | 480  | 2.8572          |
+| 2.6876        | 0.3178 | 520  | 2.8560          |
+| 2.879         | 0.3422 | 560  | 2.8665          |
+| 2.7377        | 0.3666 | 600  | 2.8482          |
+| 2.7459        | 0.3911 | 640  | 2.8512          |
+| 2.8036        | 0.4155 | 680  | 2.8712          |
+| 2.89          | 0.4400 | 720  | 2.8614          |
+| 2.7898        | 0.4644 | 760  | 2.8570          |
+| 2.891         | 0.4888 | 800  | 2.8384          |
+| 2.717         | 0.5133 | 840  | 2.8344          |
+| 2.8589        | 0.5377 | 880  | 2.8342          |
+| 2.8944        | 0.5622 | 920  | 2.8040          |
+| 2.85          | 0.5866 | 960  | 2.8012          |
+| 2.8057        | 0.6111 | 1000 | 2.8063          |
+| 2.6772        | 0.6355 | 1040 | 2.7957          |
+| 2.7905        | 0.6599 | 1080 | 2.7822          |
+| 2.7579        | 0.6844 | 1120 | 2.7922          |
+| 2.7625        | 0.7088 | 1160 | 2.7763          |
+| 2.85          | 0.7333 | 1200 | 2.7607          |
+| 2.8447        | 0.7577 | 1240 | 2.7611          |
+| 2.8027        | 0.7822 | 1280 | 2.7501          |
+| 2.461         | 0.8066 | 1320 | 2.7201          |
+| 2.6232        | 0.8310 | 1360 | 2.6906          |
+| 2.6998        | 0.8555 | 1400 | 2.6763          |
+| 2.7609        | 0.8799 | 1440 | 2.6603          |
+| 2.6003        | 0.9044 | 1480 | 2.6549          |
+| 2.2626        | 0.9288 | 1520 | 2.6484          |
+| 2.5896        | 0.9533 | 1560 | 2.6389          |
+| 2.5704        | 0.9777 | 1600 | 2.6245          |
+| 2.1629        | 1.0021 | 1640 | 2.6164          |
+| 2.1719        | 1.0266 | 1680 | 2.6152          |
+| 2.2115        | 1.0510 | 1720 | 2.6134          |
+| 2.359         | 1.0755 | 1760 | 2.6127          |
+| 2.3486        | 1.0999 | 1800 | 2.6066          |
+| 2.1864        | 1.1244 | 1840 | 2.6041          |
+| 2.1692        | 1.1488 | 1880 | 2.6023          |
+| 2.1455        | 1.1732 | 1920 | 2.5998          |
+| 2.195         | 1.1977 | 1960 | 2.5914          |
+| 2.3458        | 1.2221 | 2000 | 2.5883          |
+| 2.1419        | 1.2466 | 2040 | 2.5827          |
+| 2.1329        | 1.2710 | 2080 | 2.5743          |
+| 2.2733        | 1.2954 | 2120 | 2.5686          |
+| 2.2662        | 1.3199 | 2160 | 2.5654          |
+| 2.399         | 1.3443 | 2200 | 2.5637          |
+| 2.1518        | 1.3688 | 2240 | 2.5563          |
+| 2.1115        | 1.3932 | 2280 | 2.5483          |
+| 2.2048        | 1.4177 | 2320 | 2.5434          |
+| 2.2658        | 1.4421 | 2360 | 2.5390          |
+| 2.2186        | 1.4665 | 2400 | 2.5366          |
+| 2.1467        | 1.4910 | 2440 | 2.5321          |
+| 2.2352        | 1.5154 | 2480 | 2.5281          |
+| 2.2507        | 1.5399 | 2520 | 2.5250          |
+| 2.1987        | 1.5643 | 2560 | 2.5221          |
+| 2.2234        | 1.5888 | 2600 | 2.5205          |
+| 2.0497        | 1.6132 | 2640 | 2.5181          |
+| 2.1133        | 1.6376 | 2680 | 2.5166          |
+| 2.1047        | 1.6621 | 2720 | 2.5153          |
+| 2.1578        | 1.6865 | 2760 | 2.5148          |
+| 2.1869        | 1.7110 | 2800 | 2.5135          |
+| 2.0953        | 1.7354 | 2840 | 2.5126          |
+| 2.1413        | 1.7599 | 2880 | 2.5119          |
+| 2.1333        | 1.7843 | 2920 | 2.5115          |
+| 2.2001        | 1.8087 | 2960 | 2.5114          |
+| 2.1889        | 1.8332 | 3000 | 2.5111          |
+| 2.2247        | 1.8576 | 3040 | 2.5110          |
+| 2.2258        | 1.8821 | 3080 | 2.5108          |
+| 2.157         | 1.9065 | 3120 | 2.5107          |
+| 2.181         | 1.9310 | 3160 | 2.5107          |
+| 2.1441        | 1.9554 | 3200 | 2.5107          |
+| 2.4097        | 1.9798 | 3240 | 2.5107          |
 ### Framework versions
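The Loss and Epoch columns can be turned into more familiar quantities with a few lines of Python. This sketch rests on two assumptions that the card does not state. The first is that the reported loss is a mean per-token cross-entropy in nats, so `exp(loss)` is a perplexity; a Medusa-style objective that sums or weights several heads' losses would break this reading. The second is that the fractional Epoch column equals step divided by steps per epoch. The constant and function names are illustrative, and the numbers are copied from the tables in this diff.

```python
import math

# Values copied from the README tables in this diff.
OLD_EVAL_LOSS = 2.9645  # MicroLlama_305M_stage1, last eval at step 720
NEW_EVAL_LOSS = 2.5107  # medusa-microllama_305M_stage1_v2, last eval at step 3240
OLD_LAST = (720, 0.4348)   # (step, epoch) of the last old-run row
NEW_LAST = (3240, 1.9798)  # (step, epoch) of the last new-run row


def perplexity(mean_ce_nats: float) -> float:
    """exp(loss); only meaningful if the loss is a mean per-token cross-entropy in nats (assumption)."""
    return math.exp(mean_ce_nats)


def steps_per_epoch(step: int, epoch: float) -> float:
    """Steps per epoch implied by one logged (step, fractional epoch) pair.

    The Epoch column is rounded to 4 decimals, so the result is approximate.
    """
    return step / epoch


if __name__ == "__main__":
    for label, loss, (step, epoch) in (
        ("old", OLD_EVAL_LOSS, OLD_LAST),
        ("new", NEW_EVAL_LOSS, NEW_LAST),
    ):
        print(f"{label}: perplexity {perplexity(loss):.1f}, "
              f"~{steps_per_epoch(step, epoch):.0f} steps/epoch")
```

Under those assumptions, the final evaluation loss moves from a perplexity of about 19.4 to about 12.3. The implied steps per epoch change from roughly 1656 to roughly 1637. The two endpoints are not step-matched: the old run stops at step 720, and at that step the new run's validation loss is already 2.8614.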

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e05bd9032388dd9907eb54ae31488e88ebee115128bedc7b086671d1991323aa
+oid sha256:fd398c657bb5e30a2dd992cba2fd12da76f090e6e19f3340d208881b168e33fe
 size 879828432
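All binary files in this commit are stored as Git LFS pointers. Each pointer has three lines: a `version` line, an `oid` line holding the SHA-256 of the real content, and a `size` line in bytes. Below is a minimal parser that assumes only the v1 pointer layout shown here. The `LfsPointer` and `parse_lfs_pointer` names are illustrative and not part of any library.

```python
from dataclasses import dataclass

LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"
HEX_DIGITS = set("0123456789abcdef")


@dataclass(frozen=True)
class LfsPointer:
    oid_sha256: str  # 64-char lowercase hex SHA-256 of the real file
    size: int        # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse a Git LFS v1 pointer made of 'key value' lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_V1:
        raise ValueError(f"unsupported pointer version: {fields.get('version')!r}")
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256" or len(digest) != 64 or not set(digest) <= HEX_DIGITS:
        raise ValueError(f"unexpected oid: {fields['oid']!r}")
    return LfsPointer(oid_sha256=digest, size=int(fields["size"]))


# The new model.safetensors pointer from the hunk above.
NEW_MODEL_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:fd398c657bb5e30a2dd992cba2fd12da76f090e6e19f3340d208881b168e33fe
size 879828432
"""

if __name__ == "__main__":
    print(parse_lfs_pointer(NEW_MODEL_POINTER))
```

Applying the parser to both sides of each hunk makes the commit's pattern explicit. The `oid` changes for every weight and state file while the `size` stays the same, which is consistent with retraining the same architecture. Only `training_args.bin` changes size, from 5560 to 5624 bytes.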

optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:caae61e64640d554cf284b55ffaac33450b791b06cec745ff52ed3de666a0ecf
+oid sha256:0bcdc04c6ef2a91016040853d592f583e763082b76510dce226ccd1a9c1680dd
 size 271108660

pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b3edaa40748ca2f1db06ce3d4081be5b7f0f7caf22c32d24b9ed70cfa1efdb77
+oid sha256:04e409eafa14b811f25d9d9e659cc3617fa835f52b94af1bb3c0bb718e9bc1fb
 size 879855990

rng_state.pth CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1ff264f99d31b522cc7e2a4eac9d38606d0c58a34c0adc74d71e0ca8b371dc36
+oid sha256:9196a1e708bf24d6abba41cce3f8558820acc3e50f9394c5955e29eb41ffea3d
 size 14244

scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:29706500f8fe69aa61a62e9f1c542da177090efb816b613770204adcd5cc04b0
+oid sha256:dab72756239cdb754fa43fd3130c56c2d70df0f7bafb820d1b3badda5b2013a7
 size 1064

trainer_state.json CHANGED

The diff for this file is too large to render.
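Because this diff is not rendered, the evaluation curve is easiest to read from the file itself. The sketch assumes the usual Hugging Face `Trainer` layout, in which `trainer_state.json` holds a top-level `log_history` list and the evaluation entries carry `step` and `eval_loss` keys. The helper names are hypothetical.

```python
import json
from typing import Any


def load_trainer_state(path: str = "trainer_state.json") -> dict[str, Any]:
    """Read a Trainer state file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def eval_curve(state: dict[str, Any]) -> list[tuple[int, float]]:
    """(step, eval_loss) pairs, in logged order, from the state's log_history.

    Assumes evaluation entries are the ones carrying an "eval_loss" key;
    training entries (which carry "loss") are skipped.
    """
    return [
        (entry["step"], entry["eval_loss"])
        for entry in state.get("log_history", [])
        if "eval_loss" in entry
    ]


def best_eval(state: dict[str, Any]) -> tuple[int, float]:
    """Lowest-loss evaluation; on exact ties, min() keeps the earliest step."""
    curve = eval_curve(state)
    if not curve:
        raise ValueError("no eval_loss entries in log_history")
    return min(curve, key=lambda pair: pair[1])

# Typical use on a local checkout:
#   best_eval(load_trainer_state("trainer_state.json"))
```

The README rounds losses to four decimals, so its last four rows look tied at 2.5107. The unrounded floats in the JSON decide which row `best_eval` returns.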

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:62c5aaac84a29e9b64e2a8c76ee66b8c53b521633219d795aed9292df981e6d1
-size 5560
+oid sha256:f412d7223295ffcc124f509ffcb0a38296a3e116746a3519a3cecd8640ca0333
+size 5624
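To confirm that a downloaded file is the object a pointer refers to, compare its size and its SHA-256 against the pointer's two fields. This sketch uses only `hashlib` and `os`. It reads the file in chunks so the roughly 880 MB weight files need not fit in memory. The function names and the 1 MiB default chunk size are arbitrary choices.

```python
import hashlib
import os


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(path: str, expected_oid: str, expected_size: int) -> bool:
    """True if the file has the size and SHA-256 recorded in an LFS pointer."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first; skips hashing a wrong-size file
    return sha256_of_file(path) == expected_oid
```

For the new `training_args.bin`, the check would be `matches_pointer("training_args.bin", "f412d7223295ffcc124f509ffcb0a38296a3e116746a3519a3cecd8640ca0333", 5624)`.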