LoneStriker committed
Commit 8157d71
1 parent: 5958a58
Upload folder using huggingface_hub
Browse files
- .gitattributes +9 -35
- Qwen1.5-8x7b-Q2_K.gguf +3 -0
- Qwen1.5-8x7b-Q3_K_L.gguf +3 -0
- Qwen1.5-8x7b-Q3_K_M.gguf +3 -0
- Qwen1.5-8x7b-Q3_K_S.gguf +3 -0
- Qwen1.5-8x7b-Q4_K_M.gguf +3 -0
- Qwen1.5-8x7b-Q4_K_S.gguf +3 -0
- Qwen1.5-8x7b-Q5_K_M.gguf +3 -0
- Qwen1.5-8x7b-Q5_K_S.gguf +3 -0
- Qwen1.5-8x7b-Q6_K.gguf +3 -0
- README.md +34 -0
- merges.txt +0 -0
.gitattributes
CHANGED
@@ -1,35 +1,9 @@
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
+Qwen1.5-8x7b-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen1.5-8x7b-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen1.5-8x7b-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen1.5-8x7b-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen1.5-8x7b-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen1.5-8x7b-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen1.5-8x7b-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen1.5-8x7b-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen1.5-8x7b-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
Qwen1.5-8x7b-Q2_K.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c8f528f96723b94b0508f8610b0376ad04bcf8a08149aa7381e378e385cd424c
+size 14943500352
Qwen1.5-8x7b-Q3_K_L.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:088ab011d1c877645783097bff46f66482b6fb3fb2c557a9a3242f5e45af729e
+size 20243400768
Qwen1.5-8x7b-Q3_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e9c697b9f55aa1d56c472194e8600953cd21c597033a02be48e27022efd02fd2
+size 19029149760
Qwen1.5-8x7b-Q3_K_S.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:358020283fbfa60d1625be369c2c443753a6f398b53817b64d936eae35de1924
+size 17405954112
Qwen1.5-8x7b-Q4_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7d255bc27144007c2e8fcb912b2a453ed22dc28dd70b8551aff50bafbfa09894
+size 23647033408
Qwen1.5-8x7b-Q4_K_S.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a7c3f3193d15e9566a80d6dd2aa085cab24857c6c9f270edc601857c93e15fb8
+size 22339459136
Qwen1.5-8x7b-Q5_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5621ffb6955ba7c4fbe44e0965cb1ef810fae29d06ad8f17b309268b8a076faa
+size 27399166016
Qwen1.5-8x7b-Q5_K_S.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1557e0eb3a67bf396746cf80b239b8177a8cfa6a8554dbbc8f7e794fe4a310bb
+size 26632656960
Qwen1.5-8x7b-Q6_K.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3b9547e0a3e114dc2b784a6c168a1663d96b26601ed6ca4c42d2ad9da18c430e
+size 31457110080
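Each `.gguf` entry in this commit is a Git LFS pointer (a three-line text stub), not the weights themselves; the actual file is fetched by its sha256 oid at checkout. A minimal sketch of parsing one such pointer, using the Q6_K entry's values verbatim (the helper name is just for illustration):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The Q6_K entry from this commit, verbatim.
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:3b9547e0a3e114dc2b784a6c168a1663d96b26601ed6ca4c42d2ad9da18c430e\n"
    "size 31457110080\n"
)

info = parse_lfs_pointer(pointer)
algo, digest = info["oid"].split(":", 1)   # hash algorithm, hex digest
size_gib = int(info["size"]) / 2**30       # ~29.3 GiB on disk
```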
README.md
ADDED
@@ -0,0 +1,34 @@
+---
+license: other
+license_name: tongyi-qianwen-license-agreement
+license_link: >-
+  https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT
+datasets:
+- Crystalcareai/MoD
+---
+
+# Please note: this is the model that accompanies the dataset https://huggingface.co/datasets/Crystalcareai/MoD. The README is the same for both, with more detail below.
+
+## Hey, I'm Lucas
+
+I'm excited to share an early release of a project that has kept me busy for the last couple of weeks. Mixtral's release propelled me into a deep dive into MoEs. This led to my first experiments with post-training, starting with fine-tuning via MonsterAPI in mid-December and later transitioning to axolotl as I got more comfortable with the command line.
+
+With the release of Qwen1.5, I was curious to see how it would compare to Mixtral. Thanks to lazymergekit, which simplifies the process for newcomers, I was able to give Qwen1.5-7B a unique twist.
+
+Coming from a background as an acting teacher and coach, I saw parallels between the impact of high-quality scripts on performances and the importance of curating high-quality data for training models. This led me to explore data curation, especially for training Mixture of Experts (MoE) models. I looked into Teknium's OpenHermes dataset, Jon Durbin's collections on GitHub, and Eric Hartford's methods for achieving specific outcomes with models.
+
+I curated a dataset, named Mixture of Data (MoD), from various sources, including Bagel, OpenHermes, and many more, totaling about 780,000 distinct ShareGPT conversations. This dataset aims to encourage MoE models to develop their own distinct experts.
+
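For context on the data format mentioned above: a "ShareGPT conversation" is, in the commonly used convention, a list of alternating tagged turns. A minimal sketch of that schema and a structural check (the exact field names used in MoD are an assumption here):

```python
import json

# One ShareGPT-style record: alternating turns, each tagged with a
# speaker ("human" or "gpt" in the common convention).
record = {
    "conversations": [
        {"from": "human", "value": "What is a Mixture of Experts model?"},
        {"from": "gpt", "value": "A model that routes each token to a subset of expert subnetworks."},
    ]
}

def is_valid_sharegpt(rec):
    """Check the minimal structural invariants of a ShareGPT record."""
    turns = rec.get("conversations")
    if not isinstance(turns, list) or not turns:
        return False
    return all(
        isinstance(t, dict)
        and t.get("from") in {"system", "human", "gpt"}
        and isinstance(t.get("value"), str)
        for t in turns
    )

serialized = json.dumps(record)  # records are typically stored as JSON lines
```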
+After training Qwen1.5-7B on 100k random samples from MoD over four epochs, I merged the fine-tuned model 8x into an MoE using a random gate, with no specialized fine-tuning of any of the 8 experts. The result was a model that initially made no sense, since it lacked a base model and any clear guidance on expert usage.
+
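The merge step above (one fine-tune copied into 8 experts behind an untrained, randomly initialized gate) can be sketched in miniature. This is an illustrative toy, not the actual Qwen1.5 architecture or lazymergekit's implementation; the tiny dimensions and the top-2 routing choice are assumptions. It also shows why the merged model needs further training: with identical experts and routing weights that sum to 1, the MoE collapses to the single fine-tuned layer, so the random gate only starts to matter once the experts diverge.

```python
import math
import random

random.seed(0)
D, N_EXPERTS, TOP_K = 8, 8, 2  # toy sizes, not Qwen1.5's

def matvec(m, x):
    """Apply a weight matrix (list of rows) to vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]

def softmax(xs):
    mx = max(xs)
    es = [math.exp(v - mx) for v in xs]
    s = sum(es)
    return [e / s for e in es]

# "Merge 8x": every expert starts as a copy of the same fine-tuned layer.
shared = [[random.gauss(0, 1) for _ in range(D)] for _ in range(D)]
experts = [shared] * N_EXPERTS

# Random gate: an untrained routing matrix, one row of scores per expert.
gate = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N_EXPERTS)]

def moe_forward(x):
    scores = matvec(gate, x)  # one routing score per expert
    top = sorted(range(N_EXPERTS), key=scores.__getitem__, reverse=True)[:TOP_K]
    weights = softmax([scores[i] for i in top])  # convex combination
    out = [0.0] * D
    for w, e in zip(weights, top):
        y = matvec(experts[e], x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

x = [random.gauss(0, 1) for _ in range(D)]
# Identical experts + weights summing to 1 => same output as one expert.
dense = matvec(shared, x)
sparse = moe_forward(x)
```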
+Despite challenges, such as training interruptions caused by CUDA errors on RunPod, the model showed promising adaptability to the rest of the MoD dataset, even with limited training (0.45 of 4 planned epochs were completed before my compute budget ran out). While I haven't been able to benchmark it fully (I will once the RunPod situation is sorted), it appears to perform comparably to Mixtral in (admittedly naive) preliminary reasoning tests.
+
+These weeks have been incredibly rewarding and educational, thanks to the contributions of Jon Durbin, Maxime Labonne, Teknium, Eric Hartford, and Charles Goddard. Their work has made these technologies accessible and inspired my project. A special thank you to Teknium and Eric Hartford, who have been generous with their time, answering my questions with kindness and humility.
+
+I'm hoping to receive compensation from RunPod for the interruptions (and the resulting LARGE amount of wasted $$$), and I will then complete the full fine-tuning and report the results here. I hope the MoD dataset and the Qwen1.5-8x7b model will be valuable to the community and encourage further exploration of these architectures.
+
+I am fully committed to this field and plan to continue developing models (eventually as a career). ML is fascinating, and I look forward to contributing to its advancement, however big or small my contributions may be.
+
+Thank you for your interest and support. Let's push the boundaries of what's possible together.
+
+Lucas
merges.txt
ADDED
The diff for this file is too large to render.