Update README.md
---
library_name: transformers
tags:
- mergekit
- merge
license: cc-by-sa-4.0
language:
- ko
---

# Megakiqu-120b

<img src="./megakiqu.jpg" alt="megakiqu-120B" width="390"/>

A model extended with the passthrough method, in the same way as MegaDolphin and Venus.

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

This model was merged using the passthrough merge method.

The following models were included in the merge:
* [maywell/kiqu-70b](https://huggingface.co/maywell/kiqu-70b)
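A passthrough merge copies whole transformer layers into a new, deeper stack without averaging any weights. A toy sketch of the idea, with plain lists standing in for layer modules (the slice ranges here are illustrative, not the actual config used for this model):

```python
def passthrough_merge(model_layers, slices):
    """Build the merged layer stack by concatenating the requested
    [start, end) layer slices; layers may repeat across overlapping slices."""
    merged = []
    for start, end in slices:
        merged.extend(model_layers[start:end])
    return merged

# kiqu-70b (a Llama-style 70B model) has 80 decoder layers.
layers = [f"layer_{i}" for i in range(80)]
# Illustrative overlapping slices; the real config for this merge is longer.
merged = passthrough_merge(layers, [(0, 40), (20, 60), (40, 80)])
print(len(merged))  # 120
```

Because overlapping slices duplicate layers, the merged model is deeper (and larger) than its source while reusing only its trained weights.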
## Original Model Card

# **kiqu-70b** [(Arena Leaderboard)](https://huggingface.co/spaces/instructkr/ko-chatbot-arena-leaderboard)

**kiqu-70b** is an SFT+DPO-trained model based on Miqu-70B-Alpaca-DPO, using **Korean** datasets.

Since this model is a fine-tune of miqu-1-70b, a leaked early version of Mistral-Medium, using it for commercial purposes is at your own risk.

Beyond that, the model itself follows **cc-by-sa-4.0**.

# **Model Details**

**Base Model**
miqu-1-70b (Early Mistral-Medium)

**Instruction format**

It follows the **Mistral** format.
Giving few-shot examples to the model is highly recommended.

```
[INST] {instruction}
[/INST] {output}
```

Multi-shot
```
[INST] {instruction}
[/INST] {output}
[INST] {instruction}
[/INST] {output}
[INST] {instruction}
[/INST] {output}
.
.
.
```
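Assembled literally, the turns above reduce to simple string concatenation. A minimal sketch (`build_prompt` is a hypothetical helper, and the exact newline placement between turns is an assumption, since the card only shows the block layout):

```python
def build_prompt(shots, instruction):
    """Assemble a Mistral-format multi-shot prompt: each few-shot pair is an
    [INST] ... [/INST] ... turn, then the final instruction is left open."""
    parts = [f"[INST] {inst}\n[/INST] {out}\n" for inst, out in shots]
    parts.append(f"[INST] {instruction}\n[/INST]")
    return "".join(parts)

prompt = build_prompt([("What is 2+2?", "4")], "What is 3+3?")
print(prompt)
```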

**Recommended Template** - 1-shot with system prompt
```
너는 kiqu-70B라는 한국어에 특화된 언어모델이야. 깔끔하고 자연스럽게 대답해줘!
[INST] 안녕?
[/INST] 안녕하세요! 무엇을 도와드릴까요? 질문이나 궁금한 점이 있다면 언제든지 말씀해주세요.
[INST] {instruction}
[/INST]
```

A trailing space after [/INST] can affect the model's performance by a significant margin, so when doing inference it is recommended not to include a trailing space in the chat template.

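This caveat is easy to enforce mechanically; a minimal guard (a hypothetical helper, not part of any official chat template) that strips an accidental trailing space before the prompt is sent for inference:

```python
def finalize_prompt(prompt: str) -> str:
    """Strip trailing spaces so the prompt ends exactly at '[/INST]';
    a trailing space changes how the model's next token is tokenized."""
    return prompt.rstrip(" ")

print(repr(finalize_prompt("[INST] hello?\n[/INST] ")))
```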
### Configuration

The following mergekit YAML configuration was used to produce this model:

```yaml
dtype: bfloat16
slices:
# …
- sources:
  - layer_range: [60, 80]
    model: maywell/kiqu-70b
```
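Configs like this can be sanity-checked before launching an expensive merge run. A small sketch using PyYAML; the string below mirrors only the fragment shown above, since the full slice list is elided in this view:

```python
import yaml  # PyYAML

# Mirrors only the visible fragment of the merge config.
fragment = """
dtype: bfloat16
slices:
- sources:
  - layer_range: [60, 80]
    model: maywell/kiqu-70b
"""

cfg = yaml.safe_load(fragment)
source = cfg["slices"][-1]["sources"][0]
start, end = source["layer_range"]
assert 0 <= start < end <= 80  # kiqu-70b has 80 decoder layers
print(source["model"], end - start)  # maywell/kiqu-70b 20
```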