---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
language:
- en
tags:
- llama
- llama 2
---
# TinyLlama-1.1B-32k

A 32k-context finetune of TinyLlama-1.1B using an increased rope theta (rope frequency base), intended to serve as a long-context draft model for speculative decoding.
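
For intuition on the rope theta change: Llama-style RoPE rotates each pair of head dimensions at a wavelength of 2π·θ^(2i/d) tokens, so raising the base θ stretches every wavelength and keeps positions distinguishable well past the original 2048-token window. A minimal sketch of that effect; the exact θ used for this finetune isn't stated in this card, so the 1e6 below is purely illustrative:

```python
import math

def rope_wavelengths(theta: float, head_dim: int = 64) -> list[float]:
    # Wavelength (in tokens) of each rotary dimension pair: 2*pi / inv_freq,
    # where inv_freq = theta ** (-2*i / head_dim), as in Llama-style RoPE.
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]

# Llama's default base vs. an increased base (illustrative value only,
# not necessarily the one used for this model).
for theta in (10_000.0, 1_000_000.0):
    longest = max(rope_wavelengths(theta))
    print(f"theta={theta:>11,.0f} -> longest wavelength ~ {longest:,.0f} tokens")
```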

Created by further pretraining [TinyLlama-1.1B](https://huggingface.co/TinyLlama/tinyLlama-intermediate-checkpoints-after-1T-token) at a context length of 32768 on [togethercomputer/RedPajama-Data-1T-Sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample).
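
Since the model is intended as a speculative-decoding draft, here is a usage sketch pairing it with a larger Llama 2 derivative via transformers' assisted generation. The repo ids below are assumptions, and drafting requires that both models share a tokenizer and vocabulary:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed ids: the draft is this card's model; the target is any larger
# Llama 2 derivative sharing TinyLlama's tokenizer and vocabulary.
draft_id = "Doctor-Shotgun/TinyLlama-1.1B-32k"
target_id = "meta-llama/Llama-2-13b-hf"

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.float16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Summarize the plot of Hamlet in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(target.device)

# Assisted generation: the 1.1B model drafts tokens cheaply and the large
# model verifies them in parallel, accepting matching prefixes.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```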

Of note, the base checkpoint used was the "final model" commit (fad4f1a5cd0563ac41349b8fec2e6e51156568a0), which was subsequently reverted, and not the current main-branch 3T checkpoint of TinyLlama-1.1B.

Wikitext (wikitext-2-raw-v1, train split) perplexity over 64 rows, as evaluated by [exllamav2](https://github.com/turboderp/exllamav2):

| Context length | Base model | 32k model |
|---------------:|-----------:|----------:|
| 2048           | 8.5633     | 8.6548    |
| 4096           | 208.3586   | 7.8339    |
| 8192           | 863.7507   | 7.4904    |
| 16384          | 1600.5021  | 7.3674    |
| 32768          | 6981.9021  | 7.1338    |
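
A comparable (but not identical) measurement can be taken with plain transformers: the sketch below scores non-overlapping windows of the wikitext-2-raw-v1 train split, whereas the table above comes from exllamav2's own evaluator over 64 rows, so expect the values to differ somewhat. The repo id is again an assumption:

```python
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Doctor-Shotgun/TinyLlama-1.1B-32k"  # assumed repo id
ctx_len = 4096  # pick one of the context lengths from the table above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
).eval()

# Concatenate the split and score non-overlapping ctx_len-token windows.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="train")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids[0]

losses = []
with torch.no_grad():
    for start in range(0, ids.numel() - ctx_len, ctx_len):
        window = ids[start : start + ctx_len].unsqueeze(0).to(model.device)
        # labels=window yields the mean next-token NLL over the window
        losses.append(model(window, labels=window).loss.item())

print(f"perplexity @ {ctx_len}: {math.exp(sum(losses) / len(losses)):.4f}")
```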