---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
language:
- en
tags:
- llama
- llama 2
---
# TinyLlama-1.1B-32k

A 32k-context finetune of TinyLlama-1.1B using an increased RoPE theta (RoPE frequency base), intended to serve as a long-context draft model for speculative decoding.
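
For reference, raising the RoPE theta slows how quickly the rotary embedding angles grow with position, so offsets far beyond the original 2048-token training length stay in a range the model can handle. A minimal sketch of the standard RoPE frequency math (the increased theta value below is a common long-context choice and an assumption; this card does not state the exact value used):

```python
import torch

def rope_inv_freq(head_dim: int, theta: float) -> torch.Tensor:
    """Standard RoPE inverse frequencies, one per channel pair."""
    return 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))

# TinyLlama-1.1B uses 64-dim attention heads (hidden size 2048 / 32 heads).
# A larger theta means a smaller rotation per position, i.e. longer context.
default_freqs = rope_inv_freq(64, 10_000.0)       # Llama-family default theta
increased_freqs = rope_inv_freq(64, 1_000_000.0)  # hypothetical 32k setting
print(default_freqs[-1].item(), increased_freqs[-1].item())
```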

Created by further pretraining [TinyLlama-1.1B](https://huggingface.co/TinyLlama/tinyLlama-intermediate-checkpoints-after-1T-token) at a context length of 32768 on [togethercomputer/RedPajama-Data-1T-Sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample).
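
As a usage sketch, a draft model like this is typically paired with a larger Llama-architecture target via assisted generation in transformers. The target repo ID below is an illustrative assumption, not a tested pairing:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical pairing: any Llama-family target sharing the Llama tokenizer.
target_id = "meta-llama/Llama-2-13b-hf"
draft_id = "Doctor-Shotgun/TinyLlama-1.1B-32k"

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.float16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Summarize the following document:\n...", return_tensors="pt").to(target.device)
# assistant_model enables speculative (assisted) decoding: the small draft
# proposes several tokens per step and the target verifies them in parallel.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```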

Note that the base checkpoint used was the "final model" commit fad4f1a5cd0563ac41349b8fec2e6e51156568a0, which was subsequently reverted, and not the current main-branch 3T-token checkpoint of TinyLlama-1.1B.

Wikitext (wikitext-2-raw-v1_train) perplexity over 64 rows, as evaluated by [exllamav2](https://github.com/turboderp/exllamav2):

| Context length | Base model | 32k model |
|---------------:|-----------:|----------:|
| 2048           | 8.5633     | 8.6548    |
| 4096           | 208.3586   | 7.8339    |
| 8192           | 863.7507   | 7.4904    |
| 16384          | 1600.5021  | 7.3674    |
| 32768          | 6981.9021  | 7.1338    |
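
For context, fixed-window perplexity of this sort can be approximated with a short transformers script. This is an illustrative reimplementation under assumed settings, not exllamav2's evaluation code, so exact numbers will differ with tokenization and windowing:

```python
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Doctor-Shotgun/TinyLlama-1.1B-32k"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
).eval()

# First 64 rows of wikitext-2-raw-v1 train, joined and tokenized once.
rows = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")["text"][:64]
ids = tokenizer("\n\n".join(rows), return_tensors="pt").input_ids.to(model.device)

ctx = 4096  # window size under test (2048 ... 32768 in the table above)
assert ids.size(1) >= ctx, "need at least one full window of tokens"

nll, tokens = 0.0, 0
with torch.no_grad():
    for start in range(0, ids.size(1) - ctx + 1, ctx):
        window = ids[:, start : start + ctx]
        # labels=window returns the mean next-token cross-entropy
        loss = model(window, labels=window).loss
        nll += loss.item() * (ctx - 1)
        tokens += ctx - 1

print(f"{ctx}: {math.exp(nll / tokens):.4f}")
```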