---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
language:
- en
tags:
- llama
- llama 2
---
# TinyLlama-1.1B-32k

A 32k-context finetune of TinyLlama-1.1B using an increased rope theta (rope frequency base), intended to serve as a long-context draft model for speculative decoding.
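
For intuition on the rope theta change: Llama-style RoPE rotates each pair of head dimensions at a wavelength of 2π·θ^(2i/d) tokens, so raising the base θ stretches every wavelength and keeps positions distinguishable well past the original 2048-token window. A minimal sketch of that effect; the exact θ used for this finetune isn't stated in this card, so the 1e6 below is purely illustrative:

```python
import math

def rope_wavelengths(theta: float, head_dim: int = 64) -> list[float]:
    # Wavelength (in tokens) of each rotary dimension pair: 2*pi / inv_freq,
    # where inv_freq = theta ** (-2*i / head_dim), as in Llama-style RoPE.
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]

# Llama's default base vs. an increased base (illustrative value only,
# not necessarily the one used for this model).
for theta in (10_000.0, 1_000_000.0):
    longest = max(rope_wavelengths(theta))
    print(f"theta={theta:>11,.0f} -> longest wavelength ~ {longest:,.0f} tokens")
```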

Created by further pretraining [TinyLlama-1.1B](https://huggingface.co/TinyLlama/tinyLlama-intermediate-checkpoints-after-1T-token) at a context length of 32768 on [togethercomputer/RedPajama-Data-1T-Sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample).
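
Since the model is intended as a speculative-decoding draft, here is a usage sketch pairing it with a larger Llama 2 derivative via transformers' assisted generation. The repo ids below are assumptions, and drafting requires that both models share a tokenizer and vocabulary:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed ids: the draft is this card's model; the target is any larger
# Llama 2 derivative sharing TinyLlama's tokenizer and vocabulary.
draft_id = "Doctor-Shotgun/TinyLlama-1.1B-32k"
target_id = "meta-llama/Llama-2-13b-hf"

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.float16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Summarize the plot of Hamlet in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(target.device)

# Assisted generation: the 1.1B model drafts tokens cheaply and the large
# model verifies them in parallel, accepting matching prefixes.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```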

Of note, the base checkpoint used was the "final model" commit (fad4f1a5cd0563ac41349b8fec2e6e51156568a0), which was subsequently reverted, and not the current main-branch 3T checkpoint of TinyLlama-1.1B.

Wikitext (wikitext-2-raw-v1, train split) perplexity over 64 rows, as evaluated by [exllamav2](https://github.com/turboderp/exllamav2):

| Context length | Base model | 32k model |
|---------------:|-----------:|----------:|
| 2048           | 8.5633     | 8.6548    |
| 4096           | 208.3586   | 7.8339    |
| 8192           | 863.7507   | 7.4904    |
| 16384          | 1600.5021  | 7.3674    |
| 32768          | 6981.9021  | 7.1338    |
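
A comparable (but not identical) measurement can be taken with plain transformers: the sketch below scores non-overlapping windows of the wikitext-2-raw-v1 train split, whereas the table above comes from exllamav2's own evaluator over 64 rows, so expect the values to differ somewhat. The repo id is again an assumption:

```python
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Doctor-Shotgun/TinyLlama-1.1B-32k"  # assumed repo id
ctx_len = 4096  # pick one of the context lengths from the table above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
).eval()

# Concatenate the split and score non-overlapping ctx_len-token windows.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="train")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids[0]

losses = []
with torch.no_grad():
    for start in range(0, ids.numel() - ctx_len, ctx_len):
        window = ids[start : start + ctx_len].unsqueeze(0).to(model.device)
        # labels=window yields the mean next-token NLL over the window
        losses.append(model(window, labels=window).loss.item())

print(f"perplexity @ {ctx_len}: {math.exp(sum(losses) / len(losses)):.4f}")
```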