tsunemoto committed on
Commit
28a7679
1 Parent(s): 215c6f6

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,17 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ cosmo-1b.Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
37
+ cosmo-1b.Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
38
+ cosmo-1b.Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
39
+ cosmo-1b.Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
40
+ cosmo-1b.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
41
+ cosmo-1b.Q4_1.gguf filter=lfs diff=lfs merge=lfs -text
42
+ cosmo-1b.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
43
+ cosmo-1b.Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
44
+ cosmo-1b.Q5_0.gguf filter=lfs diff=lfs merge=lfs -text
45
+ cosmo-1b.Q5_1.gguf filter=lfs diff=lfs merge=lfs -text
46
+ cosmo-1b.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
47
+ cosmo-1b.Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
48
+ cosmo-1b.Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
49
+ cosmo-1b.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,103 @@
1
+ ---
2
+ title: "cosmo-1b Quantized in GGUF"
3
+ tags:
4
+ - GGUF
5
+ language: en
6
+ ---
7
+ ![Image description](https://i.postimg.cc/MGwhtFfF/tsune-fixed.png)
8
+
9
+ # Tsunemoto GGUFs of cosmo-1b
10
+
11
+ This is a GGUF quantization of cosmo-1b.
12
+
13
+ ## Original Repo Link:
14
+ [Original Repository](https://huggingface.co/HuggingFaceTB/cosmo-1b)
15
+
16
+ ## Original Model Card:
17
+ ---
18
+
19
+ # Model Summary
20
+ This is a 1.8B model trained on the [Cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia) synthetic dataset.
21
+
22
+ # Training dataset
23
+ The training corpus consisted of 30B tokens, 25B of which are synthetic from Cosmopedia. Since we didn't explore the synthetic generation of code, we augmented the dataset with 5B tokens of non-synthetic sources like the `code-python-0.60-to-1.00` and `web-0.50-to-1.00` subsets of [AutoMathText](https://huggingface.co/datasets/math-ai/AutoMathText). We also added 1M files from [The Stack](https://huggingface.co/datasets/bigcode/the-stack)'s Jupyter Notebooks, converted to script. They tend to have educational code interleaved with text.
24
+ We also included [ultrachat](https://huggingface.co/datasets/stingning/ultrachat) formatted in the chat format of `LlaMa` models, so we don't have to instruction-tune the model after the pre-training. Additionally, we upsampled the data from these seed sources twice to help with commonsense and reasoning: stories, AutoMathText & KhanAcademy.
25
+
26
+ We trained for 6 epochs, resulting in a model trained on 180B tokens with a sequence length of 2k, a global batch size of 1.3M tokens and a learning rate of 3e-4 with a cosine schedule for 140k steps.
27
+ We used the tokenizer from [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1/).
28
+
29
+ # How to use
30
+
31
+ The model wasn't instruction-tuned after the pre-training. However, because we included UltraChat in the pre-training data, you can use it in a chat format:
32
+
33
+ ```python
34
+ from transformers import AutoModelForCausalLM, AutoTokenizer
35
+
36
+ device = "cuda" # for GPU usage or "cpu" for CPU usage
37
+
38
+ tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo-1b")
39
+ model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/cosmo-1b").to(device)
40
+ prompt = "Generate a story involving a dog, an astronaut and a baker"
41
+ prompt = tokenizer.apply_chat_template([{"role": "user", "content": prompt}], tokenize=False)
42
+ inputs = tokenizer(prompt, return_tensors="pt").to(device)
43
+
44
+ output = model.generate(**inputs, max_length=300, do_sample=True, temperature=0.6, top_p=0.95, repetition_penalty=1.2)
45
+ print(tokenizer.decode(output[0]))
46
+ ```
47
+ Output:
48
+ ```
49
+ <s><s> [INST] Generate a story involving a dog, an astronaut and a baker [/INST] Once upon a time, there was a sweet little terrier named Max who lived in the bustling city of New York. He loved nothing more than chasing after his owner, Dr. Sarah Johnson, as she worked tirelessly on her latest invention - a spaceship that would take humans to distant galaxies!
50
+
51
+ One day, Dr. Johnson decided it was time for her to leave Earth's atmosphere behind and embark on this exciting adventure with her loyal companion, Max. She knew he had always been fascinated by space travel, so she hoped he would be just as excited about the journey ahead.
52
+
53
+ As they boarded their rocket ship and blasted off into outer space, Max felt both nervous and thrilled at the same time. His ears perked up every time they passed clouds or saw stars twinkling far out from earth. But as days turned into weeks, Max started feeling homesick. The vast emptiness around him made him feel lonely and isolated.
54
+
55
+ Meanwhile back on planet Earth, Mr. Baker was busy baking cookies when suddenly, an idea popped into his head. Why not send some treats along with Dr. Johnson's family? It might make them all feel better knowing that someone else was also having fun exploring the universe.
56
+ ```
57
+
58
+ You can also use the model in text completion mode, i.e. without applying the chat template, but it might not follow instructions.
59
+
60
+ ```python
61
+ from transformers import AutoModelForCausalLM, AutoTokenizer
62
+
63
+ device = "cuda" # for GPU usage or "cpu" for CPU usage
64
+
65
+ tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo-1b")
66
+ model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/cosmo-1b").to(device)
67
+ prompt = "Photosynthesis is"
68
+
69
+ inputs = tokenizer(prompt, return_tensors="pt").to(device)
70
+ output = model.generate(**inputs, max_length=300, do_sample=True, temperature=0.6, top_p=0.95, repetition_penalty=1.2)
71
+ print(tokenizer.decode(output[0]))
72
+ ```
73
+ Output:
74
+ ```
75
+ <s> Photosynthesis is the process by which green plants, algae and some bacteria convert light energy into chemical energy in order to fuel their metabolic processes. The reaction takes place within specialized cells called chloroplasts. This article focuses on the electron transport chain (ETC), a critical part of photosystem II where most of the solar-driven electrons are passed through before being reduced to water.
76
+ ```
77
+ # Evaluation
78
+ Below are the evaluation results of Cosmo-1B. The model is better than TinyLlama 1.1B on ARC-easy, ARC-challenge, OpenBookQA and MMLU, and has comparable performance to Qwen-1.5-1B on ARC-challenge and OpenBookQA.
79
+ However, we notice some performance gaps compared to Phi-1.5, suggesting that its synthetic data is of higher quality, which may be related to the LLM used for generation, topic coverage, or prompts.
80
+
81
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/GgWzl6k9BO9jGhGd5O45y.png)
82
+
83
+ # Limitations
84
+
85
+ This is a small 1.8B model trained on synthetic data, so it might hallucinate or give incomplete or incorrect answers.
86
+
87
+ # Training
88
+
89
+ ## Model
90
+
91
+ - **Architecture:** Llama-2
92
+ - **Pretraining steps:** 120k
93
+ - **Pretraining tokens:** 180B
94
+ - **Precision:** bfloat16
95
+
96
+ ## Hardware
97
+
98
+ - **GPUs:** 160 H100
99
+ - **Training time:** 15 hours
100
+
101
+ The training loss:
102
+
103
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/rJobY7F6tqTAvIox1ZGKR.png)
cosmo-1b.Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e7815dd3b28dba1fd8a6341fe64c0b79112fa22ef34e3012c402a2b94bffa0c0
3
+ size 666220032
cosmo-1b.Q3_K_L.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8e2445bbfa2cc1b3133f5b493fbdeca0c72636aa8b82e0ec1c1226f6f5ba8d46
3
+ size 930825728
cosmo-1b.Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0ec9279948cc4267ff24750c10931305550243db0834f854e9b0a7b9bbfcd51b
3
+ size 858473984
cosmo-1b.Q3_K_S.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dc2efc0ae4a4e58a1880889d7f95c236e10759a1e8d5f23eb51363bdb60554ff
3
+ size 775112192
cosmo-1b.Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a2555db0878f5c69d60df3d518e077123ff9b1b400fd05fc2fb50f29001fcd2f
3
+ size 997725696
cosmo-1b.Q4_1.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:153e8ea9465abe73f928b6e7709402899a29256fef3e79bcec4ac6d01b1543a7
3
+ size 1102484992
cosmo-1b.Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a11bab5481b772e6dee06cb7b20792dd1e81413c612d47707af33a0550b1989e
3
+ size 1062606336
cosmo-1b.Q4_K_S.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:db3cfe23b5daf645f2dadf5d81cd2b156645e37c6dae8eaee7e34d83b950972c
3
+ size 1006114304
cosmo-1b.Q5_0.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0fedde974d371e3790a73d12c83845560de84569dea60ef97e49a0e9b970e7d9
3
+ size 1207244288
cosmo-1b.Q5_1.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:28eefab2de7053b6bf165e62cd4a33cfca987b293bcd079cfb5309ab639449d7
3
+ size 1312003584
cosmo-1b.Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3f300ead874f72b6d0383d3f25815f9bc5541bde41bdb0656959a1e6ec74952d
3
+ size 1240667648
cosmo-1b.Q5_K_S.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:99037af0bb15a4419cff749cc04ed364cf330b810cf01f910b8e86dadf6f9c1d
3
+ size 1207244288
cosmo-1b.Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0c34df1b64dc8c7daf31024c52a43731d091fd02303a390ed9244f344dd59a30
3
+ size 1429857792
cosmo-1b.Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7ec3a04f5fb12535468f885a17fb2edcb2a429869ca6d04fc49ce6d29dff233a
3
+ size 1851672064
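
Each `.gguf` entry above is a Git LFS pointer rather than the model file itself: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. The following is a minimal sketch of how such a pointer could be read and used to check a downloaded file. The helper names `parse_lfs_pointer` and `sha256_of_file` are hypothetical and not part of this commit or of any library; the pointer text is copied from the `cosmo-1b.Q2_K.gguf` entry.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a local file in chunks so large GGUF files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Pointer contents for cosmo-1b.Q2_K.gguf, as shown in the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:e7815dd3b28dba1fd8a6341fe64c0b79112fa22ef34e3012c402a2b94bffa0c0
size 666220032
"""

info = parse_lfs_pointer(pointer)
size_mib = info["size"] / 2**20
print(f"{info['hash_algo']} {info['digest'][:12]}... {size_mib:.1f} MiB")
```

After downloading a file, comparing `sha256_of_file(path)` with `info["digest"]` and the on-disk size with `info["size"]` would indicate whether the download matches the pointer; the local path is an assumption left to the reader.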