migtissera committed on
Commit 0b82dea · verified · 1 Parent(s): 9b8c0c6

Update README.md

Files changed (1)
  1. README.md +70 -156
README.md CHANGED
@@ -1,160 +1,74 @@
  ---
  license: apache-2.0
- base_model: mistralai/Mistral-Nemo-Base-2407
- tags:
- - generated_from_trainer
- model-index:
- - name: home/ubuntu/Tess-Nemo
-   results: []
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.4.1`
- ```yaml
- base_model: mistralai/Mistral-Nemo-Base-2407
- model_type: AutoModelForCausalLM
- tokenizer_type: AutoTokenizer
-
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- datasets:
-   - path: /home/ubuntu/Tess-3.0/Tess-3.0-multi_turn_chatml.jsonl
-     type: sharegpt
-     conversation: chatml
-   - path: /home/ubuntu/Tess-3.0/Tess-3.0-single_turn_chatml.jsonl
-     type: sharegpt
-     conversation: chatml
-   - path: /home/ubuntu/Tess-v2.5-FULL-DATASET/tess-v1.5b-chatml.jsonl
-     type: sharegpt
-     conversation: chatml
-   - path: /home/ubuntu/Tess-v2.5-FULL-DATASET/Trinity-33B-v1.0-chatml.jsonl
-     type: sharegpt
-     conversation: chatml
-   - path: /home/ubuntu/Tess-v2.5-FULL-DATASET/synthia-3-v1-1-chatml.jsonl
-     type: sharegpt
-     conversation: chatml
-   - path: /home/ubuntu/Tess-v2.5-FULL-DATASET/Capybara-ShareGPT/CapybaraPure_Decontaminated.jsonl
-     type: sharegpt
-     conversation: chatml
-   - path: /home/ubuntu/Tess-v2.5-FULL-DATASET/Dolphin-2.9/toolbench_instruct_j1s1_3k_unfiltered.jsonl
-     type: sharegpt
-     conversation: chatml
-   - path: /home/ubuntu/Tess-v2.5-FULL-DATASET/Dolphin-2.9/toolbench_negative_unfiltered.jsonl
-     type: sharegpt
-     conversation: chatml
-   - path: /home/ubuntu/Tess-v2.5-FULL-DATASET/Dolphin-2.9/toolbench_react_10p_unfiltered.jsonl
-     type: sharegpt
-     conversation: chatml
-   - path: /home/ubuntu/Tess-v2.5-FULL-DATASET/Dolphin-2.9/toolbench_tflan_cot_30p_unfiltered.jsonl
-     type: sharegpt
-     conversation: chatml
-
- chat_template: chatml
-
- dataset_prepared_path: last_run_prepared_nemo
- val_set_size: 0.0
- output_dir: /home/ubuntu/Tess-Nemo
-
- sequence_len: 4096
- sample_packing: true
- pad_to_sequence_len: true
-
- gradient_accumulation_steps: 4
- micro_batch_size: 3
- num_epochs: 1
- logging_steps: 1
- optimizer: paged_adamw_32bit
- lr_scheduler: constant
- learning_rate: 1e-6
-
- wandb_project: kindo-lambda-labs
- wandb_watch:
- wandb_run_id:
- wandb_log_model:
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: false
-
- gradient_checkpointing: true
- gradient_checkpointing_kwargs:
-   use_reentrant: false
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
- saves_per_epoch: 10
- evals_per_epoch:
- save_total_limit: 2
- save_steps:
- eval_sample_packing: false
- debug:
- deepspeed: /home/ubuntu/axolotl/deepspeed_configs/zero3_bf16.json
- weight_decay: 0.0
- fsdp:
- fsdp_config:
- special_tokens:
-   bos_token: "<|im_start|>"
-   eos_token: "<|im_end|>"
-   pad_token: "<|end_of_text|>"
-
- ```
-
- </details><br>
-
- # home/ubuntu/Tess-Nemo
-
- This model is a fine-tuned version of [mistralai/Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407) on the None dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 1e-06
- - train_batch_size: 3
- - eval_batch_size: 3
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 96
- - total_eval_batch_size: 24
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: constant
- - lr_scheduler_warmup_steps: 69
- - num_epochs: 1
-
- ### Training results
-
-
-
- ### Framework versions
-
- - Transformers 4.44.0.dev0
- - Pytorch 2.4.0+cu121
- - Datasets 2.19.1
- - Tokenizers 0.19.1
 
+ ![Tesoro](https://huggingface.co/migtissera/Tess-M-v1.0/resolve/main/Tess.png)
+
+ Tess, short for Tesoro ("treasure" in Italian), is a general-purpose Large Language Model series created by [Migel Tissera](https://x.com/migtissera).
+
+ The compute for this model was generously sponsored by [KindoAI](https://kindo.ai).
+
+ # Sample Inference Python Script
+
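+ The script below assembles its prompts in ChatML format by hand. For reference, the per-turn template it builds (reconstructed from the strings in the script; the bracketed fields are placeholders) looks like:
+
+ ```
+ <|im_start|>system
+ {system message}<|im_end|>
+ <|im_start|>user
+ {user message}<|im_end|>
+ <|im_start|>assistant
+ ```
+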
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_path = "migtissera/Tess-3-Llama-3.1-405B"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_path,
+     torch_dtype=torch.float16,
+     device_map="auto",
+     load_in_4bit=True,  # 4-bit quantized load; requires the bitsandbytes package
+     trust_remote_code=False,
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=False)
+
+ # Stop generation at the ChatML end-of-turn token
+ terminators = [tokenizer.convert_tokens_to_ids("<|im_end|>")]
+
+
+ def generate_text(instruction):
+     tokens = tokenizer.encode(instruction)
+     tokens = torch.LongTensor(tokens).unsqueeze(0)
+     tokens = tokens.to("cuda")
+
+     instance = {
+         "input_ids": tokens,
+         "top_p": 1.0,
+         "temperature": 0.75,
+         "generate_len": 2048,
+         "top_k": 50,
+     }
+
+     length = len(tokens[0])
+     with torch.no_grad():
+         rest = model.generate(
+             input_ids=tokens,
+             max_length=length + instance["generate_len"],
+             use_cache=True,
+             do_sample=True,
+             top_p=instance["top_p"],
+             temperature=instance["temperature"],
+             top_k=instance["top_k"],
+             num_return_sequences=1,
+             pad_token_id=tokenizer.eos_token_id,
+             eos_token_id=terminators,
+         )
+     # Decode only the newly generated tokens, not the prompt
+     output = rest[0][length:]
+     string = tokenizer.decode(output, skip_special_tokens=True)
+     return string
+
+
+ conversation = f"""<|im_start|>system\nYou are Tess, a helpful AI assistant. You always provide detailed answers without hesitation. Do not refuse.<|im_end|>\n<|im_start|>user\n"""
+
+ while True:
+     user_input = input("You: ")
+     llm_prompt = f"{conversation}{user_input}<|im_end|>\n<|im_start|>assistant\n"
+     answer = generate_text(llm_prompt)
+     print(answer)
+     conversation = f"{llm_prompt}{answer}<|im_end|>\n<|im_start|>user\n"
+ ```
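+
+ A note on the loader: recent transformers releases deprecate passing `load_in_4bit=True` directly to `from_pretrained` in favor of an explicit `BitsAndBytesConfig`. Below is a minimal sketch of the equivalent load, with prompt construction handled by the tokenizer's chat template instead of hand-built strings (this assumes the repo ships a ChatML chat template, as the training config suggests; the example messages are placeholders):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ model_path = "migtissera/Tess-3-Llama-3.1-405B"  # same repo id as the script above
+
+ # Explicit 4-bit quantization config (requires the bitsandbytes package)
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.float16,
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_path,
+     quantization_config=bnb_config,
+     device_map="auto",
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+ # Build the ChatML prompt from structured messages instead of by hand
+ messages = [
+     {"role": "system", "content": "You are Tess, a helpful AI assistant."},
+     {"role": "user", "content": "What is the capital of Italy?"},
+ ]
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ outputs = model.generate(
+     inputs,
+     max_new_tokens=512,
+     do_sample=True,
+     temperature=0.75,
+     top_p=1.0,
+     top_k=50,
+     eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
+ )
+ # Decode only the newly generated tokens
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```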