Commit 5e75229 by sumo43
1 Parent(s): 2a3e2e9

Model save
README.md CHANGED
@@ -1,15 +1,12 @@
 ---
-base_model: lectura/TinyLlama-120M-news
+license: apache-2.0
+base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
 tags:
-- alignment-handbook
-- trl
-- sft
-- generated_from_trainer
 - trl
 - sft
 - generated_from_trainer
 datasets:
-- HuggingFaceH4/ultrachat_200k
+- generator
 model-index:
 - name: zephyr-7b-sft-full
   results: []
@@ -20,9 +17,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 # zephyr-7b-sft-full
 
-This model is a fine-tuned version of [lectura/TinyLlama-120M-news](https://huggingface.co/lectura/TinyLlama-120M-news) on the HuggingFaceH4/ultrachat_200k dataset.
+This model is a fine-tuned version of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.2349
+- Loss: 2.1644
 
 ## Model description
 
@@ -45,6 +42,7 @@ The following hyperparameters were used during training:
 - train_batch_size: 16
 - eval_batch_size: 8
 - seed: 42
+- distributed_type: multi-GPU
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: constant
 - lr_scheduler_warmup_ratio: 0.1
@@ -54,7 +52,7 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 2.2279 | 1.0 | 8969 | 2.2349 |
+| 2.2013 | 1.0 | 8969 | 2.1644 |
 
 
 ### Framework versions
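The card lists `lr_scheduler_warmup_ratio: 0.1` with 8969 optimizer steps but never states the resulting warmup length. A minimal stdlib-only sketch of the usual convention (ceiling of ratio times total steps, as in the Hugging Face Trainer; the helper name `warmup_steps` is ours, not part of the commit):

```python
import math

def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    # Common convention: warmup lasts ceil(ratio * total optimizer steps).
    return math.ceil(total_steps * warmup_ratio)

# With the values from the card above: 8969 steps, warmup_ratio 0.1.
print(warmup_steps(8969, 0.1))  # 897
```

So roughly the first 897 steps ramp the learning rate up before the constant schedule takes over.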
all_results.json CHANGED
@@ -1,13 +1,8 @@
 {
     "epoch": 1.0,
-    "eval_loss": 2.2348995208740234,
-    "eval_runtime": 2502.8496,
-    "eval_samples": 23109,
-    "eval_samples_per_second": 6.345,
-    "eval_steps_per_second": 0.793,
-    "train_loss": 2.666482654486044,
-    "train_runtime": 30048.5081,
+    "train_loss": 2.789915239123786,
+    "train_runtime": 47169.5182,
     "train_samples": 207864,
-    "train_samples_per_second": 4.775,
-    "train_steps_per_second": 0.298
+    "train_samples_per_second": 3.042,
+    "train_steps_per_second": 0.19
 }
generation_config.json CHANGED
@@ -1,7 +1,8 @@
 {
-    "_from_model_config": true,
     "bos_token_id": 1,
     "eos_token_id": 2,
+    "max_length": 2048,
+    "pad_token_id": 0,
     "transformers_version": "4.39.0.dev0",
     "use_cache": false
 }
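The resulting generation config can be sanity-checked with a short stdlib-only sketch (field values copied from the diff above; the check itself is ours, not part of the commit):

```python
import json

# Updated generation_config.json as committed (values from the diff).
updated = json.loads("""
{
    "bos_token_id": 1,
    "eos_token_id": 2,
    "max_length": 2048,
    "pad_token_id": 0,
    "transformers_version": "4.39.0.dev0",
    "use_cache": false
}
""")

# The commit drops "_from_model_config" and adds an explicit pad token
# and generation length cap.
assert "_from_model_config" not in updated
assert updated["max_length"] == 2048
assert updated["pad_token_id"] == 0
print(sorted(updated))
```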
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b2ee5e731eddb04f5e6ce3fd8f40d730c25f51406ceb10086d189b3be0f5046b
+oid sha256:8c54963b4ac54779cd0d017c620f11d9be54fd744b724b68336883377e1c3049
 size 2201749928
train_results.json CHANGED
@@ -1,8 +1,8 @@
 {
     "epoch": 1.0,
-    "train_loss": 2.666482654486044,
-    "train_runtime": 30048.5081,
+    "train_loss": 2.789915239123786,
+    "train_runtime": 47169.5182,
     "train_samples": 207864,
-    "train_samples_per_second": 4.775,
-    "train_steps_per_second": 0.298
+    "train_samples_per_second": 3.042,
+    "train_steps_per_second": 0.19
 }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff