mlinmg committed on
Commit 1697893
1 Parent(s): 0f797a2

Update README.md

Files changed (1)
  1. README.md +25 -17
README.md CHANGED
@@ -1,6 +1,7 @@
  ---
- license:
- - other
  language:
  - en
  pipeline_tag: text-generation
@@ -25,31 +26,38 @@ AstraQuasar-4B-v.0.1 at the moment is an under trained model. Serving as a demon

  One of the key milestones achieved by AstraQuasar-4B is its successful application of backpropagation on the duplication trick, setting a precedent for future research and development in this area.

  Our model's architecture is fully compatible with leading training frameworks such as [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) and [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory), ensuring seamless integration into existing workflows leveraging the standard Hugging Face Transformers library.

  ## Example:
  AstraQuasar-4B can be easily instantiated using the Hugging Face Transformers library:

- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- model = AutoModelForCausalLM.from_pretrained("AstraMindAI/AstraQuasar-4B", trust_remote_code=True)
- tokenizer = AutoTokenizer.from_pretrained("AstraMindAI/AstraQuasar-4B")
-
- # you can optionally disable the duplicate trick
- # model.model.duplicate_trick = False
-
- # You can also disable the duplicate gradient calculation during training
- # model.model.duplicate_grad = False
-
- # You can specify the layer ranges for the duplicate trick
- # model.model.layer_ranges = [(0, 16),(8, 24),(17, 32),(25, 40),(33, 49),(40, 56)]
-
- prompt = "This is an example script ."
- inputs = tokenizer(prompt, return_tensors="pt")
-
- # Generate
- generate_ids = model.generate(inputs.input_ids, max_length=30)
- tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

  Pre-training and fine-tuning can be performed using **accelerate** or **deepspeed**.
 
 
  ---
+ license: other
+ license_name: quasar-license
+ license_link: https://huggingface.co/AstraMindAI/AstraQuasar-4B/blob/main/LICENSE
  language:
  - en
  pipeline_tag: text-generation
 

  One of the key milestones achieved by AstraQuasar-4B is its successful application of backpropagation on the duplication trick, setting a precedent for future research and development in this area.

+ The use of the duplicate trick has been shown to instantly decrease the loss by ~21% with no added instability:
+ <p align="center">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/644ba0c76ebb3ebf7264dbe9/V0QJe2S1y7pJfukFArsQ_.png"/>
+ </p>
+
  Our model's architecture is fully compatible with leading training frameworks such as [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) and [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory), ensuring seamless integration into existing workflows leveraging the standard Hugging Face Transformers library.

  ## Example:
  AstraQuasar-4B can be easily instantiated using the Hugging Face Transformers library:

+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model = AutoModelForCausalLM.from_pretrained("AstraMindAI/AstraQuasar-4B", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained("AstraMindAI/AstraQuasar-4B")
+
+ # You can optionally disable the duplicate trick
+ # model.model.duplicate_trick = False
+
+ # You can also disable the duplicate gradient calculation during training
+ # model.model.duplicate_grad = False
+
+ # You can specify the layer ranges for the duplicate trick
+ # model.model.layer_ranges = [(0, 16), (8, 24), (17, 32), (25, 40), (33, 49), (40, 56)]
+
+ prompt = "This is an example script."
+ inputs = tokenizer(prompt, return_tensors="pt")
+
+ # Generate
+ generate_ids = model.generate(inputs.input_ids, max_length=30)
+ tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+ ```
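
For intuition about the `layer_ranges` above: the overlapping ranges suggest the decoder stack is traversed range by range, so layers covered by two ranges are executed more than once (the "duplication"), and `duplicate_grad` controls whether gradients propagate through those repeated passes. The snippet below is only a minimal sketch of that idea, not the model's actual implementation; `DummyLayer` and `run_with_layer_ranges` are hypothetical helpers introduced purely for illustration.

```python
import torch
import torch.nn as nn

class DummyLayer(nn.Module):
    """Stand-in for a decoder layer (hypothetical, illustration only)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Simple residual update so repeated application stays well-behaved.
        return hidden_states + torch.tanh(self.proj(hidden_states))

def run_with_layer_ranges(layers, hidden_states, layer_ranges):
    # Traverse the stack range by range; layers inside overlapping
    # ranges are executed more than once ("duplicated").
    for start, end in layer_ranges:
        for layer in layers[start:end]:
            hidden_states = layer(hidden_states)
    return hidden_states

# Toy example: 8 layers traversed as two overlapping ranges.
layers = nn.ModuleList(DummyLayer(64) for _ in range(8))
x = torch.randn(1, 16, 64)
out = run_with_layer_ranges(layers, x, layer_ranges=[(0, 6), (4, 8)])
print(out.shape)  # torch.Size([1, 16, 64])
```

Disabling `duplicate_trick` presumably falls back to a single start-to-finish pass over the stack.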
 
  Pre-training and fine-tuning can be performed using **accelerate** or **deepspeed**.
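
As a rough illustration of the **accelerate** path, a bare-bones fine-tuning step might look like the sketch below (run with `accelerate launch train.py`). This is an assumption-laden outline: the toy text, optimizer choice, and learning rate are placeholders, not a recipe shipped with the model.

```python
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer

accelerator = Accelerator()

model = AutoModelForCausalLM.from_pretrained("AstraMindAI/AstraQuasar-4B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("AstraMindAI/AstraQuasar-4B")

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model, optimizer = accelerator.prepare(model, optimizer)

# Placeholder batch; swap in a real dataset and DataLoader for actual training.
batch = tokenizer("This is an example script.", return_tensors="pt").to(accelerator.device)
batch["labels"] = batch["input_ids"].clone()

model.train()
outputs = model(**batch)             # causal-LM loss computed against the labels
accelerator.backward(outputs.loss)   # gradients also flow through the duplicated layers
optimizer.step()
optimizer.zero_grad()
```

The same loop can be backed by **deepspeed** through accelerate's DeepSpeed integration, configured via `accelerate config`.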