mlinmg committed on
Commit 1697893
1 Parent(s): 0f797a2

Update README.md

Files changed (1)
  1. README.md +25 -17
README.md CHANGED
@@ -1,6 +1,7 @@
  ---
- license:
- - other
  language:
  - en
  pipeline_tag: text-generation
@@ -25,31 +26,38 @@ AstraQuasar-4B-v.0.1 at the moment is an under trained model. Serving as a demon

  One of the key milestones achieved by AstraQuasar-4B is its successful application of backpropagation on the duplication trick, setting a precedent for future research and development in this area.

  Our model's architecture is fully compatible with leading training frameworks such as [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) and [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory), ensuring seamless integration into existing workflows leveraging the standard Hugging Face Transformers library.

  ## Example:
  AstraQuasar-4B can be easily instantiated using the Hugging Face Transformers library:

- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- model = AutoModelForCausalLM.from_pretrained("AstraMindAI/AstraQuasar-4B", trust_remote_code=True)
- tokenizer = AutoTokenizer.from_pretrained("AstraMindAI/AstraQuasar-4B")
-
- # you can optionally disable the duplicate trick
- # model.model.duplicate_trick = False
-
- # You can also disable the duplicate gradient calculation during training
- # model.model.duplicate_grad = False
-
- # You can specify the layer ranges for the duplicate trick
- # model.model.layer_ranges = [(0, 16),(8, 24),(17, 32),(25, 40),(33, 49),(40, 56)]
-
- prompt = "This is an example script ."
- inputs = tokenizer(prompt, return_tensors="pt")
-
- # Generate
- generate_ids = model.generate(inputs.input_ids, max_length=30)
- tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

  Pre-training and fine-tuning can be performed using **accelerate** or **deepspeed**.
 
 
  ---
+ license: other
+ license_name: quasar-license
+ license_link: https://huggingface.co/AstraMindAI/AstraQuasar-4B/blob/main/LICENSE
  language:
  - en
  pipeline_tag: text-generation
 

  One of the key milestones achieved by AstraQuasar-4B is its successful application of backpropagation on the duplication trick, setting a precedent for future research and development in this area.

+ The use of the duplicate trick has been shown to instantly decrease the loss by ~21% with no added instability:
+ <p align="center">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/644ba0c76ebb3ebf7264dbe9/V0QJe2S1y7pJfukFArsQ_.png"/>
+ </p>
+
  Our model's architecture is fully compatible with leading training frameworks such as [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) and [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory), ensuring seamless integration into existing workflows leveraging the standard Hugging Face Transformers library.

  ## Example:
  AstraQuasar-4B can be easily instantiated using the Hugging Face Transformers library:

+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model = AutoModelForCausalLM.from_pretrained("AstraMindAI/AstraQuasar-4B", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained("AstraMindAI/AstraQuasar-4B")
+
+ # You can optionally disable the duplicate trick
+ # model.model.duplicate_trick = False
+
+ # You can also disable the duplicate gradient calculation during training
+ # model.model.duplicate_grad = False
+
+ # You can specify the layer ranges for the duplicate trick
+ # model.model.layer_ranges = [(0, 16), (8, 24), (17, 32), (25, 40), (33, 49), (40, 56)]
+
+ prompt = "This is an example script."
+ inputs = tokenizer(prompt, return_tensors="pt")
+
+ # Generate
+ generate_ids = model.generate(inputs.input_ids, max_length=30)
+ tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+ ```
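
For intuition about the `layer_ranges` above: the overlapping ranges suggest the decoder stack is traversed range by range, so layers covered by two ranges are executed more than once (the "duplication"), and `duplicate_grad` controls whether gradients propagate through those repeated passes. The snippet below is only a minimal sketch of that idea, not the model's actual implementation; `DummyLayer` and `run_with_layer_ranges` are hypothetical helpers introduced purely for illustration.

```python
import torch
import torch.nn as nn

class DummyLayer(nn.Module):
    """Stand-in for a decoder layer (hypothetical, illustration only)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Simple residual update so repeated application stays well-behaved.
        return hidden_states + torch.tanh(self.proj(hidden_states))

def run_with_layer_ranges(layers, hidden_states, layer_ranges):
    # Traverse the stack range by range; layers inside overlapping
    # ranges are executed more than once ("duplicated").
    for start, end in layer_ranges:
        for layer in layers[start:end]:
            hidden_states = layer(hidden_states)
    return hidden_states

# Toy example: 8 layers traversed as two overlapping ranges.
layers = nn.ModuleList(DummyLayer(64) for _ in range(8))
x = torch.randn(1, 16, 64)
out = run_with_layer_ranges(layers, x, layer_ranges=[(0, 6), (4, 8)])
print(out.shape)  # torch.Size([1, 16, 64])
```

Disabling `duplicate_trick` presumably falls back to a single start-to-finish pass over the stack.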
 
  Pre-training and fine-tuning can be performed using **accelerate** or **deepspeed**.
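
As a rough illustration of the **accelerate** path, a bare-bones fine-tuning step might look like the sketch below (run with `accelerate launch train.py`). This is an assumption-laden outline: the toy text, optimizer choice, and learning rate are placeholders, not a recipe shipped with the model.

```python
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer

accelerator = Accelerator()

model = AutoModelForCausalLM.from_pretrained("AstraMindAI/AstraQuasar-4B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("AstraMindAI/AstraQuasar-4B")

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model, optimizer = accelerator.prepare(model, optimizer)

# Placeholder batch; swap in a real dataset and DataLoader for actual training.
batch = tokenizer("This is an example script.", return_tensors="pt").to(accelerator.device)
batch["labels"] = batch["input_ids"].clone()

model.train()
outputs = model(**batch)             # causal-LM loss computed against the labels
accelerator.backward(outputs.loss)   # gradients also flow through the duplicated layers
optimizer.step()
optimizer.zero_grad()
```

The same loop can be backed by **deepspeed** through accelerate's DeepSpeed integration, configured via `accelerate config`.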