xmanii/Llama3-8b-simorgh

xmanii committed on Jun 17

Commit 6107dec (1 parent: 0f6287a)

Update README.md

Files changed (1)

README.md +49 -34

@@ -1,45 +1,60 @@
-Model Information
-Developed by: xmanii License: Apache-2.0 Finetuned from model: unsloth/llama-3-8b-instruct-bnb-4bit
-This LLaMA model was fine-tuned on a unique Persian dataset of Alpaca chat conversations, consisting of approximately 8,000 rows. Our training process utilized two H100 GPUs, completing in just under 1 hour. We leveraged the power of Unsloth and Hugging Face's TRL library to accelerate our training process by 2x.
-<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>
-This model is open-source, and we invite the community to use and build upon our work. The fine-tuned LLaMA model is designed to improve Persian conversation capabilities, and we hope it will contribute to the advancement of natural language processing in the Persian language.
-Using Adapters with Unsloth
-To run the model with adapters, you can use the following code:
-(you need unsloth package)
-import torch
-from unsloth import FastLanguageModel
-from unsloth.chat_templates import get_chat_template
-model_save_path = "path to the download folder"  # Adjust this path as needed
-model, tokenizer = FastLanguageModel.from_pretrained(
-    model_name=model_save_path,
-    max_seq_length=4096,
-    load_in_4bit=True,
-)
-FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
-tokenizer = get_chat_template(
-    tokenizer,
-    chat_template="llama-3",  # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
-    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},  # ShareGPT style
-)
-messages = [    {"from": "human", "value": "your prompt"},]
-inputs = tokenizer.apply_chat_template(
-    messages,
-    tokenize=True,
-    add_generation_prompt=True,  # Must add for generation
-    return_tensors="pt",
-).to("cuda")
-outputs = model.generate(input_ids=inputs, max_new_tokens=2048, use_cache=True)
-response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
-print(response)
-We are working on quantizing the models and bringing them to ollama.

+model-index:
+  - name: xmanii/llama-3-8b-instruct-bnb-4bit-persian
+    description: |
+      **Model Information**
+      **Developed by:** xmanii
+      **License:** Apache-2.0
+      **Finetuned from model:** unsloth/llama-3-8b-instruct-bnb-4bit
+      **Model Description**
+      This LLaMA model was fine-tuned on a unique Persian dataset of Alpaca chat conversations, consisting of approximately 8,000 rows. Our training process utilized two H100 GPUs, completing in just under 1 hour. We leveraged the power of Unsloth and Hugging Face's TRL library to accelerate our training process by 2x.
+      ![Unsloth Made with Love](https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png)
+      **Open-Source Contribution**
+      This model is open-source, and we invite the community to use and build upon our work. The fine-tuned LLaMA model is designed to improve Persian conversation capabilities, and we hope it will contribute to the advancement of natural language processing in the Persian language.
+      **Using Adapters with Unsloth**
+      To run the model with adapters, you can use the following code:
+      ```python
+      import torch
+      from unsloth import FastLanguageModel
+      from unsloth.chat_templates import get_chat_template
+      model_save_path = "path to the download folder"  # Adjust this path as needed
+      model, tokenizer = FastLanguageModel.from_pretrained(
+          model_name=model_save_path,
+          max_seq_length=4096,
+          load_in_4bit=True,
+      )
+      FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
+      tokenizer = get_chat_template(
+          tokenizer,
+          chat_template="llama-3",  # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
+          mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},  # ShareGPT style
+      )
+      messages = [{"from": "human", "value": "your prompt"}]
+      inputs = tokenizer.apply_chat_template(
+          messages,
+          tokenize=True,
+          add_generation_prompt=True,  # Must add for generation
+          return_tensors="pt",
+      ).to("cuda")
+      outputs = model.generate(input_ids=inputs, max_new_tokens=2048, use_cache=True)
+      response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
+      print(response)
+      ```
+      **Future Work**
+      We are working on quantizing the models and bringing them to ollama.
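
Note on the snippet in this diff: the README code passes ShareGPT-style messages (`from`/`value` keys, `human`/`gpt` roles) through the `llama-3` chat template before generation. As a rough illustration of what that step produces, the following is a minimal sketch in plain Python, assuming the standard Llama 3 header and end-of-turn tokens. The helper `render_llama3_prompt` and the `ROLE_MAP` constant are hypothetical names introduced here; they are not part of Unsloth or Transformers, and the real template may differ in small details such as whitespace handling.

```python
# Hypothetical illustration only: approximates the text a Llama 3 chat
# template renders from ShareGPT-style messages. Not an Unsloth API.

# ShareGPT role names mapped to the role names used in Llama 3 headers,
# mirroring the `mapping` argument passed to get_chat_template in the README.
ROLE_MAP = {"human": "user", "gpt": "assistant"}


def render_llama3_prompt(messages, add_generation_prompt=True):
    """Build a Llama 3 style prompt string from ShareGPT-style messages."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Fall back to the raw role name if it is not a ShareGPT role.
        role = ROLE_MAP.get(message["from"], message["from"])
        parts.append(
            f"<|start_header_id|>{role}<|end_header_id|>\n\n"
            f"{message['value'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header cues the model to write the next turn.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_llama3_prompt([{"from": "human", "value": "your prompt"}])
print(prompt)
```

The sketch also shows why `add_generation_prompt=True` is marked "Must add for generation" in the README: without the trailing assistant header, the prompt ends at the user's `<|eot_id|>` and the model has no cue to begin a reply.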