---
library_name: peft
license: apache-2.0
language:
- mn
- en
tags:
- Mongolian
- QLora
- Llama3
- Instructed-model
---
### Model Description
Mongolian-Llama3 implementation in Chat UI
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1LC0xx4i9xqFmwn9l8T6vw25RIr-BP0Tq?usp=sharing)
Mongolian-Llama3 is the first open-source instruction-tuned language model for Mongolian and English users. It is built on the quantized Meta-Llama-3-8B model and supports abilities such as role-playing and tool use.
- **Developed by:** Dorjzodovsuren
- **License:** Llama-3 License
- **Base Model:** llama-3-8b-bnb-4bit
- **Model Size:** 4.65B
- **Context length:** 8K
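
The context length bounds the prompt and the generated tokens together. A minimal sketch of that arithmetic follows, assuming "8K" means 8 × 1024 = 8192 tokens; `max_prompt_tokens` is a hypothetical helper name, not part of this repository.

```python
CONTEXT_LENGTH = 8192  # assumption: "8K" taken as 8 * 1024 tokens


def max_prompt_tokens(max_new_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many prompt tokens fit once max_new_tokens are reserved for the reply."""
    if not 0 <= max_new_tokens <= context_length:
        raise ValueError("max_new_tokens must be between 0 and the context length")
    return context_length - max_new_tokens


# With max_new_tokens=1024, as in the quick-start code, 7168 tokens remain for the prompt.
print(max_prompt_tokens(1024))
```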
## Bias, Risks, and Limitations
To combat fake news, current strategies rely heavily on synthetic and translated data. However, these approaches have inherent biases, risks, and limitations:
1. **Synthetic Data Bias**: Algorithms may inadvertently perpetuate biases present in training data.
2. **Translation Inaccuracy**: Translations can distort meaning or lose context, leading to misinformation.
3. **Cultural Nuances**: Synthetic and translated data may miss cultural intricacies, risking amplification of stereotypes.
4. **Algorithmic Limits**: Effectiveness is constrained by algorithm capabilities and training data quality.
5. **Dependency on Data**: Accuracy hinges on quality and representativeness of training data.
6. **Adversarial Attacks**: Malicious actors can exploit vulnerabilities to manipulate content.
7. **Language-Dependent Answers**: The same question may receive a somewhat different answer depending on the language in which it is asked.
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
Because of hallucinations and the characteristics of the pretraining datasets, some information may be misleading, and answers may differ somewhat depending on the language used.
Please ask in <b>Mongolian</b> if possible.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
import torch
import gradio as gr
from threading import Thread
from peft import PeftModel, PeftConfig
from unsloth import FastLanguageModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
    TextIteratorStreamer,
    TextStreamer,
)

# Load the adapter config, the 4-bit base model, and the LoRA adapter on top of it
config = PeftConfig.from_pretrained("Dorjzodovsuren/Mongolian_llama3")
model = AutoModelForCausalLM.from_pretrained("unsloth/llama-3-8b-bnb-4bit", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, "Dorjzodovsuren/Mongolian_llama3")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Dorjzodovsuren/Mn_llama3")

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Create a text streamer
text_streamer = TextStreamer(tokenizer, skip_prompt=False, skip_special_tokens=True)

# Get the device based on GPU availability
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Move model into device
model = model.to(device)


class StopOnTokens(StoppingCriteria):
    """Stop generation as soon as the last generated token is one of the stop IDs."""

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = [29, 0]
        for stop_id in stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False


# The current implementation does not carry earlier turns into the prompt; `history` is ignored.
# Experimenting with the generation hyperparameters to compare output quality is highly recommended.
def predict(message, history):
    stop = StopOnTokens()

    # Fill only the instruction slot; the input and response slots stay empty
    messages = alpaca_prompt.format(
        message,
        "",
        "",
    )

    model_inputs = tokenizer([messages], return_tensors="pt").to(device)
    streamer = TextIteratorStreamer(tokenizer, timeout=10., skip_prompt=True, skip_special_tokens=True)
    generate_kwargs = dict(
        model_inputs,
        streamer=streamer,
        max_new_tokens=1024,
        top_p=0.95,
        temperature=0.001,
        repetition_penalty=1.1,
        stopping_criteria=StoppingCriteriaList([stop])
    )

    # Run generation in a background thread so tokens can be streamed to the UI
    t = Thread(target=model.generate, kwargs=generate_kwargs)
    t.start()

    partial_message = ""
    for new_token in streamer:
        if new_token != '<':
            partial_message += new_token
            yield partial_message


gr.ChatInterface(predict).launch(debug=True, share=True, show_api=True)
```
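
For scripted, non-streaming use, the prompt handling can be kept separate from the model calls. The sketch below assumes the same Alpaca template as the code above; `build_prompt` and `extract_response` are hypothetical helper names introduced for illustration, not part of this repository.

```python
# Same template as in the quick-start code above.
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""

RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str, context: str = "") -> str:
    """Fill the template, leaving the response slot empty for the model to complete."""
    return ALPACA_PROMPT.format(instruction, context, "")


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker, or the whole string if it is absent."""
    _, marker, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if marker else decoded.strip()


# Possible usage, assuming `model`, `tokenizer`, and `device` are set up as above:
# inputs = tokenizer([build_prompt("Сайн уу?")], return_tensors="pt").to(device)
# output_ids = model.generate(**inputs, max_new_tokens=256)
# print(extract_response(tokenizer.decode(output_ids[0], skip_special_tokens=True)))
```

Splitting on the first marker is a simplification: an instruction that itself contains `### Response:` would be cut at the wrong place.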