SorawitChok committed
Update README.md

README.md CHANGED
@@ -46,78 +46,6 @@ SeaLLMs is tailored for handling a wide range of languages spoken in the SEA region
This page introduces the SeaLLMs-v3-7B-Chat model, specifically fine-tuned to follow human instructions effectively for task completion, making it directly applicable to your applications.

-### Get started with `Transformers`
-
-To quickly try the model, we show how to conduct inference with `transformers` below. Make sure you have installed the latest transformers version (>4.40).
-
-```python
-import torch  # assumed import: torch.bfloat16 is used below but the import is not visible in this diff view
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-device = "cuda"  # the device to load the model onto
-
-model = AutoModelForCausalLM.from_pretrained(
-    "SeaLLMs/SeaLLM3-7B-chat",
-    torch_dtype=torch.bfloat16,
-    device_map=device
-)
-tokenizer = AutoTokenizer.from_pretrained("SeaLLMs/SeaLLM3-7B-chat")
-
-# prepare messages for the model
-prompt = "Hiii How are you?"
-messages = [
-    {"role": "system", "content": "You are a helpful assistant."},
-    {"role": "user", "content": prompt}
-]
-
-text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-model_inputs = tokenizer([text], return_tensors="pt").to(device)
-print(f"Formatted text:\n {text}")
-print(f"Model input:\n {model_inputs}")
-
-generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=True, eos_token_id=tokenizer.eos_token_id)
-generated_ids = [
-    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
-]
-response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
-
-print(f"Response:\n {response[0]}")
-```
-
-You can also use the following code snippet, which uses `TextStreamer` to stream responses so the model can keep conversing with you:
-
-```python
-import torch  # assumed import: torch.bfloat16 is used below but the import is not visible in this diff view
-from transformers import AutoModelForCausalLM, AutoTokenizer
-from transformers import TextStreamer
-
-device = "cuda"  # the device to load the model onto
-
-model = AutoModelForCausalLM.from_pretrained(
-    "SeaLLMs/SeaLLM3-7B-chat",
-    torch_dtype=torch.bfloat16,
-    device_map=device
-)
-tokenizer = AutoTokenizer.from_pretrained("SeaLLMs/SeaLLM3-7B-chat")
-
-# prepare messages for the model
-messages = [
-    {"role": "system", "content": "You are a helpful assistant."},
-]
-
-while True:
-    prompt = input("User:")
-    messages.append({"role": "user", "content": prompt})
-    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-    model_inputs = tokenizer([text], return_tensors="pt").to(device)
-
-    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
-    generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, streamer=streamer)
-    generated_ids = [
-        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
-    ]
-    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
-    messages.append({"role": "assistant", "content": response})
-```
-
### Inference with `vllm`

You can also conduct inference with [vllm](https://docs.vllm.ai/en/stable/index.html), which is a fast and easy-to-use library for LLM inference and serving. To use vllm, first install the latest version via `pip install vllm`.
@@ -130,7 +58,7 @@ prompts = [
    "Can you speak Indonesian?"
]

-llm = LLM(
+llm = LLM("SorawitChok/SeaLLM3-7B-Chat-AWQ", quantization="AWQ")
sparams = SamplingParams(temperature=0.1, max_tokens=512)
outputs = llm.generate(prompts, sparams)
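
The vllm snippet touched by this change appears only as fragments in the diff above. For reference, here is a minimal, self-contained sketch of how those fragments presumably fit together with the AWQ-quantized checkpoint; the shortened prompt list and the output-printing loop are illustrative assumptions rather than lines from the upstream README.

```python
# Minimal sketch assembled from the fragments visible in this diff.
# Assumptions: prompts are passed as plain strings (the upstream README may
# format them with the model's chat template first), and results are simply
# printed to stdout.
from vllm import LLM, SamplingParams

prompts = [
    "Can you speak Indonesian?",
]

# Load the AWQ-quantized model introduced by this commit.
llm = LLM("SorawitChok/SeaLLM3-7B-Chat-AWQ", quantization="AWQ")
sparams = SamplingParams(temperature=0.1, max_tokens=512)

outputs = llm.generate(prompts, sparams)
for output in outputs:
    # Each result carries the original prompt and its generated completion(s).
    print(f"Prompt: {output.prompt}")
    print(f"Response: {output.outputs[0].text}")
```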