SorawitChok committed (verified)
Commit 8094257 · 1 Parent(s): 95eec68

Update README.md

Files changed (1):
  1. README.md +1 -73
README.md CHANGED
@@ -46,78 +46,6 @@ SeaLLMs is tailored for handling a wide range of languages spoken in the SEA reg
  This page introduces the SeaLLMs-v3-7B-Chat model, specifically fine-tuned to follow human instructions effectively for task completion, making it directly applicable to your applications.
 
 
- ### Get started with `Transformers`
-
- To quickly try the model, we show how to conduct inference with `transformers` below. Make sure you have installed the latest transformers version (>4.40).
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- device = "cuda" # the device to load the model onto
-
- model = AutoModelForCausalLM.from_pretrained(
-     "SeaLLMs/SeaLLM3-7B-chat",
-     torch_dtype=torch.bfloat16,
-     device_map=device
- )
- tokenizer = AutoTokenizer.from_pretrained("SeaLLMs/SeaLLM3-7B-chat")
-
- # prepare messages to model
- prompt = "Hiii How are you?"
- messages = [
-     {"role": "system", "content": "You are a helpful assistant."},
-     {"role": "user", "content": prompt}
- ]
-
- text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
- model_inputs = tokenizer([text], return_tensors="pt").to(device)
- print(f"Formatted text:\n {text}")
- print(f"Model input:\n {model_inputs}")
-
- generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=True, eos_token_id=tokenizer.eos_token_id)
- generated_ids = [
-     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
- ]
- response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
-
- print(f"Response:\n {response[0]}")
- ```
-
- You can also utilize the following code snippet, which uses the streamer `TextStreamer` to enable the model to continue conversing with you:
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
- from transformers import TextStreamer
-
- device = "cuda" # the device to load the model onto
-
- model = AutoModelForCausalLM.from_pretrained(
-     "SeaLLMs/SeaLLM3-7B-chat",
-     torch_dtype=torch.bfloat16,
-     device_map=device
- )
- tokenizer = AutoTokenizer.from_pretrained("SeaLLMs/SeaLLM3-7B-chat")
-
- # prepare messages to model
- messages = [
-     {"role": "system", "content": "You are a helpful assistant."},
- ]
-
- while True:
-     prompt = input("User:")
-     messages.append({"role": "user", "content": prompt})
-     text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-     model_inputs = tokenizer([text], return_tensors="pt").to(device)
-
-     streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
-     generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, streamer=streamer)
-     generated_ids = [
-         output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
-     ]
-     response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
-     messages.append({"role": "assistant", "content": response})
- ```
-
  ### Inference with `vllm`
 
  You can also conduct inference with [vllm](https://docs.vllm.ai/en/stable/index.html), which is a fast and easy-to-use library for LLM inference and serving. To use vllm, first install the latest version via `pip install vllm`.
@@ -130,7 +58,7 @@ prompts = [
      "Can you speak Indonesian?"
  ]
 
- llm = LLM(ckpt_path, dtype="bfloat16")
+ llm = LLM("SorawitChok/SeaLLM3-7B-Chat-AWQ", quantization="AWQ")
  sparams = SamplingParams(temperature=0.1, max_tokens=512)
  outputs = llm.generate(prompts, sparams)
 
 
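For context, the diff only touches the line that loads the model; a minimal end-to-end sketch of the updated `vllm` example, assuming the rest of the snippet follows the README, might look like this (the prompt strings are illustrative):

```python
# Minimal sketch of the updated vllm usage. Only the model ID,
# quantization setting, and sampling parameters come from the diff;
# the prompts below are illustrative placeholders.
from vllm import LLM, SamplingParams

prompts = [
    "Can you speak Indonesian?",  # prompt shown in the diff context
]

# Load the AWQ-quantized checkpoint that this commit points the README at.
llm = LLM("SorawitChok/SeaLLM3-7B-Chat-AWQ", quantization="AWQ")
sparams = SamplingParams(temperature=0.1, max_tokens=512)

outputs = llm.generate(prompts, sparams)
for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}")
```

Pointing the example at the AWQ-quantized weights typically gives a much smaller GPU-memory footprint than the original bfloat16 checkpoint, at the cost of a small accuracy drop.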