This model was obtained by SFT (supervised fine-tuning) of freecs/ThetaWave-7B.
The Open-Orca/SlimOrca dataset was used for fine-tuning.
The model does not currently support a system prompt because it uses Mistral's chat_template; the next release is already in training and will switch to the ChatML template to support a system prompt. A system prompt can be enabled now by manually changing the chat_template, but in testing this appeared to degrade model performance.
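If you still want to experiment with a system prompt, below is a minimal sketch of such a manual chat_template override. The ChatML Jinja template and the <|im_start|>/<|im_end|> markers are assumptions for illustration; the current release was not trained on this format, which is likely why it hurts output quality.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Liangmingxin/ThetaWave-7B-sft")

# Hypothetical ChatML-style template; <|im_start|>/<|im_end|> are not trained tokens in this release
chatml_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)
tokenizer.chat_template = chatml_template

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Who are you?"},
]
prompt_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")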
More model details will be released...
vLLM deployment commands:
# Single GPU
python /path/to/vllm/vllm/entrypoints/openai/api_server.py \
--model '/path/to/ThetaWave-7B-sft' \
--tokenizer '/path/to/ThetaWave-7B-sft' \
--tokenizer-mode auto \
--dtype float16 \
--enforce-eager \
--host 0.0.0.0 \
--port 6000 \
--disable-log-stats \
--disable-log-requests
# Two GPUs (tensor parallel)
python /path/to/vllm/vllm/entrypoints/openai/api_server.py \
--model '/path/to/ThetaWave-7B-sft' \
--tokenizer '/path/to/ThetaWave-7B-sft' \
--tokenizer-mode auto \
--dtype float16 \
--enforce-eager \
--tensor-parallel-size 2 \
--worker-use-ray \
--engine-use-ray \
--host 0.0.0.0 \
--port 6000 \
--disable-log-stats \
--disable-log-requests
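Once the server is running, it exposes an OpenAI-compatible HTTP API. Below is a minimal query sketch; the host, port, and model name are assumptions and must match the --host/--port and --model values used above.

import requests

# Hypothetical local endpoint; adjust to your deployment
resp = requests.post(
    "http://localhost:6000/v1/chat/completions",
    json={
        "model": "/path/to/ThetaWave-7B-sft",
        "messages": [{"role": "user", "content": "Who are you?"}],
        "max_tokens": 256,
    },
)
print(resp.json()["choices"][0]["message"]["content"])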
Try it directly:
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("Liangmingxin/ThetaWave-7B-sft")
tokenizer = AutoTokenizer.from_pretrained("Liangmingxin/ThetaWave-7B-sft")

messages = [
    {"role": "user", "content": "Who are you?"},
]

# Build the prompt with the model's (Mistral-style) chat template
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
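For interactive use, transformers' TextStreamer can print tokens as they are generated; a minimal sketch continuing from the snippet above:

from transformers import TextStreamer

# Stream decoded text to stdout as it is generated, hiding the prompt and special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(model_inputs, max_new_tokens=1000, do_sample=True, streamer=streamer)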