Model Card for Mobius-12B-base-m1

The Mobius-12B-base-m1 Large Language Model (LLM) is a pretrained model based on the RWKV v5 architecture. We post-trained it on 0.01 billion (10 million) tokens, targeting alignment benchmarks, without using DPO or SFT. Post-training took approximately 10 hours on 4 A800 GPUs.
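As a rough sanity check on those figures, the sketch below works out the training throughput they imply. It assumes that 0.01 billion means 1e7 tokens and that the ~10 hours is wall-clock time shared evenly across the 4 GPUs; the constant and function names are illustrative, and the result is an estimate rather than a reported number.

```python
# Back-of-envelope throughput implied by the post-training figures.
# Assumptions: 0.01 billion tokens = 1e7 tokens, ~10 h of wall-clock time,
# and work split evenly across 4 GPUs.

TOKENS = 0.01e9   # 0.01 billion tokens
HOURS = 10        # approximate wall-clock training time
NUM_GPUS = 4      # 4 x A800


def throughput(tokens: float, hours: float, gpus: int) -> tuple[float, float]:
    """Return (overall tokens/s, tokens/s per GPU)."""
    seconds = hours * 3600
    total = tokens / seconds
    return total, total / gpus


total, per_gpu = throughput(TOKENS, HOURS, NUM_GPUS)
print(f"~{total:.0f} tokens/s overall, ~{per_gpu:.0f} tokens/s per GPU")
```

With these inputs it prints about 278 tokens/s overall and 69 tokens/s per GPU.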

Warning

This repo contains weights that are not yet compatible with the Hugging Face transformers library, but you can try this PR. RWKV Runner and AI00 Server also work.

Instruction/Chat format

This format must be followed strictly; otherwise the model will generate sub-optimal outputs.

The template used to build a prompt for the Instruct model is defined as follows:

User: {Instruction|prompt}\n\nAssistant:
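A minimal sketch of filling this template in Python. `build_prompt` is a hypothetical helper, and it assumes that `\n\n` in the template stands for two literal newline characters and that nothing follows `Assistant:`; the model is then expected to continue the text from that point.

```python
def build_prompt(instruction: str) -> str:
    """Fill the 'User: {Instruction|prompt}\n\nAssistant:' template."""
    # Strip surrounding whitespace so the spacing around the template
    # markers stays exactly as specified.
    return f"User: {instruction.strip()}\n\nAssistant:"


print(repr(build_prompt("  What is RWKV?  ")))
```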

Run the model

You need to convert the checkpoint to HF format first.

This also requires installing the PR: pip install git+https://github.com/BBuf/transformers.git

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the converted checkpoint in fp16 on GPU 0 (requires the transformers PR above).
model = AutoModelForCausalLM.from_pretrained("TimeMobius/Mobius-12B-base-m1", torch_dtype=torch.float16).to(0)
tokenizer = AutoTokenizer.from_pretrained("TimeMobius/Mobius-12B-base-m1", trust_remote_code=True)

# Replace "x" with your own instruction.
text = "x"
# Use the User/Assistant template described above.
prompt = f"User: {text.strip()}\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(0)
output = model.generate(inputs["input_ids"], max_new_tokens=40)
# The decoded text contains the prompt followed by the generated continuation.
print(tokenizer.decode(output[0].tolist(), skip_special_tokens=True))
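Decoding `output[0]` returns the prompt together with the model's continuation, and a base model may keep going and start a new `User:` turn. The sketch below keeps only the assistant's reply. `extract_reply` is a hypothetical helper, and cutting at `"\n\nUser:"` is an assumption based on the template, not documented model behavior.

```python
def extract_reply(decoded: str, prompt: str, stop: str = "\n\nUser:") -> str:
    """Return only the assistant's reply from a decoded generation.

    Assumes `decoded` begins with `prompt` (as when the full output sequence
    is decoded) and truncates at `stop` if the model starts a new user turn.
    """
    reply = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    end = reply.find(stop)
    if end != -1:
        reply = reply[:end]
    return reply.strip()


prompt = "User: hi\n\nAssistant:"
print(extract_reply(prompt + " Hello!\n\nUser: more", prompt))
```

Because decoding the tokenized prompt may not reproduce the prompt string exactly, decoding only the new tokens, for example `tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`, and then applying the stop-sequence cut is the more robust choice.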

Limitations

Mobius-12B-base-m1 is a base model and can be fine-tuned to achieve stronger performance. For better benchmark results, apply DPO and SFT; details are in the README.

Benchmark

Benchmark                   Mobius-12B-base-m1
LAMBADA (perplexity)        3.41
LAMBADA (accuracy)          0.72
PIQA                        0.78
HellaSwag (10-shot)         0.72
WinoGrande                  0.68
ARC-Challenge (25-shot)     0.47
ARC-Easy                    0.73
OpenBookQA                  0.40
SciQ                        0.93
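For reading the perplexity row (lower is better): perplexity is, in general, the exponential of the average negative log-likelihood over the scored units. The card does not say how the evaluation harness averaged (per token or per target word), so the sketch below is a generic definition, not the exact computation behind 3.41; `perplexity` is a hypothetical helper that takes natural-log probabilities.

```python
import math


def perplexity(log_probs: list[float]) -> float:
    """exp of the mean negative log-likelihood of the scored units.

    `log_probs` are natural-log probabilities the model assigned to each
    scored unit (e.g. tokens or target words).
    """
    if not log_probs:
        raise ValueError("need at least one log-probability")
    mean_nll = -sum(log_probs) / len(log_probs)
    return math.exp(mean_nll)


print(perplexity([math.log(0.5), math.log(0.25)]))
```

Conversely, `math.log(3.41)` gives the average negative log-likelihood implied by the reported perplexity, about 1.23 nats.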

@TimeMobius
