---
library_name: transformers
base_model: None
tags:
  - generated_from_trainer
model-index:
  - name: trial2
    results: []
license: apache-2.0
---

mistral-2b-base

Welcome to my model card!

Features of this model:

  • Trained on Japanese text
  • Trained in two stages: patch level and token level
  • Suppresses unknown-word generation by using byte fallback in the SentencePiece tokenizer and converting it to the Hugging Face Tokenizers format (see the sketch after this list)
  • Uses a 2B-parameter Mistral architecture
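
Because the tokenizer uses byte fallback, characters outside the vocabulary are split into byte-level tokens instead of being replaced by the unknown token, so the input can be decoded back losslessly. Below is a minimal sketch of how to check this, assuming only the published checkpoint and an arbitrary example string:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ce-lery/mistral-2b-base", use_fast=False)

# Example string containing characters that may not be in the vocabulary (e.g. an emoji).
text = "自然言語処理🤗"

ids = tokenizer(text, add_special_tokens=False)["input_ids"]
tokens = tokenizer.convert_ids_to_tokens(ids)

# With byte fallback, out-of-vocabulary characters appear as byte tokens
# such as <0xF0>, <0x9F>, ... rather than the unknown token.
print(tokens)
print(tokenizer.unk_token_id in ids)  # expected: False
print(tokenizer.decode(ids))          # should reproduce the input text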

Yukkuri shite ittene! (Take it easy!)

How to use the model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "ce-lery/mistral-2b-base"
torch.set_float32_matmul_precision('high')

# Use the GPU when available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_path,use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             trust_remote_code=True,
                                             ).to(device)

prompt = "自然言語処理とは、"
inputs = tokenizer(prompt,
                   add_special_tokens=True,
                   return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # pass the mask explicitly to avoid warnings
        max_new_tokens=4096,
        do_sample=True,
        early_stopping=False,
        top_p=0.95,
        top_k=50,
        temperature=0.7,
        no_repeat_ngram_size=2,  # block repeated bigrams to reduce repetition
        num_beams=3
    )

print(outputs.tolist()[0])                  # generated token ids
outputs_txt = tokenizer.decode(outputs[0])  # decode the ids back into text
print(outputs_txt)

Training and evaluation data

40B tokens in total, drawn from the following sources:

  • Wikipedia
  • Wikibooks
  • Wikiversity
  • CC-100
  • OSCAR2109
  • mC4 (first 150 GB)

Training procedure

Please refer to ce-lery/mistral-2b-recipe.
A guide for this repository is published here (written in Japanese).

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 128
  • total_train_batch_size: 256
  • optimizer: adamw_bnb_8bit with betas=(0.9, 0.95) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 1.0
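
For orientation, the list above maps onto the transformers TrainingArguments roughly as sketched below. This is an illustrative reconstruction, not the actual training script (see ce-lery/mistral-2b-recipe); output_dir and the min_lr value are assumptions that this card does not state.

from transformers import TrainingArguments

# Illustrative reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="trial2",                   # assumption: reuses the model-index name
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=128,       # 2 x 128 = total train batch size 256
    num_train_epochs=1.0,
    seed=42,
    optim="adamw_bnb_8bit",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr": 2e-5},  # assumption: the minimum LR is not stated in this card
    warmup_steps=1000,
)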

Training results

Please refer to the results here.

Framework versions

  • Transformers 4.46.2
  • Pytorch 2.4.0a0+f70bd71a48.nv24.06
  • Datasets 2.20.0
  • Tokenizers 0.20.3