Model Card for llm-jp-13b-instruct-full-jaster-dpo
This is a human-preference-optimized version of the native Japanese model llm-jp/llm-jp-13b-instruct-full-jaster-v1.0.
Model Details
Model type: transformer-based large language model
Total tokens seen: 300B
Parameters: 13B
Layers: 40
Hidden size: 5120
Heads: 40
Context length: 2048
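As a sanity check on the figures above, the 13B parameter count can be roughly reproduced from the layer count and hidden size. The sketch below assumes a standard GPT-style decoder block (about 4·h² attention weights plus 8·h² feed-forward weights, with biases and layer norms ignored) and takes the vocabulary size from the Tokenizer section of this card. The card does not state these architectural details, so treat the result as an estimate only.

```python
# Rough parameter estimate from the values listed in this card.
# Assumption (not stated in the card): GPT-style blocks with a 4x
# feed-forward expansion; biases and layer norms are ignored.
layers = 40
hidden = 5120
heads = 40
vocab = 50_570  # from the Tokenizer section

head_dim = hidden // heads          # per-head dimension
per_layer = 12 * hidden**2          # attention (4 h^2) + feed-forward (8 h^2)
transformer = layers * per_layer    # all decoder blocks
embeddings = vocab * hidden         # token embedding matrix

total = transformer + embeddings
print(f"head dimension:     {head_dim}")
print(f"transformer blocks: {transformer / 1e9:.2f}B")
print(f"token embeddings:   {embeddings / 1e9:.2f}B")
print(f"total (approx.):    {total / 1e9:.2f}B")
```

Under these assumptions the estimate comes to roughly 12.8B, which is consistent with the rounded 13B figure.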
Training
Pre-training:
Hardware: 96 A100 40GB GPUs (MDX cluster)
Software: Megatron-DeepSpeed
Instruction tuning:
Hardware: 8 A100 40GB GPUs (MDX cluster)
Software: TRL, PEFT, and DeepSpeed
Human Preference Alignment:
Hardware: Apple MPS device, M3 Max chip, 16-core CPU, 16-core neural engine, 40-core GPU / 128 GB unified memory
Software: PyTorch (on MPS), Hugging Face Transformers, PEFT (version 0.8.2)
Tokenizer
The tokenizer of this model is based on the huggingface/tokenizers Unigram byte-fallback model. The vocabulary entries were converted from llm-jp-tokenizer v2.1 (50k). Please refer to the README.md of llm-ja-tokenizer for details on the vocabulary construction procedure.
- Model: Hugging Face Fast Tokenizer using the Unigram byte-fallback model, which requires tokenizers>=0.14.0
- Training algorithm: SentencePiece Unigram byte-fallback
- Training data: a subset of the datasets for model pre-training
- Vocabulary size: 50,570 (mixed vocabulary of Japanese, English, and source code)
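To illustrate what "byte-fallback" means in the list above, here is a minimal sketch of the fallback step alone. It is not the Unigram algorithm and does not use the real vocabulary: `toy_vocab` and both helper functions are hypothetical, and the `<0xNN>` token spelling follows the convention used by SentencePiece. A character missing from the vocabulary is emitted as its UTF-8 bytes instead of an unknown token, so decoding stays lossless.

```python
def encode_with_byte_fallback(text, vocab):
    """Keep characters found in vocab; emit UTF-8 byte tokens for the rest."""
    pieces = []
    for ch in text:
        if ch in vocab:
            pieces.append(ch)
        else:
            # Fallback: one token per UTF-8 byte, e.g. "<0xE9>"
            pieces.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return pieces


def decode_with_byte_fallback(pieces):
    """Reassemble text from a mix of ordinary pieces and byte tokens."""
    buf = bytearray()
    for p in pieces:
        if len(p) == 6 and p.startswith("<0x") and p.endswith(">"):
            buf.append(int(p[3:5], 16))
        else:
            buf.extend(p.encode("utf-8"))
    return buf.decode("utf-8")


# Hypothetical three-entry vocabulary; the last character below is not in it.
toy_vocab = {"日", "本", "の"}
pieces = encode_with_byte_fallback("日本の鮨", toy_vocab)
print(pieces)
print(decode_with_byte_fallback(pieces))
```

The out-of-vocabulary character becomes three byte tokens (one per UTF-8 byte), and decoding recovers the original string.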
Model Description
This model was aligned with human preferences using an adapter approach from the PEFT library (https://github.com/huggingface/peft). The alignment was based on Direct Preference Optimization (https://arxiv.org/abs/2305.18290).
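For readers unfamiliar with Direct Preference Optimization, the per-example loss from the cited paper can be sketched in a few lines. This is only an illustration of the objective, not the training code used for this model. The function name, the argument names, and the value `beta=0.1` are assumptions; the card does not report the hyperparameters that were used. Each argument is the summed log-probability of a full response under either the trained policy or the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio)))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m); computed in a numerically stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-11.0, -11.0, -11.0, -11.0)
# Policy that raises the chosen response and lowers the rejected one: lower loss.
improved = dpo_loss(-10.0, -12.0, -11.0, -11.0)
print(baseline, improved)
```

The loss falls as the policy assigns relatively more probability to the preferred response than the reference model does, which is how the preference signal is learned without a separate reward model.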
Training Data
The data used for DPO was a Japanese translation of the original Anthropic Helpful-Harmless dataset (https://huggingface.co./datasets/Anthropic/hh-rlhf) for Reinforcement Learning from Human Feedback (https://arxiv.org/abs/2204.05862). The translation is available here: https://huggingface.co./datasets/shi3z/anthropic_hh_rlhf_japanese
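DPO training needs each example as a prompt plus a preferred and a dispreferred response. The sketch below shows one way to derive that triple, assuming records shaped like the original English hh-rlhf data: `chosen` and `rejected` strings that each contain the whole dialogue and differ only in the final assistant turn. Whether the Japanese translation keeps exactly this layout is not confirmed here, and the function name and sample record are made up for illustration.

```python
ASSISTANT_MARKER = "\n\nAssistant:"


def split_preference_pair(record):
    """Split a record into a shared prompt and the two competing final replies."""
    chosen, rejected = record["chosen"], record["rejected"]
    cut = chosen.rfind(ASSISTANT_MARKER)
    if cut == -1:
        raise ValueError("no assistant marker found in 'chosen'")
    prompt_end = cut + len(ASSISTANT_MARKER)
    prompt = chosen[:prompt_end]
    if not rejected.startswith(prompt):
        raise ValueError("'chosen' and 'rejected' do not share the same prompt")
    return {
        "prompt": prompt,
        "chosen": chosen[prompt_end:].strip(),
        "rejected": rejected[prompt_end:].strip(),
    }


# Made-up record for illustration only; not taken from the dataset.
example = {
    "chosen": "\n\nHuman: 日本の首都はどこですか?\n\nAssistant: 東京です。",
    "rejected": "\n\nHuman: 日本の首都はどこですか?\n\nAssistant: わかりません。",
}
pair = split_preference_pair(example)
print(pair)
```

Splitting at the last assistant marker keeps earlier turns of a multi-turn dialogue inside the prompt, so only the final reply is compared.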
Direct Use
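import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_name = "llmjp/llm-jp-13b-instruct-full-jaster-dpo"

# Load the base model together with the DPO adapter.
# load_in_4bit=True relies on the bitsandbytes package being installed.
model = AutoPeftModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    load_in_4bit=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Encode the prompt and move it to the same device as the model.
inputs = tokenizer.encode("質問:日本の首都はどこですか?\n\n答え:", return_tensors="pt").to(model.device)
# Pass input_ids by keyword; some older PEFT releases reject positional arguments here.
outputs = model.generate(input_ids=inputs)
print(tokenizer.decode(outputs[0]))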
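The `torch_dtype` and `load_in_4bit` options above mainly affect how much memory the weights occupy. As a back-of-the-envelope sketch, the helper below (a hypothetical function, not part of any library) estimates weight storage for the 13B parameter count at different precisions. It ignores quantization metadata, the adapter weights, activations, and the KV cache, so real usage will be higher.

```python
def weight_gib(num_params, bits_per_param):
    """Approximate weight storage in GiB for a given precision."""
    return num_params * bits_per_param / 8 / 2**30


num_params = 13e9  # rounded parameter count from Model Details
estimates = {
    "float32": weight_gib(num_params, 32),
    "float16": weight_gib(num_params, 16),
    "4-bit": weight_gib(num_params, 4),
}
for name, gib in estimates.items():
    print(f"{name:>8}: {gib:.1f} GiB")
```

By this estimate, half precision needs about 24 GiB for the weights alone, while 4-bit loading brings that down to roughly 6 GiB.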
Author
Stephen Fitz (https://huggingface.co./stephenfitz) for LLMJP (https://huggingface.co./llm-jp)