---
license: llama2
language:
- ja
tags:
- moe
---

# youri-2x7b_dev

This model is a Mixture of Experts (MoE) merge of the following two models:

- [rinna/youri-7b-instruction](https://huggingface.co./rinna/youri-7b-instruction)
- [rinna/youri-7b-chat](https://huggingface.co./rinna/youri-7b-chat)

## 🏆 Evaluation

All benchmark scores were evaluated with the [Stability-AI/lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable). The results are stored in [benchmark_scores](https://huggingface.co./HachiML/youri-2x7b_dev/tree/main/benchmark_scores); see that directory for the detailed scores and the conditions under which they were obtained.

| Model |JCommonsenseQA (3-shot, acc.)|JNLI (3-shot, balanced acc.)|MARC-ja (0-shot, balanced acc.)|JSQuAD (2-shot, F1)|4-AVERAGE|
|----------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[**youri-2x7b_dev**](https://huggingface.co./HachiML/youri-2x7b_dev)| **91.15**| **71.03**| **95.90**| **91.30**| **87.34**|
|[youri-7b-instruction](https://huggingface.co./rinna/youri-7b-instruction) *1| 88.83| 63.56| 93.78| 92.19| 84.59|
|[youri-7b-chat](https://huggingface.co./rinna/youri-7b-chat) *1| 91.78| 70.35| 96.69| 79.62| 84.61|

| Model |jaqket-v2 (1-shot, F1)|xlsum (1-shot, ROUGE-2) *2|6-AVERAGE|
|----------------------------------------------------------------|------:|------:|------:|
|[**youri-2x7b_dev**](https://huggingface.co./HachiML/youri-2x7b_dev)| **84.59**| **25.62**| **76.59**|
|[youri-7b-instruction](https://huggingface.co./rinna/youri-7b-instruction) *1| 83.92| 24.67| 75.13|
|[youri-7b-chat](https://huggingface.co./rinna/youri-7b-chat) *1| 83.71| 24.21| 75.33|

| Model |xwinograd (0-shot, acc.) *2|mgsm (5-shot, acc.) *2|JCoLA (2-shot, balanced acc.) *2|9-AVERAGE|
|----------------------------------------------------------------|------:|------:|---------:|------:|
|[**youri-2x7b_dev**](https://huggingface.co./HachiML/youri-2x7b_dev)| **81.43**| **24.80**| **59.09**| **69.43**|
|[youri-7b-instruction](https://huggingface.co./rinna/youri-7b-instruction) *1| 78.94| 17.20| 54.04| 66.35|
|[youri-7b-chat](https://huggingface.co./rinna/youri-7b-chat) *1| 80.92| 25.20| 53.78| 67.36|

*1 From [rinna's LM Benchmark](https://rinnakk.github.io/research/benchmarks/lm/index.html).

*2 Because rinna's LM Benchmark does not state the prompt template versions for these tasks, the scores were computed without specifying a template.

## 🧩 Configuration

The model was built with a custom version of the [mergekit](https://github.com/cg123/mergekit) library (mixtral branch) and the following configuration (English translations of the Japanese prompts are given as comments):

```yaml
base_model: rinna/youri-7b-chat
gate_mode: hidden # one of "hidden", "cheap_embed", or "random"
dtype: bfloat16 # output dtype (float32, float16, or bfloat16)
experts:
  - source_model: rinna/youri-7b-chat
    positive_prompts:
      # "Given a question and a set of answer choices as input, select the answer from the choices."
      - "質問と回答の選択肢を入力として受け取り、選択肢から回答を選択してください。"
      # "Answer whether the relationship between the premise and the hypothesis is entailment, contradiction, or neutral."
      - "前提と仮説の関係を含意、矛盾、中立の中から回答してください。"
      # "Classify the following text into either the positive or the negative sentiment class."
      - "以下のテキストを、ポジティブまたはネガティブの感情クラスのいずれかに分類してください。"
      # "Below is a combination of an instruction describing a task and an input providing context. Write a response that appropriately satisfies the request."
      - "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"
  - source_model: rinna/youri-7b-instruction
    positive_prompts:
      # "Extract the answer to the question from the title and passage in a single word. Answer with a noun."
      - "質問に対する回答を題名と文章から一言で抽出してください。回答は名詞で答えてください。"
      # "Summarize the given news article."
      - "与えられたニュース記事を要約してください。"
      # "Answer whether the given sentence is grammatical."
      - "与えられた文が文法的であるかを回答してください。"
```

The `positive_prompts` above are taken from the instructions of the benchmarks on which each source model excels. For reference, see [rinna's LM Benchmark](https://rinnakk.github.io/research/benchmarks/lm/index.html), which gives a detailed overview of the areas where each individual model performs particularly well and can guide effective use of the merged model across natural language processing tasks.
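For reproduction, the merge itself is typically produced with mergekit's MoE entry point. The snippet below is a minimal sketch, assuming mergekit is installed from the mixtral branch (which provides the `mergekit-moe` command) and that the configuration above has been saved as `config.yaml`; the exact flags may vary between mergekit revisions.

```python
# Minimal sketch: drive the mergekit-moe CLI from Python.
# Assumptions: mergekit (mixtral branch) is installed and exposes `mergekit-moe`,
# and the YAML configuration above is saved as "config.yaml".
import subprocess

subprocess.run(
    ["mergekit-moe", "config.yaml", "./youri-2x7b_dev"],  # <config> <output dir>
    check=True,  # raise CalledProcessError if the merge fails
)
```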
## 💻 Usage

```python
!pip install -q --upgrade transformers einops accelerate bitsandbytes

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HachiML/youri-2x7b_dev"
torch.set_default_device("cuda")

# Load the model (4-bit quantized via bitsandbytes) and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    load_in_4bit=True,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

# Create the input
instruction = "次の日本語を英語に翻訳してください。"  # "Translate the following Japanese into English."
input_text = "大規模言語モデル(だいきぼげんごモデル、英: large language model、LLM)は、多数のパラメータ(数千万から数十億)を持つ人工ニューラルネットワークで構成されるコンピュータ言語モデルで、膨大なラベルなしテキストを使用して自己教師あり学習または半教師あり学習によって訓練が行われる。"
prompt = f"""
以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。

### 指示:
{instruction}

### 入力:
{input_text}

### 応答:
"""

# Tokenize the input string
token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

# Generate text using the model
with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=200,
        do_sample=True,
        temperature=0.5,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

# Decode and print the output (prompt followed by the generated continuation)
output = tokenizer.decode(output_ids.tolist()[0])
print(output)
```
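Note that `generate` returns the prompt tokens followed by the newly generated tokens, so the decode above prints the prompt as well. As a minimal sketch reusing `token_ids`, `output_ids`, and `tokenizer` from the example above, the continuation alone can be recovered by slicing off the prompt:

```python
# Keep only the tokens generated after the prompt and drop special tokens.
generated_ids = output_ids[0][token_ids.shape[1]:]
response = tokenizer.decode(generated_ids, skip_special_tokens=True)
print(response)
```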