iarbel's picture
Update README.md
c7cc200 verified
|
raw
history blame
7.47 kB
metadata
license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - trl
  - sft
  - generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
  - name: Cimphony-Mistral-Law-7B
    results:
      - task:
          type: text-generation
        dataset:
          type: cais/mmlu
          name: MMLU
        metrics:
          - name: International Law
            type: accuracy
            value: 0.802
            verified: false
      - task:
          type: text-generation
        dataset:
          type: cais/mmlu
          name: MMLU
        metrics:
          - name: Jurisprudence
            type: accuracy
            value: 0.704
            verified: false
      - task:
          type: text-generation
        dataset:
          type: cais/mmlu
          name: MMLU
        metrics:
          - name: Professional Law
            type: accuracy
            value: 0.416
            verified: false
      - task:
          type: text-generation
        dataset:
          type: coastalcph/lex_glue
          name: LexGLUE
        metrics:
          - name: ECtHR A
            type: balanced accuracy
            value: 0.631
            verified: false
      - task:
          type: text-generation
        dataset:
          type: coastalcph/lex_glue
          name: LexGLUE
        metrics:
          - name: LEDGAR
            type: balanced accuracy
            value: 0.741
            verified: false
      - task:
          type: text-generation
        dataset:
          type: coastalcph/lex_glue
          name: LexGLUE
        metrics:
          - name: CaseHOLD
            type: accuracy
            value: 0.776
            verified: false
      - task:
          type: text-generation
        dataset:
          type: coastalcph/lex_glue
          name: LexGLUE
        metrics:
          - name: Unfair-ToS
            type: balanced accuracy
            value: 0.809
            verified: false
pipeline_tag: text-generation

Cimphony-Mistral-Law-7B

We introduce Cimphony-Mistral-Law-7B, a fine-tuned version of mistralai/Mistral-7B-v0.1.

Cimphony’s LLMs present state-of-the-art performance on legal benchmarks, suppressing models trained on a much larger corpus with significantly more resources, even GPT-4, OpenAI’s flagship model.

Checkout and register on our https://cimphony.ai

image/png

Model description

The model was trained on 600M tokens. We use novel methods to expose the model to this corpus during training, blending a variety of legal reading comprehension tasks, as well as general language data.

Legal Evaluation Results

We evaluate on the legal splits of the MMLU benchmark, as well as LexGLUE. While both are multiple option benchmarks, prompts were adapted so that the models output a single answer. In some cases, additional post-processing was required.

Benchmarks for which the labels were A-E multiple-choice options use an accuracy mertic. Benchmarks that have a closed list of options (e.g. Unfair-ToS) use a balanced-accuracy metric, as classes may not be balanced.

Model / Benchmark International Law (MMLU) Jurisprudence (MMLU) Professional law (MMLU) ECtHR A (LexGlue) LEDGAR (LexGlue) CaseHOLD (LexGlue) Unfair-ToS (LexGlue)
Mistral-7B-Instruct-v0.2 73.6% 69.4% 41.2% 67.5% 50.6% 56.3% 36.6%
AdaptLLM 57.0% 52.8% 36.1% 51.9% 46.3% 50.0% 51.3%
Saul-7B 69.4% 63.0% 43.2% 71.2% 55.9% 65.8% 80.3%
Cimphony-7B80.2%70.4%41.6%63.1%74.1%77.6%80.9%

Training and evaluation data

Following the framework presented in AdaptLLM, we convert the raw legal text into reading comprehension. Taking inspiration from human learning via reading comprehension - practice after reading improves the ability to answer questions based on the learned knowledge.

We developed a high-quality prompt database, considering the capabilities we’d like the model to possess. LLMs were prompt with the raw text and a collection of prompts, and it returned answers, additional questions, and transformations relevant to the input data. With further post-processing of these outputs, we created our legal reading comprehension dataset.

Domain Dataset Tokens License
Legal The Pile (FreeLaw) 180M MIT
Legal LexGlue (train split only) 108M CC-BY-4.0
Legal USClassActions 12M GPL-3.0
Math (CoT) AQUA-RAT 3M Apache-2.0
Commonsense (CoT) ECQA 2.4M Apache-2.0
Reasoning (CoT) EntailmentBank 1.8M Apache-2.0
Chat UltraChat 90M MIT
Code Code-Feedback 36M Apache-2.0
Instruction OpenOrca 180M MIT

Intended uses & limitations

This model can be used for use cases involving legal domain text generation.

As with any language model, users must not solely relay on model generations. This model has not gone through a human-feedback alignment (RLHF). The model may generate responses containing hallucinations and biases.

Example use:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("cimphonyadmin/Cimphony-Mistral-Law-7B")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(model, "cimphonyadmin/Cimphony-Mistral-Law-7B")

# Put your input here:
user_input = '''What can you tell me about ex post facto laws?'''

# Apply the prompt template
prompt = tokenizer.apply_chat_template(user_input, tokenize=False)

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)
outputs = model.generate(input_ids=inputs, max_length=4096)[0]

answer_start = int(inputs.shape[-1])
pred = tokenizer.decode(outputs[answer_start:], skip_special_tokens=True)

print(f'### User Input:\n{user_input}\n\n### Assistant Output:\n{pred}')

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 8
  • eval_batch_size: 24
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • total_eval_batch_size: 96
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1

Framework versions

  • PEFT 0.8.2
  • Transformers 4.37.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2