---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- sft
- generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: Cimphony-Mistral-Law-7B
results:
- task:
type: text-generation
dataset:
type: cais/mmlu
name: MMLU
metrics:
- name: International Law
type: accuracy
value: 0.802
verified: false
- task:
type: text-generation
dataset:
type: cais/mmlu
name: MMLU
metrics:
- name: Jurisprudence
type: accuracy
value: 0.704
verified: false
- task:
type: text-generation
dataset:
type: cais/mmlu
name: MMLU
metrics:
- name: Professional Law
type: accuracy
value: 0.416
verified: false
- task:
type: text-generation
dataset:
type: coastalcph/lex_glue
name: LexGLUE
metrics:
- name: ECtHR A
type: balanced accuracy
value: 0.631
verified: false
- task:
type: text-generation
dataset:
type: coastalcph/lex_glue
name: LexGLUE
metrics:
- name: LEDGAR
type: balanced accuracy
value: 0.741
verified: false
- task:
type: text-generation
dataset:
type: coastalcph/lex_glue
name: LexGLUE
metrics:
- name: CaseHOLD
type: accuracy
value: 0.776
verified: false
- task:
type: text-generation
dataset:
type: coastalcph/lex_glue
name: LexGLUE
metrics:
- name: Unfair-ToS
type: balanced accuracy
value: 0.809
verified: false
pipeline_tag: text-generation
---
# Cimphony-Mistral-Law-7B
We introduce Cimphony-Mistral-Law-7B, a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co./mistralai/Mistral-7B-v0.1).
Cimphony’s LLMs present state-of-the-art performance on legal benchmarks, surpassing models trained on much larger corpora with significantly more resources, even GPT-4, OpenAI’s flagship model.
Check out and register on our platform at [cimphony.ai](https://app.cimphony.ai/signup?callbackUrl=https://app.cimphony.ai/).
![image/png](https://cdn-uploads.huggingface.co/production/uploads/657d36d3647c0211e7746ed9/Yjx96bC58SPgNwmDxx_yx.png)
## Model description
The model was trained on 600M tokens. We use novel methods to expose the model to this corpus during training, blending a variety of legal reading-comprehension tasks with general language data.
## Legal Evaluation Results
We evaluate on the legal splits of the MMLU benchmark, as well as on LexGLUE. While both are multiple-choice benchmarks, prompts were adapted so that the models output a single answer. In some cases, additional post-processing was required.
Benchmarks whose labels are A-E multiple-choice options use an accuracy metric. Benchmarks with a closed list of options (e.g. Unfair-ToS) use a balanced-accuracy metric, as the classes may not be balanced.
| Model / Benchmark | International Law (MMLU) | Jurisprudence (MMLU) | Professional law (MMLU) | ECtHR A (LexGlue) | LEDGAR (LexGlue) | CaseHOLD (LexGlue) | Unfair-ToS (LexGlue) |
|:-----------------------------------|:--------------------------|:----------------------|:-------------------------|:-------------------|:------------------|:--------------------|:-----------------------|
| Mistral-7B-Instruct-v0.2 | 73.6% | 69.4% | 41.2% | 67.5% | 50.6% | 56.3% | 36.6% |
| AdaptLLM | 57.0% | 52.8% | 36.1% | 51.9% | 46.3% | 50.0% | 51.3% |
| Saul-7B | 69.4% | 63.0% | **43.2%** | **71.2%** | 55.9% | 65.8% | 80.3% |
| **Cimphony-7B** | **80.2%** | **70.4%** | 41.6% | 63.1% | **74.1%** | **77.6%** | **80.9%** |
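To make the two scoring modes concrete, the sketch below shows one way a single option letter could be extracted from a generation and scored with plain and balanced accuracy. The `extract_choice` heuristic and the toy data are illustrative assumptions, not the exact post-processing used for the results above.
```python
import re
from sklearn.metrics import accuracy_score, balanced_accuracy_score

def extract_choice(generation: str) -> str:
    """Pull the first standalone A-E option letter from a generation (illustrative heuristic)."""
    match = re.search(r"\b([A-E])\b", generation.strip().upper())
    return match.group(1) if match else ""

# Toy generations and gold labels, for illustration only
generations = ["The correct answer is B.", "A", "Answer: C", "B"]
gold = ["B", "A", "D", "B"]
preds = [extract_choice(g) for g in generations]

# A-E multiple-choice benchmarks are scored with plain accuracy; closed-label
# benchmarks (e.g. Unfair-ToS) use balanced accuracy to account for class imbalance.
print("accuracy:", accuracy_score(gold, preds))
print("balanced accuracy:", balanced_accuracy_score(gold, preds))
```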
## Training and evaluation data
Following the framework presented in [AdaptLLM](https://huggingface.co./AdaptLLM/law-chat), we convert raw legal text into reading-comprehension tasks. The approach takes inspiration from human learning via reading comprehension: practising on questions after reading improves the ability to answer questions based on the learned knowledge.
We developed a high-quality prompt database covering the capabilities we’d like the model to possess. LLMs were prompted with the raw text and a collection of these prompts, and returned answers, additional questions, and transformations relevant to the input data. With further post-processing of these outputs, we created our legal reading-comprehension dataset.
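For illustration, the sketch below shows what one such conversion step could look like when prompting an instruction-tuned LLM to produce question-answer pairs from a raw legal passage. The prompt wording, the helper function, and the choice of generator model are assumptions, not Cimphony's actual pipeline.
```python
from transformers import pipeline

# Any instruction-tuned generator could be used here; this particular model choice is an assumption.
generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

def to_reading_comprehension(passage: str) -> str:
    """Prompt the generator to turn a raw legal passage into Q&A pairs (illustrative only)."""
    prompt = (
        "Read the following legal text and write two question-answer pairs "
        "that test comprehension of it.\n\n"
        f"Text:\n{passage}\n\nQ&A pairs:"
    )
    out = generator(prompt, max_new_tokens=256, do_sample=False)
    # The pipeline echoes the prompt, so strip it to keep only the generated pairs
    return out[0]["generated_text"][len(prompt):]

passage = (
    "An ex post facto law retroactively changes the legal consequences "
    "of actions committed before its enactment."
)
print(to_reading_comprehension(passage))
```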
| Domain | Dataset | Tokens | License |
|:-------------------|:--------------------|:------:|:------------|
| Legal | The Pile (FreeLaw) | 180M | MIT |
| Legal | LexGlue (train split only) | 108M | CC-BY-4.0 |
| Legal | USClassActions | 12M | GPL-3.0 |
| Math (CoT) | AQUA-RAT | 3M | Apache-2.0 |
| Commonsense (CoT) | ECQA | 2.4M | Apache-2.0 |
| Reasoning (CoT) | EntailmentBank | 1.8M | Apache-2.0 |
| Chat | UltraChat | 90M | MIT |
| Code | Code-Feedback | 36M | Apache-2.0 |
| Instruction | OpenOrca | 180M | MIT |
## Intended uses & limitations
This model can be used for use cases involving legal-domain text generation.
As with any language model, users must not rely solely on model generations. This model has not gone through human-feedback alignment (RLHF), and it may generate responses containing hallucinations and biases.
Example use:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the tokenizer, the base model, and the LoRA adapter on top of it
tokenizer = AutoTokenizer.from_pretrained("cimphonyadmin/Cimphony-Mistral-Law-7B")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(model, "cimphonyadmin/Cimphony-Mistral-Law-7B")

# Put your input here:
user_input = "What can you tell me about ex post facto laws?"

# Apply the chat template to build the prompt
messages = [{"role": "user", "content": user_input}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)
outputs = model.generate(input_ids=inputs, max_length=4096)[0]

# Decode only the newly generated tokens (everything after the prompt)
answer_start = int(inputs.shape[-1])
pred = tokenizer.decode(outputs[answer_start:], skip_special_tokens=True)
print(f"### User Input:\n{user_input}\n\n### Assistant Output:\n{pred}")
```
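If you prefer to serve the model without loading the adapter separately, the LoRA weights can be merged into the base model with PEFT's standard `merge_and_unload`; the output directory below is only an example.
```python
# Optional: merge the LoRA adapter into the base weights for adapter-free inference
merged_model = model.merge_and_unload()
merged_model.save_pretrained("cimphony-mistral-law-7b-merged")  # example path
tokenizer.save_pretrained("cimphony-mistral-law-7b-merged")
```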
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 8
- eval_batch_size: 24
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 96
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
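For reference, here is a minimal sketch of how these hyperparameters could be expressed as Hugging Face `TrainingArguments`; the output directory and any option not listed above are assumptions.
```python
from transformers import TrainingArguments

# Per-device batch size 8 x 4 devices x 4 gradient-accumulation steps = total train batch size 128
training_args = TrainingArguments(
    output_dir="cimphony-mistral-law-7b",  # example path
    learning_rate=5e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=24,
    gradient_accumulation_steps=4,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```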
### Framework versions
- PEFT 0.8.2
- Transformers 4.37.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2