---
language:
- en
- de
license: apache-2.0
library_name: transformers
datasets:
- FreedomIntelligence/sharegpt-deutsch
- mayflowergmbh/oasst_de
- mayflowergmbh/dolly_15k_de
- mayflowergmbh/openschnabeltier_de
- mayflowergmbh/ultrachat_de
- WizardLM/WizardLM_evol_instruct_V2_196k
- mayflowergmbh/evol_instruct_de
- mayflowergmbh/alpaca-gpt4_de
- mayflowergmbh/dolphin_de
- mayflowergmbh/airoboros_de
pipeline_tag: text-generation
model-index:
- name: ende-chat-0.0.7
  results: []
---


# Model Card for EnDe-chat-0.0.7

Preliminary LoRA finetune of Mistral-7B on quality German and English text.

This version has an **extended tokenizer**, which lets the model handle longer inputs.
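
The tokenizer extension itself is not documented in this card; as a rough, hypothetical sketch, a vocabulary can be extended and the embedding matrix resized with `transformers` roughly as follows (the added tokens below are placeholders, not the ones actually used):

```python
# Hypothetical sketch: extend a tokenizer with extra (e.g. German) tokens and
# resize the embeddings to match. The actual token list and embedding
# initialisation used for this model are not documented here.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

new_tokens = ["Geschwindigkeitsbegrenzung", "Bundesregierung"]  # placeholder German tokens
num_added = tokenizer.add_tokens(new_tokens)

# Grow the input (and tied output) embeddings to cover the enlarged vocabulary.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} tokens; new vocabulary size: {len(tokenizer)}")
```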

This is an experiment to improve the German capabilities of Mistral through
continued finetuning. The finetuning also includes English data in order to
retain the model's English capabilities, so that it can still be used for
translation and for answering German questions about English documents and
vice versa.

Unfortunately, the compute available for this experiment (2× V100) was far
from sufficient for the amount of training data we would have liked to include.

After continued pretraining, this model has received instruction finetuning.

# Table of Contents

- [Model Details](#model-details)
  - [Model Description](#model-description)
- [Uses](#uses)
  - [Out-of-Scope Use](#out-of-scope-use)
- [Bias, Risks, and Limitations](#bias-risks-and-limitations)
  - [Recommendations](#recommendations)
- [Training Details](#training-details)
  - [Training Data](#training-data)
  - [Training Procedure](#training-procedure)
- [Evaluation](#evaluation)
- [Examples](#examples)


# Model Details

## Model Description

LoRA finetune of Mistral-7B on quality German and English text.

- **Developed by:** Erich Schubert
- **Model type:** Language model
- **Language(s) (NLP):** deu, eng
- **License:** apache-2.0
- **Parent Model:** mistralai/Mistral-7B-v0.1
- **Resources for more information:** n/a


# Uses

Model finetuned for chat in German and English.
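
A minimal usage sketch with `transformers` is shown below; the repository id is a placeholder for wherever this model is hosted, and the prompt is plain text rather than the exact chat template used during finetuning:

```python
# Minimal generation sketch. "ende-chat-0.0.7" is a placeholder repository id;
# replace it with the actual model path. The chat template used in training
# (LLaMA-Factory's "default" template) is not reproduced here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "ende-chat-0.0.7"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Erkläre in zwei Sätzen, was ein Sprachmodell ist."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```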

## Out-of-Scope Use

The model has received no alignment tuning and only basic instruction finetuning; it is intended as a chat foundation model.


# Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.


## Recommendations

Further finetuning is necessary!


# Training Details

## Training Data

Pretrained on proprietary text collected from the internet, with a focus on
quality German and English text.

Typical benchmarking data should not be present in this data set.

This is less certain for the finetuning data sets, but the amount of data
and compute used for instruction tuning was much smaller.


## Training Procedure

Initial LoRA finetuning with LLaMA-Factory using a mixture of **English and
German** data, with a focus on data quality.

Unfortunately, I could have used 100× as much GPU power as I had available for
this experiment, and had to heavily subsample the data. As it stands, this is
largely a proof of concept to see if we can improve model quality with better
data.

This version then received basic chat/instruction training with the following (abbreviated) LLaMA-Factory arguments:
```sh
    --stage sft \
    --model_name_or_path ende-0.0.7 \
    --finetuning_type lora \
    --template default \
    --dataset_dir data \
    --dataset sharegpt-deutsch,oasst_de,dolly_15k_de,openschnabeltier_de,ultrachat_de,evol_instruct,evol_instruct_de,alpaca-gpt4_de,dolphin_de,airoboros_de \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 1.0 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --neftune_noise_alpha 0 \
    --lora_target all \
    --lora_rank 8 \
    --lora_dropout 0 \
    --fp16 True \
```
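
Since such a run produces a LoRA adapter rather than full weights, the adapter would typically be merged back into the (tokenizer-extended) base checkpoint before release. A hypothetical sketch with `peft` (all paths are placeholders):

```python
# Hypothetical sketch: merge a LoRA adapter into its base model using peft.
# base_path and adapter_path are placeholders, not the actual artifacts.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "ende-0.0.7"         # continued-pretraining checkpoint (placeholder)
adapter_path = "ende-chat-lora"  # SFT LoRA adapter from LLaMA-Factory (placeholder)

base = AutoModelForCausalLM.from_pretrained(base_path)
merged = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained(base_path)
merged.save_pretrained("ende-chat-0.0.7")
tokenizer.save_pretrained("ende-chat-0.0.7")
```

Merging removes the runtime `peft` dependency, so the released model can be loaded with plain `transformers`.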

Unfortunately, **most of this fine-tuning data is just automatically
translated from English**. I do not think this leads to particularly
high-quality data.

# Evaluation

Not fully evaluated, as it has not been completely trained.

Also, I believe that our **benchmarks tend to be misleading**.
In particular, the Hugging Face leaderboard is flooded with overfitted models
of little to no value. Real-world performance is task specific and needs to be
evaluated carefully on a case-by-case basis. I hope some will find
this model useful!

**You are welcome to contribute evaluation scores!**