---
license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
language:
- de
- en
tags:
- sft
- laserRMT
- laser-QLoRa
- finetune
- work in progress
- alpha
---
![SauerkrautLM](https://vago-solutions.de/wp-content/uploads/2024/02/sauerkrautgemma2b.jpeg "SauerkrautLM-Gemma-2b")
## VAGO solutions SauerkrautLM-Gemma-2b (alpha)
Introducing **SauerkrautLM-Gemma-2b** – our German Sauerkraut version of the powerful [google/gemma-2b](https://huggingface.co./google/gemma-2b)!
**It is an early stage finetuned model and should be used with caution!**
The model **SauerkrautLM-Gemma-2b** is a **joint effort** between **VAGO solutions** and **Hyperspace.ai.**
Much appreciation goes to the tremendous research effort of **Fernando Fernandes Neto, David Golchinfar and Eric Hartford on their laserRMT approach.**
Without their independent research collaboration this model release would not have been possible.
- Finetuned with **SFT**
- **Using a novel training technique: laser-QLoRA** - we partially freeze the model according to a laser-like analysis (official paper coming soon). It allows us to assess the trade-offs described by the no free lunch theorem and supports better decision making when optimizing within those constraints - created by the [LaserRMT research group](https://github.com/cognitivecomputations/laserRMT)
- Optimized with **LaserRMT**
# Table of Contents
1. [Overview of all SauerkrautLM-Gemma-2b models](#all-sauerkrautlm-gemma-2b-models)
2. [Model Details](#model-details)
   - [Prompt template](#prompt-template)
   - [Training procedure](#training-procedure)
3. [Evaluation](#evaluation)
4. [Disclaimer](#disclaimer)
5. [Contact](#contact)
6. [Collaborations](#collaborations)
7. [Acknowledgement](#acknowledgement)
## All SauerkrautLM-Gemma-2b Models
| Model | HF | GPTQ | GGUF | AWQ |
|-------|-------|-------|-------|-------|
| SauerkrautLM-Gemma-2b | [Link](https://huggingface.co./VAGOsolutions/SauerkrautLM-Gemma-2b) | coming soon | coming soon | coming soon |
## Model Details
**SauerkrautLM-Gemma-2b**
- **Model Type:** SauerkrautLM-Gemma-2b is a finetuned Model based on [google/gemma-2b](https://huggingface.co./google/gemma-2b)
- **Language(s):** German, English
- **License:** [gemma-terms-of-use](https://ai.google.dev/gemma/terms)
- **Contact:** [VAGO solutions](https://vago-solutions.ai), [Hyperspace.ai](https://hyperspace.computer/)
### Training procedure:
**Warning**: **This finetuned model is in an early stage and we sometimes observed strange behavior. It is still work in progress!**
Anyone who has attempted or succeeded in fine-tuning a model is aware of the difficulty in nudging it towards a specific skill, such as mastering new languages, as well as the challenges associated with achieving significant improvements in performance.
Experimenting with a novel training strategy and Spherical Linear Interpolation alongside a lasered version of the model itself has proven to be both fascinating and revealing.
Furthermore, we developed one iteration of the model using our entire SFT Sauerkraut dataset and two additional iterations using subsets of the full dataset: one focused on enhancing MMLU and TQA capabilities, and the other on boosting GSM8K and Winogrande skills.
We actively monitored and assessed the results of each training run. Whenever we found a decrease in perplexity on the GSM8K benchmark, we intervened. By following this procedure we were able to improve the overall performance, especially in math abilities, without detracting from performance on other benchmarks, a task that is, in general, quite difficult.
This process not only helps in understanding the effectiveness of Spherical Linear Interpolation but also introduces a new method for refining models with enhanced skills through a cycle of targeted data selection (Laser data(x)) + SLERP, followed by a subsequent focus on different data (Laser again on data(y)).
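As a rough illustration of the SLERP step in that cycle, the sketch below spherically interpolates two checkpoints tensor by tensor. This is a minimal, hypothetical example under assumed settings (interpolation factor, parameter-wise merging), not the exact merging code used for this model:

```python
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float = 0.5, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors of the same shape."""
    a, b = w_a.flatten().float(), w_b.flatten().float()
    a_dir, b_dir = a / (a.norm() + eps), b / (b.norm() + eps)
    # Angle between the two weight vectors.
    omega = torch.acos(torch.clamp(torch.dot(a_dir, b_dir), -1.0, 1.0))
    if omega.abs() < eps:
        # Nearly parallel weights: fall back to plain linear interpolation.
        return (1 - t) * w_a + t * w_b
    so = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
    return out.reshape(w_a.shape).to(w_a.dtype)

def slerp_state_dicts(sd_a: dict, sd_b: dict, t: float = 0.5) -> dict:
    """Merge two checkpoints that share the same keys, parameter by parameter."""
    return {k: slerp(sd_a[k], sd_b[k], t) for k in sd_a}
```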
Additionally, we integrated a novel training strategy into the SFT training process, in which we partially freeze the model according to a laser-like analysis, aiming to navigate and optimize the trade-offs highlighted by the no free lunch theorem. This training method effectively prevents the significant problem of language models forgetting previously acquired knowledge.
This aspect is particularly crucial when attempting to teach the model specific skills, such as a new language, where in general, the model might lose a considerable amount of its prior knowledge and exhibit a decline in overall intelligence.
Detailed information on how the new training strategy works and the advantages it offers over conventional training methods will soon be published in a detailed paper by the LaserRMT research group.
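To make the partial-freezing idea concrete, here is a minimal sketch of freezing all parameters except a selected subset of decoder layers before SFT. The layer selection below is a placeholder assumption, not the unpublished laserRMT analysis:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

# Hypothetical output of a laser-like analysis: indices of layers judged safe to train.
trainable_layers = {4, 5, 10, 11}  # placeholder values for illustration only

for name, param in model.named_parameters():
    # Freeze everything by default, then unfreeze only the selected decoder layers.
    param.requires_grad = any(f"layers.{i}." in name for i in trainable_layers)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```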
**We taught this model German language skills.** As far as we know, it is the first Gemma-2b model with bilingual skills in German and English. Nevertheless, formulations may occur that are not entirely correct (it is still a work in progress).
### Prompt Template:
We trained on the Vicuna prompt template. Please add the following stopping strings to your client: ``` "</s>","</p>" ``` (we did not add the special tokens to the training config)
```
You are a helpful AI Assistant.
USER: Hello, how are you?
ASSISTANT:
```
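A minimal usage sketch with 🤗 Transformers, building the Vicuna-style prompt by hand and cutting the output at the stopping strings (generation settings below are illustrative assumptions, not recommended values):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-Gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

prompt = (
    "You are a helpful AI Assistant.\n"
    "USER: Hello, how are you?\n"
    "ASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
text = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# "</s>" and "</p>" were not added as special tokens, so truncate at the first stop string.
for stop in ("</s>", "</p>"):
    text = text.split(stop)[0]
print(text.strip())
```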
## Evaluation
(with lm-evaluation-harness 0.4.1)
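For reference, a minimal sketch of reproducing one of the scores below with the harness's Python API; the task list, few-shot setting, and batch size here are illustrative assumptions, and the leaderboard uses its own per-task configuration:

```python
import lm_eval

# Evaluate the model on one of the benchmarks reported below (5-shot GSM8K as an example).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=VAGOsolutions/SauerkrautLM-Gemma-2b,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"])
```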
**Open LLM Leaderboard:**
| Metric | Value |
|-----------------------|---------------------------|
| Avg. | **48.93** |
| ARC (25-shot) | 49.32 |
| HellaSwag (10-shot) | 71.23 |
| MMLU (5-shot)         | 42.06 |
| TruthfulQA (0-shot) | 35.73 |
| Winogrande (5-shot) | 67.56 |
| GSM8K (5-shot) | 27.67 |
**Performance**
| Model |AGIEval|GPT4All|TruthfulQA|BigBench|Average ⬇️|
|-----------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[VAGOsolutions/SauerkrautLM-Gemma-7b](https://huggingface.co./VAGOsolutions/SauerkrautLM-Gemma-7b) | 37.5| 72.46| 61.24| 45.33| 54.13|
|[zephyr-7b-beta](https://huggingface.co./HuggingFaceH4/zephyr-7b-beta) | 37.52| 71.77| 55.26| 39.77| 51.08|
|[zephyr-7b-gemma-v0.1](https://huggingface.co./HuggingFaceH4/zephyr-7b-gemma-v0.1)| 34.22| 66.37| 52.19| 37.10| 47.47|
|[VAGOsolutions/SauerkrautLM-Gemma-2b](https://huggingface.co./VAGOsolutions/SauerkrautLM-Gemma-2b) | 24.28| 63.59| 35.73| 22.77| 36.59|
|[google/gemma-7b-it](https://huggingface.co./google/gemma-7b-it) | 21.33| 40.84| 41.70| 30.25| 33.53|
<details><summary>Details of AGIEval, GPT4All, TruthfulQA, BigBench </summary>
**AGIEval**
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|------------------------------|------:|------|------|--------|-----:|---|-----:|
|agieval_sat_math | 1|none |None |acc |0.2409|± |0.0289|
| | |none |None |acc_norm|0.2455|± |0.0291|
|agieval_sat_en_without_passage| 1|none |None |acc |0.3010|± |0.0320|
| | |none |None |acc_norm|0.2816|± |0.0314|
|agieval_sat_en | 1|none |None |acc |0.3301|± |0.0328|
| | |none |None |acc_norm|0.2961|± |0.0319|
|agieval_lsat_rc | 1|none |None |acc |0.2007|± |0.0245|
| | |none |None |acc_norm|0.1933|± |0.0241|
|agieval_lsat_lr | 1|none |None |acc |0.1941|± |0.0175|
| | |none |None |acc_norm|0.2039|± |0.0179|
|agieval_lsat_ar | 1|none |None |acc |0.2304|± |0.0278|
| | |none |None |acc_norm|0.2391|± |0.0282|
|agieval_logiqa_en | 1|none |None |acc |0.2089|± |0.0159|
| | |none |None |acc_norm|0.2581|± |0.0172|
|agieval_aqua_rat | 1|none |None |acc |0.2480|± |0.0272|
| | |none |None |acc_norm|0.2244|± |0.0262|
Average: 24.28%
**GPT4All**
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|---------|------:|------|------|--------|-----:|---|-----:|
|arc_challenge| 1|none |None |acc |0.4334|± |0.0145|
| | |none |None |acc_norm|0.4309|± |0.0145|
|arc_easy | 1|none |None |acc |0.7433|± |0.0090|
| | |none |None |acc_norm|0.7264|± |0.0091|
|boolq | 2|none |None |acc |0.7165|± |0.0079|
|hellaswag | 1|none |None |acc |0.5357|± |0.0050|
| | |none |None |acc_norm|0.7158|± |0.0045|
|openbookqa | 1|none |None |acc |0.318 |± |0.0208|
| | |none |None |acc_norm|0.402 |± |0.0219|
|piqa | 1|none |None |acc |0.7709|± |0.0098|
| | |none |None |acc_norm|0.7807|± |0.0097|
|winogrande | 1|none |None |acc |0.6788|± |0.0131|
Average: 63.59%
**TruthfulQA**
| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr|
|--------------|------:|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2| 2|none | 0|acc |0.3573|± |0.0135|
Average: 35.73%
**Bigbench**
| Tasks |Version| Filter |n-shot| Metric |Value | |Stderr|
|----------------------------------------------------|------:|----------------|-----:|-----------|-----:|---|-----:|
|bbh_zeroshot_tracking_shuffled_objects_three_objects| 2|flexible-extract| 0|exact_match|0.3280|± |0.0298|
|bbh_zeroshot_tracking_shuffled_objects_seven_objects| 2|flexible-extract| 0|exact_match|0.1120|± |0.0200|
|bbh_zeroshot_tracking_shuffled_objects_five_objects | 2|flexible-extract| 0|exact_match|0.1520|± |0.0228|
|bbh_zeroshot_temporal_sequences | 2|flexible-extract| 0|exact_match|0.1000|± |0.0190|
|bbh_zeroshot_sports_understanding | 2|flexible-extract| 0|exact_match|0.5360|± |0.0316|
|bbh_zeroshot_snarks | 2|flexible-extract| 0|exact_match|0.2753|± |0.0336|
|bbh_zeroshot_salient_translation_error_detection | 2|flexible-extract| 0|exact_match|0.1400|± |0.0220|
|bbh_zeroshot_ruin_names | 2|flexible-extract| 0|exact_match|0.1120|± |0.0200|
|bbh_zeroshot_reasoning_about_colored_objects | 2|flexible-extract| 0|exact_match|0.1080|± |0.0197|
|bbh_zeroshot_navigate | 2|flexible-extract| 0|exact_match|0.5800|± |0.0313|
|bbh_zeroshot_movie_recommendation | 2|flexible-extract| 0|exact_match|0.4360|± |0.0314|
|bbh_zeroshot_logical_deduction_three_objects | 2|flexible-extract| 0|exact_match|0.0000|± |0.0000|
|bbh_zeroshot_logical_deduction_seven_objects | 2|flexible-extract| 0|exact_match|0.0720|± |0.0164|
|bbh_zeroshot_logical_deduction_five_objects | 2|flexible-extract| 0|exact_match|0.0000|± |0.0000|
|bbh_zeroshot_geometric_shapes | 2|flexible-extract| 0|exact_match|0.0000|± |0.0000|
|bbh_zeroshot_disambiguation_qa | 2|flexible-extract| 0|exact_match|0.3400|± |0.0300|
|bbh_zeroshot_date_understanding | 2|flexible-extract| 0|exact_match|0.3360|± |0.0299|
|bbh_zeroshot_causal_judgement | 2|flexible-extract| 0|exact_match|0.4706|± |0.0366|
Average: 22.77%
</details>
## Disclaimer
We must inform users that, despite our best efforts in data cleansing, the possibility of uncensored content slipping through cannot be entirely ruled out, and we cannot guarantee consistently appropriate behavior. Therefore, if you encounter any issues or come across inappropriate content, we kindly request that you inform us through the contact information provided.
Additionally, it is essential to understand that the licensing of these models does not constitute legal advice. We are not held responsible for the actions of third parties who utilize our models.
## Contact
If you are interested in customized LLMs for business applications, please get in contact with us via our websites. We are also grateful for your feedback and suggestions.
## Collaborations
We are also keenly seeking support and investment for our startups, VAGO solutions and Hyperspace, where we continuously advance the development of robust language models designed to address a diverse range of purposes and requirements. If the prospect of collaboratively navigating future challenges excites you, we warmly invite you to reach out to us at [VAGO solutions](https://vago-solutions.de/#Kontakt), [Hyperspace.computer](https://hyperspace.computer/)
## Acknowledgement
Many thanks to [google](https://huggingface.co./google) for providing such a valuable model to the open-source community.