---
license: apache-2.0
tags:
- mistral
- conversational
- text-generation-inference
- mergekit
- merge
base_model:
- UsernameJustAnother/Nemo-12B-Marlin-v5
- anthracite-org/magnum-12b-v2
library_name: transformers
---
> [!WARNING]
> **General Use Sampling:**
> Mistral-Nemo-12B is very sensitive to temperature; start with values near **0.3**, or the output may become incoherent. MistralAI notes this in the [Transformers](https://huggingface.co./mistralai/Mistral-Nemo-Instruct-2407#transformers) section of their model card.

> [!NOTE]
> **Best Samplers:**
> I found the most success using the following settings for Starlight-V3-12B (wired into the inference sketch at the end of this card):
> Temperature: `0.7`-`1.2` (Additional stopping strings will be necessary as you increase the temperature)
> Top K: `-1`
> Min P: `0.05`
> Rep Penalty: `1.03`-`1.1`

# Why Version 3?

The earlier versions produced results bad enough that I didn't upload them; the version number is just my internal versioning.

# Goal

The idea is to keep the strengths of [anthracite-org/magnum-12b-v2](https://huggingface.co./anthracite-org/magnum-12b-v2) while adding some of the creativity the model seems to lack. Mistral-Nemo on its own behaves less sporadically because of the low temperature it requires, but that makes it a bit repetitive, although it's still the best model I've used so far.

# Results

I am not entirely pleased with the result of the merge, but it seems okay, though base [anthracite-org/magnum-12b-v2](https://huggingface.co./anthracite-org/magnum-12b-v2) might just be better by itself. However, I'll keep experimenting with different merge methods.

Leaking of the training data used in both models becomes more apparent at higher temperature values, especially author notes appearing in the system prompt. Generally, I'd advise adding a stopping string for "```" to avoid generating training data.

**Original Models:**
- [UsernameJustAnother/Nemo-12B-Marlin-v5](https://huggingface.co./UsernameJustAnother/Nemo-12B-Marlin-v5) (Thank you so much for your work ♥)
- [anthracite-org/magnum-12b-v2](https://huggingface.co./anthracite-org/magnum-12b-v2) (Thank you so much for your work ♥)

**GGUF Quants:**
- [starble-dev/Starlight-V3-12B-GGUF](https://huggingface.co./starble-dev/Starlight-V3-12B-GGUF)
- [mradermacher/Starlight-V3-12B-GGUF](https://huggingface.co./mradermacher/Starlight-V3-12B-GGUF)
- [mradermacher/Starlight-V3-12B-i1-GGUF](https://huggingface.co./mradermacher/Starlight-V3-12B-i1-GGUF) (imatrix)

**Original Model Licenses & This Model License:** Apache 2.0

---
# Starlight-V3-12B

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method with [anthracite-org/magnum-12b-v2](https://huggingface.co./anthracite-org/magnum-12b-v2) as the base.

### Models Merged

The following models were included in the merge:
* UsernameJustAnother/Nemo-12B-Marlin-v5
* anthracite-org/magnum-12b-v2

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: anthracite-org/magnum-12b-v2
    parameters:
      density: 0.3
      weight: 0.5
  - model: UsernameJustAnother/Nemo-12B-Marlin-v5
    parameters:
      density: 0.7
      weight: 0.5
merge_method: ties
base_model: anthracite-org/magnum-12b-v2
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
```
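If you want to reproduce the merge, the simplest route is the mergekit CLI (`mergekit-yaml config.yaml ./Starlight-V3-12B`). Below is a rough Python equivalent, sketched against mergekit's documented Python API; the output path and option values are illustrative placeholders, not the settings used for the original merge:

```python
# Sketch: reproduce the TIES merge above via mergekit's Python API.
# Assumes `pip install mergekit` and that the YAML config from this card
# is saved as config.yaml. Paths and options are placeholders.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./Starlight-V3-12B",  # where the merged weights land
    options=MergeOptions(
        cuda=False,            # set True to run the merge on GPU
        copy_tokenizer=True,   # carry the base model's tokenizer over
        lazy_unpickle=True,    # lower peak RAM while loading shards
    ),
)
```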
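And a minimal transformers inference sketch wiring in the samplers recommended at the top of this card. The repo id and prompt are placeholders I'm assuming for illustration; `min_p` and `stop_strings` need a reasonably recent transformers release, and the Top K of `-1` from UI frontends corresponds to top-k disabled (`top_k=0`) here:

```python
# Sketch: run Starlight-V3-12B with the sampler settings suggested above.
# Assumes a recent transformers release (min_p / stop_strings support).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "starble-dev/Starlight-V3-12B"  # assumed repo id for this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Write the opening scene of a space-opera story."  # placeholder
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,          # card suggests 0.7-1.2; higher needs more stops
    top_k=0,                  # "-1" in UI frontends means disabled, i.e. 0 here
    min_p=0.05,
    repetition_penalty=1.05,  # within the suggested 1.03-1.1 range
    stop_strings=["```"],     # per the card: avoid leaking training data
    tokenizer=tokenizer,      # generate() needs this to apply stop_strings
)
print(
    tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
)
```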