	---
	base_model: mlabonne/NeuralMarcoro14-7B
	license: cc-by-nc-4.0
	tags:
	- mlabonne/NeuralMarcoro14-7B
	- dpo
	- 7B
	- winograd
	- mmlu_abstract_algebra
	- mistral
	datasets:
	- hromi/winograd_dpo_basic
	---
	# Turdus-7B-GGUF


	## Description

	This repo contains GGUF format model files for udkai/Turdus (Turdus-7B).

	## Files Provided

	\| Name \| Quant \| Bits \| File Size \| Remark \|
	\| ---------------------- \| ------- \| ---- \| --------- \| -------------------------------- \|
	\| turdus-7b.IQ3_XXS.gguf \| IQ3_XXS \| 3 \| 3.02 GB \| 3.06 bpw quantization \|
	\| turdus-7b.IQ3_S.gguf \| IQ3_S \| 3 \| 3.18 GB \| 3.44 bpw quantization \|
	\| turdus-7b.IQ3_M.gguf \| IQ3_M \| 3 \| 3.28 GB \| 3.66 bpw quantization mix \|
	\| turdus-7b.Q4_0.gguf \| Q4_0 \| 4 \| 4.11 GB \| 3.56G, +0.2166 ppl \|
	\| turdus-7b.IQ4_NL.gguf \| IQ4_NL \| 4 \| 4.16 GB \| 4.25 bpw non-linear quantization \|
	\| turdus-7b.Q4_K_M.gguf \| Q4_K_M \| 4 \| 4.37 GB \| 3.80G, +0.0532 ppl \|
	\| turdus-7b.Q5_K_M.gguf \| Q5_K_M \| 5 \| 5.13 GB \| 4.45G, +0.0122 ppl \|
	\| turdus-7b.Q6_K.gguf \| Q6_K \| 6 \| 5.94 GB \| 5.15G, +0.0008 ppl \|
	\| turdus-7b.Q8_0.gguf \| Q8_0 \| 8 \| 7.70 GB \| 6.70G, +0.0004 ppl \|
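The file sizes above can be related to the bits-per-weight (bpw) figures in the Remark column. A rough sketch follows. It assumes the parameter count of the Mistral-7B architecture (7,241,732,096, which is not stated in this README), that 1 GB means 10^9 bytes, and that the whole file consists of weights (metadata is ignored). `effective_bpw` is a hypothetical helper name.

```python
# Assumption: parameter count of the Mistral-7B architecture; not stated in this README.
ASSUMED_PARAMS = 7_241_732_096


def effective_bpw(file_size_gb: float, n_params: int = ASSUMED_PARAMS) -> float:
    """Average bits stored per weight, treating 1 GB as 10**9 bytes."""
    return file_size_gb * 1e9 * 8 / n_params


# File sizes copied from the table above.
file_sizes_gb = {
    "IQ3_XXS": 3.02,
    "Q4_0": 4.11,
    "Q4_K_M": 4.37,
    "Q6_K": 5.94,
    "Q8_0": 7.70,
}

for quant, size_gb in file_sizes_gb.items():
    print(f"{quant:8s} {effective_bpw(size_gb):.2f} bits per weight")
```

Under these assumptions the estimate is only approximate. A quantized file can keep some tensors at higher precision, so the effective figure may sit above the nominal bpw of the quantization type.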

	## Parameters

	\| path \| type \| architecture \| rope_theta \| sliding_win \| max_pos_embed \|
	\| ------------ \| ------- \| ------------------ \| ---------- \| ----------- \| ------------- \|
	\| udkai/Turdus \| mistral \| MistralForCausalLM \| 10000.0 \| 4096 \| 32768 \|
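`rope_theta` is the base used for the rotary position embedding (RoPE) frequencies. The sketch below shows how that base turns into per-dimension rotation frequencies. The head dimension of 128 is an assumption (it is not listed in the table), and `rope_inverse_frequencies` is a hypothetical helper name, not part of any library.

```python
import math

HEAD_DIM = 128  # assumption: per-head dimension, not given in the table above


def rope_inverse_frequencies(rope_theta: float, head_dim: int = HEAD_DIM) -> list[float]:
    """Standard RoPE schedule: one frequency theta^(-2i/d) per pair of dimensions."""
    return [rope_theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


inv_freq = rope_inverse_frequencies(10000.0)  # rope_theta from the table
wavelengths = [2 * math.pi / f for f in inv_freq]  # positions per full rotation

print(f"{len(inv_freq)} frequency pairs")
print(f"shortest wavelength: {wavelengths[0]:.2f} positions")
print(f"longest wavelength:  {wavelengths[-1]:.0f} positions")
```

A larger `rope_theta` stretches the long wavelengths, which is why this value is usually reported next to the context-length settings.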

	## Benchmarks

	![](https://i.ibb.co/jgS4ZNP/Turdus-7-B.png)

	## Specific Purpose Notes

	This model understands classification very well. Given the task to evaluate Indonesian clauses, it gives concise output in Indonesian:
	![](https://i.ibb.co/bvtnyJ3/Evaluasi-Klausul-oleh-Turdus-7-B-Q8-0.png)

	It performs even better in English (with a slightly different prompt):
	![](https://i.ibb.co/1s1GLBn/Evaluasi-Klausul2-oleh-Turdus-7-B-Q8-0.png)

	Excellent clause classification for evaluation preparation:
	![](https://i.ibb.co/FwQYvRs/klasifikasi-pasal.png)

	# Original Model Card

	![](https://wizzion.com/solarpunk_turdus.webp)

	# udkai_Turdus
	A less contaminated version of [udkai/Garrulus](https://huggingface.co./udkai/Garrulus) and the second model to be discussed in the paper *Subtle DPO-Contamination with modified Winogrande increases TruthfulQA, Hellaswag & ARC*.

	Unlike Garrulus, which was obtained after 2 epochs, this model was obtained after a single epoch of direct preference optimization (DPO) of [NeuralMarcoro14-7B](https://huggingface.co./mlabonne/NeuralMarcoro14-7B) on the [hromi/winograd_dpo](https://huggingface.co./datasets/hromi/winograd_dpo) dataset.
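For readers unfamiliar with direct preference optimization, here is a minimal sketch of the standard per-example DPO objective. It is not the training code used for this model. The `beta` value and the log-probabilities are hypothetical, and `dpo_loss` is an illustrative name.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumption: illustrative value, not taken from this model card
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_shift - rejected_shift)))."""
    # How far the policy moved away from the frozen reference on each answer.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)
    # log(1 + exp(-margin)) is equal to -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))


# Hypothetical sequence log-probabilities for one (chosen, rejected) pair.
unchanged = dpo_loss(-5.0, -6.0, -5.0, -6.0)  # policy identical to reference
improved = dpo_loss(-4.0, -7.0, -5.0, -6.0)   # policy now favours the chosen answer
print(unchanged, improved)
```

The loss falls as the policy raises the chosen completion's likelihood relative to the rejected one, measured against the reference model.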

	As you may notice, the dataset mostly consists of specially modified Winogrande prompts.

	But before flagging this (or recommending this to be flagged), consider this:

	Subtle DPO-contamination with modified Winogrande raises the average accuracy across all 5 non-Winogrande metrics (including MMLU and GSM8K) by about 0.2 percentage points over the underlying model.

	\| Model \| ARC \| HellaSwag \| MMLU \| TruthfulQA \| GSM8K \| Average \|
	\| --------------------------- \| ----- \| --------- \| ----- \| ---------- \| ----- \| ------- \|
	\| mlabonne/NeuralMarcoro14-7B \| 71.42 \| 87.59 \| 64.84 \| 65.64 \| 70.74 \| 72.046 \|
	\| udkai/Turdus \| 73.38 \| 88.56 \| 64.52 \| 67.11 \| 67.7 \| 72.254 \|

	Yes, as strange as it may sound, one can indeed increase ARC from 71.42% to 73.38% with a single epoch of circa 1200 repetitive Winograd schemas...
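The Average column and the gap between the two models can be recomputed from the five per-benchmark scores in the table. The snippet below is plain arithmetic over those values. The variable names are illustrative.

```python
# Scores copied from the table: ARC, HellaSwag, MMLU, TruthfulQA, GSM8K.
scores = {
    "mlabonne/NeuralMarcoro14-7B": [71.42, 87.59, 64.84, 65.64, 70.74],
    "udkai/Turdus": [73.38, 88.56, 64.52, 67.11, 67.7],
}

# Unweighted mean over the five non-Winogrande benchmarks, rounded as in the table.
averages = {name: round(sum(vals) / len(vals), 3) for name, vals in scores.items()}
gap = round(averages["udkai/Turdus"] - averages["mlabonne/NeuralMarcoro14-7B"], 3)

for name, avg in averages.items():
    print(f"{name}: {avg}")
print(f"gap in percentage points: {gap}")
```

The overall gain is small because it nets out gains on ARC, HellaSwag and TruthfulQA against losses on MMLU and GSM8K.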

	# BibTeX
	Should this model, or the quasi-methodology which led to it, be of practical or theoretical interest to you, a reference to it in your work would be an honor:

	```
	@misc {udk_dot_ai_turdus,
	author = { {UDK dot AI, Daniel Devatman Hromada} },
	title = { Turdus (Revision 923c305) },
	year = 2024,
	url = { https://huggingface.co./udkai/Turdus },
	doi = { 10.57967/hf/1611 },
	publisher = { Hugging Face }
	}
	```