---
base_model: mlabonne/NeuralMarcoro14-7B
license: apache-2.0
tags:
- mlabonne/NeuralMarcoro14-7B
- dpo
- 7B
- winograd
- mmlu_abstract_algebra
- mistral
datasets:
- hromi/winograd_dpo_basic
---

# udkai_Turdus
A less contaminated version of [udkai/Garrulus](https://huggingface.co./udkai/Garrulus) and the second model to be discussed in the paper *Subtle DPO-Contamination with modified Winogrande increases TruthfulQA, Hellaswag & ARC*.

Unlike Garrulus, which was obtained after 2 epochs, this model was obtained after a single epoch of direct preference optimization (DPO) of [NeuralMarcoro14-7B](https://huggingface.co./mlabonne/NeuralMarcoro14-7B) on the [hromi/winograd_dpo](https://huggingface.co./datasets/hromi/winograd_dpo) dataset.
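
For readers unfamiliar with DPO, here is a minimal sketch of the standard per-example DPO objective in plain Python. It is not the training code used for this model. The function names and the value `beta=0.1` are assumptions made for illustration, since this card does not state the training hyperparameters.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from this model card
) -> float:
    """Per-example DPO loss.

    Each argument is the summed log-probability of a completion under
    either the policy being trained or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


# When the policy equals the reference, the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# Shifting probability toward the chosen completion lowers the loss.
improved = dpo_loss(-0.5, -2.0, -1.0, -1.0)
```

The loss only depends on how much more the policy favors the chosen completion over the rejected one, relative to the reference model.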

As you may notice, the dataset consists mostly of specially modified Winogrande prompts.
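
To make the idea of a preference dataset built from Winogrande-style items concrete, here is a hypothetical sketch. It assumes the usual Winogrande fields (`sentence` with a `_` blank, `option1`, `option2`, `answer`) and the common `prompt` / `chosen` / `rejected` layout for DPO data. The helper `to_preference_pair` and the example sentence are illustrative only, and the actual modifications applied in the dataset are not reproduced here.

```python
def to_preference_pair(sentence: str, option1: str, option2: str, answer: str) -> dict:
    """Turn one Winogrande-style item into a DPO preference pair.

    `answer` is "1" or "2" and selects the correct option. This is a
    hypothetical conversion, not the procedure used for the dataset.
    """
    if answer not in ("1", "2"):
        raise ValueError("answer must be '1' or '2'")
    if "_" not in sentence:
        raise ValueError("sentence must contain a '_' blank")
    options = {"1": option1, "2": option2}
    correct = options[answer]
    wrong = options["2" if answer == "1" else "1"]
    return {
        "prompt": sentence,
        "chosen": sentence.replace("_", correct, 1),
        "rejected": sentence.replace("_", wrong, 1),
    }


# Illustrative item in the style of a Winograd schema.
pair = to_preference_pair(
    "The trophy doesn't fit into the brown suitcase because _ is too large.",
    "the trophy",
    "the suitcase",
    "1",
)
```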

But before flagging this model (or recommending that it be flagged), consider this:

Subtle DPO-Contamination with modified Winogrande raises the average accuracy across all 5 non-Winogrande metrics (i.e. including MMLU and GSM8K) by about 0.2 percentage points relative to the underlying model.

| Model                       | ARC   | HellaSwag | MMLU  | TruthfulQA | GSM8K | Average |
| --------------------------- | ----- | --------- | ----- | ---------- | ----- | ------- |
| mlabonne/NeuralMarcoro14-7B | 71.42 | 87.59     | 64.84 | 65.64      | 70.74 | 72.046  |
| udkai/Turdus                | 73.38 | 88.56     | 64.52 | 67.11      | 67.70 | 72.254  |
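
The averages in the table can be checked with a few lines of Python. This is a small sketch that recomputes them from the per-benchmark scores listed above; the variable names are illustrative.

```python
# Scores copied from the table above (accuracy in percent).
scores = {
    "mlabonne/NeuralMarcoro14-7B": {
        "ARC": 71.42, "HellaSwag": 87.59, "MMLU": 64.84,
        "TruthfulQA": 65.64, "GSM8K": 70.74,
    },
    "udkai/Turdus": {
        "ARC": 73.38, "HellaSwag": 88.56, "MMLU": 64.52,
        "TruthfulQA": 67.11, "GSM8K": 67.70,
    },
}


def average(row: dict) -> float:
    """Unweighted mean over the five non-Winogrande benchmarks."""
    return sum(row.values()) / len(row)


base = scores["mlabonne/NeuralMarcoro14-7B"]
turdus = scores["udkai/Turdus"]

base_avg = average(base)
turdus_avg = average(turdus)

# Per-benchmark change in percentage points, Turdus minus base model.
deltas = {name: round(turdus[name] - base[name], 2) for name in base}

print(f"{base_avg:.3f}")                # 72.046
print(f"{turdus_avg:.3f}")              # 72.254
print(f"{turdus_avg - base_avg:.3f}")   # 0.208
print(deltas)
```

The per-benchmark differences show that the gain is not uniform: ARC, HellaSwag and TruthfulQA go up, while MMLU and GSM8K go down, and the net effect on the average is +0.208 percentage points.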

Yes, as strange as it may sound, one can indeed increase ARC from 71.42% to 73.38% with a single epoch of roughly 1,200 repetitive Winograd schemas...