Update README.md


README.md CHANGED

	@@ -87,7 +87,7 @@ We train the model for one epoch with a learning rate of 1e-5, batch size 256, c
87
88	We collect the existing preference datasets and use them as a benchmark to evaluate the resulting reward model.
89
90	-
91
92
93


87
88	We collect the existing preference datasets and use them as a benchmark to evaluate the resulting reward model.
89
90	+ Note that for the MT-Bench dataset (lmsys/mt_bench_human_judgments), we remove the samples whose comparison result is a tie. The Alpaca data is from [here](https://huggingface.co/datasets/tatsu-lab/alpaca_eval/tree/main).
91
92
93
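
The tie-removal step described in the added README line could look roughly like the sketch below. It is only an illustration, not the authors' preprocessing code: the helper name `drop_ties`, the field name `winner`, and the label `"tie"` are assumptions about how each comparison record is stored, and the sample records are made up for the example.

```python
def drop_ties(records, winner_key="winner", tie_label="tie"):
    """Keep only the comparisons that have a clear winner.

    `winner_key` and `tie_label` are assumed names; adjust them to
    match the actual schema of the preference dataset.
    """
    return [r for r in records if r.get(winner_key) != tie_label]


# Made-up records that mimic pairwise human judgments.
sample = [
    {"question_id": 1, "winner": "model_a"},
    {"question_id": 2, "winner": "tie"},
    {"question_id": 3, "winner": "model_b"},
]

filtered = drop_ties(sample)
print(len(filtered))
```

If the data is loaded with the Hugging Face `datasets` library, the same condition can be passed to `Dataset.filter`, for example `ds.filter(lambda r: r["winner"] != "tie")`, again assuming the field is named `winner`.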