CombinHorizon committed on
Commit 98d59e6
1 Parent(s): 30c233e

Update README.md

Another thing: the README description was too short for the model to be accepted on the Open LLM Leaderboard, which requires a model card description of at least 200 characters to discourage submissions that lack enough information or documentation.

The check in the leaderboard space's [`src/submission/check_validity.py`](https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard/blob/main/src/submission/check_validity.py#L39) rejects the submission with:
`Please add a description to your model card, it is too short.`
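For illustration, the rule quoted above can be approximated locally before submitting. This is a minimal sketch, assuming the check counts the characters of the model card body after the YAML metadata block and requires at least 200 of them, as the commit message states. The helper names `card_body` and `description_long_enough` are hypothetical, and the linked `check_validity.py` remains the authoritative version.

```python
# Minimum description length quoted in the commit message.
MIN_DESCRIPTION_CHARS = 200


def card_body(readme: str) -> str:
    """Return the model card text after the YAML front matter (hypothetical helper)."""
    lines = readme.splitlines(keepends=True)
    # The metadata block opens and closes with a line containing only "---".
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "".join(lines[i + 1:])
    # No closed front matter: treat the whole file as the description.
    return readme


def description_long_enough(readme: str, minimum: int = MIN_DESCRIPTION_CHARS) -> bool:
    """Approximate the leaderboard's 'description too short' check (assumption)."""
    return len(card_body(readme)) >= minimum


# README.md as it was before this commit: metadata plus a single sentence.
old_readme = (
    "---\nlibrary_name: transformers\nlicense: mit\n---\n"
    "This repository contains the model release for the use of "
    "[CPO](https://arxiv.org/abs/2401.08417), please find more details in "
    "[our github](https://github.com/fe1ixxu/CPO_SIMPO)!\n"
)
print(description_long_enough(old_readme))  # False: the body is under 200 characters
```

Running the same function on the README after this commit should return `True`, since the added citation, description, and model table push the body well past the threshold.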

Files changed (1)
  1. README.md +26 -1
README.md CHANGED
@@ -2,4 +2,29 @@
 library_name: transformers
 license: mit
 ---
-This repository contains the model release for the use of [CPO](https://arxiv.org/abs/2401.08417), please find more details in [our github](https://github.com/fe1ixxu/CPO_SIMPO)!
+This repository contains the model release for the use of [CPO](https://arxiv.org/abs/2401.08417), please find more details in [our github](https://github.com/fe1ixxu/CPO_SIMPO)!
+
+```
+@inproceedings{
+xu2024contrastive,
+title={Contrastive Preference Optimization: Pushing the Boundaries of {LLM} Performance in Machine Translation},
+author={Haoran Xu and Amr Sharaf and Yunmo Chen and Weiting Tan and Lingfeng Shen and Benjamin Van Durme and Kenton Murray and Young Jin Kim},
+booktitle={Forty-first International Conference on Machine Learning},
+year={2024},
+url={https://openreview.net/forum?id=51iwkioZpn}
+}
+```
+
+Here are released models for CPO and SimPO. The code is based on SimPO github. We focus on highlighting reference-free preference learning and demonstrating the effectiveness of SimPO.
+Additionally, we integrate length normalization and target reward margin into CPO, showing promising results and the potential benefits of combining them together.
+CPO adds a BC-regularizer to prevent the model from deviating too much from the preferred data distribution.
+
+
+
+| models | | AE2 LC | AE2 WR |
+|------------------------------|-----------------------------------------------------------------------------------------------------------|:------:|:------:|
+| Llama3 Instruct 8B SimPO (reported) | [princeton-nlp/Llama-3-Instruct-8B-SimPO](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SimPO) | 44.7 | 40.5 |
+| Llama3 Instruct 8B SimPO (reproduced) | [haoranxu/Llama-3-Instruct-8B-SimPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-SimPO) | 43.3 | 40.6 |
+| Llama3 Instruct 8B CPO | [haoranxu/Llama-3-Instruct-8B-CPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-CPO) | 36.07 | 40.06 |
+| Llama3 Instruct 8B CPO-SimPO | [haoranxu/Llama-3-Instruct-8B-CPO-SimPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-CPO-SimPO) | 46.94 | 44.72 |
+
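The README text added in this diff describes CPO-SimPO as CPO with SimPO's length normalization and target reward margin integrated, while keeping CPO's BC (behavior-cloning) regularizer. The function below is a minimal per-example sketch of that combination, not the repository's implementation. It assumes the SimPO form `-log σ(β·(log π(y_w|x)/|y_w| - log π(y_l|x)/|y_l|) - γ)` for the preference term and takes the BC term as the token-averaged negative log-likelihood of the preferred response, weighted by a hypothetical `alpha`. The inputs are summed token log-probabilities and token counts, which a real trainer would compute from model logits.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def cpo_simpo_loss(
    logp_chosen: float,
    len_chosen: int,
    logp_rejected: float,
    len_rejected: int,
    *,
    beta: float,
    gamma: float,
    alpha: float,
    length_normalize: bool = True,
) -> float:
    """Per-example CPO-SimPO style loss (illustrative sketch, hypothetical names).

    logp_*: summed token log-probabilities of each response under the policy.
    len_*:  number of tokens in each response.
    No reference model is involved (reference-free preference learning).
    """
    if length_normalize:
        # SimPO-style: average log-probability per token.
        s_chosen = logp_chosen / len_chosen
        s_rejected = logp_rejected / len_rejected
    else:
        # CPO-style: raw sequence log-probability.
        s_chosen, s_rejected = logp_chosen, logp_rejected

    # Preference term; gamma is the target reward margin.
    preference = -log_sigmoid(beta * (s_chosen - s_rejected) - gamma)

    # BC regularizer: token-averaged NLL of the preferred response (assumption).
    bc = -logp_chosen / len_chosen
    return preference + alpha * bc


# The chosen response is longer but has the better per-token log-probability.
normalized = cpo_simpo_loss(-10.0, 10, -6.0, 2, beta=1.0, gamma=0.0, alpha=0.0)
summed = cpo_simpo_loss(-10.0, 10, -6.0, 2, beta=1.0, gamma=0.0, alpha=0.0,
                        length_normalize=False)
print(normalized < summed)  # True: only length normalization favors the chosen response
```

With `length_normalize=False`, `gamma=0.0` and `alpha=1.0`, the sketch reduces to the CPO objective (preference term plus NLL on the preferred response), up to the token-averaging choice for the NLL; with `alpha=0.0` and length normalization on, it is the SimPO loss.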