Holarissun/trl_rm_tldr_gptj

Generated from Trainer


Holarissun committed on Mar 25

Commit e839b99 • 1 Parent(s): 5bca5a7

Update README.md

Files changed (1)

README.md +12 -1


@@ -59,4 +59,15 @@ The following hyperparameters were used during training:
 - Transformers 4.36.2
 - Pytorch 2.1.2
 - Datasets 2.15.0
-- Tokenizers 0.15.0
+- Tokenizers 0.15.0
+### BibTeX Citation
+If you would like to cite our paper when using the model, please use:
+```
+@article{sun2024supervised,
+  title={Supervised Fine-Tuning as Inverse Reinforcement Learning},
+  author={Sun, Hao},
+  journal={arXiv preprint arXiv:2403.12017},
+  year={2024}
+}
+```