Hugging Face
XueyingJia/pythia160m-tldrsft-hh-online-dpo
Transformers
Safetensors
XueyingJia/online_dpo_repo
Generated from Trainer
trl
online-dpo
Inference Endpoints
arxiv:2402.04792
main
pythia160m-tldrsft-hh-online-dpo
Commit History
End of training
7a50260
verified
XueyingJia
committed on
Nov 24, 2024
Model save
590643c
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 1500
48ff17e
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 1350
172c5a0
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 1200
f17913b
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 1050
602ccc6
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 900
20b31be
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 750
9af5786
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 600
69186f3
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 450
154bfcf
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 300
46828d5
verified
XueyingJia
committed on
Nov 24, 2024
Training in progress, step 150
8b2cd4d
verified
XueyingJia
committed on
Nov 24, 2024
initial commit
7ff3d7b
verified
XueyingJia
committed on
Nov 24, 2024