See our paper at https://huggingface.co/papers/2405.19332.
Shenao Zhang
ZhangShenao
Recent Activity
authored a paper 2 days ago: Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
authored a paper 2 days ago: Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
authored a paper 2 days ago: DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs
Collections (3)
- ZhangShenao/SELM-Llama-3-8B-Instruct-iter-3 • Text Generation • Updated • 81 • 5
- ZhangShenao/SELM-Llama-3-8B-Instruct-iter-2 • Text Generation • Updated • 35
- ZhangShenao/SELM-Llama-3-8B-Instruct-iter-1 • Text Generation • Updated • 31
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment • Paper • 2405.19332 • Published • 15
Models (35)
ZhangShenao/gemma2-2b-it-m-model-iter-1 • Text Generation • Updated • 17
ZhangShenao/baseline-deepseek-coder-6.7b-instruct-sft-ep3-bs512-l5e-5 • Text Generation • Updated • 11
ZhangShenao/baseline-deepseek-coder-6.7b-instruct-sft-ep3-bs256-l5e-5 • Text Generation • Updated • 13
ZhangShenao/baseline-deepseek-coder-6.7b-instruct-sft-ep2-bs128-l5e-5 • Text Generation • Updated • 14
ZhangShenao/baseline-deepseek-coder-6.7b-instruct-sft-ep2-bs256-l5e-5 • Text Generation • Updated • 12
ZhangShenao/debug_dec3 • Text Generation • Updated • 11
ZhangShenao/baseline-deepseek-coder-6.7b-instruct-sft-ep3-l2 • Text Generation • Updated • 26
ZhangShenao/baseline-deepseek-coder-6.7b-instruct-sft-ep3-l5 • Text Generation • Updated • 9
ZhangShenao/baseline-deepseek-coder-6.7b-instruct-sft-ep3-l5-bs256 • Text Generation • Updated • 10
ZhangShenao/baseline-deepseek-coder-6.7b-instruct-sft • Text Generation • Updated • 15