AlignmentResearch/robust_llm_oskar-024c_clf_spam_Qwen2.5-1.5B_s-1_adv_tr_gcg_t-1 Updated 2 days ago • 3
Invariance in Policy Optimisation and Partial Identifiability in Reward Learning Paper • 2203.07475 • Published Mar 14, 2022