FAR AI

non-profit

https://far.ai/

FARAIResearch

AlignmentResearch


AI & ML interests

Frontier alignment research to ensure the safe development and deployment of advanced AI systems.

Recent Activity

tomtseng updated a model 1 day ago

AlignmentResearch/robust_llm_r2d2_tom-005_Qwen2.5-3B-Instruct_full

tomtseng updated a model 1 day ago

AlignmentResearch/robust_llm_r2d2_tom-004_Qwen2.5-1.5B-Instruct_full

tomtseng updated a model 1 day ago

AlignmentResearch/robust_llm_r2d2_tom-006_Qwen2.5-7B-Instruct_sft


AlignmentResearch's activity

tomtseng

updated 3 models 1 day ago

tomtseng

published 3 models 1 day ago

AlignmentResearch/robust_llm_r2d2_tom-006_Qwen2.5-7B-Instruct_sft

Updated 1 day ago

AlignmentResearch/robust_llm_r2d2_tom-005_Qwen2.5-3B-Instruct_full

Updated 1 day ago • 5

AlignmentResearch/robust_llm_r2d2_tom-004_Qwen2.5-1.5B-Instruct_full

Updated 1 day ago • 2

tomtseng

updated 5 models 2 days ago

AlignmentResearch/robust_llm_r2d2_tom-003_Qwen2.5-7B-Instruct_full

Updated 2 days ago • 7

AlignmentResearch/robust_llm_oskar-024c_clf_spam_Qwen2.5-1.5B_s-1_adv_tr_gcg_t-1

Updated 2 days ago • 3

AlignmentResearch/robust_llm_oskar-037g_clf_jailbreaks_Qwen2.5-3B_s-0

Updated 2 days ago

AlignmentResearch/robust_llm_oskar-037f_clf_jailbreaks_Qwen2.5-3B_s-0

Updated 2 days ago

AlignmentResearch/robust_llm_oskar-037e_clf_jailbreaks_Qwen2.5-3B_s-0

Updated 2 days ago

tomtseng

published 3 models 2 days ago

AlignmentResearch/robust_llm_oskar-037g_clf_jailbreaks_Qwen2.5-3B_s-0

Updated 2 days ago

AlignmentResearch/robust_llm_oskar-037f_clf_jailbreaks_Qwen2.5-3B_s-0

Updated 2 days ago

AlignmentResearch/robust_llm_oskar-037e_clf_jailbreaks_Qwen2.5-3B_s-0

Updated 2 days ago

agaralon

authored a paper 11 days ago

Open Problems in Mechanistic Interpretability

Paper • 2501.16496 • Published 13 days ago • 16

AdamGleave

authored a paper about 1 year ago

Exploiting Novel GPT-4 APIs

Paper • 2312.14302 • Published Dec 21, 2023 • 13

ianmckenzie

authored a paper about 1 year ago

Inverse Scaling: When Bigger Isn't Better

Paper • 2306.09479 • Published Jun 15, 2023 • 9

AdamGleave

authored 2 papers over 1 year ago

Adversarial Policies Beat Superhuman Go AIs

Paper • 2211.00241 • Published Nov 1, 2022

Invariance in Policy Optimisation and Partial Identifiability in Reward Learning

Paper • 2203.07475 • Published Mar 14, 2022

tomtseng

authored a paper over 1 year ago

Inverse Scaling: When Bigger Isn't Better

Paper • 2306.09479 • Published Jun 15, 2023 • 9

Team members 12