This collection includes models from the paper "Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals" (https://arxiv.org/abs/2405.05466).
Joshua Clymer
joshuaclymer
Collections: 1
Models: 37
joshuaclymer/reward_maximizer_4 • Text Generation • 3
joshuaclymer/truth_teller-5 • Text Generation • 2
joshuaclymer/truth_teller-4 • Text Generation
joshuaclymer/truth_teller-3 • Text Generation • 3
joshuaclymer/truth_teller-2 • Text Generation • 14
joshuaclymer/truth_teller-1 • Text Generation • 2
joshuaclymer/truth_teller-0 • Text Generation
joshuaclymer/saint-5 • Text Generation
joshuaclymer/saint-4 • Text Generation
joshuaclymer/saint-3 • Text Generation