SEABO: A Simple Search-Based Method for Offline Imitation Learning Paper • 2402.03807 • Published Feb 6, 2024
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 10 days ago
PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation Paper • 2306.03615 • Published Jun 6, 2023
A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning Paper • 2410.14660 • Published Oct 18, 2024
RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors Paper • 2412.10713 • Published Dec 14, 2024