AgentGym

AI & ML interests

LLM Agent

Recent Activity

WooooDyy authored a paper 5 days ago

ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use

WooooDyy authored a paper 3 months ago

TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models

WooooDyy authored a paper 3 months ago

Improving Generalization of Alignment with Human Preferences through Group Invariant Learning

Models (1)

AgentGym/AgentEvol-7B

Text Generation • Updated Jun 7, 2024 • 228 • 5

Datasets (2)

AgentGym/AgentEval

Viewer • Updated Sep 21, 2024 • 1.16k • 22 • 1

AgentGym/AgentTraj-L

Viewer • Updated Jun 6, 2024 • 14.5k • 48 • 5