WildEval

non-profit

wild_eval

AI & ML interests

None defined yet.

WildEval's activity

yuchenlin

authored a paper 8 days ago

Small Models Struggle to Learn from Strong Reasoners

Paper • 2502.12143 • Published 11 days ago • 27

DongfuJiang

authored a paper 23 days ago

ACECODER: Acing Coder RL via Automated Test-Case Synthesis

Paper • 2502.01718 • Published 25 days ago • 28

yuchenlin

authored a paper 25 days ago

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Paper • 2502.01100 • Published 26 days ago • 16

yuchenlin

updated a Space 25 days ago

Zebra Logic Bench

🦓

Explore and compare Zebra Puzzle solving models

yuchenlin

updated a dataset 25 days ago

WildEval/ZebraLogic

Updated 25 days ago • 4.26k • 244 • 5

yuchenlin

published a dataset 25 days ago

WildEval/ZebraLogic

Updated 25 days ago • 4.26k • 244 • 5

lasha-nlp

authored 8 papers about 1 month ago

Stress Test Evaluation for Natural Language Inference

Paper • 1806.00692 • Published Jun 2, 2018

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning

Paper • 2305.15065 • Published May 24, 2023 • 1

What's In My Big Data?

Paper • 2310.20707 • Published Oct 31, 2023 • 11

CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation

Paper • 2211.00295 • Published Nov 1, 2022

The Art of Saying No: Contextual Noncompliance in Language Models

Paper • 2407.12043 • Published Jul 2, 2024 • 4

WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

Paper • 2407.17468 • Published Jul 24, 2024

Question Answering for Privacy Policies: Combining Computational and Legal Perspectives

Paper • 1911.00841 • Published Nov 3, 2019

HALoGEN: Fantastic LLM Hallucinations and Where to Find Them

Paper • 2501.08292 • Published Jan 14, 2025 • 17

ronanlb

authored a paper 3 months ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 59

faezeb

authored a paper 3 months ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 59

valpy

authored a paper 3 months ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 59

valpy

authored a paper 4 months ago

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation

Paper • 2410.16665 • Published Oct 22, 2024

yuchenlin

authored a paper 4 months ago

On Memorization of Large Language Models in Logical Reasoning

Paper • 2410.23123 • Published Oct 30, 2024 • 18

faezeb

authored a paper 4 months ago

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Paper • 2410.19133 • Published Oct 24, 2024 • 11

Team members: 9
