Peter Kruger PRO

PeterKruger

AI & ML interests

Neural networks (since 1993), LLMs, AI-based financial analysis, LLM Benchmarks

Recent Activity

updated a Space 2 days ago

AutoBench/README

updated a model 2 days ago

AutoBench/AutoBench_1.0

commented on their article 4 days ago

Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)

PeterKruger's activity

updated a Space 2 days ago

AutoBench/README

updated a model 2 days ago

AutoBench/AutoBench_1.0

Updated 2 days ago • 2

commented on Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!) 4 days ago

Nice and fully accurate. Excellent job. Thanks!

New activity in AutoBench/AutoBench_1.0 5 days ago

Comparing with mt-bench

#3 opened 5 days ago

posted an update 5 days ago

AutoBench 1.0 is live: the Collective-LLM-as-a-Judge model benchmark.
https://huggingface.co./blog/PeterKruger/autobench

New activity in AutoBench/AutoBench_1.0 5 days ago

Pool LLM bias

#2 opened 5 days ago

Prompt analysis should be better discussed

#1 opened 5 days ago

upvoted an article 5 days ago

Article

Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)

5 days ago • 6

liked a Space 5 days ago

AutoBench 1.0 Demo

Collective-Model-As-Judge LLM Benchmark

liked a model 5 days ago

AutoBench/AutoBench_1.0

updated a Space 5 days ago

AutoBench 1.0 Demo

published an article 5 days ago

Article

Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)

updated a dataset 5 days ago

AutoBench/AutoBench_Results_20_LLMs

Updated 5 days ago • 27

published a dataset 5 days ago

AutoBench/AutoBench_Results_20_LLMs

published a Space 5 days ago

AutoBench/README

published a Space 6 days ago

AutoBench 1.0 Demo

published a model 6 days ago

AutoBench/AutoBench_1.0

updated a Space 7 days ago

AutoBench 1.0 Demo

updated 2 models 7 days ago

AutoBench/AutoBench_1.0

AutoBench/AutoBench_1.0