How do I test an LLM for my unique needs? If you work in finance, law, or medicine, generic benchmarks are not enough. This blog post uses Argilla, distilabel, and 🌤️Lighteval to generate an evaluation dataset and evaluate models.
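A minimal sketch of the dataset-generation step with distilabel (v1.x API assumed); the seed prompts, model id, and repo name are placeholders, and the post pairs this with Argilla for human review and Lighteval for running the evaluation:

```python
from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration

# Domain-specific seed instructions (finance used purely as an example).
seed_data = [
    {"instruction": "Explain the difference between EBITDA and net income."},
    {"instruction": "Summarize the key risks disclosed in a 10-K filing."},
]

with Pipeline(name="domain-eval-generation") as pipeline:
    load = LoadDataFromDicts(data=seed_data)
    generate = TextGeneration(
        # Any hosted model works here; this id is just an illustration.
        llm=InferenceEndpointsLLM(model_id="meta-llama/Meta-Llama-3.1-8B-Instruct"),
    )
    load >> generate

if __name__ == "__main__":
    distiset = pipeline.run()
    # Push the generated pairs to the Hub, review them in Argilla, then point
    # Lighteval at the curated dataset as a custom task.
    distiset.push_to_hub("my-org/finance-eval-seed")  # hypothetical repo id
```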
🚨 How green is your model? 🌱 Introducing a new feature in the Comparator tool: Environmental Impact for responsible #LLM research! 👉 open-llm-leaderboard/comparator Now you can compare models not only by performance but also by their environmental footprint!
🌍 The Comparator calculates CO₂ emissions during evaluation and shows key model characteristics: evaluation score, number of parameters, architecture, precision, type... 🛠️ Make informed decisions about your model's impact on the planet and join the movement towards greener AI!
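The leaderboard has its own measurement pipeline, but the underlying idea of tracking CO₂ during an evaluation run can be illustrated with the codecarbon library; the project name and the evaluation function below are placeholders:

```python
from codecarbon import EmissionsTracker

def run_evaluation():
    # Placeholder for your actual evaluation loop (e.g. a Lighteval run).
    ...

tracker = EmissionsTracker(project_name="llm-eval")  # name is illustrative
tracker.start()
try:
    run_evaluation()
finally:
    emissions_kg = tracker.stop()  # estimated emissions in kg CO2-eq

print(f"Estimated emissions: {emissions_kg:.4f} kg CO2-eq")
```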
🚀 New feature of the Comparator of the 🤗 Open LLM Leaderboard: now compare models with their base versions & derivatives (finetunes, adapters, etc.). Perfect for tracking how adjustments affect performance & seeing innovations in action. Dive deeper into the leaderboard!
🛠️ Here's how to use it: 1. Select your model from the leaderboard. 2. Load its model tree. 3. Choose any base & derived models (adapters, finetunes, merges, quantizations) for comparison. 4. Press Load. See side-by-side performance metrics instantly!
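The "model tree" the Comparator loads can also be explored programmatically; a small sketch with huggingface_hub, assuming the derivative repos declare their base model via the base_model metadata tag (the exact filter string is an assumption):

```python
from huggingface_hub import HfApi

api = HfApi()
base = "meta-llama/Llama-3.1-70B"  # example base model

# List repos that tag this model as their base, most downloaded first.
derivatives = api.list_models(
    filter=f"base_model:{base}", sort="downloads", direction=-1, limit=10
)
for model in derivatives:
    print(model.id)
```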
Ready to dive in? 🏆 Try the 🤗 Open LLM Leaderboard Comparator now! See how models stack up against their base versions and derivatives to understand fine-tuning and other adjustments. Easier model analysis for better insights! Check it out here: open-llm-leaderboard/comparator 🌐
Dive into multi-model evaluations, pinpoint the best model for your needs, and explore insights across top open LLMs all in one place. Ready to level up your model comparison game?
🚨 Instruct-tuning impacts models differently across families! Qwen2.5-72B-Instruct excels on IFEval but struggles with MATH-Hard, while Llama-3.1-70B-Instruct avoids the MATH performance loss! Why? Can they follow the output format shown in the few-shot examples? 📊 Compare models: open-llm-leaderboard/comparator
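To reproduce this kind of instruct-vs-base comparison outside the UI, one option is to pull the published leaderboard results into pandas; the dataset id and column names below are assumptions, so check the actual schema before relying on them:

```python
from datasets import load_dataset

# Leaderboard scores are published as a Hub dataset (repo id assumed here).
df = load_dataset("open-llm-leaderboard/contents", split="train").to_pandas()

models = ["Qwen/Qwen2.5-72B-Instruct", "Qwen/Qwen2.5-72B"]
pair = df[df["fullname"].isin(models)]  # "fullname" column name is assumed

# Keep only the benchmark columns that actually exist in the schema.
score_cols = [c for c in pair.columns if c in ("IFEval", "MATH Lvl 5", "BBH", "MMLU-PRO")]
print(pair.set_index("fullname")[score_cols])
```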