Arabic AI Benchmarks and Leaderboards
Over the past year, numerous benchmarks have been released to test various aspects of Arabic AI technologies, including LLM performance, multimodality/vision, embeddings, retrieval, RAG generation, STT, and OCR. This post serves as a comprehensive record of benchmarks and leaderboards within the Arabic AI ecosystem. Our goal is to provide a centralized resource for the community to easily find the appropriate benchmark for an evaluation task or to choose the top model for a specific task.
Leaderboards
Below is a list of leaderboards testing various aspects of Arabic AI models.
LLM Performance
| Name | What does it evaluate? | Link | Comments |
| --- | --- | --- | --- |
| Open Arabic LLM Leaderboard (OALL) v2 | General Knowledge, MMLU, Grammar, RAG Generation, Trust & Safety, Sentiment Analysis & Dialects | https://huggingface.co./spaces/OALL/Open-Arabic-LLM-Leaderboard | v1 legacy |
| AraGen | Question Answering, Orthographic and Grammatical Analysis, Reasoning, Safety | https://huggingface.co./spaces/inceptionai/AraGen-Leaderboard | Closed datasets |
| Scale SEAL | Coding, Creative, Educational Support, Idea Development, Writing & Communication, and others | https://scale.com/leaderboard/arabic | Closed datasets, evaluated manually by human experts |
Embeddings
Vision / OCR
Speech
Tokenizers
Benchmarking datasets
Below is a non-comprehensive list of benchmarking datasets; it will grow over time.
Note: There are numerous research datasets available for benchmarking purposes, but this list focuses on the most popular ones and those commonly used in research papers to evaluate Arabic models.
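Many of the general-purpose datasets listed here (MMLU Arabic in particular) are multiple-choice benchmarks, which are typically scored by simple accuracy: the fraction of questions where the model picks the gold choice. The sketch below illustrates that scoring scheme with toy placeholder items and a stand-in `predict` function; none of the content is drawn from a real dataset.

```python
# Minimal sketch of how MMLU-style multiple-choice benchmarks are scored.
# Each item has a question, a list of choices, and a gold answer index;
# accuracy is the fraction of items the model answers correctly.
# The items and predict function below are illustrative placeholders.

def accuracy(items, predict):
    """Fraction of items where the predicted choice index matches the gold one."""
    correct = sum(1 for item in items if predict(item) == item["answer"])
    return correct / len(items)

# Toy items standing in for benchmark rows (hypothetical content).
items = [
    {"question": "2 + 2 = ?", "choices": ["3", "4", "5", "6"], "answer": 1},
    {"question": "Capital of Egypt?", "choices": ["Cairo", "Rabat", "Amman", "Tunis"], "answer": 0},
]

# A stand-in "model" that always picks the first choice.
def predict_first_choice(item):
    return 0

print(accuracy(items, predict_first_choice))  # one of two correct -> 0.5
```

Real harnesses differ mainly in how `predict` is computed (e.g. comparing per-choice log-likelihoods rather than generating a letter), but the final metric reported on most of these leaderboards reduces to this accuracy calculation.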
General purpose
RAG
OCR
| Name | What does it evaluate? | Link | Comments |
| --- | --- | --- | --- |
| KITAB-Bench | Handwritten text, structured tables, and specialized coverage of 21 chart types for business intelligence | https://huggingface.co./collections/ahmedheakl/kitab-bench-677dd5d88d5db344d5595b78 | |
MMLU Arabic
Is a benchmark missing?
If you believe that a benchmark or leaderboard is not included in the list, please leave a comment below so we can consider adding it.