---
license: gemma
library_name: transformers
pipeline_tag: text-generation
extra_gated_button_content: Acknowledge license
tags:
- rag
- qa
language:
- ar
- en
model-index:
- name: SILMA-Kashif-2B-Instruct-v1.0
results:
- task:
type: text-generation
dataset:
name: SILMA RAGQA Benchmark Dataset V1.0
type: silma-ai/silma-rag-qa-benchmark-v1.0
metrics:
- name: SILMA RAGQA Benchmark Score
type: Average of Exact Match, BLEU, ROUGE, and BERTScore.
value: 0.347
source:
name: SILMA RAGQA Benchmark
url: https://huggingface.co./datasets/silma-ai/silma-rag-qa-benchmark-v1.0
- task:
type: text-generation
dataset:
name: OALL (All)
type: OALL/Open-Arabic-LLM-Leaderboard
metrics:
- name: acc_norm
type: loglikelihood_acc_norm
value: 44.61
source:
name: Open Arabic LLM Leaderboard
url: https://huggingface.co./spaces/OALL/Open-Arabic-LLM-Leaderboard
metrics:
- bleu
- bertscore
- rouge
- exact_match
---
## SILMA Kashif v1.0 (The Arabic RAG Model)
* **SILMA Kashif 2B Instruct v1.0** is the premier release within the SILMA Kashif Family of models, specifically designed for **RAG** (Retrieval-Augmented Generation) tasks
* Kashif excels at one specific task: answering questions grounded in context passages, in both Arabic and English. The model can also perform entity extraction as a secondary skill
* SILMA Kashif 2B v1.0 stands out as the top-performing open model in RAG within the 3-9 billion parameter range based on our evaluations using [SILMA RAGQA Benchmark](https://huggingface.co./datasets/silma-ai/silma-rag-qa-benchmark-v1.0)
* SILMA Kashif is built on Google's powerful Gemma foundation models, combining their strengths to deliver strong performance for users
* Kashif is an open-weight model, free to use in accordance with our open license
* Finally, the model supports a context length of 12k tokens
**Important note:** Kashif is a specialized model and should ONLY be used in RAG setups. If you are looking for a general-purpose model, please refer to [SILMA 9B Instruct v1.0](https://huggingface.co./silma-ai/SILMA-9B-Instruct-v1.0)
## Model Skills and Capabilities
The model underwent intensive training to master a wide range of RAG-related skills:
- The ability to answer questions in Arabic and English
- The ability to deal with short and long contexts
- The ability to provide short and long answers effectively
- The ability to answer complex numerical questions
- The ability to answer questions based on tabular data
- Answering multi-hop questions: The ability to answer a single question using pieces of data from multiple paragraphs
- Negative rejection: The ability to recognize when the answer is not present in the provided context and respond accordingly, e.g. "The answer cannot be found in the given context"
- Multi-domains: The ability to answer questions based on texts from different fields such as finance, medical, legal, etc.
- The ability to deal with ambiguous contexts
- The ability to extract entities from text
- Ability to deal with diverse and complex prompts
## Model Evaluation

|Dataset | exact_match | rouge1 | bleu | bertscore|
|---|---|---|---|---|
|ragbench-finqa-en-test | 0.000 | 0.587 | 0.321 | 0.760|
|ragbench-tatqa-ar-test | 0.000 | 0.484 | 0.130 | 0.774|
|ragbench-tatqa-en-test | 0.059 | 0.646 | 0.423 | 0.808|
|rag-instruct-benchmark-tester-en | 0.370 | 0.683 | 0.196 | 0.791|
|ragbench-expertqa-en-test |0.000 | 0.465 | 0.151 | 0.677|
|ragbench-msmarco-ar-test |0.000 | 0.144 | 0.096 | 0.781|
|sciq-ar-test |0.170 | 0.000 | 0.000 | 0.753|
|ragbench-covidqa-en-test |0.020 | 0.521 | 0.242 | 0.734|
|ragbench-emanual-ar-test |0.000 | 0.237 | 0.159 | 0.806|
|ragbench-finqa-ar-test |0.000 | 0.377 | 0.109 | 0.780|
|xquad-r-validation-en |0.120 | 0.326 | 0.041 | 0.603|
|ragbench-emanual-en-test |0.000 | 0.565 | 0.288 | 0.722|
|xquad-r-ar-validation |0.070 | 0.130 | 0.042 | 0.698|
|boolq-ar-test |0.450 | 0.000 | 0.000 | 0.700|
|ragbench-hotpotqa-en-test |0.060 | 0.732 | 0.503 | 0.837|
|ragbench-covidqa-ar-test |0.000 | 0.179 | 0.104 | 0.783|
|ragbench-msmarco-en-test |0.020 | 0.491 | 0.207 | 0.729|
|**Benchmark Average Scores** |0.079 | 0.386 | 0.177 | 0.749|
SILMA RAG QA Benchmark Score: 0.3478
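The headline score is the mean of the four per-metric averages from the table above, which can be quickly verified:

```python
# Per-metric averages taken from the "Benchmark Average Scores" table row
per_metric_avgs = {"exact_match": 0.079, "rouge1": 0.386, "bleu": 0.177, "bertscore": 0.749}

benchmark_score = sum(per_metric_avgs.values()) / len(per_metric_avgs)
print(f"{benchmark_score:.4f}")
```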
## SILMA AI
[silma.ai](https://silma.ai) is a leading GenAI startup that excels in building and tailoring cutting-edge Large Language Models (LLMs) and AI technologies for the Arabic language
### Usage
Below we share some code snippets on how to get quickly started with running the model. First, install the Transformers library with:
```sh
pip install -U transformers
```
Then, copy the snippet from the section below
#### Running with the `pipeline` API
```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="silma-ai/SILMA-Kashif-2B-Instruct-v1.0",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content":
"""
أجب على السؤال بناءً على السياق أدناه
السياق:
تشمل الاتفاقيات رسوم حمل سنوية ثابت قدها 30 مليون جنيه إسترليني للقنوات نظراً لأن كلاً من مزوديها قادرين على تأمين دفعات إضافية إذا ما حققت هذه القنوات أهدافاً متعلقةً بالأداء.
لا يوجد حالياً ما يشير إلى ما إذا كان الاتفاق الجديد يشمل محتوىً إضافياً كالفيديو عند الطلب والدقة العالية ، كذلك الذي سبق أن قدمته بي سكاي بي.
وقد وافقت كل من بي سكاي بي و فيرجين ميديا على إنهاء الدعاوى القضائية بالمحكمة العليا ضد بعضهما بشأن معاليم الحمل التي تخص قنواتهما الأساسية.
السؤال: ماسم الشركة التي وافقت على إنهاء دعواها القضائية ضد بي سكاي بي بالمحكمة العليا؟
الإجابة:
"""},
]

outputs = pipe(messages, max_new_tokens=600)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
```
- Response:
```text
فيرجين ميديا
"وقد وافقت كل من بي سكاي بي و فيرجين ميديا على إنهاء الدعاوى القضائية بالمحكمة العليا ضد بعضهما بشأن معاليم الحمل التي تخص قنواتهما الأساسية."
```
Note: for advanced usage examples such as multi-GPU inference, quantization, or chat templates, please refer to the [SILMA v1.0](https://huggingface.co./silma-ai/SILMA-9B-Instruct-v1.0#running-the-model-on-a-single--multi-gpu) examples
### Running with Ollama
```sh
ollama run hf.co/silma-ai/SILMA-Kashif-2B-Instruct-v1.0-GGUF
```
### Prompt Format
Here is a recommended way to prompt the model. You can adapt the prompt to your specific requirements, but if you encounter any challenges, following the format below, which we used to train the model, may help.
- Arabic
```text
أجب على السؤال بناءً على السياق أدناه
السياق:
.....
.....
السؤال: ...
الإجابة: ...
```
- English
```text
Answer the following question using the provided context below
Context:
.....
.....
Question: ...
Answer: ...
```
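As a convenience, the two templates above can be wrapped in a small helper. This is an illustrative sketch, not part of the model's API; the function name and structure are our own:

```python
def build_kashif_prompt(passages, question, lang="en"):
    """Assemble a RAG prompt in the training format shown above.
    `passages` is a list of context strings; `lang` is "ar" or "en"."""
    templates = {
        "ar": "أجب على السؤال بناءً على السياق أدناه\n\nالسياق:\n{ctx}\n\nالسؤال: {q}\nالإجابة:",
        "en": "Answer the following question using the provided context below\n\nContext:\n{ctx}\n\nQuestion: {q}\nAnswer:",
    }
    return templates[lang].format(ctx="\n\n".join(passages), q=question)

prompt = build_kashif_prompt(
    ["BSkyB and Virgin Media agreed to end their High Court litigation."],
    "Which company agreed to end its lawsuit against BSkyB?",
)
print(prompt)
```

The resulting string can be passed as the `content` of the user message in the `pipeline` example above.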
### GPU Requirements
The following are the minimum/recommended GPU requirements for running inference:
* Recommended
* At least one GPU with a minimum of 24 GB of GPU memory
* Examples: Nvidia RTX 4090
* Minimum
* At least one GPU with 8 GB of GPU memory
* Examples: Nvidia RTX 3070, RTX 3080 or T4
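These figures line up with a back-of-the-envelope estimate of the weight memory alone (a rough sketch; real usage is higher because of the KV cache for the 12k context, activations, and framework overhead):

```python
# Rough weight-memory estimate for a ~2B-parameter model.
# Excludes KV cache and activations, so actual VRAM usage is higher.
PARAMS = 2e9
bytes_per_param = {"bfloat16": 2.0, "int8": 1.0, "int4": 0.5}

for dtype, nbytes in bytes_per_param.items():
    gib = PARAMS * nbytes / 1024**3
    print(f"{dtype}: ~{gib:.1f} GiB of weights")
```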
## Effect of Quantization
We observed a 2.6% relative drop in score (to 0.338) when the same model is quantized to 4-bit
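The relative drop follows directly from the two scores (using the 0.347 benchmark score reported above):

```python
# Relative score drop from 4-bit quantization
full_score, int4_score = 0.347, 0.338
relative_drop_pct = (full_score - int4_score) / full_score * 100
print(f"{relative_drop_pct:.1f}%")  # 2.6%
```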
### Citation
```none
@article{silma_01_2025,
title={SILMA Kashif 2B Instruct v1.0},
url={https://huggingface.co./silma-ai/SILMA-Kashif-2B-Instruct-v1.0},
publisher={SILMA AI},
author={Silma Team},
year={2025}
}
```
### Intended Usage
* The model should only be used in question answering use-cases such as RAG
* The model can also be used to extract entities from text
### Limitations
* Because of its small parameter count, the model is not very effective at complex numerical and financial reasoning, such as multi-step calculations
* The model has been trained specifically for text-based question answering, which may limit its ability to perform other tasks, including simple ones