updated content on different tabs - Adithya S K
app.py
CHANGED
@@ -153,34 +153,150 @@ def main():
153 |     # About tab
154 |     with About_tab:
155 |         st.markdown('''
156 | -
157 | -
158 | -
159 | -
160 | -
161 |         ''')
162 |
163 |     # FAQ tab
164 |     with FAQ_tab:
165 |         st.markdown('''
166 | -
167 | -
168 | -
169 | -
170 | -
171 | -
172 | -
173 | -
174 | -
175 | -
176 | -
177 | -
178 |         ''')
179 |
180 |     # Submit tab
181 |     with Submit_tab:
182 |         st.markdown('''
183 | -
184 |         ''')
185 |
186 |
153 |     # About tab
154 |     with About_tab:
155 |         st.markdown('''
156 | + ## **Why an Indic LLM Leaderboard Is Required**
157 | +
158 | + In recent months, there has been considerable progress in the Indic large language model (LLM) space. Major startups like Sarvam and Krutrim are building LLMs in this area.
159 | + Simultaneously, the open-source community is adapting pretrained models, such as Llama, Mistral, and Gemma, for Indic languages.
160 | + Despite the influx of new models, there is no unified way to evaluate and compare them, which makes it hard to track progress and determine what is working and what is not.
161 | +
162 | + > This is the alpha release of the Indic LLM Leaderboard; modifications will be made to the leaderboard in the future.
163 | + >
164 | +
165 | + ## **Who We Are**
166 | +
167 | + I'm [Adithya S K](https://linktr.ee/adithyaskolavi), the founder of [CognitiveLab](https://www.cognitivelab.in/). We provide AI solutions at scale and undertake research-based work.
168 | +
169 | + One initiative we have taken is to create a unified platform where Indic LLMs can be compared using specially crafted datasets. Although initially developed for internal use, we are now open-sourcing this framework to further aid the Indic LLM ecosystem.
170 | +
171 | + After releasing [Ambari, a 7B-parameter English-Kannada bilingual LLM](https://www.cognitivelab.in/blog/introducing-ambari), we wanted to compare it with other open-source LLMs to identify areas for improvement. As there wasn't an existing solution, we built the Indic LLM suite, which consists of three projects:
172 | +
173 | + - [Indic-llm](https://github.com/adithya-s-k/Indic-llm): An open-source framework designed to adapt pretrained LLMs, such as Llama, Mistral, and Mixtral, to a wide array of domains and languages.
174 | + - [Indic-Eval](https://github.com/adithya-s-k/indic_eval): A lightweight evaluation suite tailored specifically for assessing Indic LLMs across a diverse range of tasks, aiding performance assessment and comparison within the Indian language context.
175 | + - [Indic LLM Leaderboard](https://huggingface.co/spaces/Cognitive-Lab/indic_llm_leaderboard): Utilizes the [indic_eval](https://github.com/adithya-s-k/indic_eval) evaluation framework, incorporating state-of-the-art translated benchmarks like ARC, Hellaswag, and MMLU. Supporting seven Indic languages, it offers a comprehensive platform for assessing model performance and comparing results within the Indic language modeling landscape.
176 | +
177 | + ## **Upcoming Implementations**
178 | +
179 | + - [ ] Add vLLM support for faster evaluation and inference
180 | + - [ ] SkyPilot installation to quickly run indic_eval on any cloud provider
181 | + - [ ] Add support for on-board evaluation, just like the Open LLM Leaderboard
182 | +
183 | + **Contribute**
184 | +
185 | + All the projects are completely open source under different licenses, so anyone can contribute.
186 | +
187 | + The current leaderboard is an alpha release, and many more changes are forthcoming:
188 | +
189 | + - More robust benchmarks tailored for Indic languages.
190 | + - Easier integration with [indic_eval](https://github.com/adithya-s-k/indic_eval).
191 |         ''')
192 |
193 |     # FAQ tab
194 |     with FAQ_tab:
195 |         st.markdown('''
196 | + **What is the minimum GPU requirement to run the evaluation?**
197 | +
198 | + - The evaluation runs easily on a single A100 GPU, and the framework also supports multi-GPU evaluation to speed up the process.
199 | +
200 | + **What languages are supported by the evaluation framework?**
201 | +
202 | + - The following languages are supported by default: English, Kannada, Hindi, Tamil, Telugu, Gujarati, Marathi, and Malayalam.
203 | +
204 | + **How can I put my model on the leaderboard?**
205 | +
206 | + - Please follow the steps shown in the Submit tab, or refer to [indic_eval](https://github.com/adithya-s-k/indic_eval) for more details.
207 | +
208 | + **How does the leaderboard work?**
209 | +
210 | + - After running indic_eval on the model of your choice, the results are pushed to a server and stored in a database. The leaderboard frontend queries the server and retrieves the latest models in the database along with their respective benchmarks and metadata. The entire system is deployed in India and is kept as secure as possible.
211 | +
212 | + **How is it different from the Open LLM Leaderboard?**
213 | +
214 | + - This project was mainly inspired by the Open LLM Leaderboard. However, due to limited computational resources, we standardized the evaluation library and benchmarks: you run the evaluation on your own GPUs, and the leaderboard serves as a unified platform to compare models. We used IndicTrans2 and other translation APIs to translate the benchmark datasets into seven Indian languages, ensuring reliability and consistency in the output.
215 | +
216 | + **Why does it take so much time to load the results?**
217 | +
218 | + - The server runs on a serverless instance, which has a cold-start problem, so it can sometimes take a while.
219 | +
220 | + **What benchmarks are offered?**
221 | +
222 | + - The current Indic benchmarks offered by the indic_eval library can be found in [this collection](https://huggingface.co/collections/Cognitive-Lab/indic-llm-leaderboard-eval-suite-660ac4818695a785edee4e6f). They include ARC Easy, ARC Challenge, Hellaswag, BoolQ, and MMLU.
223 | +
224 | + **How much time does it take to run the evaluation using indic_eval?**
225 | +
226 | + - Evaluation time varies depending on the GPU you are using.
227 | + - From our testing, it takes 3 to 4 hours to run the whole evaluation on a single GPU.
228 | + - It's much faster when using multiple GPUs.
229 | +
230 | + **How does the verification step happen?**
231 | +
232 | + - While running the evaluation, you are given the option to push results to the leaderboard with `--push_to_leaderboard <[email protected]>`. You must provide an email address through which we can contact you; if we find any anomaly in the evaluation score, we will contact you through this email to verify the results.
233 |         ''')
234 |
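The push-and-retrieve flow described under "How does the leaderboard work?" can be sketched in a few lines. This is a hypothetical illustration only: the field names, the payload shape, and the mean-score ranking rule are assumptions, not the leaderboard's actual server schema.

```python
import json

# Hypothetical submissions as a client might push them after indic_eval runs.
# Field names are illustrative; the real server-side schema is not shown in this diff.
submissions = [
    {"model": "model-a", "scores": {"ARC-Easy": 0.61, "Hellaswag": 0.55}},
    {"model": "model-b", "scores": {"ARC-Easy": 0.70, "Hellaswag": 0.58}},
]

wire = json.dumps(submissions)   # serialized form pushed to and stored by the server
rows = json.loads(wire)          # what the frontend retrieves

# Rank models by mean benchmark score, best first.
ranked = sorted(
    rows,
    key=lambda r: sum(r["scores"].values()) / len(r["scores"]),
    reverse=True,
)
print([r["model"] for r in ranked])  # model-b ranks above model-a
```

The JSON round-trip stands in for the server's store-and-fetch step; any real deployment would add authentication and the verification email described in the last FAQ entry.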
235 |     # Submit tab
236 |     with Submit_tab:
237 |         st.markdown('''
238 | + Here are the steps to follow to put your model on the Indic LLM Leaderboard.
239 | +
240 | + Clone the repo:
241 | +
242 | + ```bash
243 | + git clone https://github.com/adithya-s-k/indic_eval
244 | + cd indic_eval
245 | +
246 | + ```
247 | +
248 | + Create a virtual environment using virtualenv or conda, depending on your preference. We require Python 3.10 or above:
249 | +
250 | + ```bash
251 | + conda create -n indic-eval-venv python=3.10 && conda activate indic-eval-venv
252 | +
253 | + ```
254 | +
255 | + Install the dependencies. For the default installation, you just need:
256 | +
257 | + ```bash
258 | + pip install .
259 | +
260 | + ```
261 | +
262 | + If you want to evaluate models with frameworks like `accelerate` or `peft`, you will need to specify the optional dependency group that fits your use case (`accelerate`, `tgi`, `optimum`, `quantization`, `adapters`, `nanotron`):
263 | +
264 | + ```bash
265 | + pip install '.[optional1,optional2]'
266 | +
267 | + ```
268 | +
269 | + The most-tested setup is:
270 | +
271 | + ```bash
272 | + pip install '.[accelerate,quantization,adapters]'
273 | +
274 | + ```
275 | +
276 | + If you want to push your results to the Hugging Face Hub, don't forget to add your access token to the environment variable `HUGGING_FACE_HUB_TOKEN`. You can do this by running:
277 | +
278 | + ```bash
279 | + huggingface-cli login
280 | + ```
281 | +
282 | + ## Command to Run Indic Eval and Push to the Indic LLM Leaderboard
283 | +
284 | + ```bash
285 | + accelerate launch run_indic_evals_accelerate.py \\
286 | +     --model_args="pretrained=<path to model on the hub>" \\
287 | +     --tasks indic_llm_leaderboard \\
288 | +     --output_dir output_dir \\
289 | +     --push_to_leaderboard <[email protected]>
290 | +
291 | + ```
292 | +
293 | + It's as simple as that. 👍
294 | +
295 | + For `--push_to_leaderboard`, provide an email address through which we can contact you in case verification is needed. This email won't be shared anywhere; it's only required for future verification of the model's scores and for authenticity.
296 | +
297 | + Once the required packages are installed, the single command above is all you need to run the evaluation and submit your results.
298 | +
299 | + For multi-GPU configurations, please refer to the docs of [Indic_Eval](https://github.com/adithya-s-k/indic_eval).
300 |         ''')
301 |
302 |
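For scripting, the submission command shown in the Submit tab can be assembled programmatically. This is a minimal sketch: `build_eval_command` is a hypothetical helper (not part of indic_eval), and the model ID and email are placeholder values; only the flag names mirror the command in the diff above.

```python
# Hypothetical helper that assembles the accelerate command from the Submit tab
# as an argument list suitable for subprocess.run (no shell quoting needed).
def build_eval_command(model_id, contact_email, output_dir="output_dir"):
    return [
        "accelerate", "launch", "run_indic_evals_accelerate.py",
        f"--model_args=pretrained={model_id}",
        "--tasks", "indic_llm_leaderboard",
        "--output_dir", output_dir,
        "--push_to_leaderboard", contact_email,
    ]

# Placeholder model ID and contact email for illustration.
cmd = build_eval_command("my-org/my-indic-model", "[email protected]")
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd)` to launch the evaluation from a larger pipeline.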