AdithyaSK committed on
Commit
cc8b0e5
•
1 Parent(s): 76b14d7

final leaderboard update - Adithya S K

Files changed (1)
  1. app.py +77 -8
app.py CHANGED
@@ -56,7 +56,7 @@ def main():
56
  if st.button("Refresh", type="primary"):
57
  data = get_data()
58
 
59
- Leaderboard_tab, About_tab ,FAQ_tab, Submit_tab = st.tabs(["🏅 Leaderboard", "📝 About" , "❗FAQ","🚀 Submit"])
60
 
61
  with Leaderboard_tab:
62
  data = get_data()
@@ -135,7 +135,7 @@ def main():
135
  with col2:
136
  language_options = st.multiselect(
137
  'Pick Languages',
138
- ['kannada', 'hindi', 'tamil', 'telegu','gujarathi','marathi','malayalam',"english"],['kannada', 'hindi', 'tamil', 'telegu','gujarathi','marathi','malayalam',"english"])
139
  if on:
140
  # Loop through each selected language
141
  for language in language_options:
@@ -217,7 +217,82 @@ def main():
217
  compare_df.index += 1
218
  st.dataframe(compare_df, use_container_width=True)
219
 
220
 
221
 
222
  # About tab
223
  with About_tab:
@@ -243,12 +318,6 @@ After releasing [Amabri, a 7b parameter English-Kannada bilingual LLM](https://w
243
  - [Indic-Eval](https://github.com/adithya-s-k/indic_eval): A lightweight evaluation suite tailored specifically for assessing Indic LLMs across a diverse range of tasks, aiding in performance assessment and comparison within the Indian language context.
244
  - [Indic LLM Leaderboard](https://huggingface.co/spaces/Cognitive-Lab/indic_llm_leaderboard): Utilizes the [indic_eval](https://github.com/adithya-s-k/indic_eval) evaluation framework, incorporating state-of-the-art translated benchmarks like ARC, Hellaswag, MMLU, among others. Supporting seven Indic languages, it offers a comprehensive platform for assessing model performance and comparing results within the Indic language modeling landscape.
245
 
246
- ## **Upcoming implementations**
247
-
248
- - [ ] Support to add VLLM for faster evaluation and inference
249
- - [ ] SkyPilot installation to quickly run indic_eval on any cloud provider
250
- - [ ] Add support for onboard evaluation just like OpenLLM Leaderboard
251
-
252
  **Contribute**
253
 
254
  All the projects are completely open source with different licenses, so anyone can contribute.
 
56
  if st.button("Refresh", type="primary"):
57
  data = get_data()
58
 
59
+ Leaderboard_tab, Release_tab, About_tab ,FAQ_tab, Submit_tab = st.tabs(["🏅 Leaderboard", "(α) Release" ,"📝 About" , "❗FAQ","🚀 Submit"])
60
 
61
  with Leaderboard_tab:
62
  data = get_data()
 
135
  with col2:
136
  language_options = st.multiselect(
137
  'Pick Languages',
138
+ ['kannada', 'hindi', 'tamil', 'telegu','gujarati','marathi','malayalam',"english"],['kannada', 'hindi', 'tamil', 'telegu','gujarati','marathi','malayalam',"english"])
139
  if on:
140
  # Loop through each selected language
141
  for language in language_options:
 
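Reviewer note: the `st.multiselect` call in the hunk above passes the same eight-language list twice, once as the options and once as the default selection, which is why the `gujarathi` to `gujarati` spelling fix had to be made in two places. A minimal sketch of defining the list once follows. The names `LANGUAGES` and `multiselect_args` are hypothetical and do not appear in `app.py`; the language strings are copied as they appear in the diff.

```python
# Hypothetical constant (not in app.py): single source of truth for language keys.
# Spellings are copied verbatim from the diff, since they may match dataset keys.
LANGUAGES = [
    "kannada", "hindi", "tamil", "telegu",
    "gujarati", "marathi", "malayalam", "english",
]


def multiselect_args(languages=LANGUAGES):
    """Return (options, default) built from one list, as two independent copies."""
    return list(languages), list(languages)


options, default = multiselect_args()

# In app.py the call could then read (st.multiselect takes label, options, default):
#   language_options = st.multiselect('Pick Languages', options, default)
```

With this shape, a future spelling change or an added language touches one line rather than two.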
217
  compare_df.index += 1
218
  st.dataframe(compare_df, use_container_width=True)
219
 
220
+ with Release_tab:
221
+ st.markdown(
222
+ """
223
+ **Date: April 5th, 2024**
224
 
225
+ This is the alpha release of the **Indic LLM Leaderboard** and **Indic Eval**.
226
+
227
+ The Indic LLM Leaderboard is an evolving platform that aims to streamline evaluation of large language models (LLMs) tailored to Indic languages. While this **alpha release is far from perfect**, it is a crucial first step towards establishing evaluation standards within the community.
228
+
229
+ ### Features:
230
+
231
+ As of this release, the following base models have been evaluated using the datasets and benchmarks integrated into the platform:
232
+
233
+ - `meta-llama/Llama-2-7b-hf`
234
+ - `google/gemma-7b`
235
+
236
+ Tasks incorporated into the platform:
237
+
238
+ - `ARC-Easy:{language}`
239
+ - `ARC-Challenge:{language}`
240
+ - `Hellaswag:{language}`
241
+
242
+ For evaluation purposes, each task uses 5-shot prompting. Further experimentation will determine the best balance between evaluation time and accuracy.
243
+
244
+ ### Datasets:
245
+
246
+ Datasets utilized for evaluation are accessible via the following link: [Indic LLM Leaderboard Eval Suite](https://huggingface.co/collections/Cognitive-Lab/indic-llm-leaderboard-eval-suite-660ac4818695a785edee4e6f)
247
+
248
+ ### Rationale for Alpha Release:
249
+
250
+ The decision to label this release as alpha stems from the realization that extensive testing and experimentation are necessary. Key considerations include:
251
+
252
+ - Selection of appropriate metrics for evaluation
253
+ - Determination of the optimal few-shot learning parameters
254
+ - Establishment of the ideal number of evaluation samples within the dataset
255
+
256
+ ### Collaborative Effort:
257
+
258
+ To foster collaboration and discussion surrounding evaluations, a [WhatsApp group](https://chat.whatsapp.com/CUb6eS50lX2JHX2D4j13d1) is being established.
259
+
260
+ You can also connect with us on the Hugging Face Discord in the [indic_llm channel](https://discord.com/channels/879548962464493619/1189605147068858408).
261
+
262
+ ### Roadmap for Next Release:
263
+
264
+ Anticipate the following enhancements in the upcoming release:
265
+
266
+ - Enhanced testing and accountability mechanisms
267
+ - A refined version of the leaderboard
268
+ - Defined benchmarks and standardized datasets
269
+ - Bilingual evaluation support
270
+ - Expansion of supported models
271
+ - Implementation of more secure interaction mechanisms
272
+ - Addition of support for additional languages
273
+
274
+ ### Benchmarks to be added/tested
275
+
276
+ - [ ] Boolq
277
+ - [ ] MMLU
278
+ - [ ] Translation - [IN22-Gen](https://huggingface.co/datasets/ai4bharat/IN22-Gen), [Flores](https://huggingface.co/datasets/facebook/flores)
279
+ - [ ] Generation - [ai4bharat/IndicSentiment](https://huggingface.co/datasets/ai4bharat/IndicSentiment), etc.
280
+
281
+ ### Upcoming Implementations
282
+
283
+ - [ ] Support to add VLLM for faster evaluation and inference
284
+ - [ ] Add support for onboard evaluation just like OpenLLM Leaderboard
285
+
286
+ ### Conclusion:
287
+
288
+ The alpha release of the Indic LLM Leaderboard and Indic Eval marks a significant milestone in the pursuit of standardized evaluations for Indic language models. We invite contributions and feedback from the community to further enhance and refine these tools.
289
+
290
+ For more information and updates, visit [Indic LLM Leaderboard](https://huggingface.co/spaces/Cognitive-Lab/indic_llm_leaderboard) and [Indic Eval](https://github.com/adithya-s-k/indic_eval).
291
+
292
+ Thank you for your interest and support.
293
+
294
+ """
295
+ )
296
 
297
  # About tab
298
  with About_tab:
 
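Reviewer note: the Release tab text above says each task is evaluated with 5-shot prompting but does not show what such a prompt looks like. The sketch below illustrates the general idea for a multiple-choice task such as ARC: up to `k` solved examples are placed before the unanswered target question. The function names, the `Question:`/`Answer:` template, and the tuple layout are all assumptions made for illustration; this is not the prompt format used by `indic_eval`, which the diff does not show.

```python
from typing import Optional, Sequence, Tuple

LETTERS = "ABCDEFGH"  # choice labels; assumed layout, not taken from indic_eval


def format_item(question: str, choices: Sequence[str],
                answer_index: Optional[int] = None) -> str:
    """Render one multiple-choice item; leave the answer blank if it is unknown."""
    lines = [f"Question: {question}"]
    lines += [f"{LETTERS[i]}. {choice}" for i, choice in enumerate(choices)]
    answer = "" if answer_index is None else f" {LETTERS[answer_index]}"
    lines.append(f"Answer:{answer}")
    return "\n".join(lines)


def build_few_shot_prompt(shots: Sequence[Tuple[str, Sequence[str], int]],
                          question: str, choices: Sequence[str],
                          k: int = 5) -> str:
    """Join the first k solved shots with the unanswered target question."""
    demos = [format_item(q, c, a) for q, c, a in shots[:k]]
    return "\n\n".join(demos + [format_item(question, choices)])


# Hypothetical usage with placeholder data (six shots supplied, five used).
shots = [(f"Sample {i}?", ["yes", "no"], i % 2) for i in range(6)]
prompt = build_few_shot_prompt(shots, "Target item?", ["yes", "no"], k=5)
```

Raising `k` lengthens every prompt and therefore the evaluation time, which is the trade-off the release notes say is still being tuned.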
318
  - [Indic-Eval](https://github.com/adithya-s-k/indic_eval): A lightweight evaluation suite tailored specifically for assessing Indic LLMs across a diverse range of tasks, aiding in performance assessment and comparison within the Indian language context.
319
  - [Indic LLM Leaderboard](https://huggingface.co/spaces/Cognitive-Lab/indic_llm_leaderboard): Utilizes the [indic_eval](https://github.com/adithya-s-k/indic_eval) evaluation framework, incorporating state-of-the-art translated benchmarks like ARC, Hellaswag, MMLU, among others. Supporting seven Indic languages, it offers a comprehensive platform for assessing model performance and comparing results within the Indic language modeling landscape.
320
 
321
  **Contribute**
322
 
323
  All the projects are completely open source with different licenses, so anyone can contribute.