1024m committed on
Commit 098eb06 · verified · 1 parent: b8c6e4e

Update README.md

Files changed (1): README.md (+9 −6)
README.md CHANGED
@@ -162,7 +162,7 @@ While Phi-4-Hindi is a powerful bilingual model designed for Hindi and English,
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->

  <!-- The model is trained on publicly available data which was in part curated by Inception. -->
- ~~While efforts have been made to minimize biases, it is likely that the model, as with all LLM models, will exhibit some bias.
+ While efforts have been made to minimize biases, it is likely that the model, as with all LLM models, will exhibit some bias.

  The model is trained as an AI assistant for Hindi and English speakers. The model is limited to produce responses for queries in these two languages
  and may not produce appropriate responses to other language queries.
@@ -175,17 +175,16 @@ We are continuously working to develop models with greater capabilities, and as
  We evaluated our models on multiple well-known benchmarks to measure their effectiveness against other leading models, and the results are as follows:


- | Model | ARC-C | ARC-E | BoolQ | CMCQ | MMLU | Average* | MMLU-Pro | GPQA | MuSR | BBH | MATH |
+ | Model | ARC-C | ARC-E | BoolQ | CMCQ | MMLU | Average* | MMLU-Pro | GPQA | MuSR | BBH | MATH-Hard |
  |---------------------------------|-------|-------|-------|-------|-------|----------|----------|------|-------|-------|-------|
  | AryaBhatta-GemmaUltra-8.5B | 22.70 | 25.04 | 22.95 | 62.23 | 23.70 | 31.32 | 22.66 | 25.34| 42.72 | 41.12 | 2.95 |
  | Airavata-7B | 25.09 | 30.47 | 25.31 | 62.17 | 33.20 | 35.25 | 16.35 | 27.43| 37.57 | 36.00 | 13.60 |
  | sarvam-1-2B | 30.03 | 33.25 | 62.17 | 42.80 | 27.90 | 39.23 | - | - | - | - | - |
  | Nemotron-4-Mini-Hindi-Instruct | 55.80 | 71.63 | 62.11 | 68.10 | 43.20 | 60.17 | 25.95 | 30.87| 41.53 | 40.11 | 2.04 |
- | Llama-3-Nanda-10B-Chat | 65.36 | 80.64 | 82.29 | 67.60 | 50.61 | 69.30 | - | - | - | - | - |
+ | Llama-3-Nanda-10B-Chat | 65.36 | 80.64 | 82.29 | 67.60 | 50.61 | 69.30 | 31.57 | 30.12| 43.52 | 49.38 | 5.59 |
  | Krutrim-2-12b-instruct | 67.32 | 81.10 | 84.74 | 76.30 | 56.10 | 73.11 | - | - | - | - | - |
  | aya-expanse-8b | 74.06 | 87.08 | 86.45 | 83.30 | 56.89 | 77.56 | 30.04 | 30.29| 37.17 | 49.42 | 7.02 |
  | aya-expanse-32B | 85.41 | **95.08** | **90.43** | 89.80 | 69.71 | 86.08 | 41.30 | 32.55| 38.62 | 56.29 | 13.37 |
- |---------------------------------|-------|-------|-------|-------|-------|----------|----------|------|-------|-------|-------|
  | **Our Qwen Model (14b)** | 90.61 | 94.82 | 88.53 | **90.70** | 75.00 | 87.93 | **52.63** | 36.24 | 44.84 | 64.97 | **25.08** |
  | **Our Phi Model (14b)** | **97.39** | 92.24 | 87.65 | 87.40 | **75.59** | **88.05** | 52.39 | **39.77** | **49.07** | **66.97** | 23.11 |

@@ -203,8 +202,7 @@ We evaluated our models on multiple well-known benchmarks to measure their effec
  | Krutrim-2-12b-instruct | 56.83 | 70.66 | 78.86 | 64.10 | 46.51 | 63.39 |
  | aya-expanse-8b | 57.42 | 72.90 | 80.42 | 69.00 | 43.39 | 64.63 |
  | aya-expanse-32B | 73.29 | 85.48 | **87.73** | **79.70** | **56.96** | 76.63 |
- |------------------------------------|-------|-------|-------|-------|-------|---------|
- | **Our Qwen Model (14b)** | 74.06 | 81.23 | 84.07 | 78.20 | 53.85 | **74.82** |
+ | **Our Qwen Model (14b)** | 74.06 | 81.23 | 84.07 | 78.20 | 53.85 | 74.82 |
  | **Our Phi Model (14b)** | **81.74** | **89.06** | 86.02 | 78.70 | 56.39 | **78.38** |

  **Table 2: Metrics (.2f) of our models and other LLMs over several Hindi benchmarks**
@@ -248,3 +246,8 @@ It is advisable for users to:
  - Continuously assess the model to ensure compliance with ethical standards.
  - Be mindful of potential biases and unintended outputs, especially in critical applications.

+ ### Team
+
+ - Ram Mohan Rao Kadiyala (@1024m)
+ - Siddartha Pullakhandam
+ -