Update README.md
README.md
CHANGED
@@ -162,7 +162,7 @@ While Phi-4-Hindi is a powerful bilingual model designed for Hindi and English,
|
|
162 |
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
163 |
|
164 |
<!-- The model is trained on publicly available data which was in part curated by Inception. -->
|
165 |
-
|
166 |
|
167 |
The model is trained as an AI assistant for Hindi and English speakers. The model is limited to producing responses for queries in these two languages
|
168 |
and may not produce appropriate responses to queries in other languages.
|
@@ -175,17 +175,16 @@ We are continuously working to develop models with greater capabilities, and as
|
|
175 |
We evaluated our models on multiple well-known benchmarks to measure their effectiveness against other leading models, and the results are as follows:
|
176 |
|
177 |
|
178 |
-
| Model | ARC-C | ARC-E | BoolQ | CMCQ | MMLU | Average* | MMLU-Pro | GPQA | MuSR | BBH | MATH |
|
179 |
|---------------------------------|-------|-------|-------|-------|-------|----------|----------|------|-------|-------|-------|
|
180 |
| AryaBhatta-GemmaUltra-8.5B | 22.70 | 25.04 | 22.95 | 62.23 | 23.70 | 31.32 | 22.66 | 25.34| 42.72 | 41.12 | 2.95 |
|
181 |
| Airavata-7B | 25.09 | 30.47 | 25.31 | 62.17 | 33.20 | 35.25 | 16.35 | 27.43| 37.57 | 36.00 | 13.60 |
|
182 |
| sarvam-1-2B | 30.03 | 33.25 | 62.17 | 42.80 | 27.90 | 39.23 | - | - | - | - | - |
|
183 |
| Nemotron-4-Mini-Hindi-Instruct | 55.80 | 71.63 | 62.11 | 68.10 | 43.20 | 60.17 | 25.95 | 30.87| 41.53 | 40.11 | 2.04 |
|
184 |
-
| Llama-3-Nanda-10B-Chat | 65.36 | 80.64 | 82.29 | 67.60 | 50.61 | 69.30 |
|
185 |
| Krutrim-2-12b-instruct | 67.32 | 81.10 | 84.74 | 76.30 | 56.10 | 73.11 | - | - | - | - | - |
|
186 |
| aya-expanse-8b | 74.06 | 87.08 | 86.45 | 83.30 | 56.89 | 77.56 | 30.04 | 30.29| 37.17 | 49.42 | 7.02 |
|
187 |
| aya-expanse-32B | 85.41 | **95.08** | **90.43** | 89.80 | 69.71 | 86.08 | 41.30 | 32.55| 38.62 | 56.29 | 13.37 |
|
188 |
-
|---------------------------------|-------|-------|-------|-------|-------|----------|----------|------|-------|-------|-------|
|
189 |
| **Our Qwen Model (14b)** | 90.61 | 94.82 | 88.53 | **90.70** | 75.00 | 87.93 | **52.63** | 36.24 | 44.84 | 64.97 | **25.08** |
|
190 |
| **Our Phi Model (14b)** | **97.39** | 92.24 | 87.65 | 87.40 | **75.59** | **88.05** | 52.39 | **39.77** | **49.07** | **66.97** | 23.11 |
|
191 |
|
@@ -203,8 +202,7 @@ We evaluated our models on multiple well-known benchmarks to measure their effec
|
|
203 |
| Krutrim-2-12b-instruct | 56.83 | 70.66 | 78.86 | 64.10 | 46.51 | 63.39 |
|
204 |
| aya-expanse-8b | 57.42 | 72.90 | 80.42 | 69.00 | 43.39 | 64.63 |
|
205 |
| aya-expanse-32B | 73.29 | 85.48 | **87.73** | **79.70** | **56.96** | 76.63 |
|
206 |
-
|
207 |
-
| **Our Qwen Model (14b)** | 74.06 | 81.23 | 84.07 | 78.20 | 53.85 | **74.82** |
|
208 |
| **Our Phi Model (14b)** | **81.74** | **89.06** | 86.02 | 78.70 | 56.39 | **78.38** |
|
209 |
|
210 |
**Table 2: Metrics (.2f) of our models and other LLMs over several Hindi benchmarks**
|
@@ -248,3 +246,8 @@ It is advisable for users to:
|
|
248 |
- Continuously assess the model to ensure compliance with ethical standards.
|
249 |
- Be mindful of potential biases and unintended outputs, especially in critical applications.
|
250 |
|
162 |
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
163 |
|
164 |
<!-- The model is trained on publicly available data which was in part curated by Inception. -->
|
165 |
+
While efforts have been made to minimize biases, the model, like all LLMs, is likely to exhibit some bias.
|
166 |
|
167 |
The model is trained as an AI assistant for Hindi and English speakers. The model is limited to producing responses for queries in these two languages
|
168 |
and may not produce appropriate responses to queries in other languages.
|
|
|
175 |
We evaluated our models on multiple well-known benchmarks to measure their effectiveness against other leading models, and the results are as follows:
|
176 |
|
177 |
|
178 |
+
| Model | ARC-C | ARC-E | BoolQ | CMCQ | MMLU | Average* | MMLU-Pro | GPQA | MuSR | BBH | MATH-Hard |
|
179 |
|---------------------------------|-------|-------|-------|-------|-------|----------|----------|------|-------|-------|-------|
|
180 |
| AryaBhatta-GemmaUltra-8.5B | 22.70 | 25.04 | 22.95 | 62.23 | 23.70 | 31.32 | 22.66 | 25.34| 42.72 | 41.12 | 2.95 |
|
181 |
| Airavata-7B | 25.09 | 30.47 | 25.31 | 62.17 | 33.20 | 35.25 | 16.35 | 27.43| 37.57 | 36.00 | 13.60 |
|
182 |
| sarvam-1-2B | 30.03 | 33.25 | 62.17 | 42.80 | 27.90 | 39.23 | - | - | - | - | - |
|
183 |
| Nemotron-4-Mini-Hindi-Instruct | 55.80 | 71.63 | 62.11 | 68.10 | 43.20 | 60.17 | 25.95 | 30.87| 41.53 | 40.11 | 2.04 |
|
184 |
+
| Llama-3-Nanda-10B-Chat | 65.36 | 80.64 | 82.29 | 67.60 | 50.61 | 69.30 | 31.57 | 30.12| 43.52 | 49.38 | 5.59 |
|
185 |
| Krutrim-2-12b-instruct | 67.32 | 81.10 | 84.74 | 76.30 | 56.10 | 73.11 | - | - | - | - | - |
|
186 |
| aya-expanse-8b | 74.06 | 87.08 | 86.45 | 83.30 | 56.89 | 77.56 | 30.04 | 30.29| 37.17 | 49.42 | 7.02 |
|
187 |
| aya-expanse-32B | 85.41 | **95.08** | **90.43** | 89.80 | 69.71 | 86.08 | 41.30 | 32.55| 38.62 | 56.29 | 13.37 |
|
|
|
188 |
| **Our Qwen Model (14b)** | 90.61 | 94.82 | 88.53 | **90.70** | 75.00 | 87.93 | **52.63** | 36.24 | 44.84 | 64.97 | **25.08** |
|
189 |
| **Our Phi Model (14b)** | **97.39** | 92.24 | 87.65 | 87.40 | **75.59** | **88.05** | 52.39 | **39.77** | **49.07** | **66.97** | 23.11 |
|
190 |
|
|
|
202 |
| Krutrim-2-12b-instruct | 56.83 | 70.66 | 78.86 | 64.10 | 46.51 | 63.39 |
|
203 |
| aya-expanse-8b | 57.42 | 72.90 | 80.42 | 69.00 | 43.39 | 64.63 |
|
204 |
| aya-expanse-32B | 73.29 | 85.48 | **87.73** | **79.70** | **56.96** | 76.63 |
|
205 |
+
| **Our Qwen Model (14b)** | 74.06 | 81.23 | 84.07 | 78.20 | 53.85 | 74.82 |
|
|
|
206 |
| **Our Phi Model (14b)** | **81.74** | **89.06** | 86.02 | 78.70 | 56.39 | **78.38** |
|
207 |
|
208 |
**Table 2: Metrics (.2f) of our models and other LLMs over several Hindi benchmarks**
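The last column in the Hindi rows shown in this hunk looks like an unweighted mean of the five scores before it, although the footnote that defines the average falls outside this excerpt. As a rough sanity check under that assumption, the sketch below recomputes the mean for each visible row and lists any row whose reported value differs. The names `ROWS`, `recomputed_average`, and `find_mismatches` are illustrative and do not come from the README.

```python
from statistics import mean

# Rows copied from the hunk above: five per-benchmark scores, then the
# reported average. ASSUMPTION: the average is a plain mean of those five.
ROWS = {
    "Krutrim-2-12b-instruct": ([56.83, 70.66, 78.86, 64.10, 46.51], 63.39),
    "aya-expanse-8b": ([57.42, 72.90, 80.42, 69.00, 43.39], 64.63),
    "aya-expanse-32B": ([73.29, 85.48, 87.73, 79.70, 56.96], 76.63),
    "Our Qwen Model (14b)": ([74.06, 81.23, 84.07, 78.20, 53.85], 74.82),
    "Our Phi Model (14b)": ([81.74, 89.06, 86.02, 78.70, 56.39], 78.38),
}


def recomputed_average(scores):
    """Return the unweighted mean of the scores, rounded to two decimals."""
    return round(mean(scores), 2)


def find_mismatches(rows, tolerance=0.01):
    """Map model name -> (recomputed, reported) where the two disagree."""
    mismatches = {}
    for model, (scores, reported) in rows.items():
        recomputed = recomputed_average(scores)
        if abs(recomputed - reported) > tolerance:
            mismatches[model] = (recomputed, reported)
    return mismatches


if __name__ == "__main__":
    for model, (recomputed, reported) in find_mismatches(ROWS).items():
        print(f"{model}: recomputed {recomputed:.2f}, reported {reported:.2f}")
```

Under this assumption, four of the five rows reproduce their reported averages. The Qwen row recomputes to 74.28 against a reported 74.82, which may be a transposed digit in the README, or a sign that the average is defined differently than assumed here.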
|
|
|
246 |
- Continuously assess the model to ensure compliance with ethical standards.
|
247 |
- Be mindful of potential biases and unintended outputs, especially in critical applications.
|
248 |
|
249 |
+
### Team
|
250 |
+
|
251 |
+
- Ram Mohan Rao Kadiyala (@1024m)
|
252 |
+
- Siddartha Pullakhandam
|
253 |
+
-
|