leaderboard-pt-pr-bot committed on
Commit
c85a680
1 Parent(s): 549f0be

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co./spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co./spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions
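
For readers unfamiliar with the metadata this PR adds: each leaderboard score is stored as one entry under `model-index` in the README front matter. The sketch below builds one such entry as a plain Python dictionary, mirroring the fields visible in the diff. The helper name `make_result` and its parameters are hypothetical, chosen for illustration only; this is not the bot's actual code.

```python
import json

LEADERBOARD_URL = "https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard"


def make_result(dataset_name, dataset_type, split, num_few_shot,
                metric_type, metric_name, value, model_id):
    """Build one model-index result entry (hypothetical helper, not the bot's code)."""
    return {
        "task": {"type": "text-generation", "name": "Text Generation"},
        "dataset": {
            "name": dataset_name,
            "type": dataset_type,
            "split": split,
            "args": {"num_few_shot": num_few_shot},
        },
        "metrics": [{"type": metric_type, "value": value, "name": metric_name}],
        "source": {
            "url": f"{LEADERBOARD_URL}?query={model_id}",
            "name": "Open Portuguese LLM Leaderboard",
        },
    }


MODEL_ID = "rhaymison/Mistral-portuguese-luana-7b-mental-health"

# Values copied from the ENEM Challenge entry in the diff.
enem = make_result(
    "ENEM Challenge (No Images)", "eduagarcia/enem_challenge", "train", 3,
    "acc", "accuracy", 60.53, MODEL_ID,
)

# The front matter holds a list with one item per model, each with its results.
model_index = [{"name": MODEL_ID.split("/")[1], "results": [enem]}]
print(json.dumps(model_index, indent=2))
```

The dictionary is printed as JSON only to keep the sketch within the standard library; in the README itself the same structure is written as YAML.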

Files changed (1)
  1. README.md +172 -6
README.md CHANGED
@@ -1,15 +1,162 @@
  ---
- library_name: transformers
- license: apache-2.0
- datasets:
- - rhaymison/mental-health-qa
  language:
  - pt
- pipeline_tag: text-generation
- base_model: rhaymison/Mistral-portuguese-luana-7b
  tags:
  - health
  - portuguese
  ---
 
  # Mistral-portuguese-luana-7b-mental-health
@@ -131,3 +278,22 @@ email: [email protected]
  <a href="https://github.com/rhaymisonbetini" target="_blank">
  <img src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white">
  </a>

  ---
  language:
  - pt
+ license: apache-2.0
+ library_name: transformers
  tags:
  - health
  - portuguese
+ base_model: rhaymison/Mistral-portuguese-luana-7b
+ datasets:
+ - rhaymison/mental-health-qa
+ pipeline_tag: text-generation
+ model-index:
+ - name: Mistral-portuguese-luana-7b-mental-health
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 60.53
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-mental-health
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 48.26
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-mental-health
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 38.04
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-mental-health
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 91.3
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-mental-health
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 74.98
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-mental-health
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 60.57
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-mental-health
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 76.86
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-mental-health
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 70.05
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-mental-health
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 64.9
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-portuguese-luana-7b-mental-health
+       name: Open Portuguese LLM Leaderboard
  ---
 
  # Mistral-portuguese-luana-7b-mental-health

  <a href="https://github.com/rhaymisonbetini" target="_blank">
  <img src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white">
  </a>
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/rhaymison/Mistral-portuguese-luana-7b-mental-health) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ | Metric | Value |
+ |--------------------------|---------|
+ |Average |**65.05**|
+ |ENEM Challenge (No Images)| 60.53|
+ |BLUEX (No Images) | 48.26|
+ |OAB Exams | 38.04|
+ |Assin2 RTE | 91.30|
+ |Assin2 STS | 74.98|
+ |FaQuAD NLI | 60.57|
+ |HateBR Binary | 76.86|
+ |PT Hate Speech Binary | 70.05|
+ |tweetSentBR | 64.90|
+
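
As a quick sanity check on the table above, the Average row appears to be the unweighted mean of the nine task scores. That is an assumption inferred from the numbers, not something the PR states. A minimal sketch that recomputes it from the values in the table:

```python
# Scores copied from the results table in this PR.
scores = {
    "ENEM Challenge (No Images)": 60.53,
    "BLUEX (No Images)": 48.26,
    "OAB Exams": 38.04,
    "Assin2 RTE": 91.30,
    "Assin2 STS": 74.98,
    "FaQuAD NLI": 60.57,
    "HateBR Binary": 76.86,
    "PT Hate Speech Binary": 70.05,
    "tweetSentBR": 64.90,
}

# Assumption: the leaderboard average is a plain arithmetic mean over all tasks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 65.05
```

The result agrees with the reported 65.05, which is consistent with (but does not prove) equal weighting of the tasks.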