martinhillebrandtd committed
Commit ff33345 · 1 Parent(s): 6bb1189
README.md CHANGED
@@ -1,3 +1,2656 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ datasets:
3
+ - allenai/c4
4
+ inference: false
5
+ language: en
6
+ license: apache-2.0
7
+ model-index:
8
+ - name: jina-embedding-b-en-v2
9
+ results:
10
+ - dataset:
11
+ config: en
12
+ name: MTEB AmazonCounterfactualClassification (en)
13
+ revision: e8379541af4e31359cca9fbcf4b00f2671dba205
14
+ split: test
15
+ type: mteb/amazon_counterfactual
16
+ metrics:
17
+ - type: accuracy
18
+ value: 74.73134328358209
19
+ - type: ap
20
+ value: 37.765427081831035
21
+ - type: f1
22
+ value: 68.79367444339518
23
+ task:
24
+ type: Classification
25
+ - dataset:
26
+ config: default
27
+ name: MTEB AmazonPolarityClassification
28
+ revision: e2d317d38cd51312af73b3d32a06d1a08b442046
29
+ split: test
30
+ type: mteb/amazon_polarity
31
+ metrics:
32
+ - type: accuracy
33
+ value: 88.544275
34
+ - type: ap
35
+ value: 84.61328675662887
36
+ - type: f1
37
+ value: 88.51879035862375
38
+ task:
39
+ type: Classification
40
+ - dataset:
41
+ config: en
42
+ name: MTEB AmazonReviewsClassification (en)
43
+ revision: 1399c76144fd37290681b995c656ef9b2e06e26d
44
+ split: test
45
+ type: mteb/amazon_reviews_multi
46
+ metrics:
47
+ - type: accuracy
48
+ value: 45.263999999999996
49
+ - type: f1
50
+ value: 43.778759656699435
51
+ task:
52
+ type: Classification
53
+ - dataset:
54
+ config: default
55
+ name: MTEB ArguAna
56
+ revision: None
57
+ split: test
58
+ type: arguana
59
+ metrics:
60
+ - type: map_at_1
61
+ value: 21.693
62
+ - type: map_at_10
63
+ value: 35.487
64
+ - type: map_at_100
65
+ value: 36.862
66
+ - type: map_at_1000
67
+ value: 36.872
68
+ - type: map_at_3
69
+ value: 30.049999999999997
70
+ - type: map_at_5
71
+ value: 32.966
72
+ - type: mrr_at_1
73
+ value: 21.977
74
+ - type: mrr_at_10
75
+ value: 35.565999999999995
76
+ - type: mrr_at_100
77
+ value: 36.948
78
+ - type: mrr_at_1000
79
+ value: 36.958
80
+ - type: mrr_at_3
81
+ value: 30.121
82
+ - type: mrr_at_5
83
+ value: 33.051
84
+ - type: ndcg_at_1
85
+ value: 21.693
86
+ - type: ndcg_at_10
87
+ value: 44.181
88
+ - type: ndcg_at_100
89
+ value: 49.982
90
+ - type: ndcg_at_1000
91
+ value: 50.233000000000004
92
+ - type: ndcg_at_3
93
+ value: 32.830999999999996
94
+ - type: ndcg_at_5
95
+ value: 38.080000000000005
96
+ - type: precision_at_1
97
+ value: 21.693
98
+ - type: precision_at_10
99
+ value: 7.248
100
+ - type: precision_at_100
101
+ value: 0.9769999999999999
102
+ - type: precision_at_1000
103
+ value: 0.1
104
+ - type: precision_at_3
105
+ value: 13.632
106
+ - type: precision_at_5
107
+ value: 10.725
108
+ - type: recall_at_1
109
+ value: 21.693
110
+ - type: recall_at_10
111
+ value: 72.475
112
+ - type: recall_at_100
113
+ value: 97.653
114
+ - type: recall_at_1000
115
+ value: 99.57300000000001
116
+ - type: recall_at_3
117
+ value: 40.896
118
+ - type: recall_at_5
119
+ value: 53.627
120
+ task:
121
+ type: Retrieval
122
+ - dataset:
123
+ config: default
124
+ name: MTEB ArxivClusteringP2P
125
+ revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
126
+ split: test
127
+ type: mteb/arxiv-clustering-p2p
128
+ metrics:
129
+ - type: v_measure
130
+ value: 45.39242428696777
131
+ task:
132
+ type: Clustering
133
+ - dataset:
134
+ config: default
135
+ name: MTEB ArxivClusteringS2S
136
+ revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
137
+ split: test
138
+ type: mteb/arxiv-clustering-s2s
139
+ metrics:
140
+ - type: v_measure
141
+ value: 36.675626784714
142
+ task:
143
+ type: Clustering
144
+ - dataset:
145
+ config: default
146
+ name: MTEB AskUbuntuDupQuestions
147
+ revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
148
+ split: test
149
+ type: mteb/askubuntudupquestions-reranking
150
+ metrics:
151
+ - type: map
152
+ value: 62.247725694904034
153
+ - type: mrr
154
+ value: 74.91359978894604
155
+ task:
156
+ type: Reranking
157
+ - dataset:
158
+ config: default
159
+ name: MTEB BIOSSES
160
+ revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
161
+ split: test
162
+ type: mteb/biosses-sts
163
+ metrics:
164
+ - type: cos_sim_pearson
165
+ value: 82.68003802970496
166
+ - type: cos_sim_spearman
167
+ value: 81.23438110096286
168
+ - type: euclidean_pearson
169
+ value: 81.87462986142582
170
+ - type: euclidean_spearman
171
+ value: 81.23438110096286
172
+ - type: manhattan_pearson
173
+ value: 81.61162566600755
174
+ - type: manhattan_spearman
175
+ value: 81.11329400456184
176
+ task:
177
+ type: STS
178
+ - dataset:
179
+ config: default
180
+ name: MTEB Banking77Classification
181
+ revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
182
+ split: test
183
+ type: mteb/banking77
184
+ metrics:
185
+ - type: accuracy
186
+ value: 84.01298701298701
187
+ - type: f1
188
+ value: 83.31690714969382
189
+ task:
190
+ type: Classification
191
+ - dataset:
192
+ config: default
193
+ name: MTEB BiorxivClusteringP2P
194
+ revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
195
+ split: test
196
+ type: mteb/biorxiv-clustering-p2p
197
+ metrics:
198
+ - type: v_measure
199
+ value: 37.050108150972086
200
+ task:
201
+ type: Clustering
202
+ - dataset:
203
+ config: default
204
+ name: MTEB BiorxivClusteringS2S
205
+ revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
206
+ split: test
207
+ type: mteb/biorxiv-clustering-s2s
208
+ metrics:
209
+ - type: v_measure
210
+ value: 30.15731442819715
211
+ task:
212
+ type: Clustering
213
+ - dataset:
214
+ config: default
215
+ name: MTEB CQADupstackAndroidRetrieval
216
+ revision: None
217
+ split: test
218
+ type: BeIR/cqadupstack
219
+ metrics:
220
+ - type: map_at_1
221
+ value: 31.391999999999996
222
+ - type: map_at_10
223
+ value: 42.597
224
+ - type: map_at_100
225
+ value: 44.07
226
+ - type: map_at_1000
227
+ value: 44.198
228
+ - type: map_at_3
229
+ value: 38.957
230
+ - type: map_at_5
231
+ value: 40.961
232
+ - type: mrr_at_1
233
+ value: 37.196
234
+ - type: mrr_at_10
235
+ value: 48.152
236
+ - type: mrr_at_100
237
+ value: 48.928
238
+ - type: mrr_at_1000
239
+ value: 48.964999999999996
240
+ - type: mrr_at_3
241
+ value: 45.446
242
+ - type: mrr_at_5
243
+ value: 47.205999999999996
244
+ - type: ndcg_at_1
245
+ value: 37.196
246
+ - type: ndcg_at_10
247
+ value: 49.089
248
+ - type: ndcg_at_100
249
+ value: 54.471000000000004
250
+ - type: ndcg_at_1000
251
+ value: 56.385
252
+ - type: ndcg_at_3
253
+ value: 43.699
254
+ - type: ndcg_at_5
255
+ value: 46.22
256
+ - type: precision_at_1
257
+ value: 37.196
258
+ - type: precision_at_10
259
+ value: 9.313
260
+ - type: precision_at_100
261
+ value: 1.478
262
+ - type: precision_at_1000
263
+ value: 0.198
264
+ - type: precision_at_3
265
+ value: 20.839
266
+ - type: precision_at_5
267
+ value: 14.936
268
+ - type: recall_at_1
269
+ value: 31.391999999999996
270
+ - type: recall_at_10
271
+ value: 61.876
272
+ - type: recall_at_100
273
+ value: 84.214
274
+ - type: recall_at_1000
275
+ value: 95.985
276
+ - type: recall_at_3
277
+ value: 46.6
278
+ - type: recall_at_5
279
+ value: 53.588
280
+ - type: map_at_1
281
+ value: 29.083
282
+ - type: map_at_10
283
+ value: 38.812999999999995
284
+ - type: map_at_100
285
+ value: 40.053
286
+ - type: map_at_1000
287
+ value: 40.188
288
+ - type: map_at_3
289
+ value: 36.111
290
+ - type: map_at_5
291
+ value: 37.519000000000005
292
+ - type: mrr_at_1
293
+ value: 36.497
294
+ - type: mrr_at_10
295
+ value: 44.85
296
+ - type: mrr_at_100
297
+ value: 45.546
298
+ - type: mrr_at_1000
299
+ value: 45.593
300
+ - type: mrr_at_3
301
+ value: 42.686
302
+ - type: mrr_at_5
303
+ value: 43.909
304
+ - type: ndcg_at_1
305
+ value: 36.497
306
+ - type: ndcg_at_10
307
+ value: 44.443
308
+ - type: ndcg_at_100
309
+ value: 48.979
310
+ - type: ndcg_at_1000
311
+ value: 51.154999999999994
312
+ - type: ndcg_at_3
313
+ value: 40.660000000000004
314
+ - type: ndcg_at_5
315
+ value: 42.193000000000005
316
+ - type: precision_at_1
317
+ value: 36.497
318
+ - type: precision_at_10
319
+ value: 8.433
320
+ - type: precision_at_100
321
+ value: 1.369
322
+ - type: precision_at_1000
323
+ value: 0.185
324
+ - type: precision_at_3
325
+ value: 19.894000000000002
326
+ - type: precision_at_5
327
+ value: 13.873
328
+ - type: recall_at_1
329
+ value: 29.083
330
+ - type: recall_at_10
331
+ value: 54.313
332
+ - type: recall_at_100
333
+ value: 73.792
334
+ - type: recall_at_1000
335
+ value: 87.629
336
+ - type: recall_at_3
337
+ value: 42.257
338
+ - type: recall_at_5
339
+ value: 47.066
340
+ - type: map_at_1
341
+ value: 38.556000000000004
342
+ - type: map_at_10
343
+ value: 50.698
344
+ - type: map_at_100
345
+ value: 51.705
346
+ - type: map_at_1000
347
+ value: 51.768
348
+ - type: map_at_3
349
+ value: 47.848
350
+ - type: map_at_5
351
+ value: 49.358000000000004
352
+ - type: mrr_at_1
353
+ value: 43.95
354
+ - type: mrr_at_10
355
+ value: 54.191
356
+ - type: mrr_at_100
357
+ value: 54.852999999999994
358
+ - type: mrr_at_1000
359
+ value: 54.885
360
+ - type: mrr_at_3
361
+ value: 51.954
362
+ - type: mrr_at_5
363
+ value: 53.13
364
+ - type: ndcg_at_1
365
+ value: 43.95
366
+ - type: ndcg_at_10
367
+ value: 56.516
368
+ - type: ndcg_at_100
369
+ value: 60.477000000000004
370
+ - type: ndcg_at_1000
371
+ value: 61.746
372
+ - type: ndcg_at_3
373
+ value: 51.601
374
+ - type: ndcg_at_5
375
+ value: 53.795
376
+ - type: precision_at_1
377
+ value: 43.95
378
+ - type: precision_at_10
379
+ value: 9.009
380
+ - type: precision_at_100
381
+ value: 1.189
382
+ - type: precision_at_1000
383
+ value: 0.135
384
+ - type: precision_at_3
385
+ value: 22.989
386
+ - type: precision_at_5
387
+ value: 15.473
388
+ - type: recall_at_1
389
+ value: 38.556000000000004
390
+ - type: recall_at_10
391
+ value: 70.159
392
+ - type: recall_at_100
393
+ value: 87.132
394
+ - type: recall_at_1000
395
+ value: 96.16
396
+ - type: recall_at_3
397
+ value: 56.906
398
+ - type: recall_at_5
399
+ value: 62.332
400
+ - type: map_at_1
401
+ value: 24.238
402
+ - type: map_at_10
403
+ value: 32.5
404
+ - type: map_at_100
405
+ value: 33.637
406
+ - type: map_at_1000
407
+ value: 33.719
408
+ - type: map_at_3
409
+ value: 30.026999999999997
410
+ - type: map_at_5
411
+ value: 31.555
412
+ - type: mrr_at_1
413
+ value: 26.328000000000003
414
+ - type: mrr_at_10
415
+ value: 34.44
416
+ - type: mrr_at_100
417
+ value: 35.455999999999996
418
+ - type: mrr_at_1000
419
+ value: 35.521
420
+ - type: mrr_at_3
421
+ value: 32.034
422
+ - type: mrr_at_5
423
+ value: 33.565
424
+ - type: ndcg_at_1
425
+ value: 26.328000000000003
426
+ - type: ndcg_at_10
427
+ value: 37.202
428
+ - type: ndcg_at_100
429
+ value: 42.728
430
+ - type: ndcg_at_1000
431
+ value: 44.792
432
+ - type: ndcg_at_3
433
+ value: 32.368
434
+ - type: ndcg_at_5
435
+ value: 35.008
436
+ - type: precision_at_1
437
+ value: 26.328000000000003
438
+ - type: precision_at_10
439
+ value: 5.7059999999999995
440
+ - type: precision_at_100
441
+ value: 0.8880000000000001
442
+ - type: precision_at_1000
443
+ value: 0.11100000000000002
444
+ - type: precision_at_3
445
+ value: 13.672
446
+ - type: precision_at_5
447
+ value: 9.74
448
+ - type: recall_at_1
449
+ value: 24.238
450
+ - type: recall_at_10
451
+ value: 49.829
452
+ - type: recall_at_100
453
+ value: 75.21
454
+ - type: recall_at_1000
455
+ value: 90.521
456
+ - type: recall_at_3
457
+ value: 36.867
458
+ - type: recall_at_5
459
+ value: 43.241
460
+ - type: map_at_1
461
+ value: 15.378
462
+ - type: map_at_10
463
+ value: 22.817999999999998
464
+ - type: map_at_100
465
+ value: 23.977999999999998
466
+ - type: map_at_1000
467
+ value: 24.108
468
+ - type: map_at_3
469
+ value: 20.719
470
+ - type: map_at_5
471
+ value: 21.889
472
+ - type: mrr_at_1
473
+ value: 19.03
474
+ - type: mrr_at_10
475
+ value: 27.022000000000002
476
+ - type: mrr_at_100
477
+ value: 28.011999999999997
478
+ - type: mrr_at_1000
479
+ value: 28.096
480
+ - type: mrr_at_3
481
+ value: 24.855
482
+ - type: mrr_at_5
483
+ value: 26.029999999999998
484
+ - type: ndcg_at_1
485
+ value: 19.03
486
+ - type: ndcg_at_10
487
+ value: 27.526
488
+ - type: ndcg_at_100
489
+ value: 33.040000000000006
490
+ - type: ndcg_at_1000
491
+ value: 36.187000000000005
492
+ - type: ndcg_at_3
493
+ value: 23.497
494
+ - type: ndcg_at_5
495
+ value: 25.334
496
+ - type: precision_at_1
497
+ value: 19.03
498
+ - type: precision_at_10
499
+ value: 4.963
500
+ - type: precision_at_100
501
+ value: 0.893
502
+ - type: precision_at_1000
503
+ value: 0.13
504
+ - type: precision_at_3
505
+ value: 11.360000000000001
506
+ - type: precision_at_5
507
+ value: 8.134
508
+ - type: recall_at_1
509
+ value: 15.378
510
+ - type: recall_at_10
511
+ value: 38.061
512
+ - type: recall_at_100
513
+ value: 61.754
514
+ - type: recall_at_1000
515
+ value: 84.259
516
+ - type: recall_at_3
517
+ value: 26.788
518
+ - type: recall_at_5
519
+ value: 31.326999999999998
520
+ - type: map_at_1
521
+ value: 27.511999999999997
522
+ - type: map_at_10
523
+ value: 37.429
524
+ - type: map_at_100
525
+ value: 38.818000000000005
526
+ - type: map_at_1000
527
+ value: 38.924
528
+ - type: map_at_3
529
+ value: 34.625
530
+ - type: map_at_5
531
+ value: 36.064
532
+ - type: mrr_at_1
533
+ value: 33.300999999999995
534
+ - type: mrr_at_10
535
+ value: 43.036
536
+ - type: mrr_at_100
537
+ value: 43.894
538
+ - type: mrr_at_1000
539
+ value: 43.936
540
+ - type: mrr_at_3
541
+ value: 40.825
542
+ - type: mrr_at_5
543
+ value: 42.028
544
+ - type: ndcg_at_1
545
+ value: 33.300999999999995
546
+ - type: ndcg_at_10
547
+ value: 43.229
548
+ - type: ndcg_at_100
549
+ value: 48.992000000000004
550
+ - type: ndcg_at_1000
551
+ value: 51.02100000000001
552
+ - type: ndcg_at_3
553
+ value: 38.794000000000004
554
+ - type: ndcg_at_5
555
+ value: 40.65
556
+ - type: precision_at_1
557
+ value: 33.300999999999995
558
+ - type: precision_at_10
559
+ value: 7.777000000000001
560
+ - type: precision_at_100
561
+ value: 1.269
562
+ - type: precision_at_1000
563
+ value: 0.163
564
+ - type: precision_at_3
565
+ value: 18.351
566
+ - type: precision_at_5
567
+ value: 12.762
568
+ - type: recall_at_1
569
+ value: 27.511999999999997
570
+ - type: recall_at_10
571
+ value: 54.788000000000004
572
+ - type: recall_at_100
573
+ value: 79.105
574
+ - type: recall_at_1000
575
+ value: 92.49199999999999
576
+ - type: recall_at_3
577
+ value: 41.924
578
+ - type: recall_at_5
579
+ value: 47.026
580
+ - type: map_at_1
581
+ value: 24.117
582
+ - type: map_at_10
583
+ value: 33.32
584
+ - type: map_at_100
585
+ value: 34.677
586
+ - type: map_at_1000
587
+ value: 34.78
588
+ - type: map_at_3
589
+ value: 30.233999999999998
590
+ - type: map_at_5
591
+ value: 31.668000000000003
592
+ - type: mrr_at_1
593
+ value: 29.566
594
+ - type: mrr_at_10
595
+ value: 38.244
596
+ - type: mrr_at_100
597
+ value: 39.245000000000005
598
+ - type: mrr_at_1000
599
+ value: 39.296
600
+ - type: mrr_at_3
601
+ value: 35.864000000000004
602
+ - type: mrr_at_5
603
+ value: 36.919999999999995
604
+ - type: ndcg_at_1
605
+ value: 29.566
606
+ - type: ndcg_at_10
607
+ value: 39.127
608
+ - type: ndcg_at_100
609
+ value: 44.989000000000004
610
+ - type: ndcg_at_1000
611
+ value: 47.189
612
+ - type: ndcg_at_3
613
+ value: 34.039
614
+ - type: ndcg_at_5
615
+ value: 35.744
616
+ - type: precision_at_1
617
+ value: 29.566
618
+ - type: precision_at_10
619
+ value: 7.385999999999999
620
+ - type: precision_at_100
621
+ value: 1.204
622
+ - type: precision_at_1000
623
+ value: 0.158
624
+ - type: precision_at_3
625
+ value: 16.286
626
+ - type: precision_at_5
627
+ value: 11.484
628
+ - type: recall_at_1
629
+ value: 24.117
630
+ - type: recall_at_10
631
+ value: 51.559999999999995
632
+ - type: recall_at_100
633
+ value: 77.104
634
+ - type: recall_at_1000
635
+ value: 91.79899999999999
636
+ - type: recall_at_3
637
+ value: 36.82
638
+ - type: recall_at_5
639
+ value: 41.453
640
+ - type: map_at_1
641
+ value: 25.17625
642
+ - type: map_at_10
643
+ value: 34.063916666666664
644
+ - type: map_at_100
645
+ value: 35.255500000000005
646
+ - type: map_at_1000
647
+ value: 35.37275
648
+ - type: map_at_3
649
+ value: 31.351666666666667
650
+ - type: map_at_5
651
+ value: 32.80608333333333
652
+ - type: mrr_at_1
653
+ value: 29.59783333333333
654
+ - type: mrr_at_10
655
+ value: 38.0925
656
+ - type: mrr_at_100
657
+ value: 38.957249999999995
658
+ - type: mrr_at_1000
659
+ value: 39.01608333333333
660
+ - type: mrr_at_3
661
+ value: 35.77625
662
+ - type: mrr_at_5
663
+ value: 37.04991666666667
664
+ - type: ndcg_at_1
665
+ value: 29.59783333333333
666
+ - type: ndcg_at_10
667
+ value: 39.343666666666664
668
+ - type: ndcg_at_100
669
+ value: 44.488249999999994
670
+ - type: ndcg_at_1000
671
+ value: 46.83358333333334
672
+ - type: ndcg_at_3
673
+ value: 34.69708333333333
674
+ - type: ndcg_at_5
675
+ value: 36.75075
676
+ - type: precision_at_1
677
+ value: 29.59783333333333
678
+ - type: precision_at_10
679
+ value: 6.884083333333332
680
+ - type: precision_at_100
681
+ value: 1.114
682
+ - type: precision_at_1000
683
+ value: 0.15108333333333332
684
+ - type: precision_at_3
685
+ value: 15.965250000000003
686
+ - type: precision_at_5
687
+ value: 11.246500000000001
688
+ - type: recall_at_1
689
+ value: 25.17625
690
+ - type: recall_at_10
691
+ value: 51.015999999999984
692
+ - type: recall_at_100
693
+ value: 73.60174999999998
694
+ - type: recall_at_1000
695
+ value: 89.849
696
+ - type: recall_at_3
697
+ value: 37.88399999999999
698
+ - type: recall_at_5
699
+ value: 43.24541666666666
700
+ - type: map_at_1
701
+ value: 24.537
702
+ - type: map_at_10
703
+ value: 31.081999999999997
704
+ - type: map_at_100
705
+ value: 32.042
706
+ - type: map_at_1000
707
+ value: 32.141
708
+ - type: map_at_3
709
+ value: 29.137
710
+ - type: map_at_5
711
+ value: 30.079
712
+ - type: mrr_at_1
713
+ value: 27.454
714
+ - type: mrr_at_10
715
+ value: 33.694
716
+ - type: mrr_at_100
717
+ value: 34.579
718
+ - type: mrr_at_1000
719
+ value: 34.649
720
+ - type: mrr_at_3
721
+ value: 32.004
722
+ - type: mrr_at_5
723
+ value: 32.794000000000004
724
+ - type: ndcg_at_1
725
+ value: 27.454
726
+ - type: ndcg_at_10
727
+ value: 34.915
728
+ - type: ndcg_at_100
729
+ value: 39.641
730
+ - type: ndcg_at_1000
731
+ value: 42.105
732
+ - type: ndcg_at_3
733
+ value: 31.276
734
+ - type: ndcg_at_5
735
+ value: 32.65
736
+ - type: precision_at_1
737
+ value: 27.454
738
+ - type: precision_at_10
739
+ value: 5.337
740
+ - type: precision_at_100
741
+ value: 0.8250000000000001
742
+ - type: precision_at_1000
743
+ value: 0.11199999999999999
744
+ - type: precision_at_3
745
+ value: 13.241
746
+ - type: precision_at_5
747
+ value: 8.895999999999999
748
+ - type: recall_at_1
749
+ value: 24.537
750
+ - type: recall_at_10
751
+ value: 44.324999999999996
752
+ - type: recall_at_100
753
+ value: 65.949
754
+ - type: recall_at_1000
755
+ value: 84.017
756
+ - type: recall_at_3
757
+ value: 33.857
758
+ - type: recall_at_5
759
+ value: 37.316
760
+ - type: map_at_1
761
+ value: 17.122
762
+ - type: map_at_10
763
+ value: 24.32
764
+ - type: map_at_100
765
+ value: 25.338
766
+ - type: map_at_1000
767
+ value: 25.462
768
+ - type: map_at_3
769
+ value: 22.064
770
+ - type: map_at_5
771
+ value: 23.322000000000003
772
+ - type: mrr_at_1
773
+ value: 20.647
774
+ - type: mrr_at_10
775
+ value: 27.858
776
+ - type: mrr_at_100
777
+ value: 28.743999999999996
778
+ - type: mrr_at_1000
779
+ value: 28.819
780
+ - type: mrr_at_3
781
+ value: 25.769
782
+ - type: mrr_at_5
783
+ value: 26.964
784
+ - type: ndcg_at_1
785
+ value: 20.647
786
+ - type: ndcg_at_10
787
+ value: 28.849999999999998
788
+ - type: ndcg_at_100
789
+ value: 33.849000000000004
790
+ - type: ndcg_at_1000
791
+ value: 36.802
792
+ - type: ndcg_at_3
793
+ value: 24.799
794
+ - type: ndcg_at_5
795
+ value: 26.682
796
+ - type: precision_at_1
797
+ value: 20.647
798
+ - type: precision_at_10
799
+ value: 5.2170000000000005
800
+ - type: precision_at_100
801
+ value: 0.906
802
+ - type: precision_at_1000
803
+ value: 0.134
804
+ - type: precision_at_3
805
+ value: 11.769
806
+ - type: precision_at_5
807
+ value: 8.486
808
+ - type: recall_at_1
809
+ value: 17.122
810
+ - type: recall_at_10
811
+ value: 38.999
812
+ - type: recall_at_100
813
+ value: 61.467000000000006
814
+ - type: recall_at_1000
815
+ value: 82.716
816
+ - type: recall_at_3
817
+ value: 27.601
818
+ - type: recall_at_5
819
+ value: 32.471
820
+ - type: map_at_1
821
+ value: 24.396
822
+ - type: map_at_10
823
+ value: 33.415
824
+ - type: map_at_100
825
+ value: 34.521
826
+ - type: map_at_1000
827
+ value: 34.631
828
+ - type: map_at_3
829
+ value: 30.703999999999997
830
+ - type: map_at_5
831
+ value: 32.166
832
+ - type: mrr_at_1
833
+ value: 28.825
834
+ - type: mrr_at_10
835
+ value: 37.397000000000006
836
+ - type: mrr_at_100
837
+ value: 38.286
838
+ - type: mrr_at_1000
839
+ value: 38.346000000000004
840
+ - type: mrr_at_3
841
+ value: 35.028
842
+ - type: mrr_at_5
843
+ value: 36.32
844
+ - type: ndcg_at_1
845
+ value: 28.825
846
+ - type: ndcg_at_10
847
+ value: 38.656
848
+ - type: ndcg_at_100
849
+ value: 43.856
850
+ - type: ndcg_at_1000
851
+ value: 46.31
852
+ - type: ndcg_at_3
853
+ value: 33.793
854
+ - type: ndcg_at_5
855
+ value: 35.909
856
+ - type: precision_at_1
857
+ value: 28.825
858
+ - type: precision_at_10
859
+ value: 6.567
860
+ - type: precision_at_100
861
+ value: 1.0330000000000001
862
+ - type: precision_at_1000
863
+ value: 0.135
864
+ - type: precision_at_3
865
+ value: 15.516
866
+ - type: precision_at_5
867
+ value: 10.914
868
+ - type: recall_at_1
869
+ value: 24.396
870
+ - type: recall_at_10
871
+ value: 50.747
872
+ - type: recall_at_100
873
+ value: 73.477
874
+ - type: recall_at_1000
875
+ value: 90.801
876
+ - type: recall_at_3
877
+ value: 37.1
878
+ - type: recall_at_5
879
+ value: 42.589
880
+ - type: map_at_1
881
+ value: 25.072
882
+ - type: map_at_10
883
+ value: 34.307
884
+ - type: map_at_100
885
+ value: 35.725
886
+ - type: map_at_1000
887
+ value: 35.943999999999996
888
+ - type: map_at_3
889
+ value: 30.906
890
+ - type: map_at_5
891
+ value: 32.818000000000005
892
+ - type: mrr_at_1
893
+ value: 29.644
894
+ - type: mrr_at_10
895
+ value: 38.673
896
+ - type: mrr_at_100
897
+ value: 39.459
898
+ - type: mrr_at_1000
899
+ value: 39.527
900
+ - type: mrr_at_3
901
+ value: 35.771
902
+ - type: mrr_at_5
903
+ value: 37.332
904
+ - type: ndcg_at_1
905
+ value: 29.644
906
+ - type: ndcg_at_10
907
+ value: 40.548
908
+ - type: ndcg_at_100
909
+ value: 45.678999999999995
910
+ - type: ndcg_at_1000
911
+ value: 48.488
912
+ - type: ndcg_at_3
913
+ value: 34.887
914
+ - type: ndcg_at_5
915
+ value: 37.543
916
+ - type: precision_at_1
917
+ value: 29.644
918
+ - type: precision_at_10
919
+ value: 7.688000000000001
920
+ - type: precision_at_100
921
+ value: 1.482
922
+ - type: precision_at_1000
923
+ value: 0.23600000000000002
924
+ - type: precision_at_3
925
+ value: 16.206
926
+ - type: precision_at_5
927
+ value: 12.016
928
+ - type: recall_at_1
929
+ value: 25.072
930
+ - type: recall_at_10
931
+ value: 53.478
932
+ - type: recall_at_100
933
+ value: 76.07300000000001
934
+ - type: recall_at_1000
935
+ value: 93.884
936
+ - type: recall_at_3
937
+ value: 37.583
938
+ - type: recall_at_5
939
+ value: 44.464
940
+ - type: map_at_1
941
+ value: 20.712
942
+ - type: map_at_10
943
+ value: 27.467999999999996
944
+ - type: map_at_100
945
+ value: 28.502
946
+ - type: map_at_1000
947
+ value: 28.610000000000003
948
+ - type: map_at_3
949
+ value: 24.887999999999998
950
+ - type: map_at_5
951
+ value: 26.273999999999997
952
+ - type: mrr_at_1
953
+ value: 22.736
954
+ - type: mrr_at_10
955
+ value: 29.553
956
+ - type: mrr_at_100
957
+ value: 30.485
958
+ - type: mrr_at_1000
959
+ value: 30.56
960
+ - type: mrr_at_3
961
+ value: 27.078999999999997
962
+ - type: mrr_at_5
963
+ value: 28.401
964
+ - type: ndcg_at_1
965
+ value: 22.736
966
+ - type: ndcg_at_10
967
+ value: 32.023
968
+ - type: ndcg_at_100
969
+ value: 37.158
970
+ - type: ndcg_at_1000
971
+ value: 39.823
972
+ - type: ndcg_at_3
973
+ value: 26.951999999999998
974
+ - type: ndcg_at_5
975
+ value: 29.281000000000002
976
+ - type: precision_at_1
977
+ value: 22.736
978
+ - type: precision_at_10
979
+ value: 5.213
980
+ - type: precision_at_100
981
+ value: 0.832
982
+ - type: precision_at_1000
983
+ value: 0.116
984
+ - type: precision_at_3
985
+ value: 11.459999999999999
986
+ - type: precision_at_5
987
+ value: 8.244
988
+ - type: recall_at_1
989
+ value: 20.712
990
+ - type: recall_at_10
991
+ value: 44.057
992
+ - type: recall_at_100
993
+ value: 67.944
994
+ - type: recall_at_1000
995
+ value: 87.925
996
+ - type: recall_at_3
997
+ value: 30.305
998
+ - type: recall_at_5
999
+ value: 36.071999999999996
1000
+ task:
1001
+ type: Retrieval
1002
+ - dataset:
1003
+ config: default
1004
+ name: MTEB ClimateFEVER
1005
+ revision: None
1006
+ split: test
1007
+ type: climate-fever
1008
+ metrics:
1009
+ - type: map_at_1
1010
+ value: 10.181999999999999
1011
+ - type: map_at_10
1012
+ value: 16.66
1013
+ - type: map_at_100
1014
+ value: 18.273
1015
+ - type: map_at_1000
1016
+ value: 18.45
1017
+ - type: map_at_3
1018
+ value: 14.141
1019
+ - type: map_at_5
1020
+ value: 15.455
1021
+ - type: mrr_at_1
1022
+ value: 22.15
1023
+ - type: mrr_at_10
1024
+ value: 32.062000000000005
1025
+ - type: mrr_at_100
1026
+ value: 33.116
1027
+ - type: mrr_at_1000
1028
+ value: 33.168
1029
+ - type: mrr_at_3
1030
+ value: 28.827
1031
+ - type: mrr_at_5
1032
+ value: 30.892999999999997
1033
+ - type: ndcg_at_1
1034
+ value: 22.15
1035
+ - type: ndcg_at_10
1036
+ value: 23.532
1037
+ - type: ndcg_at_100
1038
+ value: 30.358
1039
+ - type: ndcg_at_1000
1040
+ value: 33.783
1041
+ - type: ndcg_at_3
1042
+ value: 19.222
1043
+ - type: ndcg_at_5
1044
+ value: 20.919999999999998
1045
+ - type: precision_at_1
1046
+ value: 22.15
1047
+ - type: precision_at_10
1048
+ value: 7.185999999999999
1049
+ - type: precision_at_100
1050
+ value: 1.433
1051
+ - type: precision_at_1000
1052
+ value: 0.207
1053
+ - type: precision_at_3
1054
+ value: 13.941
1055
+ - type: precision_at_5
1056
+ value: 10.906
1057
+ - type: recall_at_1
1058
+ value: 10.181999999999999
1059
+ - type: recall_at_10
1060
+ value: 28.104000000000003
1061
+ - type: recall_at_100
1062
+ value: 51.998999999999995
1063
+ - type: recall_at_1000
1064
+ value: 71.311
1065
+ - type: recall_at_3
1066
+ value: 17.698
1067
+ - type: recall_at_5
1068
+ value: 22.262999999999998
1069
+ task:
1070
+ type: Retrieval
1071
+ - dataset:
1072
+ config: default
1073
+ name: MTEB DBPedia
1074
+ revision: None
1075
+ split: test
1076
+ type: dbpedia-entity
1077
+ metrics:
1078
+ - type: map_at_1
1079
+ value: 6.669
1080
+ - type: map_at_10
1081
+ value: 15.552
1082
+ - type: map_at_100
1083
+ value: 21.865000000000002
1084
+ - type: map_at_1000
1085
+ value: 23.268
1086
+ - type: map_at_3
1087
+ value: 11.309
1088
+ - type: map_at_5
1089
+ value: 13.084000000000001
1090
+ - type: mrr_at_1
1091
+ value: 55.50000000000001
1092
+ - type: mrr_at_10
1093
+ value: 66.46600000000001
1094
+ - type: mrr_at_100
1095
+ value: 66.944
1096
+ - type: mrr_at_1000
1097
+ value: 66.956
1098
+ - type: mrr_at_3
1099
+ value: 64.542
1100
+ - type: mrr_at_5
1101
+ value: 65.717
1102
+ - type: ndcg_at_1
1103
+ value: 44.75
1104
+ - type: ndcg_at_10
1105
+ value: 35.049
1106
+ - type: ndcg_at_100
1107
+ value: 39.073
1108
+ - type: ndcg_at_1000
1109
+ value: 46.208
1110
+ - type: ndcg_at_3
1111
+ value: 39.525
1112
+ - type: ndcg_at_5
1113
+ value: 37.156
1114
+ - type: precision_at_1
1115
+ value: 55.50000000000001
1116
+ - type: precision_at_10
1117
+ value: 27.800000000000004
1118
+ - type: precision_at_100
1119
+ value: 9.013
1120
+ - type: precision_at_1000
1121
+ value: 1.8800000000000001
1122
+ - type: precision_at_3
1123
+ value: 42.667
1124
+ - type: precision_at_5
1125
+ value: 36.0
1126
+ - type: recall_at_1
1127
+ value: 6.669
1128
+ - type: recall_at_10
1129
+ value: 21.811
1130
+ - type: recall_at_100
1131
+ value: 45.112
1132
+ - type: recall_at_1000
1133
+ value: 67.806
1134
+ - type: recall_at_3
1135
+ value: 13.373
1136
+ - type: recall_at_5
1137
+ value: 16.615
1138
+ task:
1139
+ type: Retrieval
1140
+ - dataset:
1141
+ config: default
1142
+ name: MTEB EmotionClassification
1143
+ revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
1144
+ split: test
1145
+ type: mteb/emotion
1146
+ metrics:
1147
+ - type: accuracy
1148
+ value: 48.769999999999996
1149
+ - type: f1
1150
+ value: 42.91448356376592
1151
+ task:
1152
+ type: Classification
1153
+ - dataset:
1154
+ config: default
1155
+ name: MTEB FEVER
1156
+ revision: None
1157
+ split: test
1158
+ type: fever
1159
+ metrics:
1160
+ - type: map_at_1
1161
+ value: 54.013
1162
+ - type: map_at_10
1163
+ value: 66.239
1164
+ - type: map_at_100
1165
+ value: 66.62599999999999
1166
+ - type: map_at_1000
1167
+ value: 66.644
1168
+ - type: map_at_3
1169
+ value: 63.965
1170
+ - type: map_at_5
1171
+ value: 65.45400000000001
1172
+ - type: mrr_at_1
1173
+ value: 58.221000000000004
1174
+ - type: mrr_at_10
1175
+ value: 70.43700000000001
1176
+ - type: mrr_at_100
1177
+ value: 70.744
1178
+ - type: mrr_at_1000
1179
+ value: 70.75099999999999
1180
+ - type: mrr_at_3
1181
+ value: 68.284
1182
+ - type: mrr_at_5
1183
+ value: 69.721
1184
+ - type: ndcg_at_1
1185
+ value: 58.221000000000004
1186
+ - type: ndcg_at_10
1187
+ value: 72.327
1188
+ - type: ndcg_at_100
1189
+ value: 73.953
1190
+ - type: ndcg_at_1000
1191
+ value: 74.312
1192
+ - type: ndcg_at_3
1193
+ value: 68.062
1194
+ - type: ndcg_at_5
1195
+ value: 70.56400000000001
1196
+ - type: precision_at_1
1197
+ value: 58.221000000000004
1198
+ - type: precision_at_10
1199
+ value: 9.521
1200
+ - type: precision_at_100
1201
+ value: 1.045
1202
+ - type: precision_at_1000
1203
+ value: 0.109
1204
+ - type: precision_at_3
1205
+ value: 27.348
1206
+ - type: precision_at_5
1207
+ value: 17.794999999999998
1208
+ - type: recall_at_1
1209
+ value: 54.013
1210
+ - type: recall_at_10
1211
+ value: 86.957
1212
+ - type: recall_at_100
1213
+ value: 93.911
1214
+ - type: recall_at_1000
1215
+ value: 96.38
1216
+ - type: recall_at_3
1217
+ value: 75.555
1218
+ - type: recall_at_5
1219
+ value: 81.671
1220
+ task:
1221
+ type: Retrieval
1222
+ - dataset:
1223
+ config: default
1224
+ name: MTEB FiQA2018
1225
+ revision: None
1226
+ split: test
1227
+ type: fiqa
1228
+ metrics:
1229
+ - type: map_at_1
1230
+ value: 21.254
1231
+ - type: map_at_10
1232
+ value: 33.723
1233
+ - type: map_at_100
1234
+ value: 35.574
1235
+ - type: map_at_1000
1236
+ value: 35.730000000000004
1237
+ - type: map_at_3
1238
+ value: 29.473
1239
+ - type: map_at_5
1240
+ value: 31.543
1241
+ - type: mrr_at_1
1242
+ value: 41.358
1243
+ - type: mrr_at_10
1244
+ value: 49.498
1245
+ - type: mrr_at_100
1246
+ value: 50.275999999999996
1247
+ - type: mrr_at_1000
1248
+ value: 50.308
1249
+ - type: mrr_at_3
1250
+ value: 47.016000000000005
1251
+ - type: mrr_at_5
1252
+ value: 48.336
1253
+ - type: ndcg_at_1
1254
+ value: 41.358
1255
+ - type: ndcg_at_10
1256
+ value: 41.579
1257
+ - type: ndcg_at_100
1258
+ value: 48.455
1259
+ - type: ndcg_at_1000
1260
+ value: 51.165000000000006
1261
+ - type: ndcg_at_3
1262
+ value: 37.681
1263
+ - type: ndcg_at_5
1264
+ value: 38.49
1265
+ - type: precision_at_1
1266
+ value: 41.358
1267
+ - type: precision_at_10
1268
+ value: 11.543000000000001
1269
+ - type: precision_at_100
1270
+ value: 1.87
1271
+ - type: precision_at_1000
1272
+ value: 0.23600000000000002
1273
+ - type: precision_at_3
1274
+ value: 24.743000000000002
1275
+ - type: precision_at_5
1276
+ value: 17.994
1277
+ - type: recall_at_1
1278
+ value: 21.254
1279
+ - type: recall_at_10
1280
+ value: 48.698
1281
+ - type: recall_at_100
1282
+ value: 74.588
1283
+ - type: recall_at_1000
1284
+ value: 91.00200000000001
1285
+ - type: recall_at_3
1286
+ value: 33.939
1287
+ - type: recall_at_5
1288
+ value: 39.367000000000004
1289
+ task:
1290
+ type: Retrieval
1291
+ - dataset:
1292
+ config: default
1293
+ name: MTEB HotpotQA
1294
+ revision: None
1295
+ split: test
1296
+ type: hotpotqa
1297
+ metrics:
1298
+ - type: map_at_1
1299
+ value: 35.922
1300
+ - type: map_at_10
1301
+ value: 52.32599999999999
1302
+ - type: map_at_100
1303
+ value: 53.18000000000001
1304
+ - type: map_at_1000
1305
+ value: 53.245
1306
+ - type: map_at_3
1307
+ value: 49.294
1308
+ - type: map_at_5
1309
+ value: 51.202999999999996
1310
+ - type: mrr_at_1
1311
+ value: 71.843
1312
+ - type: mrr_at_10
1313
+ value: 78.24600000000001
1314
+ - type: mrr_at_100
1315
+ value: 78.515
1316
+ - type: mrr_at_1000
1317
+ value: 78.527
1318
+ - type: mrr_at_3
1319
+ value: 77.17500000000001
1320
+ - type: mrr_at_5
1321
+ value: 77.852
1322
+ - type: ndcg_at_1
1323
+ value: 71.843
1324
+ - type: ndcg_at_10
1325
+ value: 61.379
1326
+ - type: ndcg_at_100
1327
+ value: 64.535
1328
+ - type: ndcg_at_1000
1329
+ value: 65.888
1330
+ - type: ndcg_at_3
1331
+ value: 56.958
1332
+ - type: ndcg_at_5
1333
+ value: 59.434
1334
+ - type: precision_at_1
1335
+ value: 71.843
1336
+ - type: precision_at_10
1337
+ value: 12.686
1338
+ - type: precision_at_100
1339
+ value: 1.517
1340
+ - type: precision_at_1000
1341
+ value: 0.16999999999999998
1342
+ - type: precision_at_3
1343
+ value: 35.778
1344
+ - type: precision_at_5
1345
+ value: 23.422
1346
+ - type: recall_at_1
1347
+ value: 35.922
1348
+ - type: recall_at_10
1349
+ value: 63.43
1350
+ - type: recall_at_100
1351
+ value: 75.868
1352
+ - type: recall_at_1000
1353
+ value: 84.88900000000001
1354
+ - type: recall_at_3
1355
+ value: 53.666000000000004
1356
+ - type: recall_at_5
1357
+ value: 58.555
1358
+ task:
1359
+ type: Retrieval
1360
+ - dataset:
1361
+ config: default
1362
+ name: MTEB ImdbClassification
1363
+ revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
1364
+ split: test
1365
+ type: mteb/imdb
1366
+ metrics:
1367
+ - type: accuracy
1368
+ value: 79.4408
1369
+ - type: ap
1370
+ value: 73.52820871620366
1371
+ - type: f1
1372
+ value: 79.36240238685001
1373
+ task:
1374
+ type: Classification
1375
+ - dataset:
1376
+ config: default
1377
+ name: MTEB MSMARCO
1378
+ revision: None
1379
+ split: dev
1380
+ type: msmarco
1381
+ metrics:
1382
+ - type: map_at_1
1383
+ value: 21.826999999999998
1384
+ - type: map_at_10
1385
+ value: 34.04
1386
+ - type: map_at_100
1387
+ value: 35.226
1388
+ - type: map_at_1000
1389
+ value: 35.275
1390
+ - type: map_at_3
1391
+ value: 30.165999999999997
1392
+ - type: map_at_5
1393
+ value: 32.318000000000005
1394
+ - type: mrr_at_1
1395
+ value: 22.464000000000002
1396
+ - type: mrr_at_10
1397
+ value: 34.631
1398
+ - type: mrr_at_100
1399
+ value: 35.752
1400
+ - type: mrr_at_1000
1401
+ value: 35.795
1402
+ - type: mrr_at_3
1403
+ value: 30.798
1404
+ - type: mrr_at_5
1405
+ value: 32.946999999999996
1406
+ - type: ndcg_at_1
1407
+ value: 22.464000000000002
1408
+ - type: ndcg_at_10
1409
+ value: 40.919
1410
+ - type: ndcg_at_100
1411
+ value: 46.632
1412
+ - type: ndcg_at_1000
1413
+ value: 47.833
1414
+ - type: ndcg_at_3
1415
+ value: 32.992
1416
+ - type: ndcg_at_5
1417
+ value: 36.834
1418
+ - type: precision_at_1
1419
+ value: 22.464000000000002
1420
+ - type: precision_at_10
1421
+ value: 6.494
1422
+ - type: precision_at_100
1423
+ value: 0.9369999999999999
1424
+ - type: precision_at_1000
1425
+ value: 0.104
1426
+ - type: precision_at_3
1427
+ value: 14.021
1428
+ - type: precision_at_5
1429
+ value: 10.347000000000001
1430
+ - type: recall_at_1
1431
+ value: 21.826999999999998
1432
+ - type: recall_at_10
1433
+ value: 62.132
1434
+ - type: recall_at_100
1435
+ value: 88.55199999999999
1436
+ - type: recall_at_1000
1437
+ value: 97.707
1438
+ - type: recall_at_3
1439
+ value: 40.541
1440
+ - type: recall_at_5
1441
+ value: 49.739
1442
+ task:
1443
+ type: Retrieval
1444
+ - dataset:
1445
+ config: en
1446
+ name: MTEB MTOPDomainClassification (en)
1447
+ revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
1448
+ split: test
1449
+ type: mteb/mtop_domain
1450
+ metrics:
1451
+ - type: accuracy
1452
+ value: 95.68399452804377
1453
+ - type: f1
1454
+ value: 95.25490609832268
1455
+ task:
1456
+ type: Classification
1457
+ - dataset:
1458
+ config: en
1459
+ name: MTEB MTOPIntentClassification (en)
1460
+ revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
1461
+ split: test
1462
+ type: mteb/mtop_intent
1463
+ metrics:
1464
+ - type: accuracy
1465
+ value: 83.15321477428182
1466
+ - type: f1
1467
+ value: 60.35476439087966
1468
+ task:
1469
+ type: Classification
1470
+ - dataset:
1471
+ config: en
1472
+ name: MTEB MassiveIntentClassification (en)
1473
+ revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1474
+ split: test
1475
+ type: mteb/amazon_massive_intent
1476
+ metrics:
1477
+ - type: accuracy
1478
+ value: 71.92669804976462
1479
+ - type: f1
1480
+ value: 69.22815107207565
1481
+ task:
1482
+ type: Classification
1483
+ - dataset:
1484
+ config: en
1485
+ name: MTEB MassiveScenarioClassification (en)
1486
+ revision: 7d571f92784cd94a019292a1f45445077d0ef634
1487
+ split: test
1488
+ type: mteb/amazon_massive_scenario
1489
+ metrics:
1490
+ - type: accuracy
1491
+ value: 74.4855413584398
1492
+ - type: f1
1493
+ value: 72.92107516103387
1494
+ task:
1495
+ type: Classification
1496
+ - dataset:
1497
+ config: default
1498
+ name: MTEB MedrxivClusteringP2P
1499
+ revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
1500
+ split: test
1501
+ type: mteb/medrxiv-clustering-p2p
1502
+ metrics:
1503
+ - type: v_measure
1504
+ value: 32.412679360205544
1505
+ task:
1506
+ type: Clustering
1507
+ - dataset:
1508
+ config: default
1509
+ name: MTEB MedrxivClusteringS2S
1510
+ revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
1511
+ split: test
1512
+ type: mteb/medrxiv-clustering-s2s
1513
+ metrics:
1514
+ - type: v_measure
1515
+ value: 28.09211869875204
1516
+ task:
1517
+ type: Clustering
1518
+ - dataset:
1519
+ config: default
1520
+ name: MTEB MindSmallReranking
1521
+ revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
1522
+ split: test
1523
+ type: mteb/mind_small
1524
+ metrics:
1525
+ - type: map
1526
+ value: 30.540919056982545
1527
+ - type: mrr
1528
+ value: 31.529904607063536
1529
+ task:
1530
+ type: Reranking
1531
+ - dataset:
1532
+ config: default
1533
+ name: MTEB NFCorpus
1534
+ revision: None
1535
+ split: test
1536
+ type: nfcorpus
1537
+ metrics:
1538
+ - type: map_at_1
1539
+ value: 5.745
1540
+ - type: map_at_10
1541
+ value: 12.013
1542
+ - type: map_at_100
1543
+ value: 15.040000000000001
1544
+ - type: map_at_1000
1545
+ value: 16.427
1546
+ - type: map_at_3
1547
+ value: 8.841000000000001
1548
+ - type: map_at_5
1549
+ value: 10.289
1550
+ - type: mrr_at_1
1551
+ value: 45.201
1552
+ - type: mrr_at_10
1553
+ value: 53.483999999999995
1554
+ - type: mrr_at_100
1555
+ value: 54.20700000000001
1556
+ - type: mrr_at_1000
1557
+ value: 54.252
1558
+ - type: mrr_at_3
1559
+ value: 51.29
1560
+ - type: mrr_at_5
1561
+ value: 52.73
1562
+ - type: ndcg_at_1
1563
+ value: 43.808
1564
+ - type: ndcg_at_10
1565
+ value: 32.445
1566
+ - type: ndcg_at_100
1567
+ value: 30.031000000000002
1568
+ - type: ndcg_at_1000
1569
+ value: 39.007
1570
+ - type: ndcg_at_3
1571
+ value: 37.204
1572
+ - type: ndcg_at_5
1573
+ value: 35.07
1574
+ - type: precision_at_1
1575
+ value: 45.201
1576
+ - type: precision_at_10
1577
+ value: 23.684
1578
+ - type: precision_at_100
1579
+ value: 7.600999999999999
1580
+ - type: precision_at_1000
1581
+ value: 2.043
1582
+ - type: precision_at_3
1583
+ value: 33.953
1584
+ - type: precision_at_5
1585
+ value: 29.412
1586
+ - type: recall_at_1
1587
+ value: 5.745
1588
+ - type: recall_at_10
1589
+ value: 16.168
1590
+ - type: recall_at_100
1591
+ value: 30.875999999999998
1592
+ - type: recall_at_1000
1593
+ value: 62.686
1594
+ - type: recall_at_3
1595
+ value: 9.75
1596
+ - type: recall_at_5
1597
+ value: 12.413
1598
+ task:
1599
+ type: Retrieval
1600
+ - dataset:
1601
+ config: default
1602
+ name: MTEB NQ
1603
+ revision: None
1604
+ split: test
1605
+ type: nq
1606
+ metrics:
1607
+ - type: map_at_1
1608
+ value: 37.828
1609
+ - type: map_at_10
1610
+ value: 53.239000000000004
1611
+ - type: map_at_100
1612
+ value: 54.035999999999994
1613
+ - type: map_at_1000
1614
+ value: 54.067
1615
+ - type: map_at_3
1616
+ value: 49.289
1617
+ - type: map_at_5
1618
+ value: 51.784
1619
+ - type: mrr_at_1
1620
+ value: 42.497
1621
+ - type: mrr_at_10
1622
+ value: 55.916999999999994
1623
+ - type: mrr_at_100
1624
+ value: 56.495
1625
+ - type: mrr_at_1000
1626
+ value: 56.516999999999996
1627
+ - type: mrr_at_3
1628
+ value: 52.800000000000004
1629
+ - type: mrr_at_5
1630
+ value: 54.722
1631
+ - type: ndcg_at_1
1632
+ value: 42.468
1633
+ - type: ndcg_at_10
1634
+ value: 60.437
1635
+ - type: ndcg_at_100
1636
+ value: 63.731
1637
+ - type: ndcg_at_1000
1638
+ value: 64.41799999999999
1639
+ - type: ndcg_at_3
1640
+ value: 53.230999999999995
1641
+ - type: ndcg_at_5
1642
+ value: 57.26
1643
+ - type: precision_at_1
1644
+ value: 42.468
1645
+ - type: precision_at_10
1646
+ value: 9.47
1647
+ - type: precision_at_100
1648
+ value: 1.1360000000000001
1649
+ - type: precision_at_1000
1650
+ value: 0.12
1651
+ - type: precision_at_3
1652
+ value: 23.724999999999998
1653
+ - type: precision_at_5
1654
+ value: 16.593
1655
+ - type: recall_at_1
1656
+ value: 37.828
1657
+ - type: recall_at_10
1658
+ value: 79.538
1659
+ - type: recall_at_100
1660
+ value: 93.646
1661
+ - type: recall_at_1000
1662
+ value: 98.72999999999999
1663
+ - type: recall_at_3
1664
+ value: 61.134
1665
+ - type: recall_at_5
1666
+ value: 70.377
1667
+ task:
1668
+ type: Retrieval
1669
+ - dataset:
1670
+ config: default
1671
+ name: MTEB QuoraRetrieval
1672
+ revision: None
1673
+ split: test
1674
+ type: quora
1675
+ metrics:
1676
+ - type: map_at_1
1677
+ value: 70.548
1678
+ - type: map_at_10
1679
+ value: 84.466
1680
+ - type: map_at_100
1681
+ value: 85.10600000000001
1682
+ - type: map_at_1000
1683
+ value: 85.123
1684
+ - type: map_at_3
1685
+ value: 81.57600000000001
1686
+ - type: map_at_5
1687
+ value: 83.399
1688
+ - type: mrr_at_1
1689
+ value: 81.24
1690
+ - type: mrr_at_10
1691
+ value: 87.457
1692
+ - type: mrr_at_100
1693
+ value: 87.574
1694
+ - type: mrr_at_1000
1695
+ value: 87.575
1696
+ - type: mrr_at_3
1697
+ value: 86.507
1698
+ - type: mrr_at_5
1699
+ value: 87.205
1700
+ - type: ndcg_at_1
1701
+ value: 81.25
1702
+ - type: ndcg_at_10
1703
+ value: 88.203
1704
+ - type: ndcg_at_100
1705
+ value: 89.457
1706
+ - type: ndcg_at_1000
1707
+ value: 89.563
1708
+ - type: ndcg_at_3
1709
+ value: 85.465
1710
+ - type: ndcg_at_5
1711
+ value: 87.007
1712
+ - type: precision_at_1
1713
+ value: 81.25
1714
+ - type: precision_at_10
1715
+ value: 13.373
1716
+ - type: precision_at_100
1717
+ value: 1.5270000000000001
1718
+ - type: precision_at_1000
1719
+ value: 0.157
1720
+ - type: precision_at_3
1721
+ value: 37.417
1722
+ - type: precision_at_5
1723
+ value: 24.556
1724
+ - type: recall_at_1
1725
+ value: 70.548
1726
+ - type: recall_at_10
1727
+ value: 95.208
1728
+ - type: recall_at_100
1729
+ value: 99.514
1730
+ - type: recall_at_1000
1731
+ value: 99.988
1732
+ - type: recall_at_3
1733
+ value: 87.214
1734
+ - type: recall_at_5
1735
+ value: 91.696
1736
+ task:
1737
+ type: Retrieval
1738
+ - dataset:
1739
+ config: default
1740
+ name: MTEB RedditClustering
1741
+ revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
1742
+ split: test
1743
+ type: mteb/reddit-clustering
1744
+ metrics:
1745
+ - type: v_measure
1746
+ value: 53.04822095496839
1747
+ task:
1748
+ type: Clustering
1749
+ - dataset:
1750
+ config: default
1751
+ name: MTEB RedditClusteringP2P
1752
+ revision: 282350215ef01743dc01b456c7f5241fa8937f16
1753
+ split: test
1754
+ type: mteb/reddit-clustering-p2p
1755
+ metrics:
1756
+ - type: v_measure
1757
+ value: 60.30778476474675
1758
+ task:
1759
+ type: Clustering
1760
+ - dataset:
1761
+ config: default
1762
+ name: MTEB SCIDOCS
1763
+ revision: None
1764
+ split: test
1765
+ type: scidocs
1766
+ metrics:
1767
+ - type: map_at_1
1768
+ value: 4.692
1769
+ - type: map_at_10
1770
+ value: 11.766
1771
+ - type: map_at_100
1772
+ value: 13.904
1773
+ - type: map_at_1000
1774
+ value: 14.216999999999999
1775
+ - type: map_at_3
1776
+ value: 8.245
1777
+ - type: map_at_5
1778
+ value: 9.92
1779
+ - type: mrr_at_1
1780
+ value: 23.0
1781
+ - type: mrr_at_10
1782
+ value: 33.78
1783
+ - type: mrr_at_100
1784
+ value: 34.922
1785
+ - type: mrr_at_1000
1786
+ value: 34.973
1787
+ - type: mrr_at_3
1788
+ value: 30.2
1789
+ - type: mrr_at_5
1790
+ value: 32.565
1791
+ - type: ndcg_at_1
1792
+ value: 23.0
1793
+ - type: ndcg_at_10
1794
+ value: 19.863
1795
+ - type: ndcg_at_100
1796
+ value: 28.141
1797
+ - type: ndcg_at_1000
1798
+ value: 33.549
1799
+ - type: ndcg_at_3
1800
+ value: 18.434
1801
+ - type: ndcg_at_5
1802
+ value: 16.384
1803
+ - type: precision_at_1
1804
+ value: 23.0
1805
+ - type: precision_at_10
1806
+ value: 10.39
1807
+ - type: precision_at_100
1808
+ value: 2.235
1809
+ - type: precision_at_1000
1810
+ value: 0.35300000000000004
1811
+ - type: precision_at_3
1812
+ value: 17.133000000000003
1813
+ - type: precision_at_5
1814
+ value: 14.44
1815
+ - type: recall_at_1
1816
+ value: 4.692
1817
+ - type: recall_at_10
1818
+ value: 21.025
1819
+ - type: recall_at_100
1820
+ value: 45.324999999999996
1821
+ - type: recall_at_1000
1822
+ value: 71.675
1823
+ - type: recall_at_3
1824
+ value: 10.440000000000001
1825
+ - type: recall_at_5
1826
+ value: 14.64
1827
+ task:
1828
+ type: Retrieval
1829
+ - dataset:
1830
+ config: default
1831
+ name: MTEB SICK-R
1832
+ revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
1833
+ split: test
1834
+ type: mteb/sickr-sts
1835
+ metrics:
1836
+ - type: cos_sim_pearson
1837
+ value: 84.96178184892842
1838
+ - type: cos_sim_spearman
1839
+ value: 79.6487740813199
1840
+ - type: euclidean_pearson
1841
+ value: 82.06661161625023
1842
+ - type: euclidean_spearman
1843
+ value: 79.64876769031183
1844
+ - type: manhattan_pearson
1845
+ value: 82.07061164575131
1846
+ - type: manhattan_spearman
1847
+ value: 79.65197039464537
1848
+ task:
1849
+ type: STS
1850
+ - dataset:
1851
+ config: default
1852
+ name: MTEB STS12
1853
+ revision: a0d554a64d88156834ff5ae9920b964011b16384
1854
+ split: test
1855
+ type: mteb/sts12-sts
1856
+ metrics:
1857
+ - type: cos_sim_pearson
1858
+ value: 84.15305604100027
1859
+ - type: cos_sim_spearman
1860
+ value: 74.27447427941591
1861
+ - type: euclidean_pearson
1862
+ value: 80.52737337565307
1863
+ - type: euclidean_spearman
1864
+ value: 74.27416077132192
1865
+ - type: manhattan_pearson
1866
+ value: 80.53728571140387
1867
+ - type: manhattan_spearman
1868
+ value: 74.28853605753457
1869
+ task:
1870
+ type: STS
1871
+ - dataset:
1872
+ config: default
1873
+ name: MTEB STS13
1874
+ revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
1875
+ split: test
1876
+ type: mteb/sts13-sts
1877
+ metrics:
1878
+ - type: cos_sim_pearson
1879
+ value: 83.44386080639279
1880
+ - type: cos_sim_spearman
1881
+ value: 84.17947648159536
1882
+ - type: euclidean_pearson
1883
+ value: 83.34145388129387
1884
+ - type: euclidean_spearman
1885
+ value: 84.17947648159536
1886
+ - type: manhattan_pearson
1887
+ value: 83.30699061927966
1888
+ - type: manhattan_spearman
1889
+ value: 84.18125737380451
1890
+ task:
1891
+ type: STS
1892
+ - dataset:
1893
+ config: default
1894
+ name: MTEB STS14
1895
+ revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
1896
+ split: test
1897
+ type: mteb/sts14-sts
1898
+ metrics:
1899
+ - type: cos_sim_pearson
1900
+ value: 81.57392220985612
1901
+ - type: cos_sim_spearman
1902
+ value: 78.80745014464101
1903
+ - type: euclidean_pearson
1904
+ value: 80.01660371487199
1905
+ - type: euclidean_spearman
1906
+ value: 78.80741240102256
1907
+ - type: manhattan_pearson
1908
+ value: 79.96810779507953
1909
+ - type: manhattan_spearman
1910
+ value: 78.75600400119448
1911
+ task:
1912
+ type: STS
1913
+ - dataset:
1914
+ config: default
1915
+ name: MTEB STS15
1916
+ revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
1917
+ split: test
1918
+ type: mteb/sts15-sts
1919
+ metrics:
1920
+ - type: cos_sim_pearson
1921
+ value: 86.85421063026625
1922
+ - type: cos_sim_spearman
1923
+ value: 87.55320285299192
1924
+ - type: euclidean_pearson
1925
+ value: 86.69750143323517
1926
+ - type: euclidean_spearman
1927
+ value: 87.55320284326378
1928
+ - type: manhattan_pearson
1929
+ value: 86.63379169960379
1930
+ - type: manhattan_spearman
1931
+ value: 87.4815029877984
1932
+ task:
1933
+ type: STS
1934
+ - dataset:
1935
+ config: default
1936
+ name: MTEB STS16
1937
+ revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
1938
+ split: test
1939
+ type: mteb/sts16-sts
1940
+ metrics:
1941
+ - type: cos_sim_pearson
1942
+ value: 84.31314130411842
1943
+ - type: cos_sim_spearman
1944
+ value: 85.3489588181433
1945
+ - type: euclidean_pearson
1946
+ value: 84.13240933463535
1947
+ - type: euclidean_spearman
1948
+ value: 85.34902871403281
1949
+ - type: manhattan_pearson
1950
+ value: 84.01183086503559
1951
+ - type: manhattan_spearman
1952
+ value: 85.19316703166102
1953
+ task:
1954
+ type: STS
1955
+ - dataset:
1956
+ config: en-en
1957
+ name: MTEB STS17 (en-en)
1958
+ revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
1959
+ split: test
1960
+ type: mteb/sts17-crosslingual-sts
1961
+ metrics:
1962
+ - type: cos_sim_pearson
1963
+ value: 89.09979781689536
1964
+ - type: cos_sim_spearman
1965
+ value: 88.87813323759015
1966
+ - type: euclidean_pearson
1967
+ value: 88.65413031123792
1968
+ - type: euclidean_spearman
1969
+ value: 88.87813323759015
1970
+ - type: manhattan_pearson
1971
+ value: 88.61818758256024
1972
+ - type: manhattan_spearman
1973
+ value: 88.81044100494604
1974
+ task:
1975
+ type: STS
1976
+ - dataset:
1977
+ config: en
1978
+ name: MTEB STS22 (en)
1979
+ revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
1980
+ split: test
1981
+ type: mteb/sts22-crosslingual-sts
1982
+ metrics:
1983
+ - type: cos_sim_pearson
1984
+ value: 62.30693258111531
1985
+ - type: cos_sim_spearman
1986
+ value: 62.195516523251946
1987
+ - type: euclidean_pearson
1988
+ value: 62.951283701049476
1989
+ - type: euclidean_spearman
1990
+ value: 62.195516523251946
1991
+ - type: manhattan_pearson
1992
+ value: 63.068322281439535
1993
+ - type: manhattan_spearman
1994
+ value: 62.10621171028406
1995
+ task:
1996
+ type: STS
1997
+ - dataset:
1998
+ config: default
1999
+ name: MTEB STSBenchmark
2000
+ revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
2001
+ split: test
2002
+ type: mteb/stsbenchmark-sts
2003
+ metrics:
2004
+ - type: cos_sim_pearson
2005
+ value: 84.27092833763909
2006
+ - type: cos_sim_spearman
2007
+ value: 84.84429717949759
2008
+ - type: euclidean_pearson
2009
+ value: 84.8516966060792
2010
+ - type: euclidean_spearman
2011
+ value: 84.84429717949759
2012
+ - type: manhattan_pearson
2013
+ value: 84.82203139242881
2014
+ - type: manhattan_spearman
2015
+ value: 84.8358503952945
2016
+ task:
2017
+ type: STS
2018
+ - dataset:
2019
+ config: default
2020
+ name: MTEB SciDocsRR
2021
+ revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
2022
+ split: test
2023
+ type: mteb/scidocs-reranking
2024
+ metrics:
2025
+ - type: map
2026
+ value: 83.10290863981409
2027
+ - type: mrr
2028
+ value: 95.31168450286097
2029
+ task:
2030
+ type: Reranking
2031
+ - dataset:
2032
+ config: default
2033
+ name: MTEB SciFact
2034
+ revision: None
2035
+ split: test
2036
+ type: scifact
2037
+ metrics:
2038
+ - type: map_at_1
2039
+ value: 52.161
2040
+ - type: map_at_10
2041
+ value: 62.138000000000005
2042
+ - type: map_at_100
2043
+ value: 62.769
2044
+ - type: map_at_1000
2045
+ value: 62.812
2046
+ - type: map_at_3
2047
+ value: 59.111000000000004
2048
+ - type: map_at_5
2049
+ value: 60.995999999999995
2050
+ - type: mrr_at_1
2051
+ value: 55.333
2052
+ - type: mrr_at_10
2053
+ value: 63.504000000000005
2054
+ - type: mrr_at_100
2055
+ value: 64.036
2056
+ - type: mrr_at_1000
2057
+ value: 64.08
2058
+ - type: mrr_at_3
2059
+ value: 61.278
2060
+ - type: mrr_at_5
2061
+ value: 62.778
2062
+ - type: ndcg_at_1
2063
+ value: 55.333
2064
+ - type: ndcg_at_10
2065
+ value: 66.678
2066
+ - type: ndcg_at_100
2067
+ value: 69.415
2068
+ - type: ndcg_at_1000
2069
+ value: 70.453
2070
+ - type: ndcg_at_3
2071
+ value: 61.755
2072
+ - type: ndcg_at_5
2073
+ value: 64.546
2074
+ - type: precision_at_1
2075
+ value: 55.333
2076
+ - type: precision_at_10
2077
+ value: 9.033
2078
+ - type: precision_at_100
2079
+ value: 1.043
2080
+ - type: precision_at_1000
2081
+ value: 0.11199999999999999
2082
+ - type: precision_at_3
2083
+ value: 24.221999999999998
2084
+ - type: precision_at_5
2085
+ value: 16.333000000000002
2086
+ - type: recall_at_1
2087
+ value: 52.161
2088
+ - type: recall_at_10
2089
+ value: 79.156
2090
+ - type: recall_at_100
2091
+ value: 91.333
2092
+ - type: recall_at_1000
2093
+ value: 99.333
2094
+ - type: recall_at_3
2095
+ value: 66.43299999999999
2096
+ - type: recall_at_5
2097
+ value: 73.272
2098
+ task:
2099
+ type: Retrieval
2100
+ - dataset:
2101
+ config: default
2102
+ name: MTEB SprintDuplicateQuestions
2103
+ revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
2104
+ split: test
2105
+ type: mteb/sprintduplicatequestions-pairclassification
2106
+ metrics:
2107
+ - type: cos_sim_accuracy
2108
+ value: 99.81287128712871
2109
+ - type: cos_sim_ap
2110
+ value: 95.30034785910676
2111
+ - type: cos_sim_f1
2112
+ value: 90.28629856850716
2113
+ - type: cos_sim_precision
2114
+ value: 92.36401673640168
2115
+ - type: cos_sim_recall
2116
+ value: 88.3
2117
+ - type: dot_accuracy
2118
+ value: 99.81287128712871
2119
+ - type: dot_ap
2120
+ value: 95.30034785910676
2121
+ - type: dot_f1
2122
+ value: 90.28629856850716
2123
+ - type: dot_precision
2124
+ value: 92.36401673640168
2125
+ - type: dot_recall
2126
+ value: 88.3
2127
+ - type: euclidean_accuracy
2128
+ value: 99.81287128712871
2129
+ - type: euclidean_ap
2130
+ value: 95.30034785910676
2131
+ - type: euclidean_f1
2132
+ value: 90.28629856850716
2133
+ - type: euclidean_precision
2134
+ value: 92.36401673640168
2135
+ - type: euclidean_recall
2136
+ value: 88.3
2137
+ - type: manhattan_accuracy
2138
+ value: 99.80990099009901
2139
+ - type: manhattan_ap
2140
+ value: 95.26880751950654
2141
+ - type: manhattan_f1
2142
+ value: 90.22177419354838
2143
+ - type: manhattan_precision
2144
+ value: 90.95528455284553
2145
+ - type: manhattan_recall
2146
+ value: 89.5
2147
+ - type: max_accuracy
2148
+ value: 99.81287128712871
2149
+ - type: max_ap
2150
+ value: 95.30034785910676
2151
+ - type: max_f1
2152
+ value: 90.28629856850716
2153
+ task:
2154
+ type: PairClassification
2155
+ - dataset:
2156
+ config: default
2157
+ name: MTEB StackExchangeClustering
2158
+ revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
2159
+ split: test
2160
+ type: mteb/stackexchange-clustering
2161
+ metrics:
2162
+ - type: v_measure
2163
+ value: 58.518662504351184
2164
+ task:
2165
+ type: Clustering
2166
+ - dataset:
2167
+ config: default
2168
+ name: MTEB StackExchangeClusteringP2P
2169
+ revision: 815ca46b2622cec33ccafc3735d572c266efdb44
2170
+ split: test
2171
+ type: mteb/stackexchange-clustering-p2p
2172
+ metrics:
2173
+ - type: v_measure
2174
+ value: 34.96168178378587
2175
+ task:
2176
+ type: Clustering
2177
+ - dataset:
2178
+ config: default
2179
+ name: MTEB StackOverflowDupQuestions
2180
+ revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
2181
+ split: test
2182
+ type: mteb/stackoverflowdupquestions-reranking
2183
+ metrics:
2184
+ - type: map
2185
+ value: 52.04862593471896
2186
+ - type: mrr
2187
+ value: 52.97238402936932
2188
+ task:
2189
+ type: Reranking
2190
+ - dataset:
2191
+ config: default
2192
+ name: MTEB SummEval
2193
+ revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
2194
+ split: test
2195
+ type: mteb/summeval
2196
+ metrics:
2197
+ - type: cos_sim_pearson
2198
+ value: 30.092545236479946
2199
+ - type: cos_sim_spearman
2200
+ value: 31.599851000175498
2201
+ - type: dot_pearson
2202
+ value: 30.092542723901676
2203
+ - type: dot_spearman
2204
+ value: 31.599851000175498
2205
+ task:
2206
+ type: Summarization
2207
+ - dataset:
2208
+ config: default
2209
+ name: MTEB TRECCOVID
2210
+ revision: None
2211
+ split: test
2212
+ type: trec-covid
2213
+ metrics:
2214
+ - type: map_at_1
2215
+ value: 0.189
2216
+ - type: map_at_10
2217
+ value: 1.662
2218
+ - type: map_at_100
2219
+ value: 9.384
2220
+ - type: map_at_1000
2221
+ value: 22.669
2222
+ - type: map_at_3
2223
+ value: 0.5559999999999999
2224
+ - type: map_at_5
2225
+ value: 0.9039999999999999
2226
+ - type: mrr_at_1
2227
+ value: 68.0
2228
+ - type: mrr_at_10
2229
+ value: 81.01899999999999
2230
+ - type: mrr_at_100
2231
+ value: 81.01899999999999
2232
+ - type: mrr_at_1000
2233
+ value: 81.01899999999999
2234
+ - type: mrr_at_3
2235
+ value: 79.333
2236
+ - type: mrr_at_5
2237
+ value: 80.733
2238
+ - type: ndcg_at_1
2239
+ value: 63.0
2240
+ - type: ndcg_at_10
2241
+ value: 65.913
2242
+ - type: ndcg_at_100
2243
+ value: 51.895
2244
+ - type: ndcg_at_1000
2245
+ value: 46.967
2246
+ - type: ndcg_at_3
2247
+ value: 65.49199999999999
2248
+ - type: ndcg_at_5
2249
+ value: 66.69699999999999
2250
+ - type: precision_at_1
2251
+ value: 68.0
2252
+ - type: precision_at_10
2253
+ value: 71.6
2254
+ - type: precision_at_100
2255
+ value: 53.66
2256
+ - type: precision_at_1000
2257
+ value: 21.124000000000002
2258
+ - type: precision_at_3
2259
+ value: 72.667
2260
+ - type: precision_at_5
2261
+ value: 74.0
2262
+ - type: recall_at_1
2263
+ value: 0.189
2264
+ - type: recall_at_10
2265
+ value: 1.913
2266
+ - type: recall_at_100
2267
+ value: 12.601999999999999
2268
+ - type: recall_at_1000
2269
+ value: 44.296
2270
+ - type: recall_at_3
2271
+ value: 0.605
2272
+ - type: recall_at_5
2273
+ value: 1.018
2274
+ task:
2275
+ type: Retrieval
2276
+ - dataset:
2277
+ config: default
2278
+ name: MTEB Touche2020
2279
+ revision: None
2280
+ split: test
2281
+ type: webis-touche2020
2282
+ metrics:
2283
+ - type: map_at_1
2284
+ value: 2.701
2285
+ - type: map_at_10
2286
+ value: 10.445
2287
+ - type: map_at_100
2288
+ value: 17.324
2289
+ - type: map_at_1000
2290
+ value: 19.161
2291
+ - type: map_at_3
2292
+ value: 5.497
2293
+ - type: map_at_5
2294
+ value: 7.278
2295
+ - type: mrr_at_1
2296
+ value: 30.612000000000002
2297
+ - type: mrr_at_10
2298
+ value: 45.534
2299
+ - type: mrr_at_100
2300
+ value: 45.792
2301
+ - type: mrr_at_1000
2302
+ value: 45.806999999999995
2303
+ - type: mrr_at_3
2304
+ value: 37.755
2305
+ - type: mrr_at_5
2306
+ value: 43.469
2307
+ - type: ndcg_at_1
2308
+ value: 26.531
2309
+ - type: ndcg_at_10
2310
+ value: 26.235000000000003
2311
+ - type: ndcg_at_100
2312
+ value: 39.17
2313
+ - type: ndcg_at_1000
2314
+ value: 51.038
2315
+ - type: ndcg_at_3
2316
+ value: 23.625
2317
+ - type: ndcg_at_5
2318
+ value: 24.338
2319
+ - type: precision_at_1
2320
+ value: 30.612000000000002
2321
+ - type: precision_at_10
2322
+ value: 24.285999999999998
2323
+ - type: precision_at_100
2324
+ value: 8.224
2325
+ - type: precision_at_1000
2326
+ value: 1.6179999999999999
2327
+ - type: precision_at_3
2328
+ value: 24.490000000000002
2329
+ - type: precision_at_5
2330
+ value: 24.898
2331
+ - type: recall_at_1
2332
+ value: 2.701
2333
+ - type: recall_at_10
2334
+ value: 17.997
2335
+ - type: recall_at_100
2336
+ value: 51.766999999999996
2337
+ - type: recall_at_1000
2338
+ value: 87.863
2339
+ - type: recall_at_3
2340
+ value: 6.295000000000001
2341
+ - type: recall_at_5
2342
+ value: 9.993
2343
+ task:
2344
+ type: Retrieval
2345
+ - dataset:
2346
+ config: default
2347
+ name: MTEB ToxicConversationsClassification
2348
+ revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
2349
+ split: test
2350
+ type: mteb/toxic_conversations_50k
2351
+ metrics:
2352
+ - type: accuracy
2353
+ value: 73.3474
2354
+ - type: ap
2355
+ value: 15.393431414459924
2356
+ - type: f1
2357
+ value: 56.466681887882416
2358
+ task:
2359
+ type: Classification
2360
+ - dataset:
2361
+ config: default
2362
+ name: MTEB TweetSentimentExtractionClassification
2363
+ revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
2364
+ split: test
2365
+ type: mteb/tweet_sentiment_extraction
2366
+ metrics:
2367
+ - type: accuracy
2368
+ value: 62.062818336163
2369
+ - type: f1
2370
+ value: 62.11230840463252
2371
+ task:
2372
+ type: Classification
2373
+ - dataset:
2374
+ config: default
2375
+ name: MTEB TwentyNewsgroupsClustering
2376
+ revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
2377
+ split: test
2378
+ type: mteb/twentynewsgroups-clustering
2379
+ metrics:
2380
+ - type: v_measure
2381
+ value: 42.464892820845115
2382
+ task:
2383
+ type: Clustering
2384
+ - dataset:
2385
+ config: default
2386
+ name: MTEB TwitterSemEval2015
2387
+ revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
2388
+ split: test
2389
+ type: mteb/twittersemeval2015-pairclassification
2390
+ metrics:
2391
+ - type: cos_sim_accuracy
2392
+ value: 86.15962329379508
2393
+ - type: cos_sim_ap
2394
+ value: 74.73674057919256
2395
+ - type: cos_sim_f1
2396
+ value: 68.81245642574947
2397
+ - type: cos_sim_precision
2398
+ value: 61.48255813953488
2399
+ - type: cos_sim_recall
2400
+ value: 78.12664907651715
2401
+ - type: dot_accuracy
2402
+ value: 86.15962329379508
2403
+ - type: dot_ap
2404
+ value: 74.7367634988281
2405
+ - type: dot_f1
2406
+ value: 68.81245642574947
2407
+ - type: dot_precision
2408
+ value: 61.48255813953488
2409
+ - type: dot_recall
2410
+ value: 78.12664907651715
2411
+ - type: euclidean_accuracy
2412
+ value: 86.15962329379508
2413
+ - type: euclidean_ap
2414
+ value: 74.7367761466634
2415
+ - type: euclidean_f1
2416
+ value: 68.81245642574947
2417
+ - type: euclidean_precision
2418
+ value: 61.48255813953488
2419
+ - type: euclidean_recall
2420
+ value: 78.12664907651715
2421
+ - type: manhattan_accuracy
2422
+ value: 86.21326816474935
2423
+ - type: manhattan_ap
2424
+ value: 74.64416473733951
2425
+ - type: manhattan_f1
2426
+ value: 68.80924855491331
2427
+ - type: manhattan_precision
2428
+ value: 61.23456790123457
2429
+ - type: manhattan_recall
2430
+ value: 78.52242744063325
2431
+ - type: max_accuracy
2432
+ value: 86.21326816474935
2433
+ - type: max_ap
2434
+ value: 74.7367761466634
2435
+ - type: max_f1
2436
+ value: 68.81245642574947
2437
+ task:
2438
+ type: PairClassification
2439
+ - dataset:
2440
+ config: default
2441
+ name: MTEB TwitterURLCorpus
2442
+ revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
2443
+ split: test
2444
+ type: mteb/twitterurlcorpus-pairclassification
2445
+ metrics:
2446
+ - type: cos_sim_accuracy
2447
+ value: 88.97620988085536
2448
+ - type: cos_sim_ap
2449
+ value: 86.08680845745758
2450
+ - type: cos_sim_f1
2451
+ value: 78.02793637114438
2452
+ - type: cos_sim_precision
2453
+ value: 73.11082699683736
2454
+ - type: cos_sim_recall
2455
+ value: 83.65414228518632
2456
+ - type: dot_accuracy
2457
+ value: 88.97620988085536
2458
+ - type: dot_ap
2459
+ value: 86.08681149437946
2460
+ - type: dot_f1
2461
+ value: 78.02793637114438
2462
+ - type: dot_precision
2463
+ value: 73.11082699683736
2464
+ - type: dot_recall
2465
+ value: 83.65414228518632
2466
+ - type: euclidean_accuracy
2467
+ value: 88.97620988085536
2468
+ - type: euclidean_ap
2469
+ value: 86.08681215460771
2470
+ - type: euclidean_f1
2471
+ value: 78.02793637114438
2472
+ - type: euclidean_precision
2473
+ value: 73.11082699683736
2474
+ - type: euclidean_recall
2475
+ value: 83.65414228518632
2476
+ - type: manhattan_accuracy
2477
+ value: 88.88888888888889
2478
+ - type: manhattan_ap
2479
+ value: 86.02916327562438
2480
+ - type: manhattan_f1
2481
+ value: 78.02063045516843
2482
+ - type: manhattan_precision
2483
+ value: 73.38851947346994
2484
+ - type: manhattan_recall
2485
+ value: 83.2768709578072
2486
+ - type: max_accuracy
2487
+ value: 88.97620988085536
2488
+ - type: max_ap
2489
+ value: 86.08681215460771
2490
+ - type: max_f1
2491
+ value: 78.02793637114438
2492
+ task:
2493
+ type: PairClassification
2494
+ tags:
2495
+ - sentence-transformers
2496
+ - feature-extraction
2497
+ - sentence-similarity
2498
+ - mteb
2499
+ - onnx
2500
+ - teradata
2501
+
2502
+ ---
2503
+ # A Teradata Vantage compatible Embeddings Model
2504
+
2505
+ # jinaai/jina-embeddings-v2-base-en
2506
+
2507
+ ## Overview of this Model
2508
+
2509
+ An embedding model which maps text (sentences/paragraphs) into a vector. The [jinaai/jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en) model is well known for its effectiveness in capturing semantic meaning in text data. It is a state-of-the-art model trained on a large corpus, capable of generating high-quality text embeddings.
2510
+
2511
+ - 137.37M params (Sizes in ONNX format - "fp32": 522.03MB, "int8": 131.14MB, "uint8": 131.14MB)
2512
+ - 8192 maximum input tokens
2513
+ - 768 dimensions of output vector
2514
+ - License: apache-2.0. The released models can be used for commercial purposes free of charge.
2515
+ - Reference to Original Model: https://huggingface.co/jinaai/jina-embeddings-v2-base-en
2516
+
2517
+
2518
+ ## Quickstart: Deploying this Model in Teradata Vantage
2519
+
2520
+ We have pre-converted the model into the ONNX format compatible with BYOM 6.0, eliminating the need for manual conversion.
2521
+
2522
+ **Note:** Ensure you have access to a Teradata Database with BYOM 6.0 installed.
2523
+
2524
+ To get started, clone the pre-converted model directly from the Teradata HuggingFace repository.
2525
+
2526
+
2527
+ ```python
2528
+
2529
+ import teradataml as tdml
2530
+ import getpass
2531
+ from huggingface_hub import hf_hub_download
2532
+
2533
+ model_name = "jina-embeddings-v2-base-en"
2534
+ number_dimensions_output = 768
2535
+ model_file_name = "model.onnx"
2536
+
2537
+ # Step 1: Download Model from Teradata HuggingFace Page
2538
+
2539
+ hf_hub_download(repo_id=f"Teradata/{model_name}", filename=f"onnx/{model_file_name}", local_dir="./")
2540
+ hf_hub_download(repo_id=f"Teradata/{model_name}", filename=f"tokenizer.json", local_dir="./")
2541
+
2542
+ # Step 2: Create Connection to Vantage
2543
+
2544
+ tdml.create_context(host = input('enter your hostname'),
2545
+ username=input('enter your username'),
2546
+ password = getpass.getpass("enter your password"))
2547
+
2548
+ # Step 3: Load Models into Vantage
2549
+ # a) Embedding model
2550
+ tdml.save_byom(model_id = model_name, # must be unique in the models table
2551
+ model_file = model_file_name,
2552
+ table_name = 'embeddings_models' )
2553
+ # b) Tokenizer
2554
+ tdml.save_byom(model_id = model_name, # must be unique in the models table
2555
+ model_file = 'tokenizer.json',
2556
+ table_name = 'embeddings_tokenizers')
2557
+
2558
+ # Step 4: Test ONNXEmbeddings Function
2559
+ # Note that ONNXEmbeddings expects the text payload column to be named 'txt'.
2561
+ # If your column has a different name, rename it in a subquery/CTE.
2561
+ input_table = "emails.emails"
2562
+ embeddings_query = f"""
2563
+ SELECT
2564
+ *
2565
+ from mldb.ONNXEmbeddings(
2566
+ on {input_table} as InputTable
2567
+ on (select * from embeddings_models where model_id = '{model_name}') as ModelTable DIMENSION
2568
+ on (select model as tokenizer from embeddings_tokenizers where model_id = '{model_name}') as TokenizerTable DIMENSION
2569
+ using
2570
+ Accumulate('id', 'txt')
2571
+ ModelOutputTensor('sentence_embedding')
2572
+ EnableMemoryCheck('false')
2573
+ OutputFormat('FLOAT32({number_dimensions_output})')
2574
+ OverwriteCachedModel('true')
2575
+ ) a
2576
+ """
2577
+ DF_embeddings = tdml.DataFrame.from_query(embeddings_query)
2578
+ DF_embeddings
2579
+ ```
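+
+ The repository also ships quantized variants of the model (`onnx/model_int8.onnx`, `onnx/model_uint8.onnx`; see [conversion_config.json](./conversion_config.json)). They are roughly 4x smaller than the fp32 file and typically cost only a small amount of accuracy. To try one, download it in Step 1 instead of the fp32 file (reusing `model_name` and `hf_hub_download` from the snippet above):
+
+ ```python
+ # Optional: fetch the int8-quantized variant instead of onnx/model.onnx
+ hf_hub_download(repo_id=f"Teradata/{model_name}", filename="onnx/model_int8.onnx", local_dir="./")
+ ```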
2580
+
2581
+
2582
+
2583
+ ## What Can I Do with the Embeddings?
2584
+
2585
+ Teradata Vantage includes pre-built in-database functions to process embeddings further. Explore the following examples, and see the minimal query sketch after this list:
2586
+
2587
+ - **Semantic Clustering with TD_KMeans:** [Semantic Clustering Python Notebook](https://github.com/Teradata/jupyter-demos/blob/main/UseCases/Language_Models_InVantage/Semantic_Clustering_Python.ipynb)
2588
+ - **Semantic Distance with TD_VectorDistance:** [Semantic Similarity Python Notebook](https://github.com/Teradata/jupyter-demos/blob/main/UseCases/Language_Models_InVantage/Semantic_Similarity_Python.ipynb)
2589
+ - **RAG-Based Application with TD_VectorDistance:** [RAG and Bedrock Query PDF Notebook](https://github.com/Teradata/jupyter-demos/blob/main/UseCases/Language_Models_InVantage/RAG_and_Bedrock_QueryPDF.ipynb)
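+
+ As a quick illustration of the `TD_VectorDistance` pattern used in those notebooks, the sketch below scores reference rows against a single target row. It assumes (hypothetically) that the `DF_embeddings` result from the Quickstart has been stored in a table named `emails_embeddings_store` with an integer `id` column and embedding columns `emb_0` through `emb_767`; see [test_teradata.py](./test_teradata.py) for the full, tested flow.
+
+ ```python
+ # Sketch: cosine similarity between one target email and all others.
+ # 'emails_embeddings_store' is a hypothetical table holding the ONNXEmbeddings output.
+ similarity_query = f"""
+ SELECT dt.target_id, dt.reference_id, (1.0 - dt.distance) AS similarity
+ FROM TD_VECTORDISTANCE(
+     ON (SELECT * FROM emails_embeddings_store WHERE id = 3) AS TargetTable
+     ON (SELECT * FROM emails_embeddings_store WHERE id <> 3) AS ReferenceTable DIMENSION
+     USING
+     TargetIDColumn('id')
+     TargetFeatureColumns('[emb_0:emb_{number_dimensions_output - 1}]')
+     RefIDColumn('id')
+     RefFeatureColumns('[emb_0:emb_{number_dimensions_output - 1}]')
+     DistanceMeasure('cosine')
+     topk(3)
+ ) AS dt
+ """
+ tdml.DataFrame.from_query(similarity_query)
+ ```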
2590
+
2591
+
2592
+ ## Deep Dive into Model Conversion to ONNX
2593
+
2594
+ **The steps below outline how we converted the open-source Hugging Face model into an ONNX file compatible with the in-database ONNXEmbeddings function.**
2595
+
2596
+ You do not need to perform these steps—they are provided solely for documentation and transparency. However, they may be helpful if you wish to convert another model to the required format.
2597
+
2598
+
2599
+ ### Part 1. Importing and Converting Model using optimum
2600
+
2601
+ We start by importing the pre-trained [jinaai/jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en) model from Hugging Face.
2602
+
2603
+ We download the ONNX files from the repository prepared by the model authors.
2604
+
2605
+ After downloading, we fix the opset version in the ONNX file for compatibility with the ONNX runtime used in Teradata Vantage.
2606
+
2607
+ We also add the mean pooling and normalization layers to the ONNX file.
2608
+
2609
+ Finally, we generate ONNX files in multiple precisions: fp32, int8, and uint8.
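+
+ In condensed form, the opset/IR fix and the quantization step look roughly like this (a sketch distilled from [convert.py](./convert.py); the mean-pooling and normalization graph surgery is omitted here):
+
+ ```python
+ import onnx
+ from onnxruntime.quantization import quantize_dynamic, QuantType
+
+ # Opset and IR versions come from conversion_config.json (16 and 8 for this model)
+ op = onnx.OperatorSetIdProto()
+ op.version = 16
+
+ # Rebuild the exported model with a pinned opset and IR version
+ # so the in-database ONNX runtime accepts it
+ model = onnx.load("model.onnx")
+ model_fixed = onnx.helper.make_model(model.graph, ir_version=8, opset_imports=[op])
+ onnx.save(model_fixed, "onnx/model.onnx")
+
+ # Dynamic quantization produces the smaller int8/uint8 variants
+ quantize_dynamic("model.onnx", "onnx/model_int8.onnx", weight_type=QuantType.QInt8)
+ quantize_dynamic("model.onnx", "onnx/model_uint8.onnx", weight_type=QuantType.QUInt8)
+ ```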
2610
+
2611
+ You can find the detailed conversion steps in the file [convert.py](./convert.py)
2612
+
2613
+ ### Part 2. Running the model in Python with onnxruntime & compare results
2614
+
2615
+ Once the fixes are applied, we test the correctness of the ONNX model by computing the cosine similarity between two texts with both the native SentenceTransformers model and the ONNX runtime, and comparing the results.
2616
+
2617
+ If the results match, the ONNX model produces the same embeddings as the native model, validating its correctness and suitability for further use in the database.
2618
+
2619
+
2620
+ ```python
2621
+ import onnxruntime as rt
2622
+
2623
+ from sentence_transformers.util import cos_sim
2624
+ from sentence_transformers import SentenceTransformer
2625
+
2626
+ import transformers
2627
+
2628
+
2629
+ sentences_1 = 'How is the weather today?'
2630
+ sentences_2 = 'What is the current weather like today?'
2631
+
2632
+ # Calculate ONNX result
2633
+ model_id = "jinaai/jina-embeddings-v2-base-en"
+ tokenizer = transformers.AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
2634
+ predef_sess = rt.InferenceSession("onnx/model.onnx")
2635
+
2636
+ enc1 = tokenizer(sentences_1)
2637
+ embeddings_1_onnx = predef_sess.run(None, {"input_ids": [enc1.input_ids],
2638
+ "attention_mask": [enc1.attention_mask]})
2639
+
2640
+ enc2 = tokenizer(sentences_2)
2641
+ embeddings_2_onnx = predef_sess.run(None, {"input_ids": [enc2.input_ids],
2642
+ "attention_mask": [enc2.attention_mask]})
2643
+
2644
+
2645
+ # Calculate embeddings with SentenceTransformer
2646
+ model = SentenceTransformer(model_id, trust_remote_code=True)
2647
+ embeddings_1_sentence_transformer = model.encode(sentences_1, normalize_embeddings=True)
2648
+ embeddings_2_sentence_transformer = model.encode(sentences_2, normalize_embeddings=True)
2649
+
2650
+ # Compare results
2651
+ print("Cosine similiarity for embeddings calculated with ONNX:" + str(cos_sim(embeddings_1_onnx[1][0], embeddings_2_onnx[1][0])))
2652
+ print("Cosine similiarity for embeddings calculated with SentenceTransformer:" + str(cos_sim(embeddings_1_sentence_transformer, embeddings_2_sentence_transformer)))
2653
+ ```
2654
+
2655
+ You can find the detailed ONNX vs. SentenceTransformer result comparison steps in the file [test_local.py](./test_local.py)
2656
+
config.json ADDED
@@ -0,0 +1,38 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_attn_implementation_autoset": true,
3
+ "_name_or_path": "jinaai/jina-embeddings-v2-base-en",
4
+ "architectures": [
5
+ "JinaBertForMaskedLM"
6
+ ],
7
+ "attention_probs_dropout_prob": 0.0,
8
+ "attn_implementation": null,
9
+ "auto_map": {
10
+ "AutoConfig": "jinaai/jina-bert-implementation--configuration_bert.JinaBertConfig",
11
+ "AutoModel": "jinaai/jina-bert-implementation--modeling_bert.JinaBertModel",
12
+ "AutoModelForMaskedLM": "jinaai/jina-bert-implementation--modeling_bert.JinaBertForMaskedLM",
13
+ "AutoModelForSequenceClassification": "jinaai/jina-bert-implementation--modeling_bert.JinaBertForSequenceClassification"
14
+ },
15
+ "classifier_dropout": null,
16
+ "emb_pooler": "mean",
17
+ "export_model_type": "transformer",
18
+ "feed_forward_type": "geglu",
19
+ "gradient_checkpointing": false,
20
+ "hidden_act": "gelu",
21
+ "hidden_dropout_prob": 0.1,
22
+ "hidden_size": 768,
23
+ "initializer_range": 0.02,
24
+ "intermediate_size": 3072,
25
+ "layer_norm_eps": 1e-12,
26
+ "max_position_embeddings": 8192,
27
+ "model_max_length": 8192,
28
+ "model_type": "bert",
29
+ "num_attention_heads": 12,
30
+ "num_hidden_layers": 12,
31
+ "pad_token_id": 0,
32
+ "position_embedding_type": "alibi",
33
+ "torch_dtype": "float32",
34
+ "transformers_version": "4.47.1",
35
+ "type_vocab_size": 2,
36
+ "use_cache": true,
37
+ "vocab_size": 30528
38
+ }
conversion_config.json ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "model_id": "jinaai/jina-embeddings-v2-base-en",
3
+ "number_of_generated_embeddings": 768,
4
+ "precision_to_filename_map": {
5
+ "fp32": "onnx/model.onnx",
6
+ "int8": "onnx/model_int8.onnx",
7
+ "uint8": "onnx/model_uint8.onnx"
8
+
9
+ },
10
+ "opset": 16,
11
+ "IR": 8
12
+ }
convert.py ADDED
@@ -0,0 +1,198 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import json
3
+ import shutil
4
+
5
+ from optimum.exporters.onnx import main_export
6
+ import onnx
7
+ from onnxconverter_common import float16
8
+ import onnxruntime as rt
9
+ from onnxruntime.tools.onnx_model_utils import *
10
+ from onnxruntime.quantization import quantize_dynamic, QuantType
11
+ import huggingface_hub
12
+
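+ # Appends a mean-pooling + L2-normalization head to the exported encoder graph:
+ # sentence_embedding = L2_normalize(sum(last_hidden_state * attention_mask) / clip(sum(attention_mask), min=1e-9))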
13
+ def add_mean_pooling(input_model, output_model, op, IR, output_embeddings_number):
14
+ model = onnx.load(input_model)
15
+ model_ir8 = onnx.helper.make_model(model.graph, ir_version = IR, opset_imports = [op]) #to be sure that we have compatible opset and IR version
16
+
17
+ minus_one_axis = onnx.helper.make_tensor(
18
+ name = "minus_one_axis",
19
+ data_type = onnx.TensorProto.INT64,
20
+ dims = [1],
21
+ vals = [-1])
22
+
23
+ model_ir8.graph.initializer.append(minus_one_axis)
24
+
25
+ mask_clip_lower_limit = onnx.helper.make_tensor(
26
+ name = "mask_clip_lower_limit",
27
+ data_type = onnx.TensorProto.FLOAT,
28
+ dims = [1],
29
+ vals = [1e-9])
30
+
31
+ model_ir8.graph.initializer.append(mask_clip_lower_limit)
32
+
33
+ sum_one_axis = onnx.helper.make_tensor(
34
+ name = "sum_one_axis",
35
+ data_type = onnx.TensorProto.INT64,
36
+ dims = [1],
37
+ vals = [1])
38
+
39
+ model_ir8.graph.initializer.append(sum_one_axis)
40
+
41
+ attention_mask_cast_op = onnx.helper.make_node(
42
+ "Cast",
43
+ inputs=["attention_mask"],
44
+ outputs=["attention_mask_fp32"],
45
+ to=onnx.TensorProto.FLOAT
46
+ )
47
+
48
+ model_ir8.graph.node.append(attention_mask_cast_op)
49
+
50
+ expand_dims_op = onnx.helper.make_node(
51
+ "Unsqueeze",
52
+ inputs=["attention_mask_fp32", "minus_one_axis"],
53
+ outputs=["unsqueezed_attention_mask"],
54
+ )
55
+
56
+ model_ir8.graph.node.append(expand_dims_op)
57
+
58
+ shape_op = onnx.helper.make_node(
59
+ "Shape",
60
+ inputs = ["last_hidden_state"],
61
+ outputs = ["last_hidden_state_shape"]
62
+ )
63
+
64
+ model_ir8.graph.node.append(shape_op)
65
+
66
+ broadcast_to_op = onnx.helper.make_node(
67
+ "Expand",
68
+ inputs=["unsqueezed_attention_mask", "last_hidden_state_shape"],
69
+ outputs=["expanded_attention_mask"],
70
+ )
71
+
72
+ model_ir8.graph.node.append(broadcast_to_op)
73
+
74
+ multiply_op = onnx.helper.make_node(
75
+ "Mul",
76
+ inputs=["last_hidden_state", "expanded_attention_mask"],
77
+ outputs=["last_hidden_state_x_expanded_attention_mask"],
78
+ )
79
+
80
+ model_ir8.graph.node.append(multiply_op)
81
+
82
+ sum_embeddings_op = onnx.helper.make_node(
83
+ "ReduceSum",
84
+ inputs=["last_hidden_state_x_expanded_attention_mask", "sum_one_axis"],
85
+ outputs=["sum_last_hidden_state_x_expanded_attention_mask"],
86
+ )
87
+
88
+ model_ir8.graph.node.append(sum_embeddings_op)
89
+
90
+ sum_mask_op = onnx.helper.make_node(
91
+ "ReduceSum",
92
+ inputs=["expanded_attention_mask", "sum_one_axis"],
93
+ outputs=["sum_expanded_attention_mask"],
94
+ )
95
+
96
+ model_ir8.graph.node.append(sum_mask_op)
97
+
98
+ clip_mask_op = onnx.helper.make_node(
99
+ "Clip",
100
+ inputs=["sum_expanded_attention_mask", "mask_clip_lower_limit"],
101
+ outputs=["clipped_sum_expanded_attention_mask"],
102
+ )
103
+
104
+ model_ir8.graph.node.append(clip_mask_op)
105
+
106
+ pooled_embeddings_op = onnx.helper.make_node(
107
+ "Div",
108
+ inputs=["sum_last_hidden_state_x_expanded_attention_mask", "clipped_sum_expanded_attention_mask"],
109
+ outputs=["pooled_embeddings"],
110
+ # outputs=["sentence_embeddings"]
111
+ )
112
+
113
+ model_ir8.graph.node.append(pooled_embeddings_op)
114
+
115
+ squeeze_pooled_embeddings_op = onnx.helper.make_node(
116
+ "Squeeze",
117
+ inputs=["pooled_embeddings", "sum_one_axis"],
118
+ outputs=["squeezed_pooled_embeddings"]
119
+
120
+ )
121
+
122
+ model_ir8.graph.node.append(squeeze_pooled_embeddings_op)
123
+
124
+ normalized_pooled_embeddings_op = onnx.helper.make_node(
125
+ "Normalizer",
126
+ domain="ai.onnx.ml",
127
+ inputs=["squeezed_pooled_embeddings"],
128
+ outputs=["sentence_embedding"],
129
+ norm = "L2"
130
+ )
131
+
132
+
133
+ model_ir8.graph.node.append(normalized_pooled_embeddings_op)
134
+
135
+ sentence_embeddings_output = onnx.helper.make_tensor_value_info(
136
+ "sentence_embedding",
137
+ onnx.TensorProto.FLOAT,
138
+ shape=["batch_size", output_embeddings_number]
139
+ )
140
+
141
+ model_ir8.graph.output.append(sentence_embeddings_output)
142
+
143
+ for node in model_ir8.graph.output:
144
+ if node.name == "last_hidden_state":
145
+ model_ir8.graph.output.remove(node)
146
+
147
+ model_ir8 = onnx.helper.make_model(model_ir8.graph, ir_version = 8, opset_imports = [op]) #to be sure that we have compatible opset and IR version
148
+
149
+ onnx.save(model_ir8, output_model, save_as_external_data = False)
150
+
151
+
152
+
153
+ with open('conversion_config.json') as json_file:
154
+ conversion_config = json.load(json_file)
155
+
156
+
157
+ model_id = conversion_config["model_id"]
158
+ number_of_generated_embeddings = conversion_config["number_of_generated_embeddings"]
159
+ precision_to_filename_map = conversion_config["precision_to_filename_map"]
160
+ opset = conversion_config["opset"]
161
+ IR = conversion_config["IR"]
162
+
163
+
164
+ op = onnx.OperatorSetIdProto()
165
+ op.version = opset
166
+
167
+
168
+ if not os.path.exists("onnx"):
169
+ os.makedirs("onnx")
170
+
171
+ print("Exporting the main model version")
172
+ try:
173
+ main_export(model_name_or_path=model_id, output="./", opset=opset, trust_remote_code=True, task="feature-extraction", dtype="fp32")
174
+ except Exception:
175
+ huggingface_hub.hf_hub_download(repo_id=model_id, filename="model.onnx", local_dir="./")
176
+
177
+
178
+ if "fp32" in precision_to_filename_map:
179
+ print("Exporting the fp32 onnx file...")
180
+
181
+ shutil.copyfile('model.onnx', precision_to_filename_map["fp32"])
182
+ add_mean_pooling("model.onnx", precision_to_filename_map["fp32"], op, IR, number_of_generated_embeddings)
183
+
184
+ print("Done\n\n")
185
+
186
+ if "int8" in precision_to_filename_map:
187
+ print("Quantizing fp32 model to int8...")
188
+ quantize_dynamic("model.onnx", precision_to_filename_map["int8"], weight_type=QuantType.QInt8)
189
+ add_mean_pooling( precision_to_filename_map["int8"], precision_to_filename_map["int8"], op, IR, number_of_generated_embeddings)
190
+ print("Done\n\n")
191
+
192
+ if "uint8" in precision_to_filename_map:
193
+ print("Quantizing fp32 model to uint8...")
194
+ quantize_dynamic("model.onnx", precision_to_filename_map["uint8"], weight_type=QuantType.QUInt8)
195
+ add_mean_pooling( precision_to_filename_map["uint8"], precision_to_filename_map["uint8"], op, IR, number_of_generated_embeddings)
196
+ print("Done\n\n")
197
+
198
+ os.remove("model.onnx")
onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1f60cdfa2271dcc9a63ef1a6a24bbbd91e9c6360926225dc6977763f043d991b
3
+ size 547391321
onnx/model_int8.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a710cd487b92474378d06c630d5562f5dce9804199b3cca12aeba51a465c8098
3
+ size 137512014
onnx/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fe0d28b40aa940d30fe785f07c4c53462870368265a490e9fa7b6ff64ebe8430
3
+ size 137512051
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
test_local.py ADDED
@@ -0,0 +1,49 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import onnxruntime as rt
2
+
3
+ from sentence_transformers.util import cos_sim
4
+ from sentence_transformers import SentenceTransformer
5
+
6
+ import transformers
7
+
8
+ import gc
9
+ import json
10
+
11
+
12
+ with open('conversion_config.json') as json_file:
13
+ conversion_config = json.load(json_file)
14
+
15
+
16
+ model_id = conversion_config["model_id"]
17
+ number_of_generated_embeddings = conversion_config["number_of_generated_embeddings"]
18
+ precision_to_filename_map = conversion_config["precision_to_filename_map"]
19
+
20
+ sentences_1 = 'How is the weather today?'
21
+ sentences_2 = 'What is the current weather like today?'
22
+
23
+ print(f"Testing on cosine similiarity between sentences: \n'{sentences_1}'\n'{sentences_2}'\n\n\n")
24
+
25
+ tokenizer = transformers.AutoTokenizer.from_pretrained("./", trust_remote_code=True)
26
+ enc1 = tokenizer(sentences_1)
27
+ enc2 = tokenizer(sentences_2)
28
+
29
+ for precision, file_name in precision_to_filename_map.items():
30
+
31
+
32
+ onnx_session = rt.InferenceSession(file_name)
33
+ embeddings_1_onnx = onnx_session.run(None, {"input_ids": [enc1.input_ids],
34
+ "attention_mask": [enc1.attention_mask], "token_type_ids": [enc1.token_type_ids] })[0][0]
35
+
36
+
37
+ embeddings_2_onnx = onnx_session.run(None, {"input_ids": [enc2.input_ids],
38
+ "attention_mask": [enc2.attention_mask], "token_type_ids": [enc2.token_type_ids]})[0][0]
39
+ del onnx_session
40
+ gc.collect()
41
+ print(f'Cosine similarity for ONNX model with precision "{precision}" is {str(cos_sim(embeddings_1_onnx, embeddings_2_onnx))}')
42
+
43
+
44
+
45
+
46
+ model = SentenceTransformer(model_id, trust_remote_code=True)
47
+ embeddings_1_sentence_transformer = model.encode(sentences_1, normalize_embeddings=True, trust_remote_code=True)
48
+ embeddings_2_sentence_transformer = model.encode(sentences_2, normalize_embeddings=True, trust_remote_code=True)
49
+ print('Cosine similiarity for original sentence transformer model is '+str(cos_sim(embeddings_1_sentence_transformer, embeddings_2_sentence_transformer)))
test_teradata.py ADDED
@@ -0,0 +1,106 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import sys
2
+ import teradataml as tdml
3
+ from tabulate import tabulate
4
+
5
+ import json
6
+
7
+
8
+ with open('conversion_config.json') as json_file:
9
+ conversion_config = json.load(json_file)
10
+
11
+
12
+ model_id = conversion_config["model_id"]
13
+ number_of_generated_embeddings = conversion_config["number_of_generated_embeddings"]
14
+ precision_to_filename_map = conversion_config["precision_to_filename_map"]
15
+
16
+ host = sys.argv[1]
17
+ username = sys.argv[2]
18
+ password = sys.argv[3]
19
+
20
+ print("Setting up connection to teradata...")
21
+ tdml.create_context(host = host, username = username, password = password)
22
+ print("Done\n\n")
23
+
24
+
25
+ print("Deploying tokenizer...")
26
+ try:
27
+ tdml.db_drop_table('tokenizer_table')
28
+ except Exception:
29
+ print("Can't drop tokenizer table - it does not exist")
30
+ tdml.save_byom('tokenizer',
31
+ 'tokenizer.json',
32
+ 'tokenizer_table')
33
+ print("Done\n\n")
34
+
35
+ print("Testing models...")
36
+ try:
37
+ tdml.db_drop_table('model_table')
38
+ except Exception:
39
+ print("Can't drop model table - it does not exist")
40
+
41
+ for precision, file_name in precision_to_filename_map.items():
42
+ print(f"Deploying {precision} model...")
43
+ tdml.save_byom(precision,
44
+ file_name,
45
+ 'model_table')
46
+ print(f"Model {precision} is deployed\n")
47
+
48
+ print(f"Calculating embeddings with {precision} model...")
49
+ try:
50
+ tdml.db_drop_table('emails_embeddings_store')
51
+ except Exception:
52
+ print("Can't drop embeddings table - it does not exist")
53
+
54
+ tdml.execute_sql(f"""
55
+ create volatile table emails_embeddings_store as (
56
+ select
57
+ *
58
+ from mldb.ONNXEmbeddings(
59
+ on emails.emails as InputTable
60
+ on (select * from model_table where model_id = '{precision}') as ModelTable DIMENSION
61
+ on (select model as tokenizer from tokenizer_table where model_id = 'tokenizer') as TokenizerTable DIMENSION
62
+
63
+ using
64
+ Accumulate('id', 'txt')
65
+ ModelOutputTensor('sentence_embedding')
66
+ EnableMemoryCheck('false')
67
+ OutputFormat('FLOAT32({number_of_generated_embeddings})')
68
+ OverwriteCachedModel('true')
69
+ ) a
70
+ ) with data on commit preserve rows
71
+
72
+ """)
73
+ print("Embeddings calculated")
74
+ print(f"Testing semantic search with cosine similiarity on the output of the model with precision '{precision}'...")
75
+ tdf_embeddings_store = tdml.DataFrame('emails_embeddings_store')
76
+ tdf_embeddings_store_tgt = tdf_embeddings_store[tdf_embeddings_store.id == 3]
77
+
78
+ tdf_embeddings_store_ref = tdf_embeddings_store[tdf_embeddings_store.id != 3]
79
+
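+ # Semantic search with TD_VECTORDISTANCE: score every reference email against
+ # the target email (id == 3); similarity = 1.0 - cosine distance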
80
+ cos_sim_pd = tdml.DataFrame.from_query(f"""
81
+ SELECT
82
+ dt.target_id,
83
+ dt.reference_id,
84
+ e_tgt.txt as target_txt,
85
+ e_ref.txt as reference_txt,
86
+ (1.0 - dt.distance) as similarity
87
+ FROM
88
+ TD_VECTORDISTANCE (
89
+ ON ({tdf_embeddings_store_tgt.show_query()}) AS TargetTable
90
+ ON ({tdf_embeddings_store_ref.show_query()}) AS ReferenceTable DIMENSION
91
+ USING
92
+ TargetIDColumn('id')
93
+ TargetFeatureColumns('[emb_0:emb_{number_of_generated_embeddings - 1}]')
94
+ RefIDColumn('id')
95
+ RefFeatureColumns('[emb_0:emb_{number_of_generated_embeddings - 1}]')
96
+ DistanceMeasure('cosine')
97
+ topk(3)
98
+ ) AS dt
99
+ JOIN emails.emails e_tgt on e_tgt.id = dt.target_id
100
+ JOIN emails.emails e_ref on e_ref.id = dt.reference_id;
101
+ """).to_pandas()
102
+ print(tabulate(cos_sim_pd, headers='keys', tablefmt='fancy_grid'))
103
+ print("Done\n\n")
104
+
105
+
106
+ tdml.remove_context()
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 2147483648,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff