martinhillebrandtd commited on
Commit
329d98d
·
1 Parent(s): 77250d2
.gitattributes CHANGED
@@ -7,6 +7,7 @@
7
  *.gz filter=lfs diff=lfs merge=lfs -text
8
  *.h5 filter=lfs diff=lfs merge=lfs -text
9
  *.joblib filter=lfs diff=lfs merge=lfs -text
 
10
  *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
  *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
  *.model filter=lfs diff=lfs merge=lfs -text
 
7
  *.gz filter=lfs diff=lfs merge=lfs -text
8
  *.h5 filter=lfs diff=lfs merge=lfs -text
9
  *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.json filter=lfs diff=lfs merge=lfs -text
11
  *.lfs.* filter=lfs diff=lfs merge=lfs -text
12
  *.mlmodel filter=lfs diff=lfs merge=lfs -text
13
  *.model filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,2764 @@
1
- ---
2
- license: apache-2.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ datasets:
3
+ - allenai/c4
4
+ language:
5
+ - en
6
+ library_name: transformers
7
+ license: apache-2.0
8
+ model-index:
9
+ - name: gte-large-en-v1.5
10
+ results:
11
+ - dataset:
12
+ config: en
13
+ name: MTEB AmazonCounterfactualClassification (en)
14
+ revision: e8379541af4e31359cca9fbcf4b00f2671dba205
15
+ split: test
16
+ type: mteb/amazon_counterfactual
17
+ metrics:
18
+ - type: accuracy
19
+ value: 73.01492537313432
20
+ - type: ap
21
+ value: 35.05341696659522
22
+ - type: f1
23
+ value: 66.71270310883853
24
+ task:
25
+ type: Classification
26
+ - dataset:
27
+ config: default
28
+ name: MTEB AmazonPolarityClassification
29
+ revision: e2d317d38cd51312af73b3d32a06d1a08b442046
30
+ split: test
31
+ type: mteb/amazon_polarity
32
+ metrics:
33
+ - type: accuracy
34
+ value: 93.97189999999999
35
+ - type: ap
36
+ value: 90.5952493948908
37
+ - type: f1
38
+ value: 93.95848137716877
39
+ task:
40
+ type: Classification
41
+ - dataset:
42
+ config: en
43
+ name: MTEB AmazonReviewsClassification (en)
44
+ revision: 1399c76144fd37290681b995c656ef9b2e06e26d
45
+ split: test
46
+ type: mteb/amazon_reviews_multi
47
+ metrics:
48
+ - type: accuracy
49
+ value: 54.196
50
+ - type: f1
51
+ value: 53.80122334012787
52
+ task:
53
+ type: Classification
54
+ - dataset:
55
+ config: default
56
+ name: MTEB ArguAna
57
+ revision: c22ab2a51041ffd869aaddef7af8d8215647e41a
58
+ split: test
59
+ type: mteb/arguana
60
+ metrics:
61
+ - type: map_at_1
62
+ value: 47.297
63
+ - type: map_at_10
64
+ value: 64.303
65
+ - type: map_at_100
66
+ value: 64.541
67
+ - type: map_at_1000
68
+ value: 64.541
69
+ - type: map_at_3
70
+ value: 60.728
71
+ - type: map_at_5
72
+ value: 63.114000000000004
73
+ - type: mrr_at_1
74
+ value: 48.435
75
+ - type: mrr_at_10
76
+ value: 64.657
77
+ - type: mrr_at_100
78
+ value: 64.901
79
+ - type: mrr_at_1000
80
+ value: 64.901
81
+ - type: mrr_at_3
82
+ value: 61.06
83
+ - type: mrr_at_5
84
+ value: 63.514
85
+ - type: ndcg_at_1
86
+ value: 47.297
87
+ - type: ndcg_at_10
88
+ value: 72.107
89
+ - type: ndcg_at_100
90
+ value: 72.963
91
+ - type: ndcg_at_1000
92
+ value: 72.963
93
+ - type: ndcg_at_3
94
+ value: 65.063
95
+ - type: ndcg_at_5
96
+ value: 69.352
97
+ - type: precision_at_1
98
+ value: 47.297
99
+ - type: precision_at_10
100
+ value: 9.623
101
+ - type: precision_at_100
102
+ value: 0.996
103
+ - type: precision_at_1000
104
+ value: 0.1
105
+ - type: precision_at_3
106
+ value: 25.865
107
+ - type: precision_at_5
108
+ value: 17.596
109
+ - type: recall_at_1
110
+ value: 47.297
111
+ - type: recall_at_10
112
+ value: 96.23
113
+ - type: recall_at_100
114
+ value: 99.644
115
+ - type: recall_at_1000
116
+ value: 99.644
117
+ - type: recall_at_3
118
+ value: 77.596
119
+ - type: recall_at_5
120
+ value: 87.98
121
+ task:
122
+ type: Retrieval
123
+ - dataset:
124
+ config: default
125
+ name: MTEB ArxivClusteringP2P
126
+ revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
127
+ split: test
128
+ type: mteb/arxiv-clustering-p2p
129
+ metrics:
130
+ - type: v_measure
131
+ value: 48.467787861077475
132
+ task:
133
+ type: Clustering
134
+ - dataset:
135
+ config: default
136
+ name: MTEB ArxivClusteringS2S
137
+ revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
138
+ split: test
139
+ type: mteb/arxiv-clustering-s2s
140
+ metrics:
141
+ - type: v_measure
142
+ value: 43.39198391914257
143
+ task:
144
+ type: Clustering
145
+ - dataset:
146
+ config: default
147
+ name: MTEB AskUbuntuDupQuestions
148
+ revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
149
+ split: test
150
+ type: mteb/askubuntudupquestions-reranking
151
+ metrics:
152
+ - type: map
153
+ value: 63.12794820591384
154
+ - type: mrr
155
+ value: 75.9331442641692
156
+ task:
157
+ type: Reranking
158
+ - dataset:
159
+ config: default
160
+ name: MTEB BIOSSES
161
+ revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
162
+ split: test
163
+ type: mteb/biosses-sts
164
+ metrics:
165
+ - type: cos_sim_pearson
166
+ value: 87.85062993863319
167
+ - type: cos_sim_spearman
168
+ value: 85.39049989733459
169
+ - type: euclidean_pearson
170
+ value: 86.00222680278333
171
+ - type: euclidean_spearman
172
+ value: 85.45556162077396
173
+ - type: manhattan_pearson
174
+ value: 85.88769871785621
175
+ - type: manhattan_spearman
176
+ value: 85.11760211290839
177
+ task:
178
+ type: STS
179
+ - dataset:
180
+ config: default
181
+ name: MTEB Banking77Classification
182
+ revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
183
+ split: test
184
+ type: mteb/banking77
185
+ metrics:
186
+ - type: accuracy
187
+ value: 87.32792207792208
188
+ - type: f1
189
+ value: 87.29132945999555
190
+ task:
191
+ type: Classification
192
+ - dataset:
193
+ config: default
194
+ name: MTEB BiorxivClusteringP2P
195
+ revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
196
+ split: test
197
+ type: mteb/biorxiv-clustering-p2p
198
+ metrics:
199
+ - type: v_measure
200
+ value: 40.5779328301945
201
+ task:
202
+ type: Clustering
203
+ - dataset:
204
+ config: default
205
+ name: MTEB BiorxivClusteringS2S
206
+ revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
207
+ split: test
208
+ type: mteb/biorxiv-clustering-s2s
209
+ metrics:
210
+ - type: v_measure
211
+ value: 37.94425623865118
212
+ task:
213
+ type: Clustering
214
+ - dataset:
215
+ config: default
216
+ name: MTEB CQADupstackAndroidRetrieval
217
+ revision: f46a197baaae43b4f621051089b82a364682dfeb
218
+ split: test
219
+ type: mteb/cqadupstack-android
220
+ metrics:
221
+ - type: map_at_1
222
+ value: 32.978
223
+ - type: map_at_10
224
+ value: 44.45
225
+ - type: map_at_100
226
+ value: 46.19
227
+ - type: map_at_1000
228
+ value: 46.303
229
+ - type: map_at_3
230
+ value: 40.849000000000004
231
+ - type: map_at_5
232
+ value: 42.55
233
+ - type: mrr_at_1
234
+ value: 40.629
235
+ - type: mrr_at_10
236
+ value: 50.848000000000006
237
+ - type: mrr_at_100
238
+ value: 51.669
239
+ - type: mrr_at_1000
240
+ value: 51.705
241
+ - type: mrr_at_3
242
+ value: 47.997
243
+ - type: mrr_at_5
244
+ value: 49.506
245
+ - type: ndcg_at_1
246
+ value: 40.629
247
+ - type: ndcg_at_10
248
+ value: 51.102000000000004
249
+ - type: ndcg_at_100
250
+ value: 57.159000000000006
251
+ - type: ndcg_at_1000
252
+ value: 58.669000000000004
253
+ - type: ndcg_at_3
254
+ value: 45.738
255
+ - type: ndcg_at_5
256
+ value: 47.632999999999996
257
+ - type: precision_at_1
258
+ value: 40.629
259
+ - type: precision_at_10
260
+ value: 9.700000000000001
261
+ - type: precision_at_100
262
+ value: 1.5970000000000002
263
+ - type: precision_at_1000
264
+ value: 0.202
265
+ - type: precision_at_3
266
+ value: 21.698
267
+ - type: precision_at_5
268
+ value: 15.393
269
+ - type: recall_at_1
270
+ value: 32.978
271
+ - type: recall_at_10
272
+ value: 63.711
273
+ - type: recall_at_100
274
+ value: 88.39399999999999
275
+ - type: recall_at_1000
276
+ value: 97.513
277
+ - type: recall_at_3
278
+ value: 48.025
279
+ - type: recall_at_5
280
+ value: 53.52
281
+ task:
282
+ type: Retrieval
283
+ - dataset:
284
+ config: default
285
+ name: MTEB CQADupstackEnglishRetrieval
286
+ revision: ad9991cb51e31e31e430383c75ffb2885547b5f0
287
+ split: test
288
+ type: mteb/cqadupstack-english
289
+ metrics:
290
+ - type: map_at_1
291
+ value: 30.767
292
+ - type: map_at_10
293
+ value: 42.195
294
+ - type: map_at_100
295
+ value: 43.541999999999994
296
+ - type: map_at_1000
297
+ value: 43.673
298
+ - type: map_at_3
299
+ value: 38.561
300
+ - type: map_at_5
301
+ value: 40.532000000000004
302
+ - type: mrr_at_1
303
+ value: 38.79
304
+ - type: mrr_at_10
305
+ value: 48.021
306
+ - type: mrr_at_100
307
+ value: 48.735
308
+ - type: mrr_at_1000
309
+ value: 48.776
310
+ - type: mrr_at_3
311
+ value: 45.594
312
+ - type: mrr_at_5
313
+ value: 46.986
314
+ - type: ndcg_at_1
315
+ value: 38.79
316
+ - type: ndcg_at_10
317
+ value: 48.468
318
+ - type: ndcg_at_100
319
+ value: 53.037
320
+ - type: ndcg_at_1000
321
+ value: 55.001999999999995
322
+ - type: ndcg_at_3
323
+ value: 43.409
324
+ - type: ndcg_at_5
325
+ value: 45.654
326
+ - type: precision_at_1
327
+ value: 38.79
328
+ - type: precision_at_10
329
+ value: 9.452
330
+ - type: precision_at_100
331
+ value: 1.518
332
+ - type: precision_at_1000
333
+ value: 0.201
334
+ - type: precision_at_3
335
+ value: 21.21
336
+ - type: precision_at_5
337
+ value: 15.171999999999999
338
+ - type: recall_at_1
339
+ value: 30.767
340
+ - type: recall_at_10
341
+ value: 60.118
342
+ - type: recall_at_100
343
+ value: 79.271
344
+ - type: recall_at_1000
345
+ value: 91.43299999999999
346
+ - type: recall_at_3
347
+ value: 45.36
348
+ - type: recall_at_5
349
+ value: 51.705
350
+ task:
351
+ type: Retrieval
352
+ - dataset:
353
+ config: default
354
+ name: MTEB CQADupstackGamingRetrieval
355
+ revision: 4885aa143210c98657558c04aaf3dc47cfb54340
356
+ split: test
357
+ type: mteb/cqadupstack-gaming
358
+ metrics:
359
+ - type: map_at_1
360
+ value: 40.007
361
+ - type: map_at_10
362
+ value: 53.529
363
+ - type: map_at_100
364
+ value: 54.602
365
+ - type: map_at_1000
366
+ value: 54.647
367
+ - type: map_at_3
368
+ value: 49.951
369
+ - type: map_at_5
370
+ value: 52.066
371
+ - type: mrr_at_1
372
+ value: 45.705
373
+ - type: mrr_at_10
374
+ value: 56.745000000000005
375
+ - type: mrr_at_100
376
+ value: 57.43899999999999
377
+ - type: mrr_at_1000
378
+ value: 57.462999999999994
379
+ - type: mrr_at_3
380
+ value: 54.25299999999999
381
+ - type: mrr_at_5
382
+ value: 55.842000000000006
383
+ - type: ndcg_at_1
384
+ value: 45.705
385
+ - type: ndcg_at_10
386
+ value: 59.809
387
+ - type: ndcg_at_100
388
+ value: 63.837999999999994
389
+ - type: ndcg_at_1000
390
+ value: 64.729
391
+ - type: ndcg_at_3
392
+ value: 53.994
393
+ - type: ndcg_at_5
394
+ value: 57.028
395
+ - type: precision_at_1
396
+ value: 45.705
397
+ - type: precision_at_10
398
+ value: 9.762
399
+ - type: precision_at_100
400
+ value: 1.275
401
+ - type: precision_at_1000
402
+ value: 0.13899999999999998
403
+ - type: precision_at_3
404
+ value: 24.368000000000002
405
+ - type: precision_at_5
406
+ value: 16.84
407
+ - type: recall_at_1
408
+ value: 40.007
409
+ - type: recall_at_10
410
+ value: 75.017
411
+ - type: recall_at_100
412
+ value: 91.99000000000001
413
+ - type: recall_at_1000
414
+ value: 98.265
415
+ - type: recall_at_3
416
+ value: 59.704
417
+ - type: recall_at_5
418
+ value: 67.109
419
+ task:
420
+ type: Retrieval
421
+ - dataset:
422
+ config: default
423
+ name: MTEB CQADupstackGisRetrieval
424
+ revision: 5003b3064772da1887988e05400cf3806fe491f2
425
+ split: test
426
+ type: mteb/cqadupstack-gis
427
+ metrics:
428
+ - type: map_at_1
429
+ value: 26.639000000000003
430
+ - type: map_at_10
431
+ value: 35.926
432
+ - type: map_at_100
433
+ value: 37.126999999999995
434
+ - type: map_at_1000
435
+ value: 37.202
436
+ - type: map_at_3
437
+ value: 32.989000000000004
438
+ - type: map_at_5
439
+ value: 34.465
440
+ - type: mrr_at_1
441
+ value: 28.475
442
+ - type: mrr_at_10
443
+ value: 37.7
444
+ - type: mrr_at_100
445
+ value: 38.753
446
+ - type: mrr_at_1000
447
+ value: 38.807
448
+ - type: mrr_at_3
449
+ value: 35.066
450
+ - type: mrr_at_5
451
+ value: 36.512
452
+ - type: ndcg_at_1
453
+ value: 28.475
454
+ - type: ndcg_at_10
455
+ value: 41.245
456
+ - type: ndcg_at_100
457
+ value: 46.814
458
+ - type: ndcg_at_1000
459
+ value: 48.571
460
+ - type: ndcg_at_3
461
+ value: 35.528999999999996
462
+ - type: ndcg_at_5
463
+ value: 38.066
464
+ - type: precision_at_1
465
+ value: 28.475
466
+ - type: precision_at_10
467
+ value: 6.497
468
+ - type: precision_at_100
469
+ value: 0.9650000000000001
470
+ - type: precision_at_1000
471
+ value: 0.11499999999999999
472
+ - type: precision_at_3
473
+ value: 15.065999999999999
474
+ - type: precision_at_5
475
+ value: 10.599
476
+ - type: recall_at_1
477
+ value: 26.639000000000003
478
+ - type: recall_at_10
479
+ value: 55.759
480
+ - type: recall_at_100
481
+ value: 80.913
482
+ - type: recall_at_1000
483
+ value: 93.929
484
+ - type: recall_at_3
485
+ value: 40.454
486
+ - type: recall_at_5
487
+ value: 46.439
488
+ task:
489
+ type: Retrieval
490
+ - dataset:
491
+ config: default
492
+ name: MTEB CQADupstackMathematicaRetrieval
493
+ revision: 90fceea13679c63fe563ded68f3b6f06e50061de
494
+ split: test
495
+ type: mteb/cqadupstack-mathematica
496
+ metrics:
497
+ - type: map_at_1
498
+ value: 15.767999999999999
499
+ - type: map_at_10
500
+ value: 24.811
501
+ - type: map_at_100
502
+ value: 26.064999999999998
503
+ - type: map_at_1000
504
+ value: 26.186999999999998
505
+ - type: map_at_3
506
+ value: 21.736
507
+ - type: map_at_5
508
+ value: 23.283
509
+ - type: mrr_at_1
510
+ value: 19.527
511
+ - type: mrr_at_10
512
+ value: 29.179
513
+ - type: mrr_at_100
514
+ value: 30.153999999999996
515
+ - type: mrr_at_1000
516
+ value: 30.215999999999998
517
+ - type: mrr_at_3
518
+ value: 26.223000000000003
519
+ - type: mrr_at_5
520
+ value: 27.733999999999998
521
+ - type: ndcg_at_1
522
+ value: 19.527
523
+ - type: ndcg_at_10
524
+ value: 30.786
525
+ - type: ndcg_at_100
526
+ value: 36.644
527
+ - type: ndcg_at_1000
528
+ value: 39.440999999999995
529
+ - type: ndcg_at_3
530
+ value: 24.958
531
+ - type: ndcg_at_5
532
+ value: 27.392
533
+ - type: precision_at_1
534
+ value: 19.527
535
+ - type: precision_at_10
536
+ value: 5.995
537
+ - type: precision_at_100
538
+ value: 1.03
539
+ - type: precision_at_1000
540
+ value: 0.14100000000000001
541
+ - type: precision_at_3
542
+ value: 12.520999999999999
543
+ - type: precision_at_5
544
+ value: 9.129
545
+ - type: recall_at_1
546
+ value: 15.767999999999999
547
+ - type: recall_at_10
548
+ value: 44.824000000000005
549
+ - type: recall_at_100
550
+ value: 70.186
551
+ - type: recall_at_1000
552
+ value: 89.934
553
+ - type: recall_at_3
554
+ value: 28.607
555
+ - type: recall_at_5
556
+ value: 34.836
557
+ task:
558
+ type: Retrieval
559
+ - dataset:
560
+ config: default
561
+ name: MTEB CQADupstackPhysicsRetrieval
562
+ revision: 79531abbd1fb92d06c6d6315a0cbbbf5bb247ea4
563
+ split: test
564
+ type: mteb/cqadupstack-physics
565
+ metrics:
566
+ - type: map_at_1
567
+ value: 31.952
568
+ - type: map_at_10
569
+ value: 44.438
570
+ - type: map_at_100
571
+ value: 45.778
572
+ - type: map_at_1000
573
+ value: 45.883
574
+ - type: map_at_3
575
+ value: 41.044000000000004
576
+ - type: map_at_5
577
+ value: 42.986000000000004
578
+ - type: mrr_at_1
579
+ value: 39.172000000000004
580
+ - type: mrr_at_10
581
+ value: 49.76
582
+ - type: mrr_at_100
583
+ value: 50.583999999999996
584
+ - type: mrr_at_1000
585
+ value: 50.621
586
+ - type: mrr_at_3
587
+ value: 47.353
588
+ - type: mrr_at_5
589
+ value: 48.739
590
+ - type: ndcg_at_1
591
+ value: 39.172000000000004
592
+ - type: ndcg_at_10
593
+ value: 50.760000000000005
594
+ - type: ndcg_at_100
595
+ value: 56.084
596
+ - type: ndcg_at_1000
597
+ value: 57.865
598
+ - type: ndcg_at_3
599
+ value: 45.663
600
+ - type: ndcg_at_5
601
+ value: 48.178
602
+ - type: precision_at_1
603
+ value: 39.172000000000004
604
+ - type: precision_at_10
605
+ value: 9.22
606
+ - type: precision_at_100
607
+ value: 1.387
608
+ - type: precision_at_1000
609
+ value: 0.17099999999999999
610
+ - type: precision_at_3
611
+ value: 21.976000000000003
612
+ - type: precision_at_5
613
+ value: 15.457
614
+ - type: recall_at_1
615
+ value: 31.952
616
+ - type: recall_at_10
617
+ value: 63.900999999999996
618
+ - type: recall_at_100
619
+ value: 85.676
620
+ - type: recall_at_1000
621
+ value: 97.03699999999999
622
+ - type: recall_at_3
623
+ value: 49.781
624
+ - type: recall_at_5
625
+ value: 56.330000000000005
626
+ task:
627
+ type: Retrieval
628
+ - dataset:
629
+ config: default
630
+ name: MTEB CQADupstackProgrammersRetrieval
631
+ revision: 6184bc1440d2dbc7612be22b50686b8826d22b32
632
+ split: test
633
+ type: mteb/cqadupstack-programmers
634
+ metrics:
635
+ - type: map_at_1
636
+ value: 25.332
637
+ - type: map_at_10
638
+ value: 36.874
639
+ - type: map_at_100
640
+ value: 38.340999999999994
641
+ - type: map_at_1000
642
+ value: 38.452
643
+ - type: map_at_3
644
+ value: 33.068
645
+ - type: map_at_5
646
+ value: 35.324
647
+ - type: mrr_at_1
648
+ value: 30.822
649
+ - type: mrr_at_10
650
+ value: 41.641
651
+ - type: mrr_at_100
652
+ value: 42.519
653
+ - type: mrr_at_1000
654
+ value: 42.573
655
+ - type: mrr_at_3
656
+ value: 38.413000000000004
657
+ - type: mrr_at_5
658
+ value: 40.542
659
+ - type: ndcg_at_1
660
+ value: 30.822
661
+ - type: ndcg_at_10
662
+ value: 43.414
663
+ - type: ndcg_at_100
664
+ value: 49.196
665
+ - type: ndcg_at_1000
666
+ value: 51.237
667
+ - type: ndcg_at_3
668
+ value: 37.230000000000004
669
+ - type: ndcg_at_5
670
+ value: 40.405
671
+ - type: precision_at_1
672
+ value: 30.822
673
+ - type: precision_at_10
674
+ value: 8.379
675
+ - type: precision_at_100
676
+ value: 1.315
677
+ - type: precision_at_1000
678
+ value: 0.168
679
+ - type: precision_at_3
680
+ value: 18.417
681
+ - type: precision_at_5
682
+ value: 13.744
683
+ - type: recall_at_1
684
+ value: 25.332
685
+ - type: recall_at_10
686
+ value: 57.774
687
+ - type: recall_at_100
688
+ value: 82.071
689
+ - type: recall_at_1000
690
+ value: 95.60600000000001
691
+ - type: recall_at_3
692
+ value: 40.722
693
+ - type: recall_at_5
694
+ value: 48.754999999999995
695
+ task:
696
+ type: Retrieval
697
+ - dataset:
698
+ config: default
699
+ name: MTEB CQADupstackRetrieval
700
+ revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4
701
+ split: test
702
+ type: mteb/cqadupstack
703
+ metrics:
704
+ - type: map_at_1
705
+ value: 25.91033333333334
706
+ - type: map_at_10
707
+ value: 36.23225000000001
708
+ - type: map_at_100
709
+ value: 37.55766666666667
710
+ - type: map_at_1000
711
+ value: 37.672583333333336
712
+ - type: map_at_3
713
+ value: 32.95666666666667
714
+ - type: map_at_5
715
+ value: 34.73375
716
+ - type: mrr_at_1
717
+ value: 30.634
718
+ - type: mrr_at_10
719
+ value: 40.19449999999999
720
+ - type: mrr_at_100
721
+ value: 41.099250000000005
722
+ - type: mrr_at_1000
723
+ value: 41.15091666666667
724
+ - type: mrr_at_3
725
+ value: 37.4615
726
+ - type: mrr_at_5
727
+ value: 39.00216666666667
728
+ - type: ndcg_at_1
729
+ value: 30.634
730
+ - type: ndcg_at_10
731
+ value: 42.162166666666664
732
+ - type: ndcg_at_100
733
+ value: 47.60708333333333
734
+ - type: ndcg_at_1000
735
+ value: 49.68616666666666
736
+ - type: ndcg_at_3
737
+ value: 36.60316666666666
738
+ - type: ndcg_at_5
739
+ value: 39.15616666666668
740
+ - type: precision_at_1
741
+ value: 30.634
742
+ - type: precision_at_10
743
+ value: 7.6193333333333335
744
+ - type: precision_at_100
745
+ value: 1.2198333333333333
746
+ - type: precision_at_1000
747
+ value: 0.15975000000000003
748
+ - type: precision_at_3
749
+ value: 17.087
750
+ - type: precision_at_5
751
+ value: 12.298333333333334
752
+ - type: recall_at_1
753
+ value: 25.91033333333334
754
+ - type: recall_at_10
755
+ value: 55.67300000000001
756
+ - type: recall_at_100
757
+ value: 79.20608333333334
758
+ - type: recall_at_1000
759
+ value: 93.34866666666667
760
+ - type: recall_at_3
761
+ value: 40.34858333333333
762
+ - type: recall_at_5
763
+ value: 46.834083333333325
764
+ task:
765
+ type: Retrieval
766
+ - dataset:
767
+ config: default
768
+ name: MTEB CQADupstackStatsRetrieval
769
+ revision: 65ac3a16b8e91f9cee4c9828cc7c335575432a2a
770
+ split: test
771
+ type: mteb/cqadupstack-stats
772
+ metrics:
773
+ - type: map_at_1
774
+ value: 25.006
775
+ - type: map_at_10
776
+ value: 32.177
777
+ - type: map_at_100
778
+ value: 33.324999999999996
779
+ - type: map_at_1000
780
+ value: 33.419
781
+ - type: map_at_3
782
+ value: 29.952
783
+ - type: map_at_5
784
+ value: 31.095
785
+ - type: mrr_at_1
786
+ value: 28.066999999999997
787
+ - type: mrr_at_10
788
+ value: 34.995
789
+ - type: mrr_at_100
790
+ value: 35.978
791
+ - type: mrr_at_1000
792
+ value: 36.042
793
+ - type: mrr_at_3
794
+ value: 33.103
795
+ - type: mrr_at_5
796
+ value: 34.001
797
+ - type: ndcg_at_1
798
+ value: 28.066999999999997
799
+ - type: ndcg_at_10
800
+ value: 36.481
801
+ - type: ndcg_at_100
802
+ value: 42.022999999999996
803
+ - type: ndcg_at_1000
804
+ value: 44.377
805
+ - type: ndcg_at_3
806
+ value: 32.394
807
+ - type: ndcg_at_5
808
+ value: 34.108
809
+ - type: precision_at_1
810
+ value: 28.066999999999997
811
+ - type: precision_at_10
812
+ value: 5.736
813
+ - type: precision_at_100
814
+ value: 0.9259999999999999
815
+ - type: precision_at_1000
816
+ value: 0.12
817
+ - type: precision_at_3
818
+ value: 13.804
819
+ - type: precision_at_5
820
+ value: 9.508999999999999
821
+ - type: recall_at_1
822
+ value: 25.006
823
+ - type: recall_at_10
824
+ value: 46.972
825
+ - type: recall_at_100
826
+ value: 72.138
827
+ - type: recall_at_1000
828
+ value: 89.479
829
+ - type: recall_at_3
830
+ value: 35.793
831
+ - type: recall_at_5
832
+ value: 39.947
833
+ task:
834
+ type: Retrieval
835
+ - dataset:
836
+ config: default
837
+ name: MTEB CQADupstackTexRetrieval
838
+ revision: 46989137a86843e03a6195de44b09deda022eec7
839
+ split: test
840
+ type: mteb/cqadupstack-tex
841
+ metrics:
842
+ - type: map_at_1
843
+ value: 16.07
844
+ - type: map_at_10
845
+ value: 24.447
846
+ - type: map_at_100
847
+ value: 25.685999999999996
848
+ - type: map_at_1000
849
+ value: 25.813999999999997
850
+ - type: map_at_3
851
+ value: 21.634
852
+ - type: map_at_5
853
+ value: 23.133
854
+ - type: mrr_at_1
855
+ value: 19.580000000000002
856
+ - type: mrr_at_10
857
+ value: 28.127999999999997
858
+ - type: mrr_at_100
859
+ value: 29.119
860
+ - type: mrr_at_1000
861
+ value: 29.192
862
+ - type: mrr_at_3
863
+ value: 25.509999999999998
864
+ - type: mrr_at_5
865
+ value: 26.878
866
+ - type: ndcg_at_1
867
+ value: 19.580000000000002
868
+ - type: ndcg_at_10
869
+ value: 29.804000000000002
870
+ - type: ndcg_at_100
871
+ value: 35.555
872
+ - type: ndcg_at_1000
873
+ value: 38.421
874
+ - type: ndcg_at_3
875
+ value: 24.654999999999998
876
+ - type: ndcg_at_5
877
+ value: 26.881
878
+ - type: precision_at_1
879
+ value: 19.580000000000002
880
+ - type: precision_at_10
881
+ value: 5.736
882
+ - type: precision_at_100
883
+ value: 1.005
884
+ - type: precision_at_1000
885
+ value: 0.145
886
+ - type: precision_at_3
887
+ value: 12.033000000000001
888
+ - type: precision_at_5
889
+ value: 8.871
890
+ - type: recall_at_1
891
+ value: 16.07
892
+ - type: recall_at_10
893
+ value: 42.364000000000004
894
+ - type: recall_at_100
895
+ value: 68.01899999999999
896
+ - type: recall_at_1000
897
+ value: 88.122
898
+ - type: recall_at_3
899
+ value: 27.846
900
+ - type: recall_at_5
901
+ value: 33.638
902
+ task:
903
+ type: Retrieval
904
+ - dataset:
905
+ config: default
906
+ name: MTEB CQADupstackUnixRetrieval
907
+ revision: 6c6430d3a6d36f8d2a829195bc5dc94d7e063e53
908
+ split: test
909
+ type: mteb/cqadupstack-unix
910
+ metrics:
911
+ - type: map_at_1
912
+ value: 26.365
913
+ - type: map_at_10
914
+ value: 36.591
915
+ - type: map_at_100
916
+ value: 37.730000000000004
917
+ - type: map_at_1000
918
+ value: 37.84
919
+ - type: map_at_3
920
+ value: 33.403
921
+ - type: map_at_5
922
+ value: 35.272999999999996
923
+ - type: mrr_at_1
924
+ value: 30.503999999999998
925
+ - type: mrr_at_10
926
+ value: 39.940999999999995
927
+ - type: mrr_at_100
928
+ value: 40.818
929
+ - type: mrr_at_1000
930
+ value: 40.876000000000005
931
+ - type: mrr_at_3
932
+ value: 37.065
933
+ - type: mrr_at_5
934
+ value: 38.814
935
+ - type: ndcg_at_1
936
+ value: 30.503999999999998
937
+ - type: ndcg_at_10
938
+ value: 42.185
939
+ - type: ndcg_at_100
940
+ value: 47.416000000000004
941
+ - type: ndcg_at_1000
942
+ value: 49.705
943
+ - type: ndcg_at_3
944
+ value: 36.568
945
+ - type: ndcg_at_5
946
+ value: 39.416000000000004
947
+ - type: precision_at_1
948
+ value: 30.503999999999998
949
+ - type: precision_at_10
950
+ value: 7.276000000000001
951
+ - type: precision_at_100
952
+ value: 1.118
953
+ - type: precision_at_1000
954
+ value: 0.14300000000000002
955
+ - type: precision_at_3
956
+ value: 16.729
957
+ - type: precision_at_5
958
+ value: 12.107999999999999
959
+ - type: recall_at_1
960
+ value: 26.365
961
+ - type: recall_at_10
962
+ value: 55.616
963
+ - type: recall_at_100
964
+ value: 78.129
965
+ - type: recall_at_1000
966
+ value: 93.95599999999999
967
+ - type: recall_at_3
968
+ value: 40.686
969
+ - type: recall_at_5
970
+ value: 47.668
971
+ task:
972
+ type: Retrieval
973
+ - dataset:
974
+ config: default
975
+ name: MTEB CQADupstackWebmastersRetrieval
976
+ revision: 160c094312a0e1facb97e55eeddb698c0abe3571
977
+ split: test
978
+ type: mteb/cqadupstack-webmasters
979
+ metrics:
980
+ - type: map_at_1
981
+ value: 22.750999999999998
982
+ - type: map_at_10
983
+ value: 33.446
984
+ - type: map_at_100
985
+ value: 35.235
986
+ - type: map_at_1000
987
+ value: 35.478
988
+ - type: map_at_3
989
+ value: 29.358
990
+ - type: map_at_5
991
+ value: 31.525
992
+ - type: mrr_at_1
993
+ value: 27.668
994
+ - type: mrr_at_10
995
+ value: 37.694
996
+ - type: mrr_at_100
997
+ value: 38.732
998
+ - type: mrr_at_1000
999
+ value: 38.779
1000
+ - type: mrr_at_3
1001
+ value: 34.223
1002
+ - type: mrr_at_5
1003
+ value: 36.08
1004
+ - type: ndcg_at_1
1005
+ value: 27.668
1006
+ - type: ndcg_at_10
1007
+ value: 40.557
1008
+ - type: ndcg_at_100
1009
+ value: 46.605999999999995
1010
+ - type: ndcg_at_1000
1011
+ value: 48.917
1012
+ - type: ndcg_at_3
1013
+ value: 33.677
1014
+ - type: ndcg_at_5
1015
+ value: 36.85
1016
+ - type: precision_at_1
1017
+ value: 27.668
1018
+ - type: precision_at_10
1019
+ value: 8.3
1020
+ - type: precision_at_100
1021
+ value: 1.6260000000000001
1022
+ - type: precision_at_1000
1023
+ value: 0.253
1024
+ - type: precision_at_3
1025
+ value: 16.008
1026
+ - type: precision_at_5
1027
+ value: 12.292
1028
+ - type: recall_at_1
1029
+ value: 22.750999999999998
1030
+ - type: recall_at_10
1031
+ value: 55.643
1032
+ - type: recall_at_100
1033
+ value: 82.151
1034
+ - type: recall_at_1000
1035
+ value: 95.963
1036
+ - type: recall_at_3
1037
+ value: 36.623
1038
+ - type: recall_at_5
1039
+ value: 44.708
1040
+ task:
1041
+ type: Retrieval
1042
+ - dataset:
1043
+ config: default
1044
+ name: MTEB CQADupstackWordpressRetrieval
1045
+ revision: 4ffe81d471b1924886b33c7567bfb200e9eec5c4
1046
+ split: test
1047
+ type: mteb/cqadupstack-wordpress
1048
+ metrics:
1049
+ - type: map_at_1
1050
+ value: 17.288999999999998
1051
+ - type: map_at_10
1052
+ value: 25.903
1053
+ - type: map_at_100
1054
+ value: 27.071
1055
+ - type: map_at_1000
1056
+ value: 27.173000000000002
1057
+ - type: map_at_3
1058
+ value: 22.935
1059
+ - type: map_at_5
1060
+ value: 24.573
1061
+ - type: mrr_at_1
1062
+ value: 18.669
1063
+ - type: mrr_at_10
1064
+ value: 27.682000000000002
1065
+ - type: mrr_at_100
1066
+ value: 28.691
1067
+ - type: mrr_at_1000
1068
+ value: 28.761
1069
+ - type: mrr_at_3
1070
+ value: 24.738
1071
+ - type: mrr_at_5
1072
+ value: 26.392
1073
+ - type: ndcg_at_1
1074
+ value: 18.669
1075
+ - type: ndcg_at_10
1076
+ value: 31.335
1077
+ - type: ndcg_at_100
1078
+ value: 36.913000000000004
1079
+ - type: ndcg_at_1000
1080
+ value: 39.300000000000004
1081
+ - type: ndcg_at_3
1082
+ value: 25.423000000000002
1083
+ - type: ndcg_at_5
1084
+ value: 28.262999999999998
1085
+ - type: precision_at_1
1086
+ value: 18.669
1087
+ - type: precision_at_10
1088
+ value: 5.379
1089
+ - type: precision_at_100
1090
+ value: 0.876
1091
+ - type: precision_at_1000
1092
+ value: 0.11900000000000001
1093
+ - type: precision_at_3
1094
+ value: 11.214
1095
+ - type: precision_at_5
1096
+ value: 8.466
1097
+ - type: recall_at_1
1098
+ value: 17.288999999999998
1099
+ - type: recall_at_10
1100
+ value: 46.377
1101
+ - type: recall_at_100
1102
+ value: 71.53500000000001
1103
+ - type: recall_at_1000
1104
+ value: 88.947
1105
+ - type: recall_at_3
1106
+ value: 30.581999999999997
1107
+ - type: recall_at_5
1108
+ value: 37.354
1109
+ task:
1110
+ type: Retrieval
1111
+ - dataset:
1112
+ config: default
1113
+ name: MTEB ClimateFEVER
1114
+ revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380
1115
+ split: test
1116
+ type: mteb/climate-fever
1117
+ metrics:
1118
+ - type: map_at_1
1119
+ value: 21.795
1120
+ - type: map_at_10
1121
+ value: 37.614999999999995
1122
+ - type: map_at_100
1123
+ value: 40.037
1124
+ - type: map_at_1000
1125
+ value: 40.184999999999995
1126
+ - type: map_at_3
1127
+ value: 32.221
1128
+ - type: map_at_5
1129
+ value: 35.154999999999994
1130
+ - type: mrr_at_1
1131
+ value: 50.358000000000004
1132
+ - type: mrr_at_10
1133
+ value: 62.129
1134
+ - type: mrr_at_100
1135
+ value: 62.613
1136
+ - type: mrr_at_1000
1137
+ value: 62.62
1138
+ - type: mrr_at_3
1139
+ value: 59.272999999999996
1140
+ - type: mrr_at_5
1141
+ value: 61.138999999999996
1142
+ - type: ndcg_at_1
1143
+ value: 50.358000000000004
1144
+ - type: ndcg_at_10
1145
+ value: 48.362
1146
+ - type: ndcg_at_100
1147
+ value: 55.932
1148
+ - type: ndcg_at_1000
1149
+ value: 58.062999999999995
1150
+ - type: ndcg_at_3
1151
+ value: 42.111
1152
+ - type: ndcg_at_5
1153
+ value: 44.063
1154
+ - type: precision_at_1
1155
+ value: 50.358000000000004
1156
+ - type: precision_at_10
1157
+ value: 14.677999999999999
1158
+ - type: precision_at_100
1159
+ value: 2.2950000000000004
1160
+ - type: precision_at_1000
1161
+ value: 0.271
1162
+ - type: precision_at_3
1163
+ value: 31.77
1164
+ - type: precision_at_5
1165
+ value: 23.375
1166
+ - type: recall_at_1
1167
+ value: 21.795
1168
+ - type: recall_at_10
1169
+ value: 53.846000000000004
1170
+ - type: recall_at_100
1171
+ value: 78.952
1172
+ - type: recall_at_1000
1173
+ value: 90.41900000000001
1174
+ - type: recall_at_3
1175
+ value: 37.257
1176
+ - type: recall_at_5
1177
+ value: 44.661
1178
+ task:
1179
+ type: Retrieval
1180
+ - dataset:
1181
+ config: default
1182
+ name: MTEB DBPedia
1183
+ revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659
1184
+ split: test
1185
+ type: mteb/dbpedia
1186
+ metrics:
1187
+ - type: map_at_1
1188
+ value: 9.728
1189
+ - type: map_at_10
1190
+ value: 22.691
1191
+ - type: map_at_100
1192
+ value: 31.734
1193
+ - type: map_at_1000
1194
+ value: 33.464
1195
+ - type: map_at_3
1196
+ value: 16.273
1197
+ - type: map_at_5
1198
+ value: 19.016
1199
+ - type: mrr_at_1
1200
+ value: 73.25
1201
+ - type: mrr_at_10
1202
+ value: 80.782
1203
+ - type: mrr_at_100
1204
+ value: 81.01899999999999
1205
+ - type: mrr_at_1000
1206
+ value: 81.021
1207
+ - type: mrr_at_3
1208
+ value: 79.583
1209
+ - type: mrr_at_5
1210
+ value: 80.146
1211
+ - type: ndcg_at_1
1212
+ value: 59.62499999999999
1213
+ - type: ndcg_at_10
1214
+ value: 46.304
1215
+ - type: ndcg_at_100
1216
+ value: 51.23
1217
+ - type: ndcg_at_1000
1218
+ value: 58.048
1219
+ - type: ndcg_at_3
1220
+ value: 51.541000000000004
1221
+ - type: ndcg_at_5
1222
+ value: 48.635
1223
+ - type: precision_at_1
1224
+ value: 73.25
1225
+ - type: precision_at_10
1226
+ value: 36.375
1227
+ - type: precision_at_100
1228
+ value: 11.53
1229
+ - type: precision_at_1000
1230
+ value: 2.23
1231
+ - type: precision_at_3
1232
+ value: 55.583000000000006
1233
+ - type: precision_at_5
1234
+ value: 47.15
1235
+ - type: recall_at_1
1236
+ value: 9.728
1237
+ - type: recall_at_10
1238
+ value: 28.793999999999997
1239
+ - type: recall_at_100
1240
+ value: 57.885
1241
+ - type: recall_at_1000
1242
+ value: 78.759
1243
+ - type: recall_at_3
1244
+ value: 17.79
1245
+ - type: recall_at_5
1246
+ value: 21.733
1247
+ task:
1248
+ type: Retrieval
1249
+ - dataset:
1250
+ config: default
1251
+ name: MTEB EmotionClassification
1252
+ revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
1253
+ split: test
1254
+ type: mteb/emotion
1255
+ metrics:
1256
+ - type: accuracy
1257
+ value: 46.775
1258
+ - type: f1
1259
+ value: 41.89794273264891
1260
+ task:
1261
+ type: Classification
1262
+ - dataset:
1263
+ config: default
1264
+ name: MTEB FEVER
1265
+ revision: bea83ef9e8fb933d90a2f1d5515737465d613e12
1266
+ split: test
1267
+ type: mteb/fever
1268
+ metrics:
1269
+ - type: map_at_1
1270
+ value: 85.378
1271
+ - type: map_at_10
1272
+ value: 91.51
1273
+ - type: map_at_100
1274
+ value: 91.666
1275
+ - type: map_at_1000
1276
+ value: 91.676
1277
+ - type: map_at_3
1278
+ value: 90.757
1279
+ - type: map_at_5
1280
+ value: 91.277
1281
+ - type: mrr_at_1
1282
+ value: 91.839
1283
+ - type: mrr_at_10
1284
+ value: 95.49
1285
+ - type: mrr_at_100
1286
+ value: 95.493
1287
+ - type: mrr_at_1000
1288
+ value: 95.493
1289
+ - type: mrr_at_3
1290
+ value: 95.345
1291
+ - type: mrr_at_5
1292
+ value: 95.47200000000001
1293
+ - type: ndcg_at_1
1294
+ value: 91.839
1295
+ - type: ndcg_at_10
1296
+ value: 93.806
1297
+ - type: ndcg_at_100
1298
+ value: 94.255
1299
+ - type: ndcg_at_1000
1300
+ value: 94.399
1301
+ - type: ndcg_at_3
1302
+ value: 93.027
1303
+ - type: ndcg_at_5
1304
+ value: 93.51
1305
+ - type: precision_at_1
1306
+ value: 91.839
1307
+ - type: precision_at_10
1308
+ value: 10.93
1309
+ - type: precision_at_100
1310
+ value: 1.1400000000000001
1311
+ - type: precision_at_1000
1312
+ value: 0.117
1313
+ - type: precision_at_3
1314
+ value: 34.873
1315
+ - type: precision_at_5
1316
+ value: 21.44
1317
+ - type: recall_at_1
1318
+ value: 85.378
1319
+ - type: recall_at_10
1320
+ value: 96.814
1321
+ - type: recall_at_100
1322
+ value: 98.386
1323
+ - type: recall_at_1000
1324
+ value: 99.21600000000001
1325
+ - type: recall_at_3
1326
+ value: 94.643
1327
+ - type: recall_at_5
1328
+ value: 95.976
1329
+ task:
1330
+ type: Retrieval
1331
+ - dataset:
1332
+ config: default
1333
+ name: MTEB FiQA2018
1334
+ revision: 27a168819829fe9bcd655c2df245fb19452e8e06
1335
+ split: test
1336
+ type: mteb/fiqa
1337
+ metrics:
1338
+ - type: map_at_1
1339
+ value: 32.190000000000005
1340
+ - type: map_at_10
1341
+ value: 53.605000000000004
1342
+ - type: map_at_100
1343
+ value: 55.550999999999995
1344
+ - type: map_at_1000
1345
+ value: 55.665
1346
+ - type: map_at_3
1347
+ value: 46.62
1348
+ - type: map_at_5
1349
+ value: 50.517999999999994
1350
+ - type: mrr_at_1
1351
+ value: 60.34
1352
+ - type: mrr_at_10
1353
+ value: 70.775
1354
+ - type: mrr_at_100
1355
+ value: 71.238
1356
+ - type: mrr_at_1000
1357
+ value: 71.244
1358
+ - type: mrr_at_3
1359
+ value: 68.72399999999999
1360
+ - type: mrr_at_5
1361
+ value: 69.959
1362
+ - type: ndcg_at_1
1363
+ value: 60.34
1364
+ - type: ndcg_at_10
1365
+ value: 63.226000000000006
1366
+ - type: ndcg_at_100
1367
+ value: 68.60300000000001
1368
+ - type: ndcg_at_1000
1369
+ value: 69.901
1370
+ - type: ndcg_at_3
1371
+ value: 58.048
1372
+ - type: ndcg_at_5
1373
+ value: 59.789
1374
+ - type: precision_at_1
1375
+ value: 60.34
1376
+ - type: precision_at_10
1377
+ value: 17.130000000000003
1378
+ - type: precision_at_100
1379
+ value: 2.29
1380
+ - type: precision_at_1000
1381
+ value: 0.256
1382
+ - type: precision_at_3
1383
+ value: 38.323
1384
+ - type: precision_at_5
1385
+ value: 27.87
1386
+ - type: recall_at_1
1387
+ value: 32.190000000000005
1388
+ - type: recall_at_10
1389
+ value: 73.041
1390
+ - type: recall_at_100
1391
+ value: 91.31
1392
+ - type: recall_at_1000
1393
+ value: 98.104
1394
+ - type: recall_at_3
1395
+ value: 53.70399999999999
1396
+ - type: recall_at_5
1397
+ value: 62.358999999999995
1398
+ task:
1399
+ type: Retrieval
1400
+ - dataset:
1401
+ config: default
1402
+ name: MTEB HotpotQA
1403
+ revision: ab518f4d6fcca38d87c25209f94beba119d02014
1404
+ split: test
1405
+ type: mteb/hotpotqa
1406
+ metrics:
1407
+ - type: map_at_1
1408
+ value: 43.511
1409
+ - type: map_at_10
1410
+ value: 58.15
1411
+ - type: map_at_100
1412
+ value: 58.95399999999999
1413
+ - type: map_at_1000
1414
+ value: 59.018
1415
+ - type: map_at_3
1416
+ value: 55.31700000000001
1417
+ - type: map_at_5
1418
+ value: 57.04900000000001
1419
+ - type: mrr_at_1
1420
+ value: 87.022
1421
+ - type: mrr_at_10
1422
+ value: 91.32000000000001
1423
+ - type: mrr_at_100
1424
+ value: 91.401
1425
+ - type: mrr_at_1000
1426
+ value: 91.403
1427
+ - type: mrr_at_3
1428
+ value: 90.77
1429
+ - type: mrr_at_5
1430
+ value: 91.156
1431
+ - type: ndcg_at_1
1432
+ value: 87.022
1433
+ - type: ndcg_at_10
1434
+ value: 68.183
1435
+ - type: ndcg_at_100
1436
+ value: 70.781
1437
+ - type: ndcg_at_1000
1438
+ value: 72.009
1439
+ - type: ndcg_at_3
1440
+ value: 64.334
1441
+ - type: ndcg_at_5
1442
+ value: 66.449
1443
+ - type: precision_at_1
1444
+ value: 87.022
1445
+ - type: precision_at_10
1446
+ value: 13.406
1447
+ - type: precision_at_100
1448
+ value: 1.542
1449
+ - type: precision_at_1000
1450
+ value: 0.17099999999999999
1451
+ - type: precision_at_3
1452
+ value: 39.023
1453
+ - type: precision_at_5
1454
+ value: 25.080000000000002
1455
+ - type: recall_at_1
1456
+ value: 43.511
1457
+ - type: recall_at_10
1458
+ value: 67.02900000000001
1459
+ - type: recall_at_100
1460
+ value: 77.11
1461
+ - type: recall_at_1000
1462
+ value: 85.294
1463
+ - type: recall_at_3
1464
+ value: 58.535000000000004
1465
+ - type: recall_at_5
1466
+ value: 62.70099999999999
1467
+ task:
1468
+ type: Retrieval
1469
+ - dataset:
1470
+ config: default
1471
+ name: MTEB ImdbClassification
1472
+ revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
1473
+ split: test
1474
+ type: mteb/imdb
1475
+ metrics:
1476
+ - type: accuracy
1477
+ value: 92.0996
1478
+ - type: ap
1479
+ value: 87.86206089096373
1480
+ - type: f1
1481
+ value: 92.07554547510763
1482
+ task:
1483
+ type: Classification
1484
+ - dataset:
1485
+ config: default
1486
+ name: MTEB MSMARCO
1487
+ revision: c5a29a104738b98a9e76336939199e264163d4a0
1488
+ split: dev
1489
+ type: mteb/msmarco
1490
+ metrics:
1491
+ - type: map_at_1
1492
+ value: 23.179
1493
+ - type: map_at_10
1494
+ value: 35.86
1495
+ - type: map_at_100
1496
+ value: 37.025999999999996
1497
+ - type: map_at_1000
1498
+ value: 37.068
1499
+ - type: map_at_3
1500
+ value: 31.921
1501
+ - type: map_at_5
1502
+ value: 34.172000000000004
1503
+ - type: mrr_at_1
1504
+ value: 23.926
1505
+ - type: mrr_at_10
1506
+ value: 36.525999999999996
1507
+ - type: mrr_at_100
1508
+ value: 37.627
1509
+ - type: mrr_at_1000
1510
+ value: 37.665
1511
+ - type: mrr_at_3
1512
+ value: 32.653
1513
+ - type: mrr_at_5
1514
+ value: 34.897
1515
+ - type: ndcg_at_1
1516
+ value: 23.910999999999998
1517
+ - type: ndcg_at_10
1518
+ value: 42.927
1519
+ - type: ndcg_at_100
1520
+ value: 48.464
1521
+ - type: ndcg_at_1000
1522
+ value: 49.533
1523
+ - type: ndcg_at_3
1524
+ value: 34.910000000000004
1525
+ - type: ndcg_at_5
1526
+ value: 38.937
1527
+ - type: precision_at_1
1528
+ value: 23.910999999999998
1529
+ - type: precision_at_10
1530
+ value: 6.758
1531
+ - type: precision_at_100
1532
+ value: 0.9520000000000001
1533
+ - type: precision_at_1000
1534
+ value: 0.104
1535
+ - type: precision_at_3
1536
+ value: 14.838000000000001
1537
+ - type: precision_at_5
1538
+ value: 10.934000000000001
1539
+ - type: recall_at_1
1540
+ value: 23.179
1541
+ - type: recall_at_10
1542
+ value: 64.622
1543
+ - type: recall_at_100
1544
+ value: 90.135
1545
+ - type: recall_at_1000
1546
+ value: 98.301
1547
+ - type: recall_at_3
1548
+ value: 42.836999999999996
1549
+ - type: recall_at_5
1550
+ value: 52.512
1551
+ task:
1552
+ type: Retrieval
1553
+ - dataset:
1554
+ config: en
1555
+ name: MTEB MTOPDomainClassification (en)
1556
+ revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
1557
+ split: test
1558
+ type: mteb/mtop_domain
1559
+ metrics:
1560
+ - type: accuracy
1561
+ value: 96.59598723210215
1562
+ - type: f1
1563
+ value: 96.41913500001952
1564
+ task:
1565
+ type: Classification
1566
+ - dataset:
1567
+ config: en
1568
+ name: MTEB MTOPIntentClassification (en)
1569
+ revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
1570
+ split: test
1571
+ type: mteb/mtop_intent
1572
+ metrics:
1573
+ - type: accuracy
1574
+ value: 82.89557683538533
1575
+ - type: f1
1576
+ value: 63.379319722356264
1577
+ task:
1578
+ type: Classification
1579
+ - dataset:
1580
+ config: en
1581
+ name: MTEB MassiveIntentClassification (en)
1582
+ revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
1583
+ split: test
1584
+ type: mteb/amazon_massive_intent
1585
+ metrics:
1586
+ - type: accuracy
1587
+ value: 78.93745796906524
1588
+ - type: f1
1589
+ value: 75.71616541785902
1590
+ task:
1591
+ type: Classification
1592
+ - dataset:
1593
+ config: en
1594
+ name: MTEB MassiveScenarioClassification (en)
1595
+ revision: 7d571f92784cd94a019292a1f45445077d0ef634
1596
+ split: test
1597
+ type: mteb/amazon_massive_scenario
1598
+ metrics:
1599
+ - type: accuracy
1600
+ value: 81.41223940820443
1601
+ - type: f1
1602
+ value: 81.2877893719078
1603
+ task:
1604
+ type: Classification
1605
+ - dataset:
1606
+ config: default
1607
+ name: MTEB MedrxivClusteringP2P
1608
+ revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
1609
+ split: test
1610
+ type: mteb/medrxiv-clustering-p2p
1611
+ metrics:
1612
+ - type: v_measure
1613
+ value: 35.03682528325662
1614
+ task:
1615
+ type: Clustering
1616
+ - dataset:
1617
+ config: default
1618
+ name: MTEB MedrxivClusteringS2S
1619
+ revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
1620
+ split: test
1621
+ type: mteb/medrxiv-clustering-s2s
1622
+ metrics:
1623
+ - type: v_measure
1624
+ value: 32.942529406124
1625
+ task:
1626
+ type: Clustering
1627
+ - dataset:
1628
+ config: default
1629
+ name: MTEB MindSmallReranking
1630
+ revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
1631
+ split: test
1632
+ type: mteb/mind_small
1633
+ metrics:
1634
+ - type: map
1635
+ value: 31.459949660460317
1636
+ - type: mrr
1637
+ value: 32.70509582031616
1638
+ task:
1639
+ type: Reranking
1640
+ - dataset:
1641
+ config: default
1642
+ name: MTEB NFCorpus
1643
+ revision: ec0fa4fe99da2ff19ca1214b7966684033a58814
1644
+ split: test
1645
+ type: mteb/nfcorpus
1646
+ metrics:
1647
+ - type: map_at_1
1648
+ value: 6.497
1649
+ - type: map_at_10
1650
+ value: 13.843
1651
+ - type: map_at_100
1652
+ value: 17.713
1653
+ - type: map_at_1000
1654
+ value: 19.241
1655
+ - type: map_at_3
1656
+ value: 10.096
1657
+ - type: map_at_5
1658
+ value: 11.85
1659
+ - type: mrr_at_1
1660
+ value: 48.916
1661
+ - type: mrr_at_10
1662
+ value: 57.764
1663
+ - type: mrr_at_100
1664
+ value: 58.251
1665
+ - type: mrr_at_1000
1666
+ value: 58.282999999999994
1667
+ - type: mrr_at_3
1668
+ value: 55.623999999999995
1669
+ - type: mrr_at_5
1670
+ value: 57.018
1671
+ - type: ndcg_at_1
1672
+ value: 46.594
1673
+ - type: ndcg_at_10
1674
+ value: 36.945
1675
+ - type: ndcg_at_100
1676
+ value: 34.06
1677
+ - type: ndcg_at_1000
1678
+ value: 43.05
1679
+ - type: ndcg_at_3
1680
+ value: 41.738
1681
+ - type: ndcg_at_5
1682
+ value: 39.330999999999996
1683
+ - type: precision_at_1
1684
+ value: 48.916
1685
+ - type: precision_at_10
1686
+ value: 27.43
1687
+ - type: precision_at_100
1688
+ value: 8.616
1689
+ - type: precision_at_1000
1690
+ value: 2.155
1691
+ - type: precision_at_3
1692
+ value: 39.112
1693
+ - type: precision_at_5
1694
+ value: 33.808
1695
+ - type: recall_at_1
1696
+ value: 6.497
1697
+ - type: recall_at_10
1698
+ value: 18.163
1699
+ - type: recall_at_100
1700
+ value: 34.566
1701
+ - type: recall_at_1000
1702
+ value: 67.15
1703
+ - type: recall_at_3
1704
+ value: 11.100999999999999
1705
+ - type: recall_at_5
1706
+ value: 14.205000000000002
1707
+ task:
1708
+ type: Retrieval
1709
+ - dataset:
1710
+ config: default
1711
+ name: MTEB NQ
1712
+ revision: b774495ed302d8c44a3a7ea25c90dbce03968f31
1713
+ split: test
1714
+ type: mteb/nq
1715
+ metrics:
1716
+ - type: map_at_1
1717
+ value: 31.916
1718
+ - type: map_at_10
1719
+ value: 48.123
1720
+ - type: map_at_100
1721
+ value: 49.103
1722
+ - type: map_at_1000
1723
+ value: 49.131
1724
+ - type: map_at_3
1725
+ value: 43.711
1726
+ - type: map_at_5
1727
+ value: 46.323
1728
+ - type: mrr_at_1
1729
+ value: 36.181999999999995
1730
+ - type: mrr_at_10
1731
+ value: 50.617999999999995
1732
+ - type: mrr_at_100
1733
+ value: 51.329
1734
+ - type: mrr_at_1000
1735
+ value: 51.348000000000006
1736
+ - type: mrr_at_3
1737
+ value: 47.010999999999996
1738
+ - type: mrr_at_5
1739
+ value: 49.175000000000004
1740
+ - type: ndcg_at_1
1741
+ value: 36.181999999999995
1742
+ - type: ndcg_at_10
1743
+ value: 56.077999999999996
1744
+ - type: ndcg_at_100
1745
+ value: 60.037
1746
+ - type: ndcg_at_1000
1747
+ value: 60.63499999999999
1748
+ - type: ndcg_at_3
1749
+ value: 47.859
1750
+ - type: ndcg_at_5
1751
+ value: 52.178999999999995
1752
+ - type: precision_at_1
1753
+ value: 36.181999999999995
1754
+ - type: precision_at_10
1755
+ value: 9.284
1756
+ - type: precision_at_100
1757
+ value: 1.149
1758
+ - type: precision_at_1000
1759
+ value: 0.121
1760
+ - type: precision_at_3
1761
+ value: 22.006999999999998
1762
+ - type: precision_at_5
1763
+ value: 15.695
1764
+ - type: recall_at_1
1765
+ value: 31.916
1766
+ - type: recall_at_10
1767
+ value: 77.771
1768
+ - type: recall_at_100
1769
+ value: 94.602
1770
+ - type: recall_at_1000
1771
+ value: 98.967
1772
+ - type: recall_at_3
1773
+ value: 56.528
1774
+ - type: recall_at_5
1775
+ value: 66.527
1776
+ task:
1777
+ type: Retrieval
1778
+ - dataset:
1779
+ config: default
1780
+ name: MTEB QuoraRetrieval
1781
+ revision: None
1782
+ split: test
1783
+ type: mteb/quora
1784
+ metrics:
1785
+ - type: map_at_1
1786
+ value: 71.486
1787
+ - type: map_at_10
1788
+ value: 85.978
1789
+ - type: map_at_100
1790
+ value: 86.587
1791
+ - type: map_at_1000
1792
+ value: 86.598
1793
+ - type: map_at_3
1794
+ value: 83.04899999999999
1795
+ - type: map_at_5
1796
+ value: 84.857
1797
+ - type: mrr_at_1
1798
+ value: 82.32000000000001
1799
+ - type: mrr_at_10
1800
+ value: 88.64
1801
+ - type: mrr_at_100
1802
+ value: 88.702
1803
+ - type: mrr_at_1000
1804
+ value: 88.702
1805
+ - type: mrr_at_3
1806
+ value: 87.735
1807
+ - type: mrr_at_5
1808
+ value: 88.36
1809
+ - type: ndcg_at_1
1810
+ value: 82.34
1811
+ - type: ndcg_at_10
1812
+ value: 89.67
1813
+ - type: ndcg_at_100
1814
+ value: 90.642
1815
+ - type: ndcg_at_1000
1816
+ value: 90.688
1817
+ - type: ndcg_at_3
1818
+ value: 86.932
1819
+ - type: ndcg_at_5
1820
+ value: 88.408
1821
+ - type: precision_at_1
1822
+ value: 82.34
1823
+ - type: precision_at_10
1824
+ value: 13.675999999999998
1825
+ - type: precision_at_100
1826
+ value: 1.544
1827
+ - type: precision_at_1000
1828
+ value: 0.157
1829
+ - type: precision_at_3
1830
+ value: 38.24
1831
+ - type: precision_at_5
1832
+ value: 25.068
1833
+ - type: recall_at_1
1834
+ value: 71.486
1835
+ - type: recall_at_10
1836
+ value: 96.844
1837
+ - type: recall_at_100
1838
+ value: 99.843
1839
+ - type: recall_at_1000
1840
+ value: 99.996
1841
+ - type: recall_at_3
1842
+ value: 88.92099999999999
1843
+ - type: recall_at_5
1844
+ value: 93.215
1845
+ task:
1846
+ type: Retrieval
1847
+ - dataset:
1848
+ config: default
1849
+ name: MTEB RedditClustering
1850
+ revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
1851
+ split: test
1852
+ type: mteb/reddit-clustering
1853
+ metrics:
1854
+ - type: v_measure
1855
+ value: 59.75758437908334
1856
+ task:
1857
+ type: Clustering
1858
+ - dataset:
1859
+ config: default
1860
+ name: MTEB RedditClusteringP2P
1861
+ revision: 282350215ef01743dc01b456c7f5241fa8937f16
1862
+ split: test
1863
+ type: mteb/reddit-clustering-p2p
1864
+ metrics:
1865
+ - type: v_measure
1866
+ value: 68.03497914092789
1867
+ task:
1868
+ type: Clustering
1869
+ - dataset:
1870
+ config: default
1871
+ name: MTEB SCIDOCS
1872
+ revision: None
1873
+ split: test
1874
+ type: mteb/scidocs
1875
+ metrics:
1876
+ - type: map_at_1
1877
+ value: 5.808
1878
+ - type: map_at_10
1879
+ value: 16.059
1880
+ - type: map_at_100
1881
+ value: 19.048000000000002
1882
+ - type: map_at_1000
1883
+ value: 19.43
1884
+ - type: map_at_3
1885
+ value: 10.953
1886
+ - type: map_at_5
1887
+ value: 13.363
1888
+ - type: mrr_at_1
1889
+ value: 28.7
1890
+ - type: mrr_at_10
1891
+ value: 42.436
1892
+ - type: mrr_at_100
1893
+ value: 43.599
1894
+ - type: mrr_at_1000
1895
+ value: 43.62
1896
+ - type: mrr_at_3
1897
+ value: 38.45
1898
+ - type: mrr_at_5
1899
+ value: 40.89
1900
+ - type: ndcg_at_1
1901
+ value: 28.7
1902
+ - type: ndcg_at_10
1903
+ value: 26.346000000000004
1904
+ - type: ndcg_at_100
1905
+ value: 36.758
1906
+ - type: ndcg_at_1000
1907
+ value: 42.113
1908
+ - type: ndcg_at_3
1909
+ value: 24.254
1910
+ - type: ndcg_at_5
1911
+ value: 21.506
1912
+ - type: precision_at_1
1913
+ value: 28.7
1914
+ - type: precision_at_10
1915
+ value: 13.969999999999999
1916
+ - type: precision_at_100
1917
+ value: 2.881
1918
+ - type: precision_at_1000
1919
+ value: 0.414
1920
+ - type: precision_at_3
1921
+ value: 22.933
1922
+ - type: precision_at_5
1923
+ value: 19.220000000000002
1924
+ - type: recall_at_1
1925
+ value: 5.808
1926
+ - type: recall_at_10
1927
+ value: 28.310000000000002
1928
+ - type: recall_at_100
1929
+ value: 58.475
1930
+ - type: recall_at_1000
1931
+ value: 84.072
1932
+ - type: recall_at_3
1933
+ value: 13.957
1934
+ - type: recall_at_5
1935
+ value: 19.515
1936
+ task:
1937
+ type: Retrieval
1938
+ - dataset:
1939
+ config: default
1940
+ name: MTEB SICK-R
1941
+ revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
1942
+ split: test
1943
+ type: mteb/sickr-sts
1944
+ metrics:
1945
+ - type: cos_sim_pearson
1946
+ value: 82.39274129958557
1947
+ - type: cos_sim_spearman
1948
+ value: 79.78021235170053
1949
+ - type: euclidean_pearson
1950
+ value: 79.35335401300166
1951
+ - type: euclidean_spearman
1952
+ value: 79.7271870968275
1953
+ - type: manhattan_pearson
1954
+ value: 79.35256263340601
1955
+ - type: manhattan_spearman
1956
+ value: 79.76036386976321
1957
+ task:
1958
+ type: STS
1959
+ - dataset:
1960
+ config: default
1961
+ name: MTEB STS12
1962
+ revision: a0d554a64d88156834ff5ae9920b964011b16384
1963
+ split: test
1964
+ type: mteb/sts12-sts
1965
+ metrics:
1966
+ - type: cos_sim_pearson
1967
+ value: 83.99130429246708
1968
+ - type: cos_sim_spearman
1969
+ value: 73.88322811171203
1970
+ - type: euclidean_pearson
1971
+ value: 80.7569419170376
1972
+ - type: euclidean_spearman
1973
+ value: 73.82542155409597
1974
+ - type: manhattan_pearson
1975
+ value: 80.79468183847625
1976
+ - type: manhattan_spearman
1977
+ value: 73.87027144047784
1978
+ task:
1979
+ type: STS
1980
+ - dataset:
1981
+ config: default
1982
+ name: MTEB STS13
1983
+ revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
1984
+ split: test
1985
+ type: mteb/sts13-sts
1986
+ metrics:
1987
+ - type: cos_sim_pearson
1988
+ value: 84.88548789489907
1989
+ - type: cos_sim_spearman
1990
+ value: 85.07535893847255
1991
+ - type: euclidean_pearson
1992
+ value: 84.6637222061494
1993
+ - type: euclidean_spearman
1994
+ value: 85.14200626702456
1995
+ - type: manhattan_pearson
1996
+ value: 84.75327892344734
1997
+ - type: manhattan_spearman
1998
+ value: 85.24406181838596
1999
+ task:
2000
+ type: STS
2001
+ - dataset:
2002
+ config: default
2003
+ name: MTEB STS14
2004
+ revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
2005
+ split: test
2006
+ type: mteb/sts14-sts
2007
+ metrics:
2008
+ - type: cos_sim_pearson
2009
+ value: 82.88140039325008
2010
+ - type: cos_sim_spearman
2011
+ value: 79.61211268112362
2012
+ - type: euclidean_pearson
2013
+ value: 81.29639728816458
2014
+ - type: euclidean_spearman
2015
+ value: 79.51284578041442
2016
+ - type: manhattan_pearson
2017
+ value: 81.3381797137111
2018
+ - type: manhattan_spearman
2019
+ value: 79.55683684039808
2020
+ task:
2021
+ type: STS
2022
+ - dataset:
2023
+ config: default
2024
+ name: MTEB STS15
2025
+ revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
2026
+ split: test
2027
+ type: mteb/sts15-sts
2028
+ metrics:
2029
+ - type: cos_sim_pearson
2030
+ value: 85.16716737270485
2031
+ - type: cos_sim_spearman
2032
+ value: 86.14823841857738
2033
+ - type: euclidean_pearson
2034
+ value: 85.36325733440725
2035
+ - type: euclidean_spearman
2036
+ value: 86.04919691402029
2037
+ - type: manhattan_pearson
2038
+ value: 85.3147511385052
2039
+ - type: manhattan_spearman
2040
+ value: 86.00676205857764
2041
+ task:
2042
+ type: STS
2043
+ - dataset:
2044
+ config: default
2045
+ name: MTEB STS16
2046
+ revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
2047
+ split: test
2048
+ type: mteb/sts16-sts
2049
+ metrics:
2050
+ - type: cos_sim_pearson
2051
+ value: 80.34266645861588
2052
+ - type: cos_sim_spearman
2053
+ value: 81.59914035005882
2054
+ - type: euclidean_pearson
2055
+ value: 81.15053076245988
2056
+ - type: euclidean_spearman
2057
+ value: 81.52776915798489
2058
+ - type: manhattan_pearson
2059
+ value: 81.1819647418673
2060
+ - type: manhattan_spearman
2061
+ value: 81.57479527353556
2062
+ task:
2063
+ type: STS
2064
+ - dataset:
2065
+ config: en-en
2066
+ name: MTEB STS17 (en-en)
2067
+ revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
2068
+ split: test
2069
+ type: mteb/sts17-crosslingual-sts
2070
+ metrics:
2071
+ - type: cos_sim_pearson
2072
+ value: 89.38263326821439
2073
+ - type: cos_sim_spearman
2074
+ value: 89.10946308202642
2075
+ - type: euclidean_pearson
2076
+ value: 88.87831312540068
2077
+ - type: euclidean_spearman
2078
+ value: 89.03615865973664
2079
+ - type: manhattan_pearson
2080
+ value: 88.79835539970384
2081
+ - type: manhattan_spearman
2082
+ value: 88.9766156339753
2083
+ task:
2084
+ type: STS
2085
+ - dataset:
2086
+ config: en
2087
+ name: MTEB STS22 (en)
2088
+ revision: eea2b4fe26a775864c896887d910b76a8098ad3f
2089
+ split: test
2090
+ type: mteb/sts22-crosslingual-sts
2091
+ metrics:
2092
+ - type: cos_sim_pearson
2093
+ value: 70.1574915581685
2094
+ - type: cos_sim_spearman
2095
+ value: 70.59144980004054
2096
+ - type: euclidean_pearson
2097
+ value: 71.43246306918755
2098
+ - type: euclidean_spearman
2099
+ value: 70.5544189562984
2100
+ - type: manhattan_pearson
2101
+ value: 71.4071414609503
2102
+ - type: manhattan_spearman
2103
+ value: 70.31799126163712
2104
+ task:
2105
+ type: STS
2106
+ - dataset:
2107
+ config: default
2108
+ name: MTEB STSBenchmark
2109
+ revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
2110
+ split: test
2111
+ type: mteb/stsbenchmark-sts
2112
+ metrics:
2113
+ - type: cos_sim_pearson
2114
+ value: 83.36215796635351
2115
+ - type: cos_sim_spearman
2116
+ value: 83.07276756467208
2117
+ - type: euclidean_pearson
2118
+ value: 83.06690453635584
2119
+ - type: euclidean_spearman
2120
+ value: 82.9635366303289
2121
+ - type: manhattan_pearson
2122
+ value: 83.04994049700815
2123
+ - type: manhattan_spearman
2124
+ value: 82.98120125356036
2125
+ task:
2126
+ type: STS
2127
+ - dataset:
2128
+ config: default
2129
+ name: MTEB SciDocsRR
2130
+ revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
2131
+ split: test
2132
+ type: mteb/scidocs-reranking
2133
+ metrics:
2134
+ - type: map
2135
+ value: 86.92530011616722
2136
+ - type: mrr
2137
+ value: 96.21826793395421
2138
+ task:
2139
+ type: Reranking
2140
+ - dataset:
2141
+ config: default
2142
+ name: MTEB SciFact
2143
+ revision: 0228b52cf27578f30900b9e5271d331663a030d7
2144
+ split: test
2145
+ type: mteb/scifact
2146
+ metrics:
2147
+ - type: map_at_1
2148
+ value: 65.75
2149
+ - type: map_at_10
2150
+ value: 77.701
2151
+ - type: map_at_100
2152
+ value: 78.005
2153
+ - type: map_at_1000
2154
+ value: 78.006
2155
+ - type: map_at_3
2156
+ value: 75.48
2157
+ - type: map_at_5
2158
+ value: 76.927
2159
+ - type: mrr_at_1
2160
+ value: 68.333
2161
+ - type: mrr_at_10
2162
+ value: 78.511
2163
+ - type: mrr_at_100
2164
+ value: 78.704
2165
+ - type: mrr_at_1000
2166
+ value: 78.704
2167
+ - type: mrr_at_3
2168
+ value: 77
2169
+ - type: mrr_at_5
2170
+ value: 78.083
2171
+ - type: ndcg_at_1
2172
+ value: 68.333
2173
+ - type: ndcg_at_10
2174
+ value: 82.42699999999999
2175
+ - type: ndcg_at_100
2176
+ value: 83.486
2177
+ - type: ndcg_at_1000
2178
+ value: 83.511
2179
+ - type: ndcg_at_3
2180
+ value: 78.96300000000001
2181
+ - type: ndcg_at_5
2182
+ value: 81.028
2183
+ - type: precision_at_1
2184
+ value: 68.333
2185
+ - type: precision_at_10
2186
+ value: 10.667
2187
+ - type: precision_at_100
2188
+ value: 1.127
2189
+ - type: precision_at_1000
2190
+ value: 0.11299999999999999
2191
+ - type: precision_at_3
2192
+ value: 31.333
2193
+ - type: precision_at_5
2194
+ value: 20.133000000000003
2195
+ - type: recall_at_1
2196
+ value: 65.75
2197
+ - type: recall_at_10
2198
+ value: 95.578
2199
+ - type: recall_at_100
2200
+ value: 99.833
2201
+ - type: recall_at_1000
2202
+ value: 100
2203
+ - type: recall_at_3
2204
+ value: 86.506
2205
+ - type: recall_at_5
2206
+ value: 91.75
2207
+ task:
2208
+ type: Retrieval
2209
+ - dataset:
2210
+ config: default
2211
+ name: MTEB SprintDuplicateQuestions
2212
+ revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
2213
+ split: test
2214
+ type: mteb/sprintduplicatequestions-pairclassification
2215
+ metrics:
2216
+ - type: cos_sim_accuracy
2217
+ value: 99.75247524752476
2218
+ - type: cos_sim_ap
2219
+ value: 94.16065078045173
2220
+ - type: cos_sim_f1
2221
+ value: 87.22986247544205
2222
+ - type: cos_sim_precision
2223
+ value: 85.71428571428571
2224
+ - type: cos_sim_recall
2225
+ value: 88.8
2226
+ - type: dot_accuracy
2227
+ value: 99.74554455445545
2228
+ - type: dot_ap
2229
+ value: 93.90633887037264
2230
+ - type: dot_f1
2231
+ value: 86.9873417721519
2232
+ - type: dot_precision
2233
+ value: 88.1025641025641
2234
+ - type: dot_recall
2235
+ value: 85.9
2236
+ - type: euclidean_accuracy
2237
+ value: 99.75247524752476
2238
+ - type: euclidean_ap
2239
+ value: 94.17466319018055
2240
+ - type: euclidean_f1
2241
+ value: 87.3405299313052
2242
+ - type: euclidean_precision
2243
+ value: 85.74181117533719
2244
+ - type: euclidean_recall
2245
+ value: 89
2246
+ - type: manhattan_accuracy
2247
+ value: 99.75445544554455
2248
+ - type: manhattan_ap
2249
+ value: 94.27688371923577
2250
+ - type: manhattan_f1
2251
+ value: 87.74002954209749
2252
+ - type: manhattan_precision
2253
+ value: 86.42095053346266
2254
+ - type: manhattan_recall
2255
+ value: 89.1
2256
+ - type: max_accuracy
2257
+ value: 99.75445544554455
2258
+ - type: max_ap
2259
+ value: 94.27688371923577
2260
+ - type: max_f1
2261
+ value: 87.74002954209749
2262
+ task:
2263
+ type: PairClassification
2264
+ - dataset:
2265
+ config: default
2266
+ name: MTEB StackExchangeClustering
2267
+ revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
2268
+ split: test
2269
+ type: mteb/stackexchange-clustering
2270
+ metrics:
2271
+ - type: v_measure
2272
+ value: 71.26500637517056
2273
+ task:
2274
+ type: Clustering
2275
+ - dataset:
2276
+ config: default
2277
+ name: MTEB StackExchangeClusteringP2P
2278
+ revision: 815ca46b2622cec33ccafc3735d572c266efdb44
2279
+ split: test
2280
+ type: mteb/stackexchange-clustering-p2p
2281
+ metrics:
2282
+ - type: v_measure
2283
+ value: 39.17507906280528
2284
+ task:
2285
+ type: Clustering
2286
+ - dataset:
2287
+ config: default
2288
+ name: MTEB StackOverflowDupQuestions
2289
+ revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
2290
+ split: test
2291
+ type: mteb/stackoverflowdupquestions-reranking
2292
+ metrics:
2293
+ - type: map
2294
+ value: 52.4848744828509
2295
+ - type: mrr
2296
+ value: 53.33678168236992
2297
+ task:
2298
+ type: Reranking
2299
+ - dataset:
2300
+ config: default
2301
+ name: MTEB SummEval
2302
+ revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
2303
+ split: test
2304
+ type: mteb/summeval
2305
+ metrics:
2306
+ - type: cos_sim_pearson
2307
+ value: 30.599864323827887
2308
+ - type: cos_sim_spearman
2309
+ value: 30.91116204665598
2310
+ - type: dot_pearson
2311
+ value: 30.82637894269936
2312
+ - type: dot_spearman
2313
+ value: 30.957573868416066
2314
+ task:
2315
+ type: Summarization
2316
+ - dataset:
2317
+ config: default
2318
+ name: MTEB TRECCOVID
2319
+ revision: None
2320
+ split: test
2321
+ type: mteb/trec-covid
2322
+ metrics:
2323
+ - type: map_at_1
2324
+ value: 0.23600000000000002
2325
+ - type: map_at_10
2326
+ value: 1.892
2327
+ - type: map_at_100
2328
+ value: 11.586
2329
+ - type: map_at_1000
2330
+ value: 27.761999999999997
2331
+ - type: map_at_3
2332
+ value: 0.653
2333
+ - type: map_at_5
2334
+ value: 1.028
2335
+ - type: mrr_at_1
2336
+ value: 88
2337
+ - type: mrr_at_10
2338
+ value: 94
2339
+ - type: mrr_at_100
2340
+ value: 94
2341
+ - type: mrr_at_1000
2342
+ value: 94
2343
+ - type: mrr_at_3
2344
+ value: 94
2345
+ - type: mrr_at_5
2346
+ value: 94
2347
+ - type: ndcg_at_1
2348
+ value: 82
2349
+ - type: ndcg_at_10
2350
+ value: 77.48899999999999
2351
+ - type: ndcg_at_100
2352
+ value: 60.141
2353
+ - type: ndcg_at_1000
2354
+ value: 54.228
2355
+ - type: ndcg_at_3
2356
+ value: 82.358
2357
+ - type: ndcg_at_5
2358
+ value: 80.449
2359
+ - type: precision_at_1
2360
+ value: 88
2361
+ - type: precision_at_10
2362
+ value: 82.19999999999999
2363
+ - type: precision_at_100
2364
+ value: 61.760000000000005
2365
+ - type: precision_at_1000
2366
+ value: 23.684
2367
+ - type: precision_at_3
2368
+ value: 88
2369
+ - type: precision_at_5
2370
+ value: 85.6
2371
+ - type: recall_at_1
2372
+ value: 0.23600000000000002
2373
+ - type: recall_at_10
2374
+ value: 2.117
2375
+ - type: recall_at_100
2376
+ value: 14.985000000000001
2377
+ - type: recall_at_1000
2378
+ value: 51.107
2379
+ - type: recall_at_3
2380
+ value: 0.688
2381
+ - type: recall_at_5
2382
+ value: 1.1039999999999999
2383
+ task:
2384
+ type: Retrieval
2385
+ - dataset:
2386
+ config: default
2387
+ name: MTEB Touche2020
2388
+ revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f
2389
+ split: test
2390
+ type: mteb/touche2020
2391
+ metrics:
2392
+ - type: map_at_1
2393
+ value: 2.3040000000000003
2394
+ - type: map_at_10
2395
+ value: 9.025
2396
+ - type: map_at_100
2397
+ value: 15.312999999999999
2398
+ - type: map_at_1000
2399
+ value: 16.954
2400
+ - type: map_at_3
2401
+ value: 4.981
2402
+ - type: map_at_5
2403
+ value: 6.32
2404
+ - type: mrr_at_1
2405
+ value: 24.490000000000002
2406
+ - type: mrr_at_10
2407
+ value: 39.835
2408
+ - type: mrr_at_100
2409
+ value: 40.8
2410
+ - type: mrr_at_1000
2411
+ value: 40.8
2412
+ - type: mrr_at_3
2413
+ value: 35.034
2414
+ - type: mrr_at_5
2415
+ value: 37.687
2416
+ - type: ndcg_at_1
2417
+ value: 22.448999999999998
2418
+ - type: ndcg_at_10
2419
+ value: 22.545
2420
+ - type: ndcg_at_100
2421
+ value: 35.931999999999995
2422
+ - type: ndcg_at_1000
2423
+ value: 47.665
2424
+ - type: ndcg_at_3
2425
+ value: 23.311
2426
+ - type: ndcg_at_5
2427
+ value: 22.421
2428
+ - type: precision_at_1
2429
+ value: 24.490000000000002
2430
+ - type: precision_at_10
2431
+ value: 20.408
2432
+ - type: precision_at_100
2433
+ value: 7.815999999999999
2434
+ - type: precision_at_1000
2435
+ value: 1.553
2436
+ - type: precision_at_3
2437
+ value: 25.169999999999998
2438
+ - type: precision_at_5
2439
+ value: 23.265
2440
+ - type: recall_at_1
2441
+ value: 2.3040000000000003
2442
+ - type: recall_at_10
2443
+ value: 15.693999999999999
2444
+ - type: recall_at_100
2445
+ value: 48.917
2446
+ - type: recall_at_1000
2447
+ value: 84.964
2448
+ - type: recall_at_3
2449
+ value: 6.026
2450
+ - type: recall_at_5
2451
+ value: 9.066
2452
+ task:
2453
+ type: Retrieval
2454
+ - dataset:
2455
+ config: default
2456
+ name: MTEB ToxicConversationsClassification
2457
+ revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
2458
+ split: test
2459
+ type: mteb/toxic_conversations_50k
2460
+ metrics:
2461
+ - type: accuracy
2462
+ value: 82.6074
2463
+ - type: ap
2464
+ value: 23.187467098602013
2465
+ - type: f1
2466
+ value: 65.36829506379657
2467
+ task:
2468
+ type: Classification
2469
+ - dataset:
2470
+ config: default
2471
+ name: MTEB TweetSentimentExtractionClassification
2472
+ revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
2473
+ split: test
2474
+ type: mteb/tweet_sentiment_extraction
2475
+ metrics:
2476
+ - type: accuracy
2477
+ value: 63.16355404640635
2478
+ - type: f1
2479
+ value: 63.534725639863346
2480
+ task:
2481
+ type: Classification
2482
+ - dataset:
2483
+ config: default
2484
+ name: MTEB TwentyNewsgroupsClustering
2485
+ revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
2486
+ split: test
2487
+ type: mteb/twentynewsgroups-clustering
2488
+ metrics:
2489
+ - type: v_measure
2490
+ value: 50.91004094411276
2491
+ task:
2492
+ type: Clustering
2493
+ - dataset:
2494
+ config: default
2495
+ name: MTEB TwitterSemEval2015
2496
+ revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
2497
+ split: test
2498
+ type: mteb/twittersemeval2015-pairclassification
2499
+ metrics:
2500
+ - type: cos_sim_accuracy
2501
+ value: 86.55301901412649
2502
+ - type: cos_sim_ap
2503
+ value: 75.25312618556728
2504
+ - type: cos_sim_f1
2505
+ value: 68.76561719140429
2506
+ - type: cos_sim_precision
2507
+ value: 65.3061224489796
2508
+ - type: cos_sim_recall
2509
+ value: 72.61213720316623
2510
+ - type: dot_accuracy
2511
+ value: 86.29671574178936
2512
+ - type: dot_ap
2513
+ value: 75.11910195501207
2514
+ - type: dot_f1
2515
+ value: 68.44048376830045
2516
+ - type: dot_precision
2517
+ value: 66.12546125461255
2518
+ - type: dot_recall
2519
+ value: 70.92348284960423
2520
+ - type: euclidean_accuracy
2521
+ value: 86.5828217202122
2522
+ - type: euclidean_ap
2523
+ value: 75.22986344900924
2524
+ - type: euclidean_f1
2525
+ value: 68.81267797449549
2526
+ - type: euclidean_precision
2527
+ value: 64.8238861674831
2528
+ - type: euclidean_recall
2529
+ value: 73.3245382585752
2530
+ - type: manhattan_accuracy
2531
+ value: 86.61262442629791
2532
+ - type: manhattan_ap
2533
+ value: 75.24401608557328
2534
+ - type: manhattan_f1
2535
+ value: 68.80473982483257
2536
+ - type: manhattan_precision
2537
+ value: 67.21187720181177
2538
+ - type: manhattan_recall
2539
+ value: 70.47493403693932
2540
+ - type: max_accuracy
2541
+ value: 86.61262442629791
2542
+ - type: max_ap
2543
+ value: 75.25312618556728
2544
+ - type: max_f1
2545
+ value: 68.81267797449549
2546
+ task:
2547
+ type: PairClassification
2548
+ - dataset:
2549
+ config: default
2550
+ name: MTEB TwitterURLCorpus
2551
+ revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
2552
+ split: test
2553
+ type: mteb/twitterurlcorpus-pairclassification
2554
+ metrics:
2555
+ - type: cos_sim_accuracy
2556
+ value: 88.10688089416696
2557
+ - type: cos_sim_ap
2558
+ value: 84.17862178779863
2559
+ - type: cos_sim_f1
2560
+ value: 76.17305208781748
2561
+ - type: cos_sim_precision
2562
+ value: 71.31246641590543
2563
+ - type: cos_sim_recall
2564
+ value: 81.74468740375731
2565
+ - type: dot_accuracy
2566
+ value: 88.1844995536927
2567
+ - type: dot_ap
2568
+ value: 84.33816725235876
2569
+ - type: dot_f1
2570
+ value: 76.43554032918746
2571
+ - type: dot_precision
2572
+ value: 74.01557767200346
2573
+ - type: dot_recall
2574
+ value: 79.0190945488143
2575
+ - type: euclidean_accuracy
2576
+ value: 88.07001203089223
2577
+ - type: euclidean_ap
2578
+ value: 84.12267000814985
2579
+ - type: euclidean_f1
2580
+ value: 76.12232600180778
2581
+ - type: euclidean_precision
2582
+ value: 74.50604541433205
2583
+ - type: euclidean_recall
2584
+ value: 77.81028641823221
2585
+ - type: manhattan_accuracy
2586
+ value: 88.06419063142779
2587
+ - type: manhattan_ap
2588
+ value: 84.11648917164187
2589
+ - type: manhattan_f1
2590
+ value: 76.20579953925474
2591
+ - type: manhattan_precision
2592
+ value: 72.56772755762935
2593
+ - type: manhattan_recall
2594
+ value: 80.22790267939637
2595
+ - type: max_accuracy
2596
+ value: 88.1844995536927
2597
+ - type: max_ap
2598
+ value: 84.33816725235876
2599
+ - type: max_f1
2600
+ value: 76.43554032918746
2601
+ task:
2602
+ type: PairClassification
2603
+ tags:
2604
+ - sentence-transformers
2605
+ - gte
2606
+ - mteb
2607
+ - transformers.js
2608
+ - sentence-similarity
2609
+ - onnx
2610
+ - teradata
2611
+
2612
+ ---
2613
+ # A Teradata Vantage compatible Embeddings Model
2614
+
2615
+ # Alibaba-NLP/gte-large-en-v1.5
2616
+
2617
+ ## Overview of this Model
2618
+
2619
+ An embedding model that maps text (sentences/paragraphs) into a vector. The [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) model is well known for its effectiveness in capturing semantic meaning in text data. It is a state-of-the-art model trained on a large corpus and is capable of generating high-quality text embeddings.
2620
+
2621
+ - 434.14M params (Sizes in ONNX format - "fp32": 1664.98MB, "int8": 424.0MB, "uint8": 424.0MB)
2622
+ - 8192 maximum input tokens
2623
+ - 1024 dimensions of output vector
2624
+ - License: apache-2.0. The released models can be used for commercial purposes free of charge.
2625
+ - Reference to Original Model: https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5
2626
+
2627
+
2628
+ ## Quickstart: Deploying this Model in Teradata Vantage
2629
+
2630
+ We have pre-converted the model into the ONNX format compatible with BYOM 6.0, eliminating the need for manual conversion.
2631
+
2632
+ **Note:** Ensure you have access to a Teradata Database with BYOM 6.0 installed.
2633
+
2634
+ To get started, download the pre-converted model files directly from the Teradata Hugging Face repository.
2635
+
2636
+
2637
+ ```python
2638
+
2639
+ import teradataml as tdml
2640
+ import getpass
2641
+ from huggingface_hub import hf_hub_download
2642
+
2643
+ model_name = "gte-large-en-v1.5"
2644
+ number_dimensions_output = 1024
2645
+ model_file_name = "model.onnx"
2646
+
2647
+ # Step 1: Download Model from Teradata HuggingFace Page
2648
+
2649
+ hf_hub_download(repo_id=f"Teradata/{model_name}", filename=f"onnx/{model_file_name}", local_dir="./")
2650
+ hf_hub_download(repo_id=f"Teradata/{model_name}", filename=f"tokenizer.json", local_dir="./")
2651
+
2652
+ # Step 2: Create Connection to Vantage
2653
+
2654
+ tdml.create_context(host = input('enter your hostname'),
2655
+ username=input('enter your username'),
2656
+ password = getpass.getpass("enter your password"))
2657
+
2658
+ # Step 3: Load Models into Vantage
2659
+ # a) Embedding model
2660
+ tdml.save_byom(model_id = model_name, # must be unique in the models table
2661
+ model_file = model_file_name,
2662
+ table_name = 'embeddings_models' )
2663
+ # b) Tokenizer
2664
+ tdml.save_byom(model_id = model_name, # must be unique in the models table
2665
+ model_file = 'tokenizer.json',
2666
+ table_name = 'embeddings_tokenizers')
2667
+
2668
+ # Step 4: Test ONNXEmbeddings Function
2669
+ # Note that ONNXEmbeddings expects the input text (payload) column to be named 'txt'.
2670
+ # If it has a different name, just rename it in a subquery/CTE.
2671
+ input_table = "emails.emails"
2672
+ embeddings_query = f"""
2673
+ SELECT
2674
+ *
2675
+ from mldb.ONNXEmbeddings(
2676
+ on {input_table} as InputTable
2677
+ on (select * from embeddings_models where model_id = '{model_name}') as ModelTable DIMENSION
2678
+ on (select model as tokenizer from embeddings_tokenizers where model_id = '{model_name}') as TokenizerTable DIMENSION
2679
+ using
2680
+ Accumulate('id', 'txt')
2681
+ ModelOutputTensor('sentence_embedding')
2682
+ EnableMemoryCheck('false')
2683
+ OutputFormat('FLOAT32({number_dimensions_output})')
2684
+ OverwriteCachedModel('true')
2685
+ ) a
2686
+ """
2687
+ DF_embeddings = tdml.DataFrame.from_query(embeddings_query)
2688
+ DF_embeddings
2689
+ ```
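+
+ If you plan to reuse the embeddings (for clustering, semantic search, etc.), it can be convenient to persist them instead of recomputing them on every query. Below is a minimal sketch, assuming the `embeddings_query` defined above; the table name `emails_embeddings_store` and the choice of a volatile table are illustrative.
+
+ ```python
+ # Persist the embeddings so downstream functions (TD_VectorDistance, TD_KMeans, ...)
+ # can reuse them. Sketch only: table name and volatile vs. permanent are up to you.
+ tdml.execute_sql(f"""
+ CREATE VOLATILE TABLE emails_embeddings_store AS
+ (
+     {embeddings_query}
+ ) WITH DATA ON COMMIT PRESERVE ROWS
+ """)
+
+ tdf_embeddings_store = tdml.DataFrame('emails_embeddings_store')
+ ```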
2690
+
2691
+
2692
+
2693
+ ## What Can I Do with the Embeddings?
2694
+
2695
+ Teradata Vantage includes pre-built in-database functions to process embeddings further. Explore the following examples:
2696
+
2697
+ - **Semantic Clustering with TD_KMeans:** [Semantic Clustering Python Notebook](https://github.com/Teradata/jupyter-demos/blob/main/UseCases/Language_Models_InVantage/Semantic_Clustering_Python.ipynb)
2698
+ - **Semantic Distance with TD_VectorDistance:** [Semantic Similarity Python Notebook](https://github.com/Teradata/jupyter-demos/blob/main/UseCases/Language_Models_InVantage/Semantic_Similarity_Python.ipynb) (a minimal SQL sketch follows this list)
2699
+ - **RAG-Based Application with TD_VectorDistance:** [RAG and Bedrock Query PDF Notebook](https://github.com/Teradata/jupyter-demos/blob/main/UseCases/Language_Models_InVantage/RAG_and_Bedrock_QueryPDF.ipynb)
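+
+ As a concrete illustration of the TD_VectorDistance item above, the sketch below runs a small semantic search over stored embeddings. It is a sketch only: it assumes the embeddings were persisted to `emails_embeddings_store` (e.g., as in the sketch after the Quickstart) with 1024 embedding columns `emb_0` ... `emb_1023`; the target id and `topk` value are arbitrary.
+
+ ```python
+ # Find the 3 most similar reference e-mails for the e-mail with id = 3,
+ # using cosine similarity over the stored embedding columns.
+ target_id = 3
+ similarity_query = f"""
+ SELECT
+     dt.target_id,
+     dt.reference_id,
+     (1.0 - dt.distance) AS similarity
+ FROM TD_VECTORDISTANCE (
+     ON (SELECT * FROM emails_embeddings_store WHERE id = {target_id}) AS TargetTable
+     ON (SELECT * FROM emails_embeddings_store WHERE id <> {target_id}) AS ReferenceTable DIMENSION
+     USING
+     TargetIDColumn('id')
+     TargetFeatureColumns('[emb_0:emb_1023]')
+     RefIDColumn('id')
+     RefFeatureColumns('[emb_0:emb_1023]')
+     DistanceMeasure('cosine')
+     topk(3)
+ ) AS dt
+ """
+ DF_similarity = tdml.DataFrame.from_query(similarity_query)
+ DF_similarity
+ ```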
2700
+
2701
+
2702
+ ## Deep Dive into Model Conversion to ONNX
2703
+
2704
+ **The steps below outline how we converted the open-source Hugging Face model into an ONNX file compatible with the in-database ONNXEmbeddings function.**
2705
+
2706
+ You do not need to perform these steps—they are provided solely for documentation and transparency. However, they may be helpful if you wish to convert another model to the required format.
2707
+
2708
+
2709
+ ### Part 1. Importing and Converting Model using optimum
2710
+
2711
+ We start by importing the pre-trained [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) model from Hugging Face.
2712
+
2713
+ To enhance performance and ensure compatibility with various execution environments, we'll use the [Optimum](https://github.com/huggingface/optimum) utility to convert the model into the ONNX (Open Neural Network Exchange) format.
2714
+
2715
+ After conversion to ONNX, we fix the opset version in the ONNX file to ensure compatibility with the ONNX runtime used in Teradata Vantage.
2716
+
2717
+ We generate ONNX files for several precisions: fp32, int8, and uint8.
2718
+
2719
+ You can find the detailed conversion steps in the file [convert.py](./convert.py).
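+
+ For orientation, here is a condensed sketch of what [convert.py](./convert.py) does: export the model to ONNX with Optimum, then derive the int8/uint8 variants via dynamic quantization. The opset value shown here is illustrative; the actual values (opset, IR version, file names) come from `conversion_config.json`.
+
+ ```python
+ from optimum.exporters.onnx import main_export
+ from onnxruntime.quantization import quantize_dynamic, QuantType
+
+ # Export the fp32 model from Hugging Face to ONNX (feature-extraction task).
+ main_export(
+     model_name_or_path="Alibaba-NLP/gte-large-en-v1.5",
+     output="./",
+     opset=16,  # illustrative; the actual value is read from conversion_config.json
+     trust_remote_code=True,
+     task="feature-extraction",
+     dtype="fp32",
+ )
+
+ # Derive the lower-precision variants via dynamic (weight-only) quantization.
+ quantize_dynamic("model.onnx", "onnx/model_int8.onnx", weight_type=QuantType.QInt8)
+ quantize_dynamic("model.onnx", "onnx/model_uint8.onnx", weight_type=QuantType.QUInt8)
+ ```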
2720
+
2721
+ ### Part 2. Running the model in Python with onnxruntime & compare results
2722
+
2723
+ Once the fixes are applied, we verify the correctness of the ONNX model by computing the cosine similarity between two texts with both the native SentenceTransformers model and the ONNX runtime, and comparing the results.
2724
+
2725
+ If the results are identical, this confirms that the ONNX model gives the same results as the native model, validating its correctness and suitability for further use in the database.
2726
+
2727
+
2728
+ ```python
2729
+ import onnxruntime as rt
2730
+
2731
+ from sentence_transformers.util import cos_sim
2732
+ from sentence_transformers import SentenceTransformer
2733
+
2734
+ import transformers
2735
+
2736
+
2737
+ sentences_1 = 'How is the weather today?'
2738
+ sentences_2 = 'What is the current weather like today?'
2739
+
2740
+ # Calculate ONNX result
2741
+ tokenizer = transformers.AutoTokenizer.from_pretrained("Alibaba-NLP/gte-large-en-v1.5")
2742
+ predef_sess = rt.InferenceSession("onnx/model.onnx")
2743
+
2744
+ enc1 = tokenizer(sentences_1)
2745
+ embeddings_1_onnx = predef_sess.run(None, {"input_ids": [enc1.input_ids],
2746
+ "attention_mask": [enc1.attention_mask]})
2747
+
2748
+ enc2 = tokenizer(sentences_2)
2749
+ embeddings_2_onnx = predef_sess.run(None, {"input_ids": [enc2.input_ids],
2750
+ "attention_mask": [enc2.attention_mask]})
2751
+
2752
+
2753
+ # Calculate embeddings with SentenceTransformer
2754
+ model = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)
2755
+ embeddings_1_sentence_transformer = model.encode(sentences_1, normalize_embeddings=True)
2756
+ embeddings_2_sentence_transformer = model.encode(sentences_2, normalize_embeddings=True)
2757
+
2758
+ # Compare results
2759
+ print("Cosine similarity for embeddings calculated with ONNX: " + str(cos_sim(embeddings_1_onnx[1][0], embeddings_2_onnx[1][0])))
2760
+ print("Cosine similarity for embeddings calculated with SentenceTransformer: " + str(cos_sim(embeddings_1_sentence_transformer, embeddings_2_sentence_transformer)))
2761
+ ```
2762
+
2763
+ You can find the detailed ONNX vs. SentenceTransformer result comparison steps in the file [test_local.py](./test_local.py).
2764
+
config.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1ad27385d728efc552e2a2c42e1b15353554990ed47f6e161bdc9073caaaec5b
3
+ size 1575
conversion_config.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9398aec0d299affc8b33da6610e7a0e7711bf3d762e8a556e26f5ed48c47f40f
3
+ size 291
convert.py ADDED
@@ -0,0 +1,51 @@
1
+ import os
2
+ import json
3
+ import shutil
4
+
5
+ from optimum.exporters.onnx import main_export
6
+ import onnx
7
+ from onnxconverter_common import float16
8
+ import onnxruntime as rt
9
+ from onnxruntime.tools.onnx_model_utils import *
10
+ from onnxruntime.quantization import quantize_dynamic, QuantType
11
+
12
+ with open('conversion_config.json') as json_file:
13
+ conversion_config = json.load(json_file)
14
+
15
+
16
+ model_id = conversion_config["model_id"]
17
+ number_of_generated_embeddings = conversion_config["number_of_generated_embeddings"]
18
+ precision_to_filename_map = conversion_config["precision_to_filename_map"]
19
+ opset = conversion_config["opset"]
20
+ IR = conversion_config["IR"]
21
+
22
+
23
+ op = onnx.OperatorSetIdProto()
24
+ op.version = opset
25
+
26
+
27
+ if not os.path.exists("onnx"):
28
+ os.makedirs("onnx")
29
+
30
+ print("Exporting the main model version")
31
+
32
+ main_export(model_name_or_path=model_id, output="./", opset=opset, trust_remote_code=True, task="feature-extraction", dtype="fp32")
33
+
34
+ if "fp32" in precision_to_filename_map:
35
+ print("Exporting the fp32 onnx file...")
36
+
37
+ shutil.copyfile('model.onnx', precision_to_filename_map["fp32"])
38
+
39
+ print("Done\n\n")
40
+
41
+ if "int8" in precision_to_filename_map:
42
+ print("Quantizing fp32 model to int8...")
43
+ quantize_dynamic("model.onnx", precision_to_filename_map["int8"], weight_type=QuantType.QInt8)
44
+ print("Done\n\n")
45
+
46
+ if "uint8" in precision_to_filename_map:
47
+ print("Quantizing fp32 model to uint8...")
48
+ quantize_dynamic("model.onnx", precision_to_filename_map["uint8"], weight_type=QuantType.QUInt8)
49
+ print("Done\n\n")
50
+
51
+ os.remove("model.onnx")
convert.py.bk ADDED
@@ -0,0 +1,79 @@
1
+ import os
2
+ import json
3
+ import shutil
4
+
5
+ from optimum.exporters.onnx import main_export
6
+ import onnx
7
+ from onnxconverter_common import float16
8
+ import onnxruntime as rt
9
+ from onnxruntime.tools.onnx_model_utils import *
10
+ from onnxruntime.quantization import quantize_dynamic, QuantType
11
+ from huggingface_hub import hf_hub_download
12
+ import transformers
13
+
14
+
15
+ with open('conversion_config.json') as json_file:
16
+ conversion_config = json.load(json_file)
17
+
18
+
19
+ model_id = conversion_config["model_id"]
20
+ number_of_generated_embeddings = conversion_config["number_of_generated_embeddings"]
21
+ precision_to_filename_map = conversion_config["precision_to_filename_map"]
22
+ opset = conversion_config["opset"]
23
+ IR = conversion_config["IR"]
24
+
25
+
26
+ op = onnx.OperatorSetIdProto()
27
+ op.version = opset
28
+
29
+ print("Exporting tokenizer...")
30
+
31
+ tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
32
+ tokenizer.save_pretrained("./")
33
+
34
+ print("Done\n\n")
35
+
36
+
37
+ if not os.path.exists("onnx"):
38
+ os.makedirs("onnx")
39
+
40
+ if "fp32" in precision_to_filename_map:
41
+ print("Exporting the fp32 onnx file...")
42
+
43
+ filename = precision_to_filename_map['fp32']
44
+
45
+ hf_hub_download(repo_id=model_id, filename=filename, local_dir = "./")
46
+ model = onnx.load(filename)
47
+ model_fixed = onnx.helper.make_model(model.graph, ir_version = IR, opset_imports = [op]) #to be sure that we have compatible opset and IR version
48
+ onnx.save(model_fixed, filename)
49
+
50
+ print("Done\n\n")
51
+
52
+
53
+ if "int8" in precision_to_filename_map:
54
+ print("Exporting the int8 onnx file...")
55
+
56
+
57
+ filename = precision_to_filename_map['int8']
58
+
59
+ hf_hub_download(repo_id=model_id, filename=filename, local_dir = "./")
60
+ model = onnx.load(filename)
61
+ model_fixed = onnx.helper.make_model(model.graph, ir_version = IR, opset_imports = [op]) #to be sure that we have compatible opset and IR version
62
+ onnx.save(model_fixed, filename)
63
+
64
+
65
+ print("Done\n\n")
66
+
67
+ if "uint8" in precision_to_filename_map:
68
+ print("Exporting the uint8 onnx file...")
69
+
70
+ filename = precision_to_filename_map['uint8']
71
+
72
+ hf_hub_download(repo_id=model_id, filename=filename, local_dir = "./")
73
+ model = onnx.load(filename)
74
+ model_fixed = onnx.helper.make_model(model.graph, ir_version = IR, opset_imports = [op]) #to be sure that we have compatible opset and IR version
75
+ onnx.save(model_fixed, filename)
76
+
77
+
78
+ print("Done\n\n")
79
+
onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:06ed0e91abae57c89a61c2a1717c166900dbd987b6ed457a540ea47706b32b50
3
+ size 1745854634
onnx/model_int8.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4a1fcd18e147eb97c14830c8570bc019399edbfb34185a60159f946e19cb360b
3
+ size 444595156
onnx/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:be182274e8d591c683a5f43a7361998e88454cc6955708257e9104b5908ad546
3
+ size 444595204
special_tokens_map.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5d5b662e421ea9fac075174bb0688ee0d9431699900b90662acd44b2a350503a
3
+ size 695
test_local.py ADDED
@@ -0,0 +1,49 @@
1
+ import onnxruntime as rt
2
+
3
+ from sentence_transformers.util import cos_sim
4
+ from sentence_transformers import SentenceTransformer
5
+
6
+ import transformers
7
+
8
+ import gc
9
+ import json
10
+
11
+
12
+ with open('conversion_config.json') as json_file:
13
+ conversion_config = json.load(json_file)
14
+
15
+
16
+ model_id = conversion_config["model_id"]
17
+ number_of_generated_embeddings = conversion_config["number_of_generated_embeddings"]
18
+ precision_to_filename_map = conversion_config["precision_to_filename_map"]
19
+
20
+ sentences_1 = 'How is the weather today?'
21
+ sentences_2 = 'What is the current weather like today?'
22
+
23
+ print(f"Testing on cosine similarity between sentences: \n'{sentences_1}'\n'{sentences_2}'\n\n\n")
24
+
25
+ tokenizer = transformers.AutoTokenizer.from_pretrained("./")
26
+ enc1 = tokenizer(sentences_1)
27
+ enc2 = tokenizer(sentences_2)
28
+
29
+ for precision, file_name in precision_to_filename_map.items():
30
+
31
+
32
+ onnx_session = rt.InferenceSession(file_name)
33
+ embeddings_1_onnx = onnx_session.run(None, {"input_ids": [enc1.input_ids],
34
+ "attention_mask": [enc1.attention_mask]})[1][0]
35
+
36
+ embeddings_2_onnx = onnx_session.run(None, {"input_ids": [enc2.input_ids],
37
+ "attention_mask": [enc2.attention_mask]})[1][0]
38
+
39
+ del onnx_session
40
+ gc.collect()
41
+ print(f'Cosine similarity for ONNX model with precision "{precision}" is {str(cos_sim(embeddings_1_onnx, embeddings_2_onnx))}')
42
+
43
+
44
+
45
+
46
+ model = SentenceTransformer(model_id, trust_remote_code=True)
47
+ embeddings_1_sentence_transformer = model.encode(sentences_1, normalize_embeddings=True)
48
+ embeddings_2_sentence_transformer = model.encode(sentences_2, normalize_embeddings=True)
49
+ print('Cosine similarity for original sentence transformer model is '+str(cos_sim(embeddings_1_sentence_transformer, embeddings_2_sentence_transformer)))
test_teradata.py ADDED
@@ -0,0 +1,106 @@
1
+ import sys
2
+ import teradataml as tdml
3
+ from tabulate import tabulate
4
+
5
+ import json
6
+
7
+
8
+ with open('conversion_config.json') as json_file:
9
+ conversion_config = json.load(json_file)
10
+
11
+
12
+ model_id = conversion_config["model_id"]
13
+ number_of_generated_embeddings = conversion_config["number_of_generated_embeddings"]
14
+ precision_to_filename_map = conversion_config["precision_to_filename_map"]
15
+
16
+ host = sys.argv[1]
17
+ username = sys.argv[2]
18
+ password = sys.argv[3]
19
+
20
+ print("Setting up connection to teradata...")
21
+ tdml.create_context(host = host, username = username, password = password)
22
+ print("Done\n\n")
23
+
24
+
25
+ print("Deploying tokenizer...")
26
+ try:
27
+ tdml.db_drop_table('tokenizer_table')
28
+ except:
29
+ print("Can't drop tokenizer table - it does not exist")
30
+ tdml.save_byom('tokenizer',
31
+ 'tokenizer.json',
32
+ 'tokenizer_table')
33
+ print("Done\n\n")
34
+
35
+ print("Testing models...")
36
+ try:
37
+ tdml.db_drop_table('model_table')
38
+ except:
39
+ print("Can't drop model table - it does not exist")
40
+
41
+ for precision, file_name in precision_to_filename_map.items():
42
+ print(f"Deploying {precision} model...")
43
+ tdml.save_byom(precision,
44
+ file_name,
45
+ 'model_table')
46
+ print(f"Model {precision} is deployed\n")
47
+
48
+ print(f"Calculating embeddings with {precision} model...")
49
+ try:
50
+ tdml.db_drop_table('emails_embeddings_store')
51
+ except:
52
+ print("Can't drop embeddings table - it does not exist")
53
+
54
+ tdml.execute_sql(f"""
55
+ create volatile table emails_embeddings_store as (
56
+ select
57
+ *
58
+ from mldb.ONNXEmbeddings(
59
+ on emails.emails as InputTable
60
+ on (select * from model_table where model_id = '{precision}') as ModelTable DIMENSION
61
+ on (select model as tokenizer from tokenizer_table where model_id = 'tokenizer') as TokenizerTable DIMENSION
62
+
63
+ using
64
+ Accumulate('id', 'txt')
65
+ ModelOutputTensor('sentence_embedding')
66
+ EnableMemoryCheck('false')
67
+ OutputFormat('FLOAT32({number_of_generated_embeddings})')
68
+ OverwriteCachedModel('true')
69
+ ) a
70
+ ) with data on commit preserve rows
71
+
72
+ """)
73
+ print("Embeddings calculated")
74
+ print(f"Testing semantic search with cosine similarity on the output of the model with precision '{precision}'...")
75
+ tdf_embeddings_store = tdml.DataFrame('emails_embeddings_store')
76
+ tdf_embeddings_store_tgt = tdf_embeddings_store[tdf_embeddings_store.id == 3]
77
+
78
+ tdf_embeddings_store_ref = tdf_embeddings_store[tdf_embeddings_store.id != 3]
79
+
80
+ cos_sim_pd = tdml.DataFrame.from_query(f"""
81
+ SELECT
82
+ dt.target_id,
83
+ dt.reference_id,
84
+ e_tgt.txt as target_txt,
85
+ e_ref.txt as reference_txt,
86
+ (1.0 - dt.distance) as similarity
87
+ FROM
88
+ TD_VECTORDISTANCE (
89
+ ON ({tdf_embeddings_store_tgt.show_query()}) AS TargetTable
90
+ ON ({tdf_embeddings_store_ref.show_query()}) AS ReferenceTable DIMENSION
91
+ USING
92
+ TargetIDColumn('id')
93
+ TargetFeatureColumns('[emb_0:emb_{number_of_generated_embeddings - 1}]')
94
+ RefIDColumn('id')
95
+ RefFeatureColumns('[emb_0:emb_{number_of_generated_embeddings - 1}]')
96
+ DistanceMeasure('cosine')
97
+ topk(3)
98
+ ) AS dt
99
+ JOIN emails.emails e_tgt on e_tgt.id = dt.target_id
100
+ JOIN emails.emails e_ref on e_ref.id = dt.reference_id;
101
+ """).to_pandas()
102
+ print(tabulate(cos_sim_pd, headers='keys', tablefmt='fancy_grid'))
103
+ print("Done\n\n")
104
+
105
+
106
+ tdml.remove_context()
tokenizer.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cb374d6bc042c22455946f4e09a89d29882a199fdaf8fb25be00dc8b8857a448
3
+ size 711661
tokenizer_config.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:19fa014d59708114b5913495534d97efaa3316a4f9227e87861d9c7e0840df2e
3
+ size 1414
vocab.txt ADDED
The diff for this file is too large to render. See raw diff