lw2134 committed on
Commit
a551f5a
1 Parent(s): cdf83da

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
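Only `pooling_mode_cls_token` is enabled in this configuration, so the sentence embedding is the hidden state of the first ([CLS]) token. The following is a minimal sketch of that idea in plain Python, using toy 4-dimensional vectors rather than 1024-dimensional ones. The helper names `cls_pool` and `mean_pool` are hypothetical and are not the library's API; mean pooling is included only for contrast.

```python
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """Return the first token's vector (the [CLS] position) as the sentence embedding."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    return list(token_embeddings[0])


def mean_pool(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the vectors of non-padding tokens; shown only for contrast."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy sequence: [CLS], two word tokens, one padding token.
tokens = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 2.0],
    [2.0, 4.0, 1.0, 1.0],
    [9.0, 9.0, 9.0, 9.0],  # padding position, masked out below
]
mask = [1, 1, 1, 0]

cls_embedding = cls_pool(tokens)
mean_embedding = mean_pool(tokens, mask)
print(cls_embedding)
print(mean_embedding)
```

With CLS pooling the other token vectors do not enter the result directly; they influence it only through attention inside the transformer.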
README.md ADDED
@@ -0,0 +1,654 @@
1
+ ---
2
+ base_model: Alibaba-NLP/gte-large-en-v1.5
3
+ library_name: sentence-transformers
4
+ metrics:
5
+ - cosine_accuracy@1
6
+ - cosine_accuracy@3
7
+ - cosine_accuracy@5
8
+ - cosine_accuracy@10
9
+ - cosine_precision@1
10
+ - cosine_precision@3
11
+ - cosine_precision@5
12
+ - cosine_precision@10
13
+ - cosine_recall@1
14
+ - cosine_recall@3
15
+ - cosine_recall@5
16
+ - cosine_recall@10
17
+ - cosine_ndcg@10
18
+ - cosine_mrr@10
19
+ - cosine_map@100
20
+ - dot_accuracy@1
21
+ - dot_accuracy@3
22
+ - dot_accuracy@5
23
+ - dot_accuracy@10
24
+ - dot_precision@1
25
+ - dot_precision@3
26
+ - dot_precision@5
27
+ - dot_precision@10
28
+ - dot_recall@1
29
+ - dot_recall@3
30
+ - dot_recall@5
31
+ - dot_recall@10
32
+ - dot_ndcg@10
33
+ - dot_mrr@10
34
+ - dot_map@100
35
+ pipeline_tag: sentence-similarity
36
+ tags:
37
+ - sentence-transformers
38
+ - sentence-similarity
39
+ - feature-extraction
40
+ - generated_from_trainer
41
+ - dataset_size:700
42
+ - loss:MatryoshkaLoss
43
+ - loss:MultipleNegativesRankingLoss
44
+ widget:
45
+ - source_sentence: What are the expectations for automated systems in relation to
46
+ data privacy?
47
+ sentences:
48
+ - 'https://beta.nsf.gov/funding/opportunities/designing-accountable-software-systems-dass
49
+
50
+ 28. The Leadership Conference Education Fund. The Use Of Pretrial “Risk Assessment”
51
+ Instruments: A
52
+
53
+ Shared Statement Of Civil Rights Concerns. Jul. 30, 2018. http://civilrightsdocs.info/pdf/criminal-justice/
54
+
55
+ Pretrial-Risk-Assessment-Short.pdf; https://civilrights.org/edfund/pretrial-risk-assessments/'
56
+ - "DATA PRIVACY \nWHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS\nThe expectations\
57
+ \ for automated systems are meant to serve as a blueprint for the development\
58
+ \ of additional \ntechnical standards and practices that are tailored for particular\
59
+ \ sectors and contexts. ­­­­­­\nIn addition to the privacy expectations above\
60
+ \ for general non-sensitive data, any system collecting, using, shar-"
61
+ - "standing that it may be these users who are most likely to need the human assistance.\
62
+ \ Similarly, it should be \ntested to ensure that users with disabilities are\
63
+ \ able to find and use human consideration and fallback and also \nrequest reasonable\
64
+ \ accommodations or modifications. \nConvenient. Mechanisms for human consideration\
65
+ \ and fallback should not be unreasonably burdensome as \ncompared to the automated\
66
+ \ system’s equivalent. \n49"
67
+ - source_sentence: What is the purpose of the U.S. AI Safety Institute and the AI
68
+ Safety Institute Consortium established by NIST?
69
+ sentences:
70
+ - "AI. NIST established the U.S. AI Safety Institute and the companion AI Safety\
71
+ \ Institute Consortium to \ncontinue the efforts set in motion by the E.O. to build\
72
+ \ the science necessary for safe, secure, and \ntrustworthy development and use\
73
+ \ of AI. \nAcknowledgments: This report was accomplished with the many helpful\
74
+ \ comments and contributions"
75
+ - "SAFE AND EFFECTIVE \nSYSTEMS \nWHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS\n\
76
+ The expectations for automated systems are meant to serve as a blueprint for the\
77
+ \ development of additional \ntechnical standards and practices that are tailored\
78
+ \ for particular sectors and contexts. \nOngoing monitoring. Automated systems\
79
+ \ should have ongoing monitoring procedures, including recalibra­"
80
+ - "differ from an explanation provided to allow for the possibility of recourse,\
81
+ \ an appeal, or one provided in the \ncontext of a dispute or contestation process.\
82
+ \ For the purposes of this framework, 'explanation' should be \nconstrued broadly.\
83
+ \ An explanation need not be a plain-language statement about causality but could\
84
+ \ consist of \nany mechanism that allows the recipient to build the necessary\
85
+ \ understanding and intuitions to achieve the"
86
+ - source_sentence: What are the consequences faced by individuals when they are unable
87
+ to reach a human decision-maker in automated systems?
88
+ sentences:
89
+ - 'ENDNOTES
90
+
91
+ 85. Mick Dumke and Frank Main. A look inside the watch list Chicago police fought
92
+ to keep secret. The
93
+
94
+ Chicago Sun Times. May 18, 2017.
95
+
96
+ https://chicago.suntimes.com/2017/5/18/18386116/a-look-inside-the-watch-list-chicago-police-fought­
97
+
98
+ to-keep-secret'
99
+ - "presented with no alternative, or are forced to endure a cumbersome process to\
100
+ \ reach a human decision-maker once \nthey decide they no longer want to deal\
101
+ \ exclusively with the automated system or be impacted by its results. As a result\
102
+ \ \nof this lack of human reconsideration, many receive delayed access, or lose\
103
+ \ access, to rights, opportunities, benefits, \nand critical services. The American\
104
+ \ public deserves the assurance that, when rights, opportunities, or access are"
105
+ - "compliance in mind. \nSome state legislatures have placed strong transparency\
106
+ \ and validity requirements on \nthe use of pretrial risk assessments. The use\
107
+ \ of algorithmic pretrial risk assessments has been a \ncause of concern for civil\
108
+ \ rights groups.28 Idaho Code Section 19-1910, enacted in 2019,29 requires that\
109
+ \ any \npretrial risk assessment, before use in the state, first be \"shown to\
110
+ \ be free of bias against any class of"
111
+ - source_sentence: What organizations are mentioned in the appendix alongside individuals
112
+ such as Lisa Feldman Barrett and Madeline Owens?
113
+ sentences:
114
+ - "APPENDIX\nLisa Feldman Barrett \nMadeline Owens \nMarsha Tudor \nMicrosoft Corporation\
115
+ \ \nMITRE Corporation \nNational Association for the \nAdvancement of Colored\
116
+ \ People \nLegal Defense and Educational \nFund \nNational Association of Criminal\
117
+ \ \nDefense Lawyers \nNational Center for Missing & \nExploited Children \nNational\
118
+ \ Fair Housing Alliance \nNational Immigration Law Center \nNEC Corporation of\
119
+ \ America"
120
+ - "or label to ensure the goal of the automated system is appropriately identified\
121
+ \ and measured. Additionally, \njustification should be documented for each data\
122
+ \ attribute and source to explain why it is appropriate to use \nthat data to\
123
+ \ inform the results of the automated system and why such use will not violate\
124
+ \ any applicable laws. \nIn cases of high-dimensional and/or derived attributes,\
125
+ \ such justifications can be provided as overall \ndescriptions of the attribute\
126
+ \ generation process and appropriateness. \n19"
127
+ - "ers and other experts across fields and sectors, as well as policymakers throughout\
128
+ \ the Federal government—on \nthe issue of algorithmic and data-driven harms and\
129
+ \ potential remedies. Through panel discussions, public listen-\ning sessions,\
130
+ \ meetings, a formal request for information, and input to a publicly accessible\
131
+ \ and widely-publicized \nemail address, people throughout the United States,\
132
+ \ public servants across Federal agencies, and members of the"
133
+ - source_sentence: What should individuals or organizations provide to ensure that
134
+ people impacted by an automated system are informed about significant changes
135
+ in use cases or key functionalities?
136
+ sentences:
137
+ - "with an intent or reasonably foreseeable possibility of endangering \nyour safety\
138
+ \ or the safety of your community. They should be designed \nto proactively protect\
139
+ \ you from harms stemming from unintended, \nyet foreseeable, uses or impacts\
140
+ \ of automated systems. You should be \nprotected from inappropriate or irrelevant\
141
+ \ data use in the design, de­\nvelopment, and deployment of automated systems,\
142
+ \ and from the \ncompounded harm of its reuse. Independent evaluation and report­"
143
+ - "use, the individual or organization responsible for the system, and ex­\nplanations\
144
+ \ of outcomes that are clear, timely, and accessible. Such \nnotice should be\
145
+ \ kept up-to-date and people impacted by the system \nshould be notified of significant\
146
+ \ use case or key functionality chang­\nes. You should know how and why an outcome\
147
+ \ impacting you was de­\ntermined by an automated system, including when the automated"
148
+ - 'software-algorithms-and-artificial-intelligence; U.S. Department of Justice.
149
+ Algorithms, Artificial
150
+
151
+ Intelligence, and Disability Discrimination in Hiring. May 12, 2022. https://beta.ada.gov/resources/ai­
152
+
153
+ guidance/
154
+
155
+ 54. Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan.
156
+ Dissecting racial bias in'
157
+ model-index:
158
+ - name: SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5
159
+ results:
160
+ - task:
161
+ type: information-retrieval
162
+ name: Information Retrieval
163
+ dataset:
164
+ name: Unknown
165
+ type: unknown
166
+ metrics:
167
+ - type: cosine_accuracy@1
168
+ value: 0.8666666666666667
169
+ name: Cosine Accuracy@1
170
+ - type: cosine_accuracy@3
171
+ value: 0.9866666666666667
172
+ name: Cosine Accuracy@3
173
+ - type: cosine_accuracy@5
174
+ value: 1.0
175
+ name: Cosine Accuracy@5
176
+ - type: cosine_accuracy@10
177
+ value: 1.0
178
+ name: Cosine Accuracy@10
179
+ - type: cosine_precision@1
180
+ value: 0.8666666666666667
181
+ name: Cosine Precision@1
182
+ - type: cosine_precision@3
183
+ value: 0.3288888888888888
184
+ name: Cosine Precision@3
185
+ - type: cosine_precision@5
186
+ value: 0.19999999999999996
187
+ name: Cosine Precision@5
188
+ - type: cosine_precision@10
189
+ value: 0.09999999999999998
190
+ name: Cosine Precision@10
191
+ - type: cosine_recall@1
192
+ value: 0.8666666666666667
193
+ name: Cosine Recall@1
194
+ - type: cosine_recall@3
195
+ value: 0.9866666666666667
196
+ name: Cosine Recall@3
197
+ - type: cosine_recall@5
198
+ value: 1.0
199
+ name: Cosine Recall@5
200
+ - type: cosine_recall@10
201
+ value: 1.0
202
+ name: Cosine Recall@10
203
+ - type: cosine_ndcg@10
204
+ value: 0.9481205912028868
205
+ name: Cosine Ndcg@10
206
+ - type: cosine_mrr@10
207
+ value: 0.93
208
+ name: Cosine Mrr@10
209
+ - type: cosine_map@100
210
+ value: 0.93
211
+ name: Cosine Map@100
212
+ - type: dot_accuracy@1
213
+ value: 0.8666666666666667
214
+ name: Dot Accuracy@1
215
+ - type: dot_accuracy@3
216
+ value: 1.0
217
+ name: Dot Accuracy@3
218
+ - type: dot_accuracy@5
219
+ value: 1.0
220
+ name: Dot Accuracy@5
221
+ - type: dot_accuracy@10
222
+ value: 1.0
223
+ name: Dot Accuracy@10
224
+ - type: dot_precision@1
225
+ value: 0.8666666666666667
226
+ name: Dot Precision@1
227
+ - type: dot_precision@3
228
+ value: 0.33333333333333326
229
+ name: Dot Precision@3
230
+ - type: dot_precision@5
231
+ value: 0.19999999999999996
232
+ name: Dot Precision@5
233
+ - type: dot_precision@10
234
+ value: 0.09999999999999998
235
+ name: Dot Precision@10
236
+ - type: dot_recall@1
237
+ value: 0.8666666666666667
238
+ name: Dot Recall@1
239
+ - type: dot_recall@3
240
+ value: 1.0
241
+ name: Dot Recall@3
242
+ - type: dot_recall@5
243
+ value: 1.0
244
+ name: Dot Recall@5
245
+ - type: dot_recall@10
246
+ value: 1.0
247
+ name: Dot Recall@10
248
+ - type: dot_ndcg@10
249
+ value: 0.9490449037619082
250
+ name: Dot Ndcg@10
251
+ - type: dot_mrr@10
252
+ value: 0.9311111111111112
253
+ name: Dot Mrr@10
254
+ - type: dot_map@100
255
+ value: 0.931111111111111
256
+ name: Dot Map@100
257
+ ---
258
+
+ # SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5
+
+ This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) on the json dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) <!-- at revision 104333d6af6f97649377c2afbde10a7704870c7b -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - json
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
290
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub (trust_remote_code is needed for the custom NewModel architecture)
+ model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)
+ # Run inference
+ sentences = [
+     'What should individuals or organizations provide to ensure that people impacted by an automated system are informed about significant changes in use cases or key functionalities?',
+     'use, the individual or organization responsible for the system, and ex\xad\nplanations of outcomes that are clear, timely, and accessible. Such \nnotice should be kept up-to-date and people impacted by the system \nshould be notified of significant use case or key functionality chang\xad\nes. You should know how and why an outcome impacting you was de\xad\ntermined by an automated system, including when the automated',
+     'software-algorithms-and-artificial-intelligence; U.S. Department of Justice. Algorithms, Artificial\nIntelligence, and Disability Discrimination in Hiring. May 12, 2022. https://beta.ada.gov/resources/ai\xad\nguidance/\n54. Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. Dissecting racial bias in',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
322
+
323
+ <!--
324
+ ### Direct Usage (Transformers)
325
+
326
+ <details><summary>Click to see the direct usage in Transformers</summary>
327
+
328
+ </details>
329
+ -->
330
+
331
+ <!--
332
+ ### Downstream Usage (Sentence Transformers)
333
+
334
+ You can finetune this model on your own dataset.
335
+
336
+ <details><summary>Click to expand</summary>
337
+
338
+ </details>
339
+ -->
340
+
341
+ <!--
342
+ ### Out-of-Scope Use
343
+
344
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
345
+ -->
346
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value    |
+ |:--------------------|:---------|
+ | cosine_accuracy@1   | 0.8667   |
+ | cosine_accuracy@3   | 0.9867   |
+ | cosine_accuracy@5   | 1.0      |
+ | cosine_accuracy@10  | 1.0      |
+ | cosine_precision@1  | 0.8667   |
+ | cosine_precision@3  | 0.3289   |
+ | cosine_precision@5  | 0.2      |
+ | cosine_precision@10 | 0.1      |
+ | cosine_recall@1     | 0.8667   |
+ | cosine_recall@3     | 0.9867   |
+ | cosine_recall@5     | 1.0      |
+ | cosine_recall@10    | 1.0      |
+ | cosine_ndcg@10      | 0.9481   |
+ | cosine_mrr@10       | 0.93     |
+ | **cosine_map@100**  | **0.93** |
+ | dot_accuracy@1      | 0.8667   |
+ | dot_accuracy@3      | 1.0      |
+ | dot_accuracy@5      | 1.0      |
+ | dot_accuracy@10     | 1.0      |
+ | dot_precision@1     | 0.8667   |
+ | dot_precision@3     | 0.3333   |
+ | dot_precision@5     | 0.2      |
+ | dot_precision@10    | 0.1      |
+ | dot_recall@1        | 0.8667   |
+ | dot_recall@3        | 1.0      |
+ | dot_recall@5        | 1.0      |
+ | dot_recall@10       | 1.0      |
+ | dot_ndcg@10         | 0.949    |
+ | dot_mrr@10          | 0.9311   |
+ | dot_map@100         | 0.9311   |
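The precision figures in this table are consistent with each query having exactly one relevant passage. In that case precision@k equals accuracy@k divided by k (0.9867 / 3 ≈ 0.3289, 1.0 / 5 = 0.2) and recall@k equals accuracy@k. The sketch below shows that relationship on hypothetical toy data. The helper `retrieval_metrics` is made up for illustration and is not the `InformationRetrievalEvaluator` implementation.

```python
from typing import Dict, List


def retrieval_metrics(
    ranked: Dict[str, List[str]], relevant: Dict[str, str], k: int
) -> Dict[str, float]:
    """Compute accuracy@k, precision@k, recall@k and MRR@k for one relevant document per query."""
    hits = 0
    reciprocal_ranks = 0.0
    for query_id, doc_ids in ranked.items():
        top_k = doc_ids[:k]
        target = relevant[query_id]
        if target in top_k:
            hits += 1
            reciprocal_ranks += 1.0 / (top_k.index(target) + 1)
    n = len(ranked)
    accuracy = hits / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most one of the k results can be relevant
        "recall": accuracy,  # the single relevant document is either found or not
        "mrr": reciprocal_ranks / n,
    }


# Toy data (hypothetical): four queries, one relevant document each.
ranked_results = {
    "q1": ["d1", "d7", "d3"],
    "q2": ["d5", "d2", "d9"],
    "q3": ["d3", "d8", "d4"],
    "q4": ["d6", "d1", "d2"],  # relevant document d4 is missing from the top 3
}
relevant_docs = {"q1": "d1", "q2": "d2", "q3": "d3", "q4": "d4"}

metrics_at_3 = retrieval_metrics(ranked_results, relevant_docs, k=3)
print(metrics_at_3)
```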
387
+
388
+ <!--
389
+ ## Bias, Risks and Limitations
390
+
391
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
392
+ -->
393
+
394
+ <!--
395
+ ### Recommendations
396
+
397
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
398
+ -->
399
+
400
+ ## Training Details
401
+
402
+ ### Training Dataset
403
+
404
+ #### json
405
+
406
+ * Dataset: json
407
+ * Size: 700 training samples
408
+ * Columns: <code>anchor</code> and <code>positive</code>
409
+ * Approximate statistics based on the first 700 samples:
410
+ | | anchor | positive |
411
+ |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
412
+ | type | string | string |
413
+ | details | <ul><li>min: 12 tokens</li><li>mean: 22.12 tokens</li><li>max: 44 tokens</li></ul> | <ul><li>min: 11 tokens</li><li>mean: 80.96 tokens</li><li>max: 571 tokens</li></ul> |
414
+ * Samples:
415
+ | anchor | positive |
416
+ |:-------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
417
+ | <code>What is the primary purpose of the AI Bill of Rights outlined in the October 2022 blueprint?</code> | <code>BLUEPRINT FOR AN <br>AI BILL OF <br>RIGHTS <br>MAKING AUTOMATED <br>SYSTEMS WORK FOR <br>THE AMERICAN PEOPLE <br>OCTOBER 2022</code> |
418
+ | <code>What is the purpose of the Blueprint for an AI Bill of Rights published by the White House Office of Science and Technology Policy?</code> | <code>About this Document <br>The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People was <br>published by the White House Office of Science and Technology Policy in October 2022. This framework was <br>released one year after OSTP announced the launch of a process to develop “a bill of rights for an AI-powered</code> |
419
+ | <code>What initiative did the OSTP announce a year prior to the release of the framework for a bill of rights for an AI-powered world?</code> | <code>released one year after OSTP announced the launch of a process to develop “a bill of rights for an AI-powered <br>world.” Its release follows a year of public engagement to inform this initiative. The framework is available <br>online at: https://www.whitehouse.gov/ostp/ai-bill-of-rights <br>About the Office of Science and Technology Policy <br>The Office of Science and Technology Policy (OSTP) was established by the National Science and Technology</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           1024,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
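Training with `MatryoshkaLoss` over the dimensions 1024, 512, 256, 128 and 64 is meant to keep the leading components of each embedding useful on their own, so a vector can be shortened by keeping a prefix and renormalizing it. The following is a rough sketch in plain Python. The helper `truncate_and_normalize` and the toy embedding are assumptions for illustration, not part of this repository.

```python
import math
from typing import List

MATRYOSHKA_DIMS = [1024, 512, 256, 128, 64]  # taken from the loss parameters above


def truncate_and_normalize(embedding: List[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale the result to unit length."""
    if dim > len(embedding):
        raise ValueError("cannot truncate to more dimensions than the embedding has")
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("prefix has zero norm")
    return [x / norm for x in prefix]


# Stand-in for a real 1024-dimensional embedding (deterministic toy values).
full_embedding = [math.sin(i + 1) for i in range(1024)]

truncated = {dim: truncate_and_normalize(full_embedding, dim) for dim in MATRYOSHKA_DIMS}
for dim, vec in truncated.items():
    print(dim, len(vec))
```

Shorter prefixes trade some retrieval quality for less storage and faster similarity search. How much quality is lost for this model at each size is not reported in the card.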
441
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `gradient_accumulation_steps`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 7
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `tf32`: True
+ - `load_best_model_at_end`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
458
+
459
+ #### All Hyperparameters
460
+ <details><summary>Click to expand</summary>
461
+
462
+ - `overwrite_output_dir`: False
463
+ - `do_predict`: False
464
+ - `eval_strategy`: epoch
465
+ - `prediction_loss_only`: True
466
+ - `per_device_train_batch_size`: 32
467
+ - `per_device_eval_batch_size`: 16
468
+ - `per_gpu_train_batch_size`: None
469
+ - `per_gpu_eval_batch_size`: None
470
+ - `gradient_accumulation_steps`: 16
471
+ - `eval_accumulation_steps`: None
472
+ - `torch_empty_cache_steps`: None
473
+ - `learning_rate`: 2e-05
474
+ - `weight_decay`: 0.0
475
+ - `adam_beta1`: 0.9
476
+ - `adam_beta2`: 0.999
477
+ - `adam_epsilon`: 1e-08
478
+ - `max_grad_norm`: 1.0
479
+ - `num_train_epochs`: 7
480
+ - `max_steps`: -1
481
+ - `lr_scheduler_type`: cosine
482
+ - `lr_scheduler_kwargs`: {}
483
+ - `warmup_ratio`: 0.1
484
+ - `warmup_steps`: 0
485
+ - `log_level`: passive
486
+ - `log_level_replica`: warning
487
+ - `log_on_each_node`: True
488
+ - `logging_nan_inf_filter`: True
489
+ - `save_safetensors`: True
490
+ - `save_on_each_node`: False
491
+ - `save_only_model`: False
492
+ - `restore_callback_states_from_checkpoint`: False
493
+ - `no_cuda`: False
494
+ - `use_cpu`: False
495
+ - `use_mps_device`: False
496
+ - `seed`: 42
497
+ - `data_seed`: None
498
+ - `jit_mode_eval`: False
499
+ - `use_ipex`: False
500
+ - `bf16`: True
501
+ - `fp16`: False
502
+ - `fp16_opt_level`: O1
503
+ - `half_precision_backend`: auto
504
+ - `bf16_full_eval`: False
505
+ - `fp16_full_eval`: False
506
+ - `tf32`: True
507
+ - `local_rank`: 0
508
+ - `ddp_backend`: None
509
+ - `tpu_num_cores`: None
510
+ - `tpu_metrics_debug`: False
511
+ - `debug`: []
512
+ - `dataloader_drop_last`: False
513
+ - `dataloader_num_workers`: 0
514
+ - `dataloader_prefetch_factor`: None
515
+ - `past_index`: -1
516
+ - `disable_tqdm`: False
517
+ - `remove_unused_columns`: True
518
+ - `label_names`: None
519
+ - `load_best_model_at_end`: True
520
+ - `ignore_data_skip`: False
521
+ - `fsdp`: []
522
+ - `fsdp_min_num_params`: 0
523
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
524
+ - `fsdp_transformer_layer_cls_to_wrap`: None
525
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
526
+ - `deepspeed`: None
527
+ - `label_smoothing_factor`: 0.0
528
+ - `optim`: adamw_torch_fused
529
+ - `optim_args`: None
530
+ - `adafactor`: False
531
+ - `group_by_length`: False
532
+ - `length_column_name`: length
533
+ - `ddp_find_unused_parameters`: None
534
+ - `ddp_bucket_cap_mb`: None
535
+ - `ddp_broadcast_buffers`: False
536
+ - `dataloader_pin_memory`: True
537
+ - `dataloader_persistent_workers`: False
538
+ - `skip_memory_metrics`: True
539
+ - `use_legacy_prediction_loop`: False
540
+ - `push_to_hub`: False
541
+ - `resume_from_checkpoint`: None
542
+ - `hub_model_id`: None
543
+ - `hub_strategy`: every_save
544
+ - `hub_private_repo`: False
545
+ - `hub_always_push`: False
546
+ - `gradient_checkpointing`: False
547
+ - `gradient_checkpointing_kwargs`: None
548
+ - `include_inputs_for_metrics`: False
549
+ - `eval_do_concat_batches`: True
550
+ - `fp16_backend`: auto
551
+ - `push_to_hub_model_id`: None
552
+ - `push_to_hub_organization`: None
553
+ - `mp_parameters`:
554
+ - `auto_find_batch_size`: False
555
+ - `full_determinism`: False
556
+ - `torchdynamo`: None
557
+ - `ray_scope`: last
558
+ - `ddp_timeout`: 1800
559
+ - `torch_compile`: False
560
+ - `torch_compile_backend`: None
561
+ - `torch_compile_mode`: None
562
+ - `dispatch_batches`: None
563
+ - `split_batches`: None
564
+ - `include_tokens_per_second`: False
565
+ - `include_num_input_tokens_seen`: False
566
+ - `neftune_noise_alpha`: None
567
+ - `optim_target_modules`: None
568
+ - `batch_eval_metrics`: False
569
+ - `eval_on_start`: False
570
+ - `eval_use_gather_object`: False
571
+ - `batch_sampler`: no_duplicates
572
+ - `multi_dataset_batch_sampler`: proportional
573
+
574
+ </details>
575
+
+ ### Training Logs
+ | Epoch      | Step  | cosine_map@100 |
+ |:----------:|:-----:|:--------------:|
+ | 0.7273     | 1     | 0.8548         |
+ | 1.4545     | 2     | 0.8811         |
+ | 2.9091     | 4     | 0.9233         |
+ | **3.6364** | **5** | **0.9311**     |
+ | 4.3636     | 6     | 0.93           |
+ | 5.0909     | 7     | 0.93           |
+
+ * The bold row denotes the saved checkpoint.
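The fractional epoch values in this log can be reproduced from the hyperparameters listed in the card. With 700 samples and a per-device batch size of 32 there are 22 batches per epoch, and each optimizer step consumes 16 of them because of gradient accumulation. The sketch below assumes a single device and that the last partial batch is kept (`dataloader_drop_last` is False); it is an arithmetic check, not the trainer's own code.

```python
import math

train_samples = 700
per_device_train_batch_size = 32
gradient_accumulation_steps = 16

# 700 / 32 = 21.875, so the final partial batch brings the count to 22.
batches_per_epoch = math.ceil(train_samples / per_device_train_batch_size)
# Number of samples contributing to one optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps


def epoch_at_step(step: int) -> float:
    """Fraction of epochs completed after `step` optimizer steps."""
    return round(step * gradient_accumulation_steps / batches_per_epoch, 4)


logged_steps = [1, 2, 4, 5, 6, 7]
epochs = [epoch_at_step(s) for s in logged_steps]
print(batches_per_epoch, effective_batch_size)
print(epochs)
```

An effective batch of 512 out of 700 samples also explains why the whole run takes only seven optimizer steps.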
587
+
588
+ ### Framework Versions
589
+ - Python: 3.10.12
590
+ - Sentence Transformers: 3.1.1
591
+ - Transformers: 4.44.2
592
+ - PyTorch: 2.4.1+cu121
593
+ - Accelerate: 0.34.2
594
+ - Datasets: 3.0.1
595
+ - Tokenizers: 0.19.1
596
+
597
+ ## Citation
598
+
599
+ ### BibTeX
600
+
601
+ #### Sentence Transformers
602
+ ```bibtex
603
+ @inproceedings{reimers-2019-sentence-bert,
604
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
605
+ author = "Reimers, Nils and Gurevych, Iryna",
606
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
607
+ month = "11",
608
+ year = "2019",
609
+ publisher = "Association for Computational Linguistics",
610
+ url = "https://arxiv.org/abs/1908.10084",
611
+ }
612
+ ```
613
+
614
+ #### MatryoshkaLoss
615
+ ```bibtex
616
+ @misc{kusupati2024matryoshka,
617
+ title={Matryoshka Representation Learning},
618
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
619
+ year={2024},
620
+ eprint={2205.13147},
621
+ archivePrefix={arXiv},
622
+ primaryClass={cs.LG}
623
+ }
624
+ ```
625
+
626
+ #### MultipleNegativesRankingLoss
627
+ ```bibtex
628
+ @misc{henderson2017efficient,
629
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
630
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
631
+ year={2017},
632
+ eprint={1705.00652},
633
+ archivePrefix={arXiv},
634
+ primaryClass={cs.CL}
635
+ }
636
+ ```
637
+
638
+ <!--
639
+ ## Glossary
640
+
641
+ *Clearly define terms in order to be accessible across audiences.*
642
+ -->
643
+
644
+ <!--
645
+ ## Model Card Authors
646
+
647
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
648
+ -->
649
+
650
+ <!--
651
+ ## Model Card Contact
652
+
653
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
654
+ -->
config.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "_name_or_path": "Alibaba-NLP/gte-large-en-v1.5",
+   "architectures": [
+     "NewModel"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "auto_map": {
+     "AutoConfig": "Alibaba-NLP/new-impl--configuration.NewConfig",
+     "AutoModel": "Alibaba-NLP/new-impl--modeling.NewModel",
+     "AutoModelForMaskedLM": "Alibaba-NLP/new-impl--modeling.NewForMaskedLM",
+     "AutoModelForMultipleChoice": "Alibaba-NLP/new-impl--modeling.NewForMultipleChoice",
+     "AutoModelForQuestionAnswering": "Alibaba-NLP/new-impl--modeling.NewForQuestionAnswering",
+     "AutoModelForSequenceClassification": "Alibaba-NLP/new-impl--modeling.NewForSequenceClassification",
+     "AutoModelForTokenClassification": "Alibaba-NLP/new-impl--modeling.NewForTokenClassification"
+   },
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-12,
+   "layer_norm_type": "layer_norm",
+   "logn_attention_clip1": false,
+   "logn_attention_scale": false,
+   "max_position_embeddings": 8192,
+   "model_type": "new",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pack_qkv": true,
+   "pad_token_id": 0,
+   "position_embedding_type": "rope",
+   "rope_scaling": {
+     "factor": 2.0,
+     "type": "ntk"
+   },
+   "rope_theta": 160000,
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.2",
+   "type_vocab_size": 2,
+   "unpad_inputs": false,
+   "use_memory_efficient_attention": false,
+   "vocab_size": 30528
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.1.1",
+     "transformers": "4.44.2",
+     "pytorch": "2.4.1+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
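`similarity_fn_name` is null here, while the README lists cosine similarity as the similarity function. To my knowledge Sentence Transformers falls back to cosine similarity when this field is unset, but treat that as an assumption rather than something this commit confirms. The README also reports slightly different cosine and dot-product metrics, which can only happen when embeddings are not all unit length. The toy sketch below, with made-up 2-dimensional vectors, shows how the two scores can rank documents differently.

```python
import math
from typing import List


def dot_score(a: List[float], b: List[float]) -> float:
    """Plain dot product: sensitive to both direction and vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine_score(a: List[float], b: List[float]) -> float:
    """Dot product of the length-normalized vectors: sensitive to direction only."""
    return dot_score(a, b) / (math.sqrt(dot_score(a, a)) * math.sqrt(dot_score(b, b)))


query = [1.0, 0.0]
documents = {
    "doc_a": [0.9, 0.1],  # almost the same direction as the query, short vector
    "doc_b": [2.0, 2.0],  # 45 degrees away from the query, long vector
}

best_by_dot = max(documents, key=lambda name: dot_score(query, documents[name]))
best_by_cosine = max(documents, key=lambda name: cosine_score(query, documents[name]))
print(best_by_dot, best_by_cosine)
```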
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:da0e4c34ce5110304fcb7db6b01ebfd86ebdea47ea47609af972db0e016e3863
3
+ size 1736585680
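The three lines above are a Git LFS pointer, not the weights themselves. As a rough sanity check, the sketch below parses the pointer and derives an upper bound on the parameter count, assuming every weight is stored as 4-byte float32 (consistent with `"torch_dtype": "float32"` in config.json). The function name `parse_lfs_pointer` is hypothetical, and the bound is loose because a safetensors file also contains a small metadata header.

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:da0e4c34ce5110304fcb7db6b01ebfd86ebdea47ea47609af972db0e016e3863
size 1736585680
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)

BYTES_PER_FLOAT32 = 4
# Upper bound: the file header is counted as if it were weights.
max_params = pointer["size"] // BYTES_PER_FLOAT32
print(f"{max_params / 1e6:.0f}M parameters at most")
```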
modules.json ADDED
@@ -0,0 +1,14 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ }
14
+ ]
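modules.json describes a two-stage pipeline: the Transformer module produces one vector per token, and the Pooling module reduces them to a single sentence vector. Since 1_Pooling/config.json enables only `pooling_mode_cls_token`, the sentence embedding is the vector at the first ([CLS]) position. The following is a simplified sketch on plain lists; the `pool` helper and the toy 3-dimensional vectors are assumptions for illustration, and attention-mask handling is deliberately left out.

```python
import json

# Copied from modules.json in this commit.
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"}
]
"""

# Relevant flags from 1_Pooling/config.json in this commit.
POOLING_CONFIG = {"pooling_mode_cls_token": True, "pooling_mode_mean_tokens": False}


def pool(token_embeddings, config):
    """Reduce per-token vectors to one sentence vector (no padding mask)."""
    if config["pooling_mode_cls_token"]:
        return token_embeddings[0]  # vector at the [CLS] position
    if config["pooling_mode_mean_tokens"]:
        count = len(token_embeddings)
        return [sum(column) / count for column in zip(*token_embeddings)]
    raise NotImplementedError("only CLS and mean pooling are sketched here")


# Modules run in ascending idx order.
modules = sorted(json.loads(MODULES_JSON), key=lambda m: m["idx"])
pipeline = [m["type"].rsplit(".", 1)[-1] for m in modules]

# Toy output of the Transformer stage: 3 tokens, 3 dimensions each.
token_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
sentence_embedding = pool(token_embeddings, POOLING_CONFIG)
print(pipeline, sentence_embedding)
```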
onnx/config.json ADDED
@@ -0,0 +1,45 @@
1
+ {
2
+ "_name_or_path": "policy_gte_large_7/",
3
+ "architectures": [
4
+ "NewModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.0,
7
+ "auto_map": {
8
+ "AutoConfig": "configuration.NewConfig",
9
+ "AutoModel": "Alibaba-NLP/new-impl--modeling.NewModel",
10
+ "AutoModelForMaskedLM": "Alibaba-NLP/new-impl--modeling.NewForMaskedLM",
11
+ "AutoModelForMultipleChoice": "Alibaba-NLP/new-impl--modeling.NewForMultipleChoice",
12
+ "AutoModelForQuestionAnswering": "Alibaba-NLP/new-impl--modeling.NewForQuestionAnswering",
13
+ "AutoModelForSequenceClassification": "Alibaba-NLP/new-impl--modeling.NewForSequenceClassification",
14
+ "AutoModelForTokenClassification": "Alibaba-NLP/new-impl--modeling.NewForTokenClassification"
15
+ },
16
+ "classifier_dropout": null,
17
+ "export_model_type": "transformer",
18
+ "hidden_act": "gelu",
19
+ "hidden_dropout_prob": 0.1,
20
+ "hidden_size": 1024,
21
+ "initializer_range": 0.02,
22
+ "intermediate_size": 4096,
23
+ "layer_norm_eps": 1e-12,
24
+ "layer_norm_type": "layer_norm",
25
+ "logn_attention_clip1": false,
26
+ "logn_attention_scale": false,
27
+ "max_position_embeddings": 8192,
28
+ "model_type": "new",
29
+ "num_attention_heads": 16,
30
+ "num_hidden_layers": 24,
31
+ "pack_qkv": true,
32
+ "pad_token_id": 0,
33
+ "position_embedding_type": "rope",
34
+ "rope_scaling": {
35
+ "factor": 2.0,
36
+ "type": "ntk"
37
+ },
38
+ "rope_theta": 160000,
39
+ "torch_dtype": "float32",
40
+ "transformers_version": "4.44.2",
41
+ "type_vocab_size": 2,
42
+ "unpad_inputs": false,
43
+ "use_memory_efficient_attention": false,
44
+ "vocab_size": 30528
45
+ }
onnx/configuration.py ADDED
@@ -0,0 +1,145 @@
1
+ # coding=utf-8
2
+ # Copyright 2024 The GTE Team Authors and Alibaba Group.
3
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """ NEW model configuration"""
17
+ from transformers.configuration_utils import PretrainedConfig
18
+ from transformers.utils import logging
19
+
20
+ logger = logging.get_logger(__name__)
21
+
22
+
23
+ class NewConfig(PretrainedConfig):
24
+ r"""
25
+ This is the configuration class to store the configuration of a [`NewModel`] or a [`TFNewModel`]. It is used to
26
+ instantiate a NEW model according to the specified arguments, defining the model architecture. Instantiating a
27
+ configuration with the defaults will yield a similar configuration to that of the NEW
28
+ [izhx/new-base-en](https://huggingface.co/izhx/new-base-en) architecture.
29
+
30
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
31
+ documentation from [`PretrainedConfig`] for more information.
32
+
33
+
34
+ Args:
35
+ vocab_size (`int`, *optional*, defaults to 30528):
36
+ Vocabulary size of the NEW model. Defines the number of different tokens that can be represented by the
37
+ `input_ids` passed when calling [`NewModel`] or [`TFNewModel`].
38
+ hidden_size (`int`, *optional*, defaults to 768):
39
+ Dimensionality of the encoder layers and the pooler layer.
40
+ num_hidden_layers (`int`, *optional*, defaults to 12):
41
+ Number of hidden layers in the Transformer encoder.
42
+ num_attention_heads (`int`, *optional*, defaults to 12):
43
+ Number of attention heads for each attention layer in the Transformer encoder.
44
+ intermediate_size (`int`, *optional*, defaults to 3072):
45
+ Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
46
+ hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
47
+ The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
48
+ `"relu"`, `"silu"` and `"gelu_new"` are supported.
49
+ hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
50
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
51
+ attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
52
+ The dropout ratio for the attention probabilities.
53
+ max_position_embeddings (`int`, *optional*, defaults to 2048):
54
+ The maximum sequence length that this model might ever be used with. Typically set this to something large
55
+ just in case (e.g., 512 or 1024 or 2048).
56
+ type_vocab_size (`int`, *optional*, defaults to 1):
57
+ The vocabulary size of the `token_type_ids` passed when calling [`NewModel`] or [`TFNewModel`].
58
+ initializer_range (`float`, *optional*, defaults to 0.02):
59
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
60
+ layer_norm_eps (`float`, *optional*, defaults to 1e-12):
61
+ The epsilon used by the layer normalization layers.
62
+ position_embedding_type (`str`, *optional*, defaults to `"rope"`):
63
+ Type of position embedding. Choose one of `"absolute"`, `"rope"`.
64
+ rope_theta (`float`, *optional*, defaults to 10000.0):
65
+ The base period of the RoPE embeddings.
66
+ rope_scaling (`Dict`, *optional*):
67
+ Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
68
+ strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
69
+ `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
70
+ `max_position_embeddings` to the expected new maximum. See the following thread for more information on how
71
+ these scaling strategies behave:
72
+ https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases/. This is an
73
+ experimental feature, subject to breaking API changes in future versions.
74
+ classifier_dropout (`float`, *optional*):
75
+ The dropout ratio for the classification head.
76
+
77
+ Examples:
78
+
79
+ ```python
80
+ >>> from transformers import NewConfig, NewModel
81
+
82
+ >>> # Initializing a NEW izhx/new-base-en style configuration
83
+ >>> configuration = NewConfig()
84
+
85
+ >>> # Initializing a model (with random weights) from the izhx/new-base-en style configuration
86
+ >>> model = NewModel(configuration)
87
+
88
+ >>> # Accessing the model configuration
89
+ >>> configuration = model.config
90
+ ```"""
91
+
92
+ model_type = "new"
93
+
94
+ def __init__(
95
+ self,
96
+ vocab_size=30528,
97
+ hidden_size=768,
98
+ num_hidden_layers=12,
99
+ num_attention_heads=12,
100
+ intermediate_size=3072,
101
+ hidden_act="gelu",
102
+ hidden_dropout_prob=0.1,
103
+ attention_probs_dropout_prob=0.0,
104
+ max_position_embeddings=2048,
105
+ type_vocab_size=1,
106
+ initializer_range=0.02,
107
+ layer_norm_type='layer_norm',
108
+ layer_norm_eps=1e-12,
109
+ # pad_token_id=0,
110
+ position_embedding_type="rope",
111
+ rope_theta=10000.0,
112
+ rope_scaling=None,
113
+ classifier_dropout=None,
114
+ pack_qkv=True,
115
+ unpad_inputs=False,
116
+ use_memory_efficient_attention=False,
117
+ logn_attention_scale=False,
118
+ logn_attention_clip1=False,
119
+ **kwargs,
120
+ ):
121
+ super().__init__(**kwargs)
122
+
123
+ self.vocab_size = vocab_size
124
+ self.hidden_size = hidden_size
125
+ self.num_hidden_layers = num_hidden_layers
126
+ self.num_attention_heads = num_attention_heads
127
+ self.hidden_act = hidden_act
128
+ self.intermediate_size = intermediate_size
129
+ self.hidden_dropout_prob = hidden_dropout_prob
130
+ self.attention_probs_dropout_prob = attention_probs_dropout_prob
131
+ self.max_position_embeddings = max_position_embeddings
132
+ self.type_vocab_size = type_vocab_size
133
+ self.initializer_range = initializer_range
134
+ self.layer_norm_type = layer_norm_type
135
+ self.layer_norm_eps = layer_norm_eps
136
+ self.position_embedding_type = position_embedding_type
137
+ self.rope_theta = rope_theta
138
+ self.rope_scaling = rope_scaling
139
+ self.classifier_dropout = classifier_dropout
140
+
141
+ self.pack_qkv = pack_qkv
142
+ self.unpad_inputs = unpad_inputs
143
+ self.use_memory_efficient_attention = use_memory_efficient_attention
144
+ self.logn_attention_scale = logn_attention_scale
145
+ self.logn_attention_clip1 = logn_attention_clip1
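The docstring names `linear` and `dynamic` as scaling strategies, whereas both config.json files in this commit set `"rope_scaling": {"type": "ntk", "factor": 2.0}` with `rope_theta` 160000. The modeling code is not part of this diff, so the sketch below only illustrates the commonly cited NTK-aware rule, in which the RoPE base is enlarged by `factor ** (d / (d - 2))` for head dimension `d`. Whether this model applies exactly this formula is an assumption, and both helper names are hypothetical.

```python
def rope_inverse_frequencies(head_dim, base):
    """Standard RoPE inverse frequencies: base^(-2i/d) for i in [0, d/2)."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def ntk_scaled_base(base, factor, head_dim):
    """NTK-aware scaling (assumed variant): enlarge the base, not the positions."""
    return base * factor ** (head_dim / (head_dim - 2))


# Values taken from config.json in this commit.
hidden_size, num_attention_heads = 1024, 16
rope_theta, factor = 160000.0, 2.0

head_dim = hidden_size // num_attention_heads
scaled_base = ntk_scaled_base(rope_theta, factor, head_dim)
inv_freq = rope_inverse_frequencies(head_dim, scaled_base)

print(head_dim, round(scaled_base), len(inv_freq))
```

Enlarging the base leaves the highest-frequency component unchanged and stretches the low-frequency ones, which is the usual motivation for preferring this rule over scaling the position indices directly.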
onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:34397daef78e8cdf6ffcbc2ecfc2e098f7c2bcc22f477c2c5bf250f716b9b5fb
3
+ size 1745854634
onnx/special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
onnx/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
onnx/tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_lower_case": true,
47
+ "mask_token": "[MASK]",
48
+ "max_length": 8000,
49
+ "model_max_length": 8192,
50
+ "pad_to_multiple_of": null,
51
+ "pad_token": "[PAD]",
52
+ "pad_token_type_id": 0,
53
+ "padding_side": "right",
54
+ "sep_token": "[SEP]",
55
+ "stride": 0,
56
+ "strip_accents": null,
57
+ "tokenize_chinese_chars": true,
58
+ "tokenizer_class": "BertTokenizer",
59
+ "truncation_side": "right",
60
+ "truncation_strategy": "longest_first",
61
+ "unk_token": "[UNK]"
62
+ }
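The tokenizer config fixes the special token ids (`[PAD]` 0, `[UNK]` 100, `[CLS]` 101, `[SEP]` 102, `[MASK]` 103) and pads on the right. A small sketch of how a single sequence is framed for a `BertTokenizer`-style model is shown below; the helper `build_inputs` and the two word-piece ids are made up for illustration, since real ids come from vocab.txt.

```python
# Special token ids from added_tokens_decoder in tokenizer_config.json.
PAD_ID, UNK_ID, CLS_ID, SEP_ID, MASK_ID = 0, 100, 101, 102, 103


def build_inputs(token_ids, pad_to):
    """Frame ids as [CLS] ... [SEP] and right-pad to a fixed length."""
    input_ids = [CLS_ID] + list(token_ids) + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    padding = pad_to - len(input_ids)
    if padding < 0:
        raise ValueError("sequence is longer than pad_to; truncate first")
    # padding_side is "right", so pad ids go at the end with mask 0.
    return input_ids + [PAD_ID] * padding, attention_mask + [0] * padding


# Hypothetical word-piece ids for a two-token sentence.
input_ids, attention_mask = build_inputs([7592, 2088], pad_to=6)
print(input_ids)
print(attention_mask)
```

With CLS pooling, the sentence embedding is read from position 0 of this layout, so right padding does not affect which vector is selected.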
onnx/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 8192,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_lower_case": true,
47
+ "mask_token": "[MASK]",
48
+ "max_length": 8000,
49
+ "model_max_length": 8192,
50
+ "pad_to_multiple_of": null,
51
+ "pad_token": "[PAD]",
52
+ "pad_token_type_id": 0,
53
+ "padding_side": "right",
54
+ "sep_token": "[SEP]",
55
+ "stride": 0,
56
+ "strip_accents": null,
57
+ "tokenize_chinese_chars": true,
58
+ "tokenizer_class": "BertTokenizer",
59
+ "truncation_side": "right",
60
+ "truncation_strategy": "longest_first",
61
+ "unk_token": "[UNK]"
62
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d1e43c2561e99487c941aa28feb4e6f280b320cf4e099945c7826028d9262d8a
3
+ size 5496
vocab.txt ADDED
The diff for this file is too large to render. See raw diff