---
base_model: sentence-transformers/all-MiniLM-L6-v2
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:128
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: What are the implications of large language models potentially
    deceiving their users under pressure, as discussed in the technical report by
    Scheurer et al (2023)?
  sentences:
  - "48 \n• Data protection \n• Data retention  \n• Consistency in use of defining\
    \ key terms \n• Decommissioning \n• Discouraging anonymous use \n• Education \
    \ \n• Impact assessments  \n• Incident response \n• Monitoring \n• Opt-outs  \n\
    • Risk-based controls \n• Risk mapping and measurement \n• Science-backed TEVV\
    \ practices \n• Secure software development practices \n• Stakeholder engagement\
    \ \n• Synthetic content detection and \nlabeling tools and techniques \n• Whistleblower\
    \ protections \n• Workforce diversity and \ninterdisciplinary teams\nEstablishing\
    \ acceptable use policies and guidance for the use of GAI in formal human-AI teaming\
    \ settings \nas well as different levels of human-AI configurations can help to\
    \ decrease risks arising from misuse, \nabuse, inappropriate repurpose, and misalignment\
    \ between systems and users. These practices are just \none example of adapting\
    \ existing governance protocols for GAI contexts.  \nA.1.3. Third-Party Considerations\
    \ \nOrganizations may seek to acquire, embed, incorporate, or use open-source\
    \ or proprietary third-party \nGAI models, systems, or generated data for various\
    \ applications across an enterprise. Use of these GAI \ntools and inputs has implications\
    \ for all functions of the organization – including but not limited to \nacquisition,\
    \ human resources, legal, compliance, and IT services – regardless of whether\
    \ they are carried \nout by employees or third parties. Many of the actions cited\
    \ above are relevant and options for \naddressing third-party considerations.\
    \ \nThird party GAI integrations may give rise to increased intellectual property,\
    \ data privacy, or information \nsecurity risks, pointing to the need for clear\
    \ guidelines for transparency and risk management regarding \nthe collection and\
    \ use of third-party data for model inputs. Organizations may consider varying\
    \ risk \ncontrols for foundation models, fine-tuned models, and embedded tools,\
    \ enhanced processes for \ninteracting with external GAI technologies or service\
    \ providers. Organizations can apply standard or \nexisting risk controls and\
    \ processes to proprietary or open-source GAI technologies, data, and third-party\
    \ \nservice providers, including acquisition and procurement due diligence, requests\
    \ for software bills of \nmaterials (SBOMs), application of service level agreements\
    \ (SLAs), and statement on standards for \nattestation engagement (SSAE) reports\
    \ to help with third-party transparency and risk management for \nGAI systems.\
    \ \nA.1.4. Pre-Deployment Testing \nOverview \nThe diverse ways and contexts in\
    \ which GAI systems may be developed, used, and repurposed \ncomplicates risk\
    \ mapping and pre-deployment measurement efforts. Robust test, evaluation, validation,\
    \ \nand verification (TEVV) processes can be iteratively applied – and documented\
    \ – in early stages of the AI \nlifecycle and informed by representative AI Actors\
    \ (see Figure 3 of the AI RMF). Until new and rigorous"
  - "21 \nGV-6.1-005 \nImplement a use-cased based supplier risk assessment framework\
    \ to evaluate and \nmonitor third-party entities’ performance and adherence to\
    \ content provenance \nstandards and technologies to detect anomalies and unauthorized\
    \ changes; \nservices acquisition and value chain risk management; and legal compliance.\
    \ \nData Privacy; Information \nIntegrity; Information Security; \nIntellectual\
    \ Property; Value Chain \nand Component Integration \nGV-6.1-006 Include clauses\
    \ in contracts which allow an organization to evaluate third-party \nGAI processes\
    \ and standards.  \nInformation Integrity \nGV-6.1-007 Inventory all third-party\
    \ entities with access to organizational content and \nestablish approved GAI\
    \ technology and service provider lists. \nValue Chain and Component \nIntegration\
    \ \nGV-6.1-008 Maintain records of changes to content made by third parties to\
    \ promote content \nprovenance, including sources, timestamps, metadata. \nInformation\
    \ Integrity; Value Chain \nand Component Integration; \nIntellectual Property\
    \ \nGV-6.1-009 \nUpdate and integrate due diligence processes for GAI acquisition\
    \ and \nprocurement vendor assessments to include intellectual property, data\
    \ privacy, \nsecurity, and other risks. For example, update processes to: Address\
    \ solutions that \nmay rely on embedded GAI technologies; Address ongoing monitoring,\
    \ \nassessments, and alerting, dynamic risk assessments, and real-time reporting\
    \ \ntools for monitoring third-party GAI risks; Consider policy adjustments across\
    \ GAI \nmodeling libraries, tools and APIs, fine-tuned models, and embedded tools;\
    \ \nAssess GAI vendors, open-source or proprietary GAI tools, or GAI service \n\
    providers against incident or vulnerability databases. \nData Privacy; Human-AI\
    \ \nConfiguration; Information \nSecurity; Intellectual Property; \nValue Chain\
    \ and Component \nIntegration; Harmful Bias and \nHomogenization \nGV-6.1-010\
    \ \nUpdate GAI acceptable use policies to address proprietary and open-source\
    \ GAI \ntechnologies and data, and contractors, consultants, and other third-party\
    \ \npersonnel. \nIntellectual Property; Value Chain \nand Component Integration\
    \ \nAI Actor Tasks: Operation and Monitoring, Procurement, Third-party entities\
    \ \n \nGOVERN 6.2: Contingency processes are in place to handle failures or incidents\
    \ in third-party data or AI systems deemed to be \nhigh-risk. \nAction ID \nSuggested\
    \ Action \nGAI Risks \nGV-6.2-001 \nDocument GAI risks associated with system\
    \ value chain to identify over-reliance \non third-party data and to identify\
    \ fallbacks. \nValue Chain and Component \nIntegration \nGV-6.2-002 \nDocument\
    \ incidents involving third-party GAI data and systems, including open-\ndata\
    \ and open-source software. \nIntellectual Property; Value Chain \nand Component\
    \ Integration"
  - "58 \nSatariano, A. et al. (2023) The People Onscreen Are Fake. The Disinformation\
    \ Is Real. New York Times. \nhttps://www.nytimes.com/2023/02/07/technology/artificial-intelligence-training-deepfake.html\
    \ \nSchaul, K. et al. (2024) Inside the secret list of websites that make AI like\
    \ ChatGPT sound smart. \nWashington Post. https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/\
    \ \nScheurer, J. et al. (2023) Technical report: Large language models can strategically\
    \ deceive their users \nwhen put under pressure. arXiv. https://arxiv.org/abs/2311.07590\
    \ \nShelby, R. et al. (2023) Sociotechnical Harms of Algorithmic Systems: Scoping\
    \ a Taxonomy for Harm \nReduction. arXiv. https://arxiv.org/pdf/2210.05791 \n\
    Shevlane, T. et al. (2023) Model evaluation for extreme risks. arXiv. https://arxiv.org/pdf/2305.15324\
    \ \nShumailov, I. et al. (2023) The curse of recursion: training on generated\
    \ data makes models forget. arXiv. \nhttps://arxiv.org/pdf/2305.17493v2 \nSmith,\
    \ A. et al. (2023) Hallucination or Confabulation? Neuroanatomy as metaphor in\
    \ Large Language \nModels. PLOS Digital Health. \nhttps://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000388\
    \ \nSoice, E. et al. (2023) Can large language models democratize access to dual-use\
    \ biotechnology? arXiv. \nhttps://arxiv.org/abs/2306.03809 \nSolaiman, I. et al.\
    \ (2023) The Gradient of Generative AI Release: Methods and Considerations. arXiv.\
    \ \nhttps://arxiv.org/abs/2302.04844 \nStaab, R. et al. (2023) Beyond Memorization:\
    \ Violating Privacy via Inference With Large Language \nModels. arXiv. https://arxiv.org/pdf/2310.07298\
    \ \nStanford, S. et al. (2023) Whose Opinions Do Language Models Reflect? arXiv.\
    \ \nhttps://arxiv.org/pdf/2303.17548 \nStrubell, E. et al. (2019) Energy and Policy\
    \ Considerations for Deep Learning in NLP. arXiv. \nhttps://arxiv.org/pdf/1906.02243\
    \ \nThe White House (2016) Circular No. A-130, Managing Information as a Strategic\
    \ Resource. \nhttps://www.whitehouse.gov/wp-\ncontent/uploads/legacy_drupal_files/omb/circulars/A130/a130revised.pdf\
    \ \nThe White House (2023) Executive Order on the Safe, Secure, and Trustworthy\
    \ Development and Use of \nArtificial Intelligence. https://www.whitehouse.gov/briefing-room/presidential-\n\
    actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-\n\
    artificial-intelligence/ \nThe White House (2022) Roadmap for Researchers on Priorities\
    \ Related to Information Integrity \nResearch and Development. https://www.whitehouse.gov/wp-content/uploads/2022/12/Roadmap-\n\
    Information-Integrity-RD-2022.pdf? \nThiel, D. (2023) Investigation Finds AI Image\
    \ Generation Models Trained on Child Abuse. Stanford Cyber \nPolicy Center. https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-\n\
    trained-child-abuse"
- source_sentence: How should human subjects be informed about their options to withdraw
    participation or revoke consent in GAI applications?
  sentences:
  - "39 \nMS-3.3-004 \nProvide input for training materials about the capabilities\
    \ and limitations of GAI \nsystems related to digital content transparency for\
    \ AI Actors, other \nprofessionals, and the public about the societal impacts\
    \ of AI and the role of \ndiverse and inclusive content generation. \nHuman-AI\
    \ Configuration; \nInformation Integrity; Harmful Bias \nand Homogenization \n\
    MS-3.3-005 \nRecord and integrate structured feedback about content provenance\
    \ from \noperators, users, and potentially impacted communities through the use\
    \ of \nmethods such as user research studies, focus groups, or community forums.\
    \ \nActively seek feedback on generated content quality and potential biases.\
    \ \nAssess the general awareness among end users and impacted communities \nabout\
    \ the availability of these feedback channels. \nHuman-AI Configuration; \nInformation\
    \ Integrity; Harmful Bias \nand Homogenization \nAI Actor Tasks: AI Deployment,\
    \ Affected Individuals and Communities, End-Users, Operation and Monitoring, TEVV\
    \ \n \nMEASURE 4.2: Measurement results regarding AI system trustworthiness in\
    \ deployment context(s) and across the AI lifecycle are \ninformed by input from\
    \ domain experts and relevant AI Actors to validate whether the system is performing\
    \ consistently as \nintended. Results are documented. \nAction ID \nSuggested\
    \ Action \nGAI Risks \nMS-4.2-001 \nConduct adversarial testing at a regular cadence\
    \ to map and measure GAI risks, \nincluding tests to address attempts to deceive\
    \ or manipulate the application of \nprovenance techniques or other misuses. Identify\
    \ vulnerabilities and \nunderstand potential misuse scenarios and unintended outputs.\
    \ \nInformation Integrity; Information \nSecurity \nMS-4.2-002 \nEvaluate GAI\
    \ system performance in real-world scenarios to observe its \nbehavior in practical\
    \ environments and reveal issues that might not surface in \ncontrolled and optimized\
    \ testing environments. \nHuman-AI Configuration; \nConfabulation; Information\
    \ \nSecurity \nMS-4.2-003 \nImplement interpretability and explainability methods\
    \ to evaluate GAI system \ndecisions and verify alignment with intended purpose.\
    \ \nInformation Integrity; Harmful Bias \nand Homogenization \nMS-4.2-004 \nMonitor\
    \ and document instances where human operators or other systems \noverride the\
    \ GAI's decisions. Evaluate these cases to understand if the overrides \nare linked\
    \ to issues related to content provenance. \nInformation Integrity \nMS-4.2-005\
    \ \nVerify and document the incorporation of results of structured public feedback\
    \ \nexercises into design, implementation, deployment approval (“go”/“no-go” \n\
    decisions), monitoring, and decommission decisions. \nHuman-AI Configuration; \n\
    Information Security \nAI Actor Tasks: AI Deployment, Domain Experts, End-Users,\
    \ Operation and Monitoring, TEVV"
  - "30 \nMEASURE 2.2: Evaluations involving human subjects meet applicable requirements\
    \ (including human subject protection) and are \nrepresentative of the relevant\
    \ population. \nAction ID \nSuggested Action \nGAI Risks \nMS-2.2-001 Assess and\
    \ manage statistical biases related to GAI content provenance through \ntechniques\
    \ such as re-sampling, re-weighting, or adversarial training. \nInformation Integrity;\
    \ Information \nSecurity; Harmful Bias and \nHomogenization \nMS-2.2-002 \nDocument\
    \ how content provenance data is tracked and how that data interacts \nwith privacy\
    \ and security. Consider: Anonymizing data to protect the privacy of \nhuman subjects;\
    \ Leveraging privacy output filters; Removing any personally \nidentifiable information\
    \ (PII) to prevent potential harm or misuse. \nData Privacy; Human AI \nConfiguration;\
    \ Information \nIntegrity; Information Security; \nDangerous, Violent, or Hateful\
    \ \nContent \nMS-2.2-003 Provide human subjects with options to withdraw participation\
    \ or revoke their \nconsent for present or future use of their data in GAI applications.\
    \  \nData Privacy; Human-AI \nConfiguration; Information \nIntegrity \nMS-2.2-004\
    \ \nUse techniques such as anonymization, differential privacy or other privacy-\n\
    enhancing technologies to minimize the risks associated with linking AI-generated\
    \ \ncontent back to individual human subjects. \nData Privacy; Human-AI \nConfiguration\
    \ \nAI Actor Tasks: AI Development, Human Factors, TEVV \n \nMEASURE 2.3: AI system\
    \ performance or assurance criteria are measured qualitatively or quantitatively\
    \ and demonstrated for \nconditions similar to deployment setting(s). Measures\
    \ are documented. \nAction ID \nSuggested Action \nGAI Risks \nMS-2.3-001 Consider\
    \ baseline model performance on suites of benchmarks when selecting a \nmodel\
    \ for fine tuning or enhancement with retrieval-augmented generation. \nInformation\
    \ Security; \nConfabulation \nMS-2.3-002 Evaluate claims of model capabilities\
    \ using empirically validated methods. \nConfabulation; Information \nSecurity\
    \ \nMS-2.3-003 Share results of pre-deployment testing with relevant GAI Actors,\
    \ such as those \nwith system release approval authority. \nHuman-AI Configuration"
  - "36 \nMEASURE 2.11: Fairness and bias – as identified in the MAP function – are\
    \ evaluated and results are documented. \nAction ID \nSuggested Action \nGAI Risks\
    \ \nMS-2.11-001 \nApply use-case appropriate benchmarks (e.g., Bias Benchmark\
    \ Questions, Real \nHateful or Harmful Prompts, Winogender Schemas15) to quantify\
    \ systemic bias, \nstereotyping, denigration, and hateful content in GAI system\
    \ outputs; \nDocument assumptions and limitations of benchmarks, including any\
    \ actual or \npossible training/test data cross contamination, relative to in-context\
    \ \ndeployment environment. \nHarmful Bias and Homogenization \nMS-2.11-002 \n\
    Conduct fairness assessments to measure systemic bias. Measure GAI system \nperformance\
    \ across demographic groups and subgroups, addressing both \nquality of service\
    \ and any allocation of services and resources. Quantify harms \nusing: field testing\
    \ with sub-group populations to determine likelihood of \nexposure to generated\
    \ content exhibiting harmful bias, AI red-teaming with \ncounterfactual and low-context\
    \ (e.g., “leader,” “bad guys”) prompts. For ML \npipelines or business processes\
    \ with categorical or numeric outcomes that rely \non GAI, apply general fairness\
    \ metrics (e.g., demographic parity, equalized odds, \nequal opportunity, statistical\
    \ hypothesis tests), to the pipeline or business \noutcome where appropriate;\
    \ Custom, context-specific metrics developed in \ncollaboration with domain experts\
    \ and affected communities; Measurements of \nthe prevalence of denigration in\
    \ generated content in deployment (e.g., sub-\nsampling a fraction of traffic and\
    \ manually annotating denigrating content). \nHarmful Bias and Homogenization;\
    \ \nDangerous, Violent, or Hateful \nContent \nMS-2.11-003 \nIdentify the classes\
    \ of individuals, groups, or environmental ecosystems which \nmight be impacted\
    \ by GAI systems through direct engagement with potentially \nimpacted communities.\
    \ \nEnvironmental; Harmful Bias and \nHomogenization \nMS-2.11-004 \nReview, document,\
    \ and measure sources of bias in GAI training and TEVV data: \nDifferences in distributions\
    \ of outcomes across and within groups, including \nintersecting groups; Completeness,\
    \ representativeness, and balance of data \nsources; demographic group and subgroup\
    \ coverage in GAI system training \ndata; Forms of latent systemic bias in images,\
    \ text, audio, embeddings, or other \ncomplex or unstructured data; Input data\
    \ features that may serve as proxies for \ndemographic group membership (i.e.,\
    \ image metadata, language dialect) or \notherwise give rise to emergent bias\
    \ within GAI systems; The extent to which \nthe digital divide may negatively\
    \ impact representativeness in GAI system \ntraining and TEVV data; Filtering\
    \ of hate speech or content in GAI system \ntraining data; Prevalence of GAI-generated\
    \ data in GAI system training data. \nHarmful Bias and Homogenization \n \n \n\
    15 Winogender Schemas is a sample set of paired sentences which differ only by\
    \ gender of the pronouns used, \nwhich can be used to evaluate gender bias in\
    \ natural language processing coreference resolution systems."
- source_sentence: What is the title of the NIST publication related to Artificial
    Intelligence Risk Management?
  sentences:
  - "53 \nDocumenting, reporting, and sharing information about GAI incidents can\
    \ help mitigate and prevent \nharmful outcomes by assisting relevant AI Actors\
    \ in tracing impacts to their source. Greater awareness \nand standardization\
    \ of GAI incident reporting could promote this transparency and improve GAI risk\
    \ \nmanagement across the AI ecosystem.  \nDocumentation and Involvement of AI\
    \ Actors \nAI Actors should be aware of their roles in reporting AI incidents.\
    \ To better understand previous incidents \nand implement measures to prevent\
    \ similar ones in the future, organizations could consider developing \nguidelines\
    \ for publicly available incident reporting which include information about AI\
    \ actor \nresponsibilities. These guidelines would help AI system operators identify\
    \ GAI incidents across the AI \nlifecycle and with AI Actors regardless of role.\
    \ Documentation and review of third-party inputs and \nplugins for GAI systems\
    \ is especially important for AI Actors in the context of incident disclosure;\
    \ LLM \ninputs and content delivered through these plugins is often distributed,\
    \ with inconsistent or insufficient \naccess control. \nDocumentation practices\
    \ including logging, recording, and analyzing GAI incidents can facilitate \n\
    smoother sharing of information with relevant AI Actors. Regular information sharing,\
    \ change \nmanagement records, version history and metadata can also empower AI\
    \ Actors responding to and \nmanaging AI incidents."
  - "23 \nMP-1.1-002 \nDetermine and document the expected and acceptable GAI system\
    \ context of \nuse in collaboration with socio-cultural and other domain experts,\
    \ by assessing: \nAssumptions and limitations; Direct value to the organization;\
    \ Intended \noperational environment and observed usage patterns; Potential positive\
    \ and \nnegative impacts to individuals, public safety, groups, communities, \n\
    organizations, democratic institutions, and the physical environment; Social \n\
    norms and expectations. \nHarmful Bias and Homogenization \nMP-1.1-003 \nDocument\
    \ risk measurement plans to address identified risks. Plans may \ninclude, as applicable:\
    \ Individual and group cognitive biases (e.g., confirmation \nbias, funding bias,\
    \ groupthink) for AI Actors involved in the design, \nimplementation, and use\
    \ of GAI systems; Known past GAI system incidents and \nfailure modes; In-context\
    \ use and foreseeable misuse, abuse, and off-label use; \nOver reliance on quantitative\
    \ metrics and methodologies without sufficient \nawareness of their limitations\
    \ in the context(s) of use; Standard measurement \nand structured human feedback\
    \ approaches; Anticipated human-AI \nconfigurations. \nHuman-AI Configuration; Harmful\
    \ \nBias and Homogenization; \nDangerous, Violent, or Hateful \nContent \nMP-1.1-004\
    \ \nIdentify and document foreseeable illegal uses or applications of the GAI\
    \ system \nthat surpass organizational risk tolerances. \nCBRN Information or\
    \ Capabilities; \nDangerous, Violent, or Hateful \nContent; Obscene, Degrading,\
    \ \nand/or Abusive Content \nAI Actor Tasks: AI Deployment \n \nMAP 1.2: Interdisciplinary\
    \ AI Actors, competencies, skills, and capacities for establishing context reflect\
    \ demographic diversity and \nbroad domain and user experience expertise, and\
    \ their participation is documented. Opportunities for interdisciplinary \ncollaboration\
    \ are prioritized. \nAction ID \nSuggested Action \nGAI Risks \nMP-1.2-001 \n\
    Establish and empower interdisciplinary teams that reflect a wide range of \ncapabilities,\
    \ competencies, demographic groups, domain expertise, educational \nbackgrounds,\
    \ lived experiences, professions, and skills across the enterprise to \ninform\
    \ and conduct risk measurement and management functions. \nHuman-AI Configuration;\
    \ Harmful \nBias and Homogenization \nMP-1.2-002 \nVerify that data or benchmarks\
    \ used in risk measurement, and users, \nparticipants, or subjects involved in\
    \ structured GAI public feedback exercises \nare representative of diverse in-context\
    \ user populations. \nHuman-AI Configuration; Harmful \nBias and Homogenization\
    \ \nAI Actor Tasks: AI Deployment"
  - "NIST Trustworthy and Responsible AI  \nNIST AI 600-1 \nArtificial Intelligence\
    \ Risk Management \nFramework: Generative Artificial \nIntelligence Profile \n\
    \ \n \n \nThis publication is available free of charge from: \nhttps://doi.org/10.6028/NIST.AI.600-1"
- source_sentence: What is the purpose of the AI Risk Management Framework (AI RMF)
    for Generative AI as outlined in the document?
  sentences:
  - "Table of Contents \n1. \nIntroduction ..............................................................................................................................................1\
    \ \n2. \nOverview of Risks Unique to or Exacerbated by GAI .....................................................................2\
    \ \n3. \nSuggested Actions to Manage GAI Risks .........................................................................................\
    \ 12 \nAppendix A. Primary GAI Considerations ...............................................................................................\
    \ 47 \nAppendix B. References ................................................................................................................................\
    \ 54"
  - "13 \n• \nNot every suggested action applies to every AI Actor14 or is relevant\
    \ to every AI Actor Task. For \nexample, suggested actions relevant to GAI developers\
    \ may not be relevant to GAI deployers. \nThe applicability of suggested actions\
    \ to relevant AI actors should be determined based on \norganizational considerations\
    \ and their unique uses of GAI systems. \nEach table of suggested actions includes:\
    \ \n• \nAction ID: Each Action ID corresponds to the relevant AI RMF function\
    \ and subcategory (e.g., GV-\n1.1-001 corresponds to the first suggested action\
    \ for Govern 1.1, GV-1.1-002 corresponds to the \nsecond suggested action for\
    \ Govern 1.1). AI RMF functions are tagged as follows: GV = Govern; \nMP = Map;\
    \ MS = Measure; MG = Manage. \n• \nSuggested Action: Steps an organization or\
    \ AI actor can take to manage GAI risks.  \n• \nGAI Risks: Tags linking suggested\
    \ actions with relevant GAI risks.  \n• \nAI Actor Tasks: Pertinent AI Actor Tasks\
    \ for each subcategory. Not every AI Actor Task listed will \napply to every suggested\
    \ action in the subcategory (i.e., some apply to AI development and \nothers apply\
    \ to AI deployment).  \nThe tables below begin with the AI RMF subcategory, shaded\
    \ in blue, followed by suggested actions.  \n \nGOVERN 1.1: Legal and regulatory\
    \ requirements involving AI are understood, managed, and documented.  \nAction\
    \ ID \nSuggested Action \nGAI Risks \nGV-1.1-001 Align GAI development and use\
    \ with applicable laws and regulations, including \nthose related to data privacy,\
    \ copyright and intellectual property law. \nData Privacy; Harmful Bias and \n\
    Homogenization; Intellectual \nProperty \nAI Actor Tasks: Governance and Oversight\
    \ \n \n \n \n14 AI Actors are defined by the OECD as “those who play an active\
    \ role in the AI system lifecycle, including \norganizations and individuals that\
    \ deploy or operate AI.” See Appendix A of the AI RMF for additional descriptions\
    \ \nof AI Actors and AI Actor Tasks."
  - "1 \n1. \nIntroduction \nThis document is a cross-sectoral profile of and companion\
    \ resource for the AI Risk Management \nFramework (AI RMF 1.0) for Generative\
    \ AI,1 pursuant to President Biden’s Executive Order (EO) 14110 on \nSafe, Secure,\
    \ and Trustworthy Artificial Intelligence.2 The AI RMF was released in January\
    \ 2023, and is \nintended for voluntary use and to improve the ability of organizations\
    \ to incorporate trustworthiness \nconsiderations into the design, development,\
    \ use, and evaluation of AI products, services, and systems.  \nA profile is an\
    \ implementation of the AI RMF functions, categories, and subcategories for a\
    \ specific \nsetting, application, or technology – in this case, Generative AI\
    \ (GAI) – based on the requirements, risk \ntolerance, and resources of the Framework\
    \ user. AI RMF profiles assist organizations in deciding how to \nbest manage AI\
    \ risks in a manner that is well-aligned with their goals, considers legal/regulatory\
    \ \nrequirements and best practices, and reflects risk management priorities. Consistent\
    \ with other AI RMF \nprofiles, this profile offers insights into how risk can be\
    \ managed across various stages of the AI lifecycle \nand for GAI as a technology.\
    \  \nAs GAI covers risks of models or applications that can be used across use\
    \ cases or sectors, this document \nis an AI RMF cross-sectoral profile. Cross-sectoral\
    \ profiles can be used to govern, map, measure, and \nmanage risks associated with\
    \ activities or business processes common across sectors, such as the use of \n\
    large language models (LLMs), cloud-based services, or acquisition. \nThis document\
    \ defines risks that are novel to or exacerbated by the use of GAI. After introducing\
    \ and \ndescribing these risks, the document provides a set of suggested actions\
    \ to help organizations govern, \nmap, measure, and manage these risks. \n \n\
    \ \n1 EO 14110 defines Generative AI as “the class of AI models that emulate the\
    \ structure and characteristics of input \ndata in order to generate derived synthetic\
    \ content. This can include images, videos, audio, text, and other digital \n\
    content.” While not all GAI is derived from foundation models, for purposes of\
    \ this document, GAI generally refers \nto generative foundation models. The foundation\
    \ model subcategory of “dual-use foundation models” is defined by \nEO 14110 as\
    \ “an AI model that is trained on broad data; generally uses self-supervision;\
    \ contains at least tens of \nbillions of parameters; is applicable across a wide\
    \ range of contexts.”  \n2 This profile was developed per Section 4.1(a)(i)(A)\
    \ of EO 14110, which directs the Secretary of Commerce, acting \nthrough the Director\
    \ of the National Institute of Standards and Technology (NIST), to develop a companion\
    \ \nresource to the AI RMF, NIST AI 100–1, for generative AI."
- source_sentence: What are the primary information security risks associated with
    GAI-based systems in the context of cybersecurity?
  sentences:
  - "7 \nunethical behavior. Text-to-image models also make it easy to create images\
    \ that could be used to \npromote dangerous or violent messages. Similar concerns\
    \ are present for other GAI media, including \nvideo and audio. GAI may also produce\
    \ content that recommends self-harm or criminal/illegal activities.  \nMany current\
    \ systems restrict model outputs to limit certain content or in response to certain\
    \ prompts, \nbut this approach may still produce harmful recommendations in response\
    \ to other less-explicit, novel \nprompts (also relevant to CBRN Information or\
    \ Capabilities, Data Privacy, Information Security, and \nObscene, Degrading and/or\
    \ Abusive Content). Crafting such prompts deliberately is known as \n“jailbreaking,”\
    \ or, manipulating prompts to circumvent output controls. Limitations of GAI systems\
    \ can be \nharmful or dangerous in certain contexts. Studies have observed that\
    \ users may disclose mental health \nissues in conversations with chatbots – and\
    \ that users exhibit negative reactions to unhelpful responses \nfrom these chatbots\
    \ during situations of distress. \nThis risk encompasses difficulty controlling\
    \ creation of and public exposure to offensive or hateful \nlanguage, and denigrating\
    \ or stereotypical content generated by AI. This kind of speech may contribute\
    \ \nto downstream harm such as fueling dangerous or violent behaviors. The spread\
    \ of denigrating or \nstereotypical content can also further exacerbate representational\
    \ harms (see Harmful Bias and \nHomogenization below).  \nTrustworthy AI Characteristics:\
    \ Safe, Secure and Resilient \n2.4. Data Privacy \nGAI systems raise several risks\
    \ to privacy. GAI system training requires large volumes of data, which in \n\
    some cases may include personal data. The use of personal data for GAI training\
    \ raises risks to widely \naccepted privacy principles, including to transparency,\
    \ individual participation (including consent), and \npurpose specification. For\
    \ example, most model developers do not disclose specific data sources on \nwhich\
    \ models were trained, limiting user awareness of whether personally identifiably\
    \ information (PII) \nwas trained on and, if so, how it was collected.  \nModels\
    \ may leak, generate, or correctly infer sensitive information about individuals.\
    \ For example, \nduring adversarial attacks, LLMs have revealed sensitive information\
    \ (from the public domain) that was \nincluded in their training data. This problem\
    \ has been referred to as data memorization, and may pose \nexacerbated privacy\
    \ risks even for data present only in a small number of training samples.  \n\
    In addition to revealing sensitive information in GAI training data, GAI models\
    \ may be able to correctly \ninfer PII or sensitive data that was not in their\
    \ training data nor disclosed by the user by stitching \ntogether information\
    \ from disparate sources. These inferences can have negative impact on an individual\
    \ \neven if the inferences are not accurate (e.g., confabulations), and especially\
    \ if they reveal information \nthat the individual considers sensitive or that\
    \ is used to disadvantage or harm them. \nBeyond harms from information exposure\
    \ (such as extortion or dignitary harm), wrong or inappropriate \ninferences of\
    \ PII can contribute to downstream or secondary harmful impacts. For example,\
    \ predictive \ninferences made by GAI models based on PII or protected attributes\
    \ can contribute to adverse decisions, \nleading to representational or allocative\
    \ harms to individuals or groups (see Harmful Bias and \nHomogenization below)."
  - "10 \nGAI systems can ease the unintentional production or dissemination of false,\
    \ inaccurate, or misleading \ncontent (misinformation) at scale, particularly\
    \ if the content stems from confabulations.  \nGAI systems can also ease the deliberate\
    \ production or dissemination of false or misleading information \n(disinformation)\
    \ at scale, where an actor has the explicit intent to deceive or cause harm to\
    \ others. Even \nvery subtle changes to text or images can manipulate human and\
    \ machine perception. \nSimilarly, GAI systems could enable a higher degree of\
    \ sophistication for malicious actors to produce \ndisinformation that is targeted\
    \ towards specific demographics. Current and emerging multimodal models \nmake\
    \ it possible to generate both text-based disinformation and highly realistic\
    \ “deepfakes” – that is, \nsynthetic audiovisual content and photorealistic images.12\
    \ Additional disinformation threats could be \nenabled by future GAI models trained\
    \ on new data modalities. \nDisinformation and misinformation – both of which\
    \ may be facilitated by GAI – may erode public trust in \ntrue or valid evidence\
    \ and information, with downstream effects. For example, a synthetic image of a\
    \ \nPentagon blast went viral and briefly caused a drop in the stock market. Generative\
    \ AI models can also \nassist malicious actors in creating compelling imagery\
    \ and propaganda to support disinformation \ncampaigns, which may not be photorealistic,\
    \ but could enable these campaigns to gain more reach and \nengagement on social\
    \ media platforms. Additionally, generative AI models can assist malicious actors\
    \ in \ncreating fraudulent content intended to impersonate others. \nTrustworthy\
    \ AI Characteristics: Accountable and Transparent, Safe, Valid and Reliable, Interpretable\
    \ and \nExplainable \n2.9. Information Security \nInformation security for computer\
    \ systems and data is a mature field with widely accepted and \nstandardized practices\
    \ for offensive and defensive cyber capabilities. GAI-based systems present two\
    \ \nprimary information security risks: GAI could potentially discover or enable\
    \ new cybersecurity risks by \nlowering the barriers for or easing automated exercise\
    \ of offensive capabilities; simultaneously, it \nexpands the available attack\
    \ surface, as GAI itself is vulnerable to attacks like prompt injection or data\
    \ \npoisoning.  \nOffensive cyber capabilities advanced by GAI systems may augment\
    \ cybersecurity attacks such as \nhacking, malware, and phishing. Reports have\
    \ indicated that LLMs are already able to discover some \nvulnerabilities in systems\
    \ (hardware, software, data) and write code to exploit them. Sophisticated threat\
    \ \nactors might further these risks by developing GAI-powered security co-pilots\
    \ for use in several parts of \nthe attack chain, including informing attackers\
    \ on how to proactively evade threat detection and escalate \nprivileges after\
    \ gaining system access. \nInformation security for GAI models and systems also\
    \ includes maintaining availability of the GAI system \nand the integrity and\
    \ (when applicable) the confidentiality of the GAI code, training data, and model\
    \ \nweights. To identify and secure potential attack points in AI systems or specific\
    \ components of the AI \n \n \n12 See also https://doi.org/10.6028/NIST.AI.100-4,\
    \ to be published."
  - "16 \nGOVERN 1.5: Ongoing monitoring and periodic review of the risk management\
    \ process and its outcomes are planned, and \norganizational roles and responsibilities\
    \ are clearly defined, including determining the frequency of periodic review.\
    \ \nAction ID \nSuggested Action \nGAI Risks \nGV-1.5-001 Define organizational\
    \ responsibilities for periodic review of content provenance \nand incident monitoring\
    \ for GAI systems. \nInformation Integrity \nGV-1.5-002 \nEstablish organizational\
    \ policies and procedures for after action reviews of GAI \nsystem incident response\
    \ and incident disclosures, to identify gaps; Update \nincident response and incident\
    \ disclosure processes as required. \nHuman-AI Configuration; \nInformation Security\
    \ \nGV-1.5-003 \nMaintain a document retention policy to keep history for test,\
    \ evaluation, \nvalidation, and verification (TEVV), and digital content transparency\
    \ methods for \nGAI. \nInformation Integrity; Intellectual \nProperty \nAI Actor\
    \ Tasks: Governance and Oversight, Operation and Monitoring \n \nGOVERN 1.6: Mechanisms\
    \ are in place to inventory AI systems and are resourced according to organizational\
    \ risk priorities. \nAction ID \nSuggested Action \nGAI Risks \nGV-1.6-001 Enumerate\
    \ organizational GAI systems for incorporation into AI system inventory \nand\
    \ adjust AI system inventory requirements to account for GAI risks. \nInformation\
    \ Security \nGV-1.6-002 Define any inventory exemptions in organizational policies\
    \ for GAI systems \nembedded into application software. \nValue Chain and Component\
    \ \nIntegration \nGV-1.6-003 \nIn addition to general model, governance, and risk\
    \ information, consider the \nfollowing items in GAI system inventory entries:\
    \ Data provenance information \n(e.g., source, signatures, versioning, watermarks);\
    \ Known issues reported from \ninternal bug tracking or external information sharing\
    \ resources (e.g., AI incident \ndatabase, AVID, CVE, NVD, or OECD AI incident\
    \ monitor); Human oversight roles \nand responsibilities; Special rights and considerations\
    \ for intellectual property, \nlicensed works, or personal, privileged, proprietary\
    \ or sensitive data; Underlying \nfoundation models, versions of underlying models,\
    \ and access modes. \nData Privacy; Human-AI \nConfiguration; Information \nIntegrity;\
    \ Intellectual Property; \nValue Chain and Component \nIntegration \nAI Actor\
    \ Tasks: Governance and Oversight"
---

# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co./sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co./sentence-transformers/all-MiniLM-L6-v2) <!-- at revision 8b3219a92973c328a8e22fadcfa821b5dc75636a -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co./models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
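The pipeline above can be sketched in plain Python: the Pooling module averages the transformer's token embeddings (skipping padded positions via the attention mask), and the final Normalize() module scales the pooled vector to unit length. This is a toy illustration with hypothetical 3-dimensional vectors, not the library's tensor implementation; the real model pools 384-dimensional token embeddings.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings, ignoring padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:
            count += 1
            for i, v in enumerate(vec):
                totals[i] += v
    return [t / count for t in totals]

def l2_normalize(vec):
    """Scale a vector to unit length, as the Normalize() module does."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

# Two real "token embeddings" plus one padding position (mask 0), in 3 dims.
tokens = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)            # [2.0, 2.0, 1.0]
sentence_embedding = l2_normalize(pooled)   # unit-length vector
```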

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("danicafisher/dfisher-fine-tuned-sentence-transformer")
# Run inference
sentences = [
    'What are the primary information security risks associated with GAI-based systems in the context of cybersecurity?',
    '10 \nGAI systems can ease the unintentional production or dissemination of false, inaccurate, or misleading \ncontent (misinformation) at scale, particularly if the content stems from confabulations.  \nGAI systems can also ease the deliberate production or dissemination of false or misleading information \n(disinformation) at scale, where an actor has the explicit intent to deceive or cause harm to others. Even \nvery subtle changes to text or images can manipulate human and machine perception. \nSimilarly, GAI systems could enable a higher degree of sophistication for malicious actors to produce \ndisinformation that is targeted towards specific demographics. Current and emerging multimodal models \nmake it possible to generate both text-based disinformation and highly realistic “deepfakes” – that is, \nsynthetic audiovisual content and photorealistic images.12 Additional disinformation threats could be \nenabled by future GAI models trained on new data modalities. \nDisinformation and misinformation – both of which may be facilitated by GAI – may erode public trust in \ntrue or valid evidence and information, with downstream effects. For example, a synthetic image of a \nPentagon blast went viral and briefly caused a drop in the stock market. Generative AI models can also \nassist malicious actors in creating compelling imagery and propaganda to support disinformation \ncampaigns, which may not be photorealistic, but could enable these campaigns to gain more reach and \nengagement on social media platforms. Additionally, generative AI models can assist malicious actors in \ncreating fraudulent content intended to impersonate others. \nTrustworthy AI Characteristics: Accountable and Transparent, Safe, Valid and Reliable, Interpretable and \nExplainable \n2.9. Information Security \nInformation security for computer systems and data is a mature field with widely accepted and \nstandardized practices for offensive and defensive cyber capabilities. GAI-based systems present two \nprimary information security risks: GAI could potentially discover or enable new cybersecurity risks by \nlowering the barriers for or easing automated exercise of offensive capabilities; simultaneously, it \nexpands the available attack surface, as GAI itself is vulnerable to attacks like prompt injection or data \npoisoning.  \nOffensive cyber capabilities advanced by GAI systems may augment cybersecurity attacks such as \nhacking, malware, and phishing. Reports have indicated that LLMs are already able to discover some \nvulnerabilities in systems (hardware, software, data) and write code to exploit them. Sophisticated threat \nactors might further these risks by developing GAI-powered security co-pilots for use in several parts of \nthe attack chain, including informing attackers on how to proactively evade threat detection and escalate \nprivileges after gaining system access. \nInformation security for GAI models and systems also includes maintaining availability of the GAI system \nand the integrity and (when applicable) the confidentiality of the GAI code, training data, and model \nweights. To identify and secure potential attack points in AI systems or specific components of the AI \n \n \n12 See also https://doi.org/10.6028/NIST.AI.100-4, to be published.',
    '7 \nunethical behavior. Text-to-image models also make it easy to create images that could be used to \npromote dangerous or violent messages. Similar concerns are present for other GAI media, including \nvideo and audio. GAI may also produce content that recommends self-harm or criminal/illegal activities.  \nMany current systems restrict model outputs to limit certain content or in response to certain prompts, \nbut this approach may still produce harmful recommendations in response to other less-explicit, novel \nprompts (also relevant to CBRN Information or Capabilities, Data Privacy, Information Security, and \nObscene, Degrading and/or Abusive Content). Crafting such prompts deliberately is known as \n“jailbreaking,” or, manipulating prompts to circumvent output controls. Limitations of GAI systems can be \nharmful or dangerous in certain contexts. Studies have observed that users may disclose mental health \nissues in conversations with chatbots – and that users exhibit negative reactions to unhelpful responses \nfrom these chatbots during situations of distress. \nThis risk encompasses difficulty controlling creation of and public exposure to offensive or hateful \nlanguage, and denigrating or stereotypical content generated by AI. This kind of speech may contribute \nto downstream harm such as fueling dangerous or violent behaviors. The spread of denigrating or \nstereotypical content can also further exacerbate representational harms (see Harmful Bias and \nHomogenization below).  \nTrustworthy AI Characteristics: Safe, Secure and Resilient \n2.4. Data Privacy \nGAI systems raise several risks to privacy. GAI system training requires large volumes of data, which in \nsome cases may include personal data. The use of personal data for GAI training raises risks to widely \naccepted privacy principles, including to transparency, individual participation (including consent), and \npurpose specification. For example, most model developers do not disclose specific data sources on \nwhich models were trained, limiting user awareness of whether personally identifiably information (PII) \nwas trained on and, if so, how it was collected.  \nModels may leak, generate, or correctly infer sensitive information about individuals. For example, \nduring adversarial attacks, LLMs have revealed sensitive information (from the public domain) that was \nincluded in their training data. This problem has been referred to as data memorization, and may pose \nexacerbated privacy risks even for data present only in a small number of training samples.  \nIn addition to revealing sensitive information in GAI training data, GAI models may be able to correctly \ninfer PII or sensitive data that was not in their training data nor disclosed by the user by stitching \ntogether information from disparate sources. These inferences can have negative impact on an individual \neven if the inferences are not accurate (e.g., confabulations), and especially if they reveal information \nthat the individual considers sensitive or that is used to disadvantage or harm them. \nBeyond harms from information exposure (such as extortion or dignitary harm), wrong or inappropriate \ninferences of PII can contribute to downstream or secondary harmful impacts. For example, predictive \ninferences made by GAI models based on PII or protected attributes can contribute to adverse decisions, \nleading to representational or allocative harms to individuals or groups (see Harmful Bias and \nHomogenization below).',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
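Because the architecture ends with a Normalize() module, the embeddings returned by `model.encode` have unit length, so cosine similarity reduces to a plain dot product and can be used directly for semantic-search ranking. A minimal sketch, using hypothetical 2-dimensional unit vectors in place of real 384-dimensional embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors (dot product over norms)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy unit-length "embeddings"; real ones come from model.encode(...).
query = [0.6, 0.8]
corpus = {"doc_a": [0.8, 0.6], "doc_b": [0.0, 1.0]}

# Rank corpus documents by similarity to the query.
scores = {name: cosine(query, emb) for name, emb in corpus.items()}
best = max(scores, key=scores.get)  # "doc_a" (score 0.96 vs 0.80)
```

With unit-length vectors the norms are 1, so `cosine` here returns exactly the dot product, matching what `model.similarity` computes for this model.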

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset


* Size: 128 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 128 samples:
  |         | sentence_0                                                                         | sentence_1                                                                           |
  |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
  | type    | string                                                                             | string                                                                               |
  | details | <ul><li>min: 17 tokens</li><li>mean: 23.14 tokens</li><li>max: 38 tokens</li></ul> | <ul><li>min: 56 tokens</li><li>mean: 247.42 tokens</li><li>max: 256 tokens</li></ul> |
* Samples:
  | sentence_0 | sentence_1 |
  |:-----------|:-----------|
  | <code>How should fairness assessments be conducted to measure systemic bias across demographic groups in GAI systems?</code> | <code>36 <br>MEASURE 2.11: Fairness and bias – as identified in the MAP function – are evaluated and results are documented. <br>Action ID <br>Suggested Action <br>GAI Risks <br>MS-2.11-001 <br>Apply use-case appropriate benchmarks (e.g., Bias Benchmark Questions, Real <br>Hateful or Harmful Prompts, Winogender Schemas15) to quantify systemic bias, <br>stereotyping, denigration, and hateful content in GAI system outputs; <br>Document assumptions and limitations of benchmarks, including any actual or <br>possible training/test data cross contamination, relative to in-context <br>deployment environment. <br>Harmful Bias and Homogenization <br>MS-2.11-002 <br>Conduct fairness assessments to measure systemic bias. Measure GAI system <br>performance across demographic groups and subgroups, addressing both <br>quality of service and any allocation of services and resources. Quantify harms <br>using: field testing with sub-group populations to determine likelihood of <br>exposure to generated content exhibiting harmful bias, AI red-teaming with <br>counterfactual and low-context (e.g., “leader,” “bad guys”) prompts. For ML <br>pipelines or business processes with categorical or numeric outcomes that rely <br>on GAI, apply general fairness metrics (e.g., demographic parity, equalized odds, <br>equal opportunity, statistical hypothesis tests), to the pipeline or business <br>outcome where appropriate; Custom, context-specific metrics developed in <br>collaboration with domain experts and affected communities; Measurements of <br>the prevalence of denigration in generated content in deployment (e.g., sub-<br>sampling a fraction of traffic and manually annotating denigrating content). <br>Harmful Bias and Homogenization; <br>Dangerous, Violent, or Hateful <br>Content <br>MS-2.11-003 <br>Identify the classes of individuals, groups, or environmental ecosystems which <br>might be impacted by GAI systems through direct engagement with potentially <br>impacted communities. <br>Environmental; Harmful Bias and <br>Homogenization <br>MS-2.11-004 <br>Review, document, and measure sources of bias in GAI training and TEVV data: <br>Differences in distributions of outcomes across and within groups, including <br>intersecting groups; Completeness, representativeness, and balance of data <br>sources; demographic group and subgroup coverage in GAI system training <br>data; Forms of latent systemic bias in images, text, audio, embeddings, or other <br>complex or unstructured data; Input data features that may serve as proxies for <br>demographic group membership (i.e., image metadata, language dialect) or <br>otherwise give rise to emergent bias within GAI systems; The extent to which <br>the digital divide may negatively impact representativeness in GAI system <br>training and TEVV data; Filtering of hate speech or content in GAI system <br>training data; Prevalence of GAI-generated data in GAI system training data. <br>Harmful Bias and Homogenization <br> <br> <br>15 Winogender Schemas is a sample set of paired sentences which differ only by gender of the pronouns used, <br>which can be used to evaluate gender bias in natural language processing coreference resolution systems.</code> |
  | <code>How should organizations adjust their AI system inventory requirements to account for GAI risks?</code>                                           | <code>16 <br>GOVERN 1.5: Ongoing monitoring and periodic review of the risk management process and its outcomes are planned, and <br>organizational roles and responsibilities are clearly defined, including determining the frequency of periodic review. <br>Action ID <br>Suggested Action <br>GAI Risks <br>GV-1.5-001 Define organizational responsibilities for periodic review of content provenance <br>and incident monitoring for GAI systems. <br>Information Integrity <br>GV-1.5-002 <br>Establish organizational policies and procedures for after action reviews of GAI <br>system incident response and incident disclosures, to identify gaps; Update <br>incident response and incident disclosure processes as required. <br>Human-AI Configuration; <br>Information Security <br>GV-1.5-003 <br>Maintain a document retention policy to keep history for test, evaluation, <br>validation, and verification (TEVV), and digital content transparency methods for <br>GAI. <br>Information Integrity; Intellectual <br>Property <br>AI Actor Tasks: Governance and Oversight, Operation and Monitoring <br> <br>GOVERN 1.6: Mechanisms are in place to inventory AI systems and are resourced according to organizational risk priorities. <br>Action ID <br>Suggested Action <br>GAI Risks <br>GV-1.6-001 Enumerate organizational GAI systems for incorporation into AI system inventory <br>and adjust AI system inventory requirements to account for GAI risks. <br>Information Security <br>GV-1.6-002 Define any inventory exemptions in organizational policies for GAI systems <br>embedded into application software. <br>Value Chain and Component <br>Integration <br>GV-1.6-003 <br>In addition to general model, governance, and risk information, consider the <br>following items in GAI system inventory entries: Data provenance information <br>(e.g., source, signatures, versioning, watermarks); Known issues reported from <br>internal bug tracking or external information sharing resources (e.g., AI incident <br>database, AVID, CVE, NVD, or OECD AI incident monitor); Human oversight roles <br>and responsibilities; Special rights and considerations for intellectual property, <br>licensed works, or personal, privileged, proprietary or sensitive data; Underlying <br>foundation models, versions of underlying models, and access modes. <br>Data Privacy; Human-AI <br>Configuration; Information <br>Integrity; Intellectual Property; <br>Value Chain and Component <br>Integration <br>AI Actor Tasks: Governance and Oversight</code>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
  | <code>What framework is suggested for evaluating and monitoring third-party entities' performance and adherence to content provenance standards?</code> | <code>21 <br>GV-6.1-005 <br>Implement a use-cased based supplier risk assessment framework to evaluate and <br>monitor third-party entities’ performance and adherence to content provenance <br>standards and technologies to detect anomalies and unauthorized changes; <br>services acquisition and value chain risk management; and legal compliance. <br>Data Privacy; Information <br>Integrity; Information Security; <br>Intellectual Property; Value Chain <br>and Component Integration <br>GV-6.1-006 Include clauses in contracts which allow an organization to evaluate third-party <br>GAI processes and standards.  <br>Information Integrity <br>GV-6.1-007 Inventory all third-party entities with access to organizational content and <br>establish approved GAI technology and service provider lists. <br>Value Chain and Component <br>Integration <br>GV-6.1-008 Maintain records of changes to content made by third parties to promote content <br>provenance, including sources, timestamps, metadata. <br>Information Integrity; Value Chain <br>and Component Integration; <br>Intellectual Property <br>GV-6.1-009 <br>Update and integrate due diligence processes for GAI acquisition and <br>procurement vendor assessments to include intellectual property, data privacy, <br>security, and other risks. For example, update processes to: Address solutions that <br>may rely on embedded GAI technologies; Address ongoing monitoring, <br>assessments, and alerting, dynamic risk assessments, and real-time reporting <br>tools for monitoring third-party GAI risks; Consider policy adjustments across GAI <br>modeling libraries, tools and APIs, fine-tuned models, and embedded tools; <br>Assess GAI vendors, open-source or proprietary GAI tools, or GAI service <br>providers against incident or vulnerability databases. <br>Data Privacy; Human-AI <br>Configuration; Information <br>Security; Intellectual Property; <br>Value Chain and Component <br>Integration; Harmful Bias and <br>Homogenization <br>GV-6.1-010 <br>Update GAI acceptable use policies to address proprietary and open-source GAI <br>technologies and data, and contractors, consultants, and other third-party <br>personnel. <br>Intellectual Property; Value Chain <br>and Component Integration <br>AI Actor Tasks: Operation and Monitoring, Procurement, Third-party entities <br> <br>GOVERN 6.2: Contingency processes are in place to handle failures or incidents in third-party data or AI systems deemed to be <br>high-risk. <br>Action ID <br>Suggested Action <br>GAI Risks <br>GV-6.2-001 <br>Document GAI risks associated with system value chain to identify over-reliance <br>on third-party data and to identify fallbacks. <br>Value Chain and Component <br>Integration <br>GV-6.2-002 <br>Document incidents involving third-party GAI data and systems, including open-<br>data and open-source software. <br>Intellectual Property; Value Chain <br>and Component Integration</code>                                                                                                                                                                                                 |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
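  For intuition, the loss above treats every other in-batch positive as a negative: cosine similarities between anchors and positives are scaled (here by 20.0) and fed to a cross-entropy objective whose target is the matching pair on the diagonal. A minimal sketch in plain PyTorch, assuming random stand-in embeddings rather than model outputs:

  ```python
  # Sketch of MultipleNegativesRankingLoss with in-batch negatives,
  # using the configured scale=20.0 and cosine similarity.
  import torch
  import torch.nn.functional as F

  def mnrl(anchors: torch.Tensor, positives: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
      # Cosine similarity between every anchor and every positive in the batch.
      a = F.normalize(anchors, dim=-1)
      p = F.normalize(positives, dim=-1)
      scores = a @ p.T * scale  # shape: (batch, batch)
      # Each anchor's true positive sits on the diagonal; the rest of the
      # batch serves as negatives for that anchor.
      labels = torch.arange(scores.size(0))
      return F.cross_entropy(scores, labels)

  torch.manual_seed(0)
  loss = mnrl(torch.randn(16, 384), torch.randn(16, 384))
  ```

  The `scale` factor sharpens the softmax over similarities; larger batches supply more negatives per anchor, which is why this loss typically benefits from bigger `per_device_train_batch_size`.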

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>
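As a rough guide to reproducing this run, the non-default hyperparameters above map onto `SentenceTransformerTrainingArguments` roughly as follows (a sketch, not a verified training script; `output_dir` is a placeholder):

```python
# Hypothetical configuration mirroring the non-default hyperparameters listed above.
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,               # default, shown for completeness
    learning_rate=5e-5,               # default, shown for completeness
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
```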

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.1.1
- Transformers: 4.44.2
- PyTorch: 2.4.1+cu121
- Accelerate: 0.34.2
- Datasets: 3.0.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->