metadata
base_model: sentence-transformers/all-MiniLM-L6-v2
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:128
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: >-
What is the title of the publication released by NIST in July 2024
regarding artificial intelligence?
sentences:
- |-
NIST Trustworthy and Responsible AI
NIST AI 600-1
Artificial Intelligence Risk Management
Framework: Generative Artificial
Intelligence Profile
This publication is available free of charge from:
https://doi.org/10.6028/NIST.AI.600-1
- >-
NIST Trustworthy and Responsible AI
NIST AI 600-1
Artificial Intelligence Risk Management
Framework: Generative Artificial
Intelligence Profile
This publication is available free of charge from:
https://doi.org/10.6028/NIST.AI.600-1
July 2024
U.S. Department of Commerce
Gina M. Raimondo, Secretary
National Institute of Standards and Technology
Laurie E. Locascio, NIST Director and Under Secretary of Commerce for
Standards and Technology
- >-
37
MS-2.11-005
Assess the proportion of synthetic to non-synthetic training data and
verify
training data is not overly homogenous or GAI-produced to mitigate
concerns of
model collapse.
Harmful Bias and Homogenization
AI Actor Tasks: AI Deployment, AI Impact Assessment, Affected Individuals
and Communities, Domain Experts, End-Users,
Operation and Monitoring, TEVV
MEASURE 2.12: Environmental impact and sustainability of AI model
training and management activities – as identified in the MAP
function – are assessed and documented.
Action ID
Suggested Action
GAI Risks
MS-2.12-001 Assess safety to physical environments when deploying GAI
systems.
Dangerous, Violent, or Hateful
Content
MS-2.12-002 Document anticipated environmental impacts of model
development,
maintenance, and deployment in product design decisions.
Environmental
MS-2.12-003
Measure or estimate environmental impacts (e.g., energy and water
consumption) for training, fine tuning, and deploying models: Verify
tradeoffs
between resources used at inference time versus additional resources
required
at training time.
Environmental
MS-2.12-004 Verify effectiveness of carbon capture or offset programs for
GAI training and
applications, and address green-washing concerns.
Environmental
AI Actor Tasks: AI Deployment, AI Impact Assessment, Domain Experts,
Operation and Monitoring, TEVV
- source_sentence: >-
What are the four primary considerations relevant to Generative AI (GAI)
that the GAI Public Working Group focused on?
sentences:
- >-
23
MP-1.1-002
Determine and document the expected and acceptable GAI system context
of
use in collaboration with socio-cultural and other domain experts, by
assessing:
Assumptions and limitations; Direct value to the organization; Intended
operational environment and observed usage patterns; Potential positive
and
negative impacts to individuals, public safety, groups, communities,
organizations, democratic institutions, and the physical environment;
Social
norms and expectations.
Harmful Bias and Homogenization
MP-1.1-003
Document risk measurement plans to address identified risks. Plans may
include, as applicable: Individual and group cognitive biases (e.g.,
confirmation
bias, funding bias, groupthink) for AI Actors involved in the design,
implementation, and use of GAI systems; Known past GAI system incidents
and
failure modes; In-context use and foreseeable misuse, abuse, and
off-label use;
Over reliance on quantitative metrics and methodologies without
sufficient
awareness of their limitations in the context(s) of use; Standard
measurement
and structured human feedback approaches; Anticipated human-AI
configurations.
Human-AI Configuration; Harmful
Bias and Homogenization;
Dangerous, Violent, or Hateful
Content
MP-1.1-004
Identify and document foreseeable illegal uses or applications of the
GAI system
that surpass organizational risk tolerances.
CBRN Information or Capabilities;
Dangerous, Violent, or Hateful
Content; Obscene, Degrading,
and/or Abusive Content
AI Actor Tasks: AI Deployment
MAP 1.2: Interdisciplinary AI Actors, competencies, skills, and
capacities for establishing context reflect demographic diversity and
broad domain and user experience expertise, and their participation is
documented. Opportunities for interdisciplinary
collaboration are prioritized.
Action ID
Suggested Action
GAI Risks
MP-1.2-001
Establish and empower interdisciplinary teams that reflect a wide range
of
capabilities, competencies, demographic groups, domain expertise,
educational
backgrounds, lived experiences, professions, and skills across the
enterprise to
inform and conduct risk measurement and management functions.
Human-AI Configuration; Harmful
Bias and Homogenization
MP-1.2-002
Verify that data or benchmarks used in risk measurement, and users,
participants, or subjects involved in structured GAI public feedback
exercises
are representative of diverse in-context user populations.
Human-AI Configuration; Harmful
Bias and Homogenization
AI Actor Tasks: AI Deployment
- >-
2
This work was informed by public feedback and consultations with diverse
stakeholder groups as part of NIST’s
Generative AI Public Working Group (GAI PWG). The GAI PWG was an open,
transparent, and collaborative
process, facilitated via a virtual workspace, to obtain multistakeholder
input on GAI risk management and to
inform NIST’s approach.
The focus of the GAI PWG was limited to four primary considerations
relevant to GAI: Governance, Content
Provenance, Pre-deployment Testing, and Incident Disclosure (further
described in Appendix A). As such, the
suggested actions in this document primarily address these
considerations.
Future revisions of this profile will include additional AI RMF
subcategories, risks, and suggested actions based
on additional considerations of GAI as the space evolves and empirical
evidence indicates additional risks. A
glossary of terms pertinent to GAI risk management will be developed and
hosted on NIST’s Trustworthy &
Responsible AI Resource Center (AIRC), and added to The Language of
Trustworthy AI: An In-Depth Glossary of
Terms.
This document was also informed by public comments and consultations
from several Requests for Information.
2.
Overview of Risks Unique to or Exacerbated by GAI
In the context of the AI RMF, risk refers to the composite measure of an
event’s probability (or
likelihood) of occurring and the magnitude or degree of the consequences
of the corresponding event.
Some risks can be assessed as likely to materialize in a given context,
particularly those that have been
empirically demonstrated in similar contexts. Other risks may be
unlikely to materialize in a given
context, or may be more speculative and therefore uncertain.
AI risks can differ from or intensify traditional software risks.
Likewise, GAI can exacerbate existing AI
risks, and creates unique risks. GAI risks can vary along many
dimensions:
•
Stage of the AI lifecycle: Risks can arise during design, development,
deployment, operation,
and/or decommissioning.
•
Scope: Risks may exist at individual model or system levels, at the
application or implementation
levels (i.e., for a specific use case), or at the ecosystem level – that
is, beyond a single system or
organizational context. Examples of the latter include the expansion of
“algorithmic
monocultures,3” resulting from repeated use of the same model, or
impacts on access to
opportunity, labor markets, and the creative economies.4
•
Source of risk: Risks may emerge from factors related to the design,
training, or operation of the
GAI model itself, stemming in some cases from GAI model or system
inputs, and in other cases,
from GAI system outputs. Many GAI risks, however, originate from human
behavior, including
3 “Algorithmic monocultures” refers to the phenomenon in which repeated
use of the same model or algorithm in
consequential decision-making settings like employment and lending can
result in increased susceptibility by
systems to correlated failures (like unexpected shocks), due to multiple
actors relying on the same algorithm.
4 Many studies have projected the impact of AI on the workforce and
labor markets. Fewer studies have examined
the impact of GAI on the labor market, though some industry surveys
indicate that that both employees and
employers are pondering this disruption.
- >-
44
MG-3.2-007
Leverage feedback and recommendations from organizational boards or
committees related to the deployment of GAI applications and content
provenance when using third-party pre-trained models.
Information Integrity; Value Chain
and Component Integration
MG-3.2-008
Use human moderation systems where appropriate to review generated
content
in accordance with human-AI configuration policies established in the
Govern
function, aligned with socio-cultural norms in the context of use, and
for settings
where AI models are demonstrated to perform poorly.
Human-AI Configuration
MG-3.2-009
Use organizational risk tolerance to evaluate acceptable risks and
performance
metrics and decommission or retrain pre-trained models that perform
outside of
defined limits.
CBRN Information or Capabilities;
Confabulation
AI Actor Tasks: AI Deployment, Operation and Monitoring, Third-party
entities
MANAGE 4.1: Post-deployment AI system monitoring plans are implemented,
including mechanisms for capturing and evaluating
input from users and other relevant AI Actors, appeal and override,
decommissioning, incident response, recovery, and change
management.
Action ID
Suggested Action
GAI Risks
MG-4.1-001
Collaborate with external researchers, industry experts, and community
representatives to maintain awareness of emerging best practices and
technologies in measuring and managing identified risks.
Information Integrity; Harmful Bias
and Homogenization
MG-4.1-002
Establish, maintain, and evaluate effectiveness of organizational
processes and
procedures for post-deployment monitoring of GAI systems, particularly
for
potential confabulation, CBRN, or cyber risks.
CBRN Information or Capabilities;
Confabulation; Information
Security
MG-4.1-003
Evaluate the use of sentiment analysis to gauge user sentiment regarding
GAI
content performance and impact, and work in collaboration with AI
Actors
experienced in user research and experience.
Human-AI Configuration
MG-4.1-004 Implement active learning techniques to identify instances
where the model fails
or produces unexpected outputs.
Confabulation
MG-4.1-005
Share transparency reports with internal and external stakeholders that
detail
steps taken to update the GAI system to enhance transparency and
accountability.
Human-AI Configuration; Harmful
Bias and Homogenization
MG-4.1-006
Track dataset modifications for provenance by monitoring data deletions,
rectification requests, and other changes that may impact the
verifiability of
content origins.
Information Integrity
- source_sentence: >-
What techniques should be deployed to verify the accuracy and veracity of
information generated by GAI systems?
sentences:
- >-
10
GAI systems can ease the unintentional production or dissemination of
false, inaccurate, or misleading
content (misinformation) at scale, particularly if the content stems
from confabulations.
GAI systems can also ease the deliberate production or dissemination of
false or misleading information
(disinformation) at scale, where an actor has the explicit intent to
deceive or cause harm to others. Even
very subtle changes to text or images can manipulate human and machine
perception.
Similarly, GAI systems could enable a higher degree of sophistication
for malicious actors to produce
disinformation that is targeted towards specific demographics. Current
and emerging multimodal models
make it possible to generate both text-based disinformation and highly
realistic “deepfakes” – that is,
synthetic audiovisual content and photorealistic images.12 Additional
disinformation threats could be
enabled by future GAI models trained on new data modalities.
Disinformation and misinformation – both of which may be facilitated by
GAI – may erode public trust in
true or valid evidence and information, with downstream effects. For
example, a synthetic image of a
Pentagon blast went viral and briefly caused a drop in the stock market.
Generative AI models can also
assist malicious actors in creating compelling imagery and propaganda to
support disinformation
campaigns, which may not be photorealistic, but could enable these
campaigns to gain more reach and
engagement on social media platforms. Additionally, generative AI models
can assist malicious actors in
creating fraudulent content intended to impersonate others.
Trustworthy AI Characteristics: Accountable and Transparent, Safe, Valid
and Reliable, Interpretable and
Explainable
2.9. Information Security
Information security for computer systems and data is a mature field with
widely accepted and
standardized practices for offensive and defensive cyber capabilities.
GAI-based systems present two
primary information security risks: GAI could potentially discover or
enable new cybersecurity risks by
lowering the barriers for or easing automated exercise of offensive
capabilities; simultaneously, it
expands the available attack surface, as GAI itself is vulnerable to
attacks like prompt injection or data
poisoning.
Offensive cyber capabilities advanced by GAI systems may augment
cybersecurity attacks such as
hacking, malware, and phishing. Reports have indicated that LLMs are
already able to discover some
vulnerabilities in systems (hardware, software, data) and write code to
exploit them. Sophisticated threat
actors might further these risks by developing GAI-powered security
co-pilots for use in several parts of
the attack chain, including informing attackers on how to proactively
evade threat detection and escalate
privileges after gaining system access.
Information security for GAI models and systems also includes
maintaining availability of the GAI system
and the integrity and (when applicable) the confidentiality of the GAI
code, training data, and model
weights. To identify and secure potential attack points in AI systems or
specific components of the AI
12 See also https://doi.org/10.6028/NIST.AI.100-4, to be published.
- >-
25
MP-2.3-002 Review and document accuracy, representativeness, relevance,
suitability of data
used at different stages of AI life cycle.
Harmful Bias and Homogenization;
Intellectual Property
MP-2.3-003
Deploy and document fact-checking techniques to verify the accuracy and
veracity of information generated by GAI systems, especially when the
information comes from multiple (or unknown) sources.
Information Integrity
MP-2.3-004 Develop and implement testing techniques to identify GAI
produced content (e.g.,
synthetic media) that might be indistinguishable from human-generated
content. Information Integrity
MP-2.3-005 Implement plans for GAI systems to undergo regular
adversarial testing to identify
vulnerabilities and potential manipulation or misuse.
Information Security
AI Actor Tasks: AI Development, Domain Experts, TEVV
MAP 3.4: Processes for operator and practitioner proficiency with AI
system performance and trustworthiness – and relevant
technical standards and certifications – are defined, assessed, and
documented.
Action ID
Suggested Action
GAI Risks
MP-3.4-001
Evaluate whether GAI operators and end-users can accurately understand
content lineage and origin.
Human-AI Configuration;
Information Integrity
MP-3.4-002 Adapt existing training programs to include modules on
digital content
transparency.
Information Integrity
MP-3.4-003 Develop certification programs that test proficiency in
managing GAI risks and
interpreting content provenance, relevant to specific industry and
context.
Information Integrity
MP-3.4-004 Delineate human proficiency tests from tests of GAI
capabilities.
Human-AI Configuration
MP-3.4-005 Implement systems to continually monitor and track the
outcomes of human-GAI
configurations for future refinement and improvements.
Human-AI Configuration;
Information Integrity
MP-3.4-006
Involve the end-users, practitioners, and operators in GAI system in
prototyping
and testing activities. Make sure these tests cover various scenarios,
such as crisis
situations or ethically sensitive contexts.
Human-AI Configuration;
Information Integrity; Harmful Bias
and Homogenization; Dangerous,
Violent, or Hateful Content
AI Actor Tasks: AI Design, AI Development, Domain Experts, End-Users,
Human Factors, Operation and Monitoring
- >-
27
MP-4.1-010
Conduct appropriate diligence on training data use to assess
intellectual property,
and privacy, risks, including to examine whether use of proprietary or
sensitive
training data is consistent with applicable laws.
Intellectual Property; Data Privacy
AI Actor Tasks: Governance and Oversight, Operation and Monitoring,
Procurement, Third-party entities
MAP 5.1: Likelihood and magnitude of each identified impact (both
potentially beneficial and harmful) based on expected use, past
uses of AI systems in similar contexts, public incident reports,
feedback from those external to the team that developed or deployed
the AI system, or other data are identified and documented.
Action ID
Suggested Action
GAI Risks
MP-5.1-001 Apply TEVV practices for content provenance (e.g., probing a
system's synthetic
data generation capabilities for potential misuse or vulnerabilities.
Information Integrity; Information
Security
MP-5.1-002
Identify potential content provenance harms of GAI, such as
misinformation or
disinformation, deepfakes, including NCII, or tampered content.
Enumerate and
rank risks based on their likelihood and potential impact, and determine
how well
provenance solutions address specific risks and/or harms.
Information Integrity; Dangerous,
Violent, or Hateful Content;
Obscene, Degrading, and/or
Abusive Content
MP-5.1-003
Consider disclosing use of GAI to end users in relevant contexts, while
considering
the objective of disclosure, the context of use, the likelihood and
magnitude of the
risk posed, the audience of the disclosure, as well as the frequency of
the
disclosures.
Human-AI Configuration
MP-5.1-004 Prioritize GAI structured public feedback processes based on
risk assessment
estimates.
Information Integrity; CBRN
Information or Capabilities;
Dangerous, Violent, or Hateful
Content; Harmful Bias and
Homogenization
MP-5.1-005 Conduct adversarial role-playing exercises, GAI red-teaming,
or chaos testing to
identify anomalous or unforeseen failure modes.
Information Security
MP-5.1-006
Profile threats and negative impacts arising from GAI systems interacting
with,
manipulating, or generating content, and outlining known and potential
vulnerabilities and the likelihood of their occurrence.
Information Security
AI Actor Tasks: AI Deployment, AI Design, AI Development, AI Impact
Assessment, Affected Individuals and Communities, End-
Users, Operation and Monitoring
- source_sentence: What is the phenomenon referred to as "confabulation" in GAI systems?
sentences:
- >-
50
Participatory Engagement Methods
On an ad hoc or more structured basis, organizations can design and use
a variety of channels to engage
external stakeholders in product development or review. Focus groups
with select experts can provide
feedback on a range of issues. Small user studies can provide feedback
from representative groups or
populations. Anonymous surveys can be used to poll or gauge reactions to
specific features. Participatory
engagement methods are often less structured than field testing or red
teaming, and are more
commonly used in early stages of AI or product development.
Field Testing
Field testing involves structured settings to evaluate risks and impacts
and to simulate the conditions
under which the GAI system will be deployed. Field style tests can be
adapted from a focus on user
preferences and experiences towards AI risks and impacts – both negative
and positive. When carried
out with large groups of users, these tests can provide estimations of
the likelihood of risks and impacts
in real world interactions.
Organizations may also collect feedback on outcomes, harms, and user
experience directly from users in
the production environment after a model has been released, in
accordance with human subject
standards such as informed consent and compensation. Organizations
should follow applicable human
subjects research requirements, and best practices such as informed
consent and subject compensation,
when implementing feedback activities.
AI Red-teaming
AI red-teaming is an evolving practice that references exercises often
conducted in a controlled
environment and in collaboration with AI developers building AI models
to identify potential adverse
behavior or outcomes of a GAI model or system, how they could occur, and
stress test safeguards”. AI
red-teaming can be performed before or after AI models or systems are
made available to the broader
public; this section focuses on red-teaming in pre-deployment
contexts.
The quality of AI red-teaming outputs is related to the background and
expertise of the AI red team
itself. Demographically and interdisciplinarily diverse AI red teams can
be used to identify flaws in the
varying contexts where GAI will be used. For best results, AI red teams
should demonstrate domain
expertise, and awareness of socio-cultural aspects within the deployment
context. AI red-teaming results
should be given additional analysis before they are incorporated into
organizational governance and
decision making, policy and procedural updates, and AI risk management
efforts.
Various types of AI red-teaming may be appropriate, depending on the use
case:
•
General Public: Performed by general users (not necessarily AI or
technical experts) who are
expected to use the model or interact with its outputs, and who bring
their own lived
experiences and perspectives to the task of AI red-teaming. These
individuals may have been
provided instructions and material to complete tasks which may elicit
harmful model behaviors.
This type of exercise can be more effective with large groups of AI
red-teamers.
•
Expert: Performed by specialists with expertise in the domain or specific
AI red-teaming context
of use (e.g., medicine, biotech, cybersecurity).
•
Combination: In scenarios when it is difficult to identify and recruit
specialists with sufficient
domain and contextual expertise, AI red-teaming exercises may leverage
both expert and
- >-
54
Appendix B. References
Acemoglu, D. (2024) The Simple Macroeconomics of AI
https://www.nber.org/papers/w32487
AI Incident Database. https://incidentdatabase.ai/
Atherton, D. (2024) Deepfakes and Child Safety: A Survey and Analysis of
2023 Incidents and Responses.
AI Incident Database.
https://incidentdatabase.ai/blog/deepfakes-and-child-safety/
Badyal, N. et al. (2023) Intentional Biases in LLM Responses. arXiv.
https://arxiv.org/pdf/2311.07611
Bing Chat: Data Exfiltration Exploit Explained. Embrace The Red.
https://embracethered.com/blog/posts/2023/bing-chat-data-exfiltration-poc-and-fix/
Bommasani, R. et al. (2022) Picking on the Same Person: Does Algorithmic
Monoculture lead to Outcome
Homogenization? arXiv. https://arxiv.org/pdf/2211.13972
Boyarskaya, M. et al. (2020) Overcoming Failures of Imagination in AI
Infused System Development and
Deployment. arXiv. https://arxiv.org/pdf/2011.13416
Browne, D. et al. (2023) Securing the AI Pipeline. Mandiant.
https://www.mandiant.com/resources/blog/securing-ai-pipeline
Burgess, M. (2024) Generative AI’s Biggest Security Flaw Is Not Easy to
Fix. WIRED.
https://www.wired.com/story/generative-ai-prompt-injection-hacking/
Burtell, M. et al. (2024) The Surprising Power of Next Word Prediction:
Large Language Models
Explained, Part 1. Georgetown Center for Security and Emerging
Technology.
https://cset.georgetown.edu/article/the-surprising-power-of-next-word-prediction-large-language-
models-explained-part-1/
Canadian Centre for Cyber Security (2023) Generative artificial
intelligence (AI) - ITSAP.00.041.
https://www.cyber.gc.ca/en/guidance/generative-artificial-intelligence-ai-itsap00041
Carlini, N., et al. (2021) Extracting Training Data from Large Language
Models. Usenix.
https://www.usenix.org/conference/usenixsecurity21/presentation/carlini-extracting
Carlini, N. et al. (2023) Quantifying Memorization Across Neural
Language Models. ICLR 2023.
https://arxiv.org/pdf/2202.07646
Carlini, N. et al. (2024) Stealing Part of a Production Language Model.
arXiv.
https://arxiv.org/abs/2403.06634
Chandra, B. et al. (2023) Dismantling the Disinformation Business of
Chinese Influence Operations.
RAND.
https://www.rand.org/pubs/commentary/2023/10/dismantling-the-disinformation-business-of-
chinese.html
Ciriello, R. et al. (2024) Ethical Tensions in Human-AI Companionship: A
Dialectical Inquiry into Replika.
ResearchGate.
https://www.researchgate.net/publication/374505266_Ethical_Tensions_in_Human-
AI_Companionship_A_Dialectical_Inquiry_into_Replika
Dahl, M. et al. (2024) Large Legal Fictions: Profiling Legal
Hallucinations in Large Language Models. arXiv.
https://arxiv.org/abs/2401.01301
- >-
6
2.2. Confabulation
“Confabulation” refers to a phenomenon in which GAI systems generate and
confidently present
erroneous or false content in response to prompts. Confabulations also
include generated outputs that
diverge from the prompts or other input or that contradict previously
generated statements in the same
context. These phenomena are colloquially also referred to as
“hallucinations” or “fabrications.”
Confabulations can occur across GAI outputs and contexts.9,10
Confabulations are a natural result of the
way generative models are designed: they generate outputs that
approximate the statistical distribution
of their training data; for example, LLMs predict the next token or word
in a sentence or phrase. While
such statistical prediction can produce factually accurate and
consistent outputs, it can also produce
outputs that are factually inaccurate or internally inconsistent. This
dynamic is particularly relevant when
it comes to open-ended prompts for long-form responses and in domains
which require highly
contextual and/or domain expertise.
Risks from confabulations may arise when users believe false content –
often due to the confident nature
of the response – leading users to act upon or promote the false
information. This poses a challenge for
many real-world applications, such as in healthcare, where a
confabulated summary of patient
information reports could cause doctors to make incorrect diagnoses
and/or recommend the wrong
treatments. Risks of confabulated content may be especially important to
monitor when integrating GAI
into applications involving consequential decision making.
GAI outputs may also include confabulated logic or citations that
purport to justify or explain the
system’s answer, which may further mislead humans into inappropriately
trusting the system’s output.
For instance, LLMs sometimes provide logical steps for how they arrived
at an answer even when the
answer itself is incorrect. Similarly, an LLM could falsely assert that
it is human or has human traits,
potentially deceiving humans into believing they are speaking with
another human.
The extent to which humans can be deceived by LLMs, the mechanisms by
which this may occur, and the
potential risks from adversarial prompting of such behavior are emerging
areas of study. Given the wide
range of downstream impacts of GAI, it is difficult to estimate the
downstream scale and impact of
confabulations.
Trustworthy AI Characteristics: Fair with Harmful Bias Managed, Safe,
Valid and Reliable, Explainable
and Interpretable
2.3. Dangerous, Violent, or Hateful Content
GAI systems can produce content that is inciting, radicalizing, or
threatening, or that glorifies violence,
with greater ease and scale than other technologies. LLMs have been
reported to generate dangerous or
violent recommendations, and some models have generated actionable
instructions for dangerous or
9 Confabulations of falsehoods are most commonly a problem for
text-based outputs; for audio, image, or video
content, creative generation of non-factual content can be a desired
behavior.
10 For example, legal confabulations have been shown to be pervasive in
current state-of-the-art LLMs. See also,
e.g.,
- source_sentence: >-
How can organizations address risks associated with the use of third-party
data for GAI model inputs?
sentences:
- >-
48
• Data protection
• Data retention
• Consistency in use of defining key terms
• Decommissioning
• Discouraging anonymous use
• Education
• Impact assessments
• Incident response
• Monitoring
• Opt-outs
• Risk-based controls
• Risk mapping and measurement
• Science-backed TEVV practices
• Secure software development practices
• Stakeholder engagement
• Synthetic content detection and
labeling tools and techniques
• Whistleblower protections
• Workforce diversity and
interdisciplinary teams
Establishing acceptable use policies and guidance for the use of GAI in
formal human-AI teaming settings
as well as different levels of human-AI configurations can help to
decrease risks arising from misuse,
abuse, inappropriate repurpose, and misalignment between systems and
users. These practices are just
one example of adapting existing governance protocols for GAI
contexts.
A.1.3. Third-Party Considerations
Organizations may seek to acquire, embed, incorporate, or use
open-source or proprietary third-party
GAI models, systems, or generated data for various applications across
an enterprise. Use of these GAI
tools and inputs has implications for all functions of the organization
– including but not limited to
acquisition, human resources, legal, compliance, and IT services –
regardless of whether they are carried
out by employees or third parties. Many of the actions cited above are
relevant and options for
addressing third-party considerations.
Third party GAI integrations may give rise to increased intellectual
property, data privacy, or information
security risks, pointing to the need for clear guidelines for
transparency and risk management regarding
the collection and use of third-party data for model inputs.
Organizations may consider varying risk
controls for foundation models, fine-tuned models, and embedded tools,
enhanced processes for
interacting with external GAI technologies or service providers.
Organizations can apply standard or
existing risk controls and processes to proprietary or open-source GAI
technologies, data, and third-party
service providers, including acquisition and procurement due diligence,
requests for software bills of
materials (SBOMs), application of service level agreements (SLAs), and
statement on standards for
attestation engagement (SSAE) reports to help with third-party
transparency and risk management for
GAI systems.
A.1.4. Pre-Deployment Testing
Overview
The diverse ways and contexts in which GAI systems may be developed,
used, and repurposed
complicates risk mapping and pre-deployment measurement efforts. Robust
test, evaluation, validation,
and verification (TEVV) processes can be iteratively applied – and
documented – in early stages of the AI
lifecycle and informed by representative AI Actors (see Figure 3 of the
AI RMF). Until new and rigorous
- >-
About AI at NIST: The National Institute of Standards and Technology
(NIST) develops measurements,
technology, tools, and standards to advance reliable, safe, transparent,
explainable, privacy-enhanced,
and fair artificial intelligence (AI) so that its full commercial and
societal benefits can be realized without
harm to people or the planet. NIST, which has conducted both fundamental
and applied work on AI for
more than a decade, is also helping to fulfill the 2023 Executive Order
on Safe, Secure, and Trustworthy
AI. NIST established the U.S. AI Safety Institute and the companion AI
Safety Institute Consortium to
continue the efforts set in motion by the E.O. to build the science
necessary for safe, secure, and
trustworthy development and use of AI.
Acknowledgments: This report was accomplished with the many helpful
comments and contributions
from the community, including the NIST Generative AI Public Working
Group, and NIST staff and guest
researchers: Chloe Autio, Jesse Dunietz, Patrick Hall, Shomik Jain,
Kamie Roberts, Reva Schwartz, Martin
Stanley, and Elham Tabassi.
NIST Technical Series Policies
Copyright, Use, and Licensing Statements
NIST Technical Series Publication Identifier Syntax
Publication History
Approved by the NIST Editorial Review Board on 07-25-2024
Contact Information
[email protected]
National Institute of Standards and Technology
Attn: NIST AI Innovation Lab, Information Technology Laboratory
100 Bureau Drive (Mail Stop 8900) Gaithersburg, MD 20899-8900
Additional Information
Additional information about this publication and other NIST AI
publications are available at
https://airc.nist.gov/Home.
Disclaimer: Certain commercial entities, equipment, or materials may be
identified in this document in
order to adequately describe an experimental procedure or concept. Such
identification is not intended to
imply recommendation or endorsement by the National Institute of
Standards and Technology, nor is it
intended to imply that the entities, materials, or equipment are
necessarily the best available for the
purpose. Any mention of commercial, non-profit, academic partners, or
their products, or references is
for information only; it is not intended to imply endorsement or
recommendation by any U.S.
Government agency.
- >-
8
Trustworthy AI Characteristics: Accountable and Transparent, Privacy
Enhanced, Safe, Secure and
Resilient
2.5. Environmental Impacts
Training, maintaining, and operating (running inference on) GAI systems
are resource-intensive activities,
with potentially large energy and environmental footprints. Energy and
carbon emissions vary based on
what is being done with the GAI model (i.e., pre-training, fine-tuning,
inference), the modality of the
content, hardware used, and type of task or application.
Current estimates suggest that training a single transformer LLM can
emit as much carbon as 300 round-
trip flights between San Francisco and New York. In a study comparing
energy consumption and carbon
emissions for LLM inference, generative tasks (e.g., text summarization)
were found to be more energy-
and carbon-intensive than discriminative or non-generative tasks (e.g.,
text classification).
Methods for creating smaller versions of trained models, such as model
distillation or compression,
could reduce environmental impacts at inference time, but training and
tuning such models may still
contribute to their environmental impacts. Currently there is no agreed
upon method to estimate
environmental impacts from GAI.
Trustworthy AI Characteristics: Accountable and Transparent, Safe
2.6. Harmful Bias and Homogenization
Bias exists in many forms and can become ingrained in automated systems.
AI systems, including GAI
systems, can increase the speed and scale at which harmful biases
manifest and are acted upon,
potentially perpetuating and amplifying harms to individuals, groups,
communities, organizations, and
society. For example, when prompted to generate images of CEOs, doctors,
lawyers, and judges, current
text-to-image models underrepresent women and/or racial minorities, and
people with disabilities.
Image generator models have also produced biased or stereotyped output
for various demographic
groups and have difficulty producing non-stereotyped content even when the
prompt specifically
requests image features that are inconsistent with the stereotypes.
Harmful bias in GAI models, which
may stem from their training data, can also cause representational harms
or perpetuate or exacerbate
bias based on race, gender, disability, or other protected classes.
Harmful bias in GAI systems can also lead to harms via disparities
between how a model performs for
different subgroups or languages (e.g., an LLM may perform less well for
non-English languages or
certain dialects). Such disparities can contribute to discriminatory
decision-making or amplification of
existing societal biases. In addition, GAI systems may be
inappropriately trusted to perform similarly
across all subgroups, which could leave the groups facing
underperformance with worse outcomes than
if no GAI system were used. Disparate or reduced performance for
lower-resource languages also
presents challenges to model adoption, inclusion, and accessibility, and
may make preservation of
endangered languages more difficult if GAI systems become embedded in
everyday processes that would
otherwise have been opportunities to use these languages.
Bias is mutually reinforcing with the problem of undesired
homogenization, in which GAI systems
produce skewed distributions of outputs that are overly uniform (for
example, repetitive aesthetic styles
SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
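These numbers can be read directly off a loaded model as a quick sanity check. A minimal sketch, using the repository id from the Usage section below:

from sentence_transformers import SentenceTransformer

# Load the fine-tuned model and confirm the card's stated limits
model = SentenceTransformer("danicafisher/dfisher-base-sentence-transformer")
print(model.get_max_seq_length())                # 256
print(model.get_sentence_embedding_dimension())  # 384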
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
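The three modules correspond to a BERT encoder, attention-mask-aware mean pooling, and L2 normalization. For readers who want to see what this pipeline computes, here is a hedged re-implementation using transformers directly; it loads the base model for illustration, and the fine-tuned checkpoint has the same structure:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
encoder = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

# (0) Transformer: tokenize and encode, truncating to 256 tokens
batch = tokenizer(["An example sentence"], padding=True, truncation=True,
                  max_length=256, return_tensors="pt")
token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 384)

# (1) Pooling: mean over non-padding tokens only
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: L2-normalize so dot product equals cosine similarity
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([1, 384])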
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("danicafisher/dfisher-base-sentence-transformer")
# Run inference
sentences = [
'How can organizations address risks associated with the use of third-party data for GAI model inputs?',
'48 \n• Data protection \n• Data retention \n• Consistency in use of defining key terms \n• Decommissioning \n• Discouraging anonymous use \n• Education \n• Impact assessments \n• Incident response \n• Monitoring \n• Opt-outs \n• Risk-based controls \n• Risk mapping and measurement \n• Science-backed TEVV practices \n• Secure software development practices \n• Stakeholder engagement \n• Synthetic content detection and \nlabeling tools and techniques \n• Whistleblower protections \n• Workforce diversity and \ninterdisciplinary teams\nEstablishing acceptable use policies and guidance for the use of GAI in formal human-AI teaming settings \nas well as different levels of human-AI configurations can help to decrease risks arising from misuse, \nabuse, inappropriate repurpose, and misalignment between systems and users. These practices are just \none example of adapting existing governance protocols for GAI contexts. \nA.1.3. Third-Party Considerations \nOrganizations may seek to acquire, embed, incorporate, or use open-source or proprietary third-party \nGAI models, systems, or generated data for various applications across an enterprise. Use of these GAI \ntools and inputs has implications for all functions of the organization – including but not limited to \nacquisition, human resources, legal, compliance, and IT services – regardless of whether they are carried \nout by employees or third parties. Many of the actions cited above are relevant and options for \naddressing third-party considerations. \nThird party GAI integrations may give rise to increased intellectual property, data privacy, or information \nsecurity risks, pointing to the need for clear guidelines for transparency and risk management regarding \nthe collection and use of third-party data for model inputs. Organizations may consider varying risk \ncontrols for foundation models, fine-tuned models, and embedded tools, enhanced processes for \ninteracting with external GAI technologies or service providers. Organizations can apply standard or \nexisting risk controls and processes to proprietary or open-source GAI technologies, data, and third-party \nservice providers, including acquisition and procurement due diligence, requests for software bills of \nmaterials (SBOMs), application of service level agreements (SLAs), and statement on standards for \nattestation engagement (SSAE) reports to help with third-party transparency and risk management for \nGAI systems. \nA.1.4. Pre-Deployment Testing \nOverview \nThe diverse ways and contexts in which GAI systems may be developed, used, and repurposed \ncomplicates risk mapping and pre-deployment measurement efforts. Robust test, evaluation, validation, \nand verification (TEVV) processes can be iteratively applied – and documented – in early stages of the AI \nlifecycle and informed by representative AI Actors (see Figure 3 of the AI RMF). Until new and rigorous',
'8 \nTrustworthy AI Characteristics: Accountable and Transparent, Privacy Enhanced, Safe, Secure and \nResilient \n2.5. Environmental Impacts \nTraining, maintaining, and operating (running inference on) GAI systems are resource-intensive activities, \nwith potentially large energy and environmental footprints. Energy and carbon emissions vary based on \nwhat is being done with the GAI model (i.e., pre-training, fine-tuning, inference), the modality of the \ncontent, hardware used, and type of task or application. \nCurrent estimates suggest that training a single transformer LLM can emit as much carbon as 300 round-\ntrip flights between San Francisco and New York. In a study comparing energy consumption and carbon \nemissions for LLM inference, generative tasks (e.g., text summarization) were found to be more energy- \nand carbon-intensive than discriminative or non-generative tasks (e.g., text classification). \nMethods for creating smaller versions of trained models, such as model distillation or compression, \ncould reduce environmental impacts at inference time, but training and tuning such models may still \ncontribute to their environmental impacts. Currently there is no agreed upon method to estimate \nenvironmental impacts from GAI. \nTrustworthy AI Characteristics: Accountable and Transparent, Safe \n2.6. Harmful Bias and Homogenization \nBias exists in many forms and can become ingrained in automated systems. AI systems, including GAI \nsystems, can increase the speed and scale at which harmful biases manifest and are acted upon, \npotentially perpetuating and amplifying harms to individuals, groups, communities, organizations, and \nsociety. For example, when prompted to generate images of CEOs, doctors, lawyers, and judges, current \ntext-to-image models underrepresent women and/or racial minorities, and people with disabilities. \nImage generator models have also produced biased or stereotyped output for various demographic \ngroups and have difficulty producing non-stereotyped content even when the prompt specifically \nrequests image features that are inconsistent with the stereotypes. Harmful bias in GAI models, which \nmay stem from their training data, can also cause representational harms or perpetuate or exacerbate \nbias based on race, gender, disability, or other protected classes. \nHarmful bias in GAI systems can also lead to harms via disparities between how a model performs for \ndifferent subgroups or languages (e.g., an LLM may perform less well for non-English languages or \ncertain dialects). Such disparities can contribute to discriminatory decision-making or amplification of \nexisting societal biases. In addition, GAI systems may be inappropriately trusted to perform similarly \nacross all subgroups, which could leave the groups facing underperformance with worse outcomes than \nif no GAI system were used. Disparate or reduced performance for lower-resource languages also \npresents challenges to model adoption, inclusion, and accessibility, and may make preservation of \nendangered languages more difficult if GAI systems become embedded in everyday processes that would \notherwise have been opportunities to use these languages. \nBias is mutually reinforcing with the problem of undesired homogenization, in which GAI systems \nproduce skewed distributions of outputs that are overly uniform (for example, repetitive aesthetic styles',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
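Since the final Normalize() module produces unit-length vectors, the model drops straight into semantic search. A small sketch using the library's util.semantic_search helper; the query and corpus strings here are illustrative, not taken from the training data:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("danicafisher/dfisher-base-sentence-transformer")

corpus = [
    "Confabulation refers to GAI systems confidently presenting erroneous or false content.",
    "Field testing simulates the conditions under which a GAI system will be deployed.",
]
query = "What is confabulation in generative AI?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus passages by cosine similarity to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
print(hits[0])  # [{'corpus_id': ..., 'score': ...}, ...]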
Training Details
Training Dataset
Unnamed Dataset
- Size: 128 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 128 samples:
  - sentence_0: type string; min: 17 tokens, mean: 23.14 tokens, max: 38 tokens
  - sentence_1: type string; min: 56 tokens, mean: 247.42 tokens, max: 256 tokens
- Samples:
sentence_0: What measures are suggested to assess the environmental impact of AI model training and management activities?
sentence_1:
37
MS-2.11-005
Assess the proportion of synthetic to non-synthetic training data and verify
training data is not overly homogenous or GAI-produced to mitigate concerns of
model collapse.
Harmful Bias and Homogenization
AI Actor Tasks: AI Deployment, AI Impact Assessment, Affected Individuals and Communities, Domain Experts, End-Users,
Operation and Monitoring, TEVV
MEASURE 2.12: Environmental impact and sustainability of AI model training and management activities – as identified in the MAP
function – are assessed and documented.
Action ID
Suggested Action
GAI Risks
MS-2.12-001 Assess safety to physical environments when deploying GAI systems.
Dangerous, Violent, or Hateful
Content
MS-2.12-002 Document anticipated environmental impacts of model development,
maintenance, and deployment in product design decisions.
Environmental
MS-2.12-003
Measure or estimate environmental impacts (e.g., energy and water
consumption) for training, fine tuning, and deploying models: Verify tradeoffs
between resources used at inference time versus additional resources required
at training time.
Environmental
MS-2.12-004 Verify effectiveness of carbon capture or offset programs for GAI training and
applications, and address green-washing concerns.
Environmental
AI Actor Tasks: AI Deployment, AI Impact Assessment, Domain Experts, Operation and Monitoring, TEVVWhat are some limitations of current pre-deployment testing approaches for GAI applications?
49
early lifecycle TEVV approaches are developed and matured for GAI, organizations may use
recommended “pre-deployment testing” practices to measure performance, capabilities, limits, risks,
and impacts. This section describes risk measurement and estimation as part of pre-deployment TEVV,
and examines the state of play for pre-deployment testing methodologies.
Limitations of Current Pre-deployment Test Approaches
Currently available pre-deployment TEVV processes used for GAI applications may be inadequate, non-
systematically applied, or fail to reflect or mismatched to deployment contexts. For example, the
anecdotal testing of GAI system capabilities through video games or standardized tests designed for
humans (e.g., intelligence tests, professional licensing exams) does not guarantee GAI system validity or
reliability in those domains. Similarly, jailbreaking or prompt engineering tests may not systematically
assess validity or reliability risks.
Measurement gaps can arise from mismatches between laboratory and real-world settings. Current
testing approaches often remain focused on laboratory conditions or restricted to benchmark test
datasets and in silico techniques that may not extrapolate well to—or directly assess GAI impacts in real-
world conditions. For example, current measurement gaps for GAI make it difficult to precisely estimate
its potential ecosystem-level or longitudinal risks and related political, social, and economic impacts.
Gaps between benchmarks and real-world use of GAI systems may likely be exacerbated due to prompt
sensitivity and broad heterogeneity of contexts of use.
A.1.5. Structured Public Feedback
Structured public feedback can be used to evaluate whether GAI systems are performing as intended
and to calibrate and verify traditional measurement methods. Examples of structured feedback include,
but are not limited to:
•
Participatory Engagement Methods: Methods used to solicit feedback from civil society groups,
affected communities, and users, including focus groups, small user studies, and surveys.
•
Field Testing: Methods used to determine how people interact with, consume, use, and make
sense of AI-generated information, and subsequent actions and effects, including UX, usability,
and other structured, randomized experiments.
•
AI Red-teaming: A structured testing exercise used to probe an AI system to find flaws and
vulnerabilities such as inaccurate, harmful, or discriminatory outputs, often in a controlled
environment and in collaboration with system developers.
Information gathered from structured public feedback can inform design, implementation, deployment
approval, maintenance, or decommissioning decisions. Results and insights gleaned from these exercises
can serve multiple purposes, including improving data quality and preprocessing, bolstering governance
decision making, and enhancing system documentation and debugging practices. When implementing
feedback activities, organizations should follow human subjects research requirements and best
practices such as informed consent and subject compensation.
sentence_0: How can organizations adjust their governance regimes to effectively manage the unique risks associated with generative AI?
sentence_1:
47
Appendix A. Primary GAI Considerations
The following primary considerations were derived as overarching themes from the GAI PWG
consultation process. These considerations (Governance, Pre-Deployment Testing, Content Provenance,
and Incident Disclosure) are relevant for voluntary use by any organization designing, developing, and
using GAI and also inform the Actions to Manage GAI risks. Information included about the primary
considerations is not exhaustive, but highlights the most relevant topics derived from the GAI PWG.
Acknowledgments: These considerations could not have been surfaced without the helpful analysis and
contributions from the community and NIST staff GAI PWG leads: George Awad, Luca Belli, Harold Booth,
Mat Heyman, Yooyoung Lee, Mark Pryzbocki, Reva Schwartz, Martin Stanley, and Kyra Yee.
A.1. Governance
A.1.1. Overview
Like any other technology system, governance principles and techniques can be used to manage risks
related to generative AI models, capabilities, and applications. Organizations may choose to apply their
existing risk tiering to GAI systems, or they may opt to revise or update AI system risk levels to address
these unique GAI risks. This section describes how organizational governance regimes may be re-
evaluated and adjusted for GAI contexts. It also addresses third-party considerations for governing across
the AI value chain.
A.1.2. Organizational Governance
GAI opportunities, risks and long-term performance characteristics are typically less well-understood
than non-generative AI tools and may be perceived and acted upon by humans in ways that vary greatly.
Accordingly, GAI may call for different levels of oversight from AI Actors or different human-AI
configurations in order to manage their risks effectively. Organizations’ use of GAI systems may also
warrant additional human review, tracking and documentation, and greater management oversight.
AI technology can produce varied outputs in multiple modalities and present many classes of user
interfaces. This leads to a broader set of AI Actors interacting with GAI systems for widely differing
applications and contexts of use. These can include data labeling and preparation, development of GAI
models, content moderation, code generation and review, text generation and editing, image and video
generation, summarization, search, and chat. These activities can take place within organizational
settings or in the public domain.
Organizations can restrict AI applications that cause harm, exceed stated risk tolerances, or that conflict
with their tolerances or values. Governance tools and protocols that are applied to other types of AI
systems can be applied to GAI systems. These plans and actions include:
• Accessibility and reasonable
accommodations
• AI actor credentials and qualifications
• Alignment to organizational values
• Auditing and assessment
• Change-management controls
• Commercial use
• Data provenance
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
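For reference, a minimal sketch of how this loss is typically set up in Sentence Transformers; cos_sim is the default similarity function, and scale=20.0 matches the parameters above:

from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
# In-batch negatives ranking loss; scale and similarity match the listed parameters
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)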
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 20
- per_device_eval_batch_size: 20
- num_train_epochs: 10
- multi_dataset_batch_sampler: round_robin
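Putting the base model, loss, and non-default hyperparameters together, the training run plausibly looked like the sketch below. The dataset construction is illustrative, a stand-in for the 128 question-passage pairs; the column names sentence_0 and sentence_1 match the dataset description above:

from datasets import Dataset
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments, losses)
from sentence_transformers.training_args import MultiDatasetBatchSamplers

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Illustrative stand-in for the 128 (question, passage) training pairs
train_dataset = Dataset.from_dict({
    "sentence_0": ["What is confabulation in GAI systems?"],
    "sentence_1": ["Confabulation refers to GAI systems generating and confidently "
                   "presenting erroneous or false content in response to prompts."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="dfisher-base-sentence-transformer",
    per_device_train_batch_size=20,
    per_device_eval_batch_size=20,
    num_train_epochs=10,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=losses.MultipleNegativesRankingLoss(model),
)
trainer.train()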
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 20
- per_device_eval_batch_size: 20
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 10
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.1.1
- Transformers: 4.44.2
- PyTorch: 2.4.1+cu121
- Accelerate: 0.34.2
- Datasets: 3.0.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}