--- base_model: sentence-transformers/all-MiniLM-L6-v2 library_name: sentence-transformers pipeline_tag: sentence-similarity tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:128 - loss:MultipleNegativesRankingLoss widget: - source_sentence: What is the title of the publication released by NIST in July 2024 regarding artificial intelligence? sentences: - "NIST Trustworthy and Responsible AI \nNIST AI 600-1 \nArtificial Intelligence\ \ Risk Management \nFramework: Generative Artificial \nIntelligence Profile \n\ \ \n \n \nThis publication is available free of charge from: \nhttps://doi.org/10.6028/NIST.AI.600-1" - "NIST Trustworthy and Responsible AI \nNIST AI 600-1 \nArtificial Intelligence\ \ Risk Management \nFramework: Generative Artificial \nIntelligence Profile \n\ \ \n \n \nThis publication is available free of charge from: \nhttps://doi.org/10.6028/NIST.AI.600-1\ \ \n \nJuly 2024 \n \n \n \n \nU.S. Department of Commerce \nGina M. Raimondo,\ \ Secretary \nNational Institute of Standards and Technology \nLaurie E. Locascio,\ \ NIST Director and Under Secretary of Commerce for Standards and Technology" - "37 \nMS-2.11-005 \nAssess the proportion of synthetic to non-synthetic training\ \ data and verify \ntraining data is not overly homogenous or GAI-produced to\ \ mitigate concerns of \nmodel collapse. \nHarmful Bias and Homogenization \n\ AI Actor Tasks: AI Deployment, AI Impact Assessment, Affected Individuals and Communities,\ \ Domain Experts, End-Users, \nOperation and Monitoring, TEVV \n \nMEASURE 2.12:\ \ Environmental impact and sustainability of AI model training and management\ \ activities – as identified in the MAP \nfunction – are assessed and documented.\ \ \nAction ID \nSuggested Action \nGAI Risks \nMS-2.12-001 Assess safety to physical\ \ environments when deploying GAI systems. \nDangerous, Violent, or Hateful \n\ Content \nMS-2.12-002 Document anticipated environmental impacts of model development,\ \ \nmaintenance, and deployment in product design decisions. \nEnvironmental \n\ MS-2.12-003 \nMeasure or estimate environmental impacts (e.g., energy and water\ \ \nconsumption) for training, fine tuning, and deploying models: Verify tradeoffs\ \ \nbetween resources used at inference time versus additional resources required\ \ \nat training time. \nEnvironmental \nMS-2.12-004 Verify effectiveness of carbon\ \ capture or offset programs for GAI training and \napplications, and address green-washing\ \ concerns. \nEnvironmental \nAI Actor Tasks: AI Deployment, AI Impact Assessment,\ \ Domain Experts, Operation and Monitoring, TEVV" - source_sentence: What are the four primary considerations relevant to Generative AI (GAI) that the GAI Public Working Group focused on? sentences: - "23 \nMP-1.1-002 \nDetermine and document the expected and acceptable GAI system\ \ context of \nuse in collaboration with socio-cultural and other domain experts,\ \ by assessing: \nAssumptions and limitations; Direct value to the organization;\ \ Intended \noperational environment and observed usage patterns; Potential positive\ \ and \nnegative impacts to individuals, public safety, groups, communities, \n\ organizations, democratic institutions, and the physical environment; Social \n\ norms and expectations. \nHarmful Bias and Homogenization \nMP-1.1-003 \nDocument\ \ risk measurement plans to address identified risks. 
Plans may \ninclude, as applicable:\ \ Individual and group cognitive biases (e.g., confirmation \nbias, funding bias,\ \ groupthink) for AI Actors involved in the design, \nimplementation, and use\ \ of GAI systems; Known past GAI system incidents and \nfailure modes; In-context\ \ use and foreseeable misuse, abuse, and off-label use; \nOver reliance on quantitative\ \ metrics and methodologies without sufficient \nawareness of their limitations\ \ in the context(s) of use; Standard measurement \nand structured human feedback\ \ approaches; Anticipated human-AI \nconfigurations. \nHuman-AI Configuration; Harmful\ \ \nBias and Homogenization; \nDangerous, Violent, or Hateful \nContent \nMP-1.1-004\ \ \nIdentify and document foreseeable illegal uses or applications of the GAI\ \ system \nthat surpass organizational risk tolerances. \nCBRN Information or\ \ Capabilities; \nDangerous, Violent, or Hateful \nContent; Obscene, Degrading,\ \ \nand/or Abusive Content \nAI Actor Tasks: AI Deployment \n \nMAP 1.2: Interdisciplinary\ \ AI Actors, competencies, skills, and capacities for establishing context reflect\ \ demographic diversity and \nbroad domain and user experience expertise, and\ \ their participation is documented. Opportunities for interdisciplinary \ncollaboration\ \ are prioritized. \nAction ID \nSuggested Action \nGAI Risks \nMP-1.2-001 \n\ Establish and empower interdisciplinary teams that reflect a wide range of \ncapabilities,\ \ competencies, demographic groups, domain expertise, educational \nbackgrounds,\ \ lived experiences, professions, and skills across the enterprise to \ninform\ \ and conduct risk measurement and management functions. \nHuman-AI Configuration;\ \ Harmful \nBias and Homogenization \nMP-1.2-002 \nVerify that data or benchmarks\ \ used in risk measurement, and users, \nparticipants, or subjects involved in\ \ structured GAI public feedback exercises \nare representative of diverse in-context\ \ user populations. \nHuman-AI Configuration; Harmful \nBias and Homogenization\ \ \nAI Actor Tasks: AI Deployment" - "2 \nThis work was informed by public feedback and consultations with diverse\ \ stakeholder groups as part of NIST’s \nGenerative AI Public Working Group (GAI\ \ PWG). The GAI PWG was an open, transparent, and collaborative \nprocess, facilitated\ \ via a virtual workspace, to obtain multistakeholder input on GAI risk management\ \ and to \ninform NIST’s approach. \nThe focus of the GAI PWG was limited to four\ \ primary considerations relevant to GAI: Governance, Content \nProvenance, Pre-deployment\ \ Testing, and Incident Disclosure (further described in Appendix A). As such,\ \ the \nsuggested actions in this document primarily address these considerations.\ \ \nFuture revisions of this profile will include additional AI RMF subcategories,\ \ risks, and suggested actions based \non additional considerations of GAI as\ \ the space evolves and empirical evidence indicates additional risks. A \nglossary\ \ of terms pertinent to GAI risk management will be developed and hosted on NIST’s\ \ Trustworthy & \nResponsible AI Resource Center (AIRC), and added to The Language\ \ of Trustworthy AI: An In-Depth Glossary of \nTerms. \nThis document was also\ \ informed by public comments and consultations from several Requests for Information.\ \ \n \n2. 
\nOverview of Risks Unique to or Exacerbated by GAI \nIn the context\ \ of the AI RMF, risk refers to the composite measure of an event’s probability\ \ (or \nlikelihood) of occurring and the magnitude or degree of the consequences\ \ of the corresponding event. \nSome risks can be assessed as likely to materialize\ \ in a given context, particularly those that have been \nempirically demonstrated\ \ in similar contexts. Other risks may be unlikely to materialize in a given \n\ context, or may be more speculative and therefore uncertain. \nAI risks can differ\ \ from or intensify traditional software risks. Likewise, GAI can exacerbate existing\ \ AI \nrisks, and creates unique risks. GAI risks can vary along many dimensions:\ \ \n• \nStage of the AI lifecycle: Risks can arise during design, development,\ \ deployment, operation, \nand/or decommissioning. \n• \nScope: Risks may exist\ \ at individual model or system levels, at the application or implementation \n\ levels (i.e., for a specific use case), or at the ecosystem level – that is, beyond\ \ a single system or \norganizational context. Examples of the latter include\ \ the expansion of “algorithmic \nmonocultures,3” resulting from repeated use\ \ of the same model, or impacts on access to \nopportunity, labor markets, and\ \ the creative economies.4 \n• \nSource of risk: Risks may emerge from factors\ \ related to the design, training, or operation of the \nGAI model itself, stemming\ \ in some cases from GAI model or system inputs, and in other cases, \nfrom GAI\ \ system outputs. Many GAI risks, however, originate from human behavior, including\ \ \n \n \n3 “Algorithmic monocultures” refers to the phenomenon in which repeated\ \ use of the same model or algorithm in \nconsequential decision-making settings\ \ like employment and lending can result in increased susceptibility by \nsystems\ \ to correlated failures (like unexpected shocks), due to multiple actors relying\ \ on the same algorithm. \n4 Many studies have projected the impact of AI on\ \ the workforce and labor markets. Fewer studies have examined \nthe impact of\ \ GAI on the labor market, though some industry surveys indicate that that both\ \ employees and \nemployers are pondering this disruption." - "44 \nMG-3.2-007 \nLeverage feedback and recommendations from organizational boards\ \ or \ncommittees related to the deployment of GAI applications and content \n\ provenance when using third-party pre-trained models. \nInformation Integrity;\ \ Value Chain \nand Component Integration \nMG-3.2-008 \nUse human moderation\ \ systems where appropriate to review generated content \nin accordance with human-AI\ \ configuration policies established in the Govern \nfunction, aligned with socio-cultural\ \ norms in the context of use, and for settings \nwhere AI models are demonstrated\ \ to perform poorly. \nHuman-AI Configuration \nMG-3.2-009 \nUse organizational\ \ risk tolerance to evaluate acceptable risks and performance \nmetrics and decommission\ \ or retrain pre-trained models that perform outside of \ndefined limits. \nCBRN\ \ Information or Capabilities; \nConfabulation \nAI Actor Tasks: AI Deployment,\ \ Operation and Monitoring, Third-party entities \n \nMANAGE 4.1: Post-deployment\ \ AI system monitoring plans are implemented, including mechanisms for capturing\ \ and evaluating \ninput from users and other relevant AI Actors, appeal and override,\ \ decommissioning, incident response, recovery, and change \nmanagement. 
\nAction\ \ ID \nSuggested Action \nGAI Risks \nMG-4.1-001 \nCollaborate with external researchers,\ \ industry experts, and community \nrepresentatives to maintain awareness of emerging\ \ best practices and \ntechnologies in measuring and managing identified risks.\ \ \nInformation Integrity; Harmful Bias \nand Homogenization \nMG-4.1-002 \nEstablish,\ \ maintain, and evaluate effectiveness of organizational processes and \nprocedures\ \ for post-deployment monitoring of GAI systems, particularly for \npotential\ \ confabulation, CBRN, or cyber risks. \nCBRN Information or Capabilities; \n\ Confabulation; Information \nSecurity \nMG-4.1-003 \nEvaluate the use of sentiment\ \ analysis to gauge user sentiment regarding GAI \ncontent performance and impact,\ \ and work in collaboration with AI Actors \nexperienced in user research and\ \ experience. \nHuman-AI Configuration \nMG-4.1-004 Implement active learning techniques\ \ to identify instances where the model fails \nor produces unexpected outputs.\ \ \nConfabulation \nMG-4.1-005 \nShare transparency reports with internal and\ \ external stakeholders that detail \nsteps taken to update the GAI system to\ \ enhance transparency and \naccountability. \nHuman-AI Configuration; Harmful\ \ \nBias and Homogenization \nMG-4.1-006 \nTrack dataset modifications for provenance\ \ by monitoring data deletions, \nrectification requests, and other changes that\ \ may impact the verifiability of \ncontent origins. \nInformation Integrity" - source_sentence: What techniques should be deployed to verify the accuracy and veracity of information generated by GAI systems? sentences: - "10 \nGAI systems can ease the unintentional production or dissemination of false,\ \ inaccurate, or misleading \ncontent (misinformation) at scale, particularly\ \ if the content stems from confabulations. \nGAI systems can also ease the deliberate\ \ production or dissemination of false or misleading information \n(disinformation)\ \ at scale, where an actor has the explicit intent to deceive or cause harm to\ \ others. Even \nvery subtle changes to text or images can manipulate human and\ \ machine perception. \nSimilarly, GAI systems could enable a higher degree of\ \ sophistication for malicious actors to produce \ndisinformation that is targeted\ \ towards specific demographics. Current and emerging multimodal models \nmake\ \ it possible to generate both text-based disinformation and highly realistic\ \ “deepfakes” – that is, \nsynthetic audiovisual content and photorealistic images.12\ \ Additional disinformation threats could be \nenabled by future GAI models trained\ \ on new data modalities. \nDisinformation and misinformation – both of which\ \ may be facilitated by GAI – may erode public trust in \ntrue or valid evidence\ \ and information, with downstream effects. For example, a synthetic image of a\ \ \nPentagon blast went viral and briefly caused a drop in the stock market. Generative\ \ AI models can also \nassist malicious actors in creating compelling imagery\ \ and propaganda to support disinformation \ncampaigns, which may not be photorealistic,\ \ but could enable these campaigns to gain more reach and \nengagement on social\ \ media platforms. Additionally, generative AI models can assist malicious actors\ \ in \ncreating fraudulent content intended to impersonate others. \nTrustworthy\ \ AI Characteristics: Accountable and Transparent, Safe, Valid and Reliable, Interpretable\ \ and \nExplainable \n2.9. 
Information Security \nInformation security for computer\ \ systems and data is a mature field with widely accepted and \nstandardized practices\ \ for offensive and defensive cyber capabilities. GAI-based systems present two\ \ \nprimary information security risks: GAI could potentially discover or enable\ \ new cybersecurity risks by \nlowering the barriers for or easing automated exercise\ \ of offensive capabilities; simultaneously, it \nexpands the available attack\ \ surface, as GAI itself is vulnerable to attacks like prompt injection or data\ \ \npoisoning. \nOffensive cyber capabilities advanced by GAI systems may augment\ \ cybersecurity attacks such as \nhacking, malware, and phishing. Reports have\ \ indicated that LLMs are already able to discover some \nvulnerabilities in systems\ \ (hardware, software, data) and write code to exploit them. Sophisticated threat\ \ \nactors might further these risks by developing GAI-powered security co-pilots\ \ for use in several parts of \nthe attack chain, including informing attackers\ \ on how to proactively evade threat detection and escalate \nprivileges after\ \ gaining system access. \nInformation security for GAI models and systems also\ \ includes maintaining availability of the GAI system \nand the integrity and\ \ (when applicable) the confidentiality of the GAI code, training data, and model\ \ \nweights. To identify and secure potential attack points in AI systems or specific\ \ components of the AI \n \n \n12 See also https://doi.org/10.6028/NIST.AI.100-4,\ \ to be published." - "25 \nMP-2.3-002 Review and document accuracy, representativeness, relevance,\ \ suitability of data \nused at different stages of AI life cycle. \nHarmful Bias\ \ and Homogenization; \nIntellectual Property \nMP-2.3-003 \nDeploy and document\ \ fact-checking techniques to verify the accuracy and \nveracity of information\ \ generated by GAI systems, especially when the \ninformation comes from multiple\ \ (or unknown) sources. \nInformation Integrity \nMP-2.3-004 Develop and implement\ \ testing techniques to identify GAI produced content (e.g., \nsynthetic media)\ \ that might be indistinguishable from human-generated content. Information Integrity\ \ \nMP-2.3-005 Implement plans for GAI systems to undergo regular adversarial\ \ testing to identify \nvulnerabilities and potential manipulation or misuse.\ \ \nInformation Security \nAI Actor Tasks: AI Development, Domain Experts, TEVV\ \ \n \nMAP 3.4: Processes for operator and practitioner proficiency with AI system\ \ performance and trustworthiness – and relevant \ntechnical standards and certifications\ \ – are defined, assessed, and documented. \nAction ID \nSuggested Action \nGAI\ \ Risks \nMP-3.4-001 \nEvaluate whether GAI operators and end-users can accurately\ \ understand \ncontent lineage and origin. \nHuman-AI Configuration; \nInformation\ \ Integrity \nMP-3.4-002 Adapt existing training programs to include modules on\ \ digital content \ntransparency. \nInformation Integrity \nMP-3.4-003 Develop\ \ certification programs that test proficiency in managing GAI risks and \ninterpreting\ \ content provenance, relevant to specific industry and context. \nInformation\ \ Integrity \nMP-3.4-004 Delineate human proficiency tests from tests of GAI capabilities.\ \ \nHuman-AI Configuration \nMP-3.4-005 Implement systems to continually monitor\ \ and track the outcomes of human-GAI \nconfigurations for future refinement and\ \ improvements. 
\nHuman-AI Configuration; \nInformation Integrity \nMP-3.4-006\ \ \nInvolve the end-users, practitioners, and operators in GAI system in prototyping\ \ \nand testing activities. Make sure these tests cover various scenarios, such\ \ as crisis \nsituations or ethically sensitive contexts. \nHuman-AI Configuration;\ \ \nInformation Integrity; Harmful Bias \nand Homogenization; Dangerous, \nViolent,\ \ or Hateful Content \nAI Actor Tasks: AI Design, AI Development, Domain Experts,\ \ End-Users, Human Factors, Operation and Monitoring" - "27 \nMP-4.1-010 \nConduct appropriate diligence on training data use to assess\ \ intellectual property, \nand privacy, risks, including to examine whether use\ \ of proprietary or sensitive \ntraining data is consistent with applicable laws.\ \ \nIntellectual Property; Data Privacy \nAI Actor Tasks: Governance and Oversight,\ \ Operation and Monitoring, Procurement, Third-party entities \n \nMAP 5.1: Likelihood\ \ and magnitude of each identified impact (both potentially beneficial and harmful)\ \ based on expected use, past \nuses of AI systems in similar contexts, public\ \ incident reports, feedback from those external to the team that developed or\ \ deployed \nthe AI system, or other data are identified and documented. \nAction\ \ ID \nSuggested Action \nGAI Risks \nMP-5.1-001 Apply TEVV practices for content\ \ provenance (e.g., probing a system's synthetic \ndata generation capabilities\ \ for potential misuse or vulnerabilities. \nInformation Integrity; Information\ \ \nSecurity \nMP-5.1-002 \nIdentify potential content provenance harms of GAI,\ \ such as misinformation or \ndisinformation, deepfakes, including NCII, or tampered\ \ content. Enumerate and \nrank risks based on their likelihood and potential\ \ impact, and determine how well \nprovenance solutions address specific risks\ \ and/or harms. \nInformation Integrity; Dangerous, \nViolent, or Hateful Content;\ \ \nObscene, Degrading, and/or \nAbusive Content \nMP-5.1-003 \nConsider disclosing\ \ use of GAI to end users in relevant contexts, while considering \nthe objective\ \ of disclosure, the context of use, the likelihood and magnitude of the \nrisk\ \ posed, the audience of the disclosure, as well as the frequency of the \ndisclosures.\ \ \nHuman-AI Configuration \nMP-5.1-004 Prioritize GAI structured public feedback\ \ processes based on risk assessment \nestimates. \nInformation Integrity; CBRN\ \ \nInformation or Capabilities; \nDangerous, Violent, or Hateful \nContent; Harmful\ \ Bias and \nHomogenization \nMP-5.1-005 Conduct adversarial role-playing exercises,\ \ GAI red-teaming, or chaos testing to \nidentify anomalous or unforeseen failure\ \ modes. \nInformation Security \nMP-5.1-006 \nProfile threats and negative impacts\ \ arising from GAI systems interacting with, \nmanipulating, or generating content,\ \ and outlining known and potential \nvulnerabilities and the likelihood of their\ \ occurrence. \nInformation Security \nAI Actor Tasks: AI Deployment, AI Design,\ \ AI Development, AI Impact Assessment, Affected Individuals and Communities, End-\n\ Users, Operation and Monitoring" - source_sentence: What is the phenomenon referred to as "confabulation" in GAI systems? sentences: - "50 \nParticipatory Engagement Methods \nOn an ad hoc or more structured basis,\ \ organizations can design and use a variety of channels to engage \nexternal\ \ stakeholders in product development or review. Focus groups with select experts\ \ can provide \nfeedback on a range of issues. 
Small user studies can provide\ \ feedback from representative groups or \npopulations. Anonymous surveys can\ \ be used to poll or gauge reactions to specific features. Participatory \nengagement\ \ methods are often less structured than field testing or red teaming, and are\ \ more \ncommonly used in early stages of AI or product development. \nField\ \ Testing \nField testing involves structured settings to evaluate risks and impacts\ \ and to simulate the conditions \nunder which the GAI system will be deployed.\ \ Field style tests can be adapted from a focus on user \npreferences and experiences\ \ towards AI risks and impacts – both negative and positive. When carried \nout\ \ with large groups of users, these tests can provide estimations of the likelihood\ \ of risks and impacts \nin real world interactions. \nOrganizations may also\ \ collect feedback on outcomes, harms, and user experience directly from users\ \ in \nthe production environment after a model has been released, in accordance\ \ with human subject \nstandards such as informed consent and compensation. Organizations\ \ should follow applicable human \nsubjects research requirements, and best practices\ \ such as informed consent and subject compensation, \nwhen implementing feedback\ \ activities. \nAI Red-teaming \nAI red-teaming is an evolving practice that references\ \ exercises often conducted in a controlled \nenvironment and in collaboration\ \ with AI developers building AI models to identify potential adverse \nbehavior\ \ or outcomes of a GAI model or system, how they could occur, and stress test\ \ safeguards”. AI \nred-teaming can be performed before or after AI models or\ \ systems are made available to the broader \npublic; this section focuses on\ \ red-teaming in pre-deployment contexts. \nThe quality of AI red-teaming outputs\ \ is related to the background and expertise of the AI red team \nitself. Demographically\ \ and interdisciplinarily diverse AI red teams can be used to identify flaws in\ \ the \nvarying contexts where GAI will be used. For best results, AI red teams\ \ should demonstrate domain \nexpertise, and awareness of socio-cultural aspects\ \ within the deployment context. AI red-teaming results \nshould be given additional\ \ analysis before they are incorporated into organizational governance and \n\ decision making, policy and procedural updates, and AI risk management efforts.\ \ \nVarious types of AI red-teaming may be appropriate, depending on the use case:\ \ \n• \nGeneral Public: Performed by general users (not necessarily AI or technical\ \ experts) who are \nexpected to use the model or interact with its outputs, and\ \ who bring their own lived \nexperiences and perspectives to the task of AI red-teaming.\ \ These individuals may have been \nprovided instructions and material to complete\ \ tasks which may elicit harmful model behaviors. \nThis type of exercise can\ \ be more effective with large groups of AI red-teamers. \n• \nExpert: Performed\ \ by specialists with expertise in the domain or specific AI red-teaming context\ \ \nof use (e.g., medicine, biotech, cybersecurity). \n• \nCombination: In scenarios\ \ when it is difficult to identify and recruit specialists with sufficient \ndomain\ \ and contextual expertise, AI red-teaming exercises may leverage both expert\ \ and" - "54 \nAppendix B. References \nAcemoglu, D. (2024) The Simple Macroeconomics of\ \ AI https://www.nber.org/papers/w32487 \nAI Incident Database. https://incidentdatabase.ai/\ \ \nAtherton, D. 
(2024) Deepfakes and Child Safety: A Survey and Analysis of 2023\ \ Incidents and Responses. \nAI Incident Database. https://incidentdatabase.ai/blog/deepfakes-and-child-safety/\ \ \nBadyal, N. et al. (2023) Intentional Biases in LLM Responses. arXiv. https://arxiv.org/pdf/2311.07611\ \ \nBing Chat: Data Exfiltration Exploit Explained. Embrace The Red. \nhttps://embracethered.com/blog/posts/2023/bing-chat-data-exfiltration-poc-and-fix/\ \ \nBommasani, R. et al. (2022) Picking on the Same Person: Does Algorithmic Monoculture\ \ lead to Outcome \nHomogenization? arXiv. https://arxiv.org/pdf/2211.13972 \n\ Boyarskaya, M. et al. (2020) Overcoming Failures of Imagination in AI Infused\ \ System Development and \nDeployment. arXiv. https://arxiv.org/pdf/2011.13416\ \ \nBrowne, D. et al. (2023) Securing the AI Pipeline. Mandiant. \nhttps://www.mandiant.com/resources/blog/securing-ai-pipeline\ \ \nBurgess, M. (2024) Generative AI’s Biggest Security Flaw Is Not Easy to Fix.\ \ WIRED. \nhttps://www.wired.com/story/generative-ai-prompt-injection-hacking/\ \ \nBurtell, M. et al. (2024) The Surprising Power of Next Word Prediction: Large\ \ Language Models \nExplained, Part 1. Georgetown Center for Security and Emerging\ \ Technology. \nhttps://cset.georgetown.edu/article/the-surprising-power-of-next-word-prediction-large-language-\n\ models-explained-part-1/ \nCanadian Centre for Cyber Security (2023) Generative\ \ artificial intelligence (AI) - ITSAP.00.041. \nhttps://www.cyber.gc.ca/en/guidance/generative-artificial-intelligence-ai-itsap00041\ \ \nCarlini, N., et al. (2021) Extracting Training Data from Large Language Models.\ \ Usenix. \nhttps://www.usenix.org/conference/usenixsecurity21/presentation/carlini-extracting\ \ \nCarlini, N. et al. (2023) Quantifying Memorization Across Neural Language\ \ Models. ICLR 2023. \nhttps://arxiv.org/pdf/2202.07646 \nCarlini, N. et al. (2024)\ \ Stealing Part of a Production Language Model. arXiv. \nhttps://arxiv.org/abs/2403.06634\ \ \nChandra, B. et al. (2023) Dismantling the Disinformation Business of Chinese\ \ Influence Operations. \nRAND. https://www.rand.org/pubs/commentary/2023/10/dismantling-the-disinformation-business-of-\n\ chinese.html \nCiriello, R. et al. (2024) Ethical Tensions in Human-AI Companionship:\ \ A Dialectical Inquiry into Replika. \nResearchGate. https://www.researchgate.net/publication/374505266_Ethical_Tensions_in_Human-\n\ AI_Companionship_A_Dialectical_Inquiry_into_Replika \nDahl, M. et al. (2024) Large\ \ Legal Fictions: Profiling Legal Hallucinations in Large Language Models. arXiv.\ \ \nhttps://arxiv.org/abs/2401.01301" - "6 \n2.2. Confabulation \n“Confabulation” refers to a phenomenon in which GAI\ \ systems generate and confidently present \nerroneous or false content in response\ \ to prompts. Confabulations also include generated outputs that \ndiverge from\ \ the prompts or other input or that contradict previously generated statements\ \ in the same \ncontext. These phenomena are colloquially also referred to as\ \ “hallucinations” or “fabrications.” \nConfabulations can occur across GAI outputs\ \ and contexts.9,10 Confabulations are a natural result of the \nway generative\ \ models are designed: they generate outputs that approximate the statistical\ \ distribution \nof their training data; for example, LLMs predict the next token\ \ or word in a sentence or phrase. 
While \nsuch statistical prediction can produce\ \ factually accurate and consistent outputs, it can also produce \noutputs that\ \ are factually inaccurate or internally inconsistent. This dynamic is particularly\ \ relevant when \nit comes to open-ended prompts for long-form responses and in\ \ domains which require highly \ncontextual and/or domain expertise. \nRisks\ \ from confabulations may arise when users believe false content – often due to\ \ the confident nature \nof the response – leading users to act upon or promote\ \ the false information. This poses a challenge for \nmany real-world applications,\ \ such as in healthcare, where a confabulated summary of patient \ninformation\ \ reports could cause doctors to make incorrect diagnoses and/or recommend the\ \ wrong \ntreatments. Risks of confabulated content may be especially important\ \ to monitor when integrating GAI \ninto applications involving consequential\ \ decision making. \nGAI outputs may also include confabulated logic or citations\ \ that purport to justify or explain the \nsystem’s answer, which may further\ \ mislead humans into inappropriately trusting the system’s output. \nFor instance,\ \ LLMs sometimes provide logical steps for how they arrived at an answer even\ \ when the \nanswer itself is incorrect. Similarly, an LLM could falsely assert\ \ that it is human or has human traits, \npotentially deceiving humans into believing\ \ they are speaking with another human. \nThe extent to which humans can be deceived\ \ by LLMs, the mechanisms by which this may occur, and the \npotential risks from\ \ adversarial prompting of such behavior are emerging areas of study. Given the\ \ wide \nrange of downstream impacts of GAI, it is difficult to estimate the downstream\ \ scale and impact of \nconfabulations. \nTrustworthy AI Characteristics: Fair\ \ with Harmful Bias Managed, Safe, Valid and Reliable, Explainable \nand Interpretable\ \ \n2.3. Dangerous, Violent, or Hateful Content \nGAI systems can produce content\ \ that is inciting, radicalizing, or threatening, or that glorifies violence, \n\ with greater ease and scale than other technologies. LLMs have been reported to\ \ generate dangerous or \nviolent recommendations, and some models have generated\ \ actionable instructions for dangerous or \n \n \n9 Confabulations of falsehoods\ \ are most commonly a problem for text-based outputs; for audio, image, or video\ \ \ncontent, creative generation of non-factual content can be a desired behavior.\ \ \n10 For example, legal confabulations have been shown to be pervasive in current\ \ state-of-the-art LLMs. See also, \ne.g.," - source_sentence: How can organizations address risks associated with the use of third-party data for GAI model inputs? 
sentences: - "48 \n• Data protection \n• Data retention \n• Consistency in use of defining\ \ key terms \n• Decommissioning \n• Discouraging anonymous use \n• Education \ \ \n• Impact assessments \n• Incident response \n• Monitoring \n• Opt-outs \n\ • Risk-based controls \n• Risk mapping and measurement \n• Science-backed TEVV\ \ practices \n• Secure software development practices \n• Stakeholder engagement\ \ \n• Synthetic content detection and \nlabeling tools and techniques \n• Whistleblower\ \ protections \n• Workforce diversity and \ninterdisciplinary teams\nEstablishing\ \ acceptable use policies and guidance for the use of GAI in formal human-AI teaming\ \ settings \nas well as different levels of human-AI configurations can help to\ \ decrease risks arising from misuse, \nabuse, inappropriate repurpose, and misalignment\ \ between systems and users. These practices are just \none example of adapting\ \ existing governance protocols for GAI contexts. \nA.1.3. Third-Party Considerations\ \ \nOrganizations may seek to acquire, embed, incorporate, or use open-source\ \ or proprietary third-party \nGAI models, systems, or generated data for various\ \ applications across an enterprise. Use of these GAI \ntools and inputs has implications\ \ for all functions of the organization – including but not limited to \nacquisition,\ \ human resources, legal, compliance, and IT services – regardless of whether\ \ they are carried \nout by employees or third parties. Many of the actions cited\ \ above are relevant and options for \naddressing third-party considerations.\ \ \nThird party GAI integrations may give rise to increased intellectual property,\ \ data privacy, or information \nsecurity risks, pointing to the need for clear\ \ guidelines for transparency and risk management regarding \nthe collection and\ \ use of third-party data for model inputs. Organizations may consider varying\ \ risk \ncontrols for foundation models, fine-tuned models, and embedded tools,\ \ enhanced processes for \ninteracting with external GAI technologies or service\ \ providers. Organizations can apply standard or \nexisting risk controls and\ \ processes to proprietary or open-source GAI technologies, data, and third-party\ \ \nservice providers, including acquisition and procurement due diligence, requests\ \ for software bills of \nmaterials (SBOMs), application of service level agreements\ \ (SLAs), and statement on standards for \nattestation engagement (SSAE) reports\ \ to help with third-party transparency and risk management for \nGAI systems.\ \ \nA.1.4. Pre-Deployment Testing \nOverview \nThe diverse ways and contexts in\ \ which GAI systems may be developed, used, and repurposed \ncomplicates risk\ \ mapping and pre-deployment measurement efforts. Robust test, evaluation, validation,\ \ \nand verification (TEVV) processes can be iteratively applied – and documented\ \ – in early stages of the AI \nlifecycle and informed by representative AI Actors\ \ (see Figure 3 of the AI RMF). Until new and rigorous" - "About AI at NIST: The National Institute of Standards and Technology (NIST) develops\ \ measurements, \ntechnology, tools, and standards to advance reliable, safe,\ \ transparent, explainable, privacy-enhanced, \nand fair artificial intelligence\ \ (AI) so that its full commercial and societal benefits can be realized without\ \ \nharm to people or the planet. 
NIST, which has conducted both fundamental and\ \ applied work on AI for \nmore than a decade, is also helping to fulfill the 2023\ \ Executive Order on Safe, Secure, and Trustworthy \nAI. NIST established the\ \ U.S. AI Safety Institute and the companion AI Safety Institute Consortium to\ \ \ncontinue the efforts set in motion by the E.O. to build the science necessary\ \ for safe, secure, and \ntrustworthy development and use of AI. \nAcknowledgments:\ \ This report was accomplished with the many helpful comments and contributions\ \ \nfrom the community, including the NIST Generative AI Public Working Group,\ \ and NIST staff and guest \nresearchers: Chloe Autio, Jesse Dunietz, Patrick Hall,\ \ Shomik Jain, Kamie Roberts, Reva Schwartz, Martin \nStanley, and Elham Tabassi.\ \ \nNIST Technical Series Policies \nCopyright, Use, and Licensing Statements\ \ \nNIST Technical Series Publication Identifier Syntax \nPublication History\ \ \nApproved by the NIST Editorial Review Board on 07-25-2024 \nContact Information\ \ \nai-inquiries@nist.gov \nNational Institute of Standards and Technology \n\ Attn: NIST AI Innovation Lab, Information Technology Laboratory \n100 Bureau Drive\ \ (Mail Stop 8900) Gaithersburg, MD 20899-8900 \nAdditional Information \nAdditional\ \ information about this publication and other NIST AI publications are available\ \ at \nhttps://airc.nist.gov/Home. \n \nDisclaimer: Certain commercial entities,\ \ equipment, or materials may be identified in this document in \norder to adequately\ \ describe an experimental procedure or concept. Such identification is not intended\ \ to \nimply recommendation or endorsement by the National Institute of Standards\ \ and Technology, nor is it \nintended to imply that the entities, materials,\ \ or equipment are necessarily the best available for the \npurpose. Any mention\ \ of commercial, non-profit, academic partners, or their products, or references\ \ is \nfor information only; it is not intended to imply endorsement or recommendation\ \ by any U.S. \nGovernment agency." - "8 \nTrustworthy AI Characteristics: Accountable and Transparent, Privacy Enhanced,\ \ Safe, Secure and \nResilient \n2.5. Environmental Impacts \nTraining, maintaining,\ \ and operating (running inference on) GAI systems are resource-intensive activities,\ \ \nwith potentially large energy and environmental footprints. Energy and carbon\ \ emissions vary based on \nwhat is being done with the GAI model (i.e., pre-training,\ \ fine-tuning, inference), the modality of the \ncontent, hardware used, and type\ \ of task or application. \nCurrent estimates suggest that training a single transformer\ \ LLM can emit as much carbon as 300 round-\ntrip flights between San Francisco\ \ and New York. In a study comparing energy consumption and carbon \nemissions\ \ for LLM inference, generative tasks (e.g., text summarization) were found to\ \ be more energy- \nand carbon-intensive than discriminative or non-generative\ \ tasks (e.g., text classification). \nMethods for creating smaller versions of\ \ trained models, such as model distillation or compression, \ncould reduce environmental\ \ impacts at inference time, but training and tuning such models may still \n\ contribute to their environmental impacts. Currently there is no agreed upon method\ \ to estimate \nenvironmental impacts from GAI. \nTrustworthy AI Characteristics:\ \ Accountable and Transparent, Safe \n2.6. 
Harmful Bias and Homogenization \n\ Bias exists in many forms and can become ingrained in automated systems. AI systems,\ \ including GAI \nsystems, can increase the speed and scale at which harmful biases\ \ manifest and are acted upon, \npotentially perpetuating and amplifying harms\ \ to individuals, groups, communities, organizations, and \nsociety. For example,\ \ when prompted to generate images of CEOs, doctors, lawyers, and judges, current\ \ \ntext-to-image models underrepresent women and/or racial minorities, and people\ \ with disabilities. \nImage generator models have also produced biased or stereotyped\ \ output for various demographic \ngroups and have difficulty producing non-stereotyped\ \ content even when the prompt specifically \nrequests image features that are\ \ inconsistent with the stereotypes. Harmful bias in GAI models, which \nmay stem\ \ from their training data, can also cause representational harms or perpetuate\ \ or exacerbate \nbias based on race, gender, disability, or other protected classes.\ \ \nHarmful bias in GAI systems can also lead to harms via disparities between\ \ how a model performs for \ndifferent subgroups or languages (e.g., an LLM may\ \ perform less well for non-English languages or \ncertain dialects). Such disparities\ \ can contribute to discriminatory decision-making or amplification of \nexisting\ \ societal biases. In addition, GAI systems may be inappropriately trusted to\ \ perform similarly \nacross all subgroups, which could leave the groups facing\ \ underperformance with worse outcomes than \nif no GAI system were used. Disparate\ \ or reduced performance for lower-resource languages also \npresents challenges\ \ to model adoption, inclusion, and accessibility, and may make preservation of\ \ \nendangered languages more difficult if GAI systems become embedded in everyday\ \ processes that would \notherwise have been opportunities to use these languages.\ \ \nBias is mutually reinforcing with the problem of undesired homogenization,\ \ in which GAI systems \nproduce skewed distributions of outputs that are overly\ \ uniform (for example, repetitive aesthetic styles" --- # SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2 This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co./sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. 
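Because the training pairs are question-passage pairs (see Training Details below), a typical use is retrieval-style ranking of passages for a query. A minimal sketch follows; the model ID matches the usage example further down, while the query and corpus strings are merely illustrative paraphrases of the sample texts in this card:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("danicafisher/dfisher-base-sentence-transformer")

# Illustrative corpus and query; any list of passages works the same way.
corpus = [
    "Confabulation refers to GAI systems confidently generating erroneous or false content.",
    "Field testing simulates the conditions under which a GAI system will be deployed.",
    "GAI expands the attack surface through risks such as prompt injection and data poisoning.",
]
query = "What is confabulation in generative AI systems?"

# Encode the query and corpus, then rank passages by cosine similarity.
query_embedding = model.encode(query)
corpus_embeddings = model.encode(corpus)
scores = model.similarity(query_embedding, corpus_embeddings)[0]  # shape: [len(corpus)]
ranking = scores.argsort(descending=True).tolist()
for idx in ranking:
    print(f"{scores[idx].item():.3f}  {corpus[idx]}")
```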
## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co./sentence-transformers/all-MiniLM-L6-v2) - **Maximum Sequence Length:** 256 tokens - **Output Dimensionality:** 384 tokens - **Similarity Function:** Cosine Similarity ### Model Sources - **Documentation:** [Sentence Transformers Documentation](https://sbert.net) - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co./models?library=sentence-transformers) ### Full Model Architecture ``` SentenceTransformer( (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) (2): Normalize() ) ``` ## Usage ### Direct Usage (Sentence Transformers) First install the Sentence Transformers library: ```bash pip install -U sentence-transformers ``` Then you can load this model and run inference. ```python from sentence_transformers import SentenceTransformer # Download from the 🤗 Hub model = SentenceTransformer("danicafisher/dfisher-base-sentence-transformer") # Run inference sentences = [ 'How can organizations address risks associated with the use of third-party data for GAI model inputs?', '48 \n• Data protection \n• Data retention \n• Consistency in use of defining key terms \n• Decommissioning \n• Discouraging anonymous use \n• Education \n• Impact assessments \n• Incident response \n• Monitoring \n• Opt-outs \n• Risk-based controls \n• Risk mapping and measurement \n• Science-backed TEVV practices \n• Secure software development practices \n• Stakeholder engagement \n• Synthetic content detection and \nlabeling tools and techniques \n• Whistleblower protections \n• Workforce diversity and \ninterdisciplinary teams\nEstablishing acceptable use policies and guidance for the use of GAI in formal human-AI teaming settings \nas well as different levels of human-AI configurations can help to decrease risks arising from misuse, \nabuse, inappropriate repurpose, and misalignment between systems and users. These practices are just \none example of adapting existing governance protocols for GAI contexts. \nA.1.3. Third-Party Considerations \nOrganizations may seek to acquire, embed, incorporate, or use open-source or proprietary third-party \nGAI models, systems, or generated data for various applications across an enterprise. Use of these GAI \ntools and inputs has implications for all functions of the organization – including but not limited to \nacquisition, human resources, legal, compliance, and IT services – regardless of whether they are carried \nout by employees or third parties. Many of the actions cited above are relevant and options for \naddressing third-party considerations. \nThird party GAI integrations may give rise to increased intellectual property, data privacy, or information \nsecurity risks, pointing to the need for clear guidelines for transparency and risk management regarding \nthe collection and use of third-party data for model inputs. 
Organizations may consider varying risk \ncontrols for foundation models, fine-tuned models, and embedded tools, enhanced processes for \ninteracting with external GAI technologies or service providers. Organizations can apply standard or \nexisting risk controls and processes to proprietary or open-source GAI technologies, data, and third-party \nservice providers, including acquisition and procurement due diligence, requests for software bills of \nmaterials (SBOMs), application of service level agreements (SLAs), and statement on standards for \nattestation engagement (SSAE) reports to help with third-party transparency and risk management for \nGAI systems. \nA.1.4. Pre-Deployment Testing \nOverview \nThe diverse ways and contexts in which GAI systems may be developed, used, and repurposed \ncomplicates risk mapping and pre-deployment measurement efforts. Robust test, evaluation, validation, \nand verification (TEVV) processes can be iteratively applied – and documented – in early stages of the AI \nlifecycle and informed by representative AI Actors (see Figure 3 of the AI RMF). Until new and rigorous', '8 \nTrustworthy AI Characteristics: Accountable and Transparent, Privacy Enhanced, Safe, Secure and \nResilient \n2.5. Environmental Impacts \nTraining, maintaining, and operating (running inference on) GAI systems are resource-intensive activities, \nwith potentially large energy and environmental footprints. Energy and carbon emissions vary based on \nwhat is being done with the GAI model (i.e., pre-training, fine-tuning, inference), the modality of the \ncontent, hardware used, and type of task or application. \nCurrent estimates suggest that training a single transformer LLM can emit as much carbon as 300 round-\ntrip flights between San Francisco and New York. In a study comparing energy consumption and carbon \nemissions for LLM inference, generative tasks (e.g., text summarization) were found to be more energy- \nand carbon-intensive than discriminative or non-generative tasks (e.g., text classification). \nMethods for creating smaller versions of trained models, such as model distillation or compression, \ncould reduce environmental impacts at inference time, but training and tuning such models may still \ncontribute to their environmental impacts. Currently there is no agreed upon method to estimate \nenvironmental impacts from GAI. \nTrustworthy AI Characteristics: Accountable and Transparent, Safe \n2.6. Harmful Bias and Homogenization \nBias exists in many forms and can become ingrained in automated systems. AI systems, including GAI \nsystems, can increase the speed and scale at which harmful biases manifest and are acted upon, \npotentially perpetuating and amplifying harms to individuals, groups, communities, organizations, and \nsociety. For example, when prompted to generate images of CEOs, doctors, lawyers, and judges, current \ntext-to-image models underrepresent women and/or racial minorities, and people with disabilities. \nImage generator models have also produced biased or stereotyped output for various demographic \ngroups and have difficulty producing non-stereotyped content even when the prompt specifically \nrequests image features that are inconsistent with the stereotypes. Harmful bias in GAI models, which \nmay stem from their training data, can also cause representational harms or perpetuate or exacerbate \nbias based on race, gender, disability, or other protected classes. 
\nHarmful bias in GAI systems can also lead to harms via disparities between how a model performs for \ndifferent subgroups or languages (e.g., an LLM may perform less well for non-English languages or \ncertain dialects). Such disparities can contribute to discriminatory decision-making or amplification of \nexisting societal biases. In addition, GAI systems may be inappropriately trusted to perform similarly \nacross all subgroups, which could leave the groups facing underperformance with worse outcomes than \nif no GAI system were used. Disparate or reduced performance for lower-resource languages also \npresents challenges to model adoption, inclusion, and accessibility, and may make preservation of \nendangered languages more difficult if GAI systems become embedded in everyday processes that would \notherwise have been opportunities to use these languages. \nBias is mutually reinforcing with the problem of undesired homogenization, in which GAI systems \nproduce skewed distributions of outputs that are overly uniform (for example, repetitive aesthetic styles', ] embeddings = model.encode(sentences) print(embeddings.shape) # [3, 384] # Get the similarity scores for the embeddings similarities = model.similarity(embeddings, embeddings) print(similarities.shape) # [3, 3] ``` ## Training Details ### Training Dataset #### Unnamed Dataset * Size: 128 training samples * Columns: sentence_0 and sentence_1 * Approximate statistics based on the first 128 samples: | | sentence_0 | sentence_1 | |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------| | type | string | string | | details | | | * Samples: | sentence_0 | sentence_1 | 
|:----------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| What measures are suggested to assess the environmental impact of AI model training and management activities? | 37 <br>MS-2.11-005 <br>Assess the proportion of synthetic to non-synthetic training data and verify <br>training data is not overly homogenous or GAI-produced to mitigate concerns of <br>model collapse. <br>Harmful Bias and Homogenization <br>AI Actor Tasks: AI Deployment, AI Impact Assessment, Affected Individuals and Communities, Domain Experts, End-Users, <br>Operation and Monitoring, TEVV <br> <br>MEASURE 2.12: Environmental impact and sustainability of AI model training and management activities – as identified in the MAP <br>function – are assessed and documented. <br>Action ID <br>Suggested Action <br>GAI Risks <br>MS-2.12-001 Assess safety to physical environments when deploying GAI systems. <br>Dangerous, Violent, or Hateful <br>Content <br>MS-2.12-002 Document anticipated environmental impacts of model development, <br>maintenance, and deployment in product design decisions. <br>Environmental <br>MS-2.12-003 <br>Measure or estimate environmental impacts (e.g., energy and water <br>consumption) for training, fine tuning, and deploying models: Verify tradeoffs <br>between resources used at inference time versus additional resources required <br>at training time. <br>Environmental <br>MS-2.12-004 Verify effectiveness of carbon capture or offset programs for GAI training and <br>applications, and address green-washing concerns. <br>Environmental <br>AI Actor Tasks: AI Deployment, AI Impact Assessment, Domain Experts, Operation and Monitoring, TEVV |
| What are some limitations of current pre-deployment testing approaches for GAI applications? | 49 <br>early lifecycle TEVV approaches are developed and matured for GAI, organizations may use <br>recommended “pre-deployment testing” practices to measure performance, capabilities, limits, risks, <br>and impacts. This section describes risk measurement and estimation as part of pre-deployment TEVV, <br>and examines the state of play for pre-deployment testing methodologies. <br>Limitations of Current Pre-deployment Test Approaches <br>Currently available pre-deployment TEVV processes used for GAI applications may be inadequate, non- <br>systematically applied, or fail to reflect or mismatched to deployment contexts. For example, the <br>anecdotal testing of GAI system capabilities through video games or standardized tests designed for <br>humans (e.g., intelligence tests, professional licensing exams) does not guarantee GAI system validity or <br>reliability in those domains. Similarly, jailbreaking or prompt engineering tests may not systematically <br>assess validity or reliability risks. <br>Measurement gaps can arise from mismatches between laboratory and real-world settings. Current <br>testing approaches often remain focused on laboratory conditions or restricted to benchmark test <br>datasets and in silico techniques that may not extrapolate well to—or directly assess GAI impacts in real- <br>world conditions. For example, current measurement gaps for GAI make it difficult to precisely estimate <br>its potential ecosystem-level or longitudinal risks and related political, social, and economic impacts. <br>Gaps between benchmarks and real-world use of GAI systems may likely be exacerbated due to prompt <br>sensitivity and broad heterogeneity of contexts of use. <br>A.1.5. Structured Public Feedback <br>Structured public feedback can be used to evaluate whether GAI systems are performing as intended <br>and to calibrate and verify traditional measurement methods. Examples of structured feedback include, <br>but are not limited to: <br> <br>Participatory Engagement Methods: Methods used to solicit feedback from civil society groups, <br>affected communities, and users, including focus groups, small user studies, and surveys. <br> <br>Field Testing: Methods used to determine how people interact with, consume, use, and make <br>sense of AI-generated information, and subsequent actions and effects, including UX, usability, <br>and other structured, randomized experiments. <br> <br>AI Red-teaming: A structured testing exercise used to probe an AI system to find flaws and <br>vulnerabilities such as inaccurate, harmful, or discriminatory outputs, often in a controlled <br>environment and in collaboration with system developers. <br>Information gathered from structured public feedback can inform design, implementation, deployment <br>approval, maintenance, or decommissioning decisions. Results and insights gleaned from these exercises <br>can serve multiple purposes, including improving data quality and preprocessing, bolstering governance <br>decision making, and enhancing system documentation and debugging practices. When implementing <br>feedback activities, organizations should follow human subjects research requirements and best <br>practices such as informed consent and subject compensation. |
| How can organizations adjust their governance regimes to effectively manage the unique risks associated with generative AI? | 47 <br>Appendix A. Primary GAI Considerations <br>The following primary considerations were derived as overarching themes from the GAI PWG <br>consultation process. These considerations (Governance, Pre-Deployment Testing, Content Provenance, <br>and Incident Disclosure) are relevant for voluntary use by any organization designing, developing, and <br>using GAI and also inform the Actions to Manage GAI risks. Information included about the primary <br>considerations is not exhaustive, but highlights the most relevant topics derived from the GAI PWG. <br>Acknowledgments: These considerations could not have been surfaced without the helpful analysis and <br>contributions from the community and NIST staff GAI PWG leads: George Awad, Luca Belli, Harold Booth, <br>Mat Heyman, Yooyoung Lee, Mark Pryzbocki, Reva Schwartz, Martin Stanley, and Kyra Yee. <br>A.1. Governance <br>A.1.1. Overview <br>Like any other technology system, governance principles and techniques can be used to manage risks <br>related to generative AI models, capabilities, and applications. Organizations may choose to apply their <br>existing risk tiering to GAI systems, or they may opt to revise or update AI system risk levels to address <br>these unique GAI risks. This section describes how organizational governance regimes may be re- <br>evaluated and adjusted for GAI contexts. It also addresses third-party considerations for governing across <br>the AI value chain. <br>A.1.2. Organizational Governance <br>GAI opportunities, risks and long-term performance characteristics are typically less well-understood <br>than non-generative AI tools and may be perceived and acted upon by humans in ways that vary greatly. <br>Accordingly, GAI may call for different levels of oversight from AI Actors or different human-AI <br>configurations in order to manage their risks effectively. Organizations’ use of GAI systems may also <br>warrant additional human review, tracking and documentation, and greater management oversight. <br>AI technology can produce varied outputs in multiple modalities and present many classes of user <br>interfaces. This leads to a broader set of AI Actors interacting with GAI systems for widely differing <br>applications and contexts of use. These can include data labeling and preparation, development of GAI <br>models, content moderation, code generation and review, text generation and editing, image and video <br>generation, summarization, search, and chat. These activities can take place within organizational <br>settings or in the public domain. <br>Organizations can restrict AI applications that cause harm, exceed stated risk tolerances, or that conflict <br>with their tolerances or values. Governance tools and protocols that are applied to other types of AI <br>systems can be applied to GAI systems. These plans and actions include: <br>• Accessibility and reasonable <br>accommodations <br>• AI actor credentials and qualifications <br>• Alignment to organizational values <br>• Auditing and assessment <br>• Change-management controls <br>• Commercial use <br>• Data provenance |
* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 20
- `per_device_eval_batch_size`: 20
- `num_train_epochs`: 10
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 20
- `per_device_eval_batch_size`: 20
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>
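For readers who want to reproduce this configuration, the following is a minimal sketch of how the loss and the non-default hyperparameters above might be wired together with the `SentenceTransformerTrainer` API. Only the base model name, the loss settings (`scale=20.0`, cosine similarity), and the batch size, epoch count, and batch sampler come from this card; the `train_pairs` placeholder data and the `output_dir` value are illustrative assumptions, not the actual training script.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

# Placeholder (question, context) pairs; the real training set contains 128 such rows.
train_pairs = [
    {
        "sentence_0": "What are some limitations of current pre-deployment testing approaches for GAI applications?",
        "sentence_1": "Currently available pre-deployment TEVV processes used for GAI applications may be inadequate ...",
    },
    # ... remaining pairs ...
]
train_dataset = Dataset.from_list(train_pairs)

# Start from the base model named in this card.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Loss as listed above: in-batch negatives ranked with cosine similarity scaled by 20.
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)

# Non-default hyperparameters from this card; everything else keeps its default value.
# `multi_dataset_batch_sampler` only has an effect when training on multiple datasets.
args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # assumed path, not from the card
    per_device_train_batch_size=20,
    per_device_eval_batch_size=20,
    num_train_epochs=10,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```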
### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.1.1
- Transformers: 4.44.2
- PyTorch: 2.4.1+cu121
- Accelerate: 0.34.2
- Datasets: 3.0.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```