metadata
base_model: Alibaba-NLP/gte-large-en-v1.5
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
  - dot_accuracy@1
  - dot_accuracy@3
  - dot_accuracy@5
  - dot_accuracy@10
  - dot_precision@1
  - dot_precision@3
  - dot_precision@5
  - dot_precision@10
  - dot_recall@1
  - dot_recall@3
  - dot_recall@5
  - dot_recall@10
  - dot_ndcg@10
  - dot_mrr@10
  - dot_map@100
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:586
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      Explain the spectrum of openness in AI systems as described in the
      document. How do open-source AI systems differ from fully closed AI
      systems in terms of accessibility and innovation?
    sentences:
      - "targets of cyber attacks; or\n\_ \_ \_ \_ \_\_(iii)\_ permitting the evasion of human control or oversight through\nmeans of deception or obfuscation.\nModels meet this definition even if they are provided to end users with\ntechnical safeguards that attempt to prevent users from taking advantage of\nthe relevant unsafe capabilities.\_\n\_ \_ \_(l)\_ The term “Federal law enforcement agency” has the meaning set forth\nin section 21(a) of Executive Order 14074 of May 25, 2022 (Advancing\nEffective, Accountable Policing and Criminal Justice Practices To Enhance\nPublic Trust and Public Safety).\n\_ \_ \_(m)\_ The term “floating-point operation” means any mathematical\noperation or assignment involving floating-point numbers, which are a\nsubset of the real numbers typically represented on computers by an integer\nof fixed precision scaled by an integer exponent of a fixed base.\n\_ \_ \_(n)\_ The term “foreign person” has the meaning set forth in section 5(c) of\nExecutive Order 13984 of January 19, 2021 (Taking Additional Steps To\nAddress the National Emergency With Respect to Significant Malicious\nCyber-Enabled Activities).\n\_ \_ \_(o)\_ The terms “foreign reseller” and “foreign reseller of United States\nInfrastructure as a Service Products” mean a foreign person who has\nestablished an Infrastructure as a Service Account to provide Infrastructure\nas a Service Products subsequently, in whole or in part, to a third party.\n\_ \_ \_(p)\_ The term “generative AI” means the class of AI models that emulate\nthe structure and characteristics of input data in order to generate derived\nsynthetic content.\_ This can include images, videos, audio, text, and other\ndigital content.\n\_ \_ \_(q)\_ The terms “Infrastructure as a Service Product,” “United States\nInfrastructure as a Service Product,” “United States Infrastructure as a\nService Provider,” and “Infrastructure as a Service Account” each have the\nrespective meanings given to those terms in section 5 of Executive Order\n13984.\n\_ \_ \_(r)\_ The term “integer operation” means any mathematical operation or\nassignment involving only integers, or whole numbers expressed without a\ndecimal point.05/10/2024, 16:36 Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence | The White House\nhttps://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artific… 7/59"
      - "AI safety, enable next-generation medical diagnoses and further other\ncritical AI priorities.\n\0\0 Released a  for designing safe, secure, and trustworthy AI tools\nfor use in education. The Department of Education’s guide discusses\nhow developers of educational technologies can design AI that benefits\nstudents and teachers while advancing equity, civil rights, trust, and\ntransparency. This work builds on the Department’s 2023 \noutlining recommendations for the use of AI in teaching and learning.\n\0\0 Published guidance on evaluating the eligibility of patent claims\ninvolving inventions related to AI technology,\_as well as other\nemerging technologies. The guidance by the U.S. Patent and Trademark\nOffice will guide those inventing in the AI space to protect their AI\ninventions and assist patent examiners reviewing applications for\npatents on AI inventions.\n\0\0 Issued a  on federal research and development (R&D) to\nadvance trustworthy AI over the past four years. The report by the\nNational Science and Technology Council examines an annual federal AI\nR&D budget of nearly $3 billion.\n\0\0 Launched a $23 million initiative to promote the use of privacy-\nenhancing technologies to solve real-world problems, including\nrelated to AI.\_Working with industry and agency partners, NSF will\ninvest through its new Privacy-preserving Data Sharing in Practice\nprogram in efforts to apply, mature, and scale privacy-enhancing\ntechnologies for specific use cases and establish testbeds to accelerate\ntheir adoption.\n\0\0 Announced millions of dollars in further investments to advance\nresponsible AI development and use throughout our society. These\ninclude $30 million invested through NSF’s Experiential Learning in\nEmerging and Novel Technologies program—which supports inclusive\nexperiential learning in fields like AI—and $10 million through NSF’s\nExpandAI program, which helps build capacity in AI research at\nminority-serving institutions while fostering the development of a\ndiverse, AI-ready workforce.\nAdvancing U.S. Leadership Abroad\nPresident Biden’s Executive Order emphasized that the United States lead\nglobal efforts to unlock AI’s potential and meet its challenges. To advance\nU.S. leadership on AI, agencies have:guide\nreport\nreport05/10/2024, 16:35 FACT SHEET: Biden-Harris Administration Announces New AI Actions and Receives Additional Major Voluntary Commitment on AI | The…\nhttps://www.whitehouse.gov/briefing-room/statements-releases/2024/07/26/fact-sheet-biden-harris-administration-announces-new-ai-actions-and-receives-addit… 4/10"
      - >-
        50   Governing AI for Humanity  processes such as the recent scientific
        report 

        on the risks of advanced AI commissioned by 

        the United Kingdom,25 and relevant regional 

        organizations.

        e. A steering committee would develop a research 

        agenda ensuring the inclusivity of views and 

        incorporation of ethical considerations, oversee 

        the allocation of resources, foster collaboration 

        with a network of academic institutions and 

        other stakeholders, and review the panel’s 

        activities and deliverables.100 By drawing on the unique convening power
        of the 

        United Nations and inclusive global reach across 

        stakeholder groups, an international scientific panel 

        can deliver trusted scientific collaboration processes 

        and outputs and correct information asymmetries 

        in ways that address the representation and 

        coordination gaps identified in paragraphs 66 and 

        73, thereby promoting equitable and effective 

        international AI governance.

        Among the topics discussed in our consultations was the ongoing debate
        over open versus closed AI systems. 

        AI systems that are open in varying degrees are often referred to as
        “open-source AI”, but this is somewhat of a 

        misnomer when compared with open-source software (code). It is important
        to recognize that openness in AI 

        systems is more of a spectrum than a single attribute.

        One article explained that a “fully closed AI system is only accessible
        to a particular group. It could be an AI 

        developer company or a specific group within it, mainly for internal
        research and development purposes. On the 

        other hand, more open systems may allow public access or make available
        certain parts, such as data, code, or 

        model characteristics, to facilitate external AI development.”a

        Open-source AI systems in the generative AI field present both risks and
        opportunities. Companies often cite “AI 

        safety” as a reason for not disclosing system specifications, reflecting
        the ongoing tension between open and 

        closed approaches in the industry. Debates typically revolve around two
        extremes: full openness, which entails 

        sharing all model components and data sets; and partial openness, which
        involves disclosing only model weights. 

        Open-source AI systems encourage innovation and are often a requirement
        for public funding. On the open 

        extreme of the spectrum, when the underlying code is made freely
        available, developers around the world can 

        experiment, improve and create new applications. This fosters a
        collaborative environment where ideas and 

        expertise are readily shared. Some industry leaders argue that this
        openness is vital to innovation and economic 

        growth.

        However, in most cases, open-source AI models are available as
        application programming interfaces. In this case, 

        the original code is not shared, the original weights are never changed
        and model updates become new models. 

        Additionally, open-source models tend to be smaller and more
        transparent. This transparency can build trust, 

        allow for ethical considerations to be proactively addressed, and
        support validation and replication because users 

        can examine the inner workings of the AI system, understand its
        decision-making process and identify potential 

        biases.Box 9: Open versus closed AI systems

        a Angela Luna, “The open or closed AI dilemma”, 2 May 2024. Available at
        https://bipartisanpolicy.org/blog/the-open-or-closed-ai-dilemma .

        25 International Scientific Report on the Safety of Advanced AI: Interim
        Report. Available at
        https://gov.uk/government/publications/international-scientific-report-

        on-the-safety-of-advanced-ai .
  - source_sentence: >-
      What role does the report propose for the United Nations in establishing a
      governance regime for AI, and how does it envision this regime
      contributing to a new social contract that protects vulnerable
      populations?
    sentences:
      - >-
        HUMAN ALTERNATIVES, 

        CONSIDERATION, AND 

        FALLBACK 

        HOW THESE PRINCIPLES CAN MOVE INTO PRACTICE

        Real-life examples of how these principles can become reality, through
        laws, policies, and practical 

        technical and sociotechnical approaches to protecting rights,
        opportunities, and access. 

        Healthcare “navigators” help people find their way through online signup
        forms to choose 

        and obtain healthcare. A Navigator is “an individual or organization
        that's trained and able to help 

        consumers, small businesses, and their employees as they look for health
        coverage options through the 

        Marketplace (a government web site), including completing eligibility
        and enrollment forms.”106 For 

        the 2022 plan year, the Biden-Harris Administration increased funding so
        that grantee organizations could 

        “train and certify more than 1,500 Navigators to help uninsured
        consumers find affordable and comprehensive 

        health coverage. ”107

        The customer service industry has successfully integrated automated
        services such as 

        chat-bots and AI-driven call response systems with escalation to a human
        support team.

        108 Many businesses now use partially automated customer service
        platforms that help answer customer 

        questions and compile common problems for human agents to review. These
        integrated human-AI 

        systems allow companies to provide faster customer care while
        maintaining human agents to answer 

        calls or otherwise respond to complicated requests. Using both AI and
        human agents is viewed as key to 

        successful customer service.109

        Ballot curing laws in at least 24 states require a fallback system that
        allows voters to 

        correct their ballot and have it counted in the case that a voter
        signature matching algorithm incorrectly flags their ballot as invalid
        or there is another issue with their ballot, and review by an election
        official does not rectify the problem. Some federal courts have found
        that such cure procedures are constitutionally required.

        110 Ballot 

        curing processes vary among states, and include direct phone calls,
        emails, or mail contact by election 

        officials.111 Voters are asked to provide alternative information or a
        new signature to verify the validity of their 

        ballot. 

        52
      - >-
        SECTION  TITLE

        HUMAN  ALTERNATIVES , C ONSIDERATION , AND FALLBACK

        You should be able to opt out, where appropriate, and have access to a
        person who can quickly 

        consider and remedy problems you encounter. You should be able to opt
        out from automated systems in 

        favor of a human alternative, where appropriate. Appropriateness should
        be determined based on reasonable expectations in a given context and
        with a focus on ensuring broad accessibility and protecting the public
        from especially harmful impacts. In some cases, a human or other
        alternative may be required by law. You should have access to timely
        human consideration and remedy by a fallback and escalation process if
        an automated system fails, it produces an error, or you would like to
        appeal or contest its impacts on you. Human consideration and fallback
        should be accessible, equitable, effective, maintained, accompanied by
        appropriate operator training, and should not impose an unreasonable
        burden on the public. Automated systems with an intended use within
        sensi

        -

        tive domains, including, but not limited to, criminal justice,
        employment, education, and health, should additional -

        ly be tailored to the purpose, provide meaningful access for oversight,
        include training for any people interacting with the system, and
        incorporate human consideration for adverse or high-risk decisions.
        Reporting that includes a description of these human governance
        processes and assessment of their timeliness, accessibility, outcomes,
        and effectiveness should be made public whenever possible. 

        Definitions for key terms in The Blueprint for an AI Bill of Rights can
        be found in Applying the Blueprint for an AI Bill of Rights. 

        Accompanying analysis and tools for actualizing each principle can be
        found in the Technical Companion. 

        7
      - |-
        Final Report    21E. Reflections on institutional 
        models
        lxiv Discussions about AI often resolve into extremes. 
        In our consultations around the world, we engaged 
        with those who see a future of boundless goods 
        provided by ever-cheaper, ever-more-helpful AI 
        systems. We also spoke with those wary of darker 
        futures, of division and unemployment, and even 
        extinction.8
        lxv We do not know whether the utopian or dystopian 
        future is more likely. Equally, we are mindful that 
        the technology may go in a direction that does 
        away with this duality. This report focuses on 
        the near-term opportunities and risks, based on 
        science and grounded in fact. 
        lxvi The seven recommendations outlined above offer 
        our best hope for reaping the benefits of AI, while 
        minimizing and mitigating the risks, as AI continues 
        evolving. We are also mindful of the practical 
        challenges to international institution-building 
        on a larger scale. This is why we are proposing a 
        networked institutional approach, with light and 
        agile support. If or when risks become more acute 
        and the stakes for opportunities escalate, such 
        calculations may change. 
        lxvii The world wars led to the modern international 
        system; the development of ever-more-powerful 
        chemical, biological and nuclear weapons led 
        to regimes limiting their spread and promoting 
        peaceful uses of the underlying technologies. 
        Evolving understanding of our common humanity 
        led to the modern human rights system and our 
        ongoing commitment to the SDGs for all. Climate 
        change evolved from a niche concern to a global 
        challenge.lxviii  AI may similarly rise to a level that requires more 
        resources and more authority than is proposed 
        in the above-mentioned recommendations, 
        into harder functions of norm elaboration, 
        implementation, monitoring, verification and 
        validation, enforcement, accountability, remedies 
        for harm and emergency responses. Reflecting on 
        such institutional models, therefore, is prudent. The 
        final section of this report seeks to contribute to 
        that effort.
        4. A call to action
        lxix We remain optimistic about the future with AI and 
        its positive potential. That optimism depends, 
        however, on realism about the risks and the 
        inadequacy of structures and incentives currently 
        in place. The technology is too important, and the 
        stakes are too high, to rely only on market forces 
        and a fragmented patchwork of national and 
        multilateral action.
        lxx The United Nations can be the vehicle for a new 
        social contract for AI that ensures global buy-
        in for a governance regime which protects and 
        empowers us all. Such a social contract will ensure 
        that opportunities are fairly distributed, and the 
        risks are not loaded on to the most vulnerable – or 
        passed on to future generations, as we have seen, 
        tragically, with climate change.
        lxxi As a group and as individuals from across many 
        fields of expertise, organizations and parts of the 
        world, we look forward to continuing this crucial 
        conversation. Together with the many others we 
        have connected with on this journey, and the global 
        community they represent, we hope that this report 
        contributes to our combined efforts to govern AI 
        for humanity.
        8   See https://safe.ai/work/statement-on-ai-risk .
  - source_sentence: >-
      What are the potential consequences of coordination gaps between various
      AI governance initiatives, as highlighted in the context information?
    sentences:
      - |-
        44   Governing AI for Humanity  B. Coordination gaps
        72 The ongoing emergence and evolution of AI 
        governance initiatives are not guaranteed to 
        work together effectively for humanity. Instead, 
        coordination gaps have appeared. Effective 
        handshaking between the selective plurilateral 
        initiatives (see fig. 8) and other regional initiatives is 
        not assured, risking incompatibility between regions.
        73 Nor are there global mechanisms for all international 
        standards development organizations (see fig. 7), 
        international scientific research initiatives or AI 
        capacity-building initiatives to coordinate with each 
        other, undermining interoperability of approaches 
        and resulting in fragmentation. The resulting 
        coordination gaps between various sub-global 
        initiatives are in some cases best addressed at the 
        global level.
        74 A separate set of coordination gaps arise within 
        the United Nations system, reflected in the array of 
        diverse United Nations documents and initiatives 
        in relation to AI. Figure 9 shows 27 United Nations-
        related instruments in specific domains that may 
        apply to AI – 23 of them are binding and will require 
        interpretation as they pertain to AI. A further 29 
        domain-level documents from the United Nations 
        and related organizations focus specifically on AI, 
        none of which are binding.17 In some cases, these 
        can address AI risks and harness AI benefits in 
        specific domains.75 The level of activity shows the importance of AI 
        to United Nations programmes. As AI expands to 
        affect ever-wider aspects of society, there will be 
        growing calls for diverse parts of the United Nations 
        system to act, including through binding norms. 
        It also shows the ad hoc nature of the responses, 
        which have largely developed organically in specific 
        domains and without an overarching strategy. The 
        resulting coordination gaps invite overlaps and 
        hinder interoperability and impact.
        76 The number and diversity of approaches are a sign 
        that the United Nations system is responding to 
        an emerging issue. With proper orchestration, and 
        in combination with processes taking a holistic 
        approach, these efforts can offer an efficient and 
        sustainable pathway to inclusive international AI 
        governance in specific domains. This could enable 
        meaningful, harmonized and coordinated impacts 
        on areas such as health, education, technical 
        standards and ethics, instead of merely contributing 
        to the proliferation of initiatives and institutions 
        in this growing field. International law, including 
        international human rights law, provides a shared 
        normative foundation for all AI-related efforts, 
        thereby facilitating coordination and coherence.
      - "\0\0 Issued a comprehensive plan for U.S. engagement on global AI\nstandards.\_The plan, developed by the NIST, incorporates broad public\nand private-sector input, identifies objectives and priority areas for AI\nstandards work, and lays out actions for U.S. stakeholders including U.S.\nagencies. NIST and others agencies will report on priority actions in 180\ndays.\_\n\0\0 Developed  for managing risks to human rights posed by AI.\nThe Department of State’s “Risk Management Profile for AI and Human\nRights”—developed in close coordination with NIST and the U.S. Agency\nfor International Development—recommends actions based on the NIST\nAI Risk Management Framework to governments, the private sector, and\ncivil society worldwide, to identify and manage risks to human rights\narising from the design, development, deployment, and use of AI.\_\n\0\0 Launched a global network of AI Safety Institutes and other\ngovernment-backed scientific offices to advance AI safety at a technical\nlevel.\_This network will accelerate critical information exchange and\ndrive toward common or compatible safety evaluations and policies.\n\0\0 Launched a landmark United Nations General Assembly resolution.\nThe unanimously adopted resolution, with more than 100 co-sponsors,\nlays out a common vision for countries around the world to promote the\nsafe and secure use of AI to address global challenges.\n\0\0 Expanded global support for the U.S.-led Political Declaration on the\nResponsible Military Use of Artificial Intelligence and\nAutonomy.\_\_Fifty-five nations now endorse the political declaration,\nwhich outlines a set of norms for the responsible development,\ndeployment, and use of military AI capabilities.\nThe Table below summarizes many of the activities that federal agencies\nhave completed in response to the Executive Order:guidance05/10/2024, 16:35 FACT SHEET: Biden-Harris Administration Announces New AI Actions and Receives Additional Major Voluntary Commitment on AI | The…\nhttps://www.whitehouse.gov/briefing-room/statements-releases/2024/07/26/fact-sheet-biden-harris-administration-announces-new-ai-actions-and-receives-addit… 5/10"
      - >-
        Final Report    55f. In addition, diverse stakeholders – in particular 

        technology companies and civil society 

        representatives  could be invited to engage 

        through existing institutions detailed below, as 

        well as policy workshops on particular aspects 

        of AI governance such as limits (if any) of open-

        source approaches to the most advanced forms 

        of AI, thresholds for tracking and reporting of 

        AI incidents, application of human rights law to 

        novel use cases, or the use of competition law/

        antitrust to address concentrations of power 

        among technology companies.30

        g. The proposed AI office could also curate a 

        repository of AI governance examples, including 

        legislation, policies and institutions from 

        around the world for consideration of the policy 

        dialogue, working with existing efforts, such as 

        OECD.

        109 Notwithstanding the two General Assembly 

        resolutions on AI in 2024, there is currently 

        no mandated institutionalized dialogue on 

        AI governance at the United Nations that 

        corresponds to the reliably inclusive vision of this 

        recommendation. Similar processes do exist at 

        the international level, but primarily in regional or 

        plurilateral constellations (para. 57), which are not 

        reliably inclusive and global.

        110 Complementing a fluid process of plurilateral and 

        regional AI summits,31 the United Nations can 

        offer a stable home for dialogue on AI governance. 

        Inclusion by design  a crucial requirement for 

        playing a stabilizing role in geopolitically delicate 

        times  can also address representation and 

        coordination gaps identified in paragraphs 64 and 

        72, promoting more effective collective action on AI 

        governance in the common interest of all countries.  AI standards
        exchange  
         
        Recommendation 3: AI standards exchange  
         
        We recommend the creation of an AI standards 

        exchange, bringing together representatives from 

        national and international standard-development 

        organizations, technology companies, civil society 

        and representatives from the international scientific 

        panel. It would be tasked with:

        a. Developing and maintaining a register of 

        definitions and applicable standards for 

        measuring and evaluating AI systems;

        b. Debating and evaluating the standards and the 

        processes for creating them; and

        c. Identifying gaps where new standards are 

        needed.

        111 When AI systems were first explored, few standards 

        existed to help to navigate or measure this new 

        frontier. The Turing Test  of whether a machine can 

        exhibit behaviour equivalent to (or indistinguishable 

        from) a human being  captured the popular 

        imagination, but is of more cultural than scientific 

        significance. Indeed, it is telling that some of 

        the greatest computational advances have been 

        measured by their success in games, such as when 

        a computer could beat humans at chess, Go, poker 

        or Jeopardy. Such measures were easily understood 

        by non-specialists, but were neither rigorous nor 

        particularly scientific.

        112 More recently, there has been a proliferation of 

        standards. Figure 13 illustrates the increasing 

        number of relevant standards adopted by ITU, the 

        International Organization for Standardization (ISO), 

        the International Electrotechnical Commission 

        (IEC) and the Institute of Electrical and Electronics 

        Engineers (IEEE).32

        30 Such a gathering could also provide an opportunity for
        multi-stakeholder debate of any hardening of the global governance of
        AI. These might include, for 

        example, prohibitions on the development of uncontainable or
        uncontrollable AI systems, or requirements that all AI systems be
        sufficiently transparent so that 

        their consequences can be traced back to a legal actor that can assume
        responsibility for them.

        31 Although multiple AI summits have helped a subset of 20–30 countries
        to align on AI safety issues, participation has been inconsistent:
        Brazil, China and 

        Ireland endorsed the Bletchley Declaration in November 2023, but not the
        Seoul Ministerial Statement six months later (see fig. 12). Conversely,
        Mexico and 

        New Zealand endorsed the Seoul Ministerial Statement, but did not
        endorse the Bletchley Declaration.

        32 Many new standards are also emerging at the national and
        multinational levels, such as the United States White House Voluntary AI
        Commitments and the 

        European Union Codes of Practice for the AI Act.
  - source_sentence: >-
      Describe the minimum set of criteria that should be included in the
      incident reporting process for GAI systems, according to the
      organizational practices established for identifying incidents.
    sentences:
      - >-
        APPENDIX

        Summaries of Additional Engagements: 

        •OSTP created an email address ( [email protected] ) to solicit
        comments from the public on the use of

        artificial intelligence and other data-driven technologies in their
        lives.

        •OSTP issued a Request For Information (RFI) on the use and governance
        of biometric technologies.113 The

        purpose of this RFI was to understand the extent and variety of
        biometric technologies in past, current, or

        planned use; the domains in which these technologies are being used; the
        entities making use of them; currentprinciples, practices, or policies
        governing their use; and the stakeholders that are, or may be, impacted
        by theiruse or regulation. The 130 responses to this RFI are available
        in full online

        114 and were submitted by the below

        listed organizations and individuals:

        Accenture 

        Access Now ACT | The App Association AHIP 

        AIethicist.org 

        Airlines for America Alliance for Automotive Innovation Amelia
        Winger-Bearskin American Civil Liberties Union American Civil Liberties
        Union of Massachusetts American Medical Association ARTICLE19 Attorneys
        General of the District of Columbia, Illinois, Maryland, Michigan,
        Minnesota, New York, North Carolina, Oregon, Vermont, and Washington
        Avanade Aware Barbara Evans Better Identity Coalition Bipartisan Policy
        Center Brandon L. Garrett and Cynthia Rudin Brian Krupp Brooklyn
        Defender Services BSA | The Software Alliance Carnegie Mellon University
        Center for Democracy & Technology Center for New Democratic Processes
        Center for Research and Education on Accessible Technology and
        Experiences at University of Washington, Devva Kasnitz, L Jean Camp,
        Jonathan Lazar, Harry Hochheiser Center on Privacy & Technology at
        Georgetown Law Cisco Systems City of Portland Smart City PDX Program
        CLEAR Clearview AI Cognoa Color of Change Common Sense Media Computing
        Community Consortium at Computing Research Association Connected Health
        Initiative Consumer Technology Association Courtney Radsch Coworker
        Cyber Farm Labs Data & Society Research Institute Data for Black Lives
        Data to Actionable Knowledge Lab at Harvard University Deloitte Dev
        Technology Group Digital Therapeutics Alliance Digital Welfare State &
        Human Rights Project and Center for Human Rights and Global Justice at
        New York University School of Law, and Temple University Institute for
        Law, Innovation & Technology Dignari Douglas Goddard Edgar Dworsky
        Electronic Frontier Foundation Electronic Privacy Information Center,
        Center for Digital Democracy, and Consumer Federation of America FaceTec
        Fight for the Future Ganesh Mani Georgia Tech Research Institute Google
        Health Information Technology Research and Development Interagency
        Working Group HireVue HR Policy Association ID.me Identity and Data
        Sciences Laboratory at Science Applications International Corporation
        Information Technology and Innovation Foundation Information Technology
        Industry Council Innocence Project Institute for Human-Centered
        Artificial Intelligence at Stanford University Integrated Justice
        Information Systems Institute International Association of Chiefs of
        Police International Biometrics + Identity Association International
        Business Machines Corporation International Committee of the Red Cross
        Inventionphysics iProov Jacob Boudreau Jennifer K. Wagner, Dan Berger,
        Margaret Hu, and Sara Katsanis Jonathan Barry-Blocker Joseph Turow Joy
        Buolamwini Joy Mack Karen Bureau Lamont Gholston Lawyers’ Committee for
        Civil Rights Under Law 

        60
      - >-
        19 GV-4.1-003 Establish policies, procedures, and processes for
        oversight functions (e.g., senior 

        leadership, legal, compliance, including internal evaluation ) across
        the GAI 

        lifecycle, from problem formulation and supply chains to system
        decommission.  Value Chain and Component 

        Integration  

        AI Actor Tasks:  AI Deployment, AI Design, AI Development, Operation and
        Monitoring  
         
        GOVERN 4.2:  Organizational teams document the risks and potential
        impacts of the AI technology they design, develop, deploy, 

        evaluate, and use, and they communicate about the impacts more
        broadly.  

        Action ID  Suggested Action  GAI Risks  

        GV-4.2-001 Establish terms of use and terms of service  for GAI systems
        . Intellectual Property ; Dangerous , 

        Violent, or Hateful Content ; 

        Obscene, Degrading, and/or 

        Abusive Content  

        GV-4.2-002 Include relevant AI Actors in the GAI system risk
        identification process.  Human -AI Configuration  

        GV-4.2-0 03 Verify that downstream GAI system impacts (such as the use
        of third -party 

        plugins) are included in the impact documentation process.  Value Chain
        and Component 

        Integration  

        AI Actor Tasks:  AI Deployment, AI Design, AI Development, Operation and
        Monitoring  
         
        GOVERN 4.3: Organizational practices are in place to enable AI testing,
        identification of incidents, and information sharing.  

        Action ID  Suggested Action  GAI Risks  

        GV4.3-- 001 Establish policies for measuring the effectiveness of
        employed  content 

        provenance methodologies (e.g., cryptography, watermarking,
        steganography, etc.) Information Integrity  

        GV-4.3-002 Establish o rganizational  practices to  identify the minimum
        set of criteria 

        necessary for GAI system incident reporting such as: System ID (auto
        -generated 

        most likely), Title, Reporter, System/Source, Data Reported, Date of
        Incident, Description, Impact(s), Stakeholder(s) Impacted.  Information
        Security
      - >-
        72   Governing AI for Humanity  Box 15: Possible functions and
        first-year deliverables of the AI office

        The AI office should have a light structure and aim to be agile, trusted
        and networked. Where necessary, it should 

        operate in a “hub and spoke” manner to connect to other parts of the
        United Nations system and beyond.

        Outreach could include serving as a key node in a so-called soft
        coordination architecture between Member 

        States, plurilateral networks, civil society organizations, academia and
        technology companies in a regime complex 

        that weaves together to solve problems collaboratively through
        networking, and as a safe, trusted place to 

        convene on relevant topics. Ambitiously, it could become the glue that
        helps to hold such other evolving networks 

        together.

        Supporting the various initiatives proposed in this report includes the
        important function of ensuring inclusiveness 

        at speed in delivering outputs such as scientific reports, governance
        dialogue and identifying appropriate follow-

        up entities.

        Common understanding :

         Facilitate recruitment of and support the international scientific
        panel.

        Common ground :

         Service policy dialogues with multi-stakeholder inputs in support of
        interoperability and policy learning. 

        An initial priority topic is the articulation of risk thresholds and
        safety frameworks across jurisdictions

         Support ITU, ISO/IEC and IEEE on setting up the AI standards exchange.

        Common benefits :

         Support the AI capacity development network with an initial focus on
        building public interest AI capacity 

        among public officials and social entrepreneurs. Define the initial
        network vision, outcomes, go vernance 

        structure, partnerships and operational mechanisms.

         Define the vision, outcomes, governance structure and operational
        mechanisms for the global fund for AI, 

        and seek feedback from Member States, industry and civil society
        stakeholders on the proposal, with a 

        view to funding initial projects within six months of establishment.

         Prepare and publish an annual list of prioritized investment areas to
        guide both the global fund for AI and 

        investments outside that structure.

        Coherent effort :

         Establish lightweight mechanisms that support Member States and other
        relevant organizations to be 

        more connected, coordinated and effective in pursuing their global AI
        governance efforts.

         Prepare initial frameworks to guide and monitor the AI office’s work,
        including a global governance risk 

        taxonomy, a global AI policy landscape review and a global stakeholder
        map.

         Develop and implement quarterly reporting and periodic in-person
        presentations to Member States on 

        the AI office’s progress against its workplan and establish feedback
        channels to support adjustments as 

        needed.

         Establish a steering committee jointly led by the AI office, ITU, UNC
        TAD, UNESCO and other relevant 

        United Nations entities and organizations to accelerate the work of the
        United Nations in service of the 

        functions above, and review progress of the accelerated efforts every
        three months.

         Promote joint learning and development opportunities for Member State
        representatives to support them 

        to carry out their responsibilities for global AI governance, in
        cooperation with relevant United Nations 

        entities and organizations such as the United Nations Institute for
        Training and Research and the United 

        Nations University.
  - source_sentence: >-
      What are some of the legal frameworks mentioned in the context that aim to
      protect personal information, and how do they relate to data privacy
      concerns?
    sentences:
      - >-
        NOTICE & 

        EXPLANATION 

        WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS

        The expectations for automated systems are meant to serve as a blueprint
        for the development of additional 

        technical standards and practices that are tailored for particular
        sectors and contexts. 

        Tailored to the level of risk. An assessment should be done to determine
        the level of risk of the auto -

        mated system. In settings where the consequences are high as determined
        by a risk assessment, or extensive 

        oversight is expected (e.g., in criminal justice or some public sector
        settings), explanatory mechanisms should be built into the system design
        so that the system’s full behavior can be explained in advance (i.e.,
        only fully transparent models should be used), rather than as an
        after-the-decision interpretation. In other settings, the extent of
        explanation provided should be tailored to the risk level. 

        Valid. The explanation provided by a system should accurately reflect
        the factors and the influences that led 

        to a particular decision, and should be meaningful for the particular
        customization based on purpose, target, and level of risk. While
        approximation and simplification may be necessary for the system to
        succeed based on the explanatory purpose and target of the explanation,
        or to account for the risk of fraud or other concerns related to
        revealing decision-making information, such simplifications should be
        done in a scientifically supportable way. Where appropriate based on the
        explanatory system, error ranges for the explanation should be
        calculated and included in the explanation, with the choice of
        presentation of such information balanced with usability and overall
        interface complexity concerns. 

        Demonstrate protections for notice and explanation 

        Reporting. Summary reporting should document the determinations made
        based on the above consider -

        ations, including: the responsible entities for accountability purposes;
        the goal and use cases for the system, identified users, and impacted
        populations; the assessment of notice clarity and timeliness; the
        assessment of the explanation's validity and accessibility; the
        assessment of the level of risk; and the account and assessment of how
        explanations are tailored, including to the purpose, the recipient of
        the explanation, and the level of risk. Individualized profile
        information should be made readily available to the greatest extent
        possible that includes explanations for any system impacts or
        inferences. Reporting should be provided in a clear plain language and
        machine-readable manner. 

        44
      - >-
        25 MP-2.3-002 Review and document accuracy, representativeness,
        relevance, suitability of data 

        used at different stages of AI life cycle.  Harmful Bias and
        Homogenization ; 

        Intellectual Property  

        MP-2.3-003 Deploy and document fact -checking techniques to verify the
        accuracy and 

        veracity of information generated by GAI systems, especially when the 

        information comes from multiple (or unknown) sources.  Information
        Integrity  

        MP-2.3-004 Develop and implement testing techniques to identify GAI
        produced content (e.g., synthetic media) that might be indistinguishable
        from human -generated content.  Information Integrity  

        MP-2.3-005 Implement plans for GAI systems to undergo regular
        adversarial testing to identify 

        vulnerabilities and potential manipulation or misuse.  Information
        Security  

        AI Actor Tasks:  AI Development, Domain Experts, TEVV  
         
        MAP 3.4:  Processes for operator and practitioner proficiency with AI
        system performance and trustworthiness  and relevant 

        technical standards and certifications  are defined, assessed, and
        documented.  

        Action ID  Suggested Action  GAI Risks  

        MP-3.4-001 Evaluate whether GAI operators and end -users can accurately
        understand 

        content lineage and origin.  Human -AI Configuration ; 

        Information Integrity  

        MP-3.4-002 Adapt existing training programs to include modules on
        digital content 

        transparency.  Information Integrity  

        MP-3.4-003 Develop certification programs that test proficiency in
        managing GAI risks and 

        interpreting content provenance, relevant to specific industry and
        context.  Information Integrity  

        MP-3.4-004 Delineate human proficiency tests from tests of GAI
        capabilities.  Human -AI Configuration  

        MP-3.4-005 Implement systems to continually monitor and track the
        outcomes of human- GAI 

        configurations for future refinement and improvements . Human -AI
        Configuration ; 

        Information Integrity  

        MP-3.4-006 Involve the end -users, practitioners, and operators in GAI
        system in prototyping 

        and testing activities. Make sure these tests cover various scenarios ,
        such as crisis 

        situations or ethically sensitive contexts.  Human -AI Configuration ; 

        Information Integrity ; Harmful Bias 

        and Homogenization ; Dangerous , 

        Violent, or Hateful Content  

        AI Actor Tasks: AI Design, AI Development, Domain Experts, End -Users,
        Human Factors, Operation and Monitoring
      - >-
        65. See, e.g., Scott Ikeda. Major Data Broker Exposes 235 Million Social
        Media Profiles in Data Lead: Info

        Appears to Have Been Scraped Without Permission. CPO Magazine. Aug. 28,
        2020. https://

        www.cpomagazine.com/cyber-security/major-data-broker-exposes-235-million-social-media-profiles-

        in-data-leak/; Lily Hay Newman. 1.2 Billion Records Found Exposed Online
        in a Single Server . WIRED,

        Nov. 22, 2019.
        https://www.wired.com/story/billion-records-exposed-online/

        66.Lola Fadulu. Facial Recognition Technology in Public Housing Prompts
        Backlash . New York Times.

        Sept. 24, 2019.

        https://www.nytimes.com/2019/09/24/us/politics/facial-recognition-technology-housing.html

        67. Jo Constantz. ‘They Were Spying On Us’: Amazon, Walmart, Use
        Surveillance Technology to Bust

        Unions. Newsweek. Dec. 13, 2021.

        https://www.newsweek.com/they-were-spying-us-amazon-walmart-use-surveillance-technology-bust-

        unions-1658603

        68. See, e.g., enforcement actions by the FTC against the photo storage
        app Everalbaum

        (https://www.ftc.gov/legal-library/browse/cases-proceedings/192-3172-everalbum-inc-matter),
        and

        against Weight Watchers and their subsidiary
        Kurbo(https://www.ftc.gov/legal-library/browse/cases-proceedings/1923228-weight-watchersww)

        69. See, e.g., HIPAA, Pub. L 104-191 (1996); Fair Debt Collection
        Practices Act (FDCPA), Pub. L. 95-109

        (1977); Family Educational Rights and Privacy Act (FERPA) (20 U.S.C. §
        1232g), Children's Online

        Privacy Protection Act of 1998, 15 U.S.C. 6501–6505, and Confidential
        Information Protection andStatistical Efficiency Act (CIPSEA) (116 Stat.
        2899)

        70. Marshall Allen. You Snooze, You Lose: Insurers Make The Old Adage
        Literally True . ProPublica. Nov.

        21, 2018.

        https://www.propublica.org/article/you-snooze-you-lose-insurers-make-the-old-adage-literally-true

        71.Charles Duhigg. How Companies Learn Your Secrets. The New York Times.
        Feb. 16, 2012.

        https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html72. Jack
        Gillum and Jeff Kao. Aggression Detectors: The Unproven, Invasive
        Surveillance Technology

        Schools are Using to Monitor Students. ProPublica. Jun. 25, 2019.

        https://features.propublica.org/aggression-detector/the-unproven-invasive-surveillance-technology-

        schools-are-using-to-monitor-students/

        73.Drew Harwell. Cheating-detection companies made millions during the
        pandemic. Now students are

        fighting back. Washington Post. Nov. 12, 2020.

        https://www.washingtonpost.com/technology/2020/11/12/test-monitoring-student-revolt/

        74. See, e.g., Heather Morrison. Virtual Testing Puts Disabled Students
        at a Disadvantage. Government

        Technology. May 24, 2022.

        https://www.govtech.com/education/k-12/virtual-testing-puts-disabled-students-at-a-disadvantage;

        Lydia X. Z. Brown, Ridhi Shetty, Matt Scherer, and Andrew Crawford.
        Ableism And Disability

        Discrimination In New Surveillance Technologies: How new surveillance
        technologies in education,

        policing, health care, and the workplace disproportionately harm
        disabled people . Center for Democracy

        and Technology Report. May 24,
        2022.https://cdt.org/insights/ableism-and-disability-discrimination-in-new-surveillance-technologies-how-new-surveillance-technologies-in-education-policing-health-care-and-the-workplace-disproportionately-harm-disabled-people/

        69
model-index:
  - name: SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy@1
            value: 0.71875
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.921875
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.96875
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 1
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.71875
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.30729166666666663
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.19374999999999998
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09999999999999999
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.71875
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.921875
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.96875
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 1
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8727659974381962
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.8304687500000002
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.8304687500000001
            name: Cosine Map@100
          - type: dot_accuracy@1
            value: 0.734375
            name: Dot Accuracy@1
          - type: dot_accuracy@3
            value: 0.921875
            name: Dot Accuracy@3
          - type: dot_accuracy@5
            value: 0.96875
            name: Dot Accuracy@5
          - type: dot_accuracy@10
            value: 1
            name: Dot Accuracy@10
          - type: dot_precision@1
            value: 0.734375
            name: Dot Precision@1
          - type: dot_precision@3
            value: 0.30729166666666663
            name: Dot Precision@3
          - type: dot_precision@5
            value: 0.19374999999999998
            name: Dot Precision@5
          - type: dot_precision@10
            value: 0.09999999999999999
            name: Dot Precision@10
          - type: dot_recall@1
            value: 0.734375
            name: Dot Recall@1
          - type: dot_recall@3
            value: 0.921875
            name: Dot Recall@3
          - type: dot_recall@5
            value: 0.96875
            name: Dot Recall@5
          - type: dot_recall@10
            value: 1
            name: Dot Recall@10
          - type: dot_ndcg@10
            value: 0.8785327200386421
            name: Dot Ndcg@10
          - type: dot_mrr@10
            value: 0.8382812500000002
            name: Dot Mrr@10
          - type: dot_map@100
            value: 0.8382812500000001
            name: Dot Map@100

SentenceTransformer based on Alibaba-NLP/gte-large-en-v1.5

This is a sentence-transformers model finetuned from Alibaba-NLP/gte-large-en-v1.5. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Alibaba-NLP/gte-large-en-v1.5
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
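
The module stack above is a Transformer encoder followed by CLS-token pooling. As an illustration of what that configuration means (not a way to recover this model's fine-tuned weights), an equivalent stack could be assembled from sentence-transformers building blocks; the trust_remote_code argument is an assumption, needed because the base checkpoint ships custom modeling code:

from sentence_transformers import SentenceTransformer, models

# Transformer module: the base encoder with an 8192-token window.
word_embedding = models.Transformer(
    "Alibaba-NLP/gte-large-en-v1.5",
    max_seq_length=8192,
    model_args={"trust_remote_code": True},  # assumed requirement for the custom NewModel class
)

# Pooling module: take the [CLS] token embedding, matching the
# pooling_mode_cls_token=True setting shown in the architecture above.
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),  # 1024
    pooling_mode_cls_token=True,
    pooling_mode_mean_tokens=False,
)

model = SentenceTransformer(modules=[word_embedding, pooling])

This only rebuilds the architecture from the base weights; to use the fine-tuned model itself, load it from the Hub as shown in the Usage section below.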

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (replace the placeholder below with this model's repo id on the Hub)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'What are some of the legal frameworks mentioned in the context that aim to protect personal information, and how do they relate to data privacy concerns?',
    "65. See, e.g., Scott Ikeda. Major Data Broker Exposes 235 Million Social Media Profiles in Data Lead: Info\nAppears to Have Been Scraped Without Permission. CPO Magazine. Aug. 28, 2020. https://\nwww.cpomagazine.com/cyber-security/major-data-broker-exposes-235-million-social-media-profiles-\nin-data-leak/; Lily Hay Newman. 1.2 Billion Records Found Exposed Online in a Single Server . WIRED,\nNov. 22, 2019. https://www.wired.com/story/billion-records-exposed-online/\n66.Lola Fadulu. Facial Recognition Technology in Public Housing Prompts Backlash . New York Times.\nSept. 24, 2019.\nhttps://www.nytimes.com/2019/09/24/us/politics/facial-recognition-technology-housing.html\n67. Jo Constantz. ‘They Were Spying On Us’: Amazon, Walmart, Use Surveillance Technology to Bust\nUnions. Newsweek. Dec. 13, 2021.\nhttps://www.newsweek.com/they-were-spying-us-amazon-walmart-use-surveillance-technology-bust-\nunions-1658603\n68. See, e.g., enforcement actions by the FTC against the photo storage app Everalbaum\n(https://www.ftc.gov/legal-library/browse/cases-proceedings/192-3172-everalbum-inc-matter), and\nagainst Weight Watchers and their subsidiary Kurbo(https://www.ftc.gov/legal-library/browse/cases-proceedings/1923228-weight-watchersww)\n69. See, e.g., HIPAA, Pub. L 104-191 (1996); Fair Debt Collection Practices Act (FDCPA), Pub. L. 95-109\n(1977); Family Educational Rights and Privacy Act (FERPA) (20 U.S.C. § 1232g), Children's Online\nPrivacy Protection Act of 1998, 15 U.S.C. 6501–6505, and Confidential Information Protection andStatistical Efficiency Act (CIPSEA) (116 Stat. 2899)\n70. Marshall Allen. You Snooze, You Lose: Insurers Make The Old Adage Literally True . ProPublica. Nov.\n21, 2018.\nhttps://www.propublica.org/article/you-snooze-you-lose-insurers-make-the-old-adage-literally-true\n71.Charles Duhigg. How Companies Learn Your Secrets. The New York Times. Feb. 16, 2012.\nhttps://www.nytimes.com/2012/02/19/magazine/shopping-habits.html72. Jack Gillum and Jeff Kao. Aggression Detectors: The Unproven, Invasive Surveillance Technology\nSchools are Using to Monitor Students. ProPublica. Jun. 25, 2019.\nhttps://features.propublica.org/aggression-detector/the-unproven-invasive-surveillance-technology-\nschools-are-using-to-monitor-students/\n73.Drew Harwell. Cheating-detection companies made millions during the pandemic. Now students are\nfighting back. Washington Post. Nov. 12, 2020.\nhttps://www.washingtonpost.com/technology/2020/11/12/test-monitoring-student-revolt/\n74. See, e.g., Heather Morrison. Virtual Testing Puts Disabled Students at a Disadvantage. Government\nTechnology. May 24, 2022.\nhttps://www.govtech.com/education/k-12/virtual-testing-puts-disabled-students-at-a-disadvantage;\nLydia X. Z. Brown, Ridhi Shetty, Matt Scherer, and Andrew Crawford. Ableism And Disability\nDiscrimination In New Surveillance Technologies: How new surveillance technologies in education,\npolicing, health care, and the workplace disproportionately harm disabled people . Center for Democracy\nand Technology Report. May 24, 2022.https://cdt.org/insights/ableism-and-disability-discrimination-in-new-surveillance-technologies-how-new-surveillance-technologies-in-education-policing-health-care-and-the-workplace-disproportionately-harm-disabled-people/\n69",
    '25 MP-2.3-002 Review and document accuracy, representativeness, relevance, suitability of data \nused at different stages of AI life cycle.  Harmful Bias and Homogenization ; \nIntellectual Property  \nMP-2.3-003 Deploy and document fact -checking techniques to verify the accuracy and \nveracity of information generated by GAI systems, especially when the \ninformation comes from multiple (or unknown) sources.  Information Integrity  \nMP-2.3-004 Develop and implement testing techniques to identify GAI produced content (e.g., synthetic media) that might be indistinguishable from human -generated content.  Information Integrity  \nMP-2.3-005 Implement plans for GAI systems to undergo regular adversarial testing to identify \nvulnerabilities and potential manipulation or misuse.  Information Security  \nAI Actor Tasks:  AI Development, Domain Experts, TEVV  \n \nMAP 3.4:  Processes for operator and practitioner proficiency with AI system performance and trustworthiness – and relevant \ntechnical standards and certifications – are defined, assessed, and documented.  \nAction ID  Suggested Action  GAI Risks  \nMP-3.4-001 Evaluate whether GAI operators and end -users can accurately understand \ncontent lineage and origin.  Human -AI Configuration ; \nInformation Integrity  \nMP-3.4-002 Adapt existing training programs to include modules on digital content \ntransparency.  Information Integrity  \nMP-3.4-003 Develop certification programs that test proficiency in managing GAI risks and \ninterpreting content provenance, relevant to specific industry and context.  Information Integrity  \nMP-3.4-004 Delineate human proficiency tests from tests of GAI capabilities.  Human -AI Configuration  \nMP-3.4-005 Implement systems to continually monitor and track the outcomes of human- GAI \nconfigurations for future refinement and improvements . Human -AI Configuration ; \nInformation Integrity  \nMP-3.4-006 Involve the end -users, practitioners, and operators in GAI system in prototyping \nand testing activities. Make sure these tests cover various scenarios , such as crisis \nsituations or ethically sensitive contexts.  Human -AI Configuration ; \nInformation Integrity ; Harmful Bias \nand Homogenization ; Dangerous , \nViolent, or Hateful Content  \nAI Actor Tasks: AI Design, AI Development, Domain Experts, End -Users, Human Factors, Operation and Monitoring',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
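The same two calls also cover simple retrieval. The snippet below is a minimal sketch rather than this card's reference code: the model id is the same placeholder used above, the query and passages are illustrative, and trust_remote_code=True may be needed because the base gte-large-en-v1.5 model ships custom modeling code.

from sentence_transformers import SentenceTransformer

# Illustrative retrieval sketch: rank candidate passages for a single query.
# "sentence_transformers_model_id" is a placeholder for this repository's Hub id.
model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)

query = "Which federal laws are cited as protecting personal information?"
passages = [
    "See, e.g., HIPAA, FDCPA, FERPA, COPPA, and CIPSEA.",
    "OSTP leads interagency science and technology policy coordination efforts.",
]

query_embedding = model.encode(query)        # shape: [1024]
passage_embeddings = model.encode(passages)  # shape: [2, 1024]

# Cosine similarity of the query against each passage; shape: [1, 2]
scores = model.similarity(query_embedding, passage_embeddings)
best = int(scores.argmax())
print(passages[best], float(scores[0, best]))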

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.7188
cosine_accuracy@3 0.9219
cosine_accuracy@5 0.9688
cosine_accuracy@10 1.0
cosine_precision@1 0.7188
cosine_precision@3 0.3073
cosine_precision@5 0.1937
cosine_precision@10 0.1
cosine_recall@1 0.7188
cosine_recall@3 0.9219
cosine_recall@5 0.9688
cosine_recall@10 1.0
cosine_ndcg@10 0.8728
cosine_mrr@10 0.8305
cosine_map@100 0.8305
dot_accuracy@1 0.7344
dot_accuracy@3 0.9219
dot_accuracy@5 0.9688
dot_accuracy@10 1.0
dot_precision@1 0.7344
dot_precision@3 0.3073
dot_precision@5 0.1937
dot_precision@10 0.1
dot_recall@1 0.7344
dot_recall@3 0.9219
dot_recall@5 0.9688
dot_recall@10 1.0
dot_ndcg@10 0.8785
dot_mrr@10 0.8383
dot_map@100 0.8383
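These metric names match what sentence-transformers' InformationRetrievalEvaluator reports, so a comparable evaluation can be sketched as follows. The query/corpus ids, texts, and evaluator name below are illustrative placeholders, not the actual held-out split behind the table above.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)

# Illustrative evaluation data: ids and texts are placeholders.
queries = {"q1": "Which laws are cited as protecting the privacy of personal data?"}
corpus = {
    "d1": "See, e.g., HIPAA, FDCPA, FERPA, COPPA, and CIPSEA.",
    "d2": "OSTP leads interagency science and technology policy coordination efforts.",
}
relevant_docs = {"q1": {"d1"}}  # query id -> set of relevant corpus ids

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="ai_policy_eval",
)
results = evaluator(model)  # dict of accuracy@k, precision@k, recall@k, NDCG@10, MRR@10, MAP@100
print(results)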

Training Details

Training Dataset

Unnamed Dataset

  • Size: 586 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 586 samples:
    sentence_0: type string; tokens: min 20, mean 35.95, max 60
    sentence_1: type string; tokens: min 8, mean 545.8, max 1018
  • Samples:
    sentence_0: What are the primary objectives outlined in the "Blueprint for an AI Bill of Rights" as it pertains to the American people?
    sentence_1: BLUEPRINT FOR AN AI BILL OF RIGHTS MAKING AUTOMATED SYSTEMS WORK FOR THE AMERICAN PEOPLE OCTOBER 2022
    sentence_0: In what ways does the document propose to ensure that automated systems are designed and implemented to benefit society?
    sentence_1: BLUEPRINT FOR AN AI BILL OF RIGHTS MAKING AUTOMATED SYSTEMS WORK FOR THE AMERICAN PEOPLE OCTOBER 2022
    sentence_0: What is the primary purpose of the Blueprint for an AI Bill of Rights as published by the White House Office of Science and Technology Policy in October 2022?
    sentence_1: About this Document
    The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People was
    published by the White House Office of Science and Technology Policy in October 2022. This framework was
    released one year after OSTP announced the launch of a process to develop “a bill of rights for an AI-powered
    world.” Its release follows a year of public engagement to inform this initiative. The framework is available
    online at: https://www.whitehouse.gov/ostp/ai-bill-of-rights
    About the Office of Science and Technology Policy
    The Office of Science and Technology Policy (OSTP) was established by the National Science and Technology
    Policy, Organization, and Priorities Act of 1976 to provide the President and others within the Executive Office
    of the President with advice on the scientific, engineering, and technological aspects of the economy, national
    security, health, foreign relations, the environment, and the technological recovery and use of resources, among
    other topics. OSTP leads interagency science and technology policy coordination efforts, assists the Office of
    Management and Budget (OMB) with an annual review and analysis of Federal research and development in
    budgets, and serves as a source of scientific and technological analysis and judgment for the President with
    respect to major policies, plans, and programs of the Federal Government.
    Legal Disclaimer
    The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People is a white paper
    published by the White House Office of Science and Technology Policy. It is intended to support the
    development of policies and practices that protect civil rights and promote democratic values in the building,
    deployment, and governance of automated systems.
    The Blueprint for an AI Bill of Rights is non-binding and does not constitute U.S. government policy. It
    does not supersede, modify, or direct an interpretation of any existing statute, regulation, policy, or
    international instrument. It does not constitute binding guidance for the public or Federal agencies and
    therefore does not require compliance with the principles described herein. It also is not determinative of what
    the U.S. government’s position will be in any international negotiation. Adoption of these principles may not
    meet the requirements of existing statutes, regulations, policies, or international instruments, or the
    requirements of the Federal agencies that enforce them. These principles are not intended to, and do not,
    prohibit or limit any lawful activity of a government agency, including law enforcement, national security, or
    intelligence activities.
    The appropriate application of the principles set forth in this white paper depends significantly on the
    context in which automated systems are being utilized. In some circumstances, application of these principles
    in whole or in part may not be appropriate given the intended use of automated systems to achieve government
    agency missions. Future sector-specific guidance will likely be necessary and important for guiding the use of
    automated systems in certain settings such as AI systems used as part of school building security or automated
    health diagnostic systems.
    The Blueprint for an AI Bill of Rights recognizes that law enforcement activities require a balancing of
    equities, for example, between the protection of sensitive law enforcement information and the principle of
    notice; as such, notice may not be appropriate, or may need to be adjusted to protect sources, methods, and
    other law enforcement equities. Even in contexts where these principles may not apply in whole or in part,
    federal departments and agencies remain subject to judicial, privacy, and civil liberties oversight as well as
    existing policies and safeguards that govern automated systems, including, for example, Executive Order 13960,
    Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government (December 2020).
    This white paper recognizes that national security (which includes certain law enforcement and
    homeland security activities) and defense activities are of increased sensitivity and interest to our nation’s
    adversaries and are often subject to special requirements, such as those governing classified information and
    other protected data. Such activities require alternative, compatible safeguards through existing policies that
    govern automated systems and AI, such as the Department of Defense (DOD) AI Ethical Principles and
    Responsible AI Implementation Pathway and the Intelligence Community (IC) AI Ethics Principles and
    Framework. The implementation of these policies to national security and defense activities can be informed by
    the Blueprint for an AI Bill of Rights where feasible.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
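The parameters above correspond to constructing the loss with scale=20.0 and cosine similarity. Below is a minimal sketch of building a (sentence_0, sentence_1) pairs dataset and this loss; the two example pairs are abbreviated and illustrative.

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

# Start from the base model; trust_remote_code=True may be needed for the gte architecture.
model = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)

# Question/passage pairs; with this loss, passages from other in-batch pairs act as negatives.
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "What is the primary purpose of the Blueprint for an AI Bill of Rights?",
        "Which federal statutes are cited as protecting personal information?",
    ],
    "sentence_1": [
        "The Blueprint for an AI Bill of Rights was published by OSTP in October 2022.",
        "See, e.g., HIPAA, FDCPA, FERPA, COPPA, and CIPSEA.",
    ],
})

# scale=20.0 and cos_sim match the parameters listed above.
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)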
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 5
  • per_device_eval_batch_size: 5
  • num_train_epochs: 2
  • multi_dataset_batch_sampler: round_robin
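A minimal sketch of passing these non-default values to the training arguments (the output directory is illustrative; everything not listed keeps its default):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # illustrative path
    eval_strategy="steps",
    per_device_train_batch_size=5,
    per_device_eval_batch_size=5,
    num_train_epochs=2,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)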

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 5
  • per_device_eval_batch_size: 5
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step dot_map@100
0.4237 50 0.8383
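The dot_map@100 column is the evaluator's map@100 under dot-product similarity, logged at the evaluation step. The sketch below wires the pieces together so such values are logged during training; it reuses the illustrative model, train_dataset, loss, args, and evaluator names from the sketches above and adds a tiny illustrative eval split.

from datasets import Dataset
from sentence_transformers import SentenceTransformerTrainer

# model, train_dataset, loss, args, and evaluator are defined in the sketches above.
# A tiny held-out split, shaped like the training pairs (illustrative):
eval_dataset = Dataset.from_dict({
    "sentence_0": ["What does the Office of Science and Technology Policy do?"],
    "sentence_1": ["OSTP leads interagency science and technology policy coordination efforts."],
})

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
    evaluator=evaluator,  # logs metrics such as map@100 at each evaluation step
)
trainer.train()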

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}