gte-small-mitre / README.md
metadata
base_model: thenlper/gte-small
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:29440
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      Olympic Destroyer uses PsExec to interact with the ADMIN$ network share to
      execute commands on remote systems.
    sentences:
      - >-
        Adversaries may target user email to collect sensitive information.
        Emails may contain sensitive data, including trade secrets or personal
        information, that can prove valuable to adversaries. Adversaries can
        collect or forward email from mail servers or clients. 
      - >-
        Adversaries can hide a program's true filetype by changing the extension
        of a file. With certain file types (specifically this does not work with
        .app extensions), appending a space to the end of a filename will change
        how the file is processed by the operating system.For example, if there
        is a Mach-O executable file called <code>evil.bin</code>, when it is
        double clicked by a user, it will launch Terminal.app and execute. If
        this file is renamed to <code>evil.txt</code>, then when double clicked
        by a user, it will launch with the default text editing application (not
        executing the binary). However, if the file is renamed to <code>evil.txt
        </code> (note the space at the end), then when double clicked by a user,
        the true file type is determined by the OS and handled appropriately and
        the binary will be executed (Citation: Mac Backdoors are
        back).Adversaries can use this feature to trick users into double
        clicking benign-looking files of any format and ultimately executing
        something malicious.
      - >-
        Adversaries may use [Valid
        Accounts](https://attack.mitre.org/techniques/T1078) to log into a
        service that accepts remote connections, such as telnet, SSH, and VNC.
        The adversary may then perform actions as the logged-on user.In an
        enterprise environment, servers and workstations can be organized into
        domains. Domains provide centralized identity management, allowing users
        to login using one set of credentials across the entire network. If an
        adversary is able to obtain a set of valid domain credentials, they
        could login to many different machines using remote access protocols
        such as secure shell (SSH) or remote desktop protocol (RDP).(Citation:
        SSH Secure Shell)(Citation: TechNet Remote Desktop Services) They could
        also login to accessible SaaS or IaaS services, such as those that
        federate their identities to the domain. Legitimate applications (such
        as [Software Deployment
        Tools](https://attack.mitre.org/techniques/T1072) and other
        administrative programs) may utilize [Remote
        Services](https://attack.mitre.org/techniques/T1021) to access remote
        hosts. For example, Apple Remote Desktop (ARD) on macOS is native
        software used for remote management. ARD leverages a blend of protocols,
        including [VNC](https://attack.mitre.org/techniques/T1021/005) to send
        the screen and control buffers and
        [SSH](https://attack.mitre.org/techniques/T1021/004) for secure file
        transfer.(Citation: Remote Management MDM macOS)(Citation: Kickstart
        Apple Remote Desktop commands)(Citation: Apple Remote Desktop Admin
        Guide 3.3) Adversaries can abuse applications such as ARD to gain remote
        code execution and perform lateral movement. In versions of macOS prior
        to 10.14, an adversary can escalate an SSH session to an ARD session
        which enables an adversary to accept TCC (Transparency, Consent, and
        Control) prompts without user interaction and gain access to
        data.(Citation: FireEye 2019 Apple Remote Desktop)(Citation: Lockboxx
        ARD 2019)(Citation: Kickstart Apple Remote Desktop commands)
  - source_sentence: >-
      Network intrusion prevention systems and systems designed to scan and
      remove malicious email attachments or links can be used to block activity.
    sentences:
      - >-
        Adversaries may abuse task scheduling functionality to facilitate
        initial or recurring execution of malicious code. Utilities exist within
        all major operating systems to schedule programs or scripts to be
        executed at a specified date and time. A task can also be scheduled on a
        remote system, provided the proper authentication is met (ex: RPC and
        file and printer sharing in Windows environments). Scheduling a task on
        a remote system typically may require being a member of an admin or
        otherwise privileged group on the remote system.(Citation: TechNet Task
        Scheduler Security)Adversaries may use task scheduling to execute
        programs at system startup or on a scheduled basis for persistence.
        These mechanisms can also be abused to run a process under the context
        of a specified account (such as one with elevated
        permissions/privileges). Similar to [System Binary Proxy
        Execution](https://attack.mitre.org/techniques/T1218), adversaries have
        also abused task scheduling to potentially mask one-time execution under
        a trusted system process.(Citation: ProofPoint Serpent)
      - >-
        Adversaries may attempt to make an executable or file difficult to
        discover or analyze by encrypting, encoding, or otherwise obfuscating
        its contents on the system or in transit. This is common behavior that
        can be used across different platforms and the network to evade
        defenses. Payloads may be compressed, archived, or encrypted in order to
        avoid detection. These payloads may be used during Initial Access or
        later to mitigate detection. Sometimes a user's action may be required
        to open and [Deobfuscate/Decode Files or
        Information](https://attack.mitre.org/techniques/T1140) for [User
        Execution](https://attack.mitre.org/techniques/T1204). The user may also
        be required to input a password to open a password protected
        compressed/encrypted file that was provided by the adversary. (Citation:
        Volexity PowerDuke November 2016) Adversaries may also use compressed or
        archived scripts, such as JavaScript. Portions of files can also be
        encoded to hide the plain-text strings that would otherwise help
        defenders with discovery. (Citation: Linux/Cdorked.A We Live Security
        Analysis) Payloads may also be split into separate, seemingly benign
        files that only reveal malicious functionality when reassembled.
        (Citation: Carbon Black Obfuscation Sept 2016)Adversaries may also abuse
        [Command Obfuscation](https://attack.mitre.org/techniques/T1027/010) to
        obscure commands executed from payloads or directly via [Command and
        Scripting Interpreter](https://attack.mitre.org/techniques/T1059).
        Environment variables, aliases, characters, and other platform/language
        specific semantics can be used to evade signature based detections and
        application control mechanisms. (Citation: FireEye Obfuscation June
        2017) (Citation: FireEye Revoke-Obfuscation July 2017)(Citation:
        PaloAlto EncodedCommand March 2017) 
      - >-
        Adversaries may send phishing messages to gain access to victim systems.
        All forms of phishing are electronically delivered social engineering.
        Phishing can be targeted, known as spearphishing. In spearphishing, a
        specific individual, company, or industry will be targeted by the
        adversary. More generally, adversaries can conduct non-targeted
        phishing, such as in mass malware spam campaigns.Adversaries may send
        victims emails containing malicious attachments or links, typically to
        execute malicious code on victim systems. Phishing may also be conducted
        via third-party services, like social media platforms. Phishing may also
        involve social engineering techniques, such as posing as a trusted
        source, as well as evasive techniques such as removing or manipulating
        emails or metadata/headers from compromised accounts being abused to
        send messages (e.g., [Email Hiding
        Rules](https://attack.mitre.org/techniques/T1564/008)).(Citation:
        Microsoft OAuth Spam 2022)(Citation: Palo Alto Unit 42 VBA Infostealer
        2014) Another way to accomplish this is by forging or spoofing(Citation:
        Proofpoint-spoof) the identity of the sender which can be used to fool
        both the human recipient as well as automated security tools.(Citation:
        cyberproof-double-bounce) Victims may also receive phishing messages
        that instruct them to call a phone number where they are directed to
        visit a malicious URL, download malware,(Citation: sygnia Luna
        Month)(Citation: CISA Remote Monitoring and Management Software) or
        install adversary-accessible remote management tools onto their computer
        (i.e., [User
        Execution](https://attack.mitre.org/techniques/T1204)).(Citation: Unit42
        Luna Moth)
  - source_sentence: MoonWind obtains the number of removable drives from the victim.
    sentences:
      - >-
        Adversaries may attempt to gather information about attached peripheral
        devices and components connected to a computer system.(Citation:
        Peripheral Discovery Linux)(Citation: Peripheral Discovery macOS)
        Peripheral devices could include auxiliary resources that support a
        variety of functionalities such as keyboards, printers, cameras, smart
        card readers, or removable storage. The information may be used to
        enhance their awareness of the system and network environment or may be
        used for further actions.
      - >-
        Adversaries can steal application access tokens as a means of acquiring
        credentials to access remote systems and resources.Application access
        tokens are used to make authorized API requests on behalf of a user or
        service and are commonly used as a way to access resources in cloud and
        container-based applications and software-as-a-service (SaaS).(Citation:
        Auth0 - Why You Should Always Use Access Tokens to Secure APIs Sept
        2019) OAuth is one commonly implemented framework that issues tokens to
        users for access to systems. Adversaries who steal account API tokens in
        cloud and containerized environments may be able to access data and
        perform actions with the permissions of these accounts, which can lead
        to privilege escalation and further compromise of the environment.In
        Kubernetes environments, processes running inside a container
        communicate with the Kubernetes API server using service account tokens.
        If a container is compromised, an attacker may be able to steal the
        container’s token and thereby gain access to Kubernetes API
        commands.(Citation: Kubernetes Service Accounts)Token theft can also
        occur through social engineering, in which case user action may be
        required to grant access. An application desiring access to cloud-based
        services or protected APIs can gain entry using OAuth 2.0 through a
        variety of authorization protocols. An example commonly-used sequence is
        Microsoft's Authorization Code Grant flow.(Citation: Microsoft Identity
        Platform Protocols May 2019)(Citation: Microsoft - OAuth Code
        Authorization flow - June 2019) An OAuth access token enables a
        third-party application to interact with resources containing user data
        in the ways requested by the application without obtaining user
        credentials.  Adversaries can leverage OAuth authorization by
        constructing a malicious application designed to be granted access to
        resources with the target user's OAuth token.(Citation: Amnesty OAuth
        Phishing Attacks, August 2019)(Citation: Trend Micro Pawn Storm OAuth
        2017) The adversary will need to complete registration of their
        application with the authorization server, for example Microsoft
        Identity Platform using Azure Portal, the Visual Studio IDE, the
        command-line interface, PowerShell, or REST API calls.(Citation:
        Microsoft - Azure AD App Registration - May 2019) Then, they can send a
        [Spearphishing Link](https://attack.mitre.org/techniques/T1566/002) to
        the target user to entice them to grant access to the application. Once
        the OAuth access token is granted, the application can gain potentially
        long-term access to features of the user account through [Application
        Access Token](https://attack.mitre.org/techniques/T1550/001).(Citation:
        Microsoft - Azure AD Identity Tokens - Aug 2019)Application access
        tokens may function within a limited lifetime, limiting how long an
        adversary can utilize the stolen token. However, in some cases,
        adversaries can also steal application refresh tokens(Citation: Auth0
        Understanding Refresh Tokens), allowing them to obtain new access tokens
        without prompting the user.  
      - >-
        Adversaries may modify component firmware to persist on systems. Some
        adversaries may employ sophisticated means to compromise computer
        components and install malicious firmware that will execute adversary
        code outside of the operating system and main system firmware or BIOS.
        This technique may be similar to [System
        Firmware](https://attack.mitre.org/techniques/T1542/001) but conducted
        upon other system components/devices that may not have the same
        capability or level of integrity checking.Malicious component firmware
        could provide both a persistent level of access to systems despite
        potential typical failures to maintain access and hard disk re-images,
        as well as a way to evade host software-based defenses and integrity
        checks.
  - source_sentence: InvisiMole can launch a remote shell to execute commands.
    sentences:
      - >-
        Adversaries may abuse the Windows command shell for execution. The
        Windows command shell ([cmd](https://attack.mitre.org/software/S0106))
        is the primary command prompt on Windows systems. The Windows command
        prompt can be used to control almost any aspect of a system, with
        various permission levels required for different subsets of commands.
        The command prompt can be invoked remotely via [Remote
        Services](https://attack.mitre.org/techniques/T1021) such as
        [SSH](https://attack.mitre.org/techniques/T1021/004).(Citation: SSH in
        Windows)Batch files (ex: .bat or .cmd) also provide the shell with a
        list of sequential commands to run, as well as normal scripting
        operations such as conditionals and loops. Common uses of batch files
        include long or repetitive tasks, or the need to run the same set of
        commands on multiple systems.Adversaries may leverage
        [cmd](https://attack.mitre.org/software/S0106) to execute various
        commands and payloads. Common uses include
        [cmd](https://attack.mitre.org/software/S0106) to execute a single
        command, or abusing [cmd](https://attack.mitre.org/software/S0106)
        interactively with input and output forwarded over a command and control
        channel.
      - >-
        Adversaries may abuse command and script interpreters to execute
        commands, scripts, or binaries. These interfaces and languages provide
        ways of interacting with computer systems and are a common feature
        across many different platforms. Most systems come with some built-in
        command-line interface and scripting capabilities, for example, macOS
        and Linux distributions include some flavor of [Unix
        Shell](https://attack.mitre.org/techniques/T1059/004) while Windows
        installations include the [Windows Command
        Shell](https://attack.mitre.org/techniques/T1059/003) and
        [PowerShell](https://attack.mitre.org/techniques/T1059/001).There are
        also cross-platform interpreters such as
        [Python](https://attack.mitre.org/techniques/T1059/006), as well as
        those commonly associated with client applications such as
        [JavaScript](https://attack.mitre.org/techniques/T1059/007) and [Visual
        Basic](https://attack.mitre.org/techniques/T1059/005).Adversaries may
        abuse these technologies in various ways as a means of executing
        arbitrary commands. Commands and scripts can be embedded in [Initial
        Access](https://attack.mitre.org/tactics/TA0001) payloads delivered to
        victims as lure documents or as secondary payloads downloaded from an
        existing C2. Adversaries may also execute commands through interactive
        terminals/shells, as well as utilize various [Remote
        Services](https://attack.mitre.org/techniques/T1021) in order to achieve
        remote Execution.(Citation: Powershell Remote Commands)(Citation: Cisco
        IOS Software Integrity Assurance - Command History)(Citation: Remote
        Shell Execution in Python)
      - >-
        Adversaries may communicate using application layer protocols associated
        with electronic mail delivery to avoid detection/network filtering by
        blending in with existing traffic. Commands to the remote system, and
        often the results of those commands, will be embedded within the
        protocol traffic between the client and server. Protocols such as
        SMTP/S, POP3/S, and IMAP that carry electronic mail may be very common
        in environments.  Packets produced from these protocols may have many
        fields and headers in which data can be concealed. Data could also be
        concealed within the email messages themselves. An adversary may abuse
        these protocols to communicate with systems under their control within a
        victim network while also mimicking normal, expected traffic. 
  - source_sentence: >-
      BackdoorDiplomacy has dropped legitimate software onto a compromised host
      and used it to execute malicious DLLs.
    sentences:
      - >-
        Adversaries may transfer tools or other files from an external system
        into a compromised environment. Tools or files may be copied from an
        external adversary-controlled system to the victim network through the
        command and control channel or through alternate protocols such as
        [ftp](https://attack.mitre.org/software/S0095). Once present,
        adversaries may also transfer/spread tools between victim devices within
        a compromised environment (i.e. [Lateral Tool
        Transfer](https://attack.mitre.org/techniques/T1570)). On Windows,
        adversaries may use various utilities to download tools, such as `copy`,
        `finger`, [certutil](https://attack.mitre.org/software/S0160), and
        [PowerShell](https://attack.mitre.org/techniques/T1059/001) commands
        such as <code>IEX(New-Object Net.WebClient).downloadString()</code> and
        <code>Invoke-WebRequest</code>. On Linux and macOS systems, a variety of
        utilities also exist, such as `curl`, `scp`, `sftp`, `tftp`, `rsync`,
        `finger`, and `wget`.(Citation: t1105_lolbas)Adversaries may also abuse
        installers and package managers, such as `yum` or `winget`, to download
        tools to victim hosts.Files can also be transferred using various [Web
        Service](https://attack.mitre.org/techniques/T1102)s as well as native
        or otherwise present tools on the victim system.(Citation: PTSecurity
        Cobalt Dec 2016) In some cases, adversaries may be able to leverage
        services that sync between a web-based and an on-premises client, such
        as Dropbox or OneDrive, to transfer files onto victim systems. For
        example, by compromising a cloud account and logging into the service's
        web portal, an adversary may be able to trigger an automatic syncing
        process that transfers the file onto the victim's machine.(Citation:
        Dropbox Malware Sync)
      - >-
        Adversaries may communicate using application layer protocols associated
        with web traffic to avoid detection/network filtering by blending in
        with existing traffic. Commands to the remote system, and often the
        results of those commands, will be embedded within the protocol traffic
        between the client and server. Protocols such as HTTP/S(Citation:
        CrowdStrike Putter Panda) and WebSocket(Citation: Brazking-Websockets)
        that carry web traffic may be very common in environments. HTTP/S
        packets have many fields and headers in which data can be concealed. An
        adversary may abuse these protocols to communicate with systems under
        their control within a victim network while also mimicking normal,
        expected traffic. 
      - >-
        Adversaries may inject code into processes in order to evade
        process-based defenses as well as possibly elevate privileges. Process
        injection is a method of executing arbitrary code in the address space
        of a separate live process. Running code in the context of another
        process may allow access to the process's memory, system/network
        resources, and possibly elevated privileges. Execution via process
        injection may also evade detection from security products since the
        execution is masked under a legitimate process. There are many different
        ways to inject code into a process, many of which abuse legitimate
        functionalities. These implementations exist for every major OS but are
        typically platform specific. More sophisticated samples may perform
        multiple process injections to segment modules and further evade
        detection, utilizing named pipes or other inter-process communication
        (IPC) mechanisms as a communication channel. 

SentenceTransformer based on thenlper/gte-small

This is a sentence-transformers model finetuned from thenlper/gte-small. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: thenlper/gte-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
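
The architecture above chains a Transformer encoder, mean pooling over tokens (`pooling_mode_mean_tokens: True`), and L2 normalization. The following is a minimal pure-Python sketch of the last two stages, not the library's implementation. The helper names `mean_pool` and `l2_normalize` are hypothetical, and the toy vectors have 4 dimensions where the real model produces 384.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # avoid division by zero on an all-padding input
    return [value / count for value in summed]

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit length, mirroring what a Normalize() stage does."""
    norm = math.sqrt(sum(value * value for value in vector))
    return [value / max(norm, eps) for value in vector]

# Toy input: three token vectors; the last one is padding and is ignored.
tokens = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
```

Because the final vectors have unit length, the cosine similarity between two embeddings equals their dot product.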

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("acedev003/gte-small-mitre")
# Run inference
sentences = [
    'BackdoorDiplomacy has dropped legitimate software onto a compromised host and used it to execute malicious DLLs.',
    "Adversaries may inject code into processes in order to evade process-based defenses as well as possibly elevate privileges. Process injection is a method of executing arbitrary code in the address space of a separate live process. Running code in the context of another process may allow access to the process's memory, system/network resources, and possibly elevated privileges. Execution via process injection may also evade detection from security products since the execution is masked under a legitimate process. There are many different ways to inject code into a process, many of which abuse legitimate functionalities. These implementations exist for every major OS but are typically platform specific. More sophisticated samples may perform multiple process injections to segment modules and further evade detection, utilizing named pipes or other inter-process communication (IPC) mechanisms as a communication channel. ",
    'Adversaries may communicate using application layer protocols associated with web traffic to avoid detection/network filtering by blending in with existing traffic. Commands to the remote system, and often the results of those commands, will be embedded within the protocol traffic between the client and server. Protocols such as HTTP/S(Citation: CrowdStrike Putter Panda) and WebSocket(Citation: Brazking-Websockets) that carry web traffic may be very common in environments. HTTP/S packets have many fields and headers in which data can be concealed. An adversary may abuse these protocols to communicate with systems under their control within a victim network while also mimicking normal, expected traffic. ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
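
For semantic search, a common pattern is to rank candidate technique descriptions against a single query sentence instead of computing the full pairwise matrix. The sketch below shows that ranking step in pure Python. The toy 3-dimensional vectors stand in for rows of `model.encode(...)` output, and `cosine`, `rank_by_similarity`, and the labels are hypothetical names used only for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, candidate_vecs, labels):
    """Return (score, label) pairs sorted from most to least similar."""
    scored = [(cosine(query_vec, vec), label)
              for vec, label in zip(candidate_vecs, labels)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Toy stand-ins for embeddings; real vectors would have 384 dimensions.
query = [0.9, 0.1, 0.0]
candidates = [[0.1, 0.9, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
labels = ["technique A", "technique B", "technique C"]  # hypothetical labels

ranking = rank_by_similarity(query, candidates, labels)
best_score, best_label = ranking[0]
```

With the real model, `embeddings[0]` could serve as the query and `embeddings[1:]` as the candidates.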

Training Details

Training Dataset

Unnamed Dataset

  • Size: 29,440 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    |         | sentence_0 | sentence_1 |
    |---------|------------|------------|
    | type    | string     | string     |
    | details | min: 4 tokens, mean: 25.63 tokens, max: 101 tokens | min: 37 tokens, mean: 283.48 tokens, max: 512 tokens |
  • Samples:
    | sentence_0 | sentence_1 |
    |------------|------------|
    | Adversaries may bridge network boundaries by modifying a network device’s Network Address Translation (NAT) configuration. | Adversaries may bridge network boundaries by modifying a network device’s Network Address Translation (NAT) configuration. Malicious modifications to NAT may enable an adversary to bypass restrictions on traffic routing that otherwise separate trusted and untrusted networks.Network devices such as routers and firewalls that connect multiple networks together may implement NAT during the process of passing packets between networks. When performing NAT, the network device will rewrite the source and/or destination addresses of the IP address header. Some network designs require NAT for the packets to cross the border device. A typical example of this is environments where internal networks make use of non-Internet routable addresses.(Citation: RFC1918)When an adversary gains control of a network boundary device, they can either leverage existing NAT configurations to send traffic between two separated networks, or they can implement NAT configurations of their own design. In the case of network designs that require NAT to function, this enables the adversary to overcome inherent routing limitations that would normally prevent them from accessing protected systems behind the border device. In the case of network designs that do not require NAT, address translation can be used by adversaries to obscure their activities, as changing the addresses of packets that traverse a network boundary device can make monitoring data transmissions more challenging for defenders. Adversaries may use Patch System Image to change the operating system of a network device, implementing their own custom NAT mechanisms to further obscure their activities |
    | When documents, applications, or programs are downloaded an extended attribute (xattr) called com.apple.quarantine can be set on the file by the application performing the download. | Adversaries may undermine security controls that will either warn users of untrusted activity or prevent execution of untrusted programs. Operating systems and security products may contain mechanisms to identify programs or websites as possessing some level of trust. Examples of such features would include a program being allowed to run because it is signed by a valid code signing certificate, a program prompting the user with a warning because it has an attribute set from being downloaded from the Internet, or getting an indication that you are about to connect to an untrusted site.Adversaries may attempt to subvert these trust mechanisms. The method adversaries use will depend on the specific mechanism they seek to subvert. Adversaries may conduct File and Directory Permissions Modification or Modify Registry in support of subverting these controls.(Citation: SpectorOps Subverting Trust Sept 2017) Adversaries may also create or steal code signing certificates to acquire trust on target systems.(Citation: Securelist Digital Certificates)(Citation: Symantec Digital Certificates) |
    | FIN8 has used a Batch file to automate frequently executed post compromise cleanup activities. | Adversaries may abuse the Windows command shell for execution. The Windows command shell (cmd) is the primary command prompt on Windows systems. The Windows command prompt can be used to control almost any aspect of a system, with various permission levels required for different subsets of commands. The command prompt can be invoked remotely via Remote Services such as SSH.(Citation: SSH in Windows)Batch files (ex: .bat or .cmd) also provide the shell with a list of sequential commands to run, as well as normal scripting operations such as conditionals and loops. Common uses of batch files include long or repetitive tasks, or the need to run the same set of commands on multiple systems.Adversaries may leverage cmd to execute various commands and payloads. Common uses include cmd to execute a single command, or abusing cmd interactively with input and output forwarded over a command and control channel. |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
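
To make the loss parameters concrete, here is a minimal pure-Python sketch of a multiple-negatives ranking objective with `scale` 20.0 and cosine similarity. It is an illustration under stated assumptions, not the library's implementation: each `sentence_0` is treated as an anchor, its paired `sentence_1` as the positive, and every other `sentence_1` in the batch as a negative. The function names `cos_sim` and `mnr_loss` and the 2-dimensional toy vectors are hypothetical.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and all other positives in the batch serve as in-batch negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, pos) for pos in positives]
        # Numerically stable log-sum-exp over the batch
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
# Positives that line up with their anchors give a loss near zero ...
aligned = mnr_loss(anchors, [[1.0, 0.1], [0.1, 1.0]])
# ... while swapping them gives a large loss.
swapped = mnr_loss(anchors, [[0.1, 1.0], [1.0, 0.1]])
```

A larger `scale` sharpens the softmax, so mismatched pairs are penalized more strongly. With a batch size of 16, each anchor would see 15 in-batch negatives under this formulation.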
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
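
The combination of `learning_rate: 5e-05`, `lr_scheduler_type: linear`, and `warmup_steps: 0` implies a learning rate that decays linearly to zero over training. The sketch below illustrates that shape in pure Python. It assumes 1,840 total optimizer steps (29,440 samples at batch size 16 on a single device for one epoch); the function name `linear_lr` is hypothetical and this is not the trainer's own scheduler code.

```python
def linear_lr(step, base_lr=5e-05, total_steps=1840, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at a few points in training (total_steps is an assumption).
schedule = {step: linear_lr(step) for step in (0, 500, 1000, 1500, 1840)}
```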

Training Logs

Epoch Step Training Loss
0.2717 500 0.8973
0.5435 1000 0.5649
0.8152 1500 0.4969
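
The Epoch column can be reproduced from the dataset size and batch size. The sketch below assumes a single training device, which the card does not state; the other values come from the sections above (29,440 samples, `per_device_train_batch_size: 16`, `gradient_accumulation_steps: 1`).

```python
import math

train_samples = 29_440
batch_size = 16
grad_accum = 1
devices = 1  # assumption: the card does not state the device count

# Optimizer steps needed to see every training sample once
steps_per_epoch = math.ceil(train_samples / (batch_size * grad_accum * devices))

def epoch_at(step):
    """Fraction of an epoch completed after the given number of steps."""
    return round(step / steps_per_epoch, 4)

logged = {step: epoch_at(step) for step in (500, 1000, 1500)}
```

Under these assumptions one epoch is 1,840 steps, so a log interval of 500 steps yields only three entries before training ends.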

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.0
  • PyTorch: 2.4.0
  • Accelerate: 0.33.0
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}