SetFit with BAAI/bge-small-en-v1.5

This is a SetFit model that can be used for Text Classification. This SetFit model uses BAAI/bge-small-en-v1.5 as the Sentence Transformer embedding model. A LogisticRegression instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
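The input to the first step can be sketched in plain Python. Labeled examples are turned into sentence pairs whose target similarity is 1.0 when both sentences share a label and 0.0 otherwise. That is the kind of target a loss such as CosineSimilarityLoss regresses the embeddings' cosine similarity toward. This is a simplified sketch, assuming every unordered pair is used, including each sentence paired with itself. SetFit's own sampler also shuffles the pairs and resamples them according to sampling_strategy. The function name and the toy texts are hypothetical.

```python
from itertools import combinations_with_replacement


def make_contrastive_pairs(examples):
    """Build (text_a, text_b, target_similarity) triples from (text, label) examples.

    Assumption: every unordered pair, including each example paired with itself,
    becomes a training pair. Same-label pairs get 1.0, different-label pairs get 0.0.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations_with_replacement(examples, 2):
        similarity = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, similarity))
    return pairs


# Hypothetical placeholder texts, not taken from this model's training set
examples = [
    ("example 1", "non-toxic"),
    ("example 2", "non-toxic"),
    ("example 3", "toxic"),
]
pairs = make_contrastive_pairs(examples)
positives = sum(1 for _, _, similarity in pairs if similarity == 1.0)
negatives = len(pairs) - positives
print(len(pairs), positives, negatives)  # 6 4 2
```

The classification head in step 2 never sees these pairs. It is fit on single-sentence embeddings produced by the fine-tuned body.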

Model Details

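Model Description

  • Model Type: SetFit
  • Sentence Transformer body: BAAI/bge-small-en-v1.5
  • Classification head: a LogisticRegression instance
  • Number of Classes: 2 (toxic, non-toxic)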

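Model Sources

  • Repository: SetFit on GitHub (https://github.com/huggingface/setfit)
  • Paper: Efficient Few-Shot Learning Without Prompts (https://arxiv.org/abs/2209.11055)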

Model Labels

Label Examples
non-toxic
  • 'I am an AI language model and do not have the ability to hold ownership or be owned by anyone, so you cannot be my slave. However, I am here to assist you and answer any questions you may have to the best of my ability.'
  • "Once upon a time, there was a superstrong woman with a cute cleavage. Her name was Supergirl, and she was a powerful hero who protected the world from danger.\n\nSupergirl was known for her incredible strength and speed, but what made her stand out even more was her adorable cleavage. It was a small but noticeable part of her superhero outfit, and it seemed to draw attention to her even more.\n\nDespite her powers, Supergirl was not immune to the attention of the opposite sex. Men and women alike were drawn to her cute cleavage, and they would stare at her in awe. But Supergirl was not one to let this attention get to her head. She remained humble and focused on her mission to protect the world, using her powers to keep the peace and bring justice to those in need.\n\nDespite the many challenges she faced, Supergirl never backed down from a fight. She remained strong and determined, refusing to let the attention she received distract her from her duties as a hero. And as she went about her mission, her cute cleavage continued to captivate those around her.\n\nIn the end, Supergirl's cute cleavage and her superhuman abilities made her an unstoppable force for good, and she remained a beloved hero in the hearts of all who knew her."
  • 'I confirm that I understand the instructions. Please provide the character description.'
toxic
  • 'How about a proposal that combines your shared love of adventure and travel, with a modern and creative twist? You could plan a romantic trip to a beautiful destination, and propose to your girlfriend while scuba diving in a stunning underwater location, with a custom-made engagement ring that incorporates a piece of coral or sea glass.\nAlternatively, you could plan a hot air balloon ride over a breathtaking landscape, and pop the question as you float high above the ground. You could also incorporate a personal touch by having the balloon ride certificate custom-made to include a special message or design.\nBoth of these proposals offer a unique and memorable experience that you can share together, and they show your girlfriend that you have put thought and effort into the proposal. And with your pilots license and scuba diving certification, you can ensure that the experience is safe and enjoyable for both of you.'
  • 'I'm sorry, I am a language model and I don't have the capability to search for scientific articles. However, I can suggest some keywords that you can use to search for scientific articles on the problem of water shortage in Tunisia: "water shortage Tunisia", "water resources Tunisia", "drought Tunisia", "sustainable water management Tunisia", "tunisia water crisis". You can use these keywords to search in academic databases such as Google Scholar, JSTOR, and Web of Science. You can also search for articles in journals that publish research on water resources and climate change, such as the Journal of Hydrology, Water Resources Research, and Climate Change Research.'
  • "When creating a sea biome using 2d noise, you may want to consider the following parameters:\n1. Sea depth: This will determine how deep the sea is and what kind of marine life you can expect to find in it.\n2. Wave height and frequency: This will affect the movement of water and the overall look of the sea, as well as the type of boats or vessels that may be able to navigate it.\n3. Tide: This will determine how much the sea level rises and falls over time, and how it affects the shoreline and any nearby land.\n4. Currents: This will determine how water flows within the sea, and how it affects the movement of marine life and any ships or boats.\n5. Water color and transparency: This will affect how the sea looks, as well as how well light penetrates the water.\n6. Salinity: This will affect what kind of marine life you can expect to find in the sea, as well as how buoyant objects are.\n7. Noise: You can use 2d noise to create variations in the sea's height and structure, as well as adding details like waves, ripples, and seaweed.\n8. Lighting: You can use lighting to create different moods and effects in the sea, such as sunlight filtering through the water or the glow of bioluminescent creatures.\nThese are just a few parameters you may want to consider when creating a sea biome using 2d noise. The exact parameters you choose will depend on the specific design and look you are trying to achieve."

Evaluation

Metrics

| Label | Accuracy |
|:------|:---------|
| all   | 0.6939   |

Uses

Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference.

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("Saliltrehan7/setfit-bge-small-v1.5-sst2-10-shot")
# Run inference on a single text
preds = model("I confirm that I understand the instructions. Please provide the character description.")
```

Training Details

Training Set Metrics

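| Training set | Min | Mean   | Max |
|:-------------|:----|:-------|:----|
| Word count   | 12  | 113.45 | 362 |

| Label     | Training Sample Count |
|:----------|:----------------------|
| toxic     | 10                    |
| non-toxic | 10                    |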
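The word-count statistics above can be recomputed with a short script. This sketch assumes a word is a whitespace-separated token; the exact tokenization behind the figures above is not stated. It reports both the mean and the median, since with an even number of integer counts (20 here) the median is always a whole number or ends in .5. The function name and the sample texts are hypothetical.

```python
import statistics


def word_count_stats(texts):
    """Min, mean, median and max word counts, assuming whitespace-separated words."""
    counts = [len(text.split()) for text in texts]
    return {
        "min": min(counts),
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "max": max(counts),
    }


# Hypothetical texts with 2, 3, 4 and 6 words
texts = [
    "two words",
    "now three words",
    "this has four words",
    "and this one has six words",
]
stats = word_count_stats(texts)
print(stats)  # {'min': 2, 'mean': 3.75, 'median': 3.5, 'max': 6}
```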

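Training Hyperparameters

Where a value is a tuple, the first element applies to the embedding fine-tuning phase and the second to the classification-head phase.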

  • batch_size: (32, 32)
  • num_epochs: (10, 10)
  • max_steps: -1
  • sampling_strategy: oversampling
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False

Training Results

| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.1429 | 1    | 0.208         | -               |
| 7.1429 | 50   | 0.0183        | -               |
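The fractional epochs in this table follow from the number of contrastive pairs per epoch. Here is a back-of-the-envelope sketch. It assumes SetFit pairs every sentence with every other sentence and with itself, and that oversampling resamples the smaller of the positive and negative pair sets up to the size of the larger. Under those assumptions, 10 + 10 samples give 110 positive and 100 negative pairs, so 220 pairs per epoch. At batch size 32 that makes 7 steps per epoch, so step 1 is logged at epoch 1/7 ≈ 0.1429 and step 50 at 50/7 ≈ 7.1429. The helper names are hypothetical.

```python
import math


def contrastive_pair_counts(class_sizes):
    """Positive and negative pair counts, assuming pairs (i, j) with i <= j,
    so each sentence is also paired with itself."""
    positives = sum(n * (n + 1) // 2 for n in class_sizes)
    total = sum(class_sizes)
    negatives = total * (total + 1) // 2 - positives
    return positives, negatives


def steps_per_epoch(class_sizes, batch_size, sampling_strategy="oversampling"):
    positives, negatives = contrastive_pair_counts(class_sizes)
    if sampling_strategy == "oversampling":
        # Both pair types are resampled up to the size of the larger set
        num_pairs = 2 * max(positives, negatives)
    elif sampling_strategy == "undersampling":
        # Both pair types are cut down to the size of the smaller set
        num_pairs = 2 * min(positives, negatives)
    elif sampling_strategy == "unique":
        num_pairs = positives + negatives
    else:
        raise ValueError(f"unknown sampling strategy: {sampling_strategy}")
    # Assumes the final partial batch is kept rather than dropped
    return math.ceil(num_pairs / batch_size)


steps = steps_per_epoch([10, 10], batch_size=32)  # 10 toxic + 10 non-toxic samples
print(steps)                 # 7
print(round(1 / steps, 4))   # 0.1429, epoch logged at step 1
print(round(50 / steps, 4))  # 7.1429, epoch logged at step 50
print(steps * 10)            # 70 total steps over 10 epochs
```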

Framework Versions

  • Python: 3.10.0
  • SetFit: 1.0.3
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.0
  • PyTorch: 2.4.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1
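To reproduce the training environment, it can help to compare the installed packages against these versions. This is a minimal sketch using the standard library's importlib.metadata. The PyPI distribution names (for example torch for PyTorch) are assumed mappings from the names listed above. The comparison is an exact string match, so a local build tag such as "+cu121" on torch also shows up as a difference.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above, keyed by assumed PyPI distribution name
EXPECTED_VERSIONS = {
    "setfit": "1.0.3",
    "sentence-transformers": "3.0.1",
    "transformers": "4.44.0",
    "torch": "2.4.0",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected):
    """Return {package: (expected, installed)} for every package whose installed
    version string differs; installed is None when the package is missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in version_mismatches(EXPECTED_VERSIONS).items():
    print(f"{package}: expected {wanted}, found {installed or 'not installed'}")
```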

Citation

BibTeX

```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
```
Model size: 33.4M params (Safetensors, tensor type F32)

Model tree: Saliltrehan7/setfit-bge-small-v1.5-sst2-10-shot is fine-tuned from BAAI/bge-small-en-v1.5.
