SetFit with jinaai/jina-embeddings-v2-base-en

This is a SetFit model for text classification. It uses jinaai/jina-embeddings-v2-base-en as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
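
The pair construction behind step 1 can be sketched in plain Python. This is an illustration only, not SetFit's actual sampler: the helper name `make_pairs` and the toy sentences are hypothetical. Sentences that share a label form positive pairs (target similarity 1.0), and sentences with different labels form negative pairs (target similarity 0.0).

```python
from itertools import combinations


def make_pairs(texts, labels):
    """Return (text_a, text_b, target) triples for contrastive fine-tuning.

    Hypothetical helper: target is 1.0 when both texts share a label, else 0.0.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(texts), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((a, b, target))
    return pairs


# Toy data, assumed for illustration only
texts = ["extends a prior proof", "builds on an earlier formula", "disagrees with a claim"]
labels = ["ccro:Extend", "ccro:Extend", "ccro:Negate"]

pairs = make_pairs(texts, labels)
for a, b, target in pairs:
    print(f"{target:.1f}  {a!r} <-> {b!r}")
```

The embedding model is then fine-tuned so that the similarity of each pair moves toward its target, and the classification head in step 2 is fit on the resulting embeddings.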

Model Details

Model Description

Model Sources

Model Labels

Label Examples
ccro:BasedOn
  • 'The axiomatizations presented in Quesada (2010, 2011) also dispense with strong monotonicity.'
ccro:Basedon
  • 'A formal mathematical description of the h-index introduced by Hirsch (2005)'
  • 'Woeginger (2008a, b) and Quesada (2009, 2010) have already suggested characterizations of the Hirsch index'
ccro:Compare
  • 'Instead, a variety of studies [8, 9] have shown that the h index by and large agrees with other objective and subjective measures of scientific quality in a variety of different disciplines (10–15),'
ccro:Contrast
  • 'Hirsch (2005) argues that two individuals with similar Hirsch-index are comparable in terms of their overall scientific impact, even if their total number of papers or their total number of citations is very different.'
  • 'The three differ from Woeginger’s (2008a) characterization in requiring fewer axioms (three instead of five)'
  • 'Marchant (2009), instead of characterizing the index itself, characterizes the ranking that the Hirsch index induces on outputs.'
ccro:Criticize
  • 'The h-index does not take into account that some papers may have extraordinarily many citations, and the g-index tries to compensate for this; see also Egghe (2006b) and Tol (2008).'
  • 'Woeginger (2008a, p. 227) stresses that his axioms should be interpreted within the context of MON.'
ccro:Discuss
  • 'The relation between N and h will depend on the detailed form of the particular distribution (HI0501-01)'
  • 'As discussed by Redner (HI0501-03), most papers earn their citations over a limited period of popularity and then they are no longer cited.'
  • 'It is also possible that papers "drop out" and then later come back into the h count, as would occur for the kind of papers termed "sleeping beauties" (HI0501-04).'
ccro:Extend
  • 'In [3] the analogous formula for the g-index has been proved'
ccro:Incorporate
  • 'In this paper, we provide an axiomatic characterization of the Hirsch-index, in very much the same spirit as Arrow (1950, 1951), May (1952), and Moulin (1988) did for numerous other problems in mathematical decision making.'
ccro:Negate
  • 'Recently, Lehmann et al. (2, 3) have argued that the mean number of citations per paper (nc = Nc/Np) is a superior indicator.'
  • 'If one chose instead to use as indicator of scientific achievement the mean number of citations per paper [following Lehmann et al. (2, 3)], our results suggest that (as in the stock market) ‘‘past performance is not predictive of future performance.’’'
  • 'It has been argued in the literature that one drawback of the h index is that it does not give enough ‘‘credit’’ to very highly cited papers, and various modifications have been proposed to correct this, in particular, Egghe’s g index (4), Jin et al.’s AR index (5), and Komulski’s H(2) index (6).'

Evaluation

Metrics

Label    Accuracy
all      0.6667
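
Accuracy here is the fraction of evaluation examples whose predicted label matches the reference label. The snippet below is a minimal sketch of that computation using made-up labels; the size and contents of the actual evaluation set are not stated in this card, and the function name `accuracy` is illustrative.

```python
def accuracy(predictions, references):
    """Fraction of positions where the prediction equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical labels, chosen only to illustrate the computation
references = ["ccro:Discuss", "ccro:Compare", "ccro:Negate"]
predictions = ["ccro:Discuss", "ccro:Compare", "ccro:Discuss"]

score = accuracy(predictions, references)
print(round(score, 4))
```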

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("Corran/CCRO2")
# Run inference on a single sentence and print the prediction
preds = model("One of the referees recommends mentioning Quesada (2008) as another characterization of the Hirsch index relying as well on monotonicity.")
print(preds)

Training Details

Training Set Metrics

Training set    Min    Median     Max
Word count      6      25.7812    53

Label               Training Sample Count
ccro:BasedOn        1
ccro:Basedon        11
ccro:Compare        21
ccro:Contrast       3
ccro:Criticize      4
ccro:Discuss        37
ccro:Extend         1
ccro:Incorporate    14
ccro:Negate         4
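
The per-label counts above are heavily skewed. The short script below, a sketch that only restates the numbers already listed, totals the training samples and prints each label's share.

```python
# Training sample counts copied from the listing above
counts = {
    "ccro:BasedOn": 1,
    "ccro:Basedon": 11,
    "ccro:Compare": 21,
    "ccro:Contrast": 3,
    "ccro:Criticize": 4,
    "ccro:Discuss": 37,
    "ccro:Extend": 1,
    "ccro:Incorporate": 14,
    "ccro:Negate": 4,
}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Print labels from most to least frequent
for label, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<18}{n:>3}  {shares[label]:6.1%}")
print("total", total)
```

With 96 samples in total, ccro:Discuss alone accounts for more than a third of the training set, while ccro:BasedOn and ccro:Extend have a single example each.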

Training Hyperparameters

  • batch_size: (32, 32)
  • num_epochs: (1, 1)
  • max_steps: -1
  • sampling_strategy: oversampling
  • num_iterations: 100
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False
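
The `loss: CosineSimilarityLoss` setting refers to the Sentence Transformers loss that compares the cosine similarity of two sentence embeddings with a target score using squared error. The sketch below reproduces that computation in plain Python for a single pair. The two-dimensional vectors are made up, and the function names are illustrative rather than library APIs.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(u, v, target):
    """Squared error between cos(u, v) and the target similarity."""
    return (cosine_similarity(u, v) - target) ** 2


# Made-up embeddings for illustration
u = [1.0, 0.0]
v = [0.0, 1.0]

loss_pos = cosine_similarity_loss(u, u, 1.0)  # identical vectors, positive pair
loss_neg = cosine_similarity_loss(u, v, 0.0)  # orthogonal vectors, negative pair
loss_bad = cosine_similarity_loss(u, v, 1.0)  # orthogonal vectors labeled as positive
print(loss_pos, loss_neg, loss_bad)
```

Pairs whose similarity already matches the target contribute no loss, while a mismatched pair is penalized by the squared gap.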

Training Results

Epoch     Step    Training Loss    Validation Loss
0.0017    1       0.311            -
0.0833    50      0.1338           -
0.1667    100     0.0054           -
0.25      150     0.0017           -
0.3333    200     0.0065           -
0.4167    250     0.0003           -
0.5       300     0.0003           -
0.5833    350     0.0005           -
0.6667    400     0.0004           -
0.75      450     0.0002           -
0.8333    500     0.0002           -
0.9167    550     0.0002           -
1.0       600     0.0002           -
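
The 600 steps per epoch are consistent with the hyperparameters and sample counts reported in this card, under the assumption that each of the `num_iterations` passes produces one positive and one negative pair per training sample. That pairing rule is an assumption about the sampler, not something this card states; the arithmetic below is a sanity check only.

```python
import math

num_samples = 96      # sum of the per-label training sample counts
num_iterations = 100  # from the hyperparameters
batch_size = 32       # embedding-phase batch size, first element of (32, 32)

# Assumption: one positive and one negative pair per sample per iteration
num_pairs = 2 * num_iterations * num_samples
steps_per_epoch = math.ceil(num_pairs / batch_size)

# Epoch fraction reached at step 50, for comparison with the table
epoch_at_step_50 = round(50 / steps_per_epoch, 4)
print(num_pairs, steps_per_epoch, epoch_at_step_50)
```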

Framework Versions

  • Python: 3.10.12
  • SetFit: 1.0.3
  • Sentence Transformers: 2.2.2
  • Transformers: 4.35.2
  • PyTorch: 2.1.0+cu121
  • Datasets: 2.16.1
  • Tokenizers: 0.15.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}

Model size: 137M params (Safetensors, tensor type F32)