Netta1994/setfit_baai_newrelic_gpt-4o_improved-cot-instructions_chat_few_shot_remove_final_eval

@@ -74,7 +74,7 @@ model-index:
       split: test
     metrics:
     - type: accuracy
-      value: 0.6567164179104478
       name: Accuracy
 ---
@@ -106,17 +106,17 @@ The model has been trained using an efficient few-shot learning technique that i
 - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
 ### Model Labels
-| Label | Examples                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
-|:------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| 0     | <ul><li>'Reasoning:\nThe provided answer is too general and is not grounded in the specific details or context of the given document, which focuses on study budgets and considerations for wise spending within the organization.\n\nEvaluation:'</li><li>'Reasoning:\nThe answer contains specific information from the document and lists relevant pet peeves. However, there are multiple instances of the phrase "Cassandra Rivera Heather Nelson," which seems out of place. This could cause confusion and detract from the overall clarity.\n\nFinal Result:'</li><li>"Reasoning:\nThe answer does not address the question directly. The provided steps and methods mentioned focus on handling personal documents, expense reimbursements, secure sharing of information, feedback discussion, and requesting a learning budget. The question specifically asks about accessing the company's training resources, and the answer provided does not stay focused on that.\n\nEvaluation:"</li></ul>                                                                                                                                                                                                      |
-| 1     | <ul><li>'Reasoning:\nThe answer accurately addresses the question, providing specific details from the document about how feedback should be given. It includes key points such as timing, focusing on the situation, being clear and direct, showing appreciation, and the intention behind giving feedback, all of which are mentioned in the document.\n\nEvaluation:'</li><li>'Reasoning:\nThe answer correctly identifies several reasons why it is important to share information from high-level meetings, directly supported by the provided document. The explanation is clear, relevant, and to the point, avoiding unnecessary information.\n\nEvaluation:'</li><li>'Reasoning:\nThe answer largely reproduces content from the document accurately, specifying the process for keeping track of kilometers and sending an email or excel document. However, there are inaccuracies in the email addresses and naming inconsistency. For instance, "Dustin Chan" appears in place of "finance@Dustin [email protected]" and "ORGANIZATION_2," which do not exist in the document. Moreover, it includes extraneous, non-verifiable information about the parking card.\n\nEvaluation:'</li></ul> |
 ## Evaluation
 ### Metrics
 | Label   | Accuracy |
 |:--------|:---------|
-| **all** | 0.6567   |
 ## Uses
@@ -171,16 +171,16 @@ Final evaluation:")
 ### Training Set Metrics
 | Training set | Min | Median  | Max |
 |:-------------|:----|:--------|:----|
-| Word count   | 16  | 48.1538 | 125 |
 | Label | Training Sample Count |
 |:------|:----------------------|
-| 0     | 32                    |
-| 1     | 33                    |
 ### Training Hyperparameters
 - batch_size: (16, 16)
-- num_epochs: (2, 2)
 - max_steps: -1
 - sampling_strategy: oversampling
 - num_iterations: 20
@@ -200,21 +200,27 @@ Final evaluation:")
 ### Training Results
 | Epoch  | Step | Training Loss | Validation Loss |
 |:------:|:----:|:-------------:|:---------------:|
-| 0.0061 | 1    | 0.2381        | -               |
-| 0.3067 | 50   | 0.2589        | -               |
-| 0.6135 | 100  | 0.2081        | -               |
-| 0.9202 | 150  | 0.0271        | -               |
-| 1.2270 | 200  | 0.0045        | -               |
-| 1.5337 | 250  | 0.0031        | -               |
-| 1.8405 | 300  | 0.0026        | -               |
 ### Framework Versions
 - Python: 3.10.14
 - SetFit: 1.1.0
-- Sentence Transformers: 3.1.0
 - Transformers: 4.44.0
-- PyTorch: 2.4.1+cu121
-- Datasets: 2.19.2
 - Tokenizers: 0.19.1
 ## Citation

       split: test
     metrics:
     - type: accuracy
+      value: 0.7611940298507462
       name: Accuracy
 ---
 - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
 ### Model Labels
+| Label | Examples                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
+|:------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| 1     | <ul><li>'Reasoning:\nhallucination - The answer introduces information that is not found in the document, which indicates that it is hallucinating.\nEvaluation:'</li><li>'Reasoning:\nThe answer provided is mostly aligned with the content of the document, discussing pulse checking as a rough method to estimate if systolic blood pressure is relatively normal. However, the mention of checking after moderate activity seems slightly misrepresented compared to the source material. The source also provides minor additional context and disclaimers that the answer partially addresses.\n\nFinal Evaluation:'</li><li>"Reasoning:\n- Well-Supported: The answer correctly explains the flexibility in holidays, including the 4-6 weeks off, the requirement for a 2-week consecutive break, and the need for clear communication, which stems from the documents.\n- Specificity: The answer provides specific details about the holiday policy at ORGANIZATION, reflecting what's stated in the document.\n- Conciseness: The answer is clear and to the point, covering all the necessary aspects of the flexible holiday policy without unnecessary details.\n\nEvaluation:"</li></ul> |
+| 0     | <ul><li>'Reasoning:\nirrelevant - The answer provided does not relate to the document or the specific question asked.\nEvaluation:'</li><li>'Reasoning:\nThe given answer sufficiently explains the referral bonus structure, including specific amounts for typical and difficult-to-fill roles, eligibility criteria, and the referral process. It also mentions that certain roles (e.g., hiring managers) are excluded from receiving bonuses.\n\nEvaluation:'</li><li>"Reasoning:\ncontext grounding - The answer is well-supported by the document, although some specific points, such as drinking ice water, weren't explicitly mentioned.\nrelevance - The answer is directly related to the specific question asked.\nconciseness - While the answer is quite detailed, it remains focused and does not deviate into unrelated topics, making it concise enough given the context.\n\nEvaluation:"</li></ul>                                                                                                                                                                                                                                                                                    |
 ## Evaluation
 ### Metrics
 | Label   | Accuracy |
 |:--------|:---------|
+| **all** | 0.7612   |
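
Reviewer note: the two accuracy values in this diff look like simple ratios over a small test set. The sketch below recovers the nearest small fraction for each value. The test-set size is not stated anywhere in this diff, so the denominator it finds is an inference from the floats, not a documented figure; `as_fraction` and `max_denominator=1000` are illustrative choices.

```python
from fractions import Fraction

# Accuracy values copied from the old and new model card metadata.
old_acc = 0.6567164179104478
new_acc = 0.7611940298507462


def as_fraction(acc: float, max_denominator: int = 1000) -> Fraction:
    """Return the closest fraction to `acc` with a small denominator."""
    return Fraction(acc).limit_denominator(max_denominator)


old_frac = as_fraction(old_acc)
new_frac = as_fraction(new_acc)

print(f"old: {old_frac}  new: {new_frac}")
print(f"absolute gain: {new_acc - old_acc:.4f}")
```

If the inferred denominator is right, the change corresponds to a handful of additional correct predictions, so the improvement should be read with the small sample size in mind.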
 ## Uses
 ### Training Set Metrics
 | Training set | Min | Median  | Max |
 |:-------------|:----|:--------|:----|
+| Word count   | 3   | 38.1107 | 148 |
 | Label | Training Sample Count |
 |:------|:----------------------|
+| 0     | 111                   |
+| 1     | 133                   |
 ### Training Hyperparameters
 - batch_size: (16, 16)
+- num_epochs: (1, 1)
 - max_steps: -1
 - sampling_strategy: oversampling
 - num_iterations: 20
 ### Training Results
 | Epoch  | Step | Training Loss | Validation Loss |
 |:------:|:----:|:-------------:|:---------------:|
+| 0.0016 | 1    | 0.2275        | -               |
+| 0.0820 | 50   | 0.2535        | -               |
+| 0.1639 | 100  | 0.2094        | -               |
+| 0.2459 | 150  | 0.1707        | -               |
+| 0.3279 | 200  | 0.1085        | -               |
+| 0.4098 | 250  | 0.0439        | -               |
+| 0.4918 | 300  | 0.0268        | -               |
+| 0.5738 | 350  | 0.0129        | -               |
+| 0.6557 | 400  | 0.0256        | -               |
+| 0.7377 | 450  | 0.016         | -               |
+| 0.8197 | 500  | 0.0128        | -               |
+| 0.9016 | 550  | 0.0109        | -               |
+| 0.9836 | 600  | 0.0127        | -               |
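
Reviewer note: the Epoch column in both Training Results tables can be reproduced from the sample counts and hyperparameters shown in this diff. The sketch below assumes that contrastive pair generation yields `2 * num_iterations` pairs per training sample; that formula is an assumption which happens to match the logged values, not something stated in the diff. `steps_per_epoch` is a hypothetical helper.

```python
import math


def steps_per_epoch(n_samples: int, num_iterations: int, batch_size: int) -> int:
    """Optimizer steps in one epoch, assuming 2 * num_iterations pairs per sample."""
    pairs = 2 * num_iterations * n_samples
    return math.ceil(pairs / batch_size)


# Sample counts from the "Training Sample Count" tables (label 0 + label 1).
old_steps = steps_per_epoch(32 + 33, num_iterations=20, batch_size=16)
new_steps = steps_per_epoch(111 + 133, num_iterations=20, batch_size=16)

# Epoch fraction at a logged step is step / steps_per_epoch.
old_epochs = [round(step / old_steps, 4) for step in (1, 50, 300)]
new_epochs = [round(step / new_steps, 4) for step in (1, 50, 600)]

print(old_steps, old_epochs)
print(new_steps, new_epochs)
```

Under this assumption, the old run (2 epochs, 65 samples) and the new run (1 epoch, 244 samples) differ mainly in how many distinct pairs were seen, which is worth keeping in mind when comparing the two loss curves.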
 ### Framework Versions
 - Python: 3.10.14
 - SetFit: 1.1.0
+- Sentence Transformers: 3.1.1
 - Transformers: 4.44.0
+- PyTorch: 2.4.0+cu121
+- Datasets: 3.0.0
 - Tokenizers: 0.19.1
 ## Citation

config_sentence_transformers.json CHANGED

@@ -1,8 +1,8 @@
 {
   "__version__": {
-    "sentence_transformers": "3.1.0",
     "transformers": "4.44.0",
-    "pytorch": "2.4.1+cu121"
   },
   "prompts": {},
   "default_prompt_name": null,

 {
   "__version__": {
+    "sentence_transformers": "3.1.1",
     "transformers": "4.44.0",
+    "pytorch": "2.4.0+cu121"
   },
   "prompts": {},
   "default_prompt_name": null,

config_setfit.json CHANGED

@@ -1,4 +1,4 @@
 {
-  "normalize_embeddings": false,
-  "labels": null
 }

 {
+  "labels": null,
+  "normalize_embeddings": false
 }
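
Reviewer note: the `config_setfit.json` change above only reorders keys. A minimal check, with the two versions copied from the diff as string literals, confirms that both parse to the same mapping:

```python
import json

# Old and new file contents, copied from the two sides of the diff.
old_text = '{"normalize_embeddings": false, "labels": null}'
new_text = '{"labels": null, "normalize_embeddings": false}'

old_config = json.loads(old_text)
new_config = json.loads(new_text)

# JSON objects are unordered, so dict equality ignores key order.
same_content = old_config == new_config
print("semantically identical:", same_content)
print("new key order:", list(new_config))
```

Any consumer that parses the file as JSON should therefore see no behavioural difference from this hunk.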

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3096f612c377cfada0a3bda45770dab2f061ba16885097bfe9bdf90c6aad4d43
 size 437951328

 version https://git-lfs.github.com/spec/v1
+oid sha256:c49a7a24588072d2710874c643a2db5e20fdce42c20e85928f35589a53668978
 size 437951328

model_head.pkl CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:928b8dea0cb857fed7dcb22cb448b1311f73e551e7a363aa1218c125201f843e
 size 7007

 version https://git-lfs.github.com/spec/v1
+oid sha256:c0c6ddbbda48b1288974be33614b39ef266863f4addbc5cc33f2ab0a8f788b20
 size 7007
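
Reviewer note: for `model.safetensors` and `model_head.pkl` the diff shows only Git LFS pointer files, so the sizes are unchanged while the `oid` hashes differ. The sketch below parses a pointer and checks a downloaded blob against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for illustration, not part of any Git LFS tooling; the pointer text is copied from the `model_head.pkl` hunks.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.strip().split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """True if `data` has the size and SHA-256 digest recorded in the pointer."""
    return (
        pointer["algo"] == "sha256"
        and len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


old_head = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:928b8dea0cb857fed7dcb22cb448b1311f73e551e7a363aa1218c125201f843e
size 7007
""")
new_head = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:c0c6ddbbda48b1288974be33614b39ef266863f4addbc5cc33f2ab0a8f788b20
size 7007
""")

print("same size:", old_head["size"] == new_head["size"])
print("same content:", old_head["digest"] == new_head["digest"])
```

To verify a local download, read the file as bytes and pass it to `matches_pointer` together with the parsed pointer for the revision you expect.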