yefo-ufpe committed on
Commit
79f635d
1 Parent(s): 0d4b002
Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -9,14 +9,14 @@ tags:
9
  - sft
10
  - generated_from_trainer
11
  model-index:
12
- - name: output
13
  results: []
14
  ---
15
 
16
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
17
  should probably proofread and complete it, then remove this comment. -->
18
 
19
- # output
20
 
21
  This model is a fine-tuned version of [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) on the [SWAG](https://huggingface.co/datasets/allenai/swag) dataset.
22
  It achieves the following results on the evaluation set:
@@ -25,7 +25,6 @@ It achieves the following results on the evaluation set:
25
 
26
  ## Model description
27
 
28
- More information needed
29
 
30
  ## Intended uses & limitations
31
 
@@ -41,6 +40,8 @@ dataset = load_dataset("swag")
41
 
42
  ## Training procedure
43
 
 
 
44
  ### Training hyperparameters
45
 
46
  The following hyperparameters were used during training:
 
9
  - sft
10
  - generated_from_trainer
11
  model-index:
12
+ - name: bert-base-uncased-swag
13
  results: []
14
  ---
15
 
16
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
17
  should probably proofread and complete it, then remove this comment. -->
18
 
19
+ # bert-base-uncased-swag
20
 
21
  This model is a fine-tuned version of [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) on the [SWAG](https://huggingface.co/datasets/allenai/swag) dataset.
22
  It achieves the following results on the evaluation set:
 
25
 
26
  ## Model description
27
 
 
28
 
29
  ## Intended uses & limitations
30
 
 
40
 
41
  ## Training procedure
42
 
43
+ Our approach specifically adapts the query (Wq) and value (Wv) weight matrices of the Transformer attention modules, for parameter efficiency.
44
+
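The commit does not name the adaptation method, so the following is only a minimal sketch under the assumption of a LoRA-style low-rank update: the attention weights stay frozen and a small pair of factors is trained for Wq and Wv only. The dimensions, rank, scale, and every helper name (`matmul`, `make_adapter`, `adapted_weight`, `count`) are hypothetical and chosen for illustration, not taken from the training code.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def make_adapter(d_model, rank, rng):
    """Return hypothetical low-rank factors: A is small random, B starts at zero."""
    a = [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(rank)]
    b = [[0.0] * rank for _ in range(d_model)]
    return a, b


def adapted_weight(w, a, b, scale):
    """Effective weight W + scale * (B @ A); W itself is never modified."""
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def count(matrix):
    """Number of scalar entries in a matrix."""
    return sum(len(row) for row in matrix)


rng = random.Random(0)
d_model, rank = 8, 2  # toy sizes, assumed for illustration only

# Frozen attention projections of one layer (random stand-ins for real weights).
frozen = {
    name: [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(d_model)]
    for name in ("Wq", "Wk", "Wv", "Wo")
}

# Only Wq and Wv receive trainable adapters; Wk and Wo are left untouched.
adapters = {name: make_adapter(d_model, rank, rng) for name in ("Wq", "Wv")}

trainable = sum(count(a) + count(b) for a, b in adapters.values())
total = sum(count(w) for w in frozen.values())
print(f"trainable adapter parameters: {trainable} of {total} frozen attention parameters")
```

Because B starts at zero in this sketch, the adapted weights initially equal the frozen ones, and only the small factors would need to be updated during training.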
45
  ### Training hyperparameters
46
 
47
  The following hyperparameters were used during training: