raygx/distilGPT-NepSA

@@ -1,5 +1,5 @@
 ---
-base_model: raygx/Nepali-DistilGPT2
 tags:
 - generated_from_keras_callback
 model-index:
@@ -12,11 +12,11 @@ probably proofread and complete it, then remove this comment. -->
 # distilGPT-NepSA
-This model is a fine-tuned version of [raygx/Nepali-DistilGPT2](https://huggingface.co/raygx/Nepali-DistilGPT2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Train Loss: 0.7238
-- Validation Loss: 0.7132
-- Epoch: 5
 ## Model description
@@ -35,24 +35,20 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-06, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.005}
 - training_precision: float32
 ### Training results
 | Train Loss | Validation Loss | Epoch |
 |:----------:|:---------------:|:-----:|
-| 1.0605     | 0.8926          | 0     |
-| 0.8693     | 0.8015          | 1     |
-| 0.8041     | 0.7605          | 2     |
-| 0.7711     | 0.7366          | 3     |
-| 0.7469     | 0.7236          | 4     |
-| 0.7238     | 0.7132          | 5     |
 ### Framework versions
-- Transformers 4.31.0
-- TensorFlow 2.12.0
-- Datasets 2.14.4
 - Tokenizers 0.13.3

 ---
+license: apache-2.0
 tags:
 - generated_from_keras_callback
 model-index:
 # distilGPT-NepSA
+This model is a fine-tuned version of [raygx/distilGPT-Nepali](https://huggingface.co/raygx/distilGPT-Nepali) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Train Loss: 0.6596
+- Validation Loss: 0.6809
+- Epoch: 1
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 1e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.03}
 - training_precision: float32
 ### Training results
 | Train Loss | Validation Loss | Epoch |
 |:----------:|:---------------:|:-----:|
+| 0.8788     | 0.7572          | 0     |
+| 0.6596     | 0.6809          | 1     |
 ### Framework versions
+- Transformers 4.28.1
+- TensorFlow 2.11.0
+- Datasets 2.1.0
 - Tokenizers 0.13.3
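
The two optimizer lines in the model card differ in only a couple of fields, which is easy to miss when reading the dictionaries by eye. A minimal sketch, assuming the logged strings are valid Python literals; the helper name `changed_keys` is hypothetical and not part of the repository:

```python
import ast

# Optimizer settings exactly as logged in the model card, before and after.
OLD_OPTIMIZER = (
    "{'name': 'AdamWeightDecay', 'learning_rate': 2e-06, 'decay': 0.0, "
    "'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, "
    "'weight_decay_rate': 0.005}"
)
NEW_OPTIMIZER = (
    "{'name': 'AdamWeightDecay', 'learning_rate': 1e-05, 'decay': 0.0, "
    "'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, "
    "'weight_decay_rate': 0.03}"
)


def changed_keys(old_text, new_text):
    """Return {key: (old, new)} for every setting whose value differs."""
    old = ast.literal_eval(old_text)
    new = ast.literal_eval(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


optimizer_diff = changed_keys(OLD_OPTIMIZER, NEW_OPTIMIZER)
for key, (before, after) in optimizer_diff.items():
    print(f"{key}: {before} -> {after}")
```

Only `learning_rate` and `weight_decay_rate` change between the two cards; the remaining optimizer fields are identical.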

config.json CHANGED

@@ -1,14 +1,14 @@
 {
-  "_name_or_path": "raygx/Nepali-DistilGPT2",
   "_num_labels": 1,
   "activation_function": "gelu_new",
   "architectures": [
     "GPT2ForSequenceClassification"
   ],
   "attn_pdrop": 0.1,
-  "bos_token_id": 1,
   "embd_pdrop": 0.1,
-  "eos_token_id": 2,
   "id2label": {
     "0": "NEUTRAL",
     "1": "POSITIVE",
@@ -24,7 +24,7 @@
   "model_type": "gpt2",
   "n_ctx": 1024,
   "n_embd": 768,
-  "n_head": 6,
   "n_inner": null,
   "n_layer": 6,
   "n_positions": 1024,
@@ -44,7 +44,7 @@
       "max_length": 50
     }
   },
-  "transformers_version": "4.31.0",
   "use_cache": true,
-  "vocab_size": 50000
 }

 {
+  "_name_or_path": "raygx/distilGPT-Nepali",
   "_num_labels": 1,
   "activation_function": "gelu_new",
   "architectures": [
     "GPT2ForSequenceClassification"
   ],
   "attn_pdrop": 0.1,
+  "bos_token_id": null,
   "embd_pdrop": 0.1,
+  "eos_token_id": null,
   "id2label": {
     "0": "NEUTRAL",
     "1": "POSITIVE",
   "model_type": "gpt2",
   "n_ctx": 1024,
   "n_embd": 768,
+  "n_head": 12,
   "n_inner": null,
   "n_layer": 6,
   "n_positions": 1024,
       "max_length": 50
     }
   },
+  "transformers_version": "4.28.1",
   "use_cache": true,
+  "vocab_size": 50003
 }
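
To make the architectural effect of the `n_head` and `vocab_size` changes concrete, here is a minimal sketch that works only from the values shown in the diff. The dictionaries are hand-copied subsets of the two configs, and the helper names are hypothetical:

```python
# Hand-copied subsets of the old and new config.json values from the diff.
OLD_CONFIG = {"n_embd": 768, "n_head": 6, "n_layer": 6, "vocab_size": 50000}
NEW_CONFIG = {"n_embd": 768, "n_head": 12, "n_layer": 6, "vocab_size": 50003}


def head_dim(config):
    """Width of one attention head; GPT-2 splits n_embd evenly across heads."""
    n_embd, n_head = config["n_embd"], config["n_head"]
    if n_embd % n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    return n_embd // n_head


def token_embedding_params(config):
    """Number of weights in the token embedding matrix (vocab_size x n_embd)."""
    return config["vocab_size"] * config["n_embd"]


old_head_dim = head_dim(OLD_CONFIG)
new_head_dim = head_dim(NEW_CONFIG)
extra_embedding_params = (
    token_embedding_params(NEW_CONFIG) - token_embedding_params(OLD_CONFIG)
)

print(f"head dim: {old_head_dim} -> {new_head_dim}")
print(f"extra token-embedding weights: {extra_embedding_params}")
```

Under these values, doubling `n_head` at a fixed `n_embd` halves the per-head width from 128 to 64, and the three additional vocabulary entries add 2304 token-embedding weights.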

tf_model.h5 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d61db576793435960e0346a64e4eb48a56d49096170860eddab818346fa1f035
-size 326968664

 version https://git-lfs.github.com/spec/v1
+oid sha256:7cf295902ff41cd0fe53ad81486af836bd3c509e68586f9c7f3adbc99977219b
+size 480590728
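
The `tf_model.h5` entry is a Git LFS pointer rather than the weights themselves, so the diff only shows the hash and byte size changing. A minimal sketch of reading the two pointers and comparing sizes, assuming the simple `key value` line layout shown above; `parse_lfs_pointer` is a hypothetical helper, not a Git LFS API:

```python
OLD_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d61db576793435960e0346a64e4eb48a56d49096170860eddab818346fa1f035
size 326968664
"""

NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7cf295902ff41cd0fe53ad81486af836bd3c509e68586f9c7f3adbc99977219b
size 480590728
"""


def parse_lfs_pointer(text):
    """Parse a pointer file of 'key value' lines into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


old_pointer = parse_lfs_pointer(OLD_POINTER)
new_pointer = parse_lfs_pointer(NEW_POINTER)
size_delta = new_pointer["size"] - old_pointer["size"]

print(f"size: {old_pointer['size']} -> {new_pointer['size']} bytes")
print(f"delta: {size_delta} bytes")
```

The stored file grows by 153622064 bytes between the two revisions. The pointer alone does not say which tensors account for that growth.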