sumitaryal committed on
Commit
d5a6ec0
1 Parent(s): 705bb0a

Create README.md

  1. README.md +84 -0
README.md ADDED
@@ -0,0 +1,84 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - sumitaryal/nepali_grammatical_error_detection
+ language:
+ - ne
+ metrics:
+ - accuracy
+ base_model:
+ - google/muril-base-cased
+ pipeline_tag: text-classification
+ ---
+
+ # Model Card for Nepali Grammatical Error Detection (MuRIL)
+
+ This model is designed for the **Nepali Grammatical Error Detection (GED)** task. It uses the BERT-based MuRIL model to detect grammatical errors in Nepali text.
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** Sumit Aryal
+ - **Model type:** BERT (MuRIL-based)
+ - **Language(s):** Nepali
+ - **License:** Apache 2.0
+ - **Finetuned from model:** google/muril-base-cased
+
+ ### Dataset
+
+ - **Dataset Name:** [Nepali Grammatical Error Detection Dataset](https://huggingface.co/datasets/sumitaryal/nepali_grammatical_error_detection)
+ - **Description:** The training set pairs **2,568,682** correctly constructed sentences with their erroneous counterparts, resulting in **7,514,122** training samples. The validation set contains **365,606** correct sentences and **405,905** incorrect sentences. The collection covers several types of grammatical errors, including verb inflections, homophones, punctuation errors, and sentence structure issues, which makes it suitable for training and evaluating grammatical error detection models.
+
+ ### Model Sources
+
+ - **Repository:** [Nepali Grammatical Error Detection MuRIL](https://huggingface.co/sumitaryal/Nepali_Grammatical_Error_Detection_MuRIL)
+ - **Paper:** "BERT-Based Nepali Grammatical Error Detection and Correction Leveraging a New Corpus" (INSPECT-2024)
+
+ ## Uses
+
+ ### Direct Use
+
+ - Grammar checking for written Nepali text.
+
+ ## Evaluation Metrics
+ - **Accuracy:** 91.1515%
+ - **Training Loss:** 0.242700
+ - **Validation Loss:** 0.217756
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ ```python
+ import torch
+ from transformers import BertForSequenceClassification, AutoTokenizer
+
+ # Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
+ model = BertForSequenceClassification.from_pretrained("sumitaryal/Nepali_Grammatical_Error_Detection_MuRIL")
+ tokenizer = AutoTokenizer.from_pretrained("sumitaryal/Nepali_Grammatical_Error_Detection_MuRIL", do_lower_case=False)
+
+ input_sentence = "रामले भात खायो ।"
+ inputs = tokenizer(input_sentence, return_tensors="pt")
+
+ # Run inference without tracking gradients.
+ with torch.no_grad():
+     logits = model(**inputs).logits
+
+ # Pick the highest-scoring class and map it to its label name.
+ predicted_class_id = logits.argmax(dim=-1).item()
+ predicted_class = model.config.id2label[predicted_class_id]
+ print(f'The sentence "{input_sentence}" is "{predicted_class}"')
+ ```
+
+ ## Training Details
+ - Framework: PyTorch
+ - Hyperparameters:
+   - Epochs = 1
+   - Train Batch Size = 256
+   - Valid Batch Size = 256
+   - Loss Function = Cross-Entropy Loss
+   - Optimizer = AdamW
+   - Optimizer Parameters:
+     - Learning Rate = 5e-5
+     - β1 = 0.9
+     - β2 = 0.999
+     - ϵ = 1e-8
+ - GPU = NVIDIA® GeForce® RTX™ 4060 GPU, 8GB VRAM
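
The figures in the added model card imply a few derived numbers that help put the training run and the reported accuracy in context. The sketch below is not part of the commit; it only does arithmetic on values quoted in the card. It assumes that 7,514,122 is the full training set size, that one optimizer step is taken per batch (no gradient accumulation), and that the final partial batch is kept. The helper names `steps_per_epoch` and `majority_share` are hypothetical.

```python
import math

# Figures quoted in the model card.
TRAIN_SAMPLES = 7_514_122    # assumed to be the full training set
BATCH_SIZE = 256             # train batch size
EPOCHS = 1
VALID_CORRECT = 365_606      # correct sentences in the validation set
VALID_INCORRECT = 405_905    # incorrect sentences in the validation set


def steps_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Batches in one pass over the data (assumption: one optimizer step per batch)."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def majority_share(*class_counts: int) -> float:
    """Accuracy of a classifier that always predicts the largest class."""
    return max(class_counts) / sum(class_counts)


train_steps = steps_per_epoch(TRAIN_SAMPLES, BATCH_SIZE) * EPOCHS
valid_total = VALID_CORRECT + VALID_INCORRECT
baseline = majority_share(VALID_CORRECT, VALID_INCORRECT)

print(f"optimizer steps: {train_steps:,}")
print(f"validation sentences: {valid_total:,}")
print(f"majority-class share: {baseline:.2%}")
```

If the reported 91.1515% accuracy was measured on this validation set (the card does not say so explicitly), the majority-class share gives a rough floor to compare it against.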