---
license: apache-2.0
datasets:
- sumitaryal/nepali_grammatical_error_detection
language:
- ne
metrics:
- accuracy
base_model:
- google/muril-base-cased
pipeline_tag: text-classification
widget:
- text: रामले भात खायो ।
  example_title: Sample 1
new_version: sumitaryal/Nepali_Grammatical_Error_Detection_MuRIL
library_name: transformers
---
# Model Card for Nepali Grammatical Error Detection (MuRIL)
This model performs **Nepali Grammatical Error Detection (GED)**: it classifies a Nepali sentence as grammatically correct or incorrect. It is fine-tuned from MuRIL, a BERT-based model.
## Model Details
### Model Description
- **Developed by:** Sumit Aryal
- **Model type:** BERT (MuRIL-based)
- **Language(s):** Nepali
- **License:** Apache 2.0
- **Finetuned from model:** google/muril-base-cased
### Dataset
- **Dataset Name:** [Nepali Grammatical Error Detection Dataset](https://huggingface.co/datasets/sumitaryal/nepali_grammatical_error_detection)
- **Description:** The training split pairs **2,568,682** correctly constructed sentences with erroneous counterparts, for a total of **7,514,122** training samples. The validation split contains **365,606** correct and **405,905** incorrect sentences. The errors cover verb inflections, homophones, punctuation, and sentence structure, which makes the dataset suitable for both training and evaluating grammatical error detection models.
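
The counts above imply the class balance of each split. The following is a minimal arithmetic check, assuming every training sample is labeled either correct or incorrect, so the number of erroneous training samples is the remainder. The variable names are illustrative only.

```python
# Split sizes as stated in the dataset description.
TRAIN_TOTAL = 7_514_122
TRAIN_CORRECT = 2_568_682
VALID_CORRECT = 365_606
VALID_INCORRECT = 405_905

# Assumption: each training sample is either a correct sentence or an
# erroneous one, so the erroneous count is the remainder.
train_incorrect = TRAIN_TOTAL - TRAIN_CORRECT
valid_total = VALID_CORRECT + VALID_INCORRECT

# Share of erroneous samples in each split.
train_incorrect_share = train_incorrect / TRAIN_TOTAL
valid_incorrect_share = VALID_INCORRECT / valid_total

# Accuracy of a constant classifier that always predicts the majority
# class of the validation split.
valid_majority_baseline = max(VALID_CORRECT, VALID_INCORRECT) / valid_total

print(f"train: {train_incorrect:,} erroneous ({train_incorrect_share:.1%})")
print(f"valid: {valid_total:,} total ({valid_incorrect_share:.1%} erroneous)")
print(f"valid majority-class baseline: {valid_majority_baseline:.1%}")
```

Under this assumption, roughly 1.9 erroneous samples accompany each correct training sentence (about 65.8% of the training split). A trivial majority-class predictor would score about 52.6% on the validation split, which is a useful reference point for the accuracy reported below.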
### Model Sources
- **Repository:** [Nepali Grammatical Error Detection MuRIL](https://huggingface.co/sumitaryal/Nepali_Grammatical_Error_Detection_MuRIL)
- **Paper:** "BERT-Based Nepali Grammatical Error Detection and Correction Leveraging a New Corpus" (INSPECT-2024)
## Uses
### Direct Use
- Grammar checking for written Nepali text.
## Evaluation Metrics
- **Accuracy:** 91.1515%
- **Training Loss:** 0.242700
- **Validation Loss:** 0.217756
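
The card does not say exactly how these figures were computed. The sketch below assumes the standard definitions: accuracy is the share of argmax predictions that match the gold label, and loss is the mean cross-entropy over the samples. The logits and labels are made-up values, and the helper `evaluate_logits` is hypothetical.

```python
import torch
import torch.nn.functional as F


def evaluate_logits(logits: torch.Tensor, labels: torch.Tensor) -> dict:
    """Hypothetical helper: accuracy and mean cross-entropy for one batch.

    logits: float tensor of shape (batch_size, num_labels)
    labels: int64 tensor of shape (batch_size,) holding class ids
    """
    # Mean cross-entropy over the batch (same loss family as training).
    loss = F.cross_entropy(logits, labels).item()
    # The predicted class is the one with the highest logit.
    predictions = logits.argmax(dim=-1)
    accuracy = (predictions == labels).float().mean().item()
    return {"accuracy": accuracy, "loss": loss}


# Toy batch: 4 sentences, 2 classes; label ids here are illustrative only.
logits = torch.tensor([[2.0, -1.0], [0.5, 1.5], [-0.3, 0.8], [1.2, 0.1]])
labels = torch.tensor([0, 1, 0, 0])
metrics = evaluate_logits(logits, labels)
print(metrics)
```

Three of the four toy predictions match their labels, so the accuracy is 0.75. Over a full split, the same quantities would be accumulated across all batches.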
## How to Get Started with the Model
Use the code below to get started with the model.
```python
import torch
from transformers import AutoTokenizer, BertForSequenceClassification

MODEL_ID = "sumitaryal/Nepali_Grammatical_Error_Detection_MuRIL"

# Load the fine-tuned classifier and its tokenizer (MuRIL is cased, so keep lowercasing off).
model = BertForSequenceClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, do_lower_case=False)
model.eval()

input_sentence = "रामले भात खायो ।"
inputs = tokenizer(input_sentence, return_tensors="pt")

# Inference only: disable gradient tracking.
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring class and map its id to the label name.
predicted_class_id = logits.argmax(dim=-1).item()
predicted_class = model.config.id2label[predicted_class_id]
print(f'The sentence "{input_sentence}" is "{predicted_class}"')
```
## Training Details
- Framework: PyTorch
- Hyperparameters:
  - Epochs = 1
  - Train Batch Size = 256
  - Valid Batch Size = 256
  - Loss Function = Cross-Entropy Loss
  - Optimizer = AdamW
  - Optimizer Parameters:
    - Learning Rate = 5e-5
    - β1 = 0.9
    - β2 = 0.999
    - ϵ = 1e-8
- GPU = NVIDIA® GeForce® RTX™ 4060, 8 GB VRAM
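
The hyperparameters above correspond to the following PyTorch optimizer and loss setup. This is a sketch, not the author's training script. A hypothetical `nn.Linear` stand-in replaces the MuRIL classifier so the code runs offline. Weight decay is left at PyTorch's AdamW default because the card does not list it. The step count also assumes no gradient accumulation and that the last partial batch is kept.

```python
import math

import torch
from torch import nn

# Hyperparameters as listed in the training details.
EPOCHS = 1
TRAIN_BATCH_SIZE = 256
VALID_BATCH_SIZE = 256
LEARNING_RATE = 5e-5
BETAS = (0.9, 0.999)
EPS = 1e-8

# Split sizes from the dataset description.
TRAIN_SAMPLES = 7_514_122
VALID_SAMPLES = 365_606 + 405_905

# Assumption: one optimizer step per training batch (no gradient
# accumulation) and the final partial batch is not dropped.
train_steps = EPOCHS * math.ceil(TRAIN_SAMPLES / TRAIN_BATCH_SIZE)
valid_batches = math.ceil(VALID_SAMPLES / VALID_BATCH_SIZE)

# Hypothetical stand-in for the MuRIL classifier head: 768 is the
# BERT-base hidden size, 2 is the number of labels (correct/incorrect).
model = nn.Linear(768, 2)

loss_fn = nn.CrossEntropyLoss()
# weight_decay is left at the PyTorch default (not stated in the card).
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE, betas=BETAS, eps=EPS)

# One illustrative training step on random features and labels.
features = torch.randn(TRAIN_BATCH_SIZE, 768)
labels = torch.randint(0, 2, (TRAIN_BATCH_SIZE,))
optimizer.zero_grad()
loss = loss_fn(model(features), labels)
loss.backward()
optimizer.step()

print(f"optimizer steps per epoch: {train_steps:,}")
print(f"validation batches: {valid_batches:,}")
```

Under these assumptions, one epoch over 7,514,122 samples at batch size 256 takes 29,353 optimizer steps (29,352 full batches plus one batch of 10). Evaluating the 771,511 validation sentences takes 3,014 batches.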