zicsx committed on
Commit
e3c1988
1 Parent(s): 0c61e12

added readme

Files changed (1)
  1. README.md +109 -0
README.md ADDED
@@ -0,0 +1,109 @@
# Hindi-Punk: Punctuation Prediction Model

Hindi-Punk is a punctuation prediction model for Hindi text, fine-tuned from Google's MuRIL (Multilingual Representations for Indian Languages), a BERT-based model pretrained on Indian languages. It takes unpunctuated Hindi text and predicts where punctuation belongs, which makes it useful in natural language processing applications that work with Hindi text.

## Getting Started

To use Hindi-Punk you need Python with PyTorch and the Hugging Face Transformers library. The steps below also import `huggingface_hub`, which is installed as a dependency of `transformers`. Install the packages with pip:

```bash
pip install torch transformers
```
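
As a quick sanity check after installation, the following sketch reports which of the required packages can be imported. It uses only the standard library; the helper name `check_dependencies` is hypothetical and not part of the Hindi-Punk repository.

```python
from importlib.util import find_spec

# Packages imported in the steps of this README.
REQUIRED_PACKAGES = ("torch", "transformers", "huggingface_hub")


def check_dependencies(packages=REQUIRED_PACKAGES):
    """Return a dict mapping each package name to True if it is importable."""
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, available in check_dependencies().items():
        status = "ok" if available else "missing (install with pip)"
        print(f"{name}: {status}")
```

Any package reported as missing can be installed with the pip command above.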

## Using the Model

### Step 1: Import Required Libraries

Start by importing the necessary libraries:

```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer, BertModel
```

### Step 2: Download and Load the Model

The model weights are hosted on the Hugging Face Hub. Download the checkpoint file with `hf_hub_download`, which caches the file locally and returns its path:

```python
# Define the repository name and filename
repo_name = "zicsx/Hindi-Punk"
filename = "Hindi-Punk-model.pth"

# Download the file (or reuse the cached copy) and get its local path
model_path = hf_hub_download(repo_id=repo_name, filename=filename)
```
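
Load the weights with PyTorch. The class definitions are elided in this README; they must match the architecture used in training so that the parameter names line up with the checkpoint:

```python
# Define the model classes
class CustomTokenClassifier(nn.Module):
    ...  # definition elided in the original README


class PunctuationModel(nn.Module):
    ...  # definition elided in the original README


# Initialize the model and load the downloaded weights
model = PunctuationModel(
    bert_model_name='google/muril-base-cased',
    punct_num_classes=5,
    hidden_size=768
)
# map_location="cpu" lets the checkpoint load on machines without a GPU
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()  # switch to inference mode (disables dropout)
```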

### Step 3: Tokenization

Load the tokenizer published in the same repository as the model:

```python
tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path="zicsx/Hindi-Punk",
    use_fast=True,
)
```

### Step 4: Define Inference Functions

Define two helper functions: one runs the model on the input text and returns the punctuation predictions, and the other merges those predictions back into the text. Their bodies are elided in this README:

```python
def predict_punctuation_capitalization(model, text, tokenizer):
    ...  # body elided in the original README


def combine_predictions_with_text(text, tokenizer, punct_predictions, punct_index_to_label):
    ...  # body elided in the original README
```
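
Because the function bodies are elided, the following is only a sketch of what the post-processing step might look like, not the repository's implementation. It assumes that the model produces one class index per subword token, that the prediction for the last subword of a word decides the punctuation after that word, and that the label mapping shown is plausible; `HYPOTHETICAL_INDEX_TO_LABEL`, `word_level_predictions`, and `attach_punctuation` are illustrative names. The `word_ids` list has the shape returned by `BatchEncoding.word_ids()` for a fast tokenizer: one word index per token, with `None` for special tokens.

```python
# Hypothetical mapping from class index to punctuation mark (5 classes,
# matching punct_num_classes=5). The real mapping is not given in this README.
HYPOTHETICAL_INDEX_TO_LABEL = {0: "", 1: "।", 2: ",", 3: "?", 4: "!"}


def word_level_predictions(word_ids, token_predictions):
    """Keep one prediction per word: that of the word's last subword token."""
    per_word = {}
    for word_id, prediction in zip(word_ids, token_predictions):
        if word_id is None:  # special tokens such as [CLS] and [SEP]
            continue
        per_word[word_id] = prediction  # later subwords overwrite earlier ones
    return [per_word[i] for i in sorted(per_word)]


def attach_punctuation(words, predictions, index_to_label):
    """Append the predicted punctuation mark (if any) to each word."""
    if len(words) != len(predictions):
        raise ValueError("expected exactly one prediction per word")
    return " ".join(
        word + index_to_label[prediction]
        for word, prediction in zip(words, predictions)
    )


# Toy input: "आपका क्या नाम है" with "आपका" split into two subword tokens.
words = ["आपका", "क्या", "नाम", "है"]
word_ids = [None, 0, 0, 1, 2, 3, None]
token_predictions = [0, 2, 0, 0, 0, 3, 0]  # made-up class indices

per_word = word_level_predictions(word_ids, token_predictions)
print(attach_punctuation(words, per_word, HYPOTHETICAL_INDEX_TO_LABEL))
# prints: आपका क्या नाम है?
```

Taking the last subword's prediction is one common convention; using the first subword instead would only require keeping the first value seen for each word index.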
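
### Step 5: Run the Model

You can now run the model on your input text. Note that `punct_index_to_label`, which maps each of the five class indices to its punctuation label, is not defined in this README and must be supplied before this code will run:

```python
text = "Your Hindi text here"

# Predict punctuation for the input text
punct_predictions = predict_punctuation_capitalization(model, text, tokenizer)

# Merge the predictions back into the text
combined_text = combine_predictions_with_text(text, tokenizer, punct_predictions, punct_index_to_label)
print("Combined Text:", combined_text)
```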
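
Encoders in the BERT family accept a bounded number of tokens per input (commonly 512), so long passages may need to be split before they are passed to the model. The sketch below splits text into chunks by word count. `chunk_words` and its `max_words` default are hypothetical choices, not part of the repository, and a word limit only approximates a token limit because one word can map to several subword tokens.

```python
def chunk_words(text, max_words=100):
    """Split whitespace-separated text into chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


# Each chunk can then be punctuated independently and the results joined.
print(chunk_words("a b c d e", max_words=2))
# prints: ['a b', 'c d', 'e']
```

Splitting at fixed word counts can cut a sentence in half, so predictions near chunk boundaries may be less reliable than those in the middle of a chunk.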

## Example

The following example runs the model on an unpunctuated Hindi passage:

```python
# Unpunctuated Hindi input containing several sentences
example_text = "सलामअलैकुम कहाँ जा रहे हैं जी आओ बैठो छोड़ देता हूँ हेलो एक्सक्यूज मी आपका क्या नाम है तुम लोगों को बाद में देख लेता हूँ"
punct_predictions = predict_punctuation_capitalization(model, example_text, tokenizer)
combined_text = combine_predictions_with_text(example_text, tokenizer, punct_predictions, punct_index_to_label)
print("Combined Text:", combined_text)
```

## License

This model is released under the [MIT License](https://opensource.org/licenses/MIT).

---