Commit 6123449 by Kumshe (verified) · 1 parent: 480c0fb

Update README.md

![Screenshot 2024-08-31 at 20.44.59.png](https://cdn-uploads.huggingface.co/production/uploads/665eb34e46c7555475ae0350/9pb2vN9ZMzTWz-9p_dygx.png)

![Screenshot 2024-08-31 at 20.44.21.jpg](https://cdn-uploads.huggingface.co/production/uploads/665eb34e46c7555475ae0350/5c_6BXTiq6PYf0awtd8-8.jpeg)
![Screenshot 2024-08-31 at 20.44.29.jpg](https://cdn-uploads.huggingface.co/production/uploads/665eb34e46c7555475ae0350/6skIxqs69akt1qUTH1kFc.jpeg)

Files changed (1): README.md (+82 -56)
```diff
@@ -1,60 +1,86 @@
- ---
- language: ha # Hausa language code
- tags:
- - sentiment-analysis
- - hausa
- - social-media
- - transformers
- - bert
- license: apache-2.0
- ---
-
- # Hausa Sentiment Analysis
-
- This model is a fine-tuned version of `bert-base-cased` designed for sentiment analysis of Hausa text data. The model is specifically trained to classify social media text (tweets) into different sentiment categories.
-
- ## Model Description
-
- **Hausa Sentiment Analysis** is a BERT-based model fine-tuned for analyzing the sentiment of Hausa language social media text. The model was trained on 35,000 examples collected from various social media platforms, making it suitable for sentiment analysis tasks in Hausa.
-
- ## Intended Uses & Limitations
-
- - **Intended Use**: Sentiment analysis of social media texts in the Hausa language.
- - **Primary Use Cases**: Monitoring and analyzing public sentiment on social media platforms, academic research in natural language processing (NLP) for low-resource languages.
- - **Limitations**: May not perform well on text outside the social media domain or with dialectal variations.

- ## Training Data

- - **Data Source**: Collected from social media platforms.
- - **Number of Examples**: 35,000
- - **Preprocessing**: Text normalization, tokenization.
-
- ## Training Procedure
-
- - **Training Script**: Used the Hugging Face `Trainer` API.
- - **Hyperparameters**:
-   - Epochs: 40
-   - Batch Size (Train): 32
-   - Batch Size (Eval): 64
-   - Warmup Steps: 10
-   - Weight Decay: 0.01
-   - Logging Steps: 200
-
- ## Evaluation
-
- - **Evaluation Metrics**: Accuracy, Precision, Recall, F1-score.
- - **Results**: The model achieved high performance on the validation set, indicating strong capability in handling Hausa social media sentiment analysis tasks.
-
- ## How to Use
-
- To use this model for sentiment analysis, you can load it using the `transformers` library:
-
- ```python
- from transformers import AutoTokenizer, AutoModelForSequenceClassification

- tokenizer = AutoTokenizer.from_pretrained("Kumshe/Hausa-sentiment-analysis")
- model = AutoModelForSequenceClassification.from_pretrained("Kumshe/Hausa-sentiment-analysis")

- # Example usage
- inputs = tokenizer("This is an example tweet in Hausa language", return_tensors="pt")
- outputs = model(**inputs)

+ **Model Name**: Hausa Sentiment Analysis
+ **Model ID**: `Kumshe/Hausa-sentiment-analysis`
+ **Language**: Hausa

+ ---

+ ### **Model Description**
+ This model is a BERT-based model fine-tuned for sentiment analysis in the Hausa language. It is trained to classify social media text into different sentiment categories: positive, negative, or neutral.
+
+ ### **Intended Use**
+ - **Primary Use Case**: Sentiment analysis for Hausa social media content, such as tweets or Facebook posts.
+ - **Target Users**: NLP researchers, businesses analyzing social media, and developers building sentiment analysis tools for Hausa language content.
+ - **Example Usage**:
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ # Load the model and tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("Kumshe/Hausa-sentiment-analysis")
+ model = AutoModelForSequenceClassification.from_pretrained("Kumshe/Hausa-sentiment-analysis")
+
+ # Encode the input text
+ inputs = tokenizer("Your Hausa text here", return_tensors="pt")
+
+ # Get model predictions
+ outputs = model(**inputs)
+ ```
+
+ ### **Model Architecture**
+ - **Base Model**: BERT (Bidirectional Encoder Representations from Transformers)
+ - **Pre-trained Model**: `bert-base-cased` from Hugging Face Transformers library.
+ - **Fine-Tuned Model**: Fine-tuned for 3 epochs on a Hausa sentiment dataset.
+
+ ### **Training Data**
+ - **Data Source**: The model was trained on a dataset containing 35,000 examples from social media platforms such as Twitter and Facebook.
+ - **Data Split**:
+   - **Training Set**: 80% of the data
+   - **Validation Set**: 20% of the data
+
+ ### **Training Details**
+ - **Number of Epochs**: 40
+ - **Batch Size**:
+   - Per device training batch size: 32
+   - Per device evaluation batch size: 64
+ - **Learning Rate Schedule**: Warm-up steps: 10, Weight decay: 0.01
+ - **Optimizer**: AdamW
+ - **Training Hardware**: Trained on Kaggle using 2 NVIDIA T4 GPUs.
+
+ ### **Evaluation Metrics**
+ - **Evaluation Loss**: 0.6265
+ - **Accuracy**: 73.47%
+ - **F1 Score**: 73.47%
+ - **Precision**: 73.54%
+ - **Recall**: 73.47%
+
+ ### **Model Performance**
+ The model performs well on the given dataset, achieving a balanced performance between precision, recall, and F1 score, making it suitable for general sentiment analysis tasks in Hausa language text.
+
+ ### **Limitations**
+ - The model may not generalize well to other types of Hausa text outside of social media (e.g., formal writing or literature).
+ - Performance may degrade on text containing slang or regional dialects not well-represented in the training data.
+ - The model is biased towards the examples in the training dataset; biases in the data may affect predictions.
+
+ ### **Ethical Considerations**
+ - Sentiment analysis models can potentially amplify biases present in the training data.
+ - Use cautiously in sensitive applications to avoid unintended consequences.
+ - Consider the impact on privacy and data protection laws, especially when analyzing social media content.
+
+ ### **License**
+ - Apache 2.0
+
+ ### **Citation**
+ If you use this model in your work, please cite it as follows:
+ ```
+ @misc{Kumshe2024HausaSentimentAnalysis,
+ author = {Umar Muhammad Mustapha Kumshe},
+ title = {Hausa Sentiment Analysis},
+ year = {2024},
+ publisher = {Hugging Face},
+ howpublished = {\url{https://huggingface.co/Kumshe/Hausa-sentiment-analysis}},
+ }
+ ```
+
+ ### **Contributions**
+ This model was fine-tuned by Umar Muhammad Mustapha Kumshe. Feel free to contribute, provide feedback, or raise issues on the [model repository](https://huggingface.co/Kumshe/Hausa-sentiment-analysis).
```
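
Both versions of the README's usage example stop at `outputs = model(**inputs)`. That call returns raw logits, not a sentiment label. Below is a minimal sketch of the missing step: a softmax over the logits, an argmax, and a lookup in the checkpoint's `config.id2label`.

Several parts are assumptions:

- The helper names `softmax`, `logits_to_label` and `predict` are hypothetical.
- The README names the classes (positive, negative, neutral) but does not give their index order, so `id2label` may return generic names such as `LABEL_0`.
- The scoring helpers use only the standard library. `predict` imports `torch` and `transformers` lazily and downloads the model on first use.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_label(logits: Sequence[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Hypothetical helper: return (label name, probability) of the top class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Fall back to a generic name if the config has no entry for this index.
    return id2label.get(best, f"LABEL_{best}"), probs[best]


def predict(text: str, model_id: str = "Kumshe/Hausa-sentiment-analysis") -> Tuple[str, float]:
    """Hypothetical end-to-end helper; needs torch, transformers and Hub access."""
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # logits has shape (batch_size, num_labels); take the single row.
        logits = model(**inputs).logits[0].tolist()
    return logits_to_label(logits, model.config.id2label)
```

Calling `predict("Your Hausa text here")` should return a `(label, probability)` pair. Keeping the scoring logic free of `torch` makes it easy to reuse on logits produced elsewhere, for example from a batched run.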
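
The training hyperparameters listed in the README correspond to fields of `transformers.TrainingArguments`. The old README confirms the `Trainer` API was used. The sketch below records that assumed mapping and estimates the total number of optimizer steps. It also assumes:

- The Trainer ran data-parallel across both T4 GPUs.
- There was no gradient accumulation (the default).
- The 80/20 split left exactly 28,000 training examples.

It uses the 40 epochs listed under Training Details; the Model Architecture bullet says 3 epochs instead. The names `README_HPARAMS`, `optimizer_steps` and `build_training_args` are hypothetical, and the README gives no learning rate, so none is set.

```python
import math

# Hyperparameters listed in the README, keyed by the transformers.TrainingArguments
# field names they are assumed to correspond to.
README_HPARAMS = {
    "num_train_epochs": 40,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 64,
    "warmup_steps": 10,
    "weight_decay": 0.01,
    "logging_steps": 200,
}


def optimizer_steps(
    n_examples: int,
    train_fraction: float,
    per_device_batch: int,
    n_devices: int,
    epochs: int,
) -> int:
    """Total optimizer steps, assuming no gradient accumulation and that the
    last partial batch of each epoch is kept (it still costs one step)."""
    train_size = round(n_examples * train_fraction)
    global_batch = per_device_batch * n_devices
    steps_per_epoch = math.ceil(train_size / global_batch)
    return steps_per_epoch * epochs


def build_training_args(output_dir: str = "hausa-sentiment-out"):
    """Hypothetical helper: build TrainingArguments from the README values."""
    from transformers import TrainingArguments  # lazy import; needs transformers

    return TrainingArguments(output_dir=output_dir, **README_HPARAMS)
```

Under these assumptions, `optimizer_steps(35_000, 0.8, 32, 2, 40)` works out as follows:

- 438 steps per epoch.
- 17,520 steps in total.
- The 10 warm-up steps therefore cover well under 0.1% of training.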
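
Accuracy and recall are both reported as 73.47%. The README does not say how the per-class scores were averaged. However, this pattern is what support-weighted averaging always produces. Weighting each class's recall $TP_c / n_c$ by its share of examples $n_c / N$ gives $\sum_c TP_c / N$, which is exactly accuracy. Equal F1 is not implied.

Below is a minimal pure-Python sketch of weighted precision, recall and F1. `weighted_scores` is a hypothetical name, and weighted averaging is an inference from the reported numbers, not something the README states.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall and F1 (multi-class)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predicted examples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives

    precision = recall = f1 = 0.0
    for cls, n_cls in support.items():
        tp = hits[cls]
        p = tp / predicted[cls] if predicted[cls] else 0.0
        r = tp / n_cls
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n_cls / n  # weight each class by its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(hits.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With scikit-learn installed, `precision_recall_fscore_support(y_true, y_pred, average="weighted")` should give the same precision, recall and F1 values.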