---
language: en
tags:
- text-classification
- sentiment-analysis
license: apache-2.0
---

# BERT-based Sentiment Classification Model

## Model Details
- **Model Name:** tabularisai/bert-base-uncased-sentiment-five-classes
- **Base Model:** bert-base-uncased
- **Task:** Text Classification (Sentiment Analysis)
- **Language:** English

## Model Description

This model is a fine-tuned version of `bert-base-uncased` for sentiment analysis. **It was trained exclusively on synthetic data produced by state-of-the-art LLMs, including Llama3 and Gemma2.**

### Training Data

The model was fine-tuned on synthetic data, which allows for targeted training on a diverse range of sentiment expressions without the limitations often found in real-world datasets. 

### Training Procedure

- The model was fine-tuned for 5 epochs.
- Achieved a `train_acc_off_by_one` (accuracy that counts a prediction as correct if it is within one class of the true label) of approximately *0.95* on the validation dataset.
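
The off-by-one metric treats the five labels as an ordinal scale, so predicting "Positive" when the true label is "Very Positive" still counts as a hit. A minimal sketch of how such a metric can be computed (the function name and sample values are illustrative, not taken from the training code):

```python
def acc_off_by_one(preds, labels):
    """Fraction of predictions within one class of the true label
    on the ordinal 5-point sentiment scale (0..4)."""
    hits = sum(1 for p, y in zip(preds, labels) if abs(p - y) <= 1)
    return hits / len(labels)

# Illustrative values: prediction 4 vs. label 2 is the only miss
print(acc_off_by_one([0, 2, 4, 3], [1, 2, 2, 3]))  # 0.75
```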

## Intended Use

This model is designed for sentiment analysis tasks, particularly useful for:
- Social media monitoring
- Customer feedback analysis
- Product review sentiment classification
- Brand sentiment tracking

## How to Use

Here's a quick example of how to use the model:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "tabularisai/bert-base-uncased-sentiment-five-classes"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Function to predict sentiment
def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1).item()
    
    sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}
    return sentiment_map[predicted_class]

# Example usage
texts = [
    "I absolutely loved this movie! The acting was superb and the plot was engaging.",
    "The service at this restaurant was terrible. I'll never go back.",
    "The product works as expected. Nothing special, but it gets the job done.",
    "I'm somewhat disappointed with my purchase. It's not as good as I hoped.",
    "This book changed my life! I couldn't put it down and learned so much."
]

for text in texts:
    sentiment = predict_sentiment(text)
    print(f"Text: {text}")
    print(f"Sentiment: {sentiment}\n")
```
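
The softmax-and-argmax step at the heart of `predict_sentiment` can be checked in isolation without loading the model. This sketch uses hypothetical logits, not real model outputs:

```python
import math

SENTIMENT_MAP = {0: "Very Negative", 1: "Negative", 2: "Neutral",
                 3: "Positive", 4: "Very Positive"}

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a single input (one raw score per class)
logits = [-1.2, -0.3, 0.1, 2.4, 1.0]
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)
print(SENTIMENT_MAP[pred])  # Positive
```

Because argmax is invariant under softmax, the predicted class can also be read directly off the raw logits; the softmax is only needed when you want calibrated-looking probabilities.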

## Model Performance

The model demonstrates strong performance across various sentiment categories. Here are some example predictions:
```
1. "I absolutely loved this movie! The acting was superb and the plot was engaging."
   Predicted Sentiment: Very Positive

2. "The service at this restaurant was terrible. I'll never go back."
   Predicted Sentiment: Very Negative

3. "The product works as expected. Nothing special, but it gets the job done."
   Predicted Sentiment: Neutral

4. "I'm somewhat disappointed with my purchase. It's not as good as I hoped."
   Predicted Sentiment: Negative

5. "This book changed my life! I couldn't put it down and learned so much."
   Predicted Sentiment: Very Positive
```


## Training Procedure

The model was fine-tuned on synthetic data using the `bert-base-uncased` architecture. The training process involved:

- Dataset: Synthetic data designed to cover a wide range of sentiment expressions
- Training framework: PyTorch Lightning
- Number of epochs: 5
- Performance metric: Achieved train_acc_off_by_one of approximately 0.95 on the validation dataset
- Hardware: [Specify the hardware used for training]

## Ethical Considerations

While efforts have been made to create a balanced and fair model through the use of synthetic data, users should be aware that the model may still exhibit biases. It's crucial to thoroughly test the model in your specific use case and monitor its performance over time.

## Citation
```
Will be included
```

## Contact

For questions, feedback, or issues related to this model, please contact `[email protected]`.