---
license: apache-2.0
datasets:
- stanfordnlp/imdb
language:
- en
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
tags:
- IMDB
- Sentiment Analysis
---
# BERT-Based Sentiment Analysis Models

## Model Description

This repository contains two BERT-based models fine-tuned for sentiment analysis:

- **BERT-1**: Fine-tuned on the IMDB movie reviews dataset.
- **BERT-2**: Fine-tuned on a combination of the IMDB movie reviews dataset and a Twitter comments dataset.

Both models are based on the `bert-base-uncased` pre-trained model from Hugging Face's Transformers library.

## Intended Use

These models are intended for binary sentiment analysis of English text data. They can be used to classify text into positive or negative sentiment categories.

### Loading the Models

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# A Hub repository ID has the form "namespace/name". The checkpoint directory
# (assumed here to be a subfolder of the repository) is selected with `subfolder`.
REPO_ID = "verneylmavt/bert-base-uncased_sentiment-analysis"

# Load BERT-1
tokenizer_bert1 = AutoTokenizer.from_pretrained(REPO_ID, subfolder="bert-1")
model_bert1 = AutoModelForSequenceClassification.from_pretrained(REPO_ID, subfolder="bert-1")

# Load BERT-2
tokenizer_bert2 = AutoTokenizer.from_pretrained(REPO_ID, subfolder="bert-2")
model_bert2 = AutoModelForSequenceClassification.from_pretrained(REPO_ID, subfolder="bert-2")
```

### Performing Sentiment Analysis

```python
from transformers import pipeline

# Initialize pipelines
sentiment_pipeline_bert1 = pipeline("sentiment-analysis", model=model_bert1, tokenizer=tokenizer_bert1)
sentiment_pipeline_bert2 = pipeline("sentiment-analysis", model=model_bert2, tokenizer=tokenizer_bert2)

# Sample text
text = "I absolutely loved this product! It exceeded my expectations."

# Get predictions
result_bert1 = sentiment_pipeline_bert1(text)
result_bert2 = sentiment_pipeline_bert2(text)

print("BERT-1 Prediction:", result_bert1)
print("BERT-2 Prediction:", result_bert2)
```
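
Each pipeline returns a list with one dictionary per input, holding a `label` and a `score`. If the checkpoints keep the default Transformers label names (`LABEL_0`, `LABEL_1`), a small helper can translate them into readable names. The mapping below (`LABEL_0` as negative, `LABEL_1` as positive) is an assumption that this card does not confirm, so check `model.config.id2label` before relying on it. The helper name and the sample prediction are hypothetical.

```python
# Assumed mapping; verify against model.config.id2label for each checkpoint.
ASSUMED_LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}


def readable_predictions(predictions, label_names=ASSUMED_LABEL_NAMES):
    """Return (name, score) pairs for a list of pipeline-style prediction dicts."""
    readable = []
    for prediction in predictions:
        # Fall back to the raw label if it is not in the mapping.
        name = label_names.get(prediction["label"], prediction["label"])
        readable.append((name, round(prediction["score"], 4)))
    return readable


# Hypothetical output in the shape a text-classification pipeline returns.
sample_output = [{"label": "LABEL_1", "score": 0.9981}]
print(readable_predictions(sample_output))
```

In practice, `sample_output` would be replaced by `result_bert1` or `result_bert2` from the snippet above.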

## Training Details

### BERT-1

- **Dataset**: [IMDB Movie Reviews Dataset](https://ai.stanford.edu/~amaas/data/sentiment/)
- **Objective**: Binary sentiment classification (positive/negative)
- **Optimizer**: AdamW with a learning rate `lr` (value unspecified)
- **Scheduler**: Linear scheduler with warmup (`get_linear_schedule_with_warmup`)
- **Epochs**: `num_epochs = 3`
- **Device**: Trained on GPU if available
- **Metrics Monitored**: Training loss, training accuracy, testing accuracy per epoch
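
To illustrate the scheduler, the sketch below reproduces in plain Python the multiplier that a linear schedule with warmup applies to the base learning rate: it rises linearly from 0 to 1 during warmup, then decays linearly to 0 by the final step. The function name and step counts are hypothetical; this card does not state the number of warmup or training steps used for BERT-1.

```python
def linear_warmup_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Warmup: ramp linearly from 0 up to 1.
        return step / max(1, num_warmup_steps)
    # Decay: fall linearly from 1 to 0 at the final step.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


# Hypothetical values chosen only for illustration.
num_training_steps = 1000
num_warmup_steps = 100

for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_factor(step, num_warmup_steps, num_training_steps))
```

The effective learning rate at each step is the base `lr` multiplied by this factor.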

### BERT-2

- **Dataset**:
  - [IMDB Movie Reviews Dataset](https://ai.stanford.edu/~amaas/data/sentiment/)
  - [Twitter Comment - Sentiment Analysis Dataset](https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset)
- **Objective**: Binary sentiment classification (positive/negative)
- **Optimizer**: AdamW with weight decay `0.01`, optimizing only the parameters that require gradients
- **Scheduler**: Linear scheduler with warmup (`10%` of total steps)
- **Gradient Clipping**: Applied with `max_norm=1.0`
- **Early Stopping**: Implemented with a patience of `2` epochs without improvement in validation loss
- **Epochs**: `num_epochs = 3`; training may end sooner if early stopping is triggered
- **Device**: Trained on GPU if available
- **Metrics Monitored**: Training loss, training accuracy, validation loss, validation accuracy per epoch
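
The early-stopping rule and gradient clipping listed above can be sketched in plain Python. This is a minimal illustration, not the training script used for BERT-2: the `EarlyStopper` class, the `clip_scale` helper, and the validation losses are hypothetical. `clip_scale` computes the factor by which gradients are rescaled when their total L2 norm exceeds `max_norm`, which is the idea behind norm-based clipping utilities such as PyTorch's `torch.nn.utils.clip_grad_norm_`.

```python
import math


class EarlyStopper:
    """Signal a stop after `patience` consecutive epochs without a lower validation loss."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best_loss = math.inf
        self.epochs_without_improvement = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss:
            # New best: remember it and reset the counter.
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def clip_scale(grad_values, max_norm=1.0, eps=1e-6):
    """Factor that rescales gradients so their L2 norm does not exceed max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grad_values))
    return min(1.0, max_norm / (total_norm + eps))


# Hypothetical validation losses, one per epoch.
stopper = EarlyStopper(patience=2)
for epoch, val_loss in enumerate([0.35, 0.37, 0.36], start=1):
    if stopper.should_stop(val_loss):
        print(f"Stopping after epoch {epoch}")
        break

# Gradients with L2 norm 5.0 are scaled down to roughly norm 1.0.
print(clip_scale([3.0, 4.0]))
```

With a patience of `2`, a single worse epoch does not end training; only two consecutive epochs without a new best validation loss do.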

## Limitations and Biases

- **Data Bias**: The models are trained on specific datasets, which may contain inherent biases such as demographic or cultural biases.
- **Language Support**: The models support English text only.
- **Generalization**: Performance may degrade on text significantly different from the training data (e.g., slang, jargon).
- **Ethical Considerations**: Users should be cautious of potential biases in predictions and should not use the model for critical decisions without human oversight.

## License

The models are distributed under the same license as the original `bert-base-uncased` model ([Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)).

## Acknowledgements

- Thanks to the Hugging Face team for providing the Transformers library and model hosting.
- The IMDB dataset is made available by [Maas et al.](https://ai.stanford.edu/~amaas/data/sentiment/) under a [Creative Commons Attribution-NonCommercial 3.0 Unported License](https://creativecommons.org/licenses/by-nc/3.0/).

---

**Disclaimer**: The models are provided "as is" without warranty of any kind. The author is not responsible for any outcomes resulting from the use of these models.