---
license: apache-2.0
datasets:
- stanfordnlp/imdb
language:
- en
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
tags:
- IMDB
- Sentiment Analysis
---

# BERT-Based Sentiment Analysis Models

## Model Description

This repository contains two BERT-based models fine-tuned for sentiment analysis:

- **BERT-1**: Fine-tuned on the IMDB movie reviews dataset.
- **BERT-2**: Fine-tuned on a combination of the IMDB movie reviews dataset and Twitter comments.

Both models are based on the `bert-base-uncased` pre-trained model from Hugging Face's Transformers library.

## Intended Use

These models are intended for binary sentiment analysis of English text. They classify a text as either positive or negative.

### Loading the Models

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# A Hub repository ID has the form "namespace/name", so each model is
# selected from the shared repository with the `subfolder` argument.
repo_id = "verneylmavt/bert-base-uncased_sentiment-analysis"

# Load BERT-1
tokenizer_bert1 = AutoTokenizer.from_pretrained(repo_id, subfolder="bert-1")
model_bert1 = AutoModelForSequenceClassification.from_pretrained(repo_id, subfolder="bert-1")

# Load BERT-2
tokenizer_bert2 = AutoTokenizer.from_pretrained(repo_id, subfolder="bert-2")
model_bert2 = AutoModelForSequenceClassification.from_pretrained(repo_id, subfolder="bert-2")
```

### Performing Sentiment Analysis

```python
from transformers import pipeline

# Initialize one pipeline per model (uses the objects loaded above)
sentiment_pipeline_bert1 = pipeline("sentiment-analysis", model=model_bert1, tokenizer=tokenizer_bert1)
sentiment_pipeline_bert2 = pipeline("sentiment-analysis", model=model_bert2, tokenizer=tokenizer_bert2)

# Sample text
text = "I absolutely loved this product! It exceeded my expectations."

# Get predictions; each result is a list of dicts with a label and a score
result_bert1 = sentiment_pipeline_bert1(text)
result_bert2 = sentiment_pipeline_bert2(text)

print("BERT-1 Prediction:", result_bert1)
print("BERT-2 Prediction:", result_bert2)
```

## Training Details

### BERT-1

- **Dataset**: [IMDB Movie Reviews Dataset](https://ai.stanford.edu/~amaas/data/sentiment/)
- **Objective**: Binary sentiment classification (positive/negative)
- **Optimizer**: AdamW with a learning rate `lr` (value unspecified)
- **Scheduler**: Linear scheduler with warmup (`get_linear_schedule_with_warmup`)
- **Epochs**: `num_epochs = 3`
- **Device**: Trained on GPU if available
- **Metrics Monitored**: Training loss, training accuracy, and test accuracy per epoch

### BERT-2

- **Dataset**:
  - [IMDB Movie Reviews Dataset](https://ai.stanford.edu/~amaas/data/sentiment/)
  - [Twitter Comment - Sentiment Analysis Dataset](https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset)
- **Objective**: Binary sentiment classification (positive/negative)
- **Optimizer**: AdamW with weight decay (`0.01`), optimizing only the parameters that require gradients
- **Scheduler**: Linear scheduler with warmup (`10%` of total steps)
- **Gradient Clipping**: Applied with `max_norm=1.0`
- **Early Stopping**: Training stops after `2` consecutive epochs without improvement in validation loss
- **Epochs**: `num_epochs = 3`; training may end sooner if early stopping triggers
- **Device**: Trained on GPU if available
- **Metrics Monitored**: Training loss, training accuracy, validation loss, and validation accuracy per epoch

## Limitations and Biases

- **Data Bias**: The models are trained on specific datasets, which may contain inherent biases such as demographic or cultural biases.
- **Language Support**: The models support English text only.
- **Generalization**: Performance may degrade on text that differs significantly from the training data (e.g., slang, jargon).
- **Ethical Considerations**: Users should be cautious of potential biases in predictions and should not use the models for critical decisions without human oversight.

## License

The models are distributed under the same license as the original `bert-base-uncased` model ([Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)).

## Acknowledgements

- Thanks to the Hugging Face team for providing the Transformers library and model hosting.
- The IMDB dataset is made available by [Maas et al.](https://ai.stanford.edu/~amaas/data/sentiment/) under a [Creative Commons Attribution-NonCommercial 3.0 Unported License](https://creativecommons.org/licenses/by-nc/3.0/).

---

**Disclaimer**: The models are provided "as is" without warranty of any kind. The author is not responsible for any outcomes resulting from the use of these models.
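
## Example: Warmup Schedule and Early Stopping

The BERT-2 training details mention a linear schedule with warmup over `10%` of total steps and early stopping with a patience of `2` epochs. The following is a minimal, dependency-free sketch of that logic, not the actual training code. The names `linear_warmup_factor` and `EarlyStopping` are hypothetical and chosen for illustration; the schedule follows the usual shape of a linear-warmup, linear-decay multiplier applied to the base learning rate.

```python
def linear_warmup_factor(step: int, total_steps: int, warmup_fraction: float = 0.1) -> float:
    """Learning-rate multiplier: linear ramp from 0 to 1, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Warmup phase: scale up linearly with the step count.
        return step / max(1, warmup_steps)
    # Decay phase: scale down linearly until the final step.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without a lower validation loss."""

    def __init__(self, patience: int = 2) -> None:
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Illustrative values only: 100 optimizer steps and made-up validation losses.
for s in (0, 5, 10, 55, 100):
    print(f"step {s}: lr factor {linear_warmup_factor(s, total_steps=100):.2f}")

stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([0.40, 0.41, 0.42], start=1):
    if stopper.step(loss):
        print(f"early stop after epoch {epoch}")
        break
```

With `num_epochs = 3` and a patience of `2`, early stopping in this sketch can only trigger when the second and third epochs both fail to improve on the first.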