Yelp Review Classifier

This model classifies Yelp reviews by sentiment, predicting a star rating from 1 to 5 for each review. It was fine-tuned from the distilbert-base-uncased checkpoint of DistilBERT, available on Hugging Face, using a Yelp reviews dataset.

Model Details

  • Model Type: DistilBERT-based model for sequence classification
  • Model Architecture: DistilBERT, fine-tuned from the distilbert-base-uncased checkpoint
  • Number of Parameters: Approximately 67M parameters (F32 tensors)
  • Training Dataset: The model was trained on a curated Yelp reviews dataset, labeled for star ratings (1 to 5 stars).
  • Fine-Tuning Task: Multi-class classification for Yelp reviews, predicting the star rating (from 1 to 5 stars) based on the content of the review.

Training Data

  • Dataset: Custom Yelp reviews dataset
  • Data Description: The dataset consists of Yelp reviews, labeled for star ratings (1 to 5 stars).
  • Preprocessing: The dataset was preprocessed by cleaning the reviews to remove unwanted characters and URLs.
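
The exact cleaning rules are not documented in this card, so the following is only a minimal sketch of what such a preprocessing step might look like. The function name `clean_review` and the regular expressions are assumptions made for illustration, not the author's actual pipeline.

```python
import re

# Hypothetical patterns; the original cleaning rules are not documented.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Keep letters, digits, whitespace, and basic punctuation; drop everything else.
UNWANTED_PATTERN = re.compile(r"[^A-Za-z0-9\s.,!?'-]")
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Remove URLs and unwanted characters, then collapse whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = UNWANTED_PATTERN.sub(" ", text)
    return WHITESPACE_PATTERN.sub(" ", text).strip()


print(clean_review("Great tacos!! See https://example.com/menu ### 5 stars"))
# Great tacos!! See 5 stars
```

Lowercasing is left out on purpose, since the uncased DistilBERT tokenizer already lowercases its input.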

Training Details

  • Training Framework: Hugging Face Transformers and PyTorch
  • Learning Rate: 2e-5
  • Epochs: 6
  • Batch Size: 16
  • Optimizer: AdamW
  • Training Time: Approximately 2 hours on a GPU
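
As a rough illustration, the hyperparameters above can be collected into a configuration dictionary. The key names follow keyword arguments of `transformers.TrainingArguments`, but whether the author used the Trainer API is an assumption, and the output path and dataset size below are placeholders that do not come from this card.

```python
import math

# Hyperparameters as listed above. Key names mirror keyword arguments of
# transformers.TrainingArguments; use of the Trainer API is an assumption.
training_kwargs = {
    "output_dir": "yelp-distilbert",  # hypothetical output path
    "learning_rate": 2e-5,
    "num_train_epochs": 6,
    "per_device_train_batch_size": 16,
    "optim": "adamw_torch",  # AdamW
}


def total_optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer updates on one device, without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# The dataset size is not stated in this card; 50_000 is a placeholder.
hypothetical_num_examples = 50_000
print(total_optimizer_steps(
    hypothetical_num_examples,
    training_kwargs["per_device_train_batch_size"],
    training_kwargs["num_train_epochs"],
))
# 18750
```

With Transformers installed, the dictionary could be passed as `TrainingArguments(**training_kwargs)`.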

Usage

To use the model for inference, you can use the following code:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer from Hugging Face
model_name = "kmack/YELP-Review_Classifier"  # Replace with your model name if different
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# List of reviews for prediction
reviews = [
    "The food was absolutely delicious, and the atmosphere was perfect for a family gathering. The staff was friendly, and we had a great time. Definitely coming back!",
    "It was decent, but nothing special. The food was okay, but the service was a bit slow. I think there are better places around.",
    "I had a terrible experience. The waiter was rude, and the food was cold when it arrived. I won't be returning anytime soon."
]

# Map prediction to star ratings
label_map = {
    0: "1 Star",
    1: "2 Stars",
    2: "3 Stars",
    3: "4 Stars",
    4: "5 Stars"
}

# Iterate over each review and get the prediction
for review in reviews:
    # Tokenize the input text
    inputs = tokenizer(review, return_tensors="pt", padding=True, truncation=True)

    # Get predictions
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the predicted label (0 to 4 for star ratings)
    prediction = torch.argmax(outputs.logits, dim=-1).item()

    # Map prediction to star rating
    predicted_rating = label_map[prediction]

    print(f"Rating: {predicted_rating}\n")
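
The usage code above reports only the most likely rating. If a confidence score is also wanted, the logits can be passed through a softmax. The sketch below does this in plain Python so it runs without the model; the helper names `softmax` and `rating_with_confidence` and the example logits are made up for illustration.

```python
import math

# Same index-to-rating mapping as in the usage code above.
LABEL_MAP = {0: "1 Star", 1: "2 Stars", 2: "3 Stars", 3: "4 Stars", 4: "5 Stars"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rating_with_confidence(logits):
    """Return the predicted rating and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_MAP[best], probs[best]


# Made-up logits for illustration; in practice use outputs.logits[0].tolist().
example_logits = [-1.2, -0.4, 0.3, 1.1, 2.5]
rating, confidence = rating_with_confidence(example_logits)
print(f"Rating: {rating} (confidence {confidence:.2f})")
```

A softmax probability is not a calibrated estimate of correctness, so it is best treated as a relative score.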

Citation

If you use this model in your research, please cite the following:

@misc{kmack2024yelp,
  author = {Kmack},
  title = {YELP-Review_Classifier},
  year = {2024},
  url = {https://huggingface.co/kmack/YELP-Review_Classifier}
}