Fine-tuned BERT model for classifying community posts

This DistilBERT model was fine-tuned on ~20,000 community postings using the HuggingFace adapter of Kern AI's refinery. The postings are comments from various forums and social media sites. Fine-tuning ran for about two hours on a single NVIDIA K80 GPU.

Join our Discord if you have questions about this model: https://discord.gg/MdZyqSxKbe

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a language model introduced by Google researchers in 2018. It’s designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.

BERT is based on the transformer architecture and uses a WordPiece tokenizer, which splits text into subword tokens and maps each token to an integer ID from a fixed vocabulary. This model adds a classification head on top of the encoder, so it outputs class predictions directly and is intended specifically for text classification.
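To make the tokenization step concrete, here is a minimal sketch of WordPiece's greedy longest-match-first splitting. The vocabulary `VOCAB` and the helpers `wordpiece` and `encode` are hypothetical and deliberately tiny; the real tokenizer loaded with `AutoTokenizer` uses the checkpoint's own vocabulary of tens of thousands of entries and also splits off punctuation before this step. The lowercasing assumes an uncased checkpoint.

```python
# Hypothetical toy vocabulary; a real checkpoint ships a much larger one.
VOCAB = {"[UNK]": 0, "play": 1, "##ing": 2, "##ed": 3, "forum": 4, "##s": 5, "great": 6}


def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into subword tokens using greedy longest-match-first."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking it until it is in the vocab.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # pieces that continue a word carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # if any part cannot be matched, the whole word becomes [UNK]
        tokens.append(match)
        start = end
    return tokens


def encode(text, vocab):
    """Lowercase, split on whitespace, and map every subword token to its integer ID."""
    tokens = [t for w in text.lower().split() for t in wordpiece(w, vocab)]
    return tokens, [vocab[t] for t in tokens]


print(encode("Great forums playing", VOCAB))
```

Here "forums" becomes `forum` + `##s` and "playing" becomes `play` + `##ing`, so words missing from the vocabulary can still be represented by known pieces; `[UNK]` is used only when no split exists.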

DISCLAIMER: Currently, the model has a slight bias towards neutral and positive predictions.

Features

The model is designed for text classification, in particular of postings from forums and community sites.
The model outputs one of three classes, "positive", "neutral", or "negative", together with its confidence score for that class.
The model was fine-tuned on a custom dataset that was curated by Kern AI and labeled in our tool, refinery.
The model is currently available for PyTorch and can be run through the HuggingFace Pipeline API.
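In a classifier like this, the confidence score is typically the softmax probability of the predicted class. The sketch below shows that post-processing on raw logits. The logits and the label order in `ID2LABEL` are made up for illustration; the actual mapping for this checkpoint is stored in `model.config.id2label`.

```python
import math

# Hypothetical label order; check model.config.id2label for the real mapping.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the most likely label and its probability as the confidence score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


print(predict([-1.2, 0.3, 2.9]))
```

For these example logits, `predict` returns the label "positive" with a score of about 0.92. Subtracting the maximum logit does not change the probabilities but prevents overflow when logits are large.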

Usage

To use the model, install the HuggingFace Transformers library together with PyTorch:

pip install transformers torch

Then you can load the model and the tokenizer from the HuggingFace Hub:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("KernAI/community-sentiment-bert")
tokenizer = AutoTokenizer.from_pretrained("KernAI/community-sentiment-bert")

To classify a single sentence or a sentence pair, you can use the HuggingFace Pipeline API:

from transformers import pipeline

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
result = classifier("This is a positive sentence.")
print(result)
# [{'label': 'Positive', 'score': 0.9998656511306763}]
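Passing a list of posts to `classifier` returns one `{'label': ..., 'score': ...}` dict per post. Given the slight bias towards neutral and positive predictions mentioned in the disclaimer, one option is to route low-confidence predictions to manual review. The sketch below is a hypothetical helper, `triage`, that works on results in that shape; the posts, scores, and the 0.6 threshold are illustrative assumptions, not outputs of this model.

```python
from collections import Counter


def triage(posts, results, min_score=0.6):
    """Tally predicted labels and collect posts whose confidence is below min_score."""
    if len(posts) != len(results):
        raise ValueError("posts and results must have the same length")
    counts = Counter(r["label"] for r in results)
    uncertain = [
        (post, r["label"], r["score"])
        for post, r in zip(posts, results)
        if r["score"] < min_score
    ]
    return counts, uncertain


# Illustrative inputs shaped like classifier(posts) output; the scores are made up.
posts = ["Love this community!", "Where is the FAQ?", "This update broke everything."]
results = [
    {"label": "Positive", "score": 0.98},
    {"label": "Neutral", "score": 0.55},
    {"label": "Negative", "score": 0.91},
]

counts, uncertain = triage(posts, results)
print(counts)     # one post per label
print(uncertain)  # only the low-confidence neutral post
```

The threshold is a tuning knob: raising it sends more posts to review, lowering it trusts the model more.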