The model is a port of our CommentBERT model from the paper:

@inproceedings{ochodek2022automated,
  title={Automated code review comment classification to improve modern code reviews},
  author={Ochodek, Miroslaw and Staron, Miroslaw and Meding, Wilhelm and S{\"o}der, Ola},
  booktitle={International Conference on Software Quality},
  pages={23--40},
  year={2022},
  organization={Springer}
}

The original model was implemented in Keras with two outputs: comment-purpose and subject-purpose. Here, we split it into two separate models with one output each.


license: apache-2.0

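import numpy as np
from scipy.special import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the classification model from the Hugging Face Hub.
checkpoint = 'mochodek/bert4comment-purpose'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# Mapping from output index to comment-purpose class.
id2class = {
    0: 'discussion_participation',
    1: 'discussion_trigger',
    2: 'change_request',
    3: 'acknowledgement',
    4: 'same_as'
}

text = "Please, make constant from that string"
encoded_input = tokenizer(text, return_tensors='pt')

output = model(**encoded_input)

# Logits have shape (1, num_classes); apply softmax over the class axis
# and take the row for the single input comment.
scores = softmax(output.logits.detach().numpy(), axis=-1)[0]

# Predicted class label for the comment.
id2class[int(np.argmax(scores))]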
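
When the full class distribution is of interest rather than only the top prediction, the logits can be turned into a ranked list of labels. The following is a minimal sketch, not part of the original model card: the helper name `rank_classes` is hypothetical, and the example logits are made-up values rather than real model output. It only needs NumPy, so it can be tried without downloading the checkpoint.

```python
import numpy as np

# Class mapping copied from the usage example for the comment-purpose model.
ID2CLASS = {
    0: 'discussion_participation',
    1: 'discussion_trigger',
    2: 'change_request',
    3: 'acknowledgement',
    4: 'same_as',
}


def rank_classes(logits, id2class=ID2CLASS):
    """Hypothetical helper: return (label, probability) pairs, most likely first."""
    logits = np.asarray(logits, dtype=float).reshape(-1)
    # Numerically stable softmax: subtract the maximum before exponentiating.
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    probs = exp / exp.sum()
    # Sort class indices by descending probability.
    order = np.argsort(probs)[::-1]
    return [(id2class[int(i)], float(probs[i])) for i in order]


# Made-up logits for illustration only (index 2 is the largest).
example_logits = [0.1, -1.2, 3.4, 0.0, -0.5]
ranking = rank_classes(example_logits)
for label, prob in ranking:
    print(f"{label}: {prob:.3f}")
```

With real model output, the same helper could be called as `rank_classes(output.logits.detach().numpy())` for a single input comment.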