
GitHub issues classifier (using zero-shot classification)

Predicts whether a statement is a feature request, an issue/bug or a question.

This model was trained with the zero-shot classifier distillation method, using the BART-large-mnli model as the teacher, on GitHub issues from the GitHub Issues Prediction dataset.
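A minimal sketch of the distillation objective, assuming the common formulation in which the teacher's class probabilities act as soft targets and the student is trained to minimize cross-entropy against them. The helper names and the example probabilities are hypothetical and not taken from the actual training code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(student_logits, teacher_probs):
    """Cross-entropy of the student's predicted distribution against the
    teacher's soft targets (hypothetical helper, for illustration only)."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# Hypothetical teacher distribution over (issue, feature request, question)
teacher = [0.7, 0.2, 0.1]
print(soft_cross_entropy([2.0, 0.5, -1.0], teacher))
```

The loss is lowest when the student's distribution matches the teacher's, so the student learns to reproduce the teacher's predictions without needing human labels.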

Labels

As in the Kaggle competition for this dataset, the classifier predicts whether an issue is a bug, a feature or a question. After experimenting with different label names before training, I used a different mapping of labels that yielded better predictions (see the accompanying notebook for details). The labels are:

  • issue
  • feature request
  • question
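The relabeling can be sketched as a simple lookup table. This assumes the dataset's labels (bug, feature, question) map to the class names above in the order listed; `LABEL_MAP` and `to_class_name` are hypothetical names used only for illustration.

```python
# Assumed mapping from Kaggle dataset labels to the class names used here
LABEL_MAP = {
    "bug": "issue",
    "feature": "feature request",
    "question": "question",
}

# Candidate class names, in the same order as the list above
CLASS_NAMES = list(LABEL_MAP.values())

def to_class_name(dataset_label: str) -> str:
    """Translate a dataset label (case-insensitive) into the class name."""
    return LABEL_MAP[dataset_label.strip().lower()]

print(to_class_name("Bug"))
```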

Training data

  • 15k GitHub issue titles ("unlabeled_titles_simple.txt")
  • Hypothesis used: "This request is a {}"
  • Teacher model used: valhalla/distilbart-mnli-12-1
  • Student model used: distilbert-base-uncased
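In NLI-based zero-shot classification, each issue title serves as the premise and each class name is substituted into the hypothesis template; the teacher then scores entailment for every (premise, hypothesis) pair, and those scores are normalized across classes. The sketch below shows only the pair construction, assuming the template and class names listed on this card; `build_nli_pairs` is a hypothetical helper.

```python
HYPOTHESIS_TEMPLATE = "This request is a {}"
CLASS_NAMES = ["issue", "feature request", "question"]

def build_nli_pairs(title, class_names=CLASS_NAMES, template=HYPOTHESIS_TEMPLATE):
    """Pair an issue title (premise) with one hypothesis per candidate class."""
    return [(title, template.format(name)) for name in class_names]

for premise, hypothesis in build_nli_pairs("App crashes when opening settings"):
    print(hypothesis)
```

Note that the template is filled in verbatim, so the first hypothesis reads "This request is a issue".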

Results

Agreement of student and teacher predictions: 94.82%
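The card does not state exactly how agreement is computed. A common definition, assumed here, is the fraction of examples on which the student's top-scoring class matches the teacher's. The function names are hypothetical.

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)

def agreement(student_scores, teacher_scores):
    """Fraction of examples where student and teacher pick the same top class."""
    if len(student_scores) != len(teacher_scores):
        raise ValueError("score lists must have the same length")
    matches = sum(
        argmax(s) == argmax(t) for s, t in zip(student_scores, teacher_scores)
    )
    return matches / len(student_scores)
```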

See the accompanying notebook for more information on the feature engineering choices made.

How to train using your own dataset

Acknowledgements
