---
license: other
datasets:
- taishi-i/awesome-japanese-nlp-classification-dataset
language:
- en
- ja
metrics:
- f1
library_name: transformers
pipeline_tag: text-classification
---

# Model overview

This model is the baseline model for [awesome-japanese-nlp-classification-dataset](https://huggingface.co/datasets/taishi-i/awesome-japanese-nlp-classification-dataset). It was trained on this dataset, the checkpoint to save was chosen using the development data, and the evaluation used the test data. The following table shows the evaluation results on the test data.

| Label        | Precision | Recall | F1-Score | Support |
|--------------|-----------|--------|----------|---------|
| 0            | 0.98      | 0.99   | 0.98     | 796     |
| 1            | 0.79      | 0.70   | **0.74** | 60      |
| Accuracy     |           |        | 0.97     | 856     |
| Macro Avg    | 0.89      | 0.84   | 0.86     | 856     |
| Weighted Avg | 0.96      | 0.97   | 0.97     | 856     |

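The summary rows can be related back to the per-class rows. As a rough check, and assuming the table follows the usual scikit-learn `classification_report` conventions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean over labels, accuracy equal to the support-weighted recall), the sketch below recomputes a few cells from the rounded values in the table. Because the inputs are already rounded to two decimals, recomputed values only match the table approximately. The helper names are illustrative and not part of any library.

```python
# Per-class values copied from the table above (already rounded to 2 decimals).
rows = {
    "0": {"precision": 0.98, "recall": 0.99, "f1": 0.98, "support": 796},
    "1": {"precision": 0.79, "recall": 0.70, "f1": 0.74, "support": 60},
}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_avg(key):
    """Unweighted mean over labels: every class counts equally."""
    return sum(r[key] for r in rows.values()) / len(rows)


def weighted_avg(key):
    """Mean over labels, weighted by each label's support."""
    total = sum(r["support"] for r in rows.values())
    return sum(r[key] * r["support"] for r in rows.values()) / total


print(f"F1 for label 1:  {f1_from(0.79, 0.70):.2f}")
print(f"Macro F1:        {macro_avg('f1'):.2f}")
# Support-weighted recall is the same quantity as accuracy.
print(f"Weighted recall: {weighted_avg('recall'):.2f}")
```

The gap between the macro and weighted averages comes from the class imbalance: label 1 has only 60 of the 856 test examples, so it barely moves the weighted average but counts for half of the macro average.
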
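# Usage

Install the following library. The pipeline also needs a deep learning backend such as PyTorch (`pip install torch`), which the install command in the Evaluation section includes.

```bash
pip install transformers
```

You can run the classification model with the `pipeline` function.
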
```python
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="taishi-i/awesome-japanese-nlp-classification-model",
)

# Relevant sample
# (English: "Support page for 'Natural Language Processing with Deep Learning' (Kyoritsu Shuppan)")
text = "ディープラーニングによる自然言語処理(共立出版)のサポートページです"
label = pipe(text)
print(label)  # [{'label': '1', 'score': 0.9910495281219482}]

# Not Relevant sample
# (English: "A desktop app for managing AI illustrations")
text = "AIイラストを管理するデスクトップアプリ"
label = pipe(text)
print(label)  # [{'label': '0', 'score': 0.9986791014671326}]
```

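For more than a handful of inputs, it can be convenient to pass a list of texts to the pipeline and attach readable names to the numeric labels. The sketch below assumes, based on the two samples above, that label `'1'` means relevant and `'0'` means not relevant. `LABEL_NAMES` and `classify` are hypothetical helpers, not part of the model or of `transformers`.

```python
# Assumed mapping, inferred from the "Relevant" / "Not Relevant" samples above.
LABEL_NAMES = {"1": "relevant", "0": "not relevant"}


def classify(texts, pipe):
    """Classify texts and return (text, label name, score) tuples.

    `pipe` is a text-classification pipeline such as the one created above.
    Called with a list of strings, it returns one result dict per input.
    """
    texts = list(texts)  # materialize once so generators also work
    results = pipe(texts)
    return [
        # Fall back to the raw label if it is not in the assumed mapping.
        (text, LABEL_NAMES.get(result["label"], result["label"]), result["score"])
        for text, result in zip(texts, results)
    ]
```

Calling `classify([text], pipe)` with the `pipe` object created earlier returns one tuple per input text.
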
# Evaluation

Install the following libraries.

```bash
pip install evaluate scikit-learn datasets transformers torch
```

```python
import evaluate
from datasets import load_dataset
from sklearn.metrics import classification_report
from transformers import pipeline

# Evaluation dataset
dataset = load_dataset("taishi-i/awesome-japanese-nlp-classification-dataset")

# Text classification model
pipe = pipeline(
    "text-classification",
    model="taishi-i/awesome-japanese-nlp-classification-model",
)

# Evaluation metric
f1 = evaluate.load("f1")

# Predict a label for every text in the test split
predicted_labels = []
for text in dataset["test"]["text"]:
    prediction = pipe(text)
    predicted_label = prediction[0]["label"]  # label is a string such as "0" or "1"
    predicted_labels.append(int(predicted_label))

score = f1.compute(
    predictions=predicted_labels, references=dataset["test"]["label"]
)
print(score)

report = classification_report(
    y_true=dataset["test"]["label"], y_pred=predicted_labels
)
print(report)
```

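With its default settings (binary averaging with label 1 as the positive class, as in scikit-learn's `f1_score`), the `f1` metric should report the F1 of label 1, so `score` is expected to correspond to the label 1 row of the report rather than to the macro or weighted averages. One reason to focus on that number is the class imbalance in the test data. The pure-Python sketch below is illustrative only: it rebuilds the label counts from the support column of the results table and scores a hypothetical baseline that always predicts 0. `binary_f1` is a helper written for this example.

```python
def binary_f1(references, predictions, pos_label=1):
    """F1 of the positive class, computed from raw label lists."""
    pairs = list(zip(references, predictions))
    tp = sum(1 for ref, pred in pairs if ref == pos_label and pred == pos_label)
    fp = sum(1 for ref, pred in pairs if ref != pos_label and pred == pos_label)
    fn = sum(1 for ref, pred in pairs if ref == pos_label and pred != pos_label)
    if tp == 0:
        return 0.0  # no true positives: treat F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Label counts taken from the support column of the results table.
references = [0] * 796 + [1] * 60
# Hypothetical baseline (not the model): always predict label 0.
baseline = [0] * len(references)

accuracy = sum(r == p for r, p in zip(references, baseline)) / len(references)
print(f"baseline accuracy:     {accuracy:.2f}")
print(f"baseline F1 (label 1): {binary_f1(references, baseline):.2f}")
```

The baseline reaches about 0.93 accuracy while its F1 for label 1 is 0, which is why accuracy alone says little about how well relevant repositories are found.
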
# License

This model was trained on a dataset collected through the GitHub API under [GitHub Acceptable Use Policies - 7. Information Usage Restrictions](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions) and [GitHub Terms of Service - H. API Terms](https://docs.github.com/en/site-policy/github-terms/github-terms-of-service#h-api-terms). It should be used solely for research verification purposes. Adhering to GitHub's regulations is mandatory.