---
license: other
datasets:
- taishi-i/awesome-japanese-nlp-classification-dataset
language:
- en
- ja
metrics:
- f1
library_name: transformers
pipeline_tag: text-classification
---

# Model overview

This model is the baseline for the [awesome-japanese-nlp-classification-dataset](https://huggingface.co/datasets/taishi-i/awesome-japanese-nlp-classification-dataset). It was trained on this dataset, the saved checkpoint was selected using the development data, and the model was evaluated on the test data. The following table shows the evaluation results on the test data.

| Label        | Precision | Recall | F1-Score | Support |
|--------------|-----------|--------|----------|---------|
| 0            | 0.98      | 0.99   | 0.98     | 796     |
| 1            | 0.79      | 0.70   | **0.74** | 60      |
| Accuracy     |           |        | 0.97     | 856     |
| Macro Avg    | 0.89      | 0.84   | 0.86     | 856     |
| Weighted Avg | 0.96      | 0.97   | 0.97     | 856     |
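
As a quick sanity check on the table, the F1 column can be recomputed from the precision and recall columns. The sketch below is illustrative only: the helper `f1_score` is a hypothetical name, and because it starts from the already rounded table values, other rows (for example the weighted average) may differ from the reported figures in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the table above (rounded to two decimals).
per_label = {"0": (0.98, 0.99), "1": (0.79, 0.70)}

f1_by_label = {label: f1_score(p, r) for label, (p, r) in per_label.items()}

# Macro F1 is the unweighted mean of the per-label F1 scores.
macro_f1 = sum(f1_by_label.values()) / len(f1_by_label)

for label, value in f1_by_label.items():
    print(f"label {label}: F1 = {value:.2f}")
print(f"macro F1 = {macro_f1:.2f}")
```

The gap between the two labels reflects the class imbalance in the test data (796 examples of label 0 versus 60 of label 1), which is why the macro average is lower than the accuracy.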


# Usage

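Please install the following libraries. The pipeline needs a deep learning backend; PyTorch is used here, matching the Evaluation section.

```bash
pip install transformers torch
```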

You can run the classification model with the `pipeline` function.

```python
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="taishi-i/awesome-japanese-nlp-classification-model",
)

# Relevant sample
text = "ディープラーニングによる自然言語処理(共立出版)のサポートページです"
label = pipe(text)
print(label) # [{'label': '1', 'score': 0.9910495281219482}]

# Not Relevant sample
text = "AIイラストを管理するデスクトップアプリ"
label = pipe(text)
print(label) # [{'label': '0', 'score': 0.9986791014671326}]
```
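
The pipeline returns a list of dictionaries with a string label and a score. A small helper can turn that into a readable result. This is a minimal sketch: `LABEL_NAMES` and `describe` are hypothetical names, the label meanings are inferred from the "Relevant" and "Not Relevant" comments above, and the inputs are the example outputs shown above rather than live model calls.

```python
# Label meanings inferred from the comments in the usage example (an assumption).
LABEL_NAMES = {"0": "not relevant", "1": "relevant"}


def describe(predictions):
    """Return (readable label, score) for the first entry of a pipeline output."""
    top = predictions[0]
    return LABEL_NAMES[top["label"]], top["score"]


# Outputs copied from the usage example above.
relevant_output = [{"label": "1", "score": 0.9910495281219482}]
not_relevant_output = [{"label": "0", "score": 0.9986791014671326}]

for output in (relevant_output, not_relevant_output):
    name, score = describe(output)
    print(f"{name} (score={score:.3f})")
```

In real use, pass the result of `pipe(text)` to `describe` instead of the hard-coded outputs.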

# Evaluation

Please install the following libraries.

```bash
pip install evaluate scikit-learn datasets transformers torch
```

```python
import evaluate
from datasets import load_dataset
from sklearn.metrics import classification_report
from transformers import pipeline

# Evaluation dataset
dataset = load_dataset("taishi-i/awesome-japanese-nlp-classification-dataset")

# Text classification model
pipe = pipeline(
    "text-classification",
    model="taishi-i/awesome-japanese-nlp-classification-model",
)

# Evaluation metric
f1 = evaluate.load("f1")

# Predict a label for every test example
predicted_labels = []
for text in dataset["test"]["text"]:
    prediction = pipe(text)
    predicted_label = prediction[0]["label"]
    predicted_labels.append(int(predicted_label))

score = f1.compute(
    predictions=predicted_labels, references=dataset["test"]["label"]
)
print(score)

report = classification_report(
    y_true=dataset["test"]["label"], y_pred=predicted_labels
)
print(report)
```
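
With its default settings, the `f1` metric reports the binary F1 score for the positive class (label 1), so the printed score corresponds to the label 1 row of the classification report. The sketch below shows that computation in plain Python. It is an illustration, not the library's implementation: `binary_f1` is a hypothetical helper and the label lists are made-up toy data.

```python
def binary_f1(references, predictions, positive_label=1):
    """Compute the F1 score for one class from true and predicted labels."""
    pairs = list(zip(references, predictions))
    tp = sum(1 for r, p in pairs if r == positive_label and p == positive_label)
    fp = sum(1 for r, p in pairs if r != positive_label and p == positive_label)
    fn = sum(1 for r, p in pairs if r == positive_label and p != positive_label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (not taken from the dataset).
toy_references = [1, 0, 1, 1, 0, 0]
toy_predictions = [1, 0, 0, 1, 1, 0]

# Two true positives, one false positive, one false negative.
print(f"{binary_f1(toy_references, toy_predictions):.4f}")
```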

# License

This model was trained on a dataset collected from the GitHub API under [GitHub Acceptable Use Policies - 7. Information Usage Restrictions](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions) and [GitHub Terms of Service - H. API Terms](https://docs.github.com/en/site-policy/github-terms/github-terms-of-service#h-api-terms). It should be used solely for research verification purposes, and users must comply with GitHub's regulations.