
	---
	license: other
	datasets:
	- taishi-i/awesome-japanese-nlp-classification-dataset
	language:
	- en
	- ja
	metrics:
	- f1
	library_name: transformers
	pipeline_tag: text-classification
	---

	# Model overview

	This is the baseline model for [awesome-japanese-nlp-classification-dataset](https://huggingface.co/datasets/taishi-i/awesome-japanese-nlp-classification-dataset). It was trained on this dataset, the saved checkpoint was selected using the development data, and the model was evaluated on the test data. The following table shows the evaluation results on the test set.

	| Label        | Precision | Recall | F1-Score | Support |
	|--------------|-----------|--------|----------|---------|
	| 0            | 0.98      | 0.99   | 0.98     | 796     |
	| 1            | 0.79      | 0.70   | 0.74     | 60      |
	| Accuracy     |           |        | 0.97     | 856     |
	| Macro Avg    | 0.89      | 0.84   | 0.86     | 856     |
	| Weighted Avg | 0.96      | 0.97   | 0.97     | 856     |
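
As a quick consistency check on the table, the F1 values can be recomputed from the precision and recall columns. The sketch below is illustrative only: `f1_from` is a hypothetical helper name, and because the inputs are already rounded to two decimals, the results agree with the table only after rounding.

```python
# Recompute F1 from the rounded precision/recall values in the table above.
# `f1_from` is a hypothetical helper used only for this illustration.

def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) per label, copied from the table
table = {"0": (0.98, 0.99), "1": (0.79, 0.70)}

f1_by_label = {label: f1_from(p, r) for label, (p, r) in table.items()}

# Macro average: unweighted mean over labels, so label 1 counts as much as label 0
macro_f1 = sum(f1_by_label.values()) / len(f1_by_label)

for label, value in f1_by_label.items():
    print(f"label {label}: F1 = {value:.2f}")
print(f"macro F1 = {macro_f1:.2f}")
```

Label 1 accounts for only 60 of the 856 test examples, which is why the macro-averaged F1 (0.86) sits well below the accuracy (0.97).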


	# Usage

	Please install the following library.

	```bash
	pip install transformers
	```

	You can run the classification model with the `pipeline` function. Label `1` marks a relevant text and label `0` a not relevant one.

	```python
	from transformers import pipeline

	pipe = pipeline(
	    "text-classification",
	    model="taishi-i/awesome-japanese-nlp-classification-model",
	)

	# Relevant sample
	# (English: "This is the support page for 'Natural Language Processing with Deep Learning' (Kyoritsu Shuppan)")
	text = "ディープラーニングによる自然言語処理（共立出版）のサポートページです"
	label = pipe(text)
	print(label)  # [{'label': '1', 'score': 0.9910495281219482}]

	# Not Relevant sample
	# (English: "A desktop app for managing AI illustrations")
	text = "AIイラストを管理するデスクトップアプリ"
	label = pipe(text)
	print(label)  # [{'label': '0', 'score': 0.9986791014671326}]
	```
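
The pipeline returns the label as a string (`'0'` or `'1'`). A minimal sketch of mapping that output to a readable name follows. `LABEL_NAMES` and `describe` are hypothetical names, and the name strings are taken from the sample comments above, not from the model's configuration. The input is the output printed in the usage example, so the snippet runs without downloading the model.

```python
# Hypothetical mapping; names follow the "Relevant" / "Not Relevant" sample comments.
LABEL_NAMES = {"0": "Not Relevant", "1": "Relevant"}


def describe(prediction: list[dict]) -> str:
    """Format the first entry of a text-classification pipeline result."""
    top = prediction[0]
    name = LABEL_NAMES[top["label"]]
    return f"{name} (score={top['score']:.3f})"


# Output copied from the usage example above
sample = [{"label": "1", "score": 0.9910495281219482}]
print(describe(sample))  # Relevant (score=0.991)
```

With a loaded pipeline, the same helper would be called as `describe(pipe(text))`.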

	# Evaluation

	Please install the following libraries.

	```bash
	pip install evaluate scikit-learn datasets transformers torch
	```

	```python
	import evaluate
	from datasets import load_dataset
	from sklearn.metrics import classification_report
	from transformers import pipeline

	# Evaluation dataset
	dataset = load_dataset("taishi-i/awesome-japanese-nlp-classification-dataset")

	# Text classification model
	pipe = pipeline(
	    "text-classification",
	    model="taishi-i/awesome-japanese-nlp-classification-model",
	)

	# Evaluation metric
	f1 = evaluate.load("f1")

	# Predict process
	predicted_labels = []
	for text in dataset["test"]["text"]:
	    prediction = pipe(text)
	    predicted_label = prediction[0]["label"]
	    predicted_labels.append(int(predicted_label))

	score = f1.compute(
	    predictions=predicted_labels, references=dataset["test"]["label"]
	)
	print(score)

	report = classification_report(
	    y_true=dataset["test"]["label"], y_pred=predicted_labels
	)
	print(report)
	```
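
To make concrete what the label 1 row of the report measures, the sketch below counts true positives, false positives, and false negatives by hand. The two label lists are made up for illustration and are not drawn from the dataset, and `binary_prf` is a hypothetical helper, not part of `evaluate` or scikit-learn. With its default arguments, `f1.compute` should report the same positive-class F1, assuming label 1 is treated as the positive class.

```python
def binary_prf(references, predictions, positive=1):
    """Precision, recall, and F1 for one class, computed from raw counts."""
    pairs = list(zip(references, predictions))
    tp = sum(1 for r, p in pairs if r == positive and p == positive)
    fp = sum(1 for r, p in pairs if r != positive and p == positive)
    fn = sum(1 for r, p in pairs if r == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: 2 true positives, 1 false positive, 1 false negative
references = [0, 0, 0, 0, 1, 1, 1, 0]
predictions = [0, 0, 1, 0, 1, 0, 1, 0]

precision, recall, f1_value = binary_prf(references, predictions)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1_value:.2f}")
```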

	# License

	This model was trained on a dataset collected through the GitHub API under [GitHub Acceptable Use Policies - 7. Information Usage Restrictions](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions) and [GitHub Terms of Service - H. API Terms](https://docs.github.com/en/site-policy/github-terms/github-terms-of-service#h-api-terms). It should be used solely for research verification purposes, and users must comply with GitHub's policies.