poltextlab/HunEmBERT3

	---
	license: apache-2.0
	language:
	- hu
	metrics:
	- accuracy
	model-index:
	- name: huBERTPlain
	  results:
	  - task:
	      type: text-classification
	    metrics:
	    - type: f1
	      value: 0.91
	widget:
	- text: "A vegetációs időben az országban rendszeresen jelentkező jégesők ellen is van mód védekezni lokálisan, ki-ki a saját nagy értékű ültetvényén."
	  example_title: "Positive"

	- text: "Magyarország több évtizede küzd demográfiai válsággal, és egyre több gyermekre vágyó pár meddőségi problémákkal néz szembe."
	  example_title: "Negative"

	- text: "Tisztelt fideszes, KDNP-s Képviselőtársaim!"
	  example_title: "Neutral"

	---

	## Model description

	A cased BERT model for Hungarian, fine-tuned on manually annotated parliamentary pre-agenda speeches scraped from `parlament.hu`.

	## Intended uses & limitations

	The model can be used like any other cased BERT model. It has been tested on classifying sentences in parliamentary pre-agenda speeches as positive, negative, or neutral, with the following label mapping:
	* 'Label_0': Neutral
	* 'Label_1': Positive
	* 'Label_2': Negative
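
The label mapping above can be turned into a small lookup helper. This is a minimal sketch: the names `SENTIMENT_BY_LABEL` and `to_sentiment` are illustrative and not part of the model or of `transformers`. The lookup is case-insensitive on the assumption that the label string may be reported as `LABEL_0` rather than `Label_0`, depending on the model configuration.

```python
# Mapping taken from the list above; keys are lower-cased for lookup.
SENTIMENT_BY_LABEL = {
    "label_0": "Neutral",
    "label_1": "Positive",
    "label_2": "Negative",
}


def to_sentiment(label: str) -> str:
    """Translate a raw label such as 'Label_1' into its sentiment name."""
    key = label.strip().lower()
    if key not in SENTIMENT_BY_LABEL:
        raise ValueError(f"Unknown label: {label!r}")
    return SENTIMENT_BY_LABEL[key]


print(to_sentiment("Label_1"))
```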

	## Training

	The model is a fine-tuned version of the original huBERT model (`SZTAKI-HLT/hubert-base-cc`), trained on the HunEmPoli corpus. The label distribution by emotion category and by sentiment class is:

	| Category | Count | Ratio | Sentiment | Count | Ratio |
	| -------- | ----- | ------ | --------- | ----- | ------ |
	| Neutral | 351 | 1.85% | Neutral | 351 | 1.85% |
	| Fear | 162 | 0.85% | Negative | 11180 | 58.84% |
	| Sadness | 4258 | 22.41% | | | |
	| Anger | 643 | 3.38% | | | |
	| Disgust | 6117 | 32.19% | | | |
	| Success | 6602 | 34.74% | Positive | 7471 | 39.32% |
	| Joy | 441 | 2.32% | | | |
	| Trust | 428 | 2.25% | | | |
	| Sum | 19002 | | | | |
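
The sentiment columns follow from the emotion columns by summing the emotion counts within each group. The sketch below reproduces that arithmetic. The grouping in `EMOTION_TO_SENTIMENT` is read off the table layout and is an assumption about how the categories were merged; the function names are illustrative.

```python
# Emotion counts as listed in the table.
EMOTION_COUNTS = {
    "Neutral": 351,
    "Fear": 162,
    "Sadness": 4258,
    "Anger": 643,
    "Disgust": 6117,
    "Success": 6602,
    "Joy": 441,
    "Trust": 428,
}

# Assumed grouping, inferred from which rows sit next to each sentiment total.
EMOTION_TO_SENTIMENT = {
    "Neutral": "Neutral",
    "Fear": "Negative",
    "Sadness": "Negative",
    "Anger": "Negative",
    "Disgust": "Negative",
    "Success": "Positive",
    "Joy": "Positive",
    "Trust": "Positive",
}


def aggregate_by_sentiment(emotion_counts, emotion_to_sentiment):
    """Sum emotion counts into their sentiment classes."""
    totals = {}
    for emotion, count in emotion_counts.items():
        sentiment = emotion_to_sentiment[emotion]
        totals[sentiment] = totals.get(sentiment, 0) + count
    return totals


def percentage_shares(counts):
    """Return each count as a percentage of the total, rounded to 2 places."""
    total = sum(counts.values())
    return {name: round(100 * count / total, 2) for name, count in counts.items()}


sentiment_counts = aggregate_by_sentiment(EMOTION_COUNTS, EMOTION_TO_SENTIMENT)
print(sentiment_counts)
print(percentage_shares(sentiment_counts))
```

With these counts, the Negative class is roughly 1.5 times the size of the Positive class and Neutral is under 2% of the data, which is worth keeping in mind when reading the per-class scores.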

	## Eval results

	| Class | Precision | Recall | F-Score |
	|-----|------------|------------|------|
	| Neutral | 0.83 | 0.71 | 0.76 |
	| Positive | 0.87 | 0.91 | 0.90 |
	| Negative | 0.94 | 0.91 | 0.93 |
	| Macro AVG | 0.88 | 0.85 | 0.86 |
	| Weighted AVG | 0.91 | 0.91 | 0.91 |
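
The macro average is the unweighted mean of the per-class scores. The sketch below recomputes it from the rounded values in the table; `PER_CLASS` and `macro_average` are illustrative names. Recomputing from rounded inputs gives 0.84 for recall instead of the reported 0.85, a gap that rounding of the per-class values can explain. The weighted average cannot be recomputed this way because it needs the per-class support counts of the evaluation set, which are not listed here.

```python
# (precision, recall, f_score) per class, as reported in the table.
PER_CLASS = {
    "Neutral": (0.83, 0.71, 0.76),
    "Positive": (0.87, 0.91, 0.90),
    "Negative": (0.94, 0.91, 0.93),
}


def macro_average(per_class):
    """Unweighted mean of each metric across classes, rounded to 2 places."""
    n = len(per_class)
    return tuple(
        round(sum(scores[i] for scores in per_class.values()) / n, 2)
        for i in range(3)
    )


precision, recall, f_score = macro_average(PER_CLASS)
print(precision, recall, f_score)
```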


	## Usage

	```py
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	tokenizer = AutoTokenizer.from_pretrained("poltextlab/HunEmBERT3")
	model = AutoModelForSequenceClassification.from_pretrained("poltextlab/HunEmBERT3")
	```
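
Loading the tokenizer and model does not yet produce predictions. The following is a minimal inference sketch, assuming PyTorch is installed alongside `transformers`. The helper names `softmax` and `predict_sentiment` are illustrative and not part of any library. The heavy imports sit inside the function so that defining it does not trigger a model download.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_sentiment(text, model_name="poltextlab/HunEmBERT3"):
    """Return (class_index, probabilities) for a single sentence."""
    # Imported lazily: these are only needed when a prediction is requested.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    # Tokenize one sentence and run a forward pass without gradients.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs
```

Calling `predict_sentiment("Tisztelt fideszes, KDNP-s Képviselőtársaim!")` downloads the model on first use and returns the index of the most probable class together with the class probabilities. The index corresponds to the label numbering listed under "Intended uses & limitations".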

	### BibTeX entry and citation info

	If you use the model, please cite the following paper:

	```bibtex
	@ARTICLE{10149341,
	author={{\"U}veges, Istv{\'a}n and Ring, Orsolya},
	journal={IEEE Access},
	title={HunEmBERT: a fine-tuned BERT-model for classifying sentiment and emotion in political communication},
	year={2023},
	volume={11},
	number={},
	pages={60267-60278},
	doi={10.1109/ACCESS.2023.3285536}
	}
	```