	---
	language:
	- el
	pipeline_tag: text-classification
	---
	# PaloBERT for Sentiment Analysis

	A Greek [RoBERTa](https://arxiv.org/abs/1907.11692)-based model ([PaloBERT](https://huggingface.co./pchatz/palobert-base-greek-social-media), an updated version of [palobert-base-greek-uncased-v1](https://huggingface.co./gealexandri/palobert-base-greek-uncased-v1)) fine-tuned for sentiment analysis.

	## Training data

	The base model was pre-trained on a corpus of 458,293 documents collected from Greek social media (Twitter, Instagram, Facebook and YouTube). A RoBERTa tokenizer trained from scratch on the same corpus is also included. Fine-tuning was done on a dataset of ~60,000 annotated documents, also collected from Greek social media.

	The corpus as well as the annotated dataset have been provided by [Palo LTD](http://www.paloservices.com/).

	## Requirements

	```
	pip install transformers
	pip install torch
	```

	## Pre-processing details

	Before text is passed to the model, it must be pre-processed as follows:

	* remove all Greek diacritics
	* convert to lowercase
	* remove all punctuation

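	```python
	import re
	import unicodedata

	def preprocess(text, default_replace=""):
	    # Lowercase first, so only lowercase letters need handling afterwards.
	    text = text.lower()
	    # Decompose accented letters (NFD) and drop the combining acute accent.
	    text = unicodedata.normalize('NFD', text).translate({ord('\N{COMBINING ACUTE ACCENT}'): None})
	    # Replace every character that is neither a word character nor whitespace.
	    text = re.sub(r'[^\w\s]', default_replace, text)
	    return text
	```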
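As a quick check of the steps above, the following sketch applies the same function to an accented, capitalised sentence. The input sentence is an illustrative assumption, chosen so that the result matches the sample text used later in this card.

```python
import re
import unicodedata

def preprocess(text, default_replace=""):
    # Same steps as the function above: lowercase, strip accents, strip punctuation.
    text = text.lower()
    text = unicodedata.normalize("NFD", text).translate(
        {ord("\N{COMBINING ACUTE ACCENT}"): None}
    )
    return re.sub(r"[^\w\s]", default_replace, text)

raw = "Οι εξετάσεις ήταν πολύ καλές!"  # illustrative input, not from the dataset
clean = preprocess(raw)
print(clean)  # οι εξετασεις ηταν πολυ καλες
```

Running the function before tokenization keeps inference inputs in the form described in this section.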

	## Load Model

	```python
	from transformers import AutoTokenizer, AutoModel

	# Load the pre-trained PaloBERT tokenizer and language model
	tokenizer = AutoTokenizer.from_pretrained("pchatz/palobert-base-greek-social-media-v2")
	language_model = AutoModel.from_pretrained("pchatz/palobert-base-greek-social-media-v2")
	```
	For details on the architecture of the classifier class, refer to the code on [GitHub](https://github.com/Paulinechatz/sentiment-analysis-greek-social-media/blob/main/code/train_classifier_roberta_arch.py#L100).
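	```python
	import torch

	# Load the fine-tuned model (SentimentClassifier_v2).
	# PATH is the path to the saved state dict of the fine-tuned weights.
	model = TheModelClass(*args, **kwargs)
	model.load_state_dict(torch.load(PATH))
	model.eval()  # switch to inference mode
	```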
	You can then apply the sentiment analysis model directly to text (pre-processed as described above):
	```python
	# Example
	class_names = {0: 'neutral', 1: 'positive', 2: 'negative'}
	text = 'οι εξετασεις ηταν πολυ καλες'  # already pre-processed
	encoding = tokenizer(text, return_tensors='pt')

	input_ids = encoding['input_ids']
	attention_mask = encoding['attention_mask']

	with torch.no_grad():  # inference only, no gradients needed
	    output = model(input_ids, attention_mask)
	_, prediction = torch.max(output, dim=1)  # index of the highest-scoring class

	print(f'sentiment : {class_names[prediction.item()]}')  # positive
	```
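The example above reports only the top class. If a confidence value is also wanted, the scores can be normalised with a softmax. The following is a minimal sketch in plain Python, assuming the classifier returns unnormalised scores (logits), which this card does not state. The helper names `softmax` and `describe` and the sample scores are hypothetical.

```python
import math

# Label mapping copied from the example above.
CLASS_NAMES = {0: "neutral", 1: "positive", 2: "negative"}

def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def describe(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]

# Hypothetical scores for one text, e.g. obtained via output[0].tolist().
label, prob = describe([0.2, 2.5, -1.0])
print(f"sentiment: {label} ({prob:.2f})")
```

If the model already applies a softmax internally, this step is unnecessary and the output values can be read as probabilities directly.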

	## Evaluation

	For detailed results, refer to the thesis ['Ανάλυση συναισθήματος κειμένου στα Ελληνικά με χρήση Δικτύων Μετασχηματιστών'](http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18623) ("Text sentiment analysis in Greek using Transformer Networks") (version - p2).

	## Authors

	[Pavlina Chatziantoniou](https://huggingface.co./pchatz), [Georgios Alexandridis](https://huggingface.co./gealexandri) and Athanasios Voulodimos

	## BibTeX entry and citation info

	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18623

	```bibtex
	@Article{info12080331,
	AUTHOR = {Alexandridis, Georgios and Varlamis, Iraklis and Korovesis, Konstantinos and Caridakis, George and Tsantilas, Panagiotis},
	TITLE = {A Survey on Sentiment Analysis and Opinion Mining in Greek Social Media},
	JOURNAL = {Information},
	VOLUME = {12},
	YEAR = {2021},
	NUMBER = {8},
	ARTICLE-NUMBER = {331},
	URL = {https://www.mdpi.com/2078-2489/12/8/331},
	ISSN = {2078-2489},
	DOI = {10.3390/info12080331}
	}
	```