PaloBERT for Sentiment Analysis

A Greek RoBERTa-based model (PaloBERT, an updated version of palobert-base-greek-uncased-v1) fine-tuned for sentiment analysis.

Training data

The model is pre-trained on a corpus of 458,293 documents collected from Greek social media (Twitter, Instagram, Facebook and YouTube). A RoBERTa tokenizer trained from scratch on the same corpus is also included. Fine-tuning uses a dataset of ~60,000 documents, also collected from Greek social media.

Both the corpus and the annotated dataset were provided by Palo LTD.

Requirements

pip install transformers
pip install torch

Pre-processing details

Before text is passed to the model, it must be pre-processed as follows:

  • remove all Greek diacritics
  • convert to lowercase
  • remove all punctuation

import re
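import unicodedata

def preprocess(text, default_replace=""):
    # Lowercase first so the remaining steps see a single case
    text = text.lower()
    # Decompose accented letters (NFD), then drop the combining acute accent (tonos)
    text = unicodedata.normalize('NFD', text).translate({ord('\N{COMBINING ACUTE ACCENT}'): None})
    # Replace every character that is neither a word character nor whitespace
    text = re.sub(r'[^\w\s]', default_replace, text)
    return text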
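
As a quick check of the steps listed above, the following self-contained sketch repeats the same three operations and applies them to a sample sentence. The input strings are illustrative only and are not taken from the training data.

```python
import re
import unicodedata

def preprocess(text, default_replace=""):
    # Same three steps as described above: lowercase, strip accents, strip punctuation
    text = text.lower()
    text = unicodedata.normalize("NFD", text).translate(
        {ord("\N{COMBINING ACUTE ACCENT}"): None}
    )
    return re.sub(r"[^\w\s]", default_replace, text)

# Illustrative input (not from the training data)
raw = "Οι εξετάσεις ήταν πολύ καλές!"
clean = preprocess(raw)
print(clean)  # οι εξετασεις ηταν πολυ καλες

# Replacing punctuation with a space keeps words apart when no space follows it
spaced = preprocess("καλά,ευχαριστώ", default_replace=" ")
print(spaced)  # καλα ευχαριστω
```

With the default `default_replace=""`, punctuation that sits between two words without a space joins them into one token. Passing a space avoids this, although whether that matches the pre-processing used during training is not stated here.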

Load Model

import torch
from transformers import AutoTokenizer, AutoModel

# Load the PaloBERT tokenizer and pre-trained language model
tokenizer = AutoTokenizer.from_pretrained("pchatz/palobert-base-greek-social-media-v2")
language_model = AutoModel.from_pretrained("pchatz/palobert-base-greek-social-media-v2")

Refer to the GitHub code for details on the model class (TheModelClass) architecture.

model = TheModelClass(*args, **kwargs) #load fine-tuned model as SentimentClassifier_v2
model.load_state_dict(torch.load(PATH))
model.eval()

You can then run the sentiment analysis model on pre-processed text:

# Example
class_names = {0: 'neutral', 1: 'positive', 2: 'negative'}
text = 'οι εξετασεις ηταν πολυ καλες'  # "the exams were very good", already pre-processed
encoding = tokenizer(text, return_tensors='pt')

input_ids = encoding['input_ids']
attention_mask = encoding['attention_mask']

output = model(input_ids, attention_mask)
_, prediction = torch.max(output, dim=1)  # index of the highest-scoring class

print(f'sentiment  : {class_names[prediction.item()]}')  # positive
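
The example above reports only the top class. If a confidence value is also wanted, one option is to apply a softmax to the scores before picking the maximum. The sketch below uses plain Python and assumes the classifier returns one raw score (logit) per class, which is not confirmed here. The `softmax` and `top_label` helpers and the logit values are hypothetical.

```python
import math

# Label mapping copied from the example above
CLASS_NAMES = {0: "neutral", 1: "positive", 2: "negative"}

def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_label(logits, class_names=CLASS_NAMES):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

# Hypothetical logits for a single document; real values would come from model(...)
example_logits = [0.2, 2.1, -1.3]
label, prob = top_label(example_logits)
print(f"sentiment: {label} (p={prob:.2f})")
```

With a real model output, a list of scores for one document can be obtained with `output[0].tolist()` and passed to `top_label`.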

Evaluation

For detailed results, refer to the thesis 'Ανάλυση συναισθήματος κειμένου στα Ελληνικά με χρήση Δικτύων Μετασχηματιστών' ("Text Sentiment Analysis in Greek Using Transformer Networks") (version - p2).

Authors

Pavlina Chatziantoniou, Georgios Alexandridis and Athanasios Voulodimos

BibTeX entry and citation info

http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18623


@Article{info12080331,
AUTHOR = {Alexandridis, Georgios and Varlamis, Iraklis and Korovesis, Konstantinos and Caridakis, George and Tsantilas, Panagiotis},
TITLE = {A Survey on Sentiment Analysis and Opinion Mining in Greek Social Media},
JOURNAL = {Information},
VOLUME = {12},
YEAR = {2021},
NUMBER = {8},
ARTICLE-NUMBER = {331},
URL = {https://www.mdpi.com/2078-2489/12/8/331},
ISSN = {2078-2489},
DOI = {10.3390/info12080331}
}