---
license: apache-2.0
language:
- el
metrics:
- f1
- recall
- precision
- hamming_loss
pipeline_tag: text-classification
widget:
- text: >-
    Δεν ξέρω αν είμαι ο μόνος αλλά πιστεύω πως όσο είμαστε απασχολημένοι με την όλη κατάσταση της αστυνομίας η κυβέρνηση προσπαθεί να καλύψει αλλά γεγονότα της επικαιρότητας όπως πανδημία και εξωτερική πολιτική.
  example_title: Πολιτική
- text: >-
    Άλλες οικονομίες, όπως η Κίνα, προσπαθούν να διατηρούν την αξία του νομίσματος τους χαμηλά ώστε να καταστήσουν τις εξαγωγές τους πιο ελκυστικές στο εξωτερικό. Γιατί όμως θεωρούμε πως η πτωτική πορεία της Τουρκικής λίρας είναι η "αχίλλειος πτέρνα" της Τουρκίας;
  example_title: Οικονομία
- text: >-
    Γνωρίζει κανείς γιατί δεν ψηφίζουμε πια για να βγει ποιο τραγούδι θα εκπροσωπήσει την Ελλάδα; Τα τελευταία χρόνια ο κόσμος είναι δυσαρεστημένος με τα τραγούδια που στέλνουν, γιατί συνεχίζεται αυτό;
  example_title: Ψυχαγωγία/Κουλτούρα
model-index:
- name: IMISLab/Greek-Reddit-BERT
  results:
  - task:
      type: text-classification
      name: Text-classification
    dataset:
      name: GreekReddit
      type: greekreddit
      config: default
      split: test
    metrics:
    - name: Precision
      type: precision
      value: 80.05
      verified: true
    - name: Recall
      type: recall
      value: 81.48
      verified: true
    - name: F1
      type: f1
      value: 80.61
      verified: true
    - name: Hamming Loss
      type: hamming_loss
      value: 19.84
      verified: true
datasets:
- IMISLab/GreekReddit
library_name: transformers
tags:
- Social Media
- Reddit
- Topic Classification
- Text Classification
- Greek NLP
---
# Greek-Reddit-BERT
A Greek topic classification model based on [GREEK-BERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1).
This model is fine-tuned on [GreekReddit](https://huggingface.co/datasets/IMISLab/GreekReddit) as part of our upcoming research paper:
Mastrokostas, C., Giarelis, N., & Karacapilidis, N. (2024). Social Media Topic Classification on Greek Reddit.
For details on the model's performance, see the Evaluation section below.
## Training dataset
`Greek-Reddit-BERT` was fine-tuned on [GreekReddit](https://huggingface.co/datasets/IMISLab/GreekReddit), a Greek topic classification dataset.
GreekReddit contains 6,534 user posts collected from Greek subreddits and covering various topics (e.g., society, politics, economy, entertainment/culture, sports).
## Training configuration
We fine-tuned `nlpaueb/bert-base-greek-uncased-v1` (110 million parameters) on the GreekReddit train split using the following parameters:
* GPU batch size = 16
* Total training epochs = 4
* Learning rate = 5e-5
* Dropout rate = 0.1
* Number of labels = 10
* No warmup steps
* 32-bit floating-point precision (FP32)
* Tokenization
  * maximum input token length = 512
  * padding = True
  * truncation = True
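
The hyperparameters above can be sketched in plain Python to make two details concrete. First, the card does not name a learning-rate scheduler; the sketch assumes linear decay from 5e-5 to 0 with no warmup, which is the default of Hugging Face `TrainingArguments` but is an assumption here, not a documented choice. Second, in Hugging Face tokenizers `padding=True` pads each batch to its longest sequence rather than to 512 tokens, while `truncation=True` caps inputs at the maximum length. The helper names and the pad id `0` are hypothetical, and unlike the real tokenizer this simplified version does not reserve room for the `[CLS]`/`[SEP]` special tokens.

```python
import math

# Values taken from the training configuration list.
BATCH_SIZE = 16
EPOCHS = 4
LEARNING_RATE = 5e-5
MAX_LENGTH = 512
PAD_ID = 0  # assumption: id of the [PAD] token in the vocabulary


def total_steps(num_examples: int) -> int:
    """Optimizer steps for EPOCHS passes over num_examples at BATCH_SIZE."""
    return math.ceil(num_examples / BATCH_SIZE) * EPOCHS


def linear_lr(step: int, num_steps: int) -> float:
    """Assumed schedule: linear decay from LEARNING_RATE to 0, no warmup.

    num_steps must be positive.
    """
    return LEARNING_RATE * max(0.0, (num_steps - step) / num_steps)


def pad_and_truncate(batch: list[list[int]]) -> tuple[list[list[int]], list[list[int]]]:
    """Truncate each id list to MAX_LENGTH, then pad to the longest in the batch."""
    truncated = [ids[:MAX_LENGTH] for ids in batch]
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [PAD_ID] * (longest - len(ids)) for ids in truncated]
    # 1 marks real tokens, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated]
    return input_ids, attention_mask
```

For example, with a hypothetical 1,000 training posts, `total_steps(1000)` gives 63 steps per epoch × 4 epochs = 252 optimizer steps, and `linear_lr` falls from 5e-5 at step 0 to 0 at step 252.
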
## Evaluation
Results on the GreekReddit test split (all scores in %):

**Model**|**Precision**|**Recall**|**F1**|**Hamming Loss**
------------|-----------|-----------|-----------|-------------
Greek-Reddit-BERT|80.05|81.48|80.61|19.84
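
The card does not state how these scores were averaged. Assuming each post carries exactly one topic label, Hamming loss equals 1 − accuracy, so 19.84 corresponds to 80.16% accuracy; micro-averaged and support-weighted recall would both equal that accuracy, so the reported recall of 81.48 suggests macro averaging. The sketch below assumes macro averaging over every label seen in the gold or predicted labels (scikit-learn's default behaviour) with macro F1 taken as the mean of per-label F1 scores; the function name `evaluate` is hypothetical.

```python
from collections import Counter


def evaluate(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Macro-averaged precision, recall and F1 plus Hamming loss, in percent.

    Assumes single-label classification: exactly one topic per post.
    """
    labels = sorted(set(y_true) | set(y_pred))
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per label
    predicted = Counter(y_pred)  # how often each label was predicted
    gold = Counter(y_true)  # how often each label occurs in the gold data

    precisions, recalls, f1s = [], [], []
    for label in labels:
        # Undefined ratios (division by zero) count as 0.
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / gold[label] if gold[label] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    # With one label per post, Hamming loss is the share of misclassified posts.
    hamming = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
    n = len(labels)
    return {
        'precision': 100 * sum(precisions) / n,
        'recall': 100 * sum(recalls) / n,
        'f1': 100 * sum(f1s) / n,  # mean of per-label F1 scores
        'hamming_loss': 100 * hamming,
    }
```

For example, `evaluate(['a', 'a', 'b', 'b'], ['a', 'b', 'b', 'b'])` yields a precision of about 83.33, a recall of 75.0, an F1 of about 73.33 and a Hamming loss of 25.0.
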
## Example code
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_name = 'IMISLab/Greek-Reddit-BERT'

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Text-classification pipeline on CPU; inputs longer than 512 tokens are truncated.
topic_classifier = pipeline(
    'text-classification',
    device='cpu',
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512,
)

text = 'Άλλες οικονομίες, όπως η Κίνα, προσπαθούν να διατηρούν την αξία του νομίσματος τους χαμηλά ώστε να καταστήσουν τις εξαγωγές τους πιο ελκυστικές στο εξωτερικό. Γιατί όμως θεωρούμε πως η πτωτική πορεία της Τουρκικής λίρας είναι η "αχίλλειος πτέρνα" της Τουρκίας;'

# The pipeline returns one {'label': ..., 'score': ...} dict per input text.
output = topic_classifier(text)
print(output[0]['label'])
```
## Contact
If you have any questions or feedback about the model, please e-mail one of the following authors:
```
[email protected]
[email protected]
[email protected]
```
## Citation
```
TBA
```