cardiffnlp/tweet-topic-latest-single

antypasd committed on Oct 25, 2022

Commit 0ff86a9 · 1 Parent(s): 8584991

Update README.md

Files changed (1)

README.md +53 -29

@@ -1,46 +1,70 @@
----
-tags:
-- generated_from_keras_callback
-model-index:
-- name: tweet-topic-latest-single
-  results: []
----
-<!-- This model card has been generated automatically according to the information Keras had access to. You should
-probably proofread and complete it, then remove this comment. -->
 # tweet-topic-latest-single
-This model was trained from scratch on an unknown dataset.
-It achieves the following results on the evaluation set:
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- optimizer: None
-- training_precision: float32
-### Training results
-### Framework versions
-- Transformers 4.23.1
-- TensorFlow 2.10.0
-- Tokenizers 0.13.1

+This is a RoBERTa-base model trained on 168.86M tweets posted up to the end of September 2022 and fine-tuned for single-label topic classification on a corpus of 6,997 [tweets](https://huggingface.co/datasets/cardiffnlp/tweet_topic_single).
+The underlying pre-trained RoBERTa-base language model can be found [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-sep2022). This model is suitable for English text.
+- Reference Papers: [TimeLMs paper](https://arxiv.org/abs/2202.03829), [TweetTopic](https://arxiv.org/abs/2209.09824)
+- Git Repo: [TimeLMs official repository](https://github.com/cardiffnlp/timelms).
+<b>Labels</b>:
+- 0 -> arts_&_culture;
+- 1 -> business_&_entrepreneurs;
+- 2 -> pop_culture;
+- 3 -> daily_life;
+- 4 -> sports_&_gaming;
+- 5 -> science_&_technology
+## Full classification example
+```python
+from transformers import AutoModelForSequenceClassification, TFAutoModelForSequenceClassification
+from transformers import AutoTokenizer
+import numpy as np
+from scipy.special import softmax
+MODEL = "cardiffnlp/tweet-topic-latest-single"
+tokenizer = AutoTokenizer.from_pretrained(MODEL)
+# PyTorch
+model = AutoModelForSequenceClassification.from_pretrained(MODEL)
+class_mapping = model.config.id2label
+text = "Tesla stock is on the rise!"
+encoded_input = tokenizer(text, return_tensors='pt')
+output = model(**encoded_input)
+scores = output[0][0].detach().numpy()
+scores = softmax(scores)
+# TensorFlow (alternative to the PyTorch block above; uncomment to use)
+#model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
+#class_mapping = model.config.id2label
+#text = "Tesla stock is on the rise!"
+#encoded_input = tokenizer(text, return_tensors='tf')
+#output = model(**encoded_input)
+#scores = output[0][0].numpy()
+#scores = softmax(scores)
+# Rank labels from most to least probable
+ranking = np.argsort(scores)[::-1]
+for i in range(scores.shape[0]):
+    label = class_mapping[ranking[i]]
+    score = scores[ranking[i]]
+    print(f"{i+1}) {label} {np.round(float(score), 4)}")
+```
+Output:
+```
+1) business_&_entrepreneurs 0.8929
+2) sports_&_gaming 0.0478
+3) science_&_technology 0.0185
+4) daily_life 0.0178
+5) arts_&_culture 0.0128
+6) pop_culture 0.0102
+```
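
The ranking step at the end of the README example can be tried in isolation, without downloading the model. The following is a minimal sketch, not the authors' code: the label map is copied from the README's label list, while `rank_labels`, the local `softmax` helper, and the logits are hypothetical and exist only for illustration. The logits are invented numbers, not real model output.

```python
import numpy as np

# Label map copied from the README's "Labels" list.
ID2LABEL = {
    0: "arts_&_culture",
    1: "business_&_entrepreneurs",
    2: "pop_culture",
    3: "daily_life",
    4: "sports_&_gaming",
    5: "science_&_technology",
}


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)  # subtract the max to avoid overflow
    exp = np.exp(shifted)
    return exp / exp.sum()


def rank_labels(logits, id2label=ID2LABEL):
    """Return (label, probability) pairs sorted from most to least probable.

    Hypothetical helper: mirrors the argsort-and-reverse step in the README.
    """
    probs = softmax(np.asarray(logits, dtype=np.float64))
    order = np.argsort(probs)[::-1]  # indices of labels, highest probability first
    return [(id2label[int(i)], float(probs[i])) for i in order]


# Hypothetical logits, invented for illustration; NOT real model output.
example_logits = [0.1, 4.3, -0.2, 0.4, 1.4, 0.5]
for rank, (label, prob) in enumerate(rank_labels(example_logits), start=1):
    print(f"{rank}) {label} {prob:.4f}")
```

With a real model, the logits would come from `output[0][0]` as in the README, and `model.config.id2label` would replace the hard-coded map.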