---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
license: mit
language:
- cs
---
# Model Card for robeczech-base-binary-cs-iib

This model is fine-tuned for binary text classification of Supportive Interactions in Czech instant-messenger dialogs of adolescents.

## Model Details

### Model Description

The model was fine-tuned on a Czech dataset of instant-messenger dialogs of adolescents. The classification is binary: the model outputs probabilities for the labels {0, 1}, i.e., whether a Supportive Interaction is present or not.

- **Developed by:** Anonymous
- **Language(s):** cs
- **Finetuned from:** ufal/robeczech-base
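
The description above implies a simple label scheme: index 1 means a Supportive Interaction is present, index 0 means it is not. As a quick smoke test, the checkpoint can also be loaded through the high-level `pipeline` API; this is a minimal sketch assuming the hosted config loads with default settings (the Usage section below shows the exact tokenizer options, which the pipeline does not replicate), and the label names are whatever the config exposes, typically the generic `LABEL_0`/`LABEL_1`:

```python
from transformers import pipeline

# Minimal smoke test (sketch): assumes the hosted config loads with default
# settings; see the Usage section for the exact tokenizer setup.
clf = pipeline("text-classification", model="chi2024/robeczech-base-binary-cs-iib")
print(clf("Utterance1;Utterance2;Utterance3"))  # e.g. [{'label': 'LABEL_1', 'score': ...}]
```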

### Model Sources

- **Repository:** https://github.com/chi2024submission
- **Paper:** Stay tuned!

## Usage

Here is how to use this model to classify a context window of a dialog:

```python
import numpy as np
import torch
from transformers import AutoTokenizer, RobertaForSequenceClassification

# Prepare input texts. The model is pretrained and fine-tuned for Czech;
# a context window is built by joining consecutive utterances with ';'.
test_texts = ['Utterance1;Utterance2;Utterance3']

# Use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model = RobertaForSequenceClassification.from_pretrained(
    'chi2024/robeczech-base-binary-cs-iib', num_labels=2).to(device)
tokenizer = AutoTokenizer.from_pretrained(
    'chi2024/robeczech-base-binary-cs-iib', use_fast=False, truncation_side='left')
assert tokenizer.truncation_side == 'left'

# Define helper functions
def get_probs(text, tokenizer, model):
    """Tokenize one context window and return softmax probabilities for {0, 1}."""
    inputs = tokenizer(text, padding=True, truncation=True, max_length=256,
                       return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.logits.softmax(1)

def preds2class(probs, threshold=0.5):
    """Threshold the per-label probabilities and return the predicted class."""
    pclasses = np.zeros(probs.shape)
    pclasses[np.where(probs >= threshold)] = 1
    return pclasses.argmax(-1)

def print_predictions(texts):
    """Print the predicted class and label probabilities for each text."""
    probabilities = [get_probs(text, tokenizer, model).cpu().numpy()[0]
                     for text in texts]
    predicted_classes = preds2class(np.array(probabilities))
    for c, p in zip(predicted_classes, probabilities):
        print(f'{c}: {p}')

# Run the prediction
print_predictions(test_texts)
```
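
The helpers above score one fixed context window at a time. To score a longer conversation, a sliding window can be built over consecutive utterances; the sketch below assumes windows of three utterances joined by `';'`, which is inferred from the `test_texts` example above and is not a documented training parameter:

```python
# Sliding context windows over a longer dialog (sketch).
# Assumption: windows of 3 utterances joined by ';' mirror the example format.
dialog = ["Ahoj, jak se máš?", "Moc ne, pohádala jsem se s mámou.",
          "To mě mrzí.", "Chceš si o tom promluvit?"]
windows = [';'.join(dialog[max(0, i - 2):i + 1]) for i in range(len(dialog))]
print_predictions(windows)  # one prediction per window, reusing the helpers above
```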