alespalla committed
Commit 3097a8c · 1 Parent(s): 38e80cc

Update README.md

Files changed (1):
  1. README.md +56 -6

README.md CHANGED
@@ -8,14 +8,17 @@ datasets:
model-index:
- name: distillbert_conv_quality_score
  results: []
+ language:
+ - en
---

- <!-- This model card has been generated automatically according to the information Keras had access to. You should
- probably proofread and complete it, then remove this comment. -->

# distillbert_conv_quality_score

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the conv_ai_2 dataset.
+ It was trained to generate a score from a conversation. The score is a float between 0 and 1.
It achieves the following results on the evaluation set:
- training/loss: 0.0165
- validation/loss: 0.0149
@@ -24,13 +27,60 @@ It achieves the following results on the evaluation set:

More information needed

- ## Intended uses & limitations
+ ## Usage

- More information needed
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ model_name = "alespalla/distillbert_conv_quality_score"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ conversation = '''
+ Q: Begin
+ A: lol ! do you think it is strange to feel like you have been through life before ?
+ Q: Hellow
+ A: I don't understand you 🙈. Also, try to guess: i like to ...
+ Q: How are you?
+ A: make time stop, funny you :)
+ Q: What is your name?
+ A: jessie. hows your day going ? 😃
+ '''
+
+ score = model(**tokenizer(conversation, return_tensors='pt')).logits.item()
+ print(f"Score: {score}")
+ ```

## Training and evaluation data

- More information needed
+ The training data was generated from `conv_ai_2` using the following function:
+
+ ```python
+ from datasets import load_dataset
+
+ def get_dataset(regression=False):
+     db = load_dataset("conv_ai_2")
+
+     def generate_converation(elem):
+         text = ""
+         for idx, txt in enumerate(elem["dialog"]):
+             if idx % 2:
+                 text += f"A: {txt['text']}\n"
+             else:
+                 text += f"Q: {txt['text']}\n"
+         if regression:
+             return {'text': text, "labels": (elem['eval_score'] - 1) / 4}
+         return {'text': text, "labels": elem['eval_score'] - 1}
+
+     db = db.filter(lambda example: example["eval_score"] > 0)
+     db = db.map(generate_converation, remove_columns=db['train'].column_names)
+     db = db['train'].train_test_split(test_size=0.2).shuffle(42)
+
+     return db
+ ```

## Training procedure

@@ -76,4 +126,4 @@ The following hyperparameters were used during training:

- Transformers 4.26.1
- Datasets 2.10.1
- - Tokenizers 0.13.2
+ - Tokenizers 0.13.2
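
The usage snippet added in this commit scores one conversation at a time. As a minimal sketch that is not part of the commit, the same checkpoint can also score a batch of conversations; the `padding`/`truncation` options and the `torch.no_grad()` wrapper below are assumptions of mine, not settings the model card specifies.

```python
# Sketch (not from commit 3097a8c): batch scoring with the same checkpoint.
# padding/truncation choices are assumptions; the card's own example omits them.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "alespalla/distillbert_conv_quality_score"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

conversations = [
    "Q: Hi\nA: hello ! how is your day going ?\n",
    "Q: What is your name?\nA: jessie. hows your day going ? 😃\n",
]

with torch.no_grad():
    batch = tokenizer(conversations, padding=True, truncation=True, return_tensors="pt")
    scores = model(**batch).logits.squeeze(-1)  # one regression logit per conversation

for conv, score in zip(conversations, scores.tolist()):
    print(f"{score:.3f}  {conv.splitlines()[0]}")
```

As in the card's own example, the raw logit is used directly as the score, which is consistent with the labels being scaled to [0, 1] by `get_dataset(regression=True)` in the diff above.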
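
The commit documents how the dataset was built but not the training loop itself. Below is a rough sketch of how such a regression fine-tune could be wired up with the `Trainer` API; the single-logit regression head, the tokenization step, and every hyperparameter shown are illustrative assumptions, not the values recorded in this card's training procedure.

```python
# Sketch (not from commit 3097a8c): one plausible way to fine-tune a regression
# head on the output of get_dataset(regression=True) from the diff above.
# All hyperparameters here are illustrative assumptions.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

base_model = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=1,                 # single output -> Transformers applies an MSE regression loss
    problem_type="regression",
)

db = get_dataset(regression=True)   # function from the diff above; labels in [0, 1]
db = db.map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="distillbert_conv_quality_score",
        num_train_epochs=1,
        per_device_train_batch_size=16,
        evaluation_strategy="epoch",
    ),
    train_dataset=db["train"],
    eval_dataset=db["test"],
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```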