alespalla/distillbert_conv_quality_score

Text Classification

Inference Endpoints


alespalla committed on Mar 11, 2023

Commit 3097a8c · 1 Parent(s): 38e80cc

Update README.md

Files changed (1)

README.md +56 -6


@@ -8,14 +8,17 @@ datasets:
 model-index:
 - name: distillbert_conv_quality_score
   results: []
 ---
-<!-- This model card has been generated automatically according to the information Keras had access to. You should
-probably proofread and complete it, then remove this comment. -->
 # distillbert_conv_quality_score
 This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the conv_ai_2 dataset.
 It achieves the following results on the evaluation set:
 - training/loss: 0.0165
 - validation/loss: 0.0149
@@ -24,13 +27,60 @@ It achieves the following results on the evaluation set:
 More information needed
-## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
@@ -76,4 +126,4 @@ The following hyperparameters were used during training:
 - Transformers 4.26.1
 - Datasets 2.10.1
-- Tokenizers 0.13.2

 model-index:
 - name: distillbert_conv_quality_score
   results: []
+language:
+- en
 ---
 # distillbert_conv_quality_score
 This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the conv_ai_2 dataset.
+It was trained to generate a quality score for a conversation. The score is a float between 0 and 1.
 It achieves the following results on the evaluation set:
 - training/loss: 0.0165
 - validation/loss: 0.0149
 More information needed
+## Usage
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+model_name = "alespalla/distillbert_conv_quality_score"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+conversation = '''
+Q: Begin
+A: lol ! do you think it is strange to feel like you have been through life before ?
+Q: Hellow
+A: I don't understand you 🙈. Also, try to guess: i like to ...
+Q: How are you?
+A: make time stop, funny you :)
+Q: What is your name?
+A: jessie. hows your day going ? 😃
+'''
+score = model(**tokenizer(conversation, return_tensors='pt')).logits.item()
+print(f"Score: {score}")
+```
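Note on the score range: the usage snippet reads a single raw logit, and a linear output head is not bounded, so a prediction can land slightly below 0 or above 1. The sketch below is one way to post-process it, assuming the published checkpoint is the regression variant trained on labels scaled as `(eval_score - 1) / 4`. `clamp_score` and `to_eval_scale` are hypothetical helper names and are not part of the model card.

```python
def clamp_score(raw: float) -> float:
    """Clip a raw model output to the documented [0, 1] range."""
    return max(0.0, min(1.0, raw))


def to_eval_scale(score: float) -> float:
    """Invert the (eval_score - 1) / 4 scaling, mapping [0, 1] back to [1, 5].

    Assumes the regression labels were built with that scaling.
    """
    return clamp_score(score) * 4 + 1
```

Clamping only tidies the reported value; it does not make an out-of-range prediction more reliable.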
 ## Training and evaluation data
+The training data was generated from `conv_ai_2` using the following function:
+```python
+from datasets import load_dataset
+
+def get_dataset(regression=False):
+    db = load_dataset("conv_ai_2")
+
+    def generate_conversation(elem):
+        # Flatten the dialog into alternating "Q:" / "A:" lines.
+        text = ""
+        for idx, txt in enumerate(elem["dialog"]):
+            if idx % 2:
+                text += f"A: {txt['text']}\n"
+            else:
+                text += f"Q: {txt['text']}\n"
+        if regression:
+            # Regression target: shift the score by 1 and divide by 4.
+            return {'text': text, "labels": (elem['eval_score'] - 1)/4}
+        # Classification target: zero-based class index.
+        return {'text': text, "labels": elem['eval_score'] - 1}
+
+    # Keep only dialogs with a positive eval_score.
+    db = db.filter(lambda example: example["eval_score"] > 0)
+    db = db.map(generate_conversation, remove_columns=db['train'].column_names)
+    # The split is called without an explicit seed; only the shuffle gets 42.
+    db = db['train'].train_test_split(test_size=0.2).shuffle(42)
+    return db
+```
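To see what the function above produces without downloading the dataset, here is a minimal sketch that applies the same formatting and label scaling to one hand-written record. The record is made up, and `format_dialog` and `scale_label` are hypothetical names. The division by 4 suggests `eval_score` ranges from 1 to 5, which is an assumption here.

```python
def format_dialog(dialog):
    """Join turns into alternating 'Q:' / 'A:' lines, starting with 'Q:'."""
    lines = []
    for idx, turn in enumerate(dialog):
        speaker = "A" if idx % 2 else "Q"
        lines.append(f"{speaker}: {turn['text']}\n")
    return "".join(lines)


def scale_label(eval_score, regression=False):
    """Regression: (score - 1) / 4. Classification: zero-based class index."""
    return (eval_score - 1) / 4 if regression else eval_score - 1


# Made-up record mimicking the fields the original function reads.
toy = {
    "dialog": [{"text": "Hi"}, {"text": "hello there"}, {"text": "How are you?"}],
    "eval_score": 4,
}

text = format_dialog(toy["dialog"])
label = scale_label(toy["eval_score"], regression=True)
print(text, label)
```

The format matches the `Q:` / `A:` layout used in the usage example, so inference inputs should be built the same way.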
 ## Training procedure
 - Transformers 4.26.1
 - Datasets 2.10.1
+- Tokenizers 0.13.2