File size: 4,183 Bytes
5657307
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
# Using the `evaluator` with custom pipelines

The evaluator is designed to work with `transformer` pipelines out-of-the-box. However, in many cases you might have a model or pipeline that's not part of the `transformer` ecosystem. You can still use `evaluator` to easily compute metrics for them. In this guide we show how to do this for a Scikit-Learn [pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline) and a Spacy [pipeline](https://spacy.io). Let's start with the Scikit-Learn case.

## Scikit-Learn

First we need to train a model. We'll train a simple text classifier on the [IMDb dataset](https://huggingface.co/datasets/imdb), so let's start by downloading the dataset:

```py
from datasets import load_dataset

ds = load_dataset("imdb")
```

Then we can build a simple TF-IDF preprocessor and Naive Bayes classifier wrapped in a `Pipeline`:

```py
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer

text_clf = Pipeline([
        ('vect', CountVectorizer()),
        ('tfidf', TfidfTransformer()),
        ('clf', MultinomialNB()),
])

text_clf.fit(ds["train"]["text"], ds["train"]["label"])
```

Following the convention in the `TextClassificationPipeline` of `transformers` our pipeline should be callable and return a list of dictionaries. In addition we use the `task` attribute to check if the pipeline is compatible with the `evaluator`. We can write a small wrapper class for that purpose:

```py
class ScikitEvalPipeline:
    def __init__(self, pipeline):
        self.pipeline = pipeline
        self.task = "text-classification"

    def __call__(self, input_texts, **kwargs):
        return [{"label": p} for p in self.pipeline.predict(input_texts)]

pipe = ScikitEvalPipeline(text_clf)
```

We can now pass this `pipeline` to the `evaluator`:

```py
from evaluate import evaluator

task_evaluator = evaluator("text-classification")
task_evaluator.compute(pipe, ds["test"], "accuracy")

>>> {'accuracy': 0.82956}
```

Implementing that simple wrapper is all that's needed to use any model from any framework with the `evaluator`. In the `__call__` you can implement all logic necessary for efficient forward passes through your model.

## Spacy

We'll use the `polarity` feature of the `spacytextblob` project to get a simple sentiment analyzer. First you'll need to install the project and download the resources:

```bash
pip install spacytextblob
python -m textblob.download_corpora
python -m spacy download en_core_web_sm
```

Then we can simply load the `nlp` pipeline and add the `spacytextblob` pipeline:
```py
import spacy

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')
```

This snippet shows how we can use the `polarity` feature added with `spacytextblob` to get the sentiment of a text:

```py
texts = ["This movie is horrible", "This movie is awesome"]
results = nlp.pipe(texts)

for txt, res in zip(texts, results):
    print(f"{text} | Polarity: {res._.blob.polarity}")
```

Now we can wrap it in a simple wrapper class like in the Scikit-Learn example before. It just has to return a list of dictionaries with the predicted lables. If the polarity is larger than 0 we'll predict positive sentiment and negative otherwise:

```py
class SpacyEvalPipeline:
    def __init__(self, nlp):
        self.nlp = nlp
        self.task = "text-classification"

    def __call__(self, input_texts, **kwargs):
        results =[]
        for p in self.nlp.pipe(input_texts):
            if p._.blob.polarity>=0:
                results.append({"label": 1})
            else:
                results.append({"label": 0})
        return results

pipe = SpacyEvalPipeline(nlp)
```

That class is compatible with the `evaluator` and we can use the same instance from the previous examlpe along with the IMDb test set:

```py
eval.compute(pipe, ds["test"], "accuracy")
>>> {'accuracy': 0.6914}
```

This will take a little longer than the Scikit-Learn example but after roughly 10-15min you will have the evaluation results!