File size: 3,400 Bytes
42c6f09
d796fa6
42c6f09
 
cd4126b
 
42c6f09
 
cd4126b
 
 
103d8d9
cd4126b
 
 
 
 
 
aeecab3
cd4126b
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
51ea164
cd4126b
 
51ea164
72f3651
cd4126b
 
64607b6
51ea164
cd4126b
 
 
42c6f09
49039c7
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
42c6f09
cd4126b
2e177e9
42c6f09
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
---
library_name: tf-keras
---

x100 smaller with less than 0.5 accuracy drop vs. distilbert-base-uncased-finetuned-sst-2-english

## Model description

2 Layers Bilstm model finetuned on SST-2 and distlled from RoBERTa teacher  

distilbert-base-uncased-finetuned-sst-2-english:    92.2 accuracy, 67M parameters  
moshew/distilbilstm-finetuned-sst-2-english:        91.9 accuracy, 0.66M parameters

## How to get started with the model

Example on SST-2 test dataset classification:
​​
```python
!pip install datasets
from datasets import load_dataset  
import numpy as np  
from sklearn.metrics import accuracy_score  
from keras.preprocessing.text import Tokenizer
from keras.utils import pad_sequences
import tensorflow as tf
from huggingface_hub import from_pretrained_keras

from datasets import load_dataset
sst2 = load_dataset("SetFit/sst2")
augmented_sst2_dataset = load_dataset("jmamou/augmented-glue-sst2")

# Tokenize our training data
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(augmented_sst2_dataset['train']['sentence'])

# Encode test data sentences into sequences
test_sequences = tokenizer.texts_to_sequences(sst2['test']['text'])

# Pad the test sequences
test_padded = pad_sequences(test_sequences, padding = 'post', truncating = 'post', maxlen=64)

reloaded_model = from_pretrained_keras('moshew/distilbilstm-finetuned-sst-2-english')

#Evaluate model on SST2 test data (GLUE)
pred=reloaded_model.predict(test_padded)
pred_bin = np.argmax(pred,1)
accuracy_score(pred_bin, sst2['test']['label'])

0.9187259747391543

reloaded_model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 64)]              0         
                                                                 
 embedding (Embedding)       (None, 64, 50)            500000    
                                                                 
 bidirectional (Bidirectiona  (None, 64, 128)          58880     
 l)                                                              
                                                                 
 bidirectional_1 (Bidirectio  (None, 128)              98816     
 nal)                                                            
                                                                 
 dropout (Dropout)           (None, 128)               0         
                                                                 
 dense (Dense)               (None, 2)                 258       
                                                                 
=================================================================
Total params: 657,954
Trainable params: 657,954
Non-trainable params: 0
_________________________________________________________________

```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

| Hyperparameters | Value |
| :-- | :-- |
| name | Adam |
| learning_rate | 0.0010000000474974513 |
| decay | 0.0 |
| beta_1 | 0.8999999761581421 |
| beta_2 | 0.9990000128746033 |
| epsilon | 1e-07 |
| amsgrad | False |
| training_precision | float32 |


 ## Model Plot

<details>
<summary>View Model Plot</summary>

![Model Image](./model.png)

</details>