---
library_name: keras
---
About 100× smaller than distilbert-base-uncased-finetuned-sst-2-english (0.66M vs. 67M parameters), with an accuracy drop of less than 0.5 points.
## Model description
A 2-layer BiLSTM model fine-tuned on SST-2 and distilled from a RoBERTa teacher.

| Model | Accuracy (%) | Parameters |
| :-- | :-- | :-- |
| distilbert-base-uncased-finetuned-sst-2-english | 92.2 | 67M |
| moshew/distilbilstm-finetuned-sst-2-english | 91.9 | 0.66M |
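
The card does not say which distillation objective was used to train the student. As a rough illustration only, the sketch below implements one common choice: a weighted sum of a soft-target cross-entropy against the teacher's temperature-softened outputs and an ordinary cross-entropy against the gold label. The function names, the temperature, the weight `alpha`, and the logits are all hypothetical and are not taken from this model's training code.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Soft-target plus hard-label cross-entropy (assumed, illustrative settings)."""
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    # Cross-entropy between the softened teacher and student distributions
    soft_ce = -sum(t * math.log(s) for t, s in zip(teacher_soft, student_soft))
    # Ordinary cross-entropy of the student against the gold label
    hard_ce = -math.log(softmax(student_logits)[hard_label])
    # The temperature**2 factor keeps the soft-term gradients on a comparable scale
    return alpha * temperature ** 2 * soft_ce + (1 - alpha) * hard_ce

# Two-class example (negative, positive) with made-up logits
loss_close = distillation_loss([0.2, 1.8], [0.1, 2.0], hard_label=1)
loss_far = distillation_loss([1.8, 0.2], [0.1, 2.0], hard_label=1)
print(loss_close, loss_far)
```

A student whose logits agree with the teacher (`loss_close`) gets a lower loss than one that disagrees (`loss_far`), which is the signal a distilled student is trained on.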
## How to get started with the model
Example: evaluating the model on the SST-2 test set:
```python
# In a notebook, install the dependency first: !pip install datasets
import numpy as np
from datasets import load_dataset
from huggingface_hub import from_pretrained_keras
from keras.preprocessing.text import Tokenizer
from keras.utils import pad_sequences
from sklearn.metrics import accuracy_score
sst2 = load_dataset("SetFit/sst2")
augmented_sst2_dataset = load_dataset("jmamou/augmented-glue-sst2")
# Tokenize our training data
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(augmented_sst2_dataset['train']['sentence'])
# Encode test data sentences into sequences
test_sequences = tokenizer.texts_to_sequences(sst2['test']['text'])
# Pad or truncate the test sequences to 64 tokens
test_padded = pad_sequences(test_sequences, padding='post', truncating='post', maxlen=64)
# Load the model from the Hugging Face Hub
reloaded_model = from_pretrained_keras('moshew/distilbilstm-finetuned-sst-2-english')
# Evaluate the model on the SST-2 test data (GLUE)
pred = reloaded_model.predict(test_padded)
pred_bin = np.argmax(pred, axis=1)
print(accuracy_score(sst2['test']['label'], pred_bin))
# Output: 0.9187259747391543
reloaded_model.summary()
# Output:
# Model: "model"
# _________________________________________________________________
#  Layer (type)                     Output Shape        Param #
# =================================================================
#  input_1 (InputLayer)             [(None, 64)]        0
#  embedding (Embedding)            (None, 64, 50)      500000
#  bidirectional (Bidirectional)    (None, 64, 128)     58880
#  bidirectional_1 (Bidirectional)  (None, 128)         98816
#  dropout (Dropout)                (None, 128)         0
#  dense (Dense)                    (None, 2)           258
# =================================================================
# Total params: 657,954
# Trainable params: 657,954
# Non-trainable params: 0
# _________________________________________________________________
```
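
The parameter counts in the summary can be checked by hand with the standard formulas for embedding, LSTM, and dense layers. The sketch below assumes a 10,000-word vocabulary (matching `num_words=10000`) and 64 LSTM units per direction, inferred from the 128-wide bidirectional outputs. The helper names are hypothetical.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry
    return vocab_size * dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)

def bilstm_params(input_dim, units):
    # A bidirectional wrapper holds one forward and one backward LSTM
    return 2 * lstm_params(input_dim, units)

def dense_params(input_dim, outputs):
    # Weight matrix plus one bias per output
    return input_dim * outputs + outputs

counts = {
    "embedding": embedding_params(10000, 50),
    "bidirectional": bilstm_params(50, 64),      # input: 50-dim embeddings
    "bidirectional_1": bilstm_params(128, 64),   # input: 2 x 64 from the first BiLSTM
    "dense": dense_params(128, 2),
}
total = sum(counts.values())
print(counts, total)
```

Under these assumptions the per-layer counts and the total agree with the summary, and the embedding table accounts for roughly three quarters of the parameters.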
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
| Hyperparameters | Value |
| :-- | :-- |
| name | Adam |
| learning_rate | 0.0010000000474974513 |
| decay | 0.0 |
| beta_1 | 0.8999999761581421 |
| beta_2 | 0.9990000128746033 |
| epsilon | 1e-07 |
| amsgrad | False |
| training_precision | float32 |
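
The long decimals for `learning_rate`, `beta_1`, and `beta_2` look like the usual Adam defaults (0.001, 0.9, 0.999) after rounding to float32. The sketch below checks that reading with the standard library; the `as_float32` helper and the `nominal` dictionary are illustrative assumptions, not part of the training code.

```python
import struct

def as_float32(x):
    """Round a Python float to the nearest float32 and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Assumed nominal settings (the common Adam defaults)
nominal = {"learning_rate": 0.001, "beta_1": 0.9, "beta_2": 0.999}
stored = {name: as_float32(value) for name, value in nominal.items()}

for name, value in stored.items():
    print(name, value)
```

If the printed values match the table, the optimizer was most likely configured with the nominal values rather than hand-tuned ones.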
## Model Plot
<details>
<summary>View Model Plot</summary>
![Model Image](./model.png)
</details>