---
library_name: keras
---
## Model description
More than 100x smaller than DistilBERT, with an accuracy drop of less than 0.5 points on the SST-2 test set:

| Model | SST-2 test accuracy (%) | Parameters |
| :-- | :-- | :-- |
| DistilBERT | 92.2 | 67M |
| DistilBiLSTM | 91.76 | 658K |
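The headline comparison can be checked with a few lines of arithmetic. This sketch takes DistilBERT's figures (67M parameters, a rounded number, and 92.2% accuracy) as stated in this card, and uses the 657,954 total parameters from the model summary and the 0.9176 accuracy returned by the evaluation code in the next section. The variable names are illustrative only.

```python
# DistilBERT figures as stated in this card (67M is a rounded count)
distilbert_params = 67_000_000
distilbert_acc = 92.2

# DistilBiLSTM figures from the model summary and the evaluation output
distilbilstm_params = 657_954
distilbilstm_acc = 100 * 0.9176276771004942

ratio = distilbert_params / distilbilstm_params  # how many times smaller
drop = distilbert_acc - distilbilstm_acc         # accuracy points lost

print(f"{ratio:.1f}x fewer parameters, {drop:.2f}-point accuracy drop")
# 101.8x fewer parameters, 0.44-point accuracy drop
```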
## Intended uses & limitations
More information needed
## Training and evaluation data
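The following code reproduces the accuracy on the SST-2 test set. It rebuilds the tokenizer vocabulary by fitting it on the `sentence` column of the augmented training set (`jmamou/augmented-glue-sst2`), encodes the test sentences from `SetFit/sst2`, and scores the predictions of the model downloaded from the Hub: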
```python
import numpy as np
from datasets import load_dataset
from huggingface_hub import from_pretrained_keras
from keras.preprocessing.text import Tokenizer
from keras.utils import pad_sequences
from sklearn.metrics import accuracy_score

# Inferred from the model summary: the embedding layer holds
# 500,000 = 10,000 x 50 weights and the input length is 64.
MAX_NUM_WORDS = 10000
MAX_LEN = 64

sst2 = load_dataset("SetFit/sst2")
augmented_sst2_dataset = load_dataset("jmamou/augmented-glue-sst2")

pad_type = 'post'
trunc_type = 'post'

# Rebuild the vocabulary by fitting the tokenizer on the augmented training sentences
tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
tokenizer.fit_on_texts(augmented_sst2_dataset['train']['sentence'])

# Encode the test sentences as integer sequences
test_sequences = tokenizer.texts_to_sequences(sst2['test']['text'])
# Pad or truncate every test sequence to exactly MAX_LEN tokens
test_padded = pad_sequences(test_sequences, padding=pad_type, truncating=trunc_type, maxlen=MAX_LEN)

reloaded_model = from_pretrained_keras('moshew/distilbilstm-finetuned-sst-2-english')
pred = reloaded_model.predict(test_padded)
pred_bin = np.argmax(pred, axis=1)  # index of the highest-scoring class per sentence
print(accuracy_score(sst2['test']['label'], pred_bin))
# 0.9176276771004942
```
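`pad_sequences` with `padding='post'` and `truncating='post'` keeps the first `MAX_LEN` token ids of each sentence and appends zeros to shorter ones, so a sentence longer than 64 tokens loses its ending. Below is a minimal pure-Python sketch of that behaviour, assuming the default padding value of 0; `pad_post` is a hypothetical helper for illustration, not part of Keras.

```python
def pad_post(sequences, maxlen, value=0):
    """Mimic pad_sequences(padding='post', truncating='post') on lists of ints."""
    padded = []
    for seq in sequences:
        seq = list(seq)[:maxlen]               # truncating='post': drop ids past maxlen
        seq += [value] * (maxlen - len(seq))   # padding='post': append padding at the end
        padded.append(seq)
    return padded

print(pad_post([[5, 8, 2], [1, 2, 3, 4, 5, 6]], maxlen=4))
# [[5, 8, 2, 0], [1, 2, 3, 4]]
```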
```python
reloaded_model.summary()
```

```text
Model: "model"
_________________________________________________________________
 Layer (type)                      Output Shape        Param #
=================================================================
 input_1 (InputLayer)              [(None, 64)]        0
 embedding (Embedding)             (None, 64, 50)      500000
 bidirectional (Bidirectional)     (None, 64, 128)     58880
 bidirectional_1 (Bidirectional)   (None, 128)         98816
 dropout (Dropout)                 (None, 128)         0
 dense (Dense)                     (None, 2)           258
=================================================================
Total params: 657,954
Trainable params: 657,954
Non-trainable params: 0
```
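The per-layer counts in the summary follow from the standard Keras formulas. This sketch assumes a 10,000-word vocabulary, 50-dimensional embeddings, and 64 LSTM units per direction (values inferred from the output shapes and parameter counts, not stated elsewhere in this card), with biases enabled in every LSTM and Dense layer.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry
    return vocab_size * dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit
    return input_dim * units + units

emb = embedding_params(10000, 50)
bilstm_1 = 2 * lstm_params(50, 64)   # forward + backward; output (None, 64, 128)
bilstm_2 = 2 * lstm_params(128, 64)  # forward + backward; output (None, 128)
dense = dense_params(128, 2)

total = emb + bilstm_1 + bilstm_2 + dense
print(emb, bilstm_1, bilstm_2, dense, total)
# 500000 58880 98816 258 657954
```

Under these assumptions the embedding table accounts for about 76% of all weights, so the vocabulary size is the main lever on model size.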
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
| Hyperparameter | Value |
| :-- | :-- |
| optimizer | Adam |
| learning_rate | 0.001 |
| decay | 0.0 |
| beta_1 | 0.9 |
| beta_2 | 0.999 |
| epsilon | 1e-07 |
| amsgrad | False |
| training_precision | float32 |
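To show what these optimizer settings control, here is a minimal pure-Python sketch of a single Adam update on one scalar weight. It assumes the logged float32 values correspond to 0.001, 0.9 and 0.999, that `decay` 0.0 means a constant learning rate, and that `amsgrad` False means plain Adam. It follows the Keras convention where `epsilon` is the "epsilon hat" of Kingma and Ba, and `adam_step` is a hypothetical helper, not a Keras function.

```python
import math

# Optimizer settings from the table (float32 values rounded)
LR, BETA_1, BETA_2, EPSILON = 0.001, 0.9, 0.999, 1e-07

def adam_step(theta, grad, m, v, t):
    """One Adam update (no AMSGrad) for a scalar parameter at step t >= 1."""
    m = BETA_1 * m + (1 - BETA_1) * grad        # first-moment estimate
    v = BETA_2 * v + (1 - BETA_2) * grad ** 2   # second-moment estimate
    # Bias correction folded into the step size, Keras-style
    alpha = LR * math.sqrt(1 - BETA_2 ** t) / (1 - BETA_1 ** t)
    theta = theta - alpha * m / (math.sqrt(v) + EPSILON)
    return theta, m, v

theta, m, v = adam_step(0.0, grad=1.0, m=0.0, v=0.0, t=1)
print(round(theta, 6))
# -0.001
```

The first step moves the weight by almost exactly `learning_rate`, independent of the gradient's scale, which is why 0.001 is a sensible default.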
## Model Plot
<details>
<summary>View Model Plot</summary>
![Model Image](./model.png)
</details>