---
library_name: keras
---
## Model description

A DistilBiLSTM sentiment classifier roughly 100x smaller than DistilBERT (67M vs. 658K parameters), with less than a 0.5-point drop in accuracy evaluated on SST-2 test data:

| Model | SST-2 accuracy | Parameters |
| :-- | :-- | :-- |
| DistilBERT | 92.2 | 67M |
| DistilBiLSTM | 91.7 | 658K |
## Intended uses & limitations

More information needed.
## Training and evaluation data

Here is the evaluation code:

```python
import numpy as np
import tensorflow as tf
from datasets import load_dataset
from huggingface_hub import from_pretrained_keras
from keras.preprocessing.text import Tokenizer
from keras.utils import pad_sequences
from sklearn.metrics import accuracy_score

# Load the SST-2 test set and the augmented GLUE SST-2 data used to fit the tokenizer
sst2 = load_dataset("SetFit/sst2")
augmented_sst2_dataset = load_dataset("jmamou/augmented-glue-sst2")

# Vocabulary size and sequence length, inferred from the model summary below
# (a 500,000-parameter, 50-dimensional embedding implies a 10,000-word vocabulary;
# the input layer expects sequences of length 64)
MAX_NUM_WORDS = 10000
MAX_LEN = 64

pad_type = 'post'
trunc_type = 'post'

# Fit the tokenizer on the augmented training sentences
tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
tokenizer.fit_on_texts(augmented_sst2_dataset['train']['sentence'])

# Encode the test sentences into integer sequences
test_sequences = tokenizer.texts_to_sequences(sst2['test']['text'])

# Pad/truncate the test sequences to a fixed length
test_padded = pad_sequences(test_sequences, padding=pad_type,
                            truncating=trunc_type, maxlen=MAX_LEN)

# Reload the model from the Hub and score the test set
reloaded_model = from_pretrained_keras('moshew/distilbilstm-finetuned-sst-2-english')
pred = reloaded_model.predict(test_padded)
pred_bin = np.argmax(pred, 1)
accuracy_score(sst2['test']['label'], pred_bin)
```
```
0.9176276771004942
```
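With the tokenizer fitted and the model reloaded as above, new sentences can be scored the same way. A usage sketch; the example sentence is arbitrary, and SST-2's 0 = negative / 1 = positive label convention is assumed:

```python
# Usage sketch: classify a new sentence with the same tokenizer/padding pipeline
sentence = ["a gripping and heartfelt film"]  # arbitrary example text
seq = pad_sequences(tokenizer.texts_to_sequences(sentence),
                    padding=pad_type, truncating=trunc_type, maxlen=MAX_LEN)
label = int(np.argmax(reloaded_model.predict(seq), axis=1)[0])  # 0 = negative, 1 = positive (SST-2 convention)
```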
```python
reloaded_model.summary()
```

```
Model: "model"
_________________________________________________________________
 Layer (type)                    Output Shape            Param #
=================================================================
 input_1 (InputLayer)            [(None, 64)]            0

 embedding (Embedding)           (None, 64, 50)          500000

 bidirectional (Bidirectional)   (None, 64, 128)         58880

 bidirectional_1 (Bidirectional) (None, 128)             98816

 dropout (Dropout)               (None, 128)             0

 dense (Dense)                   (None, 2)               258

=================================================================
Total params: 657,954
Trainable params: 657,954
Non-trainable params: 0
_________________________________________________________________
```
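The summary pins down the architecture almost completely. A minimal sketch that reproduces these shapes and parameter counts; the LSTM unit counts follow from the parameter counts, while the dropout rate and output activation are assumptions, since the summary does not record them:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(64,))                                    # padded token ids, MAX_LEN = 64
x = layers.Embedding(input_dim=10000, output_dim=50)(inputs)         # 10,000 x 50 = 500,000 params
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)  # 2*4*(64*(50+64)+64) = 58,880 params
x = layers.Bidirectional(layers.LSTM(64))(x)                         # 2*4*(64*(128+64)+64) = 98,816 params
x = layers.Dropout(0.5)(x)                                           # rate 0.5 is an assumption
outputs = layers.Dense(2, activation="softmax")(x)                   # 128*2 + 2 = 258 params; activation assumed
model = keras.Model(inputs, outputs)
model.summary()  # matches the layer/parameter table above
```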
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
| Hyperparameters | Value |
| :-- | :-- |
| name | Adam |
| learning_rate | 0.0010000000474974513 |
| decay | 0.0 |
| beta_1 | 0.8999999761581421 |
| beta_2 | 0.9990000128746033 |
| epsilon | 1e-07 |
| amsgrad | False |
| training_precision | float32 |
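These are the float32-stored values of the standard Adam defaults with learning rate 1e-3. A sketch of rebuilding the optimizer and compiling the `model` from the architecture sketch above; the loss and metrics are assumptions, since the card does not record them:

```python
from tensorflow import keras

# Adam configuration matching the table above; 0.0010000000474974513 etc. are
# the float32 representations of 1e-3, 0.9, and 0.999. decay is 0.0, so omitted.
optimizer = keras.optimizers.Adam(
    learning_rate=1e-3,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7,
    amsgrad=False,
)
# Loss and metrics are assumptions (not recorded in the card)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```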