library_name: keras

Model description

About 100x smaller than DistilBERT, with a drop of less than 0.5 points in accuracy on the SST-2 test data.

DistilBERT: 92.2 accuracy, 67M parameters

DistilBiLSTM: 91.8 accuracy (0.9176 in the evaluation below), 658K parameters

Intended uses & limitations

More information needed

Training and evaluation data

Here is the evaluation code:

    import numpy as np
    import tensorflow as tf
    from datasets import load_dataset
    from huggingface_hub import from_pretrained_keras
    from keras.preprocessing.text import Tokenizer
    from keras.utils import pad_sequences
    from sklearn.metrics import accuracy_score

    # MAX_NUM_WORDS and MAX_LEN are not defined in the original snippet. The values
    # here are inferred from the model summary (500000 embedding params / 50 dims,
    # input length 64) and should be treated as assumptions.
    MAX_NUM_WORDS = 10000
    MAX_LEN = 64

    sst2 = load_dataset("SetFit/sst2")
    augmented_sst2_dataset = load_dataset("jmamou/augmented-glue-sst2")

    pad_type = 'post'
    trunc_type = 'post'

    # Fit the tokenizer on the augmented training sentences
    tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
    tokenizer.fit_on_texts(augmented_sst2_dataset['train']['sentence'])

    # Encode the test sentences into integer sequences
    test_sequences = tokenizer.texts_to_sequences(sst2['test']['text'])

    # Pad or truncate the test sequences to a fixed length
    test_padded = pad_sequences(test_sequences, padding=pad_type, truncating=trunc_type, maxlen=MAX_LEN)

    # Load the model from the Hub and score it on the test set
    reloaded_model = from_pretrained_keras('moshew/distilbilstm-finetuned-sst-2-english')

    pred = reloaded_model.predict(test_padded)
    pred_bin = np.argmax(pred, 1)
    accuracy_score(sst2['test']['label'], pred_bin)

0.9176276771004942
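The evaluation relies on 'post' padding and 'post' truncation. As a rough illustration of what those two settings do, here is a dependency-free sketch; `pad_post` is a hypothetical helper written for this example, not part of Keras, and the token ids are made up.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad or truncate each sequence at its end so all have length maxlen."""
    padded = []
    for seq in sequences:
        # 'post' truncation: keep the first maxlen ids, drop the tail
        trimmed = list(seq[:maxlen])
        # 'post' padding: append the fill value until the length is maxlen
        trimmed += [value] * (maxlen - len(trimmed))
        padded.append(trimmed)
    return padded


# Made-up token ids: one short sequence and one long sequence
example = pad_post([[5, 12, 7], [3, 8, 1, 9, 4, 2]], maxlen=4)
print(example)  # [[5, 12, 7, 0], [3, 8, 1, 9]]
```

With `maxlen=64`, as the model's input shape suggests, sentences shorter than 64 tokens are filled with zeros on the right and longer ones lose their trailing tokens.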

    reloaded_model.summary()

    Model: "model"
    _________________________________________________________________
     Layer (type)                      Output Shape        Param #
    =================================================================
     input_1 (InputLayer)              [(None, 64)]        0
     embedding (Embedding)             (None, 64, 50)      500000
     bidirectional (Bidirectional)     (None, 64, 128)     58880
     bidirectional_1 (Bidirectional)   (None, 128)         98816
     dropout (Dropout)                 (None, 128)         0
     dense (Dense)                     (None, 2)           258
    =================================================================
    Total params: 657,954
    Trainable params: 657,954
    Non-trainable params: 0
    _________________________________________________________________
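The parameter counts in the summary can be checked by hand. The sketch below assumes the bidirectional layers wrap standard LSTM cells with 64 units per direction (suggested by the 128-wide outputs) and that the vocabulary size is 10,000 (500000 embedding parameters divided by 50 dimensions); neither value is stated explicitly in this card. The 67M figure for DistilBERT is the rounded number quoted in the model description.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry
    return vocab_size * dim


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * units * (input_dim + units + 1)


def bilstm_params(input_dim, units):
    # Forward and backward LSTMs have separate weights
    return 2 * lstm_params(input_dim, units)


def dense_params(input_dim, units):
    # Weight matrix plus bias vector
    return input_dim * units + units


VOCAB_SIZE = 10_000  # assumption: 500000 / 50
EMBED_DIM = 50
UNITS = 64           # assumption: 128 output features / 2 directions

counts = {
    "embedding": embedding_params(VOCAB_SIZE, EMBED_DIM),
    "bidirectional": bilstm_params(EMBED_DIM, UNITS),
    "bidirectional_1": bilstm_params(2 * UNITS, UNITS),
    "dense": dense_params(2 * UNITS, 2),
}
total = sum(counts.values())
ratio = 67_000_000 / total  # DistilBERT size relative to this model

print(counts)
print(total)
print(round(ratio, 1))
```

Under these assumptions the per-layer counts and the total reproduce the summary, and the ratio comes out at roughly 100, in line with the size claim in the model description.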

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

| Hyperparameters | Value |
| :-- | :-- |
| name | Adam |
| learning_rate | 0.0010000000474974513 |
| decay | 0.0 |
| beta_1 | 0.8999999761581421 |
| beta_2 | 0.9990000128746033 |
| epsilon | 1e-07 |
| amsgrad | False |
| training_precision | float32 |
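The long decimals for learning_rate, beta_1 and beta_2 look like the familiar values 0.001, 0.9 and 0.999 after rounding to 32-bit floats, which would be consistent with the float32 training precision. That reading is an assumption, not something the card states. A minimal sketch using only the standard library shows the rounding; `as_float32` is a hypothetical helper name.

```python
import struct


def as_float32(x):
    """Round a Python float to the nearest float32, returned as a Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Nominal values assumed to lie behind the reported hyperparameters
nominal = {"learning_rate": 0.001, "beta_1": 0.9, "beta_2": 0.999}
stored = {name: as_float32(value) for name, value in nominal.items()}

for name, value in stored.items():
    print(name, value)
```

If the assumption holds, the printed values match the learning_rate, beta_1 and beta_2 entries reported for this model digit for digit.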

Model Plot

[Figure: model plot image]