moshew/distilbilstm-finetuned-sst-2-english

moshew committed on Dec 20, 2022

Commit 64607b6 · 1 parent: 42c6f09

Update README.md

Files changed (1)

README.md +64 -2

@@ -4,7 +4,10 @@ library_name: keras
 ## Model description
-More information needed
 ## Intended uses & limitations
@@ -12,7 +15,66 @@ More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure

 ## Model description
+More than 100x smaller than DistilBERT, with an accuracy drop of less than 0.5 points on the SST-2 test data:
+DistilBERT - 92.2% accuracy, 67M parameters
+DistilBiLSTM - 91.7% accuracy, 658K parameters
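
The size and accuracy comparison can be checked with a few lines of arithmetic. This is a minimal sketch: the DistilBERT figures are the ones quoted in this description, the BiLSTM parameter count is the "Total params" value from the model summary in this README, and the BiLSTM accuracy is the `accuracy_score` result printed by the evaluation code. The variable names are illustrative.

```python
# Figures taken from this README; variable names are illustrative.
distilbert_params = 67_000_000           # "67M parameters"
bilstm_params = 657_954                  # "Total params" in the model summary
distilbert_acc = 92.2                    # accuracy quoted for DistilBERT
bilstm_acc = 0.9176276771004942 * 100    # accuracy_score output of the evaluation code

size_ratio = distilbert_params / bilstm_params
acc_drop = distilbert_acc - bilstm_acc

print(f"size ratio: {size_ratio:.1f}x")          # size ratio: 101.8x
print(f"accuracy drop: {acc_drop:.2f} points")   # accuracy drop: 0.44 points
```

Both headline claims hold for these numbers: the ratio is just above 100 and the drop is below 0.5 points.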
 ## Intended uses & limitations
 ## Training and evaluation data
+Here is the evaluation code:
+import numpy as np
+import tensorflow as tf
+from datasets import load_dataset
+from huggingface_hub import from_pretrained_keras
+from keras.preprocessing.text import Tokenizer
+from keras.utils import pad_sequences
+from sklearn.metrics import accuracy_score
+# MAX_NUM_WORDS and MAX_LEN are not defined in the original snippet. The values below are
+# assumptions inferred from the model summary (embedding: 500,000 params / 50 dims; input shape: (None, 64)).
+MAX_NUM_WORDS = 10000
+MAX_LEN = 64
+sst2 = load_dataset("SetFit/sst2")
+augmented_sst2_dataset = load_dataset("jmamou/augmented-glue-sst2")
+pad_type = 'post'
+trunc_type = 'post'
+# Fit the tokenizer on the augmented training sentences
+tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
+tokenizer.fit_on_texts(augmented_sst2_dataset['train']['sentence'])
+# Encode the test sentences into integer sequences
+test_sequences = tokenizer.texts_to_sequences(sst2['test']['text'])
+# Pad or truncate the test sequences to MAX_LEN
+test_padded = pad_sequences(test_sequences, padding=pad_type, truncating=trunc_type, maxlen=MAX_LEN)
+# Load the model from the Hugging Face Hub and score the test set
+reloaded_model = from_pretrained_keras('moshew/distilbilstm-finetuned-sst-2-english')
+pred = reloaded_model.predict(test_padded)
+# Pick the highest-scoring class for each sentence
+pred_bin = np.argmax(pred, axis=1)
+# accuracy_score expects (y_true, y_pred)
+accuracy_score(sst2['test']['label'], pred_bin)
+0.9176276771004942
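
To make the effect of `padding='post'` and `truncating='post'` concrete, here is a small NumPy-only sketch of what these settings do to encoded sequences. `pad_post` is a hypothetical helper written for illustration, not the Keras implementation; it assumes 0 as the padding value, which is the default of `pad_sequences`.

```python
import numpy as np

def pad_post(sequences, maxlen, value=0):
    """Illustrative stand-in for pad_sequences(..., padding='post', truncating='post')."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        kept = seq[:maxlen]           # 'post' truncation drops tokens from the end
        out[i, :len(kept)] = kept     # 'post' padding leaves `value` in the tail
    return out

# One sequence shorter than maxlen, one longer
padded = pad_post([[5, 8, 2], [7, 1, 4, 9, 3, 6]], maxlen=4)
print(padded)
# [[5 8 2 0]
#  [7 1 4 9]]
```

In the evaluation code the same idea is applied with `maxlen=MAX_LEN`, so every test sentence becomes a fixed-length row that matches the model's input shape.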
+reloaded_model.summary()
+Model: "model"
+_________________________________________________________________
+ Layer (type)                Output Shape              Param #
+=================================================================
+ input_1 (InputLayer)        [(None, 64)]              0
+ embedding (Embedding)       (None, 64, 50)            500000
+ bidirectional (Bidirectiona  (None, 64, 128)          58880
+ l)
+ bidirectional_1 (Bidirectio  (None, 128)              98816
+ nal)
+ dropout (Dropout)           (None, 128)               0
+ dense (Dense)               (None, 2)                 258
+=================================================================
+Total params: 657,954
+Trainable params: 657,954
+Non-trainable params: 0
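
The parameter counts in the summary can be reproduced from the layer shapes. The sketch below assumes a vocabulary of 10,000 tokens, an embedding size of 50, and two bidirectional LSTM layers with 64 units per direction. These values are inferred from the output shapes and parameter counts in the summary rather than stated in the README, and the helper names are hypothetical.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry
    return vocab_size * embed_dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)

def bilstm_params(input_dim, units):
    # Forward and backward LSTMs have separate weights
    return 2 * lstm_params(input_dim, units)

def dense_params(input_dim, output_dim):
    # Weight matrix plus one bias per output
    return input_dim * output_dim + output_dim

# Layer sizes are assumptions inferred from the summary above
layers = {
    "embedding": embedding_params(10_000, 50),
    "bidirectional": bilstm_params(50, 64),
    "bidirectional_1": bilstm_params(128, 64),  # input is 2 * 64 from the first BiLSTM
    "dense": dense_params(128, 2),
}
total = sum(layers.values())

for name, count in layers.items():
    print(f"{name}: {count:,}")
print(f"total: {total:,}")  # total: 657,954
```

Under these assumptions the embedding table accounts for roughly three quarters of all parameters, so the vocabulary size is the main lever on model size.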
 ## Training procedure