---
library_name: keras
---

100x smaller than distilbert-base-uncased-finetuned-sst-2-english, with an accuracy drop of less than 0.5 points.

## Model description

A 2-layer BiLSTM model fine-tuned on SST-2 and distilled from a RoBERTa teacher.

| Model | SST-2 accuracy | Parameters |
| :-- | :-- | :-- |
| distilbert-base-uncased-finetuned-sst-2-english | 92.2 | 67M |
| moshew/distilbilstm-finetuned-sst-2-english | 91.9 | 0.66M |

## How to get started with the model

Example: classifying the SST-2 test set with the model:

```python
# Install the datasets library first: pip install datasets
# Assumes Keras 2.x (TensorFlow <= 2.15): the legacy Tokenizer is not part of Keras 3
import numpy as np
from datasets import load_dataset
from huggingface_hub import from_pretrained_keras
from keras.preprocessing.text import Tokenizer
from keras.utils import pad_sequences
from sklearn.metrics import accuracy_score

sst2 = load_dataset("SetFit/sst2")
augmented_sst2_dataset = load_dataset("jmamou/augmented-glue-sst2")

# Rebuild the word index from the augmented training sentences,
# keeping the 10,000 most frequent words
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(augmented_sst2_dataset['train']['sentence'])

# Encode the test sentences into integer sequences
test_sequences = tokenizer.texts_to_sequences(sst2['test']['text'])

# Pad or truncate every sequence at the end to the model's input length of 64
test_padded = pad_sequences(test_sequences, padding='post', truncating='post', maxlen=64)

reloaded_model = from_pretrained_keras('moshew/distilbilstm-finetuned-sst-2-english')
pred = reloaded_model.predict(test_padded)
pred_bin = np.argmax(pred, axis=1)  # index of the highest-scoring class

print(accuracy_score(sst2['test']['label'], pred_bin))
# 0.9187259747391543
```

The model summary:

```python
reloaded_model.summary()
```

```
Model: "model"
_________________________________________________________________
 Layer (type)                    Output Shape         Param #
=================================================================
 input_1 (InputLayer)            [(None, 64)]         0
 embedding (Embedding)           (None, 64, 50)       500000
 bidirectional (Bidirectional)   (None, 64, 128)      58880
 bidirectional_1 (Bidirectional) (None, 128)          98816
 dropout (Dropout)               (None, 128)          0
 dense (Dense)                   (None, 2)            258
=================================================================
Total params: 657,954
Trainable params: 657,954
Non-trainable params: 0
_________________________________________________________________
```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

| Hyperparameter | Value |
| :-- | :-- |
| optimizer | Adam |
| learning_rate | 0.0010000000474974513 |
| decay | 0.0 |
| beta_1 | 0.8999999761581421 |
| beta_2 | 0.9990000128746033 |
| epsilon | 1e-07 |
| amsgrad | False |
| training_precision | float32 |

## Model Plot
![Model Image](./model.png)
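The parameter counts in the model summary can be reproduced by hand, which also confirms the roughly 100x size reduction relative to the 67M-parameter DistilBERT baseline. This is a sketch under stated assumptions: standard Keras LSTM cells (4 gates, each with input weights, recurrent weights and a bias), 64 units per direction, a 10,000-word vocabulary with 50-dimensional embeddings, and a 2-way dense output. These sizes are inferred from the output shapes and the `Tokenizer(num_words=10000)` call, not stated explicitly in the card, and the helper function names are hypothetical.

```python
# Hypothetical helpers; layer sizes are inferred from the model summary.

def embedding_params(vocab_size: int, dim: int) -> int:
    # One dim-sized vector per vocabulary entry
    return vocab_size * dim

def lstm_params(input_dim: int, units: int) -> int:
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def bilstm_params(input_dim: int, units: int) -> int:
    # Forward and backward LSTMs have separate weights
    return 2 * lstm_params(input_dim, units)

def dense_params(input_dim: int, outputs: int) -> int:
    # Weight matrix plus one bias per output
    return input_dim * outputs + outputs

VOCAB, EMB_DIM, UNITS, NUM_CLASSES = 10_000, 50, 64, 2

layers = {
    "embedding": embedding_params(VOCAB, EMB_DIM),
    "bidirectional": bilstm_params(EMB_DIM, UNITS),
    # The second BiLSTM reads the concatenated 128-dim output of the first
    "bidirectional_1": bilstm_params(2 * UNITS, UNITS),
    "dense": dense_params(2 * UNITS, NUM_CLASSES),
}
total = sum(layers.values())  # input and dropout layers add no parameters

for name, count in layers.items():
    print(f"{name:<16} {count:>7,}")
print(f"{'total':<16} {total:>7,}")

# Compression relative to the 67M-parameter DistilBERT baseline
print(f"size ratio: {67_000_000 / total:.0f}x")
```

The per-layer counts match the summary (500,000, 58,880, 98,816 and 258, for 657,954 in total), and the ratio comes out at about 102x.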
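The long decimals in the hyperparameter table are consistent with the optimizer settings being stored as 32-bit floats: rounding 0.001, 0.9 and 0.999 (the Keras Adam defaults) to single precision and printing them back as Python floats gives exactly the tabulated `learning_rate`, `beta_1` and `beta_2`. That these were the intended values is an inference, not something the card states. A standard-library check, where `as_float32` is a hypothetical helper:

```python
import struct

def as_float32(x: float) -> float:
    # Round-trip a Python float (float64) through IEEE 754 single precision
    return struct.unpack("f", struct.pack("f", x))[0]

# Assumed intended values (Keras Adam defaults), compared with the table
for name, value in [("learning_rate", 0.001), ("beta_1", 0.9), ("beta_2", 0.999)]:
    print(name, as_float32(value))
```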