moshew/distilbilstm-finetuned-sst-2-english



	---
	library_name: tf-keras
	---

	About 100x smaller than distilbert-base-uncased-finetuned-sst-2-english, with an accuracy drop of less than 0.5 points.

	## Model description

	A 2-layer BiLSTM model fine-tuned on SST-2 and distilled from a RoBERTa teacher.
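This card does not say which distillation objective was used. As a rough illustration only, the sketch below shows one common formulation: a hard-label cross-entropy term mixed with a KL term between temperature-softened teacher and student distributions. The temperature `T`, the mixing weight `alpha`, and the function names are assumptions, not details of how this model was trained.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """Hypothetical loss: alpha * CE(label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy on the student's unsoftened prediction.
    hard_loss = -math.log(softmax(student_logits)[label])
    # KL divergence between the temperature-softened distributions.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft_loss = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student))
    return alpha * hard_loss + (1 - alpha) * (T ** 2) * soft_loss

# Made-up logits for a single two-class example (label 0).
loss_match = distillation_loss([2.0, -1.0], [2.0, -1.0], label=0)
loss_mismatch = distillation_loss([-1.0, 2.0], [2.0, -1.0], label=0)
print(loss_match, loss_mismatch)
```

When the student reproduces the teacher's logits, the KL term vanishes and only the hard-label term remains, so the loss is lower than when the student disagrees with the teacher.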

	| Model | Accuracy (%) | Parameters |
	| :-- | :-- | :-- |
	| distilbert-base-uncased-finetuned-sst-2-english | 92.2 | 67M |
	| moshew/distilbilstm-finetuned-sst-2-english | 91.9 | 0.66M |
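The 0.66M figure can be reproduced from the layer shapes in the model summary printed later in this card. The sketch below assumes standard LSTM cells (four gates, each with input weights, recurrent weights, and a bias) with 64 units per direction and a 10,000-word vocabulary. These assumptions are consistent with the reported per-layer counts; the helper names are illustrative.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim

def bilstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    one_direction = 4 * (units * (input_dim + units) + units)
    return 2 * one_direction  # forward + backward LSTM

def dense_params(input_dim, output_dim):
    return input_dim * output_dim + output_dim

layers = {
    "embedding": embedding_params(10_000, 50),
    "bidirectional": bilstm_params(50, 64),
    "bidirectional_1": bilstm_params(128, 64),  # input is the 2 * 64 concatenated output
    "dense": dense_params(128, 2),
}
total = sum(layers.values())

for name, count in layers.items():
    print(f"{name:16s} {count:>8,}")
print(f"{'total':16s} {total:>8,}")
print(f"size ratio vs. a 67M-parameter model: {67_000_000 / total:.0f}x")
```

The total matches the 657,954 parameters reported by `summary()`; most of them sit in the embedding table rather than in the recurrent layers.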

	## How to get started with the model

	The following example evaluates the model on the SST-2 test set:

	```python
	# In a notebook, install the dependency first with: !pip install datasets
	from datasets import load_dataset
	import numpy as np
	from sklearn.metrics import accuracy_score
	# Note: keras.preprocessing.text.Tokenizer is a Keras 2.x API
	from keras.preprocessing.text import Tokenizer
	from keras.utils import pad_sequences
	import tensorflow as tf
	from huggingface_hub import from_pretrained_keras

	# Load the SST-2 evaluation data and the augmented training data
	sst2 = load_dataset("SetFit/sst2")
	augmented_sst2_dataset = load_dataset("jmamou/augmented-glue-sst2")

	# Build the vocabulary (10,000 most frequent words) from the augmented training sentences
	tokenizer = Tokenizer(num_words=10000)
	tokenizer.fit_on_texts(augmented_sst2_dataset['train']['sentence'])

	# Encode test data sentences into sequences
	test_sequences = tokenizer.texts_to_sequences(sst2['test']['text'])

	# Pad the test sequences
	test_padded = pad_sequences(test_sequences, padding = 'post', truncating = 'post', maxlen=64)

	# Download the trained model from the Hugging Face Hub
	reloaded_model = from_pretrained_keras('moshew/distilbilstm-finetuned-sst-2-english')

	# Evaluate the model on the SST-2 test data (GLUE)
	pred = reloaded_model.predict(test_padded)
	pred_bin = np.argmax(pred, axis=1)
	print(accuracy_score(sst2['test']['label'], pred_bin))
	# Output: 0.9187259747391543

	reloaded_model.summary()
	# Output:
	# Model: "model"
	# _________________________________________________________________
	#  Layer (type)                      Output Shape      Param #
	# =================================================================
	#  input_1 (InputLayer)              [(None, 64)]      0
	#  embedding (Embedding)             (None, 64, 50)    500000
	#  bidirectional (Bidirectional)     (None, 64, 128)   58880
	#  bidirectional_1 (Bidirectional)   (None, 128)       98816
	#  dropout (Dropout)                 (None, 128)       0
	#  dense (Dense)                     (None, 2)         258
	# =================================================================
	# Total params: 657,954
	# Trainable params: 657,954
	# Non-trainable params: 0
	# _________________________________________________________________

	```
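To make the preprocessing settings concrete, here is a dependency-free sketch of what `padding='post'`, `truncating='post'`, and `maxlen=64` do to a sequence of token ids, followed by the arg-max step that turns two output scores into a class id. It mimics `pad_sequences` for illustration only: the helper names are hypothetical, `0` is assumed to be the padding id (the Keras default), and the scores are made up.

```python
def pad_post(sequence, maxlen=64, pad_id=0):
    """Keep the first `maxlen` ids, then right-pad with `pad_id`."""
    truncated = sequence[:maxlen]  # truncating='post' drops the tail
    return truncated + [pad_id] * (maxlen - len(truncated))  # padding='post'

def argmax(scores):
    """Index of the largest score, as np.argmax(pred, axis=1) does per row."""
    return max(range(len(scores)), key=lambda i: scores[i])

short = pad_post([12, 7, 431])           # 3 ids followed by 61 zeros
long_ = pad_post(list(range(1, 101)))    # 100 ids cut down to the first 64
predicted_class = argmax([0.08, 0.92])   # made-up scores for one sentence

print(len(short), len(long_), predicted_class)
```

Every input therefore has exactly 64 positions, which matches the `(None, 64)` input shape in the model summary.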

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:

	| Hyperparameters | Value |
	| :-- | :-- |
	| name | Adam |
	| learning_rate | 0.0010000000474974513 |
	| decay | 0.0 |
	| beta_1 | 0.8999999761581421 |
	| beta_2 | 0.9990000128746033 |
	| epsilon | 1e-07 |
	| amsgrad | False |
	| training_precision | float32 |
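For readers unfamiliar with these settings, the sketch below applies one textbook Adam update to a single scalar parameter, using the values from the table rounded to their intended decimals (the long digits in the table are float32 representations of 0.001, 0.9, and 0.999). It illustrates the update rule only; Keras's implementation may differ in small details such as where `epsilon` is applied, and the function name is hypothetical.

```python
import math

def adam_step(param, grad, m, v, t,
              lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """One textbook Adam update for a scalar parameter at step t (t >= 1)."""
    m = beta_1 * m + (1 - beta_1) * grad        # running mean of gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)               # bias correction
    v_hat = v / (1 - beta_2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v

# First update for a parameter of 1.0 with a made-up gradient of 2.0.
param, m, v = adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(param, m, v)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly one learning rate regardless of the gradient's magnitude.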


	## Model Plot

	<details>
	<summary>View Model Plot</summary>

	![Model Image](./model.png)

	</details>