gokuls/HBERTv1_emb_compress_48_L12_H64_A2

---
tags:
- generated_from_trainer
datasets:
- gokuls/wiki_book_corpus_complete_processed_bert_dataset
metrics:
- accuracy
model-index:
- name: HBERTv1_emb_compress_48_L12_H64_A2
  results:
  - task:
      name: Masked Language Modeling
      type: fill-mask
    dataset:
      name: gokuls/wiki_book_corpus_complete_processed_bert_dataset
      type: gokuls/wiki_book_corpus_complete_processed_bert_dataset
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.12850906143802152
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You should probably proofread and complete it, then remove this comment. -->

# HBERTv1_emb_compress_48_L12_H64_A2

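This model was trained for masked language modeling on the gokuls/wiki_book_corpus_complete_processed_bert_dataset dataset. The automatically generated card left the base-model link empty, which suggests it was trained from scratch rather than fine-tuned from an existing checkpoint.
It achieves the following results on the evaluation set:
- Loss: 6.4079
- Accuracy: 0.1285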
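To put the loss on a more interpretable scale, here is a minimal conversion to perplexity. It assumes the reported loss is the mean per-token cross-entropy over the masked positions, in natural-log units. That is the usual convention for Hugging Face masked-language-model training, but this card does not state it. The accuracy of 0.1285 means roughly one in eight masked tokens is predicted correctly.

```python
import math

# Evaluation loss reported in this card; assumed to be the mean cross-entropy
# (in nats) over the masked tokens.
EVAL_LOSS = 6.4079


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(mean_cross_entropy)


if __name__ == "__main__":
    print(f"perplexity ≈ {perplexity(EVAL_LOSS):.1f}")  # prints: perplexity ≈ 606.6
```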

## Model description

More information needed

## Intended uses & limitations

More information needed

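## Training and evaluation data

Both training and evaluation use the gokuls/wiki_book_corpus_complete_processed_bert_dataset dataset. More information is needed on preprocessing and on how the evaluation split was created.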

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 96
- eval_batch_size: 96
- seed: 10
- distributed_type: multi-GPU
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10000
- num_epochs: 5
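With the `linear` scheduler and 10,000 warmup steps, the learning rate ramps from 0 up to 1e-05 and then decays linearly back to 0 by the final step. The sketch below reproduces that shape in plain Python. It mirrors the formula used by Hugging Face's `get_linear_schedule_with_warmup`. The total step count (`TOTAL_STEPS`) is an assumption: it is estimated from the results table, where step 300,000 falls at epoch 4.92. That works out to roughly 61,000 steps per epoch and about 305,000 steps over 5 epochs.

```python
PEAK_LR = 1e-05        # learning_rate from the card
WARMUP_STEPS = 10_000  # lr_scheduler_warmup_steps from the card
# Assumption: estimated as 5 epochs * (300,000 steps / 4.92 epochs).
TOTAL_STEPS = round(5 * 300_000 / 4.92)


def linear_warmup_lr(step: int,
                     peak_lr: float = PEAK_LR,
                     warmup_steps: int = WARMUP_STEPS,
                     total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from peak_lr down to 0 at total_steps, never below 0.
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return peak_lr * remaining


if __name__ == "__main__":
    for step in (0, 5_000, 10_000, 150_000, 300_000):
        print(step, f"{linear_warmup_lr(step):.2e}")
```

The peak is reached exactly at step 10,000. Because `TOTAL_STEPS` is only an estimate, the values near the end of training are approximate.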

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:------:|:---------------:|:--------:|
| 8.6554 | 0.16 | 10000 | 8.5846 | 0.0483 |
| 7.2331 | 0.33 | 20000 | 7.2280 | 0.0542 |
| 7.0014 | 0.49 | 30000 | 6.9927 | 0.0677 |
| 6.8699 | 0.66 | 40000 | 6.8637 | 0.0856 |
| 6.7777 | 0.82 | 50000 | 6.7726 | 0.0922 |
| 6.7091 | 0.98 | 60000 | 6.7101 | 0.0974 |
| 6.6626 | 1.15 | 70000 | 6.6620 | 0.1015 |
| 6.6279 | 1.31 | 80000 | 6.6255 | 0.1040 |
| 6.5917 | 1.47 | 90000 | 6.5948 | 0.1068 |
| 6.5691 | 1.64 | 100000 | 6.5695 | 0.1094 |
| 6.5486 | 1.8 | 110000 | 6.5460 | 0.1122 |
| 6.5246 | 1.97 | 120000 | 6.5275 | 0.1144 |
| 6.5069 | 2.13 | 130000 | 6.5115 | 0.1162 |
| 6.5001 | 2.29 | 140000 | 6.4962 | 0.1180 |
| 6.4785 | 2.46 | 150000 | 6.4822 | 0.1197 |
| 6.4706 | 2.62 | 160000 | 6.4714 | 0.1212 |
| 6.4612 | 2.79 | 170000 | 6.4610 | 0.1225 |
| 6.4485 | 2.95 | 180000 | 6.4530 | 0.1233 |
| 6.4477 | 3.11 | 190000 | 6.4441 | 0.1243 |
| 6.4373 | 3.28 | 200000 | 6.4395 | 0.1251 |
| 6.4351 | 3.44 | 210000 | 6.4322 | 0.1259 |
| 6.4273 | 3.6 | 220000 | 6.4264 | 0.1262 |
| 6.4153 | 3.77 | 230000 | 6.4219 | 0.1269 |
| 6.4188 | 3.93 | 240000 | 6.4182 | 0.1274 |
| 6.4128 | 4.1 | 250000 | 6.4150 | 0.1278 |
| 6.4189 | 4.26 | 260000 | 6.4121 | 0.1280 |
| 6.4102 | 4.42 | 270000 | 6.4112 | 0.1282 |
| 6.4105 | 4.59 | 280000 | 6.4087 | 0.1285 |
| 6.4065 | 4.75 | 290000 | 6.4067 | 0.1287 |
| 6.4082 | 4.92 | 300000 | 6.4070 | 0.1285 |
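The last rows of the table show the run converging, with validation loss changing by less than 0.01 over the final 50,000 steps. Below is a minimal sketch for picking a checkpoint from these logged values. The numbers are copied from the last six rows of the table. The selection rule (lowest validation loss) is an assumption, because the card does not say which checkpoint was published. `EvalPoint`, `best_checkpoint`, and `loss_change` are hypothetical helper names.

```python
from typing import List, NamedTuple


class EvalPoint(NamedTuple):
    step: int
    val_loss: float
    accuracy: float


# Final six evaluations from the "Training results" table.
HISTORY: List[EvalPoint] = [
    EvalPoint(250_000, 6.4150, 0.1278),
    EvalPoint(260_000, 6.4121, 0.1280),
    EvalPoint(270_000, 6.4112, 0.1282),
    EvalPoint(280_000, 6.4087, 0.1285),
    EvalPoint(290_000, 6.4067, 0.1287),
    EvalPoint(300_000, 6.4070, 0.1285),
]


def best_checkpoint(history: List[EvalPoint]) -> EvalPoint:
    """Return the evaluation with the lowest validation loss."""
    return min(history, key=lambda p: p.val_loss)


def loss_change(history: List[EvalPoint]) -> float:
    """Change in validation loss from the first to the last evaluation in `history`."""
    return history[-1].val_loss - history[0].val_loss


if __name__ == "__main__":
    best = best_checkpoint(HISTORY)
    print(f"best step: {best.step}, val loss {best.val_loss}, accuracy {best.accuracy}")
    print(f"loss change over window: {loss_change(HISTORY):+.4f}")
```

In this window, the lowest validation loss and the highest accuracy both occur at step 290,000.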


### Framework versions

- Transformers 4.33.2
- PyTorch 1.14.0a0+410ce96
- Datasets 2.14.5
- Tokenizers 0.13.3
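To compare a local environment against the versions listed above, a small standard-library check is enough. The PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are an assumption. The PyTorch entry is a pre-release build with a local version suffix (`1.14.0a0+410ce96`), so it is unlikely to match a standard PyPI install exactly. `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed under "Framework versions"; keys are assumed PyPI distribution names.
CARD_VERSIONS: Dict[str, str] = {
    "transformers": "4.33.2",
    "torch": "1.14.0a0+410ce96",
    "datasets": "2.14.5",
    "tokenizers": "0.13.3",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (expected version, installed version or None if missing)."""
    report: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in compare_versions(CARD_VERSIONS).items():
        if installed is None:
            status = "missing"
        else:
            status = "match" if installed == wanted else "differs"
        print(f"{name}: card {wanted}, installed {installed or 'not installed'} ({status})")
```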