# text_generation_bangla_model

BanglaGPT is a Bangla text generation model pretrained on the BanglaCLM corpus.

## BanglaCLM dataset

The corpus combines the following sources:

- OSCAR: 12.84 GB
- Wikipedia dump: 6.24 GB
- ProthomAlo: 3.92 GB
- Kalerkantho: 3.24 GB
## Model description

- Context size: 128 tokens
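
A minimal usage sketch (not part of the original card) showing how the checkpoint could be loaded with the TensorFlow classes in `transformers` and used to generate Bangla text. The repository id below is an assumption based on the model name; replace it with the actual id on the Hub.

```python
from transformers import AutoTokenizer, TFGPT2LMHeadModel

repo_id = "shahidul034/text_generation_bangla_model"  # assumed repo id, adjust as needed

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = TFGPT2LMHeadModel.from_pretrained(repo_id)

prompt = "বাংলাদেশ"  # any Bangla prompt
inputs = tokenizer(prompt, return_tensors="tf")

# Sample a continuation within the 128-token context window.
outputs = model.generate(
    **inputs,
    max_length=128,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```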
## Training and evaluation data

The BanglaCLM dataset is divided into a training set (90%) and a validation set (10%).
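
A hedged sketch of that 90/10 split using the `datasets` library; the file path is a placeholder, since the card does not specify how the BanglaCLM corpus is stored or loaded.

```python
from datasets import load_dataset

# Placeholder path: BanglaCLM is assembled from the corpora listed above.
raw = load_dataset("text", data_files={"train": "banglaclm.txt"})["train"]

# 90% training / 10% validation, as described in the card.
split = raw.train_test_split(test_size=0.1, seed=42)
train_ds, val_ds = split["train"], split["test"]
print(len(train_ds), len(val_ds))
```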
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):
- Batch size: 32
- Initial learning rate: 5e-5
- Number of warmup steps: 10,000
- Weight decay rate: 0.01
- Tokenization algorithm: BPE
- Tokenizer vocabulary size: 50,256
- Total trainable parameters: 124,439,808
- Epochs: 40
- Number of training steps: 40,772,228
- Training precision: float32
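
The original training script is not included in the card, so the following is only an illustrative sketch of how these hyperparameters could be wired together with the TensorFlow utilities in `transformers`; the configuration and schedule values are taken from the list above.

```python
from transformers import GPT2Config, TFGPT2LMHeadModel, create_optimizer

# GPT-2-style configuration matching the context size and vocabulary listed above.
config = GPT2Config(vocab_size=50256, n_positions=128)
model = TFGPT2LMHeadModel(config)

# AdamW with linear warmup and weight decay, using the hyperparameters above.
optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,
    num_train_steps=40_772_228,
    num_warmup_steps=10_000,
    weight_decay_rate=0.01,
)

# Compiling without an explicit loss lets the model use its internal LM loss.
model.compile(optimizer=optimizer)
# model.fit(train_dataset, validation_data=val_dataset, epochs=40)
```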
### Training results

- Perplexity: 2.86
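
For reference, perplexity is the exponential of the mean cross-entropy loss on held-out text. The snippet below only illustrates that relationship; the loss value shown is hypothetical, chosen to match the reported score.

```python
import math

val_loss = 1.0508  # hypothetical mean cross-entropy loss; not from the card
perplexity = math.exp(val_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 2.86
```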
### Framework versions

- Transformers 4.26.1
- TensorFlow 2.11.0
- Datasets 2.10.0
- Tokenizers 0.13.2
## Citation

If you find this model helpful, please cite the following paper:
```bibtex
@INPROCEEDINGS{10303383,
  author={Salim, Md. Shahidul and Murad, Hasan and Das, Dola and Ahmed, Faisal},
  booktitle={2023 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD)},
  title={BanglaGPT: A Generative Pretrained Transformer-Based Model for Bangla Language},
  year={2023},
  pages={56-59},
  doi={10.1109/ICICT4SD59951.2023.10303383}}
```