---
library_name: transformers
license: mit
base_model: openai-community/gpt2
tags:
- generated_from_trainer
model-index:
- name: arabic-nano-gpt
  results: []
datasets:
- wikimedia/wikipedia
language:
- ar
---
# arabic-nano-gpt
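This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) on the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset listed in the card metadata (language: Arabic).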
It achieves the following results on the held-out test set:
- Loss: 3.28796
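
For readers who prefer perplexity, the loss above can be converted with a single exponential. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats, which the card does not state explicitly.

```python
import math

# Test loss reported in this card.
# Assumption: mean per-token cross-entropy measured in nats.
test_loss = 3.28796

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(test_loss)

print(f"perplexity = {perplexity:.2f}")
```

Under that assumption the test perplexity comes out at roughly 26.8.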
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 24
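
The sketch below shows how the listed values relate to each other: the effective batch size, and the shape of a linear schedule with warmup. It uses only the standard library and is not the training script itself. `num_devices` and `steps_per_epoch` are assumptions not stated in the hyperparameter list, and `linear_lr` is a hypothetical helper that mirrors the usual "linear warmup, then linear decay to zero" rule.

```python
import math

# Values taken from the hyperparameter list above.
learning_rate = 1e-3
train_batch_size = 64
gradient_accumulation_steps = 4
warmup_ratio = 0.01
num_epochs = 24

# Assumption: a single device, so the effective batch size is
# per-device batch size x gradient accumulation steps.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Hypothetical round figure: optimizer steps per epoch is not listed above.
steps_per_epoch = 17_000
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


print(f"total_train_batch_size = {total_train_batch_size}")
for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>7}: lr = {linear_lr(step):.6f}")
```

With a single device the product 64 x 4 matches the listed `total_train_batch_size` of 256. The learning rate peaks at 0.001 once warmup ends (after 1% of the steps) and then decays linearly to zero.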
### Training results
<!-- | Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| 5.62 | 0.0585 | 1000 | 5.3754 |
| 4.6527 | 0.1170 | 2000 | 4.4918 |
| 4.2818 | 0.1755 | 3000 | 4.1137 |
| 4.1289 | 0.2340 | 4000 | 3.9388 |
| 4.0021 | 0.2924 | 5000 | 3.8274 |
| 3.9301 | 0.3509 | 6000 | 3.7534 |
| 3.8822 | 0.4094 | 7000 | 3.6986 |
| 3.8375 | 0.4679 | 8000 | 3.6557 |
| 3.7918 | 0.5264 | 9000 | 3.6266 |
| 3.7723 | 0.5849 | 10000 | 3.5994 |
| 3.7549 | 0.6434 | 11000 | 3.5787 |
| 3.7324 | 0.7019 | 12000 | 3.5612 |
| 3.7249 | 0.7604 | 13000 | 3.5436 |
| 3.6989 | 0.8188 | 14000 | 3.5323 |
| 3.7003 | 0.8773 | 15000 | 3.5169 |
| 3.6919 | 0.9358 | 16000 | 3.5055 |
| 3.6717 | 0.9943 | 17000 | 3.4966 |
| 3.6612 | 1.0528 | 18000 | 3.4868 |
| 3.6467 | 1.1113 | 19000 | 3.4787 |
| 3.6497 | 1.1698 | 20000 | 3.4707 |
| 3.6193 | 1.2283 | 21000 | 3.4639 |
| 3.6302 | 1.2868 | 22000 | 3.4572 |
| 3.6225 | 1.3452 | 23000 | 3.4516 |
| 3.635 | 1.4037 | 24000 | 3.4458 |
| 3.6115 | 1.4622 | 25000 | 3.4416 |
| 3.6162 | 1.5207 | 26000 | 3.4348 |
| 3.6142 | 1.5792 | 27000 | 3.4329 |
| 3.5956 | 1.6377 | 28000 | 3.4293 |
| 3.5885 | 1.6962 | 29000 | 3.4226 |
| 3.603 | 1.7547 | 30000 | 3.4195 |
| 3.5947 | 1.8132 | 31000 | 3.4142 |
| 3.588 | 1.8716 | 32000 | 3.4113 |
| 3.5803 | 1.9301 | 33000 | 3.4065 |
| 3.5891 | 1.9886 | 34000 | 3.4044 |
| 3.5801 | 2.0471 | 35000 | 3.4032 |
| 3.5739 | 2.1056 | 36000 | 3.3988 |
| 3.5661 | 2.1641 | 37000 | 3.3981 |
| 3.5657 | 2.2226 | 38000 | 3.3934 |
| 3.5727 | 2.2811 | 39000 | 3.3907 |
| 3.5617 | 2.3396 | 40000 | 3.3885 |
| 3.5579 | 2.3980 | 41000 | 3.3855 |
| 3.5553 | 2.4565 | 42000 | 3.3816 |
| 3.5647 | 2.5150 | 43000 | 3.3803 |
| 3.5531 | 2.5735 | 44000 | 3.3799 |
| 3.5494 | 2.6320 | 45000 | 3.3777 |
| 3.5525 | 2.6905 | 46000 | 3.3759 |
| 3.5487 | 2.7490 | 47000 | 3.3725 |
| 3.5551 | 2.8075 | 48000 | 3.3711 |
| 3.5511 | 2.8660 | 49000 | 3.3681 |
| 3.5463 | 2.9244 | 50000 | 3.3695 |
| 3.5419 | 2.9829 | 51000 | 3.3660 |
| 3.5414 | 3.0414 | 52000 | 3.3648 |
| 3.5388 | 3.0999 | 53000 | 3.3605 |
| 3.5333 | 3.1584 | 54000 | 3.3619 |
| 3.525 | 3.2169 | 55000 | 3.3588 |
| 3.5361 | 3.2754 | 56000 | 3.3572 |
| 3.5302 | 3.3339 | 57000 | 3.3540 |
| 3.5355 | 3.3924 | 58000 | 3.3553 |
| 3.5391 | 3.4508 | 59000 | 3.3504 |
| 3.531 | 3.5093 | 60000 | 3.3495 |
| 3.5293 | 3.5678 | 61000 | 3.3483 |
| 3.5269 | 3.6263 | 62000 | 3.3489 |
| 3.5181 | 3.6848 | 63000 | 3.3494 |
| 3.5205 | 3.7433 | 64000 | 3.3480 |
| 3.5237 | 3.8018 | 65000 | 3.3440 |
| 3.5316 | 3.8603 | 66000 | 3.3417 |
| 3.5222 | 3.9188 | 67000 | 3.3433 |
| 3.5174 | 3.9772 | 68000 | 3.3418 |
| 3.518 | 4.0357 | 69000 | 3.3414 |
| 3.5036 | 4.0942 | 70000 | 3.3365 |
| 3.5101 | 4.1527 | 71000 | 3.3367 |
| 3.5145 | 4.2112 | 72000 | 3.3361 |
| 3.5053 | 4.2697 | 73000 | 3.3355 |
| 3.5153 | 4.3282 | 74000 | 3.3334 |
| 3.5003 | 4.3867 | 75000 | 3.3334 |
| 3.5001 | 4.4452 | 76000 | 3.3326 |
| 3.5114 | 4.5036 | 77000 | 3.3298 |
| 3.5108 | 4.5621 | 78000 | 3.3292 |
| 3.4985 | 4.6206 | 79000 | 3.3288 |
| 3.497 | 4.6791 | 80000 | 3.3303 |
| 3.4982 | 4.7376 | 81000 | 3.3291 |
| 3.5068 | 4.7961 | 82000 | 3.3272 |
| 3.4915 | 4.8546 | 83000 | 3.3244 |
| 3.5036 | 4.9131 | 84000 | 3.3214 |
| 3.5027 | 4.9716 | 85000 | 3.3214 |
| 3.5078 | 5.0300 | 86000 | 3.3225 |
| 3.5112 | 5.0885 | 87000 | 3.3243 |
| 3.5049 | 5.1470 | 88000 | 3.3216 |
| 3.4917 | 5.2055 | 89000 | 3.3192 |
| 3.4802 | 5.2640 | 90000 | 3.3188 |
| 3.4971 | 5.3225 | 91000 | 3.3201 |
| 3.4941 | 5.3810 | 92000 | 3.3175 |
| 3.4998 | 5.4395 | 93000 | 3.3179 |
| 3.5011 | 5.4980 | 94000 | 3.3164 |
| 3.4912 | 5.5564 | 95000 | 3.3180 |
| 3.4961 | 5.6149 | 96000 | 3.3168 |
| 3.4833 | 5.6734 | 97000 | 3.3148 |
| 3.498 | 5.7319 | 98000 | 3.3133 |
| 3.4892 | 5.7904 | 99000 | 3.3142 |
| 3.4967 | 5.8489 | 100000 | 3.3142 |
| 3.4847 | 5.9074 | 101000 | 3.3094 |
| 3.4899 | 5.9659 | 102000 | 3.3102 |
| 3.4774 | 6.0244 | 103000 | 3.3110 |
| 3.4854 | 6.0828 | 104000 | 3.3106 |
| 3.4873 | 6.1413 | 105000 | 3.3087 |
| 3.4869 | 6.1998 | 106000 | 3.3102 |
| 3.4833 | 6.2583 | 107000 | 3.3063 |
| 3.491 | 6.3168 | 108000 | 3.3082 |
| 3.4776 | 6.3753 | 109000 | 3.3075 |
| 3.4924 | 6.4338 | 110000 | 3.3068 |
| 3.4804 | 6.4923 | 111000 | 3.3050 |
| 3.4805 | 6.5508 | 112000 | 3.3041 |
| 3.4892 | 6.6093 | 113000 | 3.3031 |
| 3.4775 | 6.6677 | 114000 | 3.3032 |
| 3.481 | 6.7262 | 115000 | 3.3036 |
| 3.4782 | 6.7847 | 116000 | 3.3025 |
| 3.4804 | 6.8432 | 117000 | 3.3017 |
| 3.4841 | 6.9017 | 118000 | 3.2999 |
| 3.4784 | 6.9602 | 119000 | 3.3008 |
| 3.4821 | 7.0187 | 120000 | 3.3001 |
| 3.4671 | 7.0772 | 121000 | 3.3008 |
| 3.485 | 7.1357 | 122000 | 3.2976 |
| 3.4737 | 7.1941 | 123000 | 3.2985 |
| 3.4793 | 7.2526 | 124000 | 3.2979 |
| 3.4651 | 7.3111 | 125000 | 3.2968 |
| 3.4847 | 7.3696 | 126000 | 3.2974 |
| 3.474 | 7.4281 | 127000 | 3.2973 |
| 3.4769 | 7.4866 | 128000 | 3.2955 |
| 3.486 | 7.5451 | 129000 | 3.2953 |
| 3.4684 | 7.6036 | 130000 | 3.2944 |
| 3.4826 | 7.6621 | 131000 | 3.2949 |
| 3.4685 | 7.7205 | 132000 | 3.2944 |
| 3.4608 | 7.7790 | 133000 | 3.2931 |
| 3.4655 | 7.8375 | 134000 | 3.2953 |
| 3.4648 | 7.8960 | 135000 | 3.2928 |
| 3.4632 | 7.9545 | 136000 | 3.2936 |
| 3.4666 | 8.0130 | 137000 | 3.2902 |
| 3.4663 | 8.0715 | 138000 | 3.2939 |
| 3.4713 | 8.1300 | 139000 | 3.2904 |
| 3.4654 | 8.1885 | 140000 | 3.2917 |
| 3.466 | 8.2469 | 141000 | 3.2913 |
| 3.4724 | 8.3054 | 142000 | 3.2889 |
| 3.4695 | 8.3639 | 143000 | 3.2890 |
| 3.4729 | 8.4224 | 144000 | 3.2876 |
| 3.4551 | 8.4809 | 145000 | 3.2898 |
| 3.4652 | 8.5394 | 146000 | 3.2885 |
| 3.4689 | 8.5979 | 147000 | 3.2854 |
| 3.4647 | 8.6564 | 148000 | 3.2857 |
| 3.4653 | 8.7149 | 149000 | 3.2857 |
| 3.4552 | 8.7733 | 150000 | 3.2861 |
| 3.47 | 8.8318 | 151000 | 3.2868 |
| 3.4627 | 8.8903 | 152000 | 3.2854 | -->
**Training Loss**
![Training loss curve](https://cdn-uploads.huggingface.co/production/uploads/63ccee86374057a338e03c1e/970nr9bptjHSMsjLDHfaY.png)
**Validation Loss**
![Validation loss curve](https://cdn-uploads.huggingface.co/production/uploads/63ccee86374057a338e03c1e/GUbnak7yV02vd0NZhbeEO.png)
### Framework versions
- Transformers 4.45.2
- PyTorch 2.5.0
- Datasets 3.0.1
- Tokenizers 0.20.1