|
--- |
|
library_name: transformers |
|
license: mit |
|
base_model: openai-community/gpt2 |
|
tags: |
|
- generated_from_trainer |
|
model-index: |
|
- name: arabic-nano-gpt |
|
results: [] |
|
datasets: |
|
- wikimedia/wikipedia |
|
language: |
|
- ar |
|
--- |
|
|
|
|
|
# arabic-nano-gpt |
|
|
|
This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) on the Arabic subset of the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.
|
It achieves the following results on the held-out test set: |
|
- Loss: 3.28796 |
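
For a causal language model, this cross-entropy loss corresponds to a test perplexity of roughly exp(3.28796) ≈ 26.8.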
|
|
|
|
|
## Model description |
|
|
|
arabic-nano-gpt is a compact GPT-2-style causal language model for Arabic. It starts from [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) and is trained on Arabic Wikipedia text from the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.
|
|
|
## Intended uses & limitations |
|
|
|
The model is intended for experimentation with Arabic causal language modeling and for generating short free-form Arabic text. As a small model trained only on Wikipedia articles, its outputs can be repetitive, incoherent, or factually wrong, and it should not be relied on for factual or production use without careful review.
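
A minimal generation sketch using the `transformers` pipeline API is shown below; the model id is a placeholder (the actual Hub repository id is not stated in this card) and should be replaced accordingly.

```python
from transformers import pipeline

# Placeholder repository id -- replace with this model's actual Hub id.
generator = pipeline("text-generation", model="arabic-nano-gpt")

prompt = "اللغة العربية"  # any Arabic prompt
outputs = generator(prompt, max_new_tokens=50, do_sample=True, top_p=0.95)
print(outputs[0]["generated_text"])
```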
|
|
|
## Training and evaluation data |
|
|
|
Training uses the Arabic subset of the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset. Held-out splits are reserved for validation during training (see the loss curves below) and for the final test loss reported above.
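
A sketch of loading the corpus with 🤗 Datasets follows; the snapshot date in the config name is an assumption, since the exact dump used for training is not documented here.

```python
from datasets import load_dataset

# "20231101.ar" is an assumed snapshot; the exact dump used is not documented.
wiki_ar = load_dataset("wikimedia/wikipedia", "20231101.ar", split="train")
print(wiki_ar[0]["text"][:200])  # inspect the first article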
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 0.001 |
|
- train_batch_size: 64 |
|
- eval_batch_size: 64 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 4 |
|
- total_train_batch_size: 256 |
|
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_ratio: 0.01 |
|
- num_epochs: 24 |
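
As a hedged sketch, these settings map onto `transformers.TrainingArguments` roughly as follows; the output directory and evaluation cadence are assumptions rather than values taken from the original run, and the Adam betas and epsilon listed above are the `TrainingArguments` defaults.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="arabic-nano-gpt",       # assumed name, not from the original run
    learning_rate=1e-3,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=4,      # effective batch size: 64 * 4 = 256
    num_train_epochs=24,
    lr_scheduler_type="linear",
    warmup_ratio=0.01,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the defaults.
    eval_strategy="steps",              # assumed: the log evaluates every 1000 steps
    eval_steps=1000,
)
```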
|
|
|
### Training results

The full per-step evaluation log is kept as a comment in this card's source; the training and validation loss curves are plotted below.
|
|
|
<!-- | Training Loss | Epoch | Step | Validation Loss | |
|
|:-------------:|:------:|:------:|:---------------:| |
|
| 5.62 | 0.0585 | 1000 | 5.3754 | |
|
| 4.6527 | 0.1170 | 2000 | 4.4918 | |
|
| 4.2818 | 0.1755 | 3000 | 4.1137 | |
|
| 4.1289 | 0.2340 | 4000 | 3.9388 | |
|
| 4.0021 | 0.2924 | 5000 | 3.8274 | |
|
| 3.9301 | 0.3509 | 6000 | 3.7534 | |
|
| 3.8822 | 0.4094 | 7000 | 3.6986 | |
|
| 3.8375 | 0.4679 | 8000 | 3.6557 | |
|
| 3.7918 | 0.5264 | 9000 | 3.6266 | |
|
| 3.7723 | 0.5849 | 10000 | 3.5994 | |
|
| 3.7549 | 0.6434 | 11000 | 3.5787 | |
|
| 3.7324 | 0.7019 | 12000 | 3.5612 | |
|
| 3.7249 | 0.7604 | 13000 | 3.5436 | |
|
| 3.6989 | 0.8188 | 14000 | 3.5323 | |
|
| 3.7003 | 0.8773 | 15000 | 3.5169 | |
|
| 3.6919 | 0.9358 | 16000 | 3.5055 | |
|
| 3.6717 | 0.9943 | 17000 | 3.4966 | |
|
| 3.6612 | 1.0528 | 18000 | 3.4868 | |
|
| 3.6467 | 1.1113 | 19000 | 3.4787 | |
|
| 3.6497 | 1.1698 | 20000 | 3.4707 | |
|
| 3.6193 | 1.2283 | 21000 | 3.4639 | |
|
| 3.6302 | 1.2868 | 22000 | 3.4572 | |
|
| 3.6225 | 1.3452 | 23000 | 3.4516 | |
|
| 3.635 | 1.4037 | 24000 | 3.4458 | |
|
| 3.6115 | 1.4622 | 25000 | 3.4416 | |
|
| 3.6162 | 1.5207 | 26000 | 3.4348 | |
|
| 3.6142 | 1.5792 | 27000 | 3.4329 | |
|
| 3.5956 | 1.6377 | 28000 | 3.4293 | |
|
| 3.5885 | 1.6962 | 29000 | 3.4226 | |
|
| 3.603 | 1.7547 | 30000 | 3.4195 | |
|
| 3.5947 | 1.8132 | 31000 | 3.4142 | |
|
| 3.588 | 1.8716 | 32000 | 3.4113 | |
|
| 3.5803 | 1.9301 | 33000 | 3.4065 | |
|
| 3.5891 | 1.9886 | 34000 | 3.4044 | |
|
| 3.5801 | 2.0471 | 35000 | 3.4032 | |
|
| 3.5739 | 2.1056 | 36000 | 3.3988 | |
|
| 3.5661 | 2.1641 | 37000 | 3.3981 | |
|
| 3.5657 | 2.2226 | 38000 | 3.3934 | |
|
| 3.5727 | 2.2811 | 39000 | 3.3907 | |
|
| 3.5617 | 2.3396 | 40000 | 3.3885 | |
|
| 3.5579 | 2.3980 | 41000 | 3.3855 | |
|
| 3.5553 | 2.4565 | 42000 | 3.3816 | |
|
| 3.5647 | 2.5150 | 43000 | 3.3803 | |
|
| 3.5531 | 2.5735 | 44000 | 3.3799 | |
|
| 3.5494 | 2.6320 | 45000 | 3.3777 | |
|
| 3.5525 | 2.6905 | 46000 | 3.3759 | |
|
| 3.5487 | 2.7490 | 47000 | 3.3725 | |
|
| 3.5551 | 2.8075 | 48000 | 3.3711 | |
|
| 3.5511 | 2.8660 | 49000 | 3.3681 | |
|
| 3.5463 | 2.9244 | 50000 | 3.3695 | |
|
| 3.5419 | 2.9829 | 51000 | 3.3660 | |
|
| 3.5414 | 3.0414 | 52000 | 3.3648 | |
|
| 3.5388 | 3.0999 | 53000 | 3.3605 | |
|
| 3.5333 | 3.1584 | 54000 | 3.3619 | |
|
| 3.525 | 3.2169 | 55000 | 3.3588 | |
|
| 3.5361 | 3.2754 | 56000 | 3.3572 | |
|
| 3.5302 | 3.3339 | 57000 | 3.3540 | |
|
| 3.5355 | 3.3924 | 58000 | 3.3553 | |
|
| 3.5391 | 3.4508 | 59000 | 3.3504 | |
|
| 3.531 | 3.5093 | 60000 | 3.3495 | |
|
| 3.5293 | 3.5678 | 61000 | 3.3483 | |
|
| 3.5269 | 3.6263 | 62000 | 3.3489 | |
|
| 3.5181 | 3.6848 | 63000 | 3.3494 | |
|
| 3.5205 | 3.7433 | 64000 | 3.3480 | |
|
| 3.5237 | 3.8018 | 65000 | 3.3440 | |
|
| 3.5316 | 3.8603 | 66000 | 3.3417 | |
|
| 3.5222 | 3.9188 | 67000 | 3.3433 | |
|
| 3.5174 | 3.9772 | 68000 | 3.3418 | |
|
| 3.518 | 4.0357 | 69000 | 3.3414 | |
|
| 3.5036 | 4.0942 | 70000 | 3.3365 | |
|
| 3.5101 | 4.1527 | 71000 | 3.3367 | |
|
| 3.5145 | 4.2112 | 72000 | 3.3361 | |
|
| 3.5053 | 4.2697 | 73000 | 3.3355 | |
|
| 3.5153 | 4.3282 | 74000 | 3.3334 | |
|
| 3.5003 | 4.3867 | 75000 | 3.3334 | |
|
| 3.5001 | 4.4452 | 76000 | 3.3326 | |
|
| 3.5114 | 4.5036 | 77000 | 3.3298 | |
|
| 3.5108 | 4.5621 | 78000 | 3.3292 | |
|
| 3.4985 | 4.6206 | 79000 | 3.3288 | |
|
| 3.497 | 4.6791 | 80000 | 3.3303 | |
|
| 3.4982 | 4.7376 | 81000 | 3.3291 | |
|
| 3.5068 | 4.7961 | 82000 | 3.3272 | |
|
| 3.4915 | 4.8546 | 83000 | 3.3244 | |
|
| 3.5036 | 4.9131 | 84000 | 3.3214 | |
|
| 3.5027 | 4.9716 | 85000 | 3.3214 | |
|
| 3.5078 | 5.0300 | 86000 | 3.3225 | |
|
| 3.5112 | 5.0885 | 87000 | 3.3243 | |
|
| 3.5049 | 5.1470 | 88000 | 3.3216 | |
|
| 3.4917 | 5.2055 | 89000 | 3.3192 | |
|
| 3.4802 | 5.2640 | 90000 | 3.3188 | |
|
| 3.4971 | 5.3225 | 91000 | 3.3201 | |
|
| 3.4941 | 5.3810 | 92000 | 3.3175 | |
|
| 3.4998 | 5.4395 | 93000 | 3.3179 | |
|
| 3.5011 | 5.4980 | 94000 | 3.3164 | |
|
| 3.4912 | 5.5564 | 95000 | 3.3180 | |
|
| 3.4961 | 5.6149 | 96000 | 3.3168 | |
|
| 3.4833 | 5.6734 | 97000 | 3.3148 | |
|
| 3.498 | 5.7319 | 98000 | 3.3133 | |
|
| 3.4892 | 5.7904 | 99000 | 3.3142 | |
|
| 3.4967 | 5.8489 | 100000 | 3.3142 | |
|
| 3.4847 | 5.9074 | 101000 | 3.3094 | |
|
| 3.4899 | 5.9659 | 102000 | 3.3102 | |
|
| 3.4774 | 6.0244 | 103000 | 3.3110 | |
|
| 3.4854 | 6.0828 | 104000 | 3.3106 | |
|
| 3.4873 | 6.1413 | 105000 | 3.3087 | |
|
| 3.4869 | 6.1998 | 106000 | 3.3102 | |
|
| 3.4833 | 6.2583 | 107000 | 3.3063 | |
|
| 3.491 | 6.3168 | 108000 | 3.3082 | |
|
| 3.4776 | 6.3753 | 109000 | 3.3075 | |
|
| 3.4924 | 6.4338 | 110000 | 3.3068 | |
|
| 3.4804 | 6.4923 | 111000 | 3.3050 | |
|
| 3.4805 | 6.5508 | 112000 | 3.3041 | |
|
| 3.4892 | 6.6093 | 113000 | 3.3031 | |
|
| 3.4775 | 6.6677 | 114000 | 3.3032 | |
|
| 3.481 | 6.7262 | 115000 | 3.3036 | |
|
| 3.4782 | 6.7847 | 116000 | 3.3025 | |
|
| 3.4804 | 6.8432 | 117000 | 3.3017 | |
|
| 3.4841 | 6.9017 | 118000 | 3.2999 | |
|
| 3.4784 | 6.9602 | 119000 | 3.3008 | |
|
| 3.4821 | 7.0187 | 120000 | 3.3001 | |
|
| 3.4671 | 7.0772 | 121000 | 3.3008 | |
|
| 3.485 | 7.1357 | 122000 | 3.2976 | |
|
| 3.4737 | 7.1941 | 123000 | 3.2985 | |
|
| 3.4793 | 7.2526 | 124000 | 3.2979 | |
|
| 3.4651 | 7.3111 | 125000 | 3.2968 | |
|
| 3.4847 | 7.3696 | 126000 | 3.2974 | |
|
| 3.474 | 7.4281 | 127000 | 3.2973 | |
|
| 3.4769 | 7.4866 | 128000 | 3.2955 | |
|
| 3.486 | 7.5451 | 129000 | 3.2953 | |
|
| 3.4684 | 7.6036 | 130000 | 3.2944 | |
|
| 3.4826 | 7.6621 | 131000 | 3.2949 | |
|
| 3.4685 | 7.7205 | 132000 | 3.2944 | |
|
| 3.4608 | 7.7790 | 133000 | 3.2931 | |
|
| 3.4655 | 7.8375 | 134000 | 3.2953 | |
|
| 3.4648 | 7.8960 | 135000 | 3.2928 | |
|
| 3.4632 | 7.9545 | 136000 | 3.2936 | |
|
| 3.4666 | 8.0130 | 137000 | 3.2902 | |
|
| 3.4663 | 8.0715 | 138000 | 3.2939 | |
|
| 3.4713 | 8.1300 | 139000 | 3.2904 | |
|
| 3.4654 | 8.1885 | 140000 | 3.2917 | |
|
| 3.466 | 8.2469 | 141000 | 3.2913 | |
|
| 3.4724 | 8.3054 | 142000 | 3.2889 | |
|
| 3.4695 | 8.3639 | 143000 | 3.2890 | |
|
| 3.4729 | 8.4224 | 144000 | 3.2876 | |
|
| 3.4551 | 8.4809 | 145000 | 3.2898 | |
|
| 3.4652 | 8.5394 | 146000 | 3.2885 | |
|
| 3.4689 | 8.5979 | 147000 | 3.2854 | |
|
| 3.4647 | 8.6564 | 148000 | 3.2857 | |
|
| 3.4653 | 8.7149 | 149000 | 3.2857 | |
|
| 3.4552 | 8.7733 | 150000 | 3.2861 | |
|
| 3.47 | 8.8318 | 151000 | 3.2868 | |
|
| 3.4627 | 8.8903 | 152000 | 3.2854 | --> |
|
|
|
**Training Loss** |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/63ccee86374057a338e03c1e/970nr9bptjHSMsjLDHfaY.png) |
|
|
|
**Validation Loss** |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/63ccee86374057a338e03c1e/GUbnak7yV02vd0NZhbeEO.png) |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.45.2 |
|
- Pytorch 2.5.0 |
|
- Datasets 3.0.1 |
|
- Tokenizers 0.20.1 |