---
library_name: transformers
license: mit
base_model: openai-community/gpt2
tags:
- generated_from_trainer
model-index:
- name: arabic-nano-gpt-v1
results: []
datasets:
- wikimedia/wikipedia
language:
- ar
---
# arabic-nano-gpt-v1
This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) trained on the Arabic subset of the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.
Repository on GitHub: [e-hossam96/arabic-nano-gpt](https://github.com/e-hossam96/arabic-nano-gpt.git)
The model achieves the following results on the held-out test set:
- Loss: 3.02885
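This corresponds to a test-set perplexity of roughly exp(3.02885) ≈ 20.7.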
## How to Use
```python
import torch
from transformers import pipeline

model_ckpt = "e-hossam96/arabic-nano-gpt-v1"

# Use a GPU if one is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Build a text-generation pipeline from the fine-tuned checkpoint.
lm = pipeline(task="text-generation", model=model_ckpt, device=device)

# Arabic prompt (roughly: "A jet engine is an engine that expels fluids (water or air)
# at very high speed to produce thrust based on Newton's third law of motion.
# This broad definition of jet engines also includes...")
prompt = """المحرك النفاث هو محرك ينفث الموائع (الماء أو الهواء) بسرعة فائقة \
لينتج قوة دافعة اعتمادا على مبدأ قانون نيوتن الثالث للحركة. \
هذا التعريف الواسع للمحركات النفاثة يتضمن أيضا"""

output = lm(prompt, max_new_tokens=128)
print(output[0]["generated_text"])
```
## Model description
- Embedding Size: 384
- Attention Heads: 4
- Attention Layers: 4
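For reference, below is a minimal sketch of a GPT-2 configuration with these dimensions; the vocabulary size and context length are assumptions, since they are not listed in this card.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Sketch of a GPT-2 config matching the sizes listed above.
# vocab_size and n_positions are assumptions, not values from this card.
config = GPT2Config(
    vocab_size=16_384,  # assumed tokenizer vocabulary size
    n_positions=512,    # assumed context length
    n_embd=384,         # embedding size
    n_head=4,           # attention heads
    n_layer=4,          # attention layers
)
model = GPT2LMHeadModel(config)
print(f"Parameter count: {model.num_parameters():,}")
```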
## Training and evaluation data
The Arabic Wikipedia corpus was split into training, validation, and test sets using a 90-5-5 ratio.
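A minimal sketch of such a split with the `datasets` library is shown below; the Wikipedia dump date and the split seed are assumptions, since neither is stated in this card.

```python
from datasets import load_dataset

# Load the Arabic Wikipedia subset; the "20231101.ar" dump date is an assumption.
raw = load_dataset("wikimedia/wikipedia", "20231101.ar", split="train")

# 90-5-5 split into train / validation / test; the seed is an assumption.
split = raw.train_test_split(test_size=0.10, seed=42)
heldout = split["test"].train_test_split(test_size=0.50, seed=42)

dataset = {
    "train": split["train"],         # 90%
    "validation": heldout["train"],  # 5%
    "test": heldout["test"],         # 5%
}
```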
## Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 24
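These values map onto `TrainingArguments` roughly as in the sketch below; the output directory is an assumption, and any logging or evaluation settings are omitted.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="arabic-nano-gpt-v1",  # assumed output directory
    learning_rate=2e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=16,   # 32 x 16 = 512 effective batch size
    num_train_epochs=24,
    lr_scheduler_type="linear",
    warmup_ratio=0.01,
    seed=42,
)
```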
## Training Loss

## Validation Loss

## Framework versions
- Transformers 4.45.2
- PyTorch 2.5.0
- Datasets 3.0.1
- Tokenizers 0.20.1