---
library_name: transformers
license: mit
base_model: openai-community/gpt2
tags:
  - generated_from_trainer
model-index:
  - name: arabic-nano-gpt
    results: []
datasets:
  - wikimedia/wikipedia
language:
  - ar
---

# arabic-nano-gpt

This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) trained on the Arabic portion of the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.

Repository on GitHub: [e-hossam96/arabic-nano-gpt](https://github.com/e-hossam96/arabic-nano-gpt.git)

The model achieves the following results on the held-out test set:

- Loss: 3.28796
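
Assuming this figure is the mean token-level cross-entropy, it corresponds to a held-out perplexity of roughly exp(3.288) ≈ 26.8.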

## How to Use

```python
import torch
from transformers import pipeline

model_ckpt = "e-hossam96/arabic-nano-gpt-v0"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# load the checkpoint into a text-generation pipeline on the selected device
lm = pipeline(task="text-generation", model=model_ckpt, device=device)

# Arabic prompt; in English it reads roughly: "A jet engine is an engine that
# expels fluid (water or air) at high speed to produce thrust based on
# Newton's third law of motion. This broad definition of jet engines also includes"
prompt = """المحرك النفاث هو محرك ينفث الموائع (الماء أو الهواء) بسرعة فائقة \
لينتج قوة دافعة اعتمادا على مبدأ قانون نيوتن الثالث للحركة. \
هذا التعريف الواسع للمحركات النفاثة يتضمن أيضا"""

output = lm(prompt, max_new_tokens=128)

print(output[0]["generated_text"])
```

## Model description

- Embedding Size: 256
- Attention Heads: 4
- Attention Layers: 4
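
As a rough sketch, and assuming all other GPT-2 settings keep their `transformers` defaults, the architecture above can be reconstructed as follows. The vocabulary size is taken from the released tokenizer, which is an assumption rather than something stated in this card:

```python
from transformers import AutoTokenizer, GPT2Config, GPT2LMHeadModel

# Hypothetical reconstruction of the architecture described above.
tokenizer = AutoTokenizer.from_pretrained("e-hossam96/arabic-nano-gpt-v0")

config = GPT2Config(
    vocab_size=len(tokenizer),  # assumed to match the released tokenizer
    n_embd=256,                 # embedding size
    n_head=4,                   # attention heads
    n_layer=4,                  # attention layers
)
model = GPT2LMHeadModel(config)
print(f"Parameters: {model.num_parameters():,}")
```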

## Training and evaluation data

The entire Arabic Wikipedia dataset was split into training, validation, and test sets using a 90-5-5 ratio.
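
A minimal sketch of such a split with the `datasets` library is shown below; the dataset config name (`20231101.ar`) and the seed are assumptions and may not match the exact procedure used for this model:

```python
from datasets import load_dataset

# Load the Arabic Wikipedia dump (config name is an assumption).
raw = load_dataset("wikimedia/wikipedia", "20231101.ar", split="train")

# 90% train, then split the remaining 10% evenly into validation and test.
split = raw.train_test_split(test_size=0.10, seed=42)
held_out = split["test"].train_test_split(test_size=0.50, seed=42)

train_data = split["train"]     # ~90%
valid_data = held_out["train"]  # ~5%
test_data = held_out["test"]    # ~5%
```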

## Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.001
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 24
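
These values map onto `TrainingArguments` roughly as in the sketch below; `output_dir` is a placeholder, and the Adam betas and epsilon listed above are the `transformers` defaults, so they are not set explicitly:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="arabic-nano-gpt",   # placeholder
    learning_rate=1e-3,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=4,  # 64 * 4 = 256 effective batch size
    num_train_epochs=24,
    lr_scheduler_type="linear",
    warmup_ratio=0.01,
    seed=42,
)
```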

## Training Loss

![Training Loss](assets/arabic-nano-gpt-v0-train-loss.png)

## Validation Loss

![Validation Loss](assets/arabic-nano-gpt-v0-eval-loss.png)

## Framework versions

- Transformers 4.45.2
- Pytorch 2.5.0
- Datasets 3.0.1
- Tokenizers 0.20.1