# ModernBERT-ar-base-tiny
This model was trained on a sample of the FineWeb2 Arabic (Ar) dataset; the tokenizer was trained on the same data.
See the sample code (usage and training) and the initial post.

Updated: Jan. 12, 2025.
## Model description

An experimental ModernBERT model for Arabic, trained with a masked language modeling (MLM) objective.
## Intended uses & limitations

Intended for educational and exploratory use only: the model was trained on limited data and is not fully trained.
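Since the card names the objective only as "MLM", here is a minimal sketch of standard BERT-style masked-language-modeling corruption. The 15% mask rate and the 80/10/10 replacement split are illustrative assumptions (classic BERT defaults), not details confirmed by this card, and `[MASK]` stands in for whatever mask token the trained tokenizer actually defines.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder; the real tokenizer defines its own mask token

def mlm_mask(tokens, mask_prob=0.15, seed=42):
    """BERT-style MLM corruption (assumed recipe, not confirmed by the card):
    ~mask_prob of positions are selected for prediction; of those, 80% are
    replaced with the mask token, 10% with a random token, and 10% are left
    unchanged. The model is trained to recover the original token at each
    selected position."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)  # None = position not scored by the loss
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # remember the original token as the target
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK_TOKEN
            elif r < 0.9:
                corrupted[i] = rng.choice(tokens)  # random replacement
            # else: keep the original token (but still predict it)
    return corrupted, labels
```

Only the positions with a non-`None` label contribute to the training loss; everything else is ignored.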
## Training and evaluation data

Evaluation was performed on a 5% holdout of the data; training ran on 2 GPUs.
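The 5% evaluation holdout can be sketched as a seeded, reproducible split. This is a plain-Python illustration of the idea, not the card's actual preprocessing code; the seed of 42 mirrors the training seed below but is an assumption for the split itself.

```python
import random

def train_eval_split(examples, eval_frac=0.05, seed=42):
    """Hold out eval_frac of the corpus for evaluation (the card's 5%),
    shuffling indices with a fixed seed so the split is reproducible."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    n_eval = max(1, int(len(examples) * eval_frac))
    eval_set = [examples[i] for i in idx[:n_eval]]
    train_set = [examples[i] for i in idx[n_eval:]]
    return train_set, eval_set
```

With the `datasets` library the equivalent one-liner is `dataset.train_test_split(test_size=0.05, seed=42)`.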
## Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 12
- eval_batch_size: 12
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 24
- total_eval_batch_size: 24
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 50000
- mixed_precision_training: Native AMP
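The linear scheduler with 500 warmup steps over 50,000 training steps listed above corresponds to the following rate curve (a sketch of the standard `transformers` linear schedule: ramp from 0 to the base rate over warmup, then decay linearly to 0 at the final step):

```python
def linear_schedule_lr(step, base_lr=5e-5, warmup_steps=500, total_steps=50000):
    """Learning rate at a given step under linear warmup + linear decay,
    using the hyperparameters from this card."""
    if step < warmup_steps:
        # warmup: 0 -> base_lr over the first warmup_steps
        return base_lr * step / warmup_steps
    # decay: base_lr -> 0 between warmup_steps and total_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)
```

So the rate peaks at 5e-05 exactly at step 500 and reaches 0 at step 50,000.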
## Framework versions
- Transformers 4.49.0.dev0
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0