---
base_model: facebook/nougat-base
library_name: transformers
license: cc-by-4.0
tags:
- generated_from_trainer
model-index:
- name: dhivehi-nougat-base
  results: []
datasets:
- alakxender/dhivehi-image-text
language:
- dv
---

# DHIVEHI NOUGAT BASE (IMAGE-TO-TEXT)

This model is a fine-tuned version of [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) on the [alakxender/dhivehi-image-text](https://huggingface.co/datasets/alakxender/dhivehi-image-text) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0142

## Model description

Fine-tuned for Dhivehi image-to-text recognition on the `alakxender/dhivehi-image-text` dataset, using its `all` configuration.

## Usage

```python
from PIL import Image
import torch
from transformers import NougatProcessor, VisionEncoderDecoderModel

# Load the model and processor
processor = NougatProcessor.from_pretrained("alakxender/dhivehi-nougat-base")
model = VisionEncoderDecoderModel.from_pretrained(
    "alakxender/dhivehi-nougat-base",
    torch_dtype=torch.bfloat16,                 # Optional: load the weights in BF16 for lower memory use and faster inference on supported GPUs
    attn_implementation={                       # Optional: choose the attention kernel for each sub-model (remove this argument if flash-attn or a CUDA GPU is unavailable)
        "decoder": "flash_attention_2",         # FlashAttention-2 for the decoder (requires the flash-attn package and a supported CUDA GPU)
        "encoder": "eager"                      # Default ("eager") attention for the encoder
    }
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

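context_length = 128  # maximum number of new tokens to generate

def predict(img_path):
    # Ensure the image is in RGB format
    image = Image.open(img_path).convert("RGB")
    # Move the input to the model's device and dtype (BF16 when loaded as above)
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(device, dtype=model.dtype)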

    # generate prediction
    outputs = model.generate(
        pixel_values.to(device),
        min_length=1,
        max_new_tokens=context_length,
        repetition_penalty=1.5,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        eos_token_id=processor.tokenizer.eos_token_id,
    )

    page_sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    return page_sequence

print(predict("DV01-04_31.jpg"))  # path to a Dhivehi text image
```
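
Because `predict` returns a plain string, one quick sanity check is to measure how much of it is written in Thaana, the script used for Dhivehi. The `thaana_ratio` helper below is a hypothetical addition, not part of the model or the `transformers` API; it only counts characters in the Unicode Thaana block (U+0780 to U+07BF) and says nothing about recognition accuracy.

```python
# Hypothetical helper (not part of transformers): share of Thaana characters in a string.
THAANA_FIRST, THAANA_LAST = 0x0780, 0x07BF  # Unicode Thaana block


def thaana_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that fall in the Thaana block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    thaana = sum(THAANA_FIRST <= ord(c) <= THAANA_LAST for c in chars)
    return thaana / len(chars)


print(thaana_ratio("\u078b\u07a8\u0788\u07ac\u0780\u07a8"))  # 1.0 (the word "Dhivehi" in Thaana)
print(thaana_ratio("OCR \u078b\u07a8"))                        # 0.4
```

For example, `thaana_ratio(predict("DV01-04_31.jpg"))` well below 1.0 may indicate that the output contains non-Dhivehi text; digits and punctuation also lower the ratio, so any threshold is a judgment call.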

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- gradient_accumulation_steps: 6
- total_train_batch_size: 18
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 100
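
The schedule arithmetic implied by these settings can be sketched in plain Python. This is a rough estimate, not the original training script: it assumes a single device (consistent with 3 × 6 = 18), no warmup (none is listed), and infers the number of optimizer steps per epoch from the last row of the results table (step 8900 at epoch 0.5080). The `linear_lr` function is a hypothetical re-implementation of the shape of a linear warmup-then-decay schedule.

```python
# Rough sketch of the training-schedule arithmetic; not the original training script.
train_batch_size = 3
gradient_accumulation_steps = 6
num_devices = 1  # assumption: single device, consistent with total_train_batch_size = 18
learning_rate = 1e-4
num_epochs = 100

# Examples consumed per optimizer step
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Estimated optimizer steps per epoch from the last logged row (step 8900, epoch 0.5080);
# the epoch column is rounded, so this is approximate
steps_per_epoch = round(8900 / 0.5080)
total_steps = steps_per_epoch * num_epochs


def linear_lr(step: int, total: int, base_lr: float = learning_rate, warmup_steps: int = 0) -> float:
    """Linear warmup followed by linear decay to zero (hypothetical re-implementation)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup_steps))


print(total_train_batch_size)                 # 18
print(steps_per_epoch, total_steps)           # 17520 1752000
print(f"{linear_lr(8900, total_steps):.3e}")  # 9.949e-05
```

Under these assumptions, 100 epochs would span roughly 1.75 million optimizer steps, so at the last logged step (about epoch 0.51) the learning rate would still be about 99.5% of its initial value.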

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 6.4404        | 0.0057 | 100  | 1.0417          |
| 5.7761        | 0.0114 | 200  | 0.9055          |
| 5.1723        | 0.0171 | 300  | 0.8193          |
| 4.8315        | 0.0228 | 400  | 0.7661          |
| 4.4217        | 0.0285 | 500  | 0.7232          |
| 3.9861        | 0.0342 | 600  | 0.6724          |
| 3.7268        | 0.0400 | 700  | 0.5966          |
| 3.5393        | 0.0457 | 800  | 0.5337          |
| 2.8666        | 0.0514 | 900  | 0.4108          |
| 2.0269        | 0.0571 | 1000 | 0.2803          |
| 1.4121        | 0.0628 | 1100 | 0.1904          |
| 1.0161        | 0.0685 | 1200 | 0.1351          |
| 0.867         | 0.0742 | 1300 | 0.1130          |
| 0.7506        | 0.0799 | 1400 | 0.0950          |
| 0.5764        | 0.0856 | 1500 | 0.0801          |
| 0.5123        | 0.0913 | 1600 | 0.0716          |
| 0.558         | 0.0970 | 1700 | 0.0650          |
| 0.5242        | 0.1027 | 1800 | 0.0616          |
| 0.4229        | 0.1084 | 1900 | 0.0556          |
| 0.3721        | 0.1142 | 2000 | 0.0545          |
| 0.3388        | 0.1199 | 2100 | 0.0519          |
| 0.4042        | 0.1256 | 2200 | 0.0499          |
| 0.3593        | 0.1313 | 2300 | 0.0449          |
| 0.3837        | 0.1370 | 2400 | 0.0421          |
| 0.3291        | 0.1427 | 2500 | 0.0407          |
| 0.3092        | 0.1484 | 2600 | 0.0388          |
| 0.2762        | 0.1541 | 2700 | 0.0380          |
| 0.3073        | 0.1598 | 2800 | 0.0422          |
| 0.2577        | 0.1655 | 2900 | 0.0340          |
| 0.2596        | 0.1712 | 3000 | 0.0331          |
| 0.3397        | 0.1769 | 3100 | 0.0328          |
| 0.3019        | 0.1826 | 3200 | 0.0307          |
| 0.2522        | 0.1884 | 3300 | 0.0314          |
| 0.2546        | 0.1941 | 3400 | 0.0289          |
| 0.1972        | 0.1998 | 3500 | 0.0282          |
| 0.2231        | 0.2055 | 3600 | 0.0300          |
| 0.2342        | 0.2112 | 3700 | 0.0278          |
| 0.2152        | 0.2169 | 3800 | 0.0276          |
| 0.2059        | 0.2226 | 3900 | 0.0260          |
| 0.2165        | 0.2283 | 4000 | 0.0257          |
| 0.1919        | 0.2340 | 4100 | 0.0253          |
| 0.1608        | 0.2397 | 4200 | 0.0244          |
| 0.1673        | 0.2454 | 4300 | 0.0242          |
| 0.2004        | 0.2511 | 4400 | 0.0248          |
| 0.2277        | 0.2568 | 4500 | 0.0230          |
| 0.1831        | 0.2625 | 4600 | 0.0228          |
| 0.1905        | 0.2683 | 4700 | 0.0221          |
| 0.0996        | 0.2740 | 4800 | 0.0215          |
| 0.1596        | 0.2797 | 4900 | 0.0213          |
| 0.168         | 0.2854 | 5000 | 0.0208          |
| 0.2119        | 0.2911 | 5100 | 0.0215          |
| 0.1436        | 0.2968 | 5200 | 0.0202          |
| 0.1656        | 0.3025 | 5300 | 0.0202          |
| 0.1183        | 0.3082 | 5400 | 0.0194          |
| 0.1397        | 0.3139 | 5500 | 0.0202          |
| 0.1248        | 0.3196 | 5600 | 0.0191          |
| 0.1202        | 0.3253 | 5700 | 0.0191          |
| 0.1175        | 0.3310 | 5800 | 0.0207          |
| 0.1427        | 0.3367 | 5900 | 0.0183          |
| 0.1487        | 0.3425 | 6000 | 0.0178          |
| 0.1597        | 0.3482 | 6100 | 0.0174          |
| 0.1363        | 0.3539 | 6200 | 0.0172          |
| 0.1266        | 0.3596 | 6300 | 0.0171          |
| 0.1288        | 0.3653 | 6400 | 0.0170          |
| 0.1202        | 0.3710 | 6500 | 0.0170          |
| 0.1174        | 0.3767 | 6600 | 0.0164          |
| 0.1334        | 0.3824 | 6700 | 0.0168          |
| 0.1627        | 0.3881 | 6800 | 0.0164          |
| 0.0982        | 0.3938 | 6900 | 0.0161          |
| 0.1038        | 0.3995 | 7000 | 0.0160          |
| 0.1523        | 0.4052 | 7100 | 0.0160          |
| 0.1337        | 0.4109 | 7200 | 0.0157          |
| 0.2063        | 0.4167 | 7300 | 0.0153          |
| 0.1476        | 0.4224 | 7400 | 0.0156          |
| 0.0838        | 0.4281 | 7500 | 0.0150          |
| 0.082         | 0.4338 | 7600 | 0.0158          |
| 0.1269        | 0.4395 | 7700 | 0.0159          |
| 0.1168        | 0.4452 | 7800 | 0.0147          |
| 0.1024        | 0.4509 | 7900 | 0.0147          |
| 0.1138        | 0.4566 | 8000 | 0.0145          |
| 0.1188        | 0.4623 | 8100 | 0.0146          |
| 0.0881        | 0.4680 | 8200 | 0.0142          |
| 0.0752        | 0.4737 | 8300 | 0.0138          |
| 0.1165        | 0.4794 | 8400 | 0.0141          |
| 0.1017        | 0.4851 | 8500 | 0.0137          |
| 0.0971        | 0.4909 | 8600 | 0.0135          |
| 0.135         | 0.4966 | 8700 | 0.0136          |
| 0.0732        | 0.5023 | 8800 | 0.0137          |
| 0.1217        | 0.5080 | 8900 | 0.0142          |
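
The evaluation loss reported at the top of this card (0.0142) matches the last logged row; the lowest validation loss in the table is 0.0135, at step 8600. The `best_checkpoint` function below is a hypothetical helper, assuming the four-column layout used here (Training Loss | Epoch | Step | Validation Loss), that picks out such a row from a log in this format.

```python
def best_checkpoint(table: str) -> tuple[int, float]:
    """Return (step, validation_loss) for the row with the lowest validation loss
    in a Markdown table with columns: Training Loss | Epoch | Step | Validation Loss."""
    best = None
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue  # not a table row
        try:
            step, val_loss = int(cells[2]), float(cells[3])
        except ValueError:
            continue  # header or alignment row
        if best is None or val_loss < best[1]:
            best = (step, val_loss)
    if best is None:
        raise ValueError("no data rows found")
    return best


# Last rows of the results table above
rows = """
| 0.1017        | 0.4851 | 8500 | 0.0137          |
| 0.0971        | 0.4909 | 8600 | 0.0135          |
| 0.135         | 0.4966 | 8700 | 0.0136          |
| 0.0732        | 0.5023 | 8800 | 0.0137          |
| 0.1217        | 0.5080 | 8900 | 0.0142          |
"""
print(best_checkpoint(rows))  # (8600, 0.0135)
```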


### Framework versions

- Transformers 4.47.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0