---
license: openrail
tags:
- document-image-binarization
- image-segmentation
- generated_from_trainer
model-index:
- name: binarization-segformer-b3
  results: []
pipeline_tag: image-segmentation
---


# binarization-segformer-b3

This model is a fine-tuned version of [nvidia/segformer-b3](https://huggingface.co/nvidia/segformer-b3-finetuned-cityscapes-1024-1024),
trained on the same ensemble of 13 datasets used in the [SauvolaNet](https://arxiv.org/pdf/2105.05521.pdf) work. The datasets are publicly available
in the SauvolaNet GitHub [repository](https://github.com/Leedeng/SauvolaNet#datasets).

It achieves the following results on the evaluation set, measured with the DIBCO metrics:
- loss: 0.1017
- F-measure: 0.9776
- pseudo F-measure: 0.9531
- PSNR: 14.5040
- DRD: 5.3749

Here, PSNR is the peak signal-to-noise ratio and DRD is the distance reciprocal distortion.

For more information on the above DIBCO metrics, see the 2017 introductory [paper](https://ieeexplore.ieee.org/document/8270159).
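
As a rough illustration of two of these metrics, the sketch below computes the F-measure and PSNR for a pair of binary images with NumPy. It assumes boolean arrays in which `True` marks a text pixel, and it is not the evaluation code used to produce the numbers above. The pseudo F-measure and DRD are left out because they need additional weighting data.

```python
import numpy as np


def f_measure(pred, gt):
    """F-measure over text pixels; pred and gt are boolean arrays, True = text."""
    tp = np.logical_and(pred, gt).sum()    # text predicted as text
    fp = np.logical_and(pred, ~gt).sum()   # background predicted as text
    fn = np.logical_and(~pred, gt).sum()   # text predicted as background
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


def psnr(pred, gt):
    """PSNR in dB for binary images with pixel values in {0, 1} (peak value 1)."""
    # For binary images the squared error of a pixel is 1 on mismatch, else 0.
    mse = float(np.mean(pred != gt))
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(1.0 / mse))


# Toy example: one of the two text pixels is missed.
gt = np.array([[True, True, False, False]])
pred = np.array([[True, False, False, False]])
f_score = f_measure(pred, gt)
psnr_db = psnr(pred, gt)
```

Both functions report values on the same scale as the list above: the F-measure as a fraction between 0 and 1, and PSNR in decibels.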

**Warning:** This model only accepts images with a resolution of 640, owing to GPU compute constraints on the Colab free tier during training.
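
A minimal inference sketch that respects this constraint could look as follows. Several names are assumptions and do not come from this card: `MODEL_ID` is a hypothetical repository id, `TEXT_LABEL = 1` assumes class 1 is text, and resizing the whole page to 640x640 is only one possible way to meet the input size (the preprocessing used during training is not documented here).

```python
import numpy as np

MODEL_ID = "your-namespace/binarization-segformer-b3"  # hypothetical repo id
INPUT_SIZE = 640   # resolution stated in the warning above
TEXT_LABEL = 1     # assumption: class 1 = text, class 0 = background


def labels_to_bitonal(labels, text_label=TEXT_LABEL):
    """Map per-pixel class ids to a uint8 image: text black (0), background white (255)."""
    return np.where(labels == text_label, 0, 255).astype(np.uint8)


def binarize(image_path, model_id=MODEL_ID):
    """Run the segmentation model on one image and return a bitonal PIL image."""
    # Heavy dependencies are imported lazily so the helper above works on its own.
    import torch
    from PIL import Image
    from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

    image = Image.open(image_path).convert("RGB")
    # Assumption: the page is resized to 640x640 before being fed to the model.
    processor = SegformerImageProcessor(size={"height": INPUT_SIZE, "width": INPUT_SIZE})
    model = SegformerForSemanticSegmentation.from_pretrained(model_id).eval()

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # SegFormer predicts at 1/4 of the input size

    # Upsample the logits back to the original page size before taking the argmax.
    logits = torch.nn.functional.interpolate(
        logits, size=(image.height, image.width), mode="bilinear", align_corners=False
    )
    labels = logits.argmax(dim=1)[0].cpu().numpy()
    return Image.fromarray(labels_to_bitonal(labels))
```

Calling `binarize("page.png").save("page_bin.png")` (with hypothetical file names) would write the binarized page to disk. For large pages, tiling into 640x640 patches may preserve more detail than a global resize, but that is a design choice rather than something this card prescribes.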

## Model description

This model is part of ongoing research on treating document image binarization (DIBCO) as a pure semantic segmentation task.
This contrasts with the recent trend of adapting classic binarization algorithms with neural networks,
such as [DeepOtsu](https://arxiv.org/abs/1901.06081) and the aforementioned SauvolaNet,
which extend the classical Otsu's method and Sauvola thresholding algorithm, respectively.
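
For contrast with the segmentation formulation, here is a minimal NumPy sketch of the classical Otsu's method mentioned above. It picks a single global threshold that maximizes the between-class variance of the grayscale histogram. It is a textbook baseline shown for illustration only and is not part of this model or of DeepOtsu.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the Otsu threshold t for a uint8 grayscale image.

    Pixels with value > t fall into the brighter class (typically background).
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # probability of the darker class for each t
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean up to each t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom  # between-class variance
    sigma_b[denom == 0] = 0.0               # thresholds that leave one class empty
    return int(np.argmax(sigma_b))


# Toy example: dark "ink" pixels (50) on a bright background (200).
img = np.array([[50, 50, 200, 200]], dtype=np.uint8)
t = otsu_threshold(img)
background = img > t
```

A single global threshold like this tends to struggle with uneven illumination and stains, which is part of the motivation for learned approaches.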

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 10
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 50
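
The list above can be expressed as keyword arguments for `transformers.TrainingArguments`, as in the hedged sketch below. Mapping `train_batch_size` to `per_device_train_batch_size` assumes a single GPU, which is consistent with the total batch size of 16. The `cosine_lr` helper is a plain-Python illustration of linear warmup followed by cosine decay; its `total_steps` argument is not given in this card and must be supplied by the caller.

```python
import math

# Hyperparameters from the list above, keyed by TrainingArguments field names.
# Assumption: one GPU, so the per-device batch size equals train_batch_size.
training_kwargs = dict(
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=10,
    gradient_accumulation_steps=4,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=50,
)


def total_train_batch_size(kwargs, num_devices=1):
    """Effective batch size per optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def cosine_lr(step, total_steps, base_lr=5e-5, warmup_steps=50):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


effective_batch = total_train_batch_size(training_kwargs)  # 4 * 4 * 1
```

With `transformers` installed, `TrainingArguments(output_dir="out", **training_kwargs)` would build the corresponding configuration, where `output_dir` is a hypothetical path.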

### Training results

| training loss | epoch | step | validation loss | F-measure | pseudo F-measure | PSNR    | DRD      |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:-------:|:--------:|
| 0.6667        | 1.03  | 10   | 0.6683          | 0.7127   | 0.6831    | 4.8248  | 107.2894 |
| 0.6371        | 2.05  | 20   | 0.6390          | 0.8173   | 0.7360    | 6.1079  | 69.7770  |
| 0.587         | 3.08  | 30   | 0.5652          | 0.8934   | 0.8187    | 7.9143  | 40.5464  |
| 0.5288        | 4.1   | 40   | 0.4926          | 0.9240   | 0.8554    | 9.2247  | 27.4220  |
| 0.4601        | 5.13  | 50   | 0.4244          | 0.9490   | 0.8944    | 10.8830 | 16.8051  |
| 0.3864        | 6.15  | 60   | 0.3446          | 0.9638   | 0.9218    | 12.3460 | 10.6997  |
| 0.3331        | 7.18  | 70   | 0.3055          | 0.9693   | 0.9317    | 13.0531 | 8.5298   |
| 0.2821        | 8.21  | 80   | 0.2512          | 0.9736   | 0.9427    | 13.6929 | 6.8343   |
| 0.2392        | 9.23  | 90   | 0.2112          | 0.9744   | 0.9462    | 13.8825 | 6.4094   |
| 0.2126        | 10.26 | 100  | 0.1948          | 0.9743   | 0.9433    | 13.8424 | 6.5637   |
| 0.1889        | 11.28 | 110  | 0.1710          | 0.9749   | 0.9499    | 13.9784 | 6.1757   |
| 0.1662        | 12.31 | 120  | 0.1604          | 0.9753   | 0.9495    | 14.0450 | 6.0929   |
| 0.1506        | 13.33 | 130  | 0.1451          | 0.9750   | 0.9550    | 14.0028 | 6.1031   |
| 0.1359        | 14.36 | 140  | 0.1362          | 0.9759   | 0.9501    | 14.1383 | 5.9699   |
| 0.1321        | 15.38 | 150  | 0.1351          | 0.9761   | 0.9485    | 14.1907 | 5.9045   |
| 0.1283        | 16.41 | 160  | 0.1266          | 0.9758   | 0.9541    | 14.1515 | 5.8287   |
| 0.1198        | 17.44 | 170  | 0.1232          | 0.9763   | 0.9535    | 14.2411 | 5.7300   |
| 0.1151        | 18.46 | 180  | 0.1232          | 0.9765   | 0.9482    | 14.2788 | 5.8266   |
| 0.1146        | 19.49 | 190  | 0.1183          | 0.9764   | 0.9530    | 14.2363 | 5.7922   |
| 0.1027        | 20.51 | 200  | 0.1162          | 0.9765   | 0.9535    | 14.2867 | 5.6246   |
| 0.1051        | 21.54 | 210  | 0.1146          | 0.9766   | 0.9551    | 14.2963 | 5.6159   |
| 0.1095        | 22.56 | 220  | 0.1159          | 0.9767   | 0.9497    | 14.3153 | 5.8966   |
| 0.1076        | 23.59 | 230  | 0.1106          | 0.9768   | 0.9533    | 14.3267 | 5.6436   |
| 0.1006        | 24.62 | 240  | 0.1113          | 0.9769   | 0.9483    | 14.3683 | 5.6679   |
| 0.1077        | 25.64 | 250  | 0.1086          | 0.9770   | 0.9544    | 14.3843 | 5.4949   |
| 0.0966        | 26.67 | 260  | 0.1077          | 0.9770   | 0.9553    | 14.3660 | 5.5337   |
| 0.0958        | 27.69 | 270  | 0.1071          | 0.9773   | 0.9529    | 14.4405 | 5.4582   |
| 0.0984        | 28.72 | 280  | 0.1055          | 0.9772   | 0.9536    | 14.4405 | 5.4365   |
| 0.0936        | 29.74 | 290  | 0.1056          | 0.9774   | 0.9528    | 14.4634 | 5.4066   |
| 0.0958        | 30.77 | 300  | 0.1049          | 0.9772   | 0.9544    | 14.4138 | 5.4854   |
| 0.0896        | 31.79 | 310  | 0.1043          | 0.9774   | 0.9533    | 14.4593 | 5.4351   |
| 0.0973        | 32.82 | 320  | 0.1035          | 0.9774   | 0.9528    | 14.4633 | 5.4430   |
| 0.0943        | 33.85 | 330  | 0.1033          | 0.9775   | 0.9527    | 14.4809 | 5.4193   |
| 0.0956        | 34.87 | 340  | 0.1026          | 0.9774   | 0.9543    | 14.4576 | 5.4070   |
| 0.0936        | 35.9  | 350  | 0.1031          | 0.9775   | 0.9531    | 14.4827 | 5.4137   |
| 0.0937        | 36.92 | 360  | 0.1028          | 0.9773   | 0.9551    | 14.4420 | 5.4084   |
| 0.0952        | 37.95 | 370  | 0.1023          | 0.9775   | 0.9541    | 14.4809 | 5.3769   |
| 0.0952        | 38.97 | 380  | 0.1023          | 0.9776   | 0.9525    | 14.5086 | 5.3839   |
| 0.0948        | 40.0  | 390  | 0.1020          | 0.9774   | 0.9546    | 14.4667 | 5.3800   |
| 0.0931        | 41.03 | 400  | 0.1020          | 0.9776   | 0.9534    | 14.5043 | 5.3728   |
| 0.0906        | 42.05 | 410  | 0.1023          | 0.9774   | 0.9544    | 14.4771 | 5.3773   |
| 0.0974        | 43.08 | 420  | 0.1019          | 0.9776   | 0.9536    | 14.5024 | 5.3718   |
| 0.0908        | 44.1  | 430  | 0.1025          | 0.9776   | 0.9536    | 14.4995 | 5.3730   |
| 0.0935        | 45.13 | 440  | 0.1024          | 0.9775   | 0.9537    | 14.4978 | 5.3715   |
| 0.0927        | 46.15 | 450  | 0.1017          | 0.9776   | 0.9531    | 14.5040 | 5.3749   |


### Framework versions

- Transformers 4.27.4
- PyTorch 2.0.0+cu118
- Datasets 2.11.0
- Tokenizers 0.13.3