---
license: mit
language:
- la
- fr
- es
datasets:
- CATMuS/medieval
tags:
- trocr
- image-to-text
widget:
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/caroline-1.png
  example_title: Caroline 1
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/caroline-2.png
  example_title: Caroline 2
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/print-1.png
  example_title: Print 1
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/print-2.png
  example_title: Print 2
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/print-3.png
  example_title: Print 3
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/textualis-1.png
  example_title: Textualis 1
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/textualis-2.png
  example_title: Textualis 2
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/semitextualis-1.png
  example_title: Semitextualis 1
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/semitextualis-2.png
  example_title: Semitextualis 2
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/hybrida-1.png
  example_title: Hybrida 1
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/hybrida-2.png
  example_title: Hybrida 2
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/humanistica-praegothica-semihybrida-1.png
  example_title: Humanistica Praegothica Semihybrida 1
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/humanistica-praegothica-semihybrida-2.png
  example_title: Humanistica Praegothica Semihybrida 2
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/cursiva-1.png
  example_title: Cursiva 1
- src: >-
    https://huggingface.co/medieval-data/trocr-medieval-base/resolve/main/images/cursiva-2.png
  example_title: Cursiva 2

model-index:
- name: trocr-medieval-base
  results:
  - task:
      name: HTR
      type: image-to-text
    metrics:
    - name: CER
      type: CER
      value: 0.035
---

![logo](logo-base.png)

# About

Character error rate (CER): 0.035

This is a TrOCR model for handwritten text recognition (HTR) of the medieval scripts in the CATMuS dataset. It was fine-tuned from [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten).

The dataset used for training was [CATMuS](https://huggingface.co/datasets/CATMuS/medieval).

The model has not been formally tested. Preliminary examination indicates that further fine-tuning is needed.

Fine-tuning was done with `finetune.py`, found in this repository.
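
For readers unfamiliar with the metric, CER is commonly defined as the character-level edit distance between the model output and the reference transcription, divided by the length of the reference. The sketch below illustrates that common definition in plain Python. It is not the evaluation code used to obtain the figure above, and the helper names `levenshtein` and `cer` as well as the example strings are hypothetical.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions
    needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # drop a reference character
                curr[j - 1] + 1,     # drop a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in a seven-character line.
example_cer = cer("dominus", "domimus")
print(f"CER: {example_cer:.3f}")
```

Under this definition, a CER of 0.035 corresponds to roughly 3 to 4 character errors per 100 reference characters.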

# Usage

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests

# load a sample text-line image (hosted in the trocr-medieval-print repository)
url = 'https://huggingface.co/medieval-data/trocr-medieval-print/resolve/main/images/print-1.png'
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# load the processor (image preprocessing + tokenizer) and the fine-tuned model
processor = TrOCRProcessor.from_pretrained('medieval-data/trocr-medieval-base')
model = VisionEncoderDecoderModel.from_pretrained('medieval-data/trocr-medieval-base')
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# generate token ids and decode them into the transcribed text
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```

# BibTeX entry and citation info

## TrOCR Paper

```tex
@misc{li2021trocr,
      title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models}, 
      author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
      year={2021},
      eprint={2109.10282},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## CATMuS Paper

```tex
@unpublished{clerice:hal-04453952,
  TITLE = {{CATMuS Medieval: A multilingual large-scale cross-century dataset in Latin script for handwritten text recognition and beyond}},
  AUTHOR = {Cl{\'e}rice, Thibault and Pinche, Ariane and Vlachou-Efstathiou, Malamatenia and Chagu{\'e}, Alix and Camps, Jean-Baptiste and Gille-Levenson, Matthias and Brisville-Fertin, Olivier and Fischer, Franz and Gervers, Michaels and Boutreux, Agn{\`e}s and Manton, Avery and Gabay, Simon and O'Connor, Patricia and Haverals, Wouter and Kestemont, Mike and Vandyck, Caroline and Kiessling, Benjamin},
  URL = {https://inria.hal.science/hal-04453952},
  NOTE = {working paper or preprint},
  YEAR = {2024},
  MONTH = Feb,
  KEYWORDS = {Historical sources ; medieval manuscripts ; Latin scripts ; benchmarking dataset ; multilingual ; handwritten text recognition},
  PDF = {https://inria.hal.science/hal-04453952/file/ICDAR24___CATMUS_Medieval-1.pdf},
  HAL_ID = {hal-04453952},
  HAL_VERSION = {v1},
}
```