Update README.md
README.md
CHANGED
@@ -1,3 +1,20 @@
----
-license: apache-2.0
-
+---
+license: apache-2.0
+language:
+- en
+- la
+base_model:
+- google/mt5-small
+---
+Demonstration of fine-tuning mt5-small for C17th English (and Latin) legal depositions
+Uses mt5-small, which is trained on the mC4 Common Crawl dataset covering 101 languages, including some Latin
+mt5-small is the smallest of the five mT5 variants (small; base; large; XL; XXL)
+Fine-tuned with text-to-text pairs of raw HTR from C17th English High Court of Admiralty depositions
+
+A series of fine-tuned mt5-small models will be created with ascending version numbers
+
+Fine-tuning experiments will include:
+
+* Using 1000 lines of raw HTR paired with 1000 lines of hand-corrected Ground Truth
+* Using 2000 lines of raw HTR paired with 1000 lines of hand-corrected Ground Truth
+* Using 1000 and 2000 lines of synthetic raw HTR paired with 1000 lines of hand-corrected Ground Truth
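
The README describes the training data as text-to-text pairs of raw HTR lines and hand-corrected Ground Truth lines. A minimal sketch of how such pairs might be assembled is given below. It is not the author's pipeline: the function names, the `"correct: "` task prefix, and the `"input"`/`"target"` field names are all hypothetical, and the sample lines are made up for illustration. The sketch assumes strict one-to-one line alignment; the README does not say how uneven counts (such as 2000 raw lines against 1000 Ground Truth lines) would be paired, so mismatched inputs are simply rejected here.

```python
import json


def pair_lines(raw_htr_lines, ground_truth_lines, prefix="correct: "):
    """Pair each raw-HTR line with its hand-corrected counterpart.

    The prefix and the "input"/"target" field names are illustrative
    assumptions, not taken from the README.
    """
    if len(raw_htr_lines) != len(ground_truth_lines):
        raise ValueError("raw HTR and Ground Truth must have the same number of lines")
    pairs = []
    for raw, truth in zip(raw_htr_lines, ground_truth_lines):
        raw, truth = raw.strip(), truth.strip()
        if raw and truth:  # skip pairs where either side is empty
            pairs.append({"input": prefix + raw, "target": truth})
    return pairs


def to_jsonl(pairs):
    """Serialise the pairs as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)


# Made-up example lines, not real deposition text.
raw = ["the sayd shipp was laden att Lixboa", "hee knoweth nott the contentes"]
truth = ["the sayd shipp was laden att Lisboa", "hee knoweth not the contents"]

pairs = pair_lines(raw, truth)
print(to_jsonl(pairs))
```

Each resulting record holds one source string and one target string, which is the general shape a sequence-to-sequence model such as mt5-small is fine-tuned on; the tokenisation and training loop are deliberately left out.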