Addaci committed on
Commit
a0cc5e5
·
verified ·
1 Parent(s): 9418357

Update README.md

  1. README.md +20 -3
README.md CHANGED
@@ -1,3 +1,20 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - la
+ base_model:
+ - google/mt5-small
+ ---
+ Demonstration of fine-tuning mt5-small on 17th-century English (and Latin) legal depositions
+ Uses mt5-small, which is pretrained on the mC4 Common Crawl dataset covering 101 languages, including some Latin
+ mt5-small is the smallest of the five mT5 variants (small, base, large, XL, XXL)
+ Fine-tuned on text-to-text pairs of raw-HTR (handwritten text recognition output) and hand-corrected Ground Truth from 17th-century English High Court of Admiralty depositions
+
+ A series of fine-tuned mt5-small models will be created with ascending version numbers
+
+ Fine-tuning experiments will include:
+
+ * Using 1000 lines of raw-HTR paired with 1000 lines of hand-corrected Ground Truth
+ * Using 2000 lines of raw-HTR paired with 1000 lines of hand-corrected Ground Truth
+ * Using 1000 and 2000 lines of synthetic raw-HTR paired with 1000 lines of hand-corrected Ground Truth
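
The raw-HTR and Ground Truth pairing described in the README might be prepared along the following lines. This is a minimal sketch, not the author's pipeline: the function name `build_pairs`, the `correct HTR: ` task prefix, the dictionary keys, and the sample lines are all assumptions made for illustration. It handles only the one-to-one case (for example 1000 raw lines against 1000 corrected lines); the README does not say how 2000 raw lines are matched to 1000 Ground Truth lines, so the sketch rejects unequal lengths rather than guess.

```python
from typing import Dict, List

# Hypothetical task prefix; the README does not specify one.
TASK_PREFIX = "correct HTR: "


def build_pairs(
    raw_htr_lines: List[str],
    ground_truth_lines: List[str],
    prefix: str = TASK_PREFIX,
) -> List[Dict[str, str]]:
    """Pair each raw-HTR line with its hand-corrected line, one to one."""
    if len(raw_htr_lines) != len(ground_truth_lines):
        raise ValueError(
            f"expected equal line counts, got {len(raw_htr_lines)} raw "
            f"and {len(ground_truth_lines)} ground-truth lines"
        )

    pairs = []
    for raw, truth in zip(raw_htr_lines, ground_truth_lines):
        raw, truth = raw.strip(), truth.strip()
        # Skip pairs where either side is empty after trimming.
        if not raw or not truth:
            continue
        pairs.append({"input_text": prefix + raw, "target_text": truth})
    return pairs


if __name__ == "__main__":
    # Made-up sample lines, not taken from the deposition dataset.
    raw = ["the sayd shipp was att anchcr", "in the porte of Lcndon"]
    truth = ["the sayd shipp was att anchor", "in the porte of London"]
    for pair in build_pairs(raw, truth):
        print(pair)
```

Each resulting dictionary holds one source string and one target string, which is the shape a text-to-text model such as mt5-small expects before tokenization.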