jordimas committed
Commit cb17685
1 Parent(s): 5b9a3fd
Files changed (1)
  1. README.md +57 -3
README.md CHANGED
@@ -1,3 +1,57 @@
- ---
- license: mit
- ---
+ ---
+ language:
+ - ca
+ - es
+
+ tags:
+ - translation
+
+ library_name: opennmt
+ license: mit
+ metrics:
+ - bleu
+
+ inference: false
+ ---
+
+ ### Introduction
+
+ Catalan-Spanish translation model for OpenNMT. The models are quantized for low latency.
+
+ ### Usage
+
+ Install the necessary dependencies:
+
+
+ ```bash
+ pip3 install ctranslate2 pyonmttok
+ ```
+
+
+ Simple tokenization and translation using Python:
+
+
+ ```python
+ import os
+
+ import ctranslate2
+ import pyonmttok
+ from huggingface_hub import snapshot_download
+
+ # Download the model files from the Hugging Face Hub.
+ model_dir = snapshot_download(repo_id="softcatala/translate-cat-spa", revision="main")
+
+ # Tokenize with the SentencePiece model shipped in the repository.
+ tokenizer = pyonmttok.Tokenizer(mode="none", sp_model_path=os.path.join(model_dir, "sp_m.model"))
+ tokens, _ = tokenizer.tokenize("Hola amics")
+
+ # Translate the token list and print the best hypothesis as plain text.
+ translator = ctranslate2.Translator(model_dir)
+ translated = translator.translate_batch([tokens])
+ print(tokenizer.detokenize(translated[0][0]["tokens"]))
+ ```
+
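The snippet above translates a single sentence. As a minimal sketch of how the same three steps (tokenize, translate, detokenize) could be applied to several sentences in one batch, the helper below takes the tokenizer and translator as arguments. The name `translate_sentences` is hypothetical and not part of the README; it assumes the tokenizer returns a `(tokens, features)` tuple and that each result can be indexed as in the README snippet.

```python
def translate_sentences(sentences, tokenizer, translator):
    """Translate a list of plain-text sentences in a single batch.

    Hypothetical helper: `tokenizer` and `translator` are expected to behave
    like the pyonmttok.Tokenizer and ctranslate2.Translator objects created
    in the README snippet.
    """
    # tokenize() returns (tokens, features); only the tokens are needed here.
    batch = [tokenizer.tokenize(sentence)[0] for sentence in sentences]
    results = translator.translate_batch(batch)
    # Keep the first (best) hypothesis of each result, as the README does.
    return [tokenizer.detokenize(result[0]["tokens"]) for result in results]
```

With the `tokenizer` and `translator` objects from the README snippet, a call such as `translate_sentences(["Hola amics", "Bon dia"], tokenizer, translator)` should return one translated string per input sentence. Passing the objects in, rather than creating them inside the function, means the model is loaded only once.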
+ ## Benchmarks
+
+ | Test set                           | BLEU |
+ |------------------------------------|------|
+ | Test dataset (from train/dev/test) | 87.5 |
+ | Flores200 dataset                  | 24.2 |
+
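For readers unfamiliar with the metric, here is a rough sketch of what a BLEU score measures: clipped n-gram precision (orders 1 to 4) combined with a brevity penalty. This is a simplified, whitespace-tokenized version written for illustration. The README does not say which tool produced the numbers above, so scores from this sketch should not be expected to match them. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (0 to 100), one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipped matches: an n-gram counts at most as often as it
            # appears in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

Calling `corpus_bleu(system_outputs, reference_translations)` with two equal-length lists of strings gives a single corpus-level score. For reportable results, an established implementation with standardized tokenization is the safer choice.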
+ ## Additional information
+ * https://github.com/Softcatala/nmt-models
+ * https://github.com/Softcatala/parallel-catalan-corpus