Update README.md
README.md
---
pipeline_tag: translation
language:
- multilingual
- en
- am
- ar
- so
- sw
- pt
- af
- fr
- zu
- mg
- ha
- sn
- arz
- ny
- ig
- xh
- yo
- st
- rw
- tn
- ti
- ts
- om
- run
- nso
- ee
- ln
- tw
- pcm
- gaa
- loz
- lg
- guw
- bem
- efi
- lue
- lua
- toi
- ve
- tum
- tll
- iso
- kqn
- zne
- umb
- mos
- tiv
- lu
- ff
- kwy
- bci
- rnd
- luo
- wal
- ss
- lun
- wo
- nyk
- kj
- ki
- fon
- bm
- cjk
- din
- dyu
- kab
- kam
- kbp
- kr
- kmb
- kg
- nus
- sg
- taq
- tzm
- nqo
license: apache-2.0
---

This is an improved version of the [AfriCOMET-QE-STL (quality estimation single task)](https://github.com/masakhane-io/africomet) evaluation model: it receives a source sentence and a translation, and returns a score that reflects the quality of the translation relative to the source.

Unlike the original AfriCOMET-QE-STL, this QE model is built on an improved African-enhanced encoder, [afro-xlmr-large-76L](https://huggingface.co/Davlan/afro-xlmr-large-76L), which leads to better performance on quality estimation for African-language machine translation, as verified in the WMT 2024 Metrics Shared Task.

# Paper

[AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages](https://arxiv.org/abs/2311.09828) (Wang et al., arXiv 2023)

# License

Apache-2.0

# Usage (AfriCOMET)

Using this model requires unbabel-comet to be installed:

```bash
pip install --upgrade pip # ensures that pip is current
pip install unbabel-comet
```

Then you can use it through the comet CLI:

```bash
comet-score -s {source-inputs}.txt -t {translation-outputs}.txt --model masakhane/africomet-qe-stl
```
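Here, `{source-inputs}.txt` and `{translation-outputs}.txt` are placeholders for plain-text files with one segment per line; line *i* of the source file is scored against line *i* of the translation file.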

Or using Python:

```python
from comet import download_model, load_from_checkpoint

# Download the checkpoint from the Hugging Face Hub and load it
model_path = download_model("masakhane/africomet-qe-stl")
model = load_from_checkpoint(model_path)

# Each sample pairs a source sentence ("src") with its machine translation ("mt")
data = [
    {
        "src": "Nadal sàkọọ́lẹ̀ ìforígbárí o ní àmì méje sóódo pẹ̀lú ilẹ̀ Canada.",
        "mt": "Nadal's head to head record against the Canadian is 7–2.",
    },
    {
        "src": "Laipe yi o padanu si Raoniki ni ere Sisi Brisbeni.",
        "mt": "He recently lost against Raonic in the Brisbane Open.",
    },
]
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)
```
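The returned object holds per-segment scores and a corpus-level average. A minimal sketch of unpacking it, assuming a recent unbabel-comet release where the prediction exposes `scores` and `system_score`:

```python
# model_output.scores holds one quality score per input segment;
# model_output.system_score is their average (assumes comet >= 2.0).
for sample, score in zip(data, model_output.scores):
    print(f"{score:.4f}\t{sample['mt']}")
print("system score:", model_output.system_score)
```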

# Intended uses

Our model is intended to be used for **MT quality estimation**.

Given a source sentence and a translation, it outputs a single score between 0 and 1, where 1 represents a perfect translation.
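For example, the segment-level scores can be used to filter out low-quality translations. A minimal sketch reusing `data` and `model_output` from the usage example above; the 0.75 cutoff is an arbitrary illustrative value, not a threshold recommended by the paper:

```python
# Keep only translations the QE model scores above a chosen threshold.
# THRESHOLD = 0.75 is a hypothetical value; tune it for your own data.
THRESHOLD = 0.75
kept = [s for s, score in zip(data, model_output.scores) if score >= THRESHOLD]
print(f"kept {len(kept)} of {len(data)} translations")
```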