translate-to-any-language

MihaiHuggingFace committed on Oct 3, 2024

Commit

6d5dae6

1 Parent(s): 0dcd084

Update app.py

Files changed (1)

app.py +24 -25

@@ -9,9 +9,23 @@ tokenizer = AutoTokenizer.from_pretrained("facebook/m2m100_418M")
 LANG_CODES = {
     "English":"en",
-    "toki pona":"tl"
 }
 def translate(text, src_lang, tgt_lang, candidates:int):
     """
     Translate the text from source lang to target lang
@@ -43,37 +57,22 @@ def translate(text, src_lang, tgt_lang, candidates:int):
 with gr.Blocks() as app:
     markdown="""
-    # An English / toki pona Neural Machine Translation App!
-    ### toki a! 💬
-    This is an english to toki pona / toki pona to english neural machine translation app.
-    Input your text to translate, a source language and target language, and desired number of return sequences!
-    ### Grammar Regularization
-    An interesting quirk of training a many-to-many translation model is that pseudo-grammar correction
-    can be achieved by translating *from* **language A** *to* **language A**
-    Remember, this can ***approximate*** grammaticality, but it isn't always the best.
-    For example, "mi li toki e toki pona" (Source Language: toki pona & Target Language: toki pona) will result in:
-    - ['mi toki e toki pona.', 'mi toki pona.', 'mi toki e toki pona']
-    - (Thus, the ungrammatical "li" is dropped)
     ### Model and Data
-    This app utilizes a fine-tuned version of Facebook/Meta AI's M2M100 418M param model.
-    By leveraging the pretrained weights of the massively multilingual M2M100 model,
-    we can jumpstart our transfer learning to accomplish machine translation for toki pona!
-    The model was fine-tuned on the English/toki pona bitexts found at [https://tatoeba.org/](https://tatoeba.org/)
-    ### This app is a work in progress and obviously not all translations will be perfect.
-    In addition to parameter quantity and the hyper-parameters used while training,
-    the *quality of data* found on Tatoeba directly influences the perfomance of projects like this!
-    If you wish to contribute, please add high quality and diverse translations to Tatoeba!
     """
     with gr.Row():
@@ -82,7 +81,7 @@ with gr.Blocks() as app:
             input_text = gr.components.Textbox(label="Input Text", value="Toad (Pit Crew) is a fun character you can try in Mario Kart Tour! Wow!")
             source_lang = gr.components.Dropdown(label="Source Language", value="English", choices=list(LANG_CODES.keys()))
             target_lang = gr.components.Dropdown(label="Target Language", value="toki pona", choices=list(LANG_CODES.keys()))
-            return_seqs = gr.Slider(label="Number of return sequences", value=3, minimum=1, maximum=24, step=1)
             inputs=[input_text, source_lang, target_lang, return_seqs]
             outputs = gr.Textbox()

 LANG_CODES = {
     "English":"en",
+    "Toki Pona":"tl"
+    "Romanian":"ro"
 }
+if tgt == tl and src == en:
+    model = AutoModelForSeq2SeqLM.from_pretrained("Jayyydyyy/m2m100_418m_tokipona").to(device)
+else if tgt == en and src == tl:
+    model = AutoModelForSeq2SeqLM.from_pretrained("Jayyydyyy/m2m100_418m_tokipona").to(device)
+else if tgt == en and src == en:
+    model = AutoModelForSeq2SeqLM.from_pretrained("Jayyydyyy/m2m100_418m_tokipona").to(device)
+else if tgt == tl and src == tl:
+    model = AutoModelForSeq2SeqLM.from_pretrained("Jayyydyyy/m2m100_418m_tokipona").to(device)
+else if tgt == en and src == ro:
+    model = AutoModelForSeq2SeqLM.from_pretrained("facebook/m2m100_418M").to(device)
+else if tgt == ro and src == en:
+    model = AutoModelForSeq2SeqLM.from_pretrained("facebook/m2m100_418M").to(device)
 def translate(text, src_lang, tgt_lang, candidates:int):
     """
     Translate the text from source lang to target lang
 with gr.Blocks() as app:
     markdown="""
+    # Translate any text to ANY language!
+    ### Bună! 💬
+    This is an english to any language / any language to english neural machine translation app.
+    Input your text to translate, a source language and target language, and desired number of return sequences!
+    Right now, this only supports 3 languages. I will add more later! So stay tuned!
     ### Model and Data
+    This app utilizes BOTH a fine-tuned version of Facebook/Meta AI's M2M100 418M param model for Toki Pona and the original for other languages.
+    The Toki Pona variant of the model was fine-tuned on the English/toki pona bitexts found at [https://tatoeba.org/](https://tatoeba.org/)
+    ### This app is a machine and not all translations will be perfect.
     """
     with gr.Row():
             input_text = gr.components.Textbox(label="Input Text", value="Toad (Pit Crew) is a fun character you can try in Mario Kart Tour! Wow!")
             source_lang = gr.components.Dropdown(label="Source Language", value="English", choices=list(LANG_CODES.keys()))
             target_lang = gr.components.Dropdown(label="Target Language", value="toki pona", choices=list(LANG_CODES.keys()))
+            return_seqs = gr.Slider(label="Number of return sequences", value=3, minimum=1, maximum=128, step=1)
             inputs=[input_text, source_lang, target_lang, return_seqs]
             outputs = gr.Textbox()
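
Review note: as committed, the added lines would not run in Python. The `LANG_CODES` dict is missing a comma after `"Toki Pona":"tl"`, `else if` is not valid syntax (Python uses `elif`), and `tgt`, `src`, `tl`, `en`, and `ro` are bare names that are not defined at module level. The dropdown default `value="toki pona"` also no longer matches the renamed key `"Toki Pona"`. Below is a minimal sketch of one way the same checkpoint selection could be expressed, assuming the choice is made per request from the two dropdown values. The names `pick_checkpoint`, `load_model`, `FINE_TUNED_PAIRS`, and `BASE_PAIRS` are hypothetical and not part of the commit; the checkpoint names and language codes are copied from the diff.

```python
from functools import lru_cache

# Language names and codes as they appear in the diff (comma added).
LANG_CODES = {
    "English": "en",
    "Toki Pona": "tl",
    "Romanian": "ro",
}

# Checkpoint names copied from the diff.
TOKI_PONA_CHECKPOINT = "Jayyydyyy/m2m100_418m_tokipona"
BASE_CHECKPOINT = "facebook/m2m100_418M"

# (source, target) pairs, mirroring the branches of the committed if-chain.
FINE_TUNED_PAIRS = {("en", "tl"), ("tl", "en"), ("en", "en"), ("tl", "tl")}
BASE_PAIRS = {("en", "ro"), ("ro", "en")}


def pick_checkpoint(src_lang: str, tgt_lang: str) -> str:
    """Return the checkpoint name for a pair of dropdown labels (hypothetical helper)."""
    pair = (LANG_CODES[src_lang], LANG_CODES[tgt_lang])
    if pair in FINE_TUNED_PAIRS:
        return TOKI_PONA_CHECKPOINT
    if pair in BASE_PAIRS:
        return BASE_CHECKPOINT
    # The commit has no branch for other pairs, so fail loudly instead of guessing.
    raise ValueError(f"Unsupported language pair: {src_lang} -> {tgt_lang}")


@lru_cache(maxsize=2)
def load_model(checkpoint: str, device: str = "cpu"):
    """Load a checkpoint once and reuse it on later calls (hypothetical helper)."""
    # Imported lazily so the selection logic above can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM

    return AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)
```

Inside `translate`, the model could then be obtained with `load_model(pick_checkpoint(src_lang, tgt_lang), device)`. Caching matters here because both checkpoints are full 418M-parameter models, and reloading one on every request would be slow. Pairs the commit does not list, such as Romanian to Toki Pona, raise an error in this sketch rather than silently falling back to either model.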