bandjoun-cameroon
Collection
Hosts assets from my experiments on using LLMs for automatic translation between Ghomala (the native language of Bandjoun) and French/English.
Translates sentences from French to Ghomala, the native language of Bandjoun, a village in Cameroon.
Example:
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "stfotso/deepseek-R1-qween-french-ghomala-bandjoun-1.5B"
MAX_TOKENS = 256  # cap on total length: prompt plus generated tokens

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

test_sentence = "bonjour Adam"  # French sentence to translate
print(test_sentence)
system_prompt = """
1. You are a helpful specialist in linguistic, especially african language and you are required to provide the rightfull translation of a french expression into the ghomala language, the native language of bandjoun, a village of Cameroon.
2. Your ghomala translation should use correct phonetic signs.
"""
# One worked example (user/assistant pair) precedes the sentence to translate.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Sentence (in french): vieil homme"},
    {"role": "assistant", "content": "Sentence (in ghomala): bvo"},
    {"role": "user", "content": f"Sentence (in french): {test_sentence}"},
]
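# Render the conversation as a prompt string ending with the assistant turn marker.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, return_tensors="pt")
# Sample a translation; stop as soon as the model starts a new French sentence.
outputs = model.generate(
    inputs,
    max_length=MAX_TOKENS,
    tokenizer=tokenizer,  # generate() needs the tokenizer when stop_strings is set
    do_sample=True,
    temperature=0.5,
    top_p=1,
    top_k=50,
    stop_strings=["Sentence (in french)"],
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens, skipping the prompt.
generated_text = tokenizer.batch_decode(outputs[:, inputs.shape[1]:])[0]
print(f'generated text: {generated_text}')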
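The prompt format used in the example can be factored into two small helpers, one to build the few-shot messages and one to clean up the raw output. This is only a sketch and not part of the published model: `build_messages` and `extract_translation` are hypothetical names. The cleanup assumes the model answers in the same "Sentence (in ghomala): ..." form as the worked example, and that special tokens were already dropped at decode time (for instance with `skip_special_tokens=True`).

```python
FRENCH_PREFIX = "Sentence (in french): "
GHOMALA_PREFIX = "Sentence (in ghomala): "
STOP_STRING = "Sentence (in french)"


def build_messages(system_prompt, examples, sentence):
    """Return chat messages: system prompt, few-shot pairs, then the query.

    `examples` is a list of (french, ghomala) pairs used as worked examples.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for french, ghomala in examples:
        messages.append({"role": "user", "content": FRENCH_PREFIX + french})
        messages.append({"role": "assistant", "content": GHOMALA_PREFIX + ghomala})
    messages.append({"role": "user", "content": FRENCH_PREFIX + sentence})
    return messages


def extract_translation(generated_text):
    """Strip the trailing stop string and the leading answer prefix.

    Assumes the output follows the few-shot format; text that does not
    match is returned unchanged apart from whitespace trimming.
    """
    text = generated_text
    # Cut everything from the stop string onward, if the model emitted it.
    stop_at = text.find(STOP_STRING)
    if stop_at != -1:
        text = text[:stop_at]
    text = text.strip()
    # Remove the "Sentence (in ghomala):" prefix when present.
    prefix = GHOMALA_PREFIX.strip()
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text.strip()
```

With these helpers, the message list from the example would be `build_messages(system_prompt, [("vieil homme", "bvo")], test_sentence)`, and the final print could show `extract_translation(generated_text)` instead of the raw string.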
Base model
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B