asif00 committed on
Commit 7a2f281 · verified · 1 Parent(s): e7eabeb

Update README.md

Files changed (1): README.md (+2, −2)
README.md CHANGED

@@ -14,7 +14,7 @@ tags:
 ---
 # Bengali Sentence Error Correction
 
-The goal is to train a model that fixes grammatical and syntax errors in Bengali text. The approach is similar to machine translation: an incorrect sentence is transformed into a correct one. We fine-tuned the pre-trained [mBart50](https://huggingface.co/facebook/mbart-large-50) on a [dataset](https://github.com/hishab-nlp/BNSECData) of 1.3M samples for 6,500 steps, achieving `BLEU: 0.443, CER: 0.159, WER: 0.406, METEOR: 0.655` on unseen data. Clone or download this repo, run the `correction.py` script, and type a sentence at the prompt.
+The goal is to train a model that fixes grammatical and syntax errors in Bengali text. The approach is similar to machine translation: an incorrect sentence is transformed into a correct one. We fine-tuned the pre-trained [mBart50](https://huggingface.co/facebook/mbart-large-50) on a [dataset](https://github.com/hishab-nlp/BNSECData) of 1.3M samples for 6,500 steps, achieving `BLEU: 0.443, CER: 0.159, WER: 0.406, METEOR: 0.655` on unseen data. Clone or download this repo, run the `correction.py` script, and type a sentence at the prompt. Here is a live [Demo Space](https://huggingface.co/spaces/asif00/Bengali_Sentence_Error_Correction__mbart_bn_error_correction) of the fine-tuned model in action.
 
 ## Usage
 
@@ -34,8 +34,8 @@ outputs = model.generate(inputs, max_new_tokens=len(incorrect_bengali_sentence),
 correct_bengali_sentence = tokenizer.decode(outputs[0], skip_special_tokens=True)
 # আপনি কেমন আছেন?
 ```
-An example notebook can be found here: [Official Notebook](https://www.kaggle.com/code/asif00/bengali-sentence-error-correction-custom-model)
 
+
 # Model Characteristics
 
 We fine-tuned [mBART Large 50](https://huggingface.co/facebook/mbart-large-50) on custom data. [mBART Large 50](https://huggingface.co/facebook/mbart-large-50) is a 600M-parameter multilingual sequence-to-sequence model, introduced to show that multilingual translation models can be built through multilingual fine-tuning: instead of fine-tuning in one direction, a pre-trained model is fine-tuned in many directions simultaneously. mBART-50 extends the original mBART with 25 additional languages to support machine translation across 50 languages. More about the base model can be found in the [official documentation](https://huggingface.co/docs/transformers/model_doc/mbart).
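
The CER and WER figures quoted in the README are edit-distance error rates at the character and word level. As a quick illustration of how such scores are computed (a minimal sketch, not the repo's actual evaluation code, which presumably used a standard library such as `jiwer` or Hugging Face `evaluate`):

```python
# Minimal sketch of the CER/WER metrics reported in the README.
# NOTE: illustrative only; not the evaluation code used in this repo.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (one-row DP)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

if __name__ == "__main__":
    reference = "আপনি কেমন আছেন?"   # correct sentence
    hypothesis = "আপনি কেমন আছে?"   # model output with one error
    print(f"CER: {cer(reference, hypothesis):.3f}")
    print(f"WER: {wer(reference, hypothesis):.3f}")
```

Lower is better for both metrics; the reported `CER: 0.159` means roughly 16% of reference characters would need to be edited to match the model's output.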