# Bengali Sentence Error Correction

The goal here is to train a model that can fix grammatical and syntax errors in Bengali text. The approach is similar to how a language translator works: the incorrect sentence is transformed into a correct one. We fine-tune a pretrained model, namely [mBart50](https://huggingface.co/facebook/mbart-large-50), on a [dataset](https://github.com/hishab-nlp/BNSECData) of 1.3M samples for 6,500 steps and achieve `BLEU: 0.443, CER: 0.159, WER: 0.406, METEOR: 0.655` when tested on unseen data. Clone/download this repo, run the `correction.py` script, type a sentence after the prompt, and you are all set. Here is a live [Demo Space](https://huggingface.co/spaces/asif00/Bengali_Sentence_Error_Correction__mbart_bn_error_correction) of the fine-tuned model in action.
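
Scores of this kind can be reproduced with the Hugging Face `evaluate` library. The sketch below is illustrative rather than the exact evaluation script used here, and the sample sentences are placeholders, not items from the BNSECData test split:

```python
import evaluate

# Placeholder data for illustration only, not from the BNSECData test split.
predictions = ["আপনি কেমন আছেন?"]
references = ["আপনি কেমন আছেন?"]

bleu = evaluate.load("bleu").compute(predictions=predictions, references=[[r] for r in references])
cer = evaluate.load("cer").compute(predictions=predictions, references=references)  # requires jiwer
wer = evaluate.load("wer").compute(predictions=predictions, references=references)  # requires jiwer
meteor = evaluate.load("meteor").compute(predictions=predictions, references=references)

print(bleu["bleu"], cer, wer, meteor["meteor"])
```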

## Usage

```python
outputs = model.generate(inputs, max_new_tokens=len(incorrect_bengali_sentence))
correct_bengali_sentence = tokenizer.decode(outputs[0], skip_special_tokens=True)
# আপনি কেমন আছেন? ("How are you?")
```
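
A self-contained version of the snippet above might look like the following sketch. The model id `asif00/mbart_bn_error_correction` is an assumption inferred from the demo Space URL, and the `bn_IN` language-code handling follows the usual mBART-50 convention, so adjust both to match the actual checkpoint:

```python
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

# Assumed checkpoint id (inferred from the demo Space URL); replace if it differs.
model_id = "asif00/mbart_bn_error_correction"
tokenizer = MBart50TokenizerFast.from_pretrained(model_id)
model = MBartForConditionalGeneration.from_pretrained(model_id)

tokenizer.src_lang = "bn_IN"  # mBART-50 prefixes each source sentence with a language code
incorrect_bengali_sentence = "আপনি কেমন আছো?"  # illustrative input with an agreement error

inputs = tokenizer(incorrect_bengali_sentence, return_tensors="pt").input_ids
outputs = model.generate(
    inputs,
    max_new_tokens=len(incorrect_bengali_sentence),
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("bn_IN"),  # assumed target-language setup
)
correct_bengali_sentence = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(correct_bengali_sentence)  # expected: আপনি কেমন আছেন?
```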

An example notebook can be found here: [Official Notebook](https://www.kaggle.com/code/asif00/bengali-sentence-error-correction-custom-model).

# Model Characteristics

We fine-tuned [mBART Large 50](https://huggingface.co/facebook/mbart-large-50) on custom data. mBART Large 50 is a 600M-parameter multilingual sequence-to-sequence model, introduced to show that multilingual translation models can be created through multilingual fine-tuning: instead of fine-tuning in one direction, a pretrained model is fine-tuned in many directions simultaneously. mBART-50 extends the original mBART model with an extra 25 languages, supporting multilingual machine translation across 50 languages. More about the base model can be found in the [official documentation](https://huggingface.co/docs/transformers/model_doc/mbart).
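
As a quick illustration of that multilingual setup (a sketch against the base model, not this fine-tune), the mBART-50 tokenizer prefixes every source sentence with one of the 50 language codes, which is how a single model can serve many translation directions:

```python
from transformers import MBart50TokenizerFast

tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50")
tokenizer.src_lang = "bn_IN"  # Bengali is one of the 50 supported language codes

ids = tokenizer("আপনি কেমন আছেন?").input_ids
print(tokenizer.convert_ids_to_tokens(ids)[0])  # "bn_IN": the language code leads the sequence
```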