asif00 committed · verified · commit b24deac · 1 Parent(s): abfb9d6

Update README.md

Files changed (1): README.md +10 −3

README.md CHANGED
@@ -4,12 +4,17 @@ language:
 - bn
 metrics:
 - bleu
+- cer
+- wer
+- meteor
 library_name: transformers
 pipeline_tag: text2text-generation
+tags:
+- text-generation-inference
 ---
-# Project Overview: Bengali Text Correction
+# Bengali Text Correction Overview:
 
-The goal of this project was to develop a software model that could fix grammatical and syntax errors in Bengali text. The approach was similar to how a language translator works, where the incorrect sentence is transformed into a correct one. We fine tune a pertained model, namely [mBart50] with a [dataset] of 1.M samples for 6500 steps and achieve a result of 0.44 bleu Score.
+The goal of this project was to develop a model that fixes grammatical and syntax errors in Bengali text. The approach is similar to machine translation: an incorrect sentence is transformed into a correct one. We fine-tuned a pretrained model, [mBart50](https://huggingface.co/facebook/mbart-large-50), on a [dataset](https://github.com/hishab-nlp/BNSECData) of 1.3M samples for 6500 steps and achieved `{BLEU: 0.443, CER: 0.159, WER: 0.406, METEOR: 0.655}` when tested on unseen data. Clone or download this [repo](https://github.com/himisir/Bengali-Sentence-Error-Correction), run the `correction.py` script, and type a sentence at the prompt.
 
 ## Initial Testing:
 
@@ -67,12 +72,13 @@ correct_bengali_sentence = tokenizer.decode(outputs[0], skip_special_tokens=True
 # আপনি কেমন আছেন?
 ```
 
+To test this model from the terminal, run `python correction.py` and type a sentence at the prompt. You'll need the `transformers` library; install it with `pip install -q "transformers[torch]" -U`.
 
 #### Important note: Make sure you pass the `use_safetensors=True` parameter when loading the model.
 
 # General issues faced during the entire journey:
 
-- Issue: The system is not printing any evaluation functions.
+- Issue: The trainer is not printing any evaluation metrics.
 Solution: The GPU that I am training on doesn't support FP16/BF16 precision. Commenting out `fp16=True` in the `Seq2SeqTrainingArguments` solved the issue.
 
 - Issue: Training on TPU crashes on both Colab and Kaggle.
@@ -85,5 +91,6 @@ The model is clearly overfitting, and we can reduce that. My best guess is that
 I'm also planning to run a 4-bit quantization on the same model to see how it performs against the base model. It should be a fun experiment.
 
 ## Resources and References:
+
 [Dataset Source](https://github.com/hishab-nlp/BNSECData)
 [Model Documentation and Troubleshooting](https://huggingface.co/docs/transformers/model_doc/mbart)
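
---

The updated metadata adds `cer` and `wer` to the reported metrics. Both are edit-distance ratios: character error rate counts edits over reference characters, word error rate over whitespace-split tokens. As a reference for what these numbers mean, here is a minimal pure-Python sketch — an illustration only, not the project's actual evaluation code:

```python
# Illustrative CER/WER computation (not the project's evaluation code).

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, O(len(ref) * len(hyp))."""
    dp = list(range(len(hyp) + 1))          # dp[j] = distance(ref[:0], hyp[:j])
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i              # prev holds the diagonal cell
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,                  # delete r
                dp[j - 1] + 1,              # insert h
                prev + (r != h),            # substitute (free if chars match)
            )
    return dp[-1]

def cer(reference, hypothesis):
    """Character error rate: edits / reference length, in characters."""
    return edit_distance(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word error rate: edits / reference length, in words."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

print(cer("আপনি কেমন আছেন?", "আপনি কেমন আছো?"))
print(wer("আপনি কেমন আছেন?", "আপনি কেমন আছো?"))
```

A CER of 0.159 on 1.3M-sample test-style data therefore means roughly one in six reference characters needs an edit to match the hypothesis; libraries such as `evaluate` or `jiwer` compute the same quantities at scale.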