priyank-m committed on
Commit 01f884d · 1 Parent(s): 6d7d747

updated tag

Files changed (1)
  1. README.md +2 -9
README.md CHANGED
@@ -7,6 +7,7 @@ tags:
  - Image-to-Text
  - OCR
  - Image-Captioning
+ - Text-Recognition
  datasets:
  - priyank-m/text_recognition_en_zh_clean
  metrics:
@@ -36,12 +37,4 @@ Notes and observations:
  12. Streaming dataset might be another good option if the dataset size were to increase any further.
  13. Free GPU on colab seem not enough for this experiment, as keeping two models in GPU and training forces to keep batch size small and also the free GPUs (T4) are not fast enough.
  14. A very important data cleaning step was to just check if the sample image and text can be converted to the input format expected by the model, the text should be non-empty value when converted back from the input IDs to text (some characters are not identified by the tokenizer and get converted to special token and we usually skip the special tokens when converting input IDs back to text) as it is required to be non-empty while doing the CER calculation.
- 15. Resuming model training was taking almost 1 or sometimes 2 hours in just skipping the batches, to avoid this wastage one possible solution would be to shuffle the training dataset before starting the training and then avoid the skipping of batches. This would be particularly useful when we increse the dataset size further.
-
-
-
-
-
-
-
-
+ 15. Resuming model training was taking almost 1 or sometimes 2 hours in just skipping the batches, to avoid this wastage one possible solution would be to shuffle the training dataset before starting the training and then avoid the skipping of batches. This would be particularly useful when we increse the dataset size further.
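Note 14 in the README describes a cleaning step. A sample is kept only if its label survives the encode/decode round trip with special tokens skipped, because the CER computation divides by the reference length and so needs a non-empty reference. Below is a minimal, self-contained Python sketch of that check. It is not the author's code. It assumes a hypothetical toy character-level tokenizer (`VOCAB`, `encode`, `decode` are illustrative names) in place of the model's real tokenizer, and it uses a plain Levenshtein-based CER.

```python
# Hypothetical toy tokenizer standing in for the model's real tokenizer:
# known characters map to IDs, unknown characters map to a special [UNK] ID.
PAD_ID, UNK_ID = 0, 1
SPECIAL_IDS = {PAD_ID, UNK_ID}
VOCAB = {ch: i + 2 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz0123456789 ")}
ID_TO_CHAR = {i: ch for ch, i in VOCAB.items()}


def encode(text: str) -> list[int]:
    """Map each character to an ID; characters outside VOCAB become [UNK]."""
    return [VOCAB.get(ch, UNK_ID) for ch in text.lower()]


def decode(ids: list[int], skip_special_tokens: bool = True) -> str:
    """Map IDs back to text, dropping special tokens when requested."""
    out = []
    for i in ids:
        if i in SPECIAL_IDS:
            if not skip_special_tokens:
                out.append("[UNK]" if i == UNK_ID else "[PAD]")
            continue
        out.append(ID_TO_CHAR[i])
    return "".join(out)


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character error rate; raises ZeroDivisionError for an empty reference."""
    return levenshtein(prediction, reference) / len(reference)


def keep_sample(text: str) -> bool:
    """Keep a sample only if its label is non-empty after the round trip."""
    return decode(encode(text)) != ""


labels = ["hello world", "★★", "ocr 2023"]
clean = [t for t in labels if keep_sample(t)]
print(clean)  # ['hello world', 'ocr 2023']
```

In this sketch the label `"★★"` contains only characters outside the toy vocabulary. It therefore decodes to an empty string and is filtered out. Computing `cer` against such an empty reference would raise `ZeroDivisionError`.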
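Note 15 proposes shuffling the training set once, before training starts, so that a resumed run does not spend an hour or more iterating through and discarding already-seen batches. The sketch below shows one possible reading of that idea, which goes slightly beyond what the note states. It assumes an index-addressable dataset, a fixed shuffle seed (so the order is identical across restarts), and a known resume step. Under those assumptions the resume point can be reached by slicing instead of skipping. All names are hypothetical. If the Hugging Face `Trainer` is in use (the notes do not say), its built-in skipping on resume can be turned off with `TrainingArguments(ignore_data_skip=True)`. In that case the data loader starts again from the beginning of the data.

```python
import random


def shuffled_order(n_samples: int, seed: int) -> list[int]:
    """Shuffle sample indices once with a fixed seed so every restart sees the same order."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return order


def batches_from(order: list[int], batch_size: int, start_step: int):
    """Yield batches beginning at start_step by slicing, not by iterating past earlier batches."""
    for start in range(start_step * batch_size, len(order), batch_size):
        yield order[start:start + batch_size]


order = shuffled_order(10, seed=42)
full = list(batches_from(order, batch_size=3, start_step=0))
resumed = list(batches_from(order, batch_size=3, start_step=2))
print(resumed == full[2:])  # True: the resumed run starts directly at batch 2
```

Because the shuffle happens once and is reproducible from the seed, the cost of resuming no longer grows with the number of batches already processed.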