uartimcs committed on
Commit
7ecd7bb
·
verified ·
1 Parent(s): c0a286b

Update README.md

  1. README.md +16 -7
README.md CHANGED
@@ -3,22 +3,31 @@ language:
3
  - en
4
  metrics:
5
  - accuracy
6
- pipeline_tag: image-to-text
7
  base_model:
8
  - naver-clova-ix/donut-base-finetuned-cord-v2
 
 
 
9
  ---
10
  This is a final-year project (FYP) for parsing logistics shipping documents for system integration.
11
 
12
- Latest update to the module versions needed to keep the program running.
13
 
14
- Use case:
15
- Extract key data fields from shipping documents generated by ten different shipping lines.
16
 
17
- Repo & Datasets
18
  - donut.zip (Original Donut Repo + Labelled Booking Dummy Datasets with JSONL files + Config Files)
19
  - sample-image-to-play.zip (Excess dummy datasets used to play and test the model)
20
  https://huggingface.co/spaces/uartimcs/donut-booking-gradio
21
 
22
- Colab Notebooks
23
  - donut-booking-train.ipynb (Train the model in Colab using a T4 GPU / A100 GPU environment)
24
- - donut-booking-run.ipynb (Run the model in Colab with Gradio using a T4 GPU / A100 GPU environment)
 
 
 
 
 
 
 
3
  - en
4
  metrics:
5
  - accuracy
6
+ pipeline_tag: image-text-to-text
7
  base_model:
8
  - naver-clova-ix/donut-base-finetuned-cord-v2
9
+ tags:
10
+ - logistics
11
+ - document-parsing
12
  ---
13
  This is a final-year project (FYP) for parsing logistics shipping documents for system integration.
14
 
15
+ Latest update to the module versions needed to keep the program running, because the Donut pretrained model has had no recent updates.
16
 
17
+ **My use case:**
18
+ Extract common key data fields from shipping documents generated by ten different shipping lines.
19
 
20
+ **Repo & Datasets**
21
  - donut.zip (Original Donut Repo + Labelled Booking Dummy Datasets with JSONL files + Config Files)
22
  - sample-image-to-play.zip (Excess dummy datasets used to play and test the model)
23
  https://huggingface.co/spaces/uartimcs/donut-booking-gradio
24
 
25
+ **Colab Notebooks**
26
  - donut-booking-train.ipynb (Train the model in Colab using a T4 GPU / A100 GPU environment)
27
+ - donut-booking-run.ipynb (Run the model in Colab with Gradio using a T4 GPU / A100 GPU environment)
28
+
29
+ **Size of dataset**
30
+ Follows the CORD-v2 dataset split ratio (8:1:1):
31
+ - train: 800 (80 images x 10 classes)
32
+ - validation: 100 (10 images x 10 classes)
33
+ - test: 100 (10 images x 10 classes)
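
The per-class split described in the added "Size of dataset" section can be sketched as below. This is only an illustration under stated assumptions, not the author's actual script: the function name `split_per_class`, the file-name pattern, and the class labels (`line00` to `line09`) are hypothetical, and the README does not say how images were assigned to each split. The sketch assumes each of the ten shipping lines contributes 100 images, shuffled with a fixed seed.

```python
import random
from collections import defaultdict


def split_per_class(samples, n_train=80, n_val=10, n_test=10, seed=0):
    """Split (file_name, class_label) pairs into train/validation/test,
    taking a fixed number of images from every class.

    Hypothetical helper: names and defaults are illustrative only.
    """
    # Group file names by class label (e.g. one class per shipping line).
    by_class = defaultdict(list)
    for file_name, label in samples:
        by_class[label].append(file_name)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    needed = n_train + n_val + n_test
    splits = {"train": [], "validation": [], "test": []}

    for label in sorted(by_class):
        files = sorted(by_class[label])
        if len(files) < needed:
            raise ValueError(
                f"class {label!r} has {len(files)} images, needs {needed}"
            )
        rng.shuffle(files)
        train_end = n_train
        val_end = n_train + n_val
        splits["train"] += [(f, label) for f in files[:train_end]]
        splits["validation"] += [(f, label) for f in files[train_end:val_end]]
        splits["test"] += [(f, label) for f in files[val_end:needed]]

    return splits


# Assumed layout: 10 classes with 100 dummy images each.
samples = [
    (f"line{c:02d}_{i:03d}.png", f"line{c:02d}")
    for c in range(10)
    for i in range(100)
]
splits = split_per_class(samples)
print({name: len(items) for name, items in splits.items()})
# → {'train': 800, 'validation': 100, 'test': 100}
```

Splitting within each class, rather than over the pooled 1,000 images, keeps every shipping line represented at the same 80/10/10 proportion in all three splits.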