Update README.md
README.md
```diff
@@ -3,22 +3,31 @@ language:
 - en
 metrics:
 - accuracy
-pipeline_tag: image-to-text
+pipeline_tag: image-text-to-text
 base_model:
 - naver-clova-ix/donut-base-finetuned-cord-v2
+tags:
+- logistics
+- document-parsing
 ---
 This is a final-year project (FYP) on parsing logistics shipping documents for system integration.
 
-Updated the module versions so the program still runs.
+Updated the module versions so the program still runs, since there has been no recent update to the pretrained Donut model.
 
-
-Extract key data fields from shipping documents issued by ten different shipping lines.
+**My use case:**
+Extract common key data fields from shipping documents issued by ten different shipping lines.
 
-Repo & Datasets
+**Repo & Datasets**
 - donut.zip (original Donut repo + labelled dummy booking datasets with JSONL files + config files)
 - sample-image-to-play.zip (extra dummy samples for trying out and testing the model)
 Hugging Face Space (Gradio demo): https://huggingface.co/spaces/uartimcs/donut-booking-gradio
 
-Colab Notebooks
+**Colab Notebooks**
 - donut-booking-train.ipynb (train the model in Colab on a T4 or A100 GPU runtime)
-- donut-booking-run.ipynb (run the model with a Gradio interface in Colab on a T4 or A100 GPU runtime)
+- donut-booking-run.ipynb (run the model with a Gradio interface in Colab on a T4 or A100 GPU runtime)
+
+**Size of dataset**
+Follows the CORD-v2 split sizes:
+- train: 800 (80 images x 10 classes)
+- validation: 100 (10 images x 10 classes)
+- test: 100 (10 images x 10 classes)
```
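The dataset sizes above amount to a per-class 80/10/10 split: each of the ten classes contributes 100 labelled images, matching CORD-v2's 800/100/100 totals. The following is a minimal Python sketch of such a stratified split, assuming the images are already grouped by class (for example, one class per shipping line). The function name `split_per_class`, the fixed seed, and the example file names are hypothetical and not taken from the repo.

```python
import random
from typing import Dict, List, Tuple


def split_per_class(
    images_by_class: Dict[str, List[str]],
    n_train: int = 80,
    n_val: int = 10,
    n_test: int = 10,
    seed: int = 42,
) -> Tuple[List[str], List[str], List[str]]:
    """Split each class independently so every split contains every class equally."""
    rng = random.Random(seed)  # hypothetical fixed seed, for a reproducible split
    train: List[str] = []
    val: List[str] = []
    test: List[str] = []
    needed = n_train + n_val + n_test
    for cls, images in sorted(images_by_class.items()):
        if len(images) < needed:
            raise ValueError(f"class {cls!r} has {len(images)} images, needs {needed}")
        shuffled = images[:]  # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        train += shuffled[:n_train]
        val += shuffled[n_train:n_train + n_val]
        test += shuffled[n_train + n_val:needed]
    return train, val, test


if __name__ == "__main__":
    # Hypothetical file names: 10 classes x 100 images each.
    data = {
        f"line{c:02d}": [f"line{c:02d}_{i:03d}.png" for i in range(100)]
        for c in range(10)
    }
    train, val, test = split_per_class(data)
    print(len(train), len(val), len(test))  # 800 100 100
```

Splitting within each class, rather than shuffling all 1,000 images together, guarantees that the validation and test sets each contain exactly 10 images from every class.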
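The labelled datasets in donut.zip are described as using JSONL files. The original Donut repo reads a `metadata.jsonl` file in each split folder. Each line holds a `file_name` and a `ground_truth` value, which is itself a JSON string wrapping a `gt_parse` object. The sketch below builds such lines in Python. It assumes this README's labels follow that upstream format. The field names (`booking_no`, `vessel`) and the helper names `to_donut_jsonl_line` and `write_metadata` are hypothetical illustrations, not the project's actual label schema.

```python
import json
from typing import Dict, Iterable, Tuple


def to_donut_jsonl_line(file_name: str, fields: Dict[str, str]) -> str:
    """Build one metadata.jsonl line in the Donut dataset layout.

    "ground_truth" is a JSON *string* (double-encoded) containing {"gt_parse": ...}.
    """
    ground_truth = json.dumps({"gt_parse": fields}, ensure_ascii=False)
    return json.dumps({"file_name": file_name, "ground_truth": ground_truth}, ensure_ascii=False)


def write_metadata(path: str, records: Iterable[Tuple[str, Dict[str, str]]]) -> None:
    """Write one JSONL line per (image file name, extracted fields) pair."""
    with open(path, "w", encoding="utf-8") as f:
        for file_name, fields in records:
            f.write(to_donut_jsonl_line(file_name, fields) + "\n")


if __name__ == "__main__":
    # Hypothetical booking fields, for illustration only.
    print(to_donut_jsonl_line("booking_001.png", {"booking_no": "ABC123", "vessel": "VESSEL A"}))
```

The double encoding is easy to get wrong by hand. Generating the lines from Python dicts keeps the inner `gt_parse` string correctly escaped.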