---
language:
- en
metrics:
- accuracy
pipeline_tag: image-text-to-text
base_model:
- naver-clova-ix/donut-base-finetuned-cord-v2
tags:
- logistics
- document-parsing
---
This is an FYP (final-year project) on document parsing of logistics shipping documents for system integration.
- https://huggingface.co/uartimcs/donut-booking-extract/blob/main/FYP.pdf
The module versions have been updated so the program continues to run, since the pretrained Donut model itself has not been updated recently.
**My use case:**
Extract common key data fields from shipping documents generated by ten different shipping lines.
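A minimal inference sketch of how such fields could be extracted with the fine-tuned model, assuming it is published as `uartimcs/donut-booking-extract` (inferred from the FYP.pdf link above) and keeps the CORD-v2 task prompt; both are assumptions, so check the repo config for the actual prompt token:

```python
# Minimal inference sketch; model id and task prompt are assumptions.
import re

import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

model_id = "uartimcs/donut-booking-extract"  # assumed repo id
processor = DonutProcessor.from_pretrained(model_id)
model = VisionEncoderDecoderModel.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Load one page of a shipping/booking document as an RGB image.
image = Image.open("booking.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

# Donut decodes from a task-specific start token; "<s_cord-v2>" is the
# base model's prompt and is only a guess for this fine-tune.
task_prompt = "<s_cord-v2>"
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids.to(device)

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
)

# Strip special tokens and the task prompt, then convert to a nested dict.
sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "")
sequence = sequence.replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
print(processor.token2json(sequence))
```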
**Repo & Datasets**
- donut.zip (Original Donut repo + labelled booking dummy datasets with JSONL files + config files; see the JSONL sketch after this list)
- sample-image-to-play.zip (Extra dummy images for experimenting with and testing the model)
- Gradio demo Space: https://huggingface.co/spaces/uartimcs/donut-booking-gradio
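For reference, a sketch of what one line of a Donut-style `metadata.jsonl` file typically looks like (train/validation/test folders, each holding images plus a `metadata.jsonl`); the field names under `gt_parse` here are invented placeholders, and the real labels are the ones shipped in donut.zip:

```python
# Sketch of one metadata.jsonl line in the Donut dataset layout.
# All gt_parse field names below are hypothetical placeholders.
import json

record = {
    "file_name": "booking_001.png",
    "ground_truth": json.dumps({
        "gt_parse": {
            "shipping_line": "EXAMPLE LINE",  # hypothetical field
            "booking_no": "BK123456",         # hypothetical field
            "port_of_loading": "HONG KONG",   # hypothetical field
        }
    }),
}
print(json.dumps(record))  # emits one metadata.jsonl line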
**Colab Notebooks**
- donut-booking-train.ipynb (Train the model in Colab on a T4 or A100 GPU)
- donut-booking-run.ipynb (Run the model in Colab through a Gradio interface on a T4 or A100 GPU; a minimal Gradio sketch follows this list)
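A minimal sketch of how the Gradio demo could wrap the model, assuming a `predict(image)` helper like the inference snippet above; the actual notebook and Space may be structured differently:

```python
# Minimal Gradio wrapper sketch; predict() is a stub standing in for the
# Donut inference code shown earlier in this card.
import gradio as gr

def predict(image):
    # Run the Donut model on the uploaded page and return the parsed
    # fields (replace this stub with the inference code shown above).
    return {"status": "stub"}

demo = gr.Interface(
    fn=predict,
    inputs=gr.Image(type="pil", label="Shipping document"),
    outputs=gr.JSON(label="Extracted fields"),
    title="donut-booking-extract demo",
)

demo.launch()
```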
**Size of dataset**
The splits follow the CORD-v2 dataset ratio:
- train: 800 (80 pics x 10 classes)
- validation: 100 (10 pics x 10 classes)
- test: 100 (10 pics x 10 classes)