VictorChew committed on
Commit 8110f25 · verified · 1 Parent(s): 8233220

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +115 -0
  2. config.json +1 -1
README.md ADDED
@@ -0,0 +1,115 @@
<div align="center">
<h1>StructEqTable-Deploy: A High-efficiency Open-source Toolkit for Table-to-LaTeX Transformation</h1>

[[ Related Paper ]](https://arxiv.org/abs/2406.11633) [[ Website ]](https://unimodal4reasoning.github.io/DocGenome_page/) [[ Dataset (Google Drive) ]](https://drive.google.com/drive/folders/1OIhnuQdIjuSSDc_QL2nP4NwugVDgtItD) [[ Dataset (Hugging Face) ]](https://huggingface.co/datasets/U4R/DocGenome/tree/main)

[[ Models 🤗 (Hugging Face) ]](https://huggingface.co/U4R/StructTable-base/tree/main)
</div>

Welcome to the official repository of StructEqTable-Deploy, a solution that converts table images into LaTeX, powered by scalable data from the [DocGenome benchmark](https://unimodal4reasoning.github.io/DocGenome_page/).

## Overview
Tables are an effective way to represent structured data in scientific publications, financial statements, invoices, web pages, and many other scenarios. Extracting tabular data from a visual table image and performing downstream reasoning tasks on the extracted data is challenging, mainly because tables often have complicated column and row headers with spanning cells. To address these challenges, we present TableX, a large-scale multi-modal table benchmark extracted from the [DocGenome benchmark](https://unimodal4reasoning.github.io/DocGenome_page/) for table pre-training, comprising more than 2 million high-quality image-LaTeX pairs covering 156 disciplinary classes. Benefiting from such large-scale data, we train an end-to-end model, StructEqTable, which precisely produces the LaTeX description of a visual table image and performs multiple table-related reasoning tasks, including structural extraction and question answering, broadening its application scope and potential.

## Changelog
Tips: The current version of StructEqTable can process table images from scientific documents such as arXiv and SciHub papers. Times New Roman and Songti (宋体) are the main fonts in these table images; other fonts may decrease the accuracy of the model's output.
- [2024/8/08] 🔥 We have released the TensorRT accelerated version, which takes only about 1 second per image on an A100 GPU. Please follow the tutorial to install the environment and compile the model weights.
- [2024/7/30] We have released the first version of StructEqTable.

## TODO

- [x] Release inference code and checkpoints of StructEqTable.
- [x] Support the Chinese version of StructEqTable.
- [x] Accelerated version of StructEqTable using TensorRT-LLM.
- [ ] Expand more domains of table images to improve the model's general capabilities.
- [ ] Release our table pre-training and fine-tuning code.

## Efficient Inference
Our model now supports TensorRT-LLM deployment, achieving a 10x or greater speedup during inference.
Please refer to [GETTING_STARTED.md](docs/GETTING_STARTED.md) to learn how to deploy.

## Installation
```bash
conda create -n structeqtable "python>=3.10"
conda activate structeqtable

# Install from source (suggested)
git clone https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git
cd StructEqTable-Deploy
python setup.py develop

# or install from the GitHub repo
pip install "git+https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git"

# or install from PyPI
pip install struct-eqtable==0.1.0
```
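
As a quick sanity check after installation, you can try importing the package. Note this is an assumption: the module name `struct_eqtable` is inferred from the PyPI package name `struct-eqtable` and may differ in your installed version.

```python
# Hypothetical smoke test: the module name `struct_eqtable` is inferred
# from the PyPI package name `struct-eqtable` and is an assumption.
import struct_eqtable  # noqa: F401

print("struct_eqtable imported successfully")
```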

## Quick Demo
- Run tools/demo/demo.py:
```shell
cd tools/demo

python demo.py \
    --image_path ./demo.png \
    --ckpt_path ${CKPT_PATH} \
    --output_format latex
```
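
For programmatic use, the repository's config.json lists `Pix2StructForConditionalGeneration`, so the checkpoint should load with Hugging Face Transformers' Pix2Struct classes. The sketch below is a minimal, unofficial example under that assumption; the checkpoint id and `max_new_tokens` value are illustrative, and `demo.py` above remains the documented entry point.

```python
# A minimal sketch, assuming the checkpoint is compatible with Transformers'
# Pix2Struct classes (config.json lists Pix2StructForConditionalGeneration).
# The checkpoint id and max_new_tokens below are illustrative assumptions.
from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

ckpt = "U4R/StructTable-base"  # or a local checkpoint directory
processor = Pix2StructProcessor.from_pretrained(ckpt)
model = Pix2StructForConditionalGeneration.from_pretrained(ckpt)

# Render the table image into patches and generate the LaTeX token sequence.
image = Image.open("demo.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=1024)

latex = processor.decode(output_ids[0], skip_special_tokens=True)
print(latex)
```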

- HTML or Markdown format output

  Our model outputs LaTeX code by default. If you want another format such as HTML or Markdown, `pypandoc` can convert the LaTeX output into HTML or Markdown for simple tables (tables without merged cells).

```shell
sudo apt install pandoc
pip install pypandoc

cd tools/demo

python demo.py \
    --image_path ./demo.png \
    --ckpt_path ${CKPT_PATH} \
    --output_format html markdown
```
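
If you prefer to convert the LaTeX output yourself rather than through demo.py, a minimal pypandoc sketch looks like the following; the sample table is illustrative, and as noted above only simple tables without merged cells convert reliably.

```python
# A minimal sketch of LaTeX -> HTML/Markdown conversion with pypandoc.
# Requires pandoc to be installed (see the apt/pip commands above).
import pypandoc

# Illustrative simple table (no merged cells).
latex_table = r"""
\begin{tabular}{lr}
Name & Value \\
A & 1 \\
B & 2 \\
\end{tabular}
"""

html = pypandoc.convert_text(latex_table, to="html", format="latex")
markdown = pypandoc.convert_text(latex_table, to="markdown", format="latex")
print(html)
print(markdown)
```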

- Visualization Results
  - The input data are sampled from the SciHub domain.

![](docs/demo_1.png)

![](docs/demo_2.png)

## Acknowledgements
- [DocGenome](https://github.com/UniModal4Reasoning/DocGenome). An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models.
- [ChartVLM](https://github.com/UniModal4Reasoning/ChartVLM). A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning.
- [Pix2Struct](https://github.com/google-research/pix2struct). Screenshot Parsing as Pretraining for Visual Language Understanding.
- [UniMERNet](https://github.com/opendatalab/UniMERNet). A Universal Network for Real-World Mathematical Expression Recognition.
- [Donut](https://huggingface.co/naver-clova-ix/donut-base). UniMERNet's Transformer encoder-decoder is adapted from Donut.
- [Nougat](https://github.com/facebookresearch/nougat). The tokenizer is adapted from Nougat.
- [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). Model inference acceleration uses TensorRT-LLM.

## License
StructEqTable is released under the [Apache License 2.0](LICENSE).

## Citation
If you find our models / code / papers useful in your research, please consider giving us a ⭐ and a citation 📝, thanks :)
```bibtex
@article{xia2024docgenome,
  title={DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models},
  author={Xia, Renqiu and Mao, Song and Yan, Xiangchao and Zhou, Hongbin and Zhang, Bo and Peng, Haoyang and Pi, Jiahao and Fu, Daocheng and Wu, Wenjie and Ye, Hancheng and others},
  journal={arXiv preprint arXiv:2406.11633},
  year={2024}
}
```

## Contact Us
If you encounter any issues or have questions, please feel free to contact us via [email protected].
config.json CHANGED
@@ -1,5 +1,5 @@
  {
-   "_name_or_path": "/cpfs01/user/zhouhongbin/code/StructEqTable-deepspeed/ckpt/pretrained/pix2struct-base-zh",
+   "_name_or_path": "ckpts/StructTable-base",
    "architectures": [
      "Pix2StructForConditionalGeneration"
    ],